# What is the True Normal Human Body Temperature? 

#### Background

The mean normal body temperature was held to be 37$^{\circ}$C or 98.6$^{\circ}$F for more than 120 years since it was first conceptualized and reported by Carl Wunderlich in a famous 1868 book. But, is this value statistically correct?

<div class="span5 alert alert-info">
<h3>Exercises</h3>

<p>In this exercise, you will analyze a dataset of human body temperatures and employ the concepts of hypothesis testing, confidence intervals, and statistical significance.</p>

<p>Answer the following questions <b>in this notebook below and submit to your Github account</b>.</p> 

<ol>
<li>  Is the distribution of body temperatures normal? 
    <ul>
    <li> Although this is not a requirement for CLT to hold (read CLT carefully), it gives us some peace of mind that the population may also be normally distributed if we assume that this sample is representative of the population.
    </ul>
<li>  Is the sample size large? Are the observations independent?
    <ul>
    <li> Remember that this is a condition for the CLT, and hence the statistical tests we are using, to apply.
    </ul>
<li>  Is the true population mean really 98.6 degrees F?
    <ul>
    <li> Would you use a one-sample or two-sample test? Why?
    <li> In this situation, is it appropriate to use the $t$ or $z$ statistic? 
    <li> Now try using the other test. How is the result different? Why?
    </ul>
<li>  At what temperature should we consider someone's temperature to be "abnormal"?
    <ul>
    <li> Start by computing the margin of error and confidence interval.
    </ul>
<li>  Is there a significant difference between males and females in normal temperature?
    <ul>
    <li> What test did you use and why?
    <li> Write a story with your conclusion in the context of the original problem.
    </ul>
</ol>

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet

#### Resources

+ Information and data sources: http://www.amstat.org/publications/jse/datasets/normtemp.txt, http://www.amstat.org/publications/jse/jse_data_archive.htm
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet

****
</div>

In [1]:
import pandas as pd
import numpy as np
import scipy
from scipy import stats
df = pd.read_csv('data/human_body_temperature.csv') #import as dataframe
df.head()

Unnamed: 0,temperature,gender,heart_rate
0,99.3,F,68.0
1,98.4,F,81.0
2,97.8,M,73.0
3,99.2,F,66.0
4,98.0,F,73.0


#### [QUESTION 1] Is the distribution of body temperatures normal?
* Test the null hypothesis that a sample comes from a normal distribution.


In [2]:
temp_all = np.array(df['temperature']) # get temperature data
scipy.stats.mstats.normaltest(temp_all) # normality test

NormaltestResult(statistic=2.7038014333192031, pvalue=0.2587479863488254)

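* The null hypothesis is that the sample comes from a normal distribution. Since the p-value is high (~0.259), we cannot reject the null hypothesis.
* The body temperatures are therefore consistent with a normal distribution. Note that failing to reject the null hypothesis does not prove normality.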
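
As a cross-check on the normality test above, a second test and a Q-Q fit can be sketched as follows. This is a minimal sketch, not part of the original analysis: the array `sample` is simulated stand-in data (an assumption, since the CSV is not bundled here), and in the notebook it would be replaced by `temp_all`. The Shapiro-Wilk test and `probplot` are alternatives to the test used above, chosen only for illustration.

```python
import numpy as np
from scipy import stats

# Simulated stand-in for temp_all (assumption): 130 normal draws
rng = np.random.default_rng(0)
sample = rng.normal(loc=98.25, scale=0.73, size=130)

# Shapiro-Wilk test: the null hypothesis is that the data are normal
shapiro_stat, shapiro_p = stats.shapiro(sample)

# Q-Q fit against a normal distribution; r near 1 means the ordered
# sample lines up well with the theoretical normal quantiles
(theoretical_q, ordered_sample), (slope, intercept, r) = stats.probplot(
    sample, dist="norm"
)

print(f"Shapiro-Wilk p-value: {shapiro_p:.3f}")
print(f"Q-Q correlation r:    {r:.4f}")
```

A high Shapiro-Wilk p-value together with `r` close to 1 would support the same reading as the test above.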

#### [QUESTION 2] Is the sample size large? Are the observations independent?
* We have a sample size of 130, which is large enough for the CLT to apply and for the t-distribution to be very close to normal.
* In theory, we cannot be certain that the observations are independent. However, we can check whether they are correlated in particular ways. For example, if we also had race or age, we could check whether temperature and heart rate differ across those variables.

#### [QUESTION 3] Is the true population mean really 98.6 degrees F?

In [3]:
df['temperature'].mean()

98.24923076923078

* The average is 98.25, which is less than 98.6.
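* (3.1) We use a one-sample t-test, because we are comparing a single sample mean against a fixed reference value (98.6) rather than comparing two groups.
* (3.2) Ideally, we should use the t-test because we do not know the population standard deviation.
* (3.3) Nevertheless, with a sample size of 130, a z-test gives an almost identical result to the t-test.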

In [4]:
# Perform t-test with null hypothesis that mean = 98.6
scipy.stats.ttest_1samp(temp_all, 98.6)

Ttest_1sampResult(statistic=-5.4548232923645195, pvalue=2.4106320415561276e-07)

* Since the p-value from the t-test is extremely low (about 2.4e-07), we reject the null hypothesis: the population mean is statistically different from 98.6.
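
To see how the z-test compares, one rough sketch is to reuse the statistic reported by the t-test. When the sample standard deviation stands in for the unknown population value, the z and t statistics are computed the same way, so only the reference distribution changes. The names `p_t` and `p_z` are illustrative and not part of the original notebook.

```python
from scipy import stats

t_stat = -5.4548232923645195  # statistic reported by ttest_1samp above
n = 130                       # sample size

# Two-sided p-value under the t-distribution with n - 1 degrees of freedom
p_t = 2 * stats.t.sf(abs(t_stat), df=n - 1)

# Two-sided p-value under the standard normal (z-test approximation)
p_z = 2 * stats.norm.sf(abs(t_stat))

print(f"t-test p-value: {p_t:.3e}")
print(f"z-test p-value: {p_z:.3e}")
```

The normal distribution has lighter tails than the t-distribution, so the z-test p-value is somewhat smaller, but both are far below any usual significance level and lead to the same conclusion.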

#### [QUESTION 4] At what temperature should we consider someone's temperature to be "abnormal"?
* First, we compute margin of error and confidence interval. 

In [5]:
sample_size = len(temp_all)
t_critical = stats.t.ppf( q = 0.975, df = sample_size-1 )

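# Standard error of the mean (CLT). Note: np.std defaults to ddof=0;
# ddof=1 gives the sample standard deviation, nearly identical at n = 130.
sigma = temp_all.std()/np.sqrt(sample_size)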
print("estimated std of sampling mean = " + "{0:.4f}".format(sigma))

margin_of_error = t_critical * sigma
print("margin of error (95% CI) = " + "{0:.4f}".format(margin_of_error))

CI = (temp_all.mean() - margin_of_error, temp_all.mean() + margin_of_error)
print("confidence interval (95% CI) = [" + "{0:.4f}".format(CI[0]) + ', ' + "{0:.4f}".format(CI[1]) + ']')

estimated std of sampling mean = 0.0641
margin of error (95% CI) = 0.1267
confidence interval (95% CI) = [98.1225, 98.3760]


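* We are 95% confident that the population mean temperature lies in the range [98.12, 98.38]. This interval describes the mean, not an individual reading; a threshold for an "abnormal" individual temperature would instead be based on the sample standard deviation rather than the standard error.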
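
The difference between a confidence interval for the mean and a range for a single person's reading can be sketched as below. This is a minimal sketch under stated assumptions: the helper `mean_ci_and_individual_range` is hypothetical, the individual range is a standard prediction interval that assumes roughly normal data, and `simulated` is stand-in data that would be replaced by `temp_all` in the notebook.

```python
import numpy as np
from scipy import stats

def mean_ci_and_individual_range(sample, confidence=0.95):
    """Return (CI for the mean, prediction interval for one new reading)."""
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    mean = sample.mean()
    s = sample.std(ddof=1)  # sample standard deviation
    t_crit = stats.t.ppf(0.5 + confidence / 2, df=n - 1)
    ci_half = t_crit * s / np.sqrt(n)          # uncertainty in the mean
    pi_half = t_crit * s * np.sqrt(1 + 1 / n)  # spread of a single reading
    return (mean - ci_half, mean + ci_half), (mean - pi_half, mean + pi_half)

# Simulated stand-in for temp_all (assumption)
rng = np.random.default_rng(0)
simulated = rng.normal(loc=98.25, scale=0.73, size=130)

mean_ci, individual_range = mean_ci_and_individual_range(simulated)
print("95% CI for the mean:       [{0:.2f}, {1:.2f}]".format(*mean_ci))
print("95% range for one reading: [{0:.2f}, {1:.2f}]".format(*individual_range))
```

The individual range is much wider than the interval for the mean, so readings outside it are the ones that could reasonably be called "abnormal".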

#### [QUESTION 5] Is there a significant difference between males and females in normal temperature?
* First, we compare the male and female sample means.

In [6]:
by_gender = df.groupby('gender')
by_gender['temperature'].describe()

gender       
F       count     65.000000
        mean      98.393846
        std        0.743488
        min       96.400000
        25%       98.000000
        50%       98.400000
        75%       98.800000
        max      100.800000
M       count     65.000000
        mean      98.104615
        std        0.698756
        min       96.300000
        25%       97.600000
        50%       98.100000
        75%       98.600000
        max       99.500000
Name: temperature, dtype: float64

* The sample mean for females is slightly higher.
* Next, we perform a two-sample t-test. `ttest_ind` reports a two-sided p-value, so at 95% confidence we compare it with a significance level of 0.05.

In [7]:
temp_male = np.array(df[df['gender']=='M']['temperature']) 
temp_female = np.array(df[df['gender']=='F']['temperature'])
stats.ttest_ind(temp_male,temp_female)

Ttest_indResult(statistic=-2.2854345381656103, pvalue=0.023931883122395609)

* The two-sided p-value is 0.0239, which is below the 0.05 significance level.
* This is evidence (although fairly weak) that females have a higher mean temperature than males.
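
As a robustness check, the comparison can be repeated without assuming equal variances (Welch's t-test). The sketch below works from the per-gender summary statistics printed by `describe()` above, using `scipy.stats.ttest_ind_from_stats`; the variable names are illustrative, and Welch's test is an addition here, not the test used in the original analysis.

```python
from scipy import stats

# Summary statistics from the describe() output above
mean_m, std_m, n_m = 98.104615, 0.698756, 65
mean_f, std_f, n_f = 98.393846, 0.743488, 65

# Pooled-variance t-test (the default used by ttest_ind)
pooled = stats.ttest_ind_from_stats(
    mean_m, std_m, n_m, mean_f, std_f, n_f, equal_var=True
)

# Welch's t-test: does not assume equal population variances
welch = stats.ttest_ind_from_stats(
    mean_m, std_m, n_m, mean_f, std_f, n_f, equal_var=False
)

print(f"pooled: t = {pooled.statistic:.4f}, p = {pooled.pvalue:.4f}")
print(f"Welch:  t = {welch.statistic:.4f}, p = {welch.pvalue:.4f}")
```

Because the two groups have equal sizes and similar standard deviations, the two tests should agree closely, so the conclusion does not hinge on the equal-variance assumption.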

* Next, we check whether the female sample is consistent with a mean temperature of 98.6.

In [8]:
scipy.stats.ttest_1samp(temp_female, 98.6)

Ttest_1sampResult(statistic=-2.2354980796784965, pvalue=0.028880450789682037)

* The two-sided p-value is 0.0289, which is below 0.05 but above 0.01. At the 5% level the female mean differs from 98.6, although the evidence is much weaker than for the full sample.

Conclusion: 
1. Our sample gives a mean human body temperature of 98.25 degrees F, which is statistically different from the 98.6 degrees F reported by Carl Wunderlich.
* For the female subsample alone, the evidence is weaker: the difference from 98.6 is significant at the 5% level (p of about 0.029) but not at the 1% level.
2. On average, females have a higher temperature than males (98.39 and 98.10 respectively), a difference that is significant at the 5% level.