# What is the True Normal Human Body Temperature? 

#### Background

The mean normal body temperature was held to be 37$^{\circ}$C or 98.6$^{\circ}$F for more than 120 years since it was first conceptualized and reported by Carl Wunderlich in a famous 1868 book. But, is this value statistically correct?

<h3>Exercises</h3>

<p>In this exercise, you will analyze a dataset of human body temperatures and employ the concepts of hypothesis testing, confidence intervals, and statistical significance.</p>

<p>Answer the following questions <b>in this notebook below and submit to your Github account</b>.</p> 

<ol>
<li>  Is the distribution of body temperatures normal? 
    <ul>
    <li> Although this is not a requirement for the Central Limit Theorem to hold (read the introduction on Wikipedia's page about the CLT carefully: https://en.wikipedia.org/wiki/Central_limit_theorem), it gives us some peace of mind that the population may also be normally distributed if we assume that this sample is representative of the population.
    <li> Think about the way you're going to check for the normality of the distribution. Graphical methods are usually used first, but there are also other ways: https://en.wikipedia.org/wiki/Normality_test
    </ul>
<li>  Is the sample size large? Are the observations independent?
    <ul>
    <li> Remember that this is a condition for the Central Limit Theorem, and hence the statistical tests we are using, to apply.
    </ul>
<li>  Is the true population mean really 98.6 degrees F?
    <ul>
    <li> First, try a bootstrap hypothesis test.
    <li> Now, let's try frequentist statistical testing. Would you use a one-sample or two-sample test? Why?
    <li> In this situation, is it appropriate to use the $t$ or $z$ statistic? 
    <li> Now try using the other test. How is the result different? Why?
    </ul>
<li>  Draw a small sample of size 10 from the data and repeat both frequentist tests. 
    <ul>
    <li> Which one is the correct one to use? 
    <li> What do you notice? What does this tell you about the difference in application of the $t$ and $z$ statistic?
    </ul>
<li>  At what temperature should we consider someone's temperature to be "abnormal"?
    <ul>
    <li> As in the previous example, try calculating everything using the bootstrap approach, as well as the frequentist approach.
    <li> Start by computing the margin of error and confidence interval. When calculating the confidence interval, keep in mind that you should use the appropriate formula for one draw, and not N draws.
    </ul>
<li>  Is there a significant difference between males and females in normal temperature?
    <ul>
    <li> What testing approach did you use and why?
    <li> Write a story with your conclusion in the context of the original problem.
    </ul>
</ol>

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet

#### Resources

+ Information and data sources: http://www.amstat.org/publications/jse/datasets/normtemp.txt, http://www.amstat.org/publications/jse/jse_data_archive.htm
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet

****

In [1]:
import pandas as pd

df = pd.read_csv('data/human_body_temperature.csv')

In [2]:
# Your work here.
df.head(5)

| Unnamed: 0 | temperature | gender | heart_rate |
|---|---|---|---|
| 0 | 99.3 | F | 68.0 |
| 1 | 98.4 | F | 81.0 |
| 2 | 97.8 | M | 73.0 |
| 3 | 99.2 | F | 66.0 |
| 4 | 98.0 | F | 73.0 |


In [3]:
len(df)

130

In [4]:
import matplotlib.pyplot as plt
import numpy as np
plt.hist(df.temperature,bins=np.arange(min(df.temperature), max(df.temperature) + 0.5, 0.2))
plt.show()

<Figure size 640x480 with 1 Axes>

## 1. The histogram is roughly bell-shaped, so the distribution of temperatures looks approximately normal.
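A numeric check can complement the histogram. The sketch below is not part of the original analysis: it measures the largest gap between the empirical CDF and a normal CDF fitted to the sample (a Kolmogorov-Smirnov style statistic). The helper name `ecdf_normal_gap` and the synthetic stand-in data are assumptions for illustration; in the notebook you would pass `df.temperature` instead.

```python
import math
import numpy as np

def ecdf_normal_gap(values):
    """Largest gap between the empirical CDF and a fitted normal CDF."""
    x = np.sort(np.asarray(values, dtype=float))
    n = x.size
    mu, sigma = x.mean(), x.std(ddof=1)
    # Fitted normal CDF evaluated at each sorted observation.
    normal_cdf = np.array(
        [0.5 * (1.0 + math.erf((v - mu) / (sigma * math.sqrt(2.0)))) for v in x]
    )
    # The ECDF jumps from (i - 1) / n to i / n at the i-th sorted point.
    ecdf_hi = np.arange(1, n + 1) / n
    ecdf_lo = np.arange(0, n) / n
    return float(max(np.max(ecdf_hi - normal_cdf), np.max(normal_cdf - ecdf_lo)))

# Synthetic stand-in for df.temperature (an assumption, not the real data).
rng = np.random.default_rng(0)
fake_temps = rng.normal(98.25, 0.73, size=130)
gap = ecdf_normal_gap(fake_temps)
print(f"max ECDF gap: {gap:.3f}")
```

A gap close to 0 means the sample tracks the fitted normal curve closely. Because the mean and standard deviation are estimated from the same data, standard Kolmogorov-Smirnov critical values are only approximate here.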

## 2. The sample size is 130, which is large enough for the Central Limit Theorem to apply, and the observations are assumed to be independent.

## 3. The following is a bootstrap hypothesis test.

In [5]:
np.mean(df.temperature)

98.24923076923078

## The sample mean of the temperatures is 98.25, which is less than 98.6, so we test the null hypothesis H0: the true mean is 98.6 (or higher) against the one-sided alternative H1: the true mean is less than 98.6.

In [38]:
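def draw_data_reps(df_col,func,size=130):
    # Draw `size` replicates of func applied to a resample (with replacement) of df_col.
    # Note: each resample has only 5 observations, not len(df_col), so the
    # replicates are means of 5 draws rather than standard bootstrap replicates.
    replicates = np.empty(size)
    for i in range(size):
        replicates[i]=func(np.random.choice(df_col,5))
    return replicates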

In [39]:
bootstrap_samples = draw_data_reps(df['temperature'],np.mean,130)
bootstrap_samples

array([98.34, 98.32, 98.16, 98.1 , 98.3 , 97.94, 98.08, 98.1 , 97.88,
       98.4 , 97.86, 98.2 , 98.36, 98.74, 98.24, 98.28, 98.56, 98.22,
       98.02, 98.88, 98.18, 98.72, 97.78, 98.44, 98.36, 99.14, 98.4 ,
       97.98, 98.14, 98.52, 97.38, 98.2 , 98.2 , 98.36, 98.36, 97.9 ,
       98.72, 98.54, 98.76, 98.48, 98.74, 98.12, 98.66, 98.26, 98.64,
       98.88, 97.94, 98.2 , 98.28, 98.54, 97.9 , 98.48, 98.42, 98.  ,
       98.36, 98.18, 98.48, 98.68, 98.18, 97.98, 98.48, 98.08, 98.7 ,
       98.28, 98.04, 98.12, 97.98, 98.7 , 98.52, 98.3 , 97.48, 98.22,
       99.12, 97.74, 98.58, 98.14, 97.84, 98.46, 98.54, 98.36, 98.24,
       98.3 , 98.54, 97.88, 98.94, 98.1 , 98.72, 98.22, 98.2 , 98.3 ,
       98.26, 98.34, 98.2 , 98.  , 98.18, 97.38, 98.14, 98.54, 97.58,
       97.94, 98.4 , 98.5 , 98.34, 97.7 , 98.12, 98.62, 97.64, 98.44,
       98.04, 98.72, 98.42, 97.74, 98.5 , 97.78, 98.4 , 98.18, 99.18,
       97.34, 98.54, 98.22, 98.58, 98.94, 97.94, 98.1 , 98.2 , 97.98,
       97.92, 98.06,

In [40]:
translated_temp = df.temperature-np.mean(df.temperature)+98.6
translated_temp = translated_temp.values
translated_temp

array([ 99.65076923,  98.75076923,  98.15076923,  99.55076923,
        98.35076923,  99.55076923,  98.35076923,  99.15076923,
        98.75076923,  98.95076923,  99.15076923,  97.05076923,
        98.55076923,  99.05076923,  98.15076923,  99.15076923,
        98.65076923,  98.55076923,  97.55076923,  99.75076923,
        98.65076923,  98.55076923,  98.95076923,  98.75076923,
        98.15076923,  98.35076923,  98.15076923,  98.55076923,
        98.75076923,  98.45076923,  98.65076923,  97.95076923,
        98.85076923,  98.95076923,  99.65076923,  99.85076923,
        99.45076923,  98.65076923,  98.25076923,  96.75076923,
        98.75076923,  98.75076923,  97.25076923,  97.55076923,
        99.35076923,  98.25076923,  97.75076923,  97.75076923,
        98.25076923,  97.45076923,  99.25076923,  98.65076923,
        98.85076923,  98.95076923,  98.55076923,  98.95076923,
        99.15076923,  98.55076923,  98.55076923,  97.95076923,
        99.45076923,  98.75076923,  98.55076923,  98.95

In [41]:
p = np.sum(bootstrap_samples>=translated_temp)/len(translated_temp)
p

0.3384615384615385

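## In 33.85% of the elementwise comparisons, the bootstrapped mean is at least as large as the translated temperature centered at 98.6. Taken at face value, this would not be enough evidence to reject the null hypothesis that the temperature is centered at 98.6. However, this fraction compares each replicate with a single shifted observation, rather than comparing resampled means of the shifted data with the observed mean of 98.25, so it is not a bootstrap p-value and should be read with caution.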
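For comparison, a more conventional bootstrap test of the mean resamples the shifted data at full sample size and counts how often the replicate mean is at least as extreme as the observed mean. The sketch below is one way to write that, not the notebook's method; the function name `bootstrap_mean_test`, the number of replicates, and the synthetic stand-in data are all assumptions.

```python
import numpy as np

def bootstrap_mean_test(data, mu0, n_reps=10000, seed=0):
    """One-sided bootstrap p-value for H0: mean = mu0 vs H1: mean < mu0."""
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    observed = data.mean()
    # Shift the data so that its mean equals mu0, i.e. H0 holds exactly.
    shifted = data - observed + mu0
    # Each row of idx selects one resample of the same size as the data.
    idx = rng.integers(0, data.size, size=(n_reps, data.size))
    rep_means = shifted[idx].mean(axis=1)
    # Fraction of replicates at least as low as the observed mean.
    return float(np.mean(rep_means <= observed))

# Synthetic stand-in for df.temperature (an assumption, not the real data).
rng = np.random.default_rng(0)
fake_temps = rng.normal(98.25, 0.73, size=130)
p_boot = bootstrap_mean_test(fake_temps, mu0=98.6)
print(f"bootstrap p-value: {p_boot:.4f}")
```

In the notebook, `bootstrap_mean_test(df.temperature.values, 98.6)` would be the corresponding call. A p-value near 0 would agree with the z-test, while a value near 0.5 would indicate no evidence against 98.6.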

## Question: Now, let's try frequentist statistical testing. Would you use a one-sample or two-sample test? Why?
## I would use a one-sample test, since we are testing whether the population mean equals a specific value, 98.6. A two-sample test checks whether two sets of data have the same mean. The bootstrap test above is also a one-sample test.

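## Question: In this situation, is it appropriate to use the t or z statistic? Ans: The z statistic is acceptable, since the sample size is large (n = 130 > 30). Strictly, the population standard deviation is unknown, which calls for the t statistic, but with 129 degrees of freedom the two give nearly identical results.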

In [10]:
#import scipy.stats as stats
#print('t-statistic = %6.3f p-value = %6.4f' %  stats.ttest_1samp(df.temperature.values, 98.6))

In [11]:
(np.mean(df.temperature)-98.6)/(np.std(df.temperature.values)/np.sqrt(130))

-5.475925202078115

## We can perform a two-sided z-test. The p-value corresponding to the above z-score is effectively 0 (far below 0.001), so we reject the null hypothesis that the mean is 98.6.
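The p-value for a z-score can be computed directly instead of being read from a table. This is a minimal sketch using only the standard library; the helper name `two_sided_p_from_z` is an assumption, and the z value is copied from the cell output above.

```python
import math

def two_sided_p_from_z(z):
    """Two-sided p-value for a standard normal statistic z."""
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))

z_notebook = -5.475925202078115  # z-score printed by the notebook
p_z = two_sided_p_from_z(z_notebook)
print(f"two-sided p-value: {p_z:.2e}")
```

Using `erfc` rather than `1 - cdf` avoids losing precision when the tail probability is very small.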

## Question: Now try using the other test. How is the result different? Why?

In [22]:
sample1 = df.temperature.values
sample1

array([ 99.3,  98.4,  97.8,  99.2,  98. ,  99.2,  98. ,  98.8,  98.4,
        98.6,  98.8,  96.7,  98.2,  98.7,  97.8,  98.8,  98.3,  98.2,
        97.2,  99.4,  98.3,  98.2,  98.6,  98.4,  97.8,  98. ,  97.8,
        98.2,  98.4,  98.1,  98.3,  97.6,  98.5,  98.6,  99.3,  99.5,
        99.1,  98.3,  97.9,  96.4,  98.4,  98.4,  96.9,  97.2,  99. ,
        97.9,  97.4,  97.4,  97.9,  97.1,  98.9,  98.3,  98.5,  98.6,
        98.2,  98.6,  98.8,  98.2,  98.2,  97.6,  99.1,  98.4,  98.2,
        98.6,  98.7,  97.4,  97.4,  98.6,  98.7,  98.9,  98.1,  97.7,
        98. ,  98.8,  99. ,  98.8,  98. ,  98.4,  97.4,  97.6,  98.8,
        98. ,  97.5,  99.2,  98.6,  97.1,  98.6,  98. ,  98.7,  98.1,
        97.8, 100. ,  98.8,  97.1,  97.8,  96.8,  99.9,  98.7,  98.8,
        98. ,  99. ,  98.5,  98. ,  99.4,  97.6,  96.7,  97. ,  98.6,
        98.7,  97.3,  98.8,  98. ,  98.2,  99.1,  99. ,  98. , 100.8,
        97.8,  98.7,  98.4,  97.7,  97.9,  99. ,  97.2,  97.5,  96.3,
        97.7,  98.2,

In [23]:
sample2 = np.random.normal(98.6, np.std(df.temperature), 130)
sample2

array([ 98.88060673,  98.28480332,  98.68450733,  98.88909781,
        98.82625396,  97.71768843,  98.33716081,  99.15665608,
        98.58991005,  98.71217593,  98.29168444,  97.96032743,
        99.22691474,  98.84889234,  98.60363806,  98.54149784,
        98.35254123,  98.15632424,  98.48872626,  98.74928273,
        98.60704614,  98.25006912,  97.98914286,  99.75257704,
        99.14697649,  99.07935578,  96.94025669,  98.96040674,
        98.61325421,  98.27237034,  99.18992123,  98.10357155,
        99.32648377,  97.61042185,  98.75971867,  98.91107219,
        98.40468309,  98.42086531,  98.43961913,  98.58698238,
        98.06443091,  99.39407425,  98.99345315,  98.72111829,
        98.45166651,  98.63178019,  99.17372087,  98.58776833,
        98.72794792,  98.5671222 ,  99.5691258 ,  98.73444379,
        98.55266343,  97.89930749,  97.76502243,  97.50114077,
        99.40265613,  98.30147131,  98.9748996 ,  98.29845959,
        98.17234099,  97.09633331,  98.02195667,  99.68

In [24]:
diff = sample1-sample2
diff

array([ 0.41939327,  0.11519668, -0.88450733,  0.31090219, -0.82625396,
        1.48231157, -0.33716081, -0.35665608, -0.18991005, -0.11217593,
        0.50831556, -1.26032743, -1.02691474, -0.14889234, -0.80363806,
        0.25850216, -0.05254123,  0.04367576, -1.28872626,  0.65071727,
       -0.30704614, -0.05006912,  0.61085714, -1.35257704, -1.34697649,
       -1.07935578,  0.85974331, -0.76040674, -0.21325421, -0.17237034,
       -0.88992123, -0.50357155, -0.82648377,  0.98957815,  0.54028133,
        0.58892781,  0.69531691, -0.12086531, -0.53961913, -2.18698238,
        0.33556909, -0.99407425, -2.09345315, -1.52111829,  0.54833349,
       -0.73178019, -1.77372087, -1.18776833, -0.82794792, -1.4671222 ,
       -0.6691258 , -0.43444379, -0.05266343,  0.70069251,  0.43497757,
        1.09885923, -0.60265613, -0.10147131, -0.7748996 , -0.69845959,
        0.92765901,  1.30366669,  0.17804333, -1.08571715,  0.52303159,
       -2.01890946,  0.11303068, -0.73062649, -0.37353783,  1.08

## By the central limit theorem, under the null hypothesis that the mean of df.temperature is 98.6, the mean of diff should be approximately normally distributed with mean 0.

In [25]:
np.mean(diff)/(np.std(diff)/np.sqrt(130))

-3.3609501330443203

## The above z-statistic corresponds to a p-value of 0.0008 for a two-sided test, so we can reject the null hypothesis that the mean of df.temperature is 98.6 even at the 0.1% significance level (99.9% confidence).

## The result is still to reject the null hypothesis that the mean is 98.6, but the p-value is larger. In the previous one-sample test we compared the original sample to the constant 98.6, while in the two-sample test we compared it to a randomly generated normal sample. The variance of that generated sample adds to the variance of the difference, which shrinks the test statistic, and the exact value changes with every random draw.

## 4.Draw a small sample of size 10 from the data and repeat both frequentist tests. 

In [42]:
df_sample = np.random.choice(df.temperature,10)
df_sample

array([98.8, 97.7, 97.6, 98.9, 97.9, 97.9, 98.3, 98.7, 97.4, 97.8])

## Question: Which one is the correct one to use? 

## Answer: A one-sample test is the correct one to use. A two-sample test checks the difference between two statistical quantities. First I will do a one-sample test. Since the statement in part 4 also asks us to "repeat both frequentist tests", I assume this refers to the "one-sample test" and the "two-sample test". I will do a two-sample test for completeness.

## The following is a one-sample two-sided t-test. Since the sample size is 10 < 30, a t-test is applied. The null hypothesis is H0: the mean is 98.6.

In [44]:
t_statistic = (np.mean(df_sample)-98.6)/(np.std(df_sample)/np.sqrt(10))
t_statistic

-3.100868364730211

## The t-statistic corresponds to a p-value of 0.0127 for a two-sided t-test with 9 degrees of freedom. We reject the null hypothesis at the 5% significance level (95% confidence) but cannot reject it at the 1% level (99% confidence).
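The hand-computed statistic can be cross-checked against `scipy.stats.ttest_1samp`. The sketch below reuses the 10 values printed above. One difference worth noting: `np.std` defaults to the population formula (`ddof=0`), whereas the textbook t-test uses the sample standard deviation (`ddof=1`), so the statistic here comes out slightly smaller in magnitude (about -2.94) than the value in the notebook. The variable names are assumptions.

```python
import numpy as np
from scipy import stats

# The sample of size 10 printed in the notebook output.
sample = np.array([98.8, 97.7, 97.6, 98.9, 97.9, 97.9, 98.3, 98.7, 97.4, 97.8])
mu0 = 98.6

# Manual t-statistic using the sample standard deviation (ddof=1).
standard_error = sample.std(ddof=1) / np.sqrt(sample.size)
t_manual = (sample.mean() - mu0) / standard_error

# Same test via SciPy; the result carries the statistic and two-sided p-value.
result = stats.ttest_1samp(sample, mu0)
t_scipy, p_scipy = result.statistic, result.pvalue

print(f"manual t = {t_manual:.4f}, scipy t = {t_scipy:.4f}, p = {p_scipy:.4f}")
```

With either choice of `ddof`, the test rejects at the 5% level and not at the 1% level for this sample.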

## The following is a two-sample two-sided t-test. A comparison sample needs to be generated first. We are testing if the mean of the difference is zero.

In [45]:
df_sample2 = np.random.normal(98.6, np.std(df_sample), 10)
df_sample2

array([99.55243231, 99.27439344, 98.45319629, 99.29307772, 98.66978306,
       98.81966305, 98.47700862, 98.20945467, 99.00297341, 98.42232248])

In [46]:
diff = df_sample - df_sample2
t_statistic = np.mean(diff)/(np.std(diff)/np.sqrt(10))
t_statistic

-3.8639836471888542

## The t-statistic corresponds to a p-value of 0.0038 (two-sided, 9 degrees of freedom). We reject the null hypothesis that the original sample has a mean of 98.6 at the 0.5% significance level (99.5% confidence).

## Question: What do you notice? What does this tell you about the difference in application of the t and z statistic? 

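## Ans: For the same value of the statistic, the t distribution gives a higher p-value than the z (standard normal) distribution in both the one-sample and two-sample tests. The t distribution has heavier tails, which accounts for the extra uncertainty from estimating the standard deviation from only 10 observations. With a small sample the t statistic is the appropriate one; with n = 130 the two are nearly identical.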
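The effect of the heavier tails can be seen by evaluating the same statistic under both distributions. This is an illustrative sketch; the helper name `two_sided_p` is an assumption, and the statistic is the one-sample value printed earlier in the notebook.

```python
from scipy import stats

def two_sided_p(stat, df=None):
    """Two-sided p-value under the normal (df=None) or t distribution."""
    if df is None:
        return 2.0 * stats.norm.sf(abs(stat))
    return 2.0 * stats.t.sf(abs(stat), df)

stat = -3.100868364730211  # one-sample statistic from the notebook (n = 10)

p_normal = two_sided_p(stat)            # z-test
p_t_small = two_sided_p(stat, df=9)     # t-test with n = 10
p_t_large = two_sided_p(stat, df=129)   # t-test with n = 130

print(f"z: {p_normal:.4f}, t(9): {p_t_small:.4f}, t(129): {p_t_large:.4f}")
```

The t p-value with 9 degrees of freedom is several times larger than the normal one, while with 129 degrees of freedom the two are close.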

## 5.At what temperature should we consider someone's temperature to be "abnormal"? 
### As in the previous example, try calculating everything using the bootstrap approach, as well as the frequentist approach. 
### Start by computing the margin of error and confidence interval. When calculating the confidence interval, keep in mind that you should use the appropriate formula for one draw, and not N draws. 

## The following is the Bootstrap approach.

In [47]:
bootstrap_samples

array([98.34, 98.32, 98.16, 98.1 , 98.3 , 97.94, 98.08, 98.1 , 97.88,
       98.4 , 97.86, 98.2 , 98.36, 98.74, 98.24, 98.28, 98.56, 98.22,
       98.02, 98.88, 98.18, 98.72, 97.78, 98.44, 98.36, 99.14, 98.4 ,
       97.98, 98.14, 98.52, 97.38, 98.2 , 98.2 , 98.36, 98.36, 97.9 ,
       98.72, 98.54, 98.76, 98.48, 98.74, 98.12, 98.66, 98.26, 98.64,
       98.88, 97.94, 98.2 , 98.28, 98.54, 97.9 , 98.48, 98.42, 98.  ,
       98.36, 98.18, 98.48, 98.68, 98.18, 97.98, 98.48, 98.08, 98.7 ,
       98.28, 98.04, 98.12, 97.98, 98.7 , 98.52, 98.3 , 97.48, 98.22,
       99.12, 97.74, 98.58, 98.14, 97.84, 98.46, 98.54, 98.36, 98.24,
       98.3 , 98.54, 97.88, 98.94, 98.1 , 98.72, 98.22, 98.2 , 98.3 ,
       98.26, 98.34, 98.2 , 98.  , 98.18, 97.38, 98.14, 98.54, 97.58,
       97.94, 98.4 , 98.5 , 98.34, 97.7 , 98.12, 98.62, 97.64, 98.44,
       98.04, 98.72, 98.42, 97.74, 98.5 , 97.78, 98.4 , 98.18, 99.18,
       97.34, 98.54, 98.22, 98.58, 98.94, 97.94, 98.1 , 98.2 , 97.98,
       97.92, 98.06,

In [68]:
np.percentile(bootstrap_samples,0.5)

97.36580000000001

In [69]:
np.percentile(bootstrap_samples,99.5)

99.1542

In [63]:
for i in range(len(bootstrap_samples)):
    if bootstrap_samples[i]<np.percentile(bootstrap_samples,0.5) or bootstrap_samples[i]>np.percentile(bootstrap_samples,99.5):
        print(bootstrap_samples[i])

99.17999999999999
97.34


In [71]:
for i in range(len(df.temperature.values)):
    if df.temperature.values[i]<np.percentile(bootstrap_samples,0.5) or df.temperature.values[i]>np.percentile(bootstrap_samples,99.5):
        print(df.temperature.values[i])

99.3
99.2
99.2
96.7
97.2
99.4
99.3
99.5
96.4
96.9
97.2
97.1
99.2
97.1
100.0
97.1
96.8
99.9
99.4
96.7
97.0
97.3
100.8
97.2
96.3


## In the bootstrap approach, the 0.5th and 99.5th percentiles of the bootstrap replicates give a 99% interval of about [97.37, 99.15]. Records that fall outside this range are flagged as abnormal. Note that each replicate here is the mean of 5 resampled temperatures, so the interval describes means of 5 draws rather than a single reading.
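For reference, a percentile bootstrap confidence interval for the mean is usually built from resamples of the full sample size. The sketch below shows that variant under stated assumptions: the function name `bootstrap_mean_ci`, the number of replicates, and the synthetic stand-in data are not from the notebook.

```python
import numpy as np

def bootstrap_mean_ci(data, level=0.99, n_reps=10000, seed=0):
    """Percentile bootstrap confidence interval for the mean."""
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    # Resample with replacement, keeping the original sample size.
    idx = rng.integers(0, data.size, size=(n_reps, data.size))
    rep_means = data[idx].mean(axis=1)
    # For level=0.99 these are the 0.5th and 99.5th percentiles.
    tail = 100.0 * (1.0 - level) / 2.0
    lower, upper = np.percentile(rep_means, [tail, 100.0 - tail])
    return float(lower), float(upper)

# Synthetic stand-in for df.temperature (an assumption, not the real data).
rng = np.random.default_rng(0)
fake_temps = rng.normal(98.25, 0.73, size=130)
ci_low, ci_high = bootstrap_mean_ci(fake_temps)
print(f"99% bootstrap CI for the mean: [{ci_low:.2f}, {ci_high:.2f}]")
```

This interval describes uncertainty about the population mean, so it is much narrower than the spread of individual readings.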

## The following is the frequentist approach which is based on the central limit theorem and the z-statistic.

In [92]:
np.mean(df.temperature.values)

98.24923076923075

In [96]:
lower = np.mean(df.temperature.values)-2.576*np.std(df.temperature.values)/np.sqrt(130)
lower
#-2.576 is the 0.5% percentile for the standard normal distribution

98.08422092977592

In [97]:
upper = np.mean(df.temperature.values)+2.576*np.std(df.temperature.values)/np.sqrt(130)
upper
#2.576 is the 99.5% percentile for the standard normal distribution

98.41424060868557

In [100]:
count = 0
for i in range(len(df.temperature.values)):
    if df.temperature.values[i]<lower or df.temperature.values[i]>upper:
        count +=1
        print(df.temperature.values[i])
print('The number of abnormalies is ', count)

99.3
97.8
99.2
98.0
99.2
98.0
98.8
98.6
98.8
96.7
98.7
97.8
98.8
97.2
99.4
98.6
97.8
98.0
97.8
97.6
98.5
98.6
99.3
99.5
99.1
97.9
96.4
96.9
97.2
99.0
97.9
97.4
97.4
97.9
97.1
98.9
98.5
98.6
98.6
98.8
97.6
99.1
98.6
98.7
97.4
97.4
98.6
98.7
98.9
97.7
98.0
98.8
99.0
98.8
98.0
97.4
97.6
98.8
98.0
97.5
99.2
98.6
97.1
98.6
98.0
98.7
97.8
100.0
98.8
97.1
97.8
96.8
99.9
98.7
98.8
98.0
99.0
98.5
98.0
99.4
97.6
96.7
97.0
98.6
98.7
97.3
98.8
98.0
99.1
99.0
98.0
100.8
97.8
98.7
97.7
97.9
99.0
97.2
97.5
96.3
97.7
97.9
98.7
The number of abnormalies is  103


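## In the frequentist approach, the interval is the sample mean plus or minus 2.576 standard errors, which gives a 99% confidence interval of [98.08, 98.41]. This is an interval for the population mean (the formula for N draws), which is why 103 of the 130 records fall outside it. For a single reading, the formula for one draw uses the standard deviation instead of the standard error: 98.25 ± 2.576 × 0.73, or roughly [96.37, 100.13]. Only readings outside that wider range should be considered abnormal.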
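The difference between the formula for N draws and the formula for one draw can be made concrete. The following sketch computes both intervals under a normal approximation and counts how many readings fall outside each. The function name `normal_intervals` and the synthetic stand-in data are assumptions; the 2.576 multiplier is the one used in the notebook.

```python
import numpy as np

def normal_intervals(data, z=2.576):
    """Return (CI for the mean, interval for a single draw) at the given z."""
    data = np.asarray(data, dtype=float)
    mean = data.mean()
    sd = data.std(ddof=1)
    se = sd / np.sqrt(data.size)  # standard error of the mean (N draws)
    ci_mean = (mean - z * se, mean + z * se)
    one_draw = (mean - z * sd, mean + z * sd)  # spread of one reading
    return ci_mean, one_draw

# Synthetic stand-in for df.temperature (an assumption, not the real data).
rng = np.random.default_rng(0)
fake_temps = rng.normal(98.25, 0.73, size=130)
ci_mean, one_draw = normal_intervals(fake_temps)

outside_ci = int(np.sum((fake_temps < ci_mean[0]) | (fake_temps > ci_mean[1])))
outside_one = int(np.sum((fake_temps < one_draw[0]) | (fake_temps > one_draw[1])))
print(f"outside CI for the mean: {outside_ci}, outside one-draw interval: {outside_one}")
```

The one-draw interval is wider by a factor of the square root of the sample size, so it flags only the most extreme readings instead of most of the sample.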

## http://www.stat.yale.edu/Courses/1997-98/101/confint.htm uses the t distribution to find a 95% confidence interval for the mean: (98.123, 98.375). 115 records fall outside this 95% confidence interval.

## 6.Is there a significant difference between males and females in normal temperature? 
### What testing approach did you use and why? 
### Write a story with your conclusion in the context of the original problem.

## I will separate the original sample into two groups by gender. Then a two-sided two-sample t-test will be performed to determine whether the two groups have the same mean.

In [103]:
df_male = df.loc[df.gender=='M']
df_male

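| Unnamed: 0 | temperature | gender | heart_rate |
|---|---|---|---|
| 2 | 97.8 | M | 73.0 |
| 5 | 99.2 | M | 83.0 |
| 6 | 98.0 | M | 71.0 |
| 7 | 98.8 | M | 78.0 |
| 12 | 98.2 | M | 72.0 |
| 17 | 98.2 | M | 64.0 |
| 19 | 99.4 | M | 70.0 |
| 21 | 98.2 | M | 71.0 |
| 22 | 98.6 | M | 70.0 |
| 23 | 98.4 | M | 68.0 |
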

In [104]:
df_female = df.loc[df.gender=='F']
df_female

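| Unnamed: 0 | temperature | gender | heart_rate |
|---|---|---|---|
| 0 | 99.3 | F | 68.0 |
| 1 | 98.4 | F | 81.0 |
| 3 | 99.2 | F | 66.0 |
| 4 | 98.0 | F | 73.0 |
| 8 | 98.4 | F | 84.0 |
| 9 | 98.6 | F | 86.0 |
| 10 | 98.8 | F | 89.0 |
| 11 | 96.7 | F | 62.0 |
| 13 | 98.7 | F | 79.0 |
| 14 | 97.8 | F | 77.0 |
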

In [106]:
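male = df_male.temperature.values
female = df_female.temperature.values
# Standard deviation of each group (np.std defaults to ddof=0)
s1 = np.std(male)
s2 = np.std(female)
# Two-sample t-statistic with unequal variances; each group has 65 records
t = (np.mean(male)-np.mean(female))/np.sqrt(s1*s1/len(male)+s2*s2/len(female))
t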

-2.3032202891943516

## Using 64 degrees of freedom (the conservative choice, one less than the size of each group), the t-statistic corresponds to a p-value of 0.0245 for a two-sided t-test. We reject the null hypothesis that both genders have the same mean temperature at the 5% significance level (95% confidence) but fail to reject it at the 1% level (99% confidence).
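The same comparison can be run as a Welch two-sample t-test, which estimates the degrees of freedom from the data instead of fixing them at 64. The sketch below computes the statistic by hand and checks it against `scipy.stats.ttest_ind` with `equal_var=False`. The function name `welch_t` and the two synthetic groups (with arbitrary placeholder means and spreads) are assumptions; in the notebook you would pass `male` and `female`.

```python
import numpy as np
from scipy import stats

def welch_t(a, b):
    """Welch two-sample t-test: returns (t, degrees of freedom, two-sided p)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Squared standard errors of each group mean (sample variance, ddof=1).
    va = a.var(ddof=1) / a.size
    vb = b.var(ddof=1) / b.size
    t = (a.mean() - b.mean()) / np.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (a.size - 1) + vb ** 2 / (b.size - 1))
    p = 2.0 * stats.t.sf(abs(t), df)
    return float(t), float(df), float(p)

# Synthetic stand-ins for the two groups (placeholder parameters, not real data).
rng = np.random.default_rng(0)
group_a = rng.normal(98.1, 0.70, size=65)
group_b = rng.normal(98.4, 0.74, size=65)

t_w, df_w, p_w = welch_t(group_a, group_b)
check = stats.ttest_ind(group_a, group_b, equal_var=False)
print(f"t = {t_w:.3f}, df = {df_w:.1f}, p = {p_w:.4f}")
```

With two groups of 65 and similar variances, the estimated degrees of freedom are close to 128, so the resulting p-value is slightly smaller than the one obtained with 64 degrees of freedom.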

## My conclusion is that the actual mean temperature for a normal human body is not 98.6F but somewhat lower, at approximately 98.25F. Male body temperatures appear to be slightly lower than female body temperatures, and the difference is significant at the 5% level.