# What is the True Normal Human Body Temperature? 

#### Background

The mean normal body temperature was held to be 37$^{\circ}$C or 98.6$^{\circ}$F for more than 120 years after it was first conceptualized and reported by Carl Wunderlich in a famous 1868 book. But is this value statistically correct?

<h3>Ideas</h3>

<p>In this project, we will analyze a dataset of human body temperatures and employ the concepts of hypothesis testing, confidence intervals, and statistical significance.</p>

<p>We will answer the following questions. </p> 

<ol>
<li>  Is the distribution of body temperatures normal? 
    <ul>
    <li> Although this is not a requirement for the Central Limit Theorem to hold (read the introduction on Wikipedia's page about the CLT carefully: https://en.wikipedia.org/wiki/Central_limit_theorem), it gives us some peace of mind that the population may also be normally distributed if we assume that this sample is representative of the population.
    <li> Graphical methods are usually used to check for the normality of the distribution, but there are also other ways: https://en.wikipedia.org/wiki/Normality_test
    </ul>
<li>  Is the sample size large? Are the observations independent?
    <ul>
    <li> Remember that this is a condition for the Central Limit Theorem, and hence the statistical tests we are using, to apply.
    </ul>
<li>  Is the true population mean really 98.6 degrees F?
    <ul>
    <li> First, try a bootstrap hypothesis test.
    <li> Now, let's try frequentist statistical testing. Should we use a one-sample or two-sample test? Why?
    <li> In this situation, is it appropriate to use the $t$ or $z$ statistic? 
    <li> Now try using the other test. How is the result different? Why?
    </ul>
<li>  Draw a small sample of size 10 from the data and repeat both frequentist tests. 
    <ul>
    <li> Which one is the correct one to use? 
    <li> What do you notice? What does this tell you about the difference in application of the $t$ and $z$ statistic?
    </ul>
<li>  At what temperature should we consider someone's temperature to be "abnormal"?
    <ul>
    <li> As in the previous example, try calculating everything using the bootstrap approach, as well as the frequentist approach.
    <li> Start by computing the margin of error and confidence interval. When calculating the confidence interval, keep in mind that we should use the appropriate formula for one draw, and not N draws.
    </ul>
<li>  Is there a significant difference between males and females in normal temperature?
    <ul>
    <li> Write a story with our conclusion in the context of the original problem.
    </ul>
</ol>


#### Resources

+ Information and data sources: http://www.amstat.org/publications/jse/datasets/normtemp.txt, http://www.amstat.org/publications/jse/jse_data_archive.htm
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet

****

In [1]:
import pandas as pd

df = pd.read_csv('data/human_body_temperature.csv')

In [2]:
# Your work here.
df.head(5)

Unnamed: 0,temperature,gender,heart_rate
0,99.3,F,68.0
1,98.4,F,81.0
2,97.8,M,73.0
3,99.2,F,66.0
4,98.0,F,73.0


In [3]:
len(df)

130

In [4]:
import matplotlib.pyplot as plt
import numpy as np
plt.hist(df.temperature,bins=np.arange(min(df.temperature), max(df.temperature) + 0.5, 0.2))
plt.show()

<Figure size 640x480 with 1 Axes>

## 1. The histogram is roughly bell-shaped, so the distribution appears approximately normal.
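The visual impression can be complemented with a formal normality test. The sketch below assumes SciPy is available and uses `scipy.stats.normaltest` (the D'Agostino-Pearson test). The helper name `normality_pvalue` and the synthetic `temps` array are assumptions for illustration only; in the notebook, `df.temperature.values` would take the place of `temps`.

```python
import numpy as np
from scipy import stats

def normality_pvalue(values):
    """Return the p-value of the D'Agostino-Pearson normality test."""
    result = stats.normaltest(values)
    return result.pvalue

# Stand-in data (assumption): replace with df.temperature.values in the notebook.
rng = np.random.default_rng(0)
temps = rng.normal(loc=98.25, scale=0.73, size=130)

p_value = normality_pvalue(temps)
# A large p-value means the test finds no evidence against normality.
print(f"normality test p-value: {p_value:.3f}")
```

A small p-value would suggest that the histogram's bell shape is misleading; a large one is consistent with, but does not prove, normality.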

## 2. The sample size is 130, which is considered large enough for the Central Limit Theorem to apply, and the observations are assumed to be independent.

## 3. The following is a bootstrap hypothesis test.

In [5]:
np.mean(df.temperature)

98.24923076923075

## The sample mean is 98.25, which is below 98.6, so we run a one-sided test. The null hypothesis is H0: the true mean is 98.6 (or higher); the alternative hypothesis is that the true mean is below 98.6.

In [6]:
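def draw_data_reps(df_col,func,size=130):
    """Return `size` replicates of `func` applied to resamples of df_col."""
    replicates = np.empty(size)
    for i in range(size):
        # NOTE: each resample holds only 5 values drawn with replacement,
        # not len(df_col) values as in a standard bootstrap.
        replicates[i]=func(np.random.choice(df_col,5))
    return replicates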

In [7]:
bootstrap_samples = draw_data_reps(df['temperature'],np.mean,130)
bootstrap_samples

array([97.96, 98.28, 97.8 , 98.86, 98.76, 98.2 , 98.38, 98.32, 97.84,
       98.1 , 98.  , 98.46, 97.84, 97.62, 98.04, 98.82, 98.32, 98.24,
       98.34, 98.48, 98.58, 98.34, 98.24, 98.28, 98.52, 97.88, 97.9 ,
       98.16, 98.66, 98.98, 98.06, 98.56, 98.3 , 98.42, 98.4 , 98.18,
       98.58, 98.44, 98.04, 98.2 , 98.58, 97.98, 98.06, 98.22, 97.8 ,
       98.22, 98.  , 98.48, 97.86, 98.52, 97.76, 98.32, 98.82, 97.98,
       98.46, 98.26, 98.02, 98.24, 98.42, 98.8 , 98.36, 98.46, 97.84,
       97.9 , 97.96, 97.78, 97.86, 98.4 , 98.04, 98.14, 98.1 , 98.12,
       97.82, 98.08, 98.64, 98.4 , 98.9 , 98.4 , 98.26, 98.14, 98.66,
       97.9 , 98.12, 98.3 , 98.7 , 97.78, 98.86, 98.56, 98.76, 98.74,
       98.98, 98.24, 97.84, 98.4 , 98.12, 98.2 , 97.94, 97.82, 98.48,
       98.3 , 98.06, 98.22, 98.8 , 97.78, 97.98, 98.48, 97.7 , 98.32,
       98.02, 97.54, 98.6 , 97.92, 98.86, 97.68, 98.54, 98.36, 98.58,
       97.88, 98.38, 98.5 , 98.46, 97.6 , 97.96, 98.34, 98.1 , 98.44,
       97.98, 98.28,

In [8]:
translated_temp = df.temperature-np.mean(df.temperature)+98.6
translated_temp = translated_temp.values
translated_temp

array([ 99.65076923,  98.75076923,  98.15076923,  99.55076923,
        98.35076923,  99.55076923,  98.35076923,  99.15076923,
        98.75076923,  98.95076923,  99.15076923,  97.05076923,
        98.55076923,  99.05076923,  98.15076923,  99.15076923,
        98.65076923,  98.55076923,  97.55076923,  99.75076923,
        98.65076923,  98.55076923,  98.95076923,  98.75076923,
        98.15076923,  98.35076923,  98.15076923,  98.55076923,
        98.75076923,  98.45076923,  98.65076923,  97.95076923,
        98.85076923,  98.95076923,  99.65076923,  99.85076923,
        99.45076923,  98.65076923,  98.25076923,  96.75076923,
        98.75076923,  98.75076923,  97.25076923,  97.55076923,
        99.35076923,  98.25076923,  97.75076923,  97.75076923,
        98.25076923,  97.45076923,  99.25076923,  98.65076923,
        98.85076923,  98.95076923,  98.55076923,  98.95076923,
        99.15076923,  98.55076923,  98.55076923,  97.95076923,
        99.45076923,  98.75076923,  98.55076923,  98.95

In [9]:
p = np.sum(bootstrap_samples>=translated_temp)/len(translated_temp)
p

0.2846153846153846

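## 28.46% of the bootstrap replicates are at least as large as the corresponding translated temperatures (the data recentered at 98.6, compared element by element). Based on this comparison, we do not have enough evidence to reject the null hypothesis and conclude that the mean differs from 98.6.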
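For comparison, a more conventional one-sample bootstrap test shifts the data so that the null hypothesis holds, resamples the full sample size each time, and asks how often a resampled mean is as low as the observed one. This is a minimal sketch under stated assumptions, not the notebook's method: the function name `bootstrap_mean_test`, the replicate count, and the synthetic `temps` array are all hypothetical.

```python
import numpy as np

def bootstrap_mean_test(values, null_mean, n_reps=10_000, seed=0):
    """One-sample bootstrap test of H0: mean == null_mean (lower-tailed)."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    observed = values.mean()
    # Shift the data so that the null hypothesis holds exactly.
    shifted = values - observed + null_mean
    # Each replicate resamples len(values) points with replacement.
    resamples = rng.choice(shifted, size=(n_reps, values.size), replace=True)
    replicate_means = resamples.mean(axis=1)
    # Fraction of replicate means at least as low as the observed mean.
    return np.mean(replicate_means <= observed)

# Stand-in data (assumption): replace with df.temperature.values.
rng = np.random.default_rng(42)
temps = rng.normal(98.25, 0.73, size=130)
p_value = bootstrap_mean_test(temps, null_mean=98.6)
print(f"bootstrap p-value: {p_value:.4f}")
```

Because every replicate uses all 130 observations, the replicate means have the same spread as the sample mean itself, which is what makes the comparison with the observed mean meaningful.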

<b>Question</b>: <p>Now, let's try frequentist statistical testing. Should we use a one-sample or two-sample test? Why?</p>
<b>Answer</b>:
<ul><li>I would use a one-sample test, since we are testing whether the sample mean equals a specific value, 98.6. A two-sample test checks whether two data sets have the same mean. The bootstrap method above is likewise a one-sample bootstrap hypothesis test.</li>
</ul>

<b>Question</b>: In this situation, is it appropriate to use the t or z statistic? 
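<b>Ans</b>: z-statistic, since the sample size is large enough (n = 130 > 30). Strictly speaking, the population standard deviation is unknown, so the t statistic is the exact choice, but at this sample size the two give nearly identical results.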
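How close the two statistics are at n = 130 can be seen from their critical values. The sketch below assumes SciPy is available; the variable names are illustrative.

```python
from scipy import stats

# Two-sided 5% critical values under each reference distribution.
z_crit = stats.norm.ppf(0.975)            # standard normal
t_crit_130 = stats.t.ppf(0.975, df=129)   # t distribution for n = 130
t_crit_10 = stats.t.ppf(0.975, df=9)      # t distribution for n = 10

print(f"z: {z_crit:.3f}, t (n=130): {t_crit_130:.3f}, t (n=10): {t_crit_10:.3f}")
```

The t critical value for 129 degrees of freedom sits just above the normal one, while the value for 9 degrees of freedom is noticeably larger, which is why the choice matters for small samples.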

In [10]:
(np.mean(df.temperature)-98.6)/(np.std(df.temperature.values)/np.sqrt(130))

-5.4759252020785585

## We can perform a two-sided z-test. The p-value corresponding to the above z-score is effectively 0 (on the order of $10^{-8}$), so we reject the null hypothesis that the mean is 98.6.
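The p-value can be computed directly from the z-score rather than read from a table. A minimal sketch using only the Python standard library (`statistics.NormalDist`, available from Python 3.8); the z-score is copied from the cell output above.

```python
from statistics import NormalDist

z_score = -5.4759252020785585  # value computed in the cell above

# Two-sided p-value: probability that a standard normal is at least this extreme.
p_value = 2 * NormalDist().cdf(-abs(z_score))
print(f"two-sided p-value: {p_value:.2e}")
```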

## Question: Now try using the other test. How is the result different? Why?

In [11]:
sample1 = df.temperature.values
sample1

array([ 99.3,  98.4,  97.8,  99.2,  98. ,  99.2,  98. ,  98.8,  98.4,
        98.6,  98.8,  96.7,  98.2,  98.7,  97.8,  98.8,  98.3,  98.2,
        97.2,  99.4,  98.3,  98.2,  98.6,  98.4,  97.8,  98. ,  97.8,
        98.2,  98.4,  98.1,  98.3,  97.6,  98.5,  98.6,  99.3,  99.5,
        99.1,  98.3,  97.9,  96.4,  98.4,  98.4,  96.9,  97.2,  99. ,
        97.9,  97.4,  97.4,  97.9,  97.1,  98.9,  98.3,  98.5,  98.6,
        98.2,  98.6,  98.8,  98.2,  98.2,  97.6,  99.1,  98.4,  98.2,
        98.6,  98.7,  97.4,  97.4,  98.6,  98.7,  98.9,  98.1,  97.7,
        98. ,  98.8,  99. ,  98.8,  98. ,  98.4,  97.4,  97.6,  98.8,
        98. ,  97.5,  99.2,  98.6,  97.1,  98.6,  98. ,  98.7,  98.1,
        97.8, 100. ,  98.8,  97.1,  97.8,  96.8,  99.9,  98.7,  98.8,
        98. ,  99. ,  98.5,  98. ,  99.4,  97.6,  96.7,  97. ,  98.6,
        98.7,  97.3,  98.8,  98. ,  98.2,  99.1,  99. ,  98. , 100.8,
        97.8,  98.7,  98.4,  97.7,  97.9,  99. ,  97.2,  97.5,  96.3,
        97.7,  98.2,

In [12]:
sample2 = np.random.normal(98.6, np.std(df.temperature), 130)
sample2

array([ 99.04890341,  97.95165936,  97.80674835,  99.42495515,
        99.08927164,  97.690477  ,  98.66592763,  98.82045447,
        98.62765386,  97.84563575,  98.94428354,  98.86536557,
        99.24182472,  98.78158349,  98.81553033,  98.97112087,
        98.247148  ,  99.29852479,  99.11869042,  99.21433902,
       100.654198  ,  97.62832167,  98.72378299,  98.25148873,
        98.17280649,  98.29519184,  97.93417911,  98.97419485,
        99.07985602,  98.39481283,  98.37556377,  99.83258112,
        98.75563539,  97.9444055 ,  98.58447374,  99.61775919,
        98.88916002,  97.85450664,  98.93105505,  98.50321631,
        97.84873772,  98.53872118,  98.71708636,  98.07746085,
        98.13948218,  98.10071844,  98.89454554,  98.48728542,
        98.4483422 ,  98.92955411,  99.69283715,  97.72255368,
        97.97045423,  98.03036962,  99.23371285,  99.18598296,
        98.59102904,  98.8095411 ,  96.9865492 ,  98.1989531 ,
        97.71545862,  99.58580333,  98.19270584,  99.25

In [13]:
diff = sample1-sample2
diff

array([ 0.25109659,  0.44834064, -0.00674835, -0.22495515, -1.08927164,
        1.509523  , -0.66592763, -0.02045447, -0.22765386,  0.75436425,
       -0.14428354, -2.16536557, -1.04182472, -0.08158349, -1.01553033,
       -0.17112087,  0.052852  , -1.09852479, -1.91869042,  0.18566098,
       -2.354198  ,  0.57167833, -0.12378299,  0.14851127, -0.37280649,
       -0.29519184, -0.13417911, -0.77419485, -0.67985602, -0.29481283,
       -0.07556377, -2.23258112, -0.25563539,  0.6555945 ,  0.71552626,
       -0.11775919,  0.21083998,  0.44549336, -1.03105505, -2.10321631,
        0.55126228, -0.13872118, -1.81708636, -0.87746085,  0.86051782,
       -0.20071844, -1.49454554, -1.08728542, -0.5483422 , -1.82955411,
       -0.79283715,  0.57744632,  0.52954577,  0.56963038, -1.03371285,
       -0.58598296,  0.20897096, -0.6095411 ,  1.2134508 , -0.5989531 ,
        1.38454138, -1.18580333,  0.00729416, -0.65782182, -1.28956901,
       -2.4297676 , -1.26838278, -0.32234103,  0.11103694, -0.08

## By the central limit theorem, under the null hypothesis that the mean of df.temperature is 98.6, the mean of diff should be approximately normally distributed with mean 0.

In [14]:
np.mean(diff)/(np.std(diff)/np.sqrt(130))

-3.6104468653653012

## Answer: The above z-statistic corresponds to a p-value of about 0.0003 for a two-sided test, so we can reject the null hypothesis that the mean of df.temperature is 98.6 even at the 99.9% confidence level.

## The result is still to reject the null hypothesis that the mean is 98.6, but the p-value is larger. In the previous one-sample test we compared the original sample to the constant 98.6, whereas in the two-sample test we compared it to a random sample drawn from a normal distribution. The variance of that generated comparison sample accounts for the difference.
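"The other test" can also be read as the t-test counterpart of the one-sample z-test. Under that reading, the t statistic follows from the z-score already computed: `np.std` defaults to `ddof=0`, and switching to the sample standard deviation (`ddof=1`) rescales the statistic by $\sqrt{(n-1)/n}$. The sketch below assumes SciPy is available and reuses the z-score from the earlier cell output.

```python
import numpy as np
from scipy import stats

n = 130
z_score = -5.4759252020785585   # computed earlier with np.std (ddof=0)

# Using the sample standard deviation (ddof=1) rescales the statistic.
t_stat = z_score * np.sqrt((n - 1) / n)

p_z = 2 * stats.norm.sf(abs(z_score))         # normal reference distribution
p_t = 2 * stats.t.sf(abs(t_stat), df=n - 1)   # t reference distribution
print(f"t = {t_stat:.4f}, p_z = {p_z:.2e}, p_t = {p_t:.2e}")
```

Both p-values are tiny, so the conclusion does not change; the t-based value is somewhat larger because the t distribution has heavier tails.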

## 4.Draw a small sample of size 10 from the data and repeat both frequentist tests. 

In [15]:
df_sample = np.random.choice(df.temperature,10)
df_sample

array([97.4, 97.9, 99.2, 99.2, 98.7, 98.7, 97.7, 99.2, 98.8, 98.6])

## Question: Which one is the correct one to use? 

## Answer: A one-sample test is the correct one to use. A two-sample test checks the difference between two statistical quantities. First I will do a one-sample test. Since part 4 also asks us to "repeat both frequentist tests", I assume this refers to the "one-sample test" and the "two-sample test". I will do a two-sample test for completeness.

## The following is a one-sample two-sided t-test. Since the sample size is 10 < 30, a t-test is applied. The null hypothesis is H0: the mean is 98.6.

In [16]:
t_statistic = (np.mean(df_sample)-98.6)/(np.std(df_sample)/np.sqrt(10))
t_statistic

-0.3060268703388876

## The t-statistic corresponds to a p-value of roughly 0.77 for a two-sided t-test with 9 degrees of freedom. We cannot reject the null hypothesis at the 95% confidence level.
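The same test can be run with `scipy.stats.ttest_1samp` on the ten values displayed above. This is a sketch assuming SciPy is available. Note that SciPy uses the sample standard deviation (`ddof=1`), so its statistic differs slightly from the hand-computed value above, which relies on `np.std` with its default `ddof=0`.

```python
import numpy as np
from scipy import stats

# The ten values displayed in the cell above.
small_sample = np.array([97.4, 97.9, 99.2, 99.2, 98.7, 98.7, 97.7, 99.2, 98.8, 98.6])

result = stats.ttest_1samp(small_sample, popmean=98.6)
print(f"t = {result.statistic:.4f}, two-sided p = {result.pvalue:.3f}")

# The same statistic by hand, with the sample standard deviation (ddof=1).
n = small_sample.size
t_manual = (small_sample.mean() - 98.6) / (small_sample.std(ddof=1) / np.sqrt(n))
```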

## The following is a two-sample two-sided t-test. A comparison sample needs to be generated first. We are testing if the mean of the difference is zero.

In [17]:
df_sample2 = np.random.normal(98.6, np.std(df_sample), 10)
df_sample2

array([98.66412921, 98.33582635, 99.29858122, 98.36092712, 97.74519309,
       98.96410374, 98.50824161, 99.00908571, 99.05305942, 98.80878695])

In [18]:
diff = df_sample - df_sample2
t_statistic = np.mean(diff)/(np.std(diff)/np.sqrt(10))
t_statistic

-0.6657966725551016

## The t-statistic corresponds to a two-sided p-value of roughly 0.52 with 9 degrees of freedom. We cannot reject the null hypothesis that the original sample has a mean of 98.6 at the 95% confidence level.

## Question: What do you notice? What does this tell you about the difference in application of the t and z statistic? 

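## Ans: The t-tests on the 10-observation sample give much higher p-values than the corresponding z-tests on the full sample, in both the one-sample and two-sample cases. With a small sample the standard error is larger and the t distribution has heavier tails than the normal, so the same deviation from 98.6 is weaker evidence. For small samples the t statistic is the appropriate choice.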
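The practical difference between the two statistics at n = 10 can be illustrated with a small simulation: draw many samples for which the null hypothesis is true and count how often each test rejects at the 5% level. This is a sketch under stated assumptions (SciPy available, normally distributed data, and an assumed spread `sigma`); it is not part of the original analysis.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, n_sims, mu, sigma = 10, 20_000, 98.6, 0.73  # sigma is an assumed spread

# Simulate many small samples for which the null hypothesis is true.
samples = rng.normal(mu, sigma, size=(n_sims, n))
stat = (samples.mean(axis=1) - mu) / (samples.std(axis=1, ddof=1) / np.sqrt(n))

# Share of (false) rejections at the 5% level under each reference distribution.
z_rate = np.mean(np.abs(stat) > stats.norm.ppf(0.975))
t_rate = np.mean(np.abs(stat) > stats.t.ppf(0.975, df=n - 1))
print(f"z rejects {z_rate:.3f}, t rejects {t_rate:.3f}")
```

Treating the statistic as a z-score rejects a true null hypothesis more often than the nominal 5%, whereas the t reference distribution keeps the rate close to 5%.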

## 5.At what temperature should we consider someone's temperature to be "abnormal"? 
### As in the previous example, try calculating everything using the bootstrap approach, as well as the frequentist approach. 
### Start by computing the margin of error and confidence interval. When calculating the confidence interval, keep in mind that you should use the appropriate formula for one draw, and not N draws. 

## The following is the Bootstrap approach.

In [19]:
bootstrap_samples

array([97.96, 98.28, 97.8 , 98.86, 98.76, 98.2 , 98.38, 98.32, 97.84,
       98.1 , 98.  , 98.46, 97.84, 97.62, 98.04, 98.82, 98.32, 98.24,
       98.34, 98.48, 98.58, 98.34, 98.24, 98.28, 98.52, 97.88, 97.9 ,
       98.16, 98.66, 98.98, 98.06, 98.56, 98.3 , 98.42, 98.4 , 98.18,
       98.58, 98.44, 98.04, 98.2 , 98.58, 97.98, 98.06, 98.22, 97.8 ,
       98.22, 98.  , 98.48, 97.86, 98.52, 97.76, 98.32, 98.82, 97.98,
       98.46, 98.26, 98.02, 98.24, 98.42, 98.8 , 98.36, 98.46, 97.84,
       97.9 , 97.96, 97.78, 97.86, 98.4 , 98.04, 98.14, 98.1 , 98.12,
       97.82, 98.08, 98.64, 98.4 , 98.9 , 98.4 , 98.26, 98.14, 98.66,
       97.9 , 98.12, 98.3 , 98.7 , 97.78, 98.86, 98.56, 98.76, 98.74,
       98.98, 98.24, 97.84, 98.4 , 98.12, 98.2 , 97.94, 97.82, 98.48,
       98.3 , 98.06, 98.22, 98.8 , 97.78, 97.98, 98.48, 97.7 , 98.32,
       98.02, 97.54, 98.6 , 97.92, 98.86, 97.68, 98.54, 98.36, 98.58,
       97.88, 98.38, 98.5 , 98.46, 97.6 , 97.96, 98.34, 98.1 , 98.44,
       97.98, 98.28,

In [20]:
np.percentile(bootstrap_samples,0.5)

97.5787

In [21]:
np.percentile(bootstrap_samples,99.5)

98.97999999999999

In [22]:
for i in range(len(bootstrap_samples)):
    if bootstrap_samples[i]<np.percentile(bootstrap_samples,0.5) or bootstrap_samples[i]>np.percentile(bootstrap_samples,99.5):
        print(bootstrap_samples[i])

97.53999999999999


In [23]:
for i in range(len(df.temperature.values)):
    if df.temperature.values[i]<np.percentile(bootstrap_samples,0.5) or df.temperature.values[i]>np.percentile(bootstrap_samples,99.5):
        print(df.temperature.values[i])

99.3
99.2
99.2
96.7
97.2
99.4
99.3
99.5
99.1
96.4
96.9
97.2
99.0
97.4
97.4
97.1
99.1
97.4
97.4
99.0
97.4
97.5
99.2
97.1
100.0
97.1
96.8
99.9
99.0
99.4
96.7
97.0
97.3
99.1
99.0
100.8
99.0
97.2
97.5
96.3


## In the bootstrap approach, the 0.5th and 99.5th percentiles of the bootstrap replicates are computed, which gives a 99% interval of [97.58, 98.98]. Records that fall outside this range are considered abnormal.
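A percentile bootstrap interval for the mean, with full-size resamples and many replicates, could be sketched as follows. The function name `bootstrap_mean_interval`, the replicate count, and the synthetic `temps` array are assumptions for illustration. The last lines compute the percentiles of the individual readings, which is one way to read the "one draw" hint in the question.

```python
import numpy as np

def bootstrap_mean_interval(values, level=0.99, n_reps=10_000, seed=0):
    """Percentile bootstrap confidence interval for the mean."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Each replicate resamples len(values) points with replacement.
    resamples = rng.choice(values, size=(n_reps, values.size), replace=True)
    means = resamples.mean(axis=1)
    tail = 100 * (1 - level) / 2
    lower, upper = np.percentile(means, [tail, 100 - tail])
    return lower, upper

# Stand-in data (assumption): replace with df.temperature.values.
rng = np.random.default_rng(1)
temps = rng.normal(98.25, 0.73, size=130)

lower, upper = bootstrap_mean_interval(temps)
# Range covering 99% of individual readings (one draw rather than the mean).
single_lower, single_upper = np.percentile(temps, [0.5, 99.5])
print(f"mean: [{lower:.2f}, {upper:.2f}]  single reading: [{single_lower:.2f}, {single_upper:.2f}]")
```

The interval for the mean is much narrower than the range of individual readings, so which of the two is used as the "abnormal" threshold changes the answer considerably.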

## The following is the frequentist approach which is based on the central limit theorem and the z-statistic.

In [24]:
np.mean(df.temperature.values)

98.24923076923075

In [25]:
lower = np.mean(df.temperature.values)-2.576*np.std(df.temperature.values)/np.sqrt(130)
lower
# -2.576 is the 0.5th percentile of the standard normal distribution

98.08422092977592

In [26]:
upper = np.mean(df.temperature.values)+2.576*np.std(df.temperature.values)/np.sqrt(130)
upper
# 2.576 is the 99.5th percentile of the standard normal distribution

98.41424060868557

In [27]:
count = 0
for i in range(len(df.temperature.values)):
    if df.temperature.values[i]<lower or df.temperature.values[i]>upper:
        count +=1
        #print(df.temperature.values[i])
print('The number of abnormalities is ', count)

The number of abnormalities is  103


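## In the frequentist approach, the 0.5th and 99.5th percentiles of the standard normal distribution are combined with the standard error of the mean, which gives a 99% confidence interval of [98.08, 98.41]. Records that fall outside this range are considered abnormal. Note that this is the interval for the sample mean (N draws); the interval for a single draw omits the division by $\sqrt{130}$ and is correspondingly wider.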
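The interval for a single reading can be recovered from the numbers already printed above, without reloading the data. This is a sketch: the mean and the two interval bounds are copied from the cell outputs, and the standard deviation is backed out from them.

```python
import numpy as np

n = 130
mean = 98.24923076923075         # np.mean(df.temperature.values), from above
lower_mean = 98.08422092977592   # 99% interval for the mean, from above
upper_mean = 98.41424060868557
z_crit = 2.576

# Recover the standard error and the standard deviation from that interval.
std_error = (upper_mean - lower_mean) / (2 * z_crit)
std_dev = std_error * np.sqrt(n)

# Interval for a single reading: no division by sqrt(n).
lower_one = mean - z_crit * std_dev
upper_one = mean + z_crit * std_dev
print(f"std = {std_dev:.3f}, single-reading interval = [{lower_one:.2f}, {upper_one:.2f}]")
```

Under the normal approximation this gives roughly [96.37, 100.13], so only readings far from the mean would be flagged, rather than most of the sample.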

## http://www.stat.yale.edu/Courses/1997-98/101/confint.htm uses the t distribution to find a 95% confidence interval: (98.123, 98.375). 115 records in the data fall outside this 95% confidence interval.

## 6.Is there a significant difference between males and females in normal temperature? 
### What testing approach did you use and why? 
### Write a story with your conclusion in the context of the original problem.

## I will separate the original sample into two groups according to the gender. Then a two-sided t-test will be performed to determine whether the two groups have the same mean.

In [28]:
df_male = df.loc[df.gender=='M']
df_male.head(10)

Unnamed: 0,temperature,gender,heart_rate
2,97.8,M,73.0
5,99.2,M,83.0
6,98.0,M,71.0
7,98.8,M,78.0
12,98.2,M,72.0
17,98.2,M,64.0
19,99.4,M,70.0
21,98.2,M,71.0
22,98.6,M,70.0
23,98.4,M,68.0


In [29]:
df_female = df.loc[df.gender=='F']
df_female.head(10)

Unnamed: 0,temperature,gender,heart_rate
0,99.3,F,68.0
1,98.4,F,81.0
3,99.2,F,66.0
4,98.0,F,73.0
8,98.4,F,84.0
9,98.6,F,86.0
10,98.8,F,89.0
11,96.7,F,62.0
13,98.7,F,79.0
14,97.8,F,77.0


In [30]:
male = df_male.temperature.values
female = df_female.temperature.values
s1 = np.std(male)
s2 = np.std(female)
t = (np.mean(male)-np.mean(female))/np.sqrt(s1*s1/65+s2*s2/65)
t

-2.3032202891943516

## The degrees of freedom are 64. The t-statistic corresponds to a p-value of 0.0245 for a two-sided t-test. We reject the null hypothesis that both genders have the same mean temperature at the 95% confidence level, and fail to reject it at the 99% confidence level.
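The same comparison can be cross-checked with Welch's two-sample t-test from SciPy, which estimates the degrees of freedom from the data instead of fixing them. This is a sketch under stated assumptions: `welch_t` is a hypothetical helper, and the two synthetic groups use arbitrary illustrative parameters in place of the notebook's `male` and `female` arrays.

```python
import numpy as np
from scipy import stats

def welch_t(a, b):
    """Welch's two-sample t statistic, using sample variances (ddof=1)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    se = np.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
    return (a.mean() - b.mean()) / se

# Stand-in groups (assumption): replace with the `male` and `female` arrays.
rng = np.random.default_rng(0)
group_m = rng.normal(98.1, 0.70, size=65)
group_f = rng.normal(98.4, 0.74, size=65)

result = stats.ttest_ind(group_m, group_f, equal_var=False)
print(f"t = {result.statistic:.3f}, two-sided p = {result.pvalue:.4f}")
```

Taking the group sizes from the arrays (`a.size`, `b.size`) avoids hard-coding 65, which would silently give a wrong answer if the groups were unequal.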

## My conclusion is that the actual normal human body temperature is not 98.6 degrees F but somewhat lower, at approximately 98.25 degrees F. Male body temperatures appear to be slightly lower than female body temperatures.