# What is the True Normal Human Body Temperature? 

#### Background

The mean normal body temperature was held to be 37$^{\circ}$C or 98.6$^{\circ}$F for more than 120 years after Carl Wunderlich first conceptualized and reported it in a famous 1868 book. But is this value statistically correct?

<h3>Exercises</h3>

<p>In this exercise, you will analyze a dataset of human body temperatures and employ the concepts of hypothesis testing, confidence intervals, and statistical significance.</p>

<p>Answer the following questions <b>in this notebook below and submit to your GitHub account</b>.</p>

<ol>
<li>  Is the distribution of body temperatures normal? 
    <ul>
    <li> Although this is not a requirement for CLT to hold (read CLT carefully), it gives us some peace of mind that the population may also be normally distributed if we assume that this sample is representative of the population.
    </ul>
<li>  Is the sample size large? Are the observations independent?
    <ul>
    <li> Remember that this is a condition for the CLT, and hence the statistical tests we are using, to apply.
    </ul>
<li>  Is the true population mean really 98.6 degrees F?
    <ul>
    <li> Would you use a one-sample or two-sample test? Why?
    <li> In this situation, is it appropriate to use the $t$ or $z$ statistic? 
    <li> Now try using the other test. How is the result different? Why?
    </ul>
<li>  Draw a small sample of size 10 from the data and repeat both tests. 
    <ul>
    <li> Which one is the correct one to use? 
    <li> What do you notice? What does this tell you about the difference in application of the $t$ and $z$ statistic?
    </ul>
<li>  At what temperature should we consider someone's temperature to be "abnormal"?
    <ul>
    <li> Start by computing the margin of error and confidence interval.
    </ul>
<li>  Is there a significant difference between males and females in normal temperature?
    <ul>
    <li> What test did you use and why?
    <li> Write a story with your conclusion in the context of the original problem.
    </ul>
</ol>

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet

#### Resources

+ Information and data sources: http://www.amstat.org/publications/jse/datasets/normtemp.txt, http://www.amstat.org/publications/jse/jse_data_archive.htm

****

In [1]:
import pandas as pd

df = pd.read_csv('data/human_body_temperature.csv')

In [2]:
df

Unnamed: 0,temperature,gender,heart_rate
0,99.3,F,68.0
1,98.4,F,81.0
2,97.8,M,73.0
3,99.2,F,66.0
4,98.0,F,73.0
5,99.2,M,83.0
6,98.0,M,71.0
7,98.8,M,78.0
8,98.4,F,84.0
9,98.6,F,86.0


In [16]:
#1 - D'Agostino-Pearson normality test (null hypothesis: the sample comes from a normal distribution)
import numpy as np
from scipy import stats

x = np.array(df['temperature'])
z, pval = stats.normaltest(x)

print(pval)

# Reject normality only when the p-value falls below the 0.05 significance level
if pval < 0.05:
    print("Not normal distribution")
else:
    print("Normal distribution")

0.258747986349
Normal distribution


In [None]:
#2 - yes, the sample is fairly large and the observations are independent, as each one is derived from a different individual

In [17]:
#3

#Sample mean (an estimate of the true population mean):

df['temperature'].mean()

98.24923076923078
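
The sample mean is only a point estimate, and exercise 5 asks for a margin of error and confidence interval around it. The following is a minimal sketch of that calculation, not the notebook's own solution. The helper name `mean_confidence_interval` is hypothetical, and `first_ten` holds only the ten rows displayed earlier so that the block runs without the CSV; in the notebook you would pass `df['temperature']` instead.

```python
import numpy as np
from scipy import stats


def mean_confidence_interval(sample, confidence=0.95):
    """Return (mean, margin of error, (low, high)) for the mean of `sample`.

    Uses the t distribution because the population standard deviation
    is estimated from the sample itself.
    """
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    mean = sample.mean()
    # Standard error of the mean, with the unbiased (ddof=1) standard deviation
    se = sample.std(ddof=1) / np.sqrt(n)
    # Two-sided critical value for the requested confidence level
    t_crit = stats.t.ppf((1 + confidence) / 2, df=n - 1)
    margin = t_crit * se
    return mean, margin, (mean - margin, mean + margin)


# Illustrative data only: the ten temperatures shown in the table above
first_ten = [99.3, 98.4, 97.8, 99.2, 98.0, 99.2, 98.0, 98.8, 98.4, 98.6]
mean, margin, (low, high) = mean_confidence_interval(first_ten)
print(f"mean={mean:.2f}, margin={margin:.2f}, interval=({low:.2f}, {high:.2f})")
```

Note that this interval describes uncertainty about the population mean. Deciding whether one person's reading is "abnormal" concerns the spread of individual values, which is wider than the interval for the mean.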

In [18]:
#3 - a one-sample test is appropriate, as there is a single group of individuals measured on one metric (temperature) and compared against a fixed reference value (98.6)

In [19]:
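#3 - the population standard deviation is unknown, so the t test is strictly the appropriate choice; because the sample is large, the z test is a close approximation and gives nearly the same result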

In [23]:
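#3 - z-scores: standardize each observation against the sample mean and standard deviation (this describes individual readings; it is not yet a test of the mean against 98.6)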

df['zscore'] = (df['temperature'] - df['temperature'].mean())/df['temperature'].std(ddof=0)
df

Unnamed: 0,temperature,gender,heart_rate,zscore
0,99.3,F,68.0,1.438705
1,98.4,F,81.0,0.206432
2,97.8,M,73.0,-0.615083
3,99.2,F,66.0,1.301786
4,98.0,F,73.0,-0.341245
5,99.2,M,83.0,1.301786
6,98.0,M,71.0,-0.341245
7,98.8,M,78.0,0.754109
8,98.4,F,84.0,0.206432
9,98.6,F,86.0,0.480270
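
The per-observation z-scores above standardize individual readings, whereas exercise 3 asks whether the sample mean is compatible with 98.6. The following is a rough sketch of how the one-sample $z$ and $t$ tests could be computed side by side. The function name `one_sample_tests` is hypothetical, the $z$ variant plugs in the sample standard deviation as an assumption (the population value is unknown), and `first_ten` contains only the ten displayed rows so the block is self-contained.

```python
import numpy as np
from scipy import stats


def one_sample_tests(sample, mu0=98.6):
    """Return (statistic, two-sided z p-value, two-sided t p-value).

    Both tests share the same statistic here; they differ only in the
    reference distribution used to turn it into a p-value.
    """
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    # Standard error of the mean, estimated from the sample (ddof=1)
    se = sample.std(ddof=1) / np.sqrt(n)
    stat = (sample.mean() - mu0) / se
    p_z = 2 * stats.norm.sf(abs(stat))         # standard normal tails
    p_t = 2 * stats.t.sf(abs(stat), df=n - 1)  # heavier t tails, n - 1 degrees of freedom
    return stat, p_z, p_t


# Illustrative data only: the ten temperatures shown in the table above
first_ten = [99.3, 98.4, 97.8, 99.2, 98.0, 99.2, 98.0, 98.8, 98.4, 98.6]
stat, p_z, p_t = one_sample_tests(first_ten)
print(f"statistic={stat:.3f}, z p-value={p_z:.3f}, t p-value={p_t:.3f}")
```

With only ten observations the $t$ p-value is never smaller than the $z$ p-value, because the $t$ distribution has heavier tails. On the full column (`df['temperature']`) the two p-values should be very close, which is the contrast exercise 4 is meant to bring out.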


In [26]:
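#6 - male vs female temperatures: the means below differ by about 0.3 F; judging significance requires a two-sample test
# note: grouping on ['temperature', 'gender'] first averages the distinct temperature values per gender;
# df.groupby('gender')['temperature'].mean() gives the plain group means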

df.groupby(['temperature', 'gender'], as_index=False).mean().groupby('gender')['temperature'].mean()

gender
F    98.396296
M    98.082759
Name: temperature, dtype: float64
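
Comparing two group means by eye does not establish whether the difference is statistically significant. One way to check, sketched here under stated assumptions, is a two-sample $t$ test on the male and female temperatures. The helper `compare_groups` is hypothetical, Welch's variant (`equal_var=False`) is chosen so that equal variances need not be assumed, and `demo` contains only the ten displayed rows; in the notebook you would pass the full `df`.

```python
import pandas as pd
from scipy import stats


def compare_groups(frame, value_col="temperature", group_col="gender"):
    """Welch two-sample t test between the two groups found in `group_col`."""
    groups = {name: g[value_col].to_numpy() for name, g in frame.groupby(group_col)}
    if len(groups) != 2:
        raise ValueError("expected exactly two groups")
    (name_a, a), (name_b, b) = sorted(groups.items())
    # equal_var=False selects Welch's t test (no equal-variance assumption)
    result = stats.ttest_ind(a, b, equal_var=False)
    return {
        "means": {name_a: a.mean(), name_b: b.mean()},
        "t": result.statistic,
        "p": result.pvalue,
    }


# Illustrative data only: the ten rows shown at the top of the notebook
demo = pd.DataFrame({
    "temperature": [99.3, 98.4, 97.8, 99.2, 98.0, 99.2, 98.0, 98.8, 98.4, 98.6],
    "gender": ["F", "F", "M", "F", "F", "M", "M", "M", "F", "F"],
})
summary = compare_groups(demo)
print(summary)
```

A p-value below the chosen significance level (0.05 is the usual convention) would indicate a significant difference between the groups. Ten rows are far too few to draw a conclusion, so the result from this demo data should not be read as an answer to exercise 6.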