## What is the true normal human body temperature? 

#### Background

The mean normal body temperature was held to be 37$^{\circ}$C or 98.6$^{\circ}$F for more than 120 years since it was first conceptualized and reported by Carl Wunderlich in a famous 1868 book. In 1992, this value was revised to 36.8$^{\circ}$C or 98.2$^{\circ}$F. 

#### Exercise
In this exercise, you will analyze a dataset of human body temperatures and employ the concepts of hypothesis testing, confidence intervals, and statistical significance.

Answer the following questions **in this notebook below and submit to your Github account**. 

1.  Is the distribution of body temperatures normal? 
    - Remember that this is a condition for the CLT, and hence the statistical tests we are using, to apply. 
2.  Is the true population mean really 98.6 degrees F?
    - Bring out the one-sample hypothesis test! In this situation, is it appropriate to apply a z-test or a t-test? How will the result be different?
3.  At what temperature should we consider someone's temperature to be "abnormal"?
    - Start by computing the margin of error and confidence interval.
4.  Is there a significant difference between males and females in normal temperature?
    - Set up and solve a two-sample hypothesis test.

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet

#### Resources

+ Information and data sources: http://www.amstat.org/publications/jse/datasets/normtemp.txt, http://www.amstat.org/publications/jse/jse_data_archive.htm
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet

****

In [17]:
import pandas as pd

In [18]:
df = pd.read_csv('data/human_body_temperature.csv')

Normality can be assessed with D’Agostino and Pearson’s test, which combines the sample skewness and kurtosis into a single test statistic.

In [19]:
import scipy.stats as stats
print(stats.normaltest(df['temperature']))

NormaltestResult(statistic=2.7038014333192031, pvalue=0.2587479863488254)


The null hypothesis of this test is that the sample comes from a normal distribution. The p-value of about 0.26 is well above 0.05, so we fail to reject the null hypothesis: the data are consistent with a normal distribution. This does not prove normality, but it gives no evidence against it.
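To make concrete what the test looks at, the two ingredients (skewness and excess kurtosis) can be computed by hand. The sketch below uses only the standard library and a small made-up sample; `skewness_and_excess_kurtosis` and `toy_sample` are hypothetical names, and the readings are not taken from the dataset. Both quantities are zero for a perfect normal distribution.

```python
def skewness_and_excess_kurtosis(values):
    """Return (skewness, excess kurtosis) based on population moments."""
    n = len(values)
    mean = sum(values) / n
    # Second, third and fourth central moments
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0
    return skew, excess_kurt

# Hypothetical readings for illustration only (not from the dataset)
toy_sample = [97.8, 98.0, 98.2, 98.4, 98.6]
skew, excess_kurt = skewness_and_excess_kurtosis(toy_sample)
print('skewness = %.3f, excess kurtosis = %.3f' % (skew, excess_kurt))
```

A symmetric sample like this one has skewness near zero, while evenly spaced values give a negative excess kurtosis because they lack the tails of a bell curve.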

## True Population Mean

In order to find out whether the true population mean is really 98.6 degrees F, we perform a one sample hypothesis test.

First, we formulate the null hypothesis, H0, and the alternate hypothesis for the true population mean of the body temperature, T.

H0: T = 98.6 degrees F

H1: T $\neq$ 98.6 degrees F

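Next, we calculate the test statistic. Strictly speaking, the population standard deviation is unknown, so a t-test is the appropriate choice. However, with over 100 samples the t-distribution is nearly identical to the normal distribution, so we use the z-test and expect essentially the same result.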

In [20]:
import math

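mean = df.temperature.mean()
sd = df.temperature.std()  # sample standard deviation (pandas default, ddof=1)
ssd = sd / math.sqrt(len(df.temperature))  # standard error of the mean

# Computed as (hypothesized - observed), so a positive z means the sample mean is below 98.6
z = (98.6 - mean) / ssd
print('z = %f' % z)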

z = 5.454823


Using a significance level of 0.05 for this two-sided test, the rejection region from the z-table is |z| > 1.96.

Since |z| = 5.45 is far greater than 1.96, the null hypothesis can be rejected: the sample mean differs significantly from 98.6 degrees F.
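The same decision can be expressed as a p-value instead of a table lookup. The following is a minimal sketch using only the standard library; `one_sample_z` is a hypothetical helper, and the summary values passed to it (mean 98.2492, standard deviation 0.7332, n = 130) are assumptions recovered from the outputs in this notebook rather than read from the data file. Note that it uses the (observed - hypothesized) sign convention, so z comes out negative here.

```python
import math
from statistics import NormalDist

def one_sample_z(sample_mean, sample_sd, n, mu0):
    """Return (z, two-sided p-value) for H0: population mean == mu0."""
    se = sample_sd / math.sqrt(n)          # standard error of the mean
    z = (sample_mean - mu0) / se           # observed minus hypothesized
    p_two_sided = 2.0 * NormalDist().cdf(-abs(z))
    return z, p_two_sided

# Assumed summary statistics (rounded, inferred from the notebook outputs)
z_stat, p_value = one_sample_z(98.2492, 0.7332, 130, 98.6)
print('z = %.3f, p = %.2e' % (z_stat, p_value))
```

A p-value far below 0.05 leads to the same conclusion as comparing |z| with 1.96.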

## Abnormal Body Temperature

At a 95% confidence level, the margin of error for the sample mean can be computed as:

In [21]:
e = 1.96 * ssd
e

0.12603665700226638

The lower bound for the confidence interval is then:

In [22]:
mean - e

98.12319411222852

And the upper bound is:

In [23]:
mean + e

98.37526742623304

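Thus, the 95% confidence interval for the population mean is 98.12 to 98.38 degrees F, which notably excludes 98.6. This interval describes the uncertainty in the mean, not the spread of individual readings. To judge whether one person's temperature is abnormal, the sample standard deviation should be used in place of the standard error: with a standard deviation of about 0.73, the range mean ± 1.96 standard deviations is roughly 96.81 to 99.69 degrees F, and a reading outside it can be considered abnormal.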
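The difference between an interval for the mean and a range for individual readings can be sketched as follows. `mean_ci_and_individual_range` is a hypothetical helper, and the summary values (mean 98.2492, standard deviation 0.7332, n = 130) are assumptions recovered from the outputs above. The individual range also assumes the temperatures are approximately normal.

```python
import math
from statistics import NormalDist

def mean_ci_and_individual_range(sample_mean, sample_sd, n, confidence=0.95):
    """Return (CI for the mean, range covering most individual readings)."""
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2.0)  # about 1.96 for 95%
    se = sample_sd / math.sqrt(n)
    # Interval for the population mean: uses the standard error
    ci = (sample_mean - z_crit * se, sample_mean + z_crit * se)
    # Range for a single reading: uses the standard deviation itself
    individual = (sample_mean - z_crit * sample_sd,
                  sample_mean + z_crit * sample_sd)
    return ci, individual

# Assumed summary statistics (rounded, inferred from the notebook outputs)
ci, individual = mean_ci_and_individual_range(98.2492, 0.7332, 130)
print('95%% CI for the mean: %.2f to %.2f' % ci)
print('95%% range for individuals: %.2f to %.2f' % individual)
```

The individual range is wider than the confidence interval by a factor of the square root of n, which is why the two should not be used interchangeably.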

## Difference Between Male and Female Body Temperature

To determine whether there is a significant difference between male and female body temperature, we set up a two sample hypothesis test.

First determine the mean and standard deviation in body temperature in males:

In [24]:
Mm = df[df.gender=='M'].temperature.mean()
SDm = df[df.gender=='M'].temperature.std()
print(Mm)
print(SDm)

98.1046153846
0.698755762327


Then determine the mean and standard deviation in body temperature in females: 

In [25]:
Mf = df[df.gender=='F'].temperature.mean()
SDf = df[df.gender=='F'].temperature.std()
print(Mf)
print(SDf)

98.3938461538
0.743487752731


The difference in the means is:

In [26]:
Diff = Mf - Mm
Diff

0.289230769230727

Setting up the null hypothesis and the alternate hypothesis:

H0: Mf - Mm = 0

H1: Mf - Mm > 0

For a one-sided test at a significance level of 0.05, the critical z-value is about 1.65.

The smallest difference in means that would be significant is then the critical z-value multiplied by the standard error of the difference:

In [27]:
SDf_m = math.sqrt((SDf**2 / len(df[df.gender=='F'].temperature)) + (SDm**2 / len(df[df.gender=='M'].temperature))) 
1.65*SDf_m

0.20881401819271359

Since the observed difference (0.289) is greater than this critical value (0.209), we reject the null hypothesis in favor of the alternative: the mean body temperature of females is significantly higher than that of males at the 0.05 level.
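The same comparison can be written as a z statistic with a p-value. This is a minimal sketch using only the standard library; `two_sample_z` is a hypothetical helper, the means and standard deviations are rounded from the outputs above, and the group size of 65 each is an assumption that is consistent with the critical value computed above.

```python
import math
from statistics import NormalDist

def two_sample_z(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Return (z, one-sided p for mean_a > mean_b, two-sided p)."""
    # Standard error of the difference between two independent means
    se = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    z = (mean_a - mean_b) / se
    p_one_sided = 1.0 - NormalDist().cdf(z)
    p_two_sided = 2.0 * NormalDist().cdf(-abs(z))
    return z, p_one_sided, p_two_sided

# Females first, males second; group sizes of 65 are assumed
z_fm, p_one, p_two = two_sample_z(98.3938, 0.7435, 65, 98.1046, 0.6988, 65)
print('z = %.3f, one-sided p = %.4f, two-sided p = %.4f' % (z_fm, p_one, p_two))
```

Reporting the two-sided p-value as well is a useful check, since the direction of the alternative hypothesis was chosen after looking at the sample means.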