# What is the True Normal Human Body Temperature? 

#### Background

The mean normal body temperature was held to be 37$^{\circ}$C or 98.6$^{\circ}$F for more than 120 years after Carl Wunderlich first conceptualized and reported it in a famous 1868 book. But is this value statistically correct?

<div class="span5 alert alert-info">
<h3>Exercises</h3>

<p>In this exercise, you will analyze a dataset of human body temperatures and employ the concepts of hypothesis testing, confidence intervals, and statistical significance.</p>

<p>Answer the following questions <b>in this notebook below and submit to your Github account</b>.</p> 

<ol>
<li>  Is the distribution of body temperatures normal? 
    <ul>
    <li> Although this is not a requirement for CLT to hold (read CLT carefully), it gives us some peace of mind that the population may also be normally distributed if we assume that this sample is representative of the population.
    </ul>
<li>  Is the sample size large? Are the observations independent?
    <ul>
    <li> Remember that this is a condition for the CLT, and hence the statistical tests we are using, to apply.
    </ul>
<li>  Is the true population mean really 98.6 degrees F?
    <ul>
    <li> Would you use a one-sample or two-sample test? Why?
    <li> In this situation, is it appropriate to use the $t$ or $z$ statistic? 
    <li> Now try using the other test. How is the result different? Why?
    </ul>
<li>  At what temperature should we consider someone's temperature to be "abnormal"?
    <ul>
    <li> Start by computing the margin of error and confidence interval.
    </ul>
<li>  Is there a significant difference between males and females in normal temperature?
    <ul>
    <li> What test did you use and why?
    <li> Write a story with your conclusion in the context of the original problem.
    </ul>
</ol>

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet

#### Resources

+ Information and data sources: http://www.amstat.org/publications/jse/datasets/normtemp.txt, http://www.amstat.org/publications/jse/jse_data_archive.htm
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet

****
</div>

In [6]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
df = pd.read_csv('human_body_temperature.csv')

In [7]:
df.head()

Unnamed: 0,temperature,gender,heart_rate
0,99.3,F,68.0
1,98.4,F,81.0
2,97.8,M,73.0
3,99.2,F,66.0
4,98.0,F,73.0


## Are the sample body temperatures normally distributed?


In [8]:
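# `normed` was removed from Matplotlib's hist in version 3.1; `density=True` is the replacement
df.temperature.hist(density=True)
mean = df.temperature.mean()
n = df.temperature.count()
# np.std defaults to ddof=0 (population formula); pass ddof=1 for the sample standard deviation
sigma = np.std(df.temperature)
mean, n, sigma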

(98.24923076923078, 130, 0.7303577789050377)

# Observations
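* The histogram is roughly bell-shaped, which is consistent with an approximately normal distribution.
* There are 130 observations, which exceeds the common rule of thumb of 30, so the sample is large enough to apply the CLT, provided the observations are independent.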
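
A histogram is only a visual check. As a rough numerical complement, the sketch below computes sample skewness and excess kurtosis, both of which are close to 0 for normally distributed data. The helper name `skew_and_excess_kurtosis` is hypothetical, and the five values are just the rows shown by `df.head()` above, far too few for a real check; in the notebook one would pass `df.temperature` instead.

```python
def skew_and_excess_kurtosis(values):
    """Return (skewness, excess kurtosis) based on population moments."""
    values = list(values)
    n = len(values)
    mean = sum(values) / n
    # Second, third and fourth central moments
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0
    return skew, excess_kurt

# Illustrative input only: the first five temperatures from df.head()
head_temperatures = [99.3, 98.4, 97.8, 99.2, 98.0]
skew, excess_kurt = skew_and_excess_kurtosis(head_temperatures)
print("skewness =", skew, "excess kurtosis =", excess_kurt)
```

Values far from 0 on the full sample would be a reason to look more closely before relying on normality of the data itself.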

## Is the population mean really 98.6 degrees F?


In [9]:
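sample_mean = df.temperature.mean()
# Standard error of the mean: standard deviation divided by sqrt(n)
sigma = np.std(df.temperature)/np.sqrt(n)
# Test statistic for the null hypothesis that the population mean is 98.6
t_stat = (sample_mean - 98.6)/sigma
print("t_stat = ", t_stat)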


t_stat =  -5.47592520208


In [10]:
# This is a one-sample test: a single sample is compared against the fixed reference value 98.6.
# We don't know the standard deviation of the population, so we calculate the t-statistic,
# using the sample standard deviation divided by sqrt(n) in place of the population sigma.
# With n = 130 the t distribution is very close to the standard normal, so the Z-table gives virtually the same result as the T-table.

# It is a two-tailed test because the alternative hypothesis is that the population mean differs from 98.6 in either direction.


# Observations
* The test statistic is about -5.48, far beyond the two-tailed critical value of ±1.96 at the 5% significance level, so we reject the null hypothesis and conclude that the population mean is not 98.6.
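
To make the rejection concrete, here is a minimal sketch that turns the statistic into a two-tailed p-value. It uses only the summary numbers printed earlier in this notebook and the standard library's `statistics.NormalDist`, so it relies on the normal approximation rather than the exact t distribution. It also shows how the statistic changes when the sample (ddof=1) standard deviation is used instead of NumPy's default.

```python
import math
from statistics import NormalDist

# Summary statistics copied from the notebook output above
sample_mean = 98.24923076923078
sample_std_ddof0 = 0.7303577789050377  # np.std default (ddof=0)
n = 130
hypothesized_mean = 98.6

# Statistic as computed in the notebook
standard_error = sample_std_ddof0 / math.sqrt(n)
z = (sample_mean - hypothesized_mean) / standard_error

# Two-tailed p-value under the standard normal approximation
p_value = 2 * NormalDist().cdf(-abs(z))

# Same statistic using the sample (ddof=1) standard deviation
sample_std_ddof1 = sample_std_ddof0 * math.sqrt(n / (n - 1))
t = (sample_mean - hypothesized_mean) / (sample_std_ddof1 / math.sqrt(n))

print("z =", z, "p =", p_value, "t (ddof=1) =", t)
```

At n = 130 the choice of ddof barely moves the statistic, and the p-value is far below 0.05 either way.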

## What would you consider a normal body temperature?

In [11]:
# 95% confidence interval for the population mean
z_stat = 1.96 # critical value from the Z-table
mu = df.temperature.mean()
sigma = np.std(df.temperature)/np.sqrt(n) # standard error of the mean

# Margin of error is z_stat*sigma
lcl = mu - (z_stat*sigma)
ucl = mu + (z_stat*sigma)

print("With 95% confidence, the population's mean body temperature is between ", lcl, " and ", ucl)

With 95% confidence, the population's mean body temperature is between  98.1236798044  and  98.374781734
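
The same interval can be wrapped in a small reusable function. This is a sketch under the normal approximation: the function name `mean_confidence_interval` is hypothetical, the inputs are the summary statistics printed earlier in this notebook, and the critical value comes from `statistics.NormalDist().inv_cdf` instead of a hard-coded 1.96.

```python
import math
from statistics import NormalDist

def mean_confidence_interval(mean, std, n, confidence=0.95):
    """Return (lower, upper, margin_of_error) for a population mean."""
    # Two-sided critical value, e.g. about 1.96 for 95% confidence
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z_crit * std / math.sqrt(n)
    return mean - margin, mean + margin, margin

# Summary statistics copied from the notebook output above
lcl, ucl, margin = mean_confidence_interval(98.24923076923078, 0.7303577789050377, 130)
print("margin of error =", margin)
print("95% CI for the mean =", (lcl, ucl))
```

Note that this interval describes the uncertainty about the population mean; it is not a range within which individual readings are expected to fall.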


## Is there a difference between females and males?

In [12]:
df_female = df[df.gender == "F"].temperature
df_female.hist(density=True)
n1 = df_female.count()
n1

65

In [13]:
x1 = df_female.mean()
s1 = np.std(df_female)

x1, s1

(98.39384615384613, 0.7377464486428966)

In [14]:
df_male = df[df.gender == "M"].temperature
df_male.hist(density=True)
n2 = df_male.count()
n2

65

In [15]:
x2 = df_male.mean()
s2 = np.std(df_male)

x2, s2

(98.1046153846154, 0.6933598841828696)

In [16]:
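# Two-sample comparison of means; the population standard deviations are not known,
# so the sample standard deviations are used, with the normal critical value as an approximation (n = 65 per group)
# 95% confidence interval for the difference between the female and male population means
mu = x1 - x2
sigma = np.sqrt((s1**2/n1)+(s2**2/n2)) # standard error of the difference
z_stat = 1.96 # critical value from the Z-table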

lcl = mu - (z_stat*sigma)
ucl = mu + (z_stat*sigma)

lcl, ucl

(0.043100466214595207, 0.5353610722468588)

# Observations
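* Women have a higher mean body temperature in this sample (98.39 vs. 98.10). The 95% confidence interval for the difference is about 0.043 to 0.535 degrees, which excludes zero, so the difference is significant at the 5% level. Equivalently, we can be 97.5% confident that the difference is at least 0.043 degrees.
* Women show a slightly wider spread of body temperatures than men (standard deviation 0.74 vs. 0.69).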
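
The confidence interval can be complemented with an explicit test statistic and p-value for the difference in means. The sketch below uses only the group summaries printed earlier in this notebook and the normal approximation via `statistics.NormalDist`; it is one reasonable way to check the conclusion, not necessarily the only appropriate test.

```python
import math
from statistics import NormalDist

# Group summaries copied from the notebook output above
x1, s1, n1 = 98.39384615384613, 0.7377464486428966, 65  # female
x2, s2, n2 = 98.1046153846154, 0.6933598841828696, 65   # male

# Difference in means and its standard error
diff = x1 - x2
standard_error = math.sqrt(s1**2 / n1 + s2**2 / n2)

# Test statistic for the null hypothesis of equal population means
z = diff / standard_error

# Two-tailed p-value under the standard normal approximation
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print("z =", z, "p =", p_value)
```

A p-value below 0.05 agrees with the observation that the 95% confidence interval for the difference excludes zero.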