# What is the True Normal Human Body Temperature? 

#### Background

The mean normal body temperature was held to be 37$^{\circ}$C or 98.6$^{\circ}$F for more than 120 years since it was first conceptualized and reported by Carl Wunderlich in a famous 1868 book. But, is this value statistically correct?

<div class="span5 alert alert-info">
<h3>Exercises</h3>

<p>In this exercise, you will analyze a dataset of human body temperatures and employ the concepts of hypothesis testing, confidence intervals, and statistical significance.</p>

<p>Answer the following questions <b>in this notebook below and submit to your GitHub account</b>.</p>

<ol>
<li>  Is the distribution of body temperatures normal? 
    <ul>
    <li> Although this is not a requirement for CLT to hold (read CLT carefully), it gives us some peace of mind that the population may also be normally distributed if we assume that this sample is representative of the population.
    </ul>
<li>  Is the sample size large? Are the observations independent?
    <ul>
    <li> Remember that this is a condition for the CLT, and hence the statistical tests we are using, to apply.
    </ul>
<li>  Is the true population mean really 98.6 degrees F?
    <ul>
    <li> Would you use a one-sample or two-sample test? Why?
    <li> In this situation, is it appropriate to use the $t$ or $z$ statistic? 
    <li> Now try using the other test. How is the result different? Why?
    </ul>
<li>  At what temperature should we consider someone's temperature to be "abnormal"?
    <ul>
    <li> Start by computing the margin of error and confidence interval.
    </ul>
<li>  Is there a significant difference between males and females in normal temperature?
    <ul>
    <li> What test did you use and why?
    <li> Write a story with your conclusion in the context of the original problem.
    </ul>
</ol>

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet

#### Resources

+ Information and data sources: http://www.amstat.org/publications/jse/datasets/normtemp.txt, http://www.amstat.org/publications/jse/jse_data_archive.htm
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet

****
</div>

In [1]:
import pandas as pd

df = pd.read_csv('data/human_body_temperature.csv')

In [2]:
# Your work here.
df.count()


temperature    130
gender         130
heart_rate     130
dtype: int64

In [3]:
# Sample size (130) is > 30 and the observations are independent, which satisfies the CLT conditions.

In [4]:
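#1, #2
# Reference behaviour of the normality test: draw 1000 samples of size 130
# from a standard normal distribution and record each normaltest p-value.
# Note: x is simulated data here, not df['temperature'].
import scipy.stats as stats
import numpy as np

p_values = []
for i in range(1000):
    x = stats.norm.rvs(size=130)
    # normaltest returns (statistic, p-value); keep the p-value
    p_values.append(stats.normaltest(x)[1])

np.mean(p_values)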


0.52238683179020728

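This average p-value comes from simulated normal samples, so it only shows how the test behaves when the null hypothesis (the sample comes from a normal distribution) is true: the p-values average about 0.5. To assess the temperature data, `stats.normaltest` has to be applied to `df['temperature']` itself. A large p-value there would mean we fail to reject normality, which is not the same as proving the data are normal.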
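Applying the test to the data itself could look like the sketch below. It uses a synthetic stand-in sample so that it runs without the CSV; the name `temps` and its parameters are assumptions for illustration, and in the notebook `temps` would be `df['temperature']`.

```python
import numpy as np
import scipy.stats as stats

# Hypothetical stand-in for df['temperature'] so the sketch runs without the CSV.
rng = np.random.default_rng(0)
temps = rng.normal(loc=98.25, scale=0.73, size=130)

# D'Agostino-Pearson omnibus test; H0: the sample comes from a normal distribution.
statistic, p_value = stats.normaltest(temps)

alpha = 0.05
reject_normality = p_value < alpha
print(f"statistic={statistic:.3f}, p-value={p_value:.3f}, reject H0: {reject_normality}")
```

A histogram or a normal probability plot of the same column is a useful visual complement to the single p-value.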

In [6]:
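#3
# One-sample z-test. H0: population mean = 98.6; H1: population mean != 98.6
temp=np.array(df['temperature'])
sample_meantemp = temp.mean()
# NumPy's std() defaults to ddof=0 (population formula); ddof=1 gives the sample standard deviation
sample_std = temp.std()
count=np.size(temp)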


In [7]:
# Calculate the standard error of the mean from the sample std dev and the sample size:
standard_error=sample_std/np.sqrt(count)

# Calculate the z-score:

z_score = (sample_meantemp - 98.6) / standard_error
z_score

-5.4759252020785585

In [112]:
# Lower-tail probability P(Z <= z_score), i.e. a one-sided p-value
stats.norm.cdf(z_score)


2.1761575829356528e-08

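The value above is the lower-tail (one-sided) probability; doubling it for the two-sided alternative gives about 4.4e-08. Either way, if the null hypothesis were true, a sample mean this far from 98.6 would be extremely unlikely, so we reject the null hypothesis.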
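The exercise also asks how the result changes with the other test statistic. A minimal sketch of the comparison follows, using a synthetic stand-in sample (the name `temps` and its parameters are assumptions; in the notebook it would be `df['temperature']`). With the sample standard deviation (`ddof=1`), the z and t statistics are the same number; only the reference distribution differs, and with 130 observations the two p-values are very close.

```python
import numpy as np
import scipy.stats as stats

# Hypothetical stand-in for df['temperature'].
rng = np.random.default_rng(1)
temps = rng.normal(loc=98.25, scale=0.73, size=130)
mu0 = 98.6  # hypothesized population mean

# z-test by hand, using the sample standard deviation (ddof=1).
n = temps.size
standard_error = temps.std(ddof=1) / np.sqrt(n)
z = (temps.mean() - mu0) / standard_error
p_z = 2 * stats.norm.sf(abs(z))  # two-sided p-value from the normal distribution

# One-sample t-test: same statistic, but compared against a t distribution
# with n - 1 degrees of freedom (slightly heavier tails).
t_stat, p_t = stats.ttest_1samp(temps, popmean=mu0)

print(f"z={z:.4f}, p_z={p_z:.3g}")
print(f"t={t_stat:.4f}, p_t={p_t:.3g}")
```

Because the population standard deviation is unknown, the t statistic is the more defensible choice in principle, although at this sample size the conclusion is unlikely to differ.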


In [19]:
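#4 - range of typical individual readings (not a confidence interval for the mean):
# sample mean +/- z critical (alpha/2) times sample std dev;
# a confidence interval for the mean would use standard_error instead of sample_std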
zcrit=stats.norm.ppf(.975)
margin_of_error=zcrit*sample_std
lower_interval=sample_meantemp-margin_of_error
higher_interval=sample_meantemp+margin_of_error
print(lower_interval, higher_interval)


96.8177558267 99.6807057117


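Given the result above, a reading outside roughly 96.8 to 99.7 degrees F would be considered abnormal. Note that this is a 95% range for individual readings (mean plus or minus 1.96 standard deviations), not a confidence interval for the population mean, which would use the standard error and be much narrower.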
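The difference between the two intervals can be sketched as follows, again on a synthetic stand-in sample (the name `temps` and its parameters are assumptions for illustration). The confidence interval for the mean divides the standard deviation by the square root of the sample size; the range for individual readings does not.

```python
import numpy as np
import scipy.stats as stats

# Hypothetical stand-in for df['temperature'].
rng = np.random.default_rng(2)
temps = rng.normal(loc=98.25, scale=0.73, size=130)

mean = temps.mean()
sd = temps.std(ddof=1)          # sample standard deviation
z_crit = stats.norm.ppf(0.975)  # about 1.96 for a 95% two-sided interval

# 95% confidence interval for the population mean: uses the standard error.
margin_of_error = z_crit * sd / np.sqrt(temps.size)
ci_mean = (mean - margin_of_error, mean + margin_of_error)

# Approximate 95% range for a single reading: uses the standard deviation itself.
spread = z_crit * sd
normal_range = (mean - spread, mean + spread)

print("CI for the mean:        ", ci_mean)
print("Range for one reading:  ", normal_range)
```

For judging whether one person's temperature is unusual, the wider range for individual readings is the relevant one.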

In [42]:
#5
df_female=df[df.gender=='F']
df_male=df[df.gender=='M']
male_mean=df_male['temperature'].mean()
female_mean=df_female['temperature'].mean()
male_stdev=df_male['temperature'].std()
female_stdev=df_female['temperature'].std()

In [45]:
#calculate the difference of the means
#H0: difference is 0, H1: difference is not 0. Use a 2 sample Z test 
diff_means=abs(male_mean-female_mean)
diff_stdev=np.sqrt(male_stdev**2/len(df_male) + female_stdev**2/len(df_female))
z_score_diff = (diff_means-0)/diff_stdev
z_score_diff

2.2854345381652741

In [48]:
# Upper-tail probability for the absolute z-score, i.e. a one-sided p-value
p_value_diff = 1-stats.norm.cdf(z_score_diff)
p_value_diff

0.011143680380338639

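The value above is a one-sided tail probability; doubling it for the two-sided alternative gives about 0.022, which is still below 0.05. If there were no difference between the male and female mean temperatures, a difference as large as the one observed would be unlikely, so we reject the null hypothesis at the 5% level and conclude that mean normal temperature differs between males and females.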
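As a cross-check, the same comparison can be run with a two-sample t-test. The sketch below uses Welch's version (`equal_var=False`), which has the same statistic as the hand-computed z-score but uses a t reference distribution. The two groups are synthetic stand-ins with hypothetical sizes and parameters; in the notebook they would be `df_male['temperature']` and `df_female['temperature']`.

```python
import numpy as np
import scipy.stats as stats

# Hypothetical stand-ins for the male and female temperature columns.
rng = np.random.default_rng(3)
male = rng.normal(loc=98.1, scale=0.70, size=65)
female = rng.normal(loc=98.4, scale=0.74, size=65)

# Two-sample z-test by hand, with a two-sided p-value.
diff = male.mean() - female.mean()
se_diff = np.sqrt(male.var(ddof=1) / male.size + female.var(ddof=1) / female.size)
z = diff / se_diff
p_two_sided_z = 2 * stats.norm.sf(abs(z))

# Welch's t-test: same statistic, t reference distribution, two-sided by default.
t_stat, p_welch = stats.ttest_ind(male, female, equal_var=False)

print(f"z={z:.4f}, two-sided p={p_two_sided_z:.4f}")
print(f"t={t_stat:.4f}, Welch p={p_welch:.4f}")
```

A two-sample test is the natural choice here because the male and female readings come from two independent groups rather than from paired measurements.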