## What is the true normal human body temperature? 

#### Background

The mean normal body temperature was held to be 37$^{\circ}$C or 98.6$^{\circ}$F for more than 120 years since it was first conceptualized and reported by Carl Wunderlich in a famous 1868 book. In 1992, this value was revised to 36.8$^{\circ}$C or 98.2$^{\circ}$F. 

#### Exercise
In this exercise, you will analyze a dataset of human body temperatures and employ the concepts of hypothesis testing, confidence intervals, and statistical significance.

Answer the following questions **in this notebook below and submit to your GitHub account**. 

1.  Is the distribution of body temperatures normal? 
    - Remember that the statistical tests we are using assume the sample mean is approximately normal. This holds if the data themselves are normal or, by the CLT, if the sample is large enough.
2.  Is the true population mean really 98.6 degrees F?
    - Bring out the one-sample hypothesis test! In this situation, is it appropriate to apply a z-test or a t-test? How will the result be different?
3.  At what temperature should we consider someone's temperature to be "abnormal"?
    - Start by computing the margin of error and confidence interval.
4.  Is there a significant difference between males and females in normal temperature?
    - Set up and carry out a two-sample hypothesis test.

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet

#### Resources

+ Information and data sources: http://www.amstat.org/publications/jse/datasets/normtemp.txt, http://www.amstat.org/publications/jse/jse_data_archive.htm

****

In [14]:
import pandas as pd
import matplotlib.pyplot as plt
import scipy.stats as stats
import numpy as np

In [15]:
df = pd.read_csv('data/human_body_temperature.csv')

In [16]:
%matplotlib inline
stats.normaltest(df.temperature)
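# p-value is about 0.26 > 0.05: we fail to reject the null hypothesis of normality,
# so the temperatures are consistent with a normal distribution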

NormaltestResult(statistic=2.7038014333192031, pvalue=0.2587479863488254)
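
As a complementary check to `normaltest`, the sketch below adds a Shapiro-Wilk test and the correlation coefficient of a normal Q-Q fit. It runs on a synthetic stand-in sample (an assumption made only so the block is self-contained), and the helper name `normality_summary` is hypothetical. With the notebook's data, one would pass `df.temperature` instead.

```python
import numpy as np
import scipy.stats as stats


def normality_summary(x):
    """Return three normality diagnostics for a 1-D sample (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    # D'Agostino-Pearson omnibus test (the test used in the cell above)
    _, p_omnibus = stats.normaltest(x)
    # Shapiro-Wilk test, a common alternative for moderate sample sizes
    _, p_shapiro = stats.shapiro(x)
    # Least-squares fit of the ordered data against normal quantiles;
    # r close to 1 means the Q-Q plot is close to a straight line
    _, (_, _, r) = stats.probplot(x, dist="norm")
    return {
        "p_omnibus": float(p_omnibus),
        "p_shapiro": float(p_shapiro),
        "qq_r": float(r),
    }


# Synthetic stand-in data (NOT the real measurements): 130 normal draws
# with arbitrary illustrative location and scale
rng = np.random.default_rng(0)
sample = rng.normal(loc=98.25, scale=0.73, size=130)
summary = normality_summary(sample)
print(summary)
```

Large p-values and a Q-Q correlation near 1 both point the same way as the `normaltest` result: no evidence against normality.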

In [17]:
len(df)

130

In [26]:
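# One-sample z-test of H0: population mean = 98.6 degrees F
# (the sample standard deviation stands in for the unknown population sigma)
pop_mean = 98.6
n = len(df)  # 130 observations
z = df.temperature.mean() - pop_mean
z = z * np.sqrt(n) / df.temperature.std()
z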

-5.4548232923640789

In [30]:
# Two-sided p-value; abs(z) keeps the formula valid whatever the sign of z
pval = 2 * (1 - stats.norm.cdf(abs(z)))
pval

4.9021570136531523e-08

In [31]:
# p-value is about 4.9e-08 < 0.05: reject the null hypothesis
# that the population mean is 98.6 degrees F
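
Question 2 also asks how a t-test would differ. Because the population standard deviation is unknown and estimated from the sample, the statistic computed above is, strictly speaking, a t statistic with n - 1 degrees of freedom. The sketch below reuses the printed statistic and sample size (copied from the cell outputs rather than recomputed, which is an assumption of this example) and compares the two p-values.

```python
import scipy.stats as stats

# Values copied from the notebook outputs above (not recomputed here)
n = 130
test_stat = -5.4548232923640789

# Two-sided p-value under the standard normal (z-test)
p_z = 2 * stats.norm.sf(abs(test_stat))
# Two-sided p-value under Student's t with n - 1 degrees of freedom (t-test)
p_t = 2 * stats.t.sf(abs(test_stat), df=n - 1)

print(f"z-test p-value: {p_z:.3g}")
print(f"t-test p-value: {p_t:.3g}")
```

The t distribution has heavier tails, so its p-value is somewhat larger, but with 129 degrees of freedom the two are close and the conclusion does not change. With the data loaded, `stats.ttest_1samp(df.temperature, 98.6)` should give the same t-based result.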

In [32]:
# Critical value for a two-sided 95% confidence level (about 1.96)
z_critical = stats.norm.ppf(q=0.975)

In [34]:
# Margin of error for the sample mean at 95% confidence
ME = z_critical * df.temperature.std() / np.sqrt(len(df))
ME

0.1260343410491174
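
The margin of error above is for the sample mean. As a sketch of how question 3 might be approached, the block below first forms the 95% confidence interval for the mean and then, as one possible reading of "abnormal", a range covering about 95% of individual readings (mean plus or minus 1.96 standard deviations, assuming normality). To stay self-contained, the sample mean and standard deviation are back-calculated from the printed outputs; that back-calculation and the variable names are assumptions of this example.

```python
import numpy as np
import scipy.stats as stats

# Values copied from the notebook outputs above
n = 130
pop_mean = 98.6
z_stat = -5.4548232923640789   # one-sample test statistic
margin = 0.1260343410491174    # margin of error for the mean (ME)

z_critical = stats.norm.ppf(0.975)

# Back out the sample standard deviation and mean from the printed values:
#   ME = z_critical * s / sqrt(n)   and   z = (mean - pop_mean) * sqrt(n) / s
s = margin * np.sqrt(n) / z_critical
mean = pop_mean + z_stat * s / np.sqrt(n)

# 95% confidence interval for the population MEAN
ci_mean = (mean - margin, mean + margin)
# Approximate range containing 95% of INDIVIDUAL readings, assuming normality
normal_range = (mean - z_critical * s, mean + z_critical * s)

print(f"95% CI for the mean:       {ci_mean[0]:.2f} to {ci_mean[1]:.2f}")
print(f"95% range for individuals: {normal_range[0]:.2f} to {normal_range[1]:.2f}")
```

The first interval describes uncertainty about the average; the second, much wider one is the more natural yardstick for flagging an individual reading as unusual.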

In [46]:
# Count observations per gender: rows labelled "F" are female, all others male
female = int((df['gender'] == "F").sum())
male = len(df) - female
male

65

In [48]:
# Sample standard deviations by gender
sd1 = df[df.gender == "M"].temperature.std()  # males
sd2 = df[df.gender == "F"].temperature.std()  # females
sd1

0.6987557623265908

In [50]:
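# Standard error of the difference in means. Despite the name, this is the
# unpooled form: each group contributes its own variance.
pooledSE = np.sqrt(sd1**2/male + sd2**2/female)
pooledSE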

0.12655395041982642

In [52]:
# Two-sample z statistic for H0: male and female mean temperatures are equal
z = (df[df.gender == "M"].temperature.mean() - df[df.gender == "F"].temperature.mean()) / pooledSE
z

-2.2854345381652741

In [54]:
pval = 2*(1 - stats.norm.cdf(abs(z)))
pval

0.022287360760677277

In [None]:
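# p-value is about 0.022 < 0.05: reject the null hypothesis of equal means.
# z < 0 for (male - female), so females average higher in this sample.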
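
With 65 observations per group and variances estimated from the data, a Welch t-test is a common alternative to the z-test above. The sketch below reuses the printed outputs (male standard deviation, standard error, and test statistic), backs out the female variance term from them, and applies the Welch-Satterthwaite degrees of freedom. Copying values from the outputs rather than recomputing them from `df` is an assumption made to keep the block self-contained.

```python
import scipy.stats as stats

# Values copied from the notebook outputs above
n_male = n_female = 65
sd_male = 0.6987557623265908
se_diff = 0.12655395041982642     # standard error of the difference in means
test_stat = -2.2854345381652741   # (male mean - female mean) / se_diff

# Per-group variance of the mean; the female term is what remains of se_diff**2
v_male = sd_male**2 / n_male
v_female = se_diff**2 - v_male

# Welch-Satterthwaite approximation to the degrees of freedom
dof = (v_male + v_female) ** 2 / (
    v_male**2 / (n_male - 1) + v_female**2 / (n_female - 1)
)

p_z = 2 * stats.norm.sf(abs(test_stat))
p_t = 2 * stats.t.sf(abs(test_stat), df=dof)

print(f"Welch degrees of freedom: {dof:.1f}")
print(f"z-test p-value:       {p_z:.4f}")
print(f"Welch t-test p-value: {p_t:.4f}")
```

With the data loaded, `stats.ttest_ind(males, females, equal_var=False)` should reproduce the Welch result, where `males` and `females` are placeholder names for the two temperature series.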