## What is the true normal human body temperature? 

#### Background

The mean normal body temperature was held to be 37$^{\circ}$C (98.6$^{\circ}$F) for more than 120 years after Carl Wunderlich first conceptualized and reported it in a famous 1868 book. In 1992, this value was revised to 36.8$^{\circ}$C (98.2$^{\circ}$F).

#### Exercise
In this exercise, you will analyze a dataset of human body temperatures and employ the concepts of hypothesis testing, confidence intervals, and statistical significance.

Answer the following questions **in this notebook below and submit to your Github account**. 

1.  Is the distribution of body temperatures normal? 
    - Remember that this is a condition for the CLT, and hence the statistical tests we are using, to apply. 
2.  Is the true population mean really 98.6 degrees F?
    - Bring out the one-sample hypothesis test! In this situation, is it appropriate to apply a z-test or a t-test? How will the result be different?
3.  At what temperature should we consider someone's temperature to be "abnormal"?
    - Start by computing the margin of error and confidence interval.
4.  Is there a significant difference between males and females in normal temperature?
    - Set up and solve a two-sample hypothesis test.

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet

#### Resources

+ Information and data sources: http://www.amstat.org/publications/jse/datasets/normtemp.txt, http://www.amstat.org/publications/jse/jse_data_archive.htm
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet

****

In [3]:
import pandas as pd
import math

In [6]:
df = pd.read_csv('data/human_body_temperature.csv')
print(df)

     temperature gender  heart_rate
0           99.3      F        68.0
1           98.4      F        81.0
2           97.8      M        73.0
3           99.2      F        66.0
4           98.0      F        73.0
5           99.2      M        83.0
6           98.0      M        71.0
7           98.8      M        78.0
8           98.4      F        84.0
9           98.6      F        86.0
10          98.8      F        89.0
11          96.7      F        62.0
12          98.2      M        72.0
13          98.7      F        79.0
14          97.8      F        77.0
15          98.8      F        83.0
16          98.3      F        79.0
17          98.2      M        64.0
18          97.2      F        68.0
19          99.4      M        70.0
20          98.3      F        78.0
21          98.2      M        71.0
22          98.6      M        70.0
23          98.4      M        68.0
24          97.8      M        65.0
25          98.0      F        87.0
26          97.8      F     

In [1]:
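#1) n = 130 > 30, so by the CLT the sampling distribution of the sample mean is approximately normal,
#   which is what the z/t tests below rely on. A large n does not by itself show that the individual
#   temperatures are normally distributed; check that with a histogram, a normal probability plot, or a normality test.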
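A quick numerical check of normality, as a sketch: the function below implements the Jarque-Bera test, chosen here (our substitution, not a method prescribed by the exercise) because it needs only the standard library. It compares the sample skewness and kurtosis with the values 0 and 3 of a normal distribution; under normality the statistic is approximately chi-square with 2 degrees of freedom, whose upper tail probability is exp(-x/2). The name `jarque_bera` is hypothetical; with SciPy installed, `scipy.stats.normaltest` or `scipy.stats.jarque_bera` would be the usual choice, alongside a histogram or normal probability plot.

```python
import math


def jarque_bera(values):
    """Jarque-Bera normality test using only the standard library.

    Returns (skewness, kurtosis, jb_statistic, p_value). Under H0 (data drawn
    from a normal distribution) JB is asymptotically chi-square with 2 degrees
    of freedom, whose survival function is exp(-x / 2).
    """
    x = [float(v) for v in values]
    n = len(x)
    mean = sum(x) / n
    # Central moments with divisor n (the biased moments JB is defined with)
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2  # equals 3 for a normal distribution
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    p_value = math.exp(-jb / 2.0)  # upper tail of chi-square with 2 df
    return skewness, kurtosis, jb, p_value
```

Usage, assuming `df` is loaded as above: `skew, kurt, jb, p = jarque_bera(df["temperature"])`. A p-value above 0.05 means the test finds no evidence against normality at the 5% level; it does not prove the data are normal, and the chi-square approximation is only asymptotic.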

In [None]:
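#2) The population standard deviation is unknown, so strictly a one-sample t-test applies. With n = 130
#   (129 degrees of freedom) the t distribution is very close to the standard normal, so a z-test gives
#   nearly the same result: the two-sided 5% critical value is about 1.98 for t versus 1.96 for z.
#H0: true population mean = 98.6
#Ha: true population mean != 98.6
#Significance level alpha = 0.05
#Two-tailed test, so use the 97.5th percentile of the standard normal: z = 1.96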

In [3]:
print(df["temperature"].mean())

98.2492307692


In [4]:
print(df["temperature"].std())

0.733183158039


In [8]:
standardError = df["temperature"].std()/math.sqrt(len(df))  # standard error of the mean; len(df) == 130

In [9]:
z = (df["temperature"].mean() - 98.6)/standardError
z
#z = -5.45, so |z| is far beyond the critical value 1.96: reject H0 at the 5% level in favor of Ha.

-5.454823292364079
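The cells above stop at the test statistic. A minimal sketch that also returns the two-sided p-value under the normal approximation, P(|Z| >= |z|) = erfc(|z|/sqrt(2)), follows; the function name is hypothetical, and it uses the sample standard deviation (divisor n - 1, the same default as pandas `.std()`) in place of the unknown population sigma. For the exact t version, `scipy.stats.ttest_1samp(df["temperature"], 98.6)` can be used if SciPy is available.

```python
import math
import statistics


def one_sample_z_test(values, mu0):
    """Large-sample one-sample test of H0: mean == mu0.

    Uses the sample standard deviation (ddof=1, matching pandas .std())
    in place of the unknown population sigma, and a normal approximation
    for the two-sided p-value.
    """
    x = [float(v) for v in values]
    n = len(x)
    standard_error = statistics.stdev(x) / math.sqrt(n)
    z = (statistics.mean(x) - mu0) / standard_error
    # Two-sided p-value: P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value
```

`z, p = one_sample_z_test(df["temperature"], 98.6)` should reproduce the `z` computed above; with |z| this large the p-value is far below 0.05.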

In [10]:
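#3) 95% confidence interval for the population mean: mean +/- 1.96 * s / sqrt(n)
#   The margin of error is 1.96 * s / sqrt(n).
lowerInterval = df["temperature"].mean() - 1.96*df["temperature"].std()/math.sqrt(len(df))
lowerInterval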

98.12319411222852

In [11]:
higherInterval = df["temperature"].mean() + 1.96*df["temperature"].std()/math.sqrt(len(df))
higherInterval
#95% CI for the population mean: [lowerInterval, higherInterval]. This bounds the mean, not individual readings.

98.37526742623304
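The confidence interval describes where the population *mean* plausibly lies, which is much narrower than the spread of temperatures between individual people. One possible way to answer "what is abnormal" for a single reading (an assumption on our part, not part of the exercise's worked answer) is an approximate 95% prediction interval, mean +/- 1.96 * s * sqrt(1 + 1/n), which accounts for both person-to-person spread and uncertainty in the estimated mean. The sketch below computes both intervals with the normal critical value 1.96; the function names are hypothetical.

```python
import math
import statistics


def mean_confidence_interval(values, z_crit=1.96):
    """Normal-approximation confidence interval for the population mean.

    Returns (margin_of_error, lower, upper) with
    margin_of_error = z_crit * s / sqrt(n).
    """
    x = [float(v) for v in values]
    n = len(x)
    mean = statistics.mean(x)
    margin = z_crit * statistics.stdev(x) / math.sqrt(n)
    return margin, mean - margin, mean + margin


def individual_prediction_interval(values, z_crit=1.96):
    """Approximate interval expected to contain about 95% of new single readings.

    Half-width z_crit * s * sqrt(1 + 1/n): spread of one new observation
    plus the uncertainty in the estimated mean.
    """
    x = [float(v) for v in values]
    n = len(x)
    mean = statistics.mean(x)
    half_width = z_crit * statistics.stdev(x) * math.sqrt(1.0 + 1.0 / n)
    return mean - half_width, mean + half_width
```

The lower and upper bounds from `mean_confidence_interval(df["temperature"])` should match `lowerInterval` and `higherInterval`; under this rule, a reading outside `individual_prediction_interval(df["temperature"])` would be flagged as unusual. Using the t critical value instead of 1.96 would widen both intervals slightly.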

In [13]:
#Question 4: two-sample hypothesis test
#male population mean = mu1
#female population mean = mu2
#H0: no difference -> mu1 - mu2 = 0
#Ha: a difference -> mu1 - mu2 != 0
m1 = df[df.gender == 'M']
mmean1 = m1["temperature"].mean()
mstd1 = m1["temperature"].std()

In [14]:
f1 = df[df.gender == "F"]
fmean1 = f1["temperature"].mean()
fstd1 = f1["temperature"].std()

In [15]:
mmean1 - fmean1

-0.289230769230727

In [16]:
j = (mstd1**2)/len(m1) + (fstd1**2)/len(f1)  # variance of (male mean - female mean), unpooled
j

0.016015902366863885

In [17]:
k = math.sqrt(j)  # standard error of the difference in means
k

0.12655395041982642

In [21]:
z1 = (mmean1 - fmean1 - 0)/k  # test statistic under H0: mu1 - mu2 = 0

In [22]:
#compare with the two-sided critical value z = 1.96 (alpha = 0.05)
z1
#z1 = -2.29, so |z1| > 1.96: reject H0. Male and female mean temperatures differ significantly at the 5% level.

-2.285434538165274

In [20]:
#Alternative way to do the hypothesis test: compare the observed difference
#|mmean1 - fmean1| = 0.289 with the critical difference 1.96 * k
1.96*k
#0.289 > 0.248, so reject H0

0.2480457428228598
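As with question 2, the cells above compute `z1` but not a p-value. The sketch below packages the same unpooled calculation (difference of means, standard error sqrt(s1^2/n1 + s2^2/n2), z statistic) and adds the two-sided normal-approximation p-value. The function name is hypothetical; `scipy.stats.ttest_ind(male, female, equal_var=False)` performs the closely related Welch t-test if SciPy is available.

```python
import math
import statistics


def two_sample_z_test(a, b):
    """Large-sample test of H0: mean(a) == mean(b) with unpooled variances.

    Returns (difference, standard_error, z, two_sided_p).
    """
    a = [float(v) for v in a]
    b = [float(v) for v in b]
    diff = statistics.mean(a) - statistics.mean(b)
    # Sample variances use divisor n - 1, matching pandas .std() ** 2
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    z = diff / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # P(|Z| >= |z|)
    return diff, se, z, p_value
```

`two_sample_z_test(df[df.gender == "M"]["temperature"], df[df.gender == "F"]["temperature"])` should reproduce `mmean1 - fmean1`, `k` and `z1` from the cells above.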