## What is the true normal human body temperature? 

#### Background

The mean normal body temperature was held to be 37$^{\circ}$C or 98.6$^{\circ}$F for more than 120 years since it was first conceptualized and reported by Carl Wunderlich in a famous 1868 book. In 1992, this value was revised to 36.8$^{\circ}$C or 98.2$^{\circ}$F. 

#### Exercise
In this exercise, you will analyze a dataset of human body temperatures and employ the concepts of hypothesis testing, confidence intervals, and statistical significance.

Answer the following questions **in this notebook below and submit to your Github account**. 

1.  Is the distribution of body temperatures normal? 
    - Remember that this is a condition for the CLT, and hence the statistical tests we are using, to apply. 
2.  Is the true population mean really 98.6 degrees F?
    - Bring out the one-sample hypothesis test! In this situation, is it appropriate to apply a z-test or a t-test? How will the result be different?
3.  At what temperature should we consider someone's temperature to be "abnormal"?
    - Start by computing the margin of error and confidence interval.
4.  Is there a significant difference between males and females in normal temperature?
    - Set up and solve a two-sample hypothesis test.

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet

#### Resources

+ Information and data sources: http://www.amstat.org/publications/jse/datasets/normtemp.txt, http://www.amstat.org/publications/jse/jse_data_archive.htm
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet

****

In [1]:
import pandas as pd

In [7]:
df = pd.read_csv('data/human_body_temperature.csv')


In [23]:
print 'Mean of Male Population ', df[df['gender']=='M']['temperature'].mean()

print 'Count of Male Population ', df[df['gender']=='M']['temperature'].count()
#print df
    

Mean of Male Population  98.1046153846
Count of Male Population  65


## Is the distribution of body temperatures normal?
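Treating the individual measurements as random variables, the central limit theorem says that the arithmetic mean of a large number of samples is approximately normally distributed, even if the individual measurements are not exactly normal. With 130 observations, the sample mean can therefore be treated as normal in the tests below. Note that this is a statement about the sample mean; it does not by itself show that the individual temperatures are normally distributed.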
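
Whether the individual readings themselves look normal can also be checked numerically. The following is a minimal sketch, not part of the original analysis, that computes the sample skewness, the excess kurtosis, and the Jarque-Bera statistic using only the standard library. The function name `jarque_bera` and the `readings` list are illustrative assumptions; in this notebook the input would be `df['temperature']`.

```python
import math

def jarque_bera(values):
    """Return (skewness, excess kurtosis, JB statistic) for a sequence of numbers."""
    x = [float(v) for v in values]
    n = len(x)
    mean = sum(x) / n
    # Central moments divided by n (biased form), as used in the JB statistic.
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0
    jb = n / 6.0 * (skew ** 2 + excess_kurt ** 2 / 4.0)
    return skew, excess_kurt, jb

# Made-up values for illustration only; they are NOT taken from the dataset.
readings = [97.8, 98.0, 98.2, 98.2, 98.4, 98.6, 98.3, 98.1]
skew, excess_kurt, jb = jarque_bera(readings)
print("skewness=%.3f excess kurtosis=%.3f JB=%.3f" % (skew, excess_kurt, jb))
```

For a normal sample, both the skewness and the excess kurtosis are close to 0. Under normality the JB statistic approximately follows a chi-square distribution with 2 degrees of freedom, so values above about 5.99 suggest non-normality at the 5% level. The approximation is rough for small samples.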


## Is the true population mean really 98.6 degrees F?
The following steps determine whether the population mean is actually 98.6.
H0: The population mean is 98.6.
H1: The population mean is not 98.6.
    1- The sample has 130 observations, which is large enough for the sample standard deviation to be a good estimate of the population standard deviation, so I use the z-statistic. A t-test would give nearly the same result at this sample size.
    2- Compute the sample mean (U1), the unbiased sample standard deviation (s), and the standard deviation of the sample mean, sd_m = s/(n)^(1/2). Under the null hypothesis, the population mean is U0 = 98.6.
    3- Take a significance level of 5% (alpha = 0.05). This is a two-tailed test, so the critical value is z = 1.96.
    If |U1-U0| >= z*(sd_m), the null hypothesis (H0) is rejected and the alternative hypothesis (H1) is accepted.

In [35]:
U1=sample_mean = df['temperature'].mean()
print 'sample mean is ', sample_mean

sample mean is  98.2492307692


In [36]:
sample_sd = df['temperature'].std()
print 'sample standard deviation is ', sample_sd

sample standard deviation is  0.733183158039


In [37]:
import math
n = df['temperature'].count()
print 'Total number of samples is ', n
sample_mean_sd = sample_sd/(math.sqrt(n)) # standard deviation of the sample mean (standard error)
print 'Standard Deviation of Sample Mean ', sample_mean_sd
z = 1.96 #2 tail, alpha = 0.05

Total number of samples is  130
Standard Deviation of Sample Mean  0.0643044168379


In [46]:
U0=98.6
#U0=98.2  # uncomment to test the revised mean instead
print "U0 = ", U0
print "U1 = ", U1
print "Z-Value is ",z

# A test can only reject or fail to reject H0; it never proves H0 correct.
if abs(U1-U0) < z*sample_mean_sd :
    print "Fail to reject NULL Hypothesis:", U0, "is consistent with the mean Human temperature."
else:
    print "Reject NULL Hypothesis:", U0, "is not the mean Human temperature"

U0 =  98.6
U1 =  98.2492307692
Z-Value is  1.96
Reject NULL Hypothesis: 98.6 is not the mean Human temperature
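
The decision above compares the difference against a fixed critical value. As a cross-check, the z statistic and its two-sided p-value can be computed directly. This is a minimal sketch under the same z-test assumptions; the function name `one_sample_z` is illustrative, and the inputs are the summary statistics printed by the cells above.

```python
import math

def one_sample_z(sample_mean, sample_sd, n, mu0):
    """Return (z, two-sided p-value) for H0: population mean == mu0."""
    se = sample_sd / math.sqrt(n)  # standard error of the mean
    z = (sample_mean - mu0) / se
    # Two-sided tail area of the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

# Summary statistics as printed earlier in this notebook.
z_stat, p_value = one_sample_z(98.2492307692, 0.733183158039, 130, 98.6)
print("z = %.4f, p = %.3g" % (z_stat, p_value))
```

A p-value far below 0.05 agrees with rejecting H0 at the 5% level.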


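Because the recently revised mean temperature (98.2) is not rejected by the same test on this sample, I take it as the center, and the Normal Temperature range would be 98.2 +/- z*sample_mean_sd. Note that this is a 95% interval for the population mean; individual readings vary more widely, on the order of z*sample_sd around the mean.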

In [48]:
# temp_min/temp_max avoid shadowing the built-in min() and max()
temp_min, temp_max = 98.2-z*sample_mean_sd, 98.2+z*sample_mean_sd

In [51]:
print "Normal Temperature Range is between ", temp_min, " to ", temp_max

Normal Temperature Range is between  98.073963343  to  98.326036657
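
The interval above describes where the mean lies. To flag an individual reading as "abnormal", the spread of individual readings matters more than the spread of the mean. The sketch below contrasts the two, using the summary statistics printed earlier. Centering on the sample mean rather than on 98.2, the function names, and the normality of individual readings are all assumptions made for illustration.

```python
import math

def mean_confidence_interval(sample_mean, sample_sd, n, z=1.96):
    """Interval for the population mean: sample_mean +/- z * sd / sqrt(n)."""
    margin = z * sample_sd / math.sqrt(n)  # margin of error
    return sample_mean - margin, sample_mean + margin

def individual_range(sample_mean, sample_sd, z=1.96):
    """Range covering about 95% of individual readings, assuming normality."""
    return sample_mean - z * sample_sd, sample_mean + z * sample_sd

# Summary statistics as printed earlier in this notebook.
mean_t, sd_t, n_t = 98.2492307692, 0.733183158039, 130
ci_low, ci_high = mean_confidence_interval(mean_t, sd_t, n_t)
ind_low, ind_high = individual_range(mean_t, sd_t)
print("95%% CI for the mean: %.2f to %.2f" % (ci_low, ci_high))
print("Range for individual readings: %.2f to %.2f" % (ind_low, ind_high))
```

The second range is much wider because it is not divided by the square root of the sample size.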


## Is there a significant difference between males and females in normal temperature?

In [52]:
df_male = df[df['gender']=='M']['temperature']

In [53]:
df_female = df[df['gender']=='F']['temperature']

In [54]:
mean_m = df_male.mean()
mean_f = df_female.mean()

In [59]:
s_sd_m = df_male.std() #sample standard deviation
s_sd_f = df_female.std() #sample standard deviation
n_m = df_male.count()
n_f = df_female.count()
sd_mean_m = s_sd_m/(math.sqrt(n_m))
sd_mean_f = s_sd_f/(math.sqrt(n_f))
print "Male: Mean: ",mean_m, "SD: ", s_sd_m, "SD of Mean of Samples: ", sd_mean_m
print "Female: Mean: ",mean_f, "SD: ",s_sd_f, "SD of Mean of Samples: ", sd_mean_f

Male: Mean:  98.1046153846 SD:  0.698755762327 SD of Mean of Samples:  0.0866699855229
Female: Mean:  98.3938461538 SD:  0.743487752731 SD of Mean of Samples:  0.0922183060804


#### R_MALE: Random Variable representing Male Statistics
#### R_FEMALE: Random Variable representing Female Statistics
#### Assume a new Random Variable Z = R_FEMALE - R_MALE
#### H0: UZ = 0, where UZ = UF - UM
#### SD_Z = (SD_M^2 + SD_F^2)^(1/2), because for independent samples the variances add, not the standard deviations

In [65]:
sd_mean_z = math.sqrt(sd_mean_m**2 + sd_mean_f**2) # variances of independent sample means add
UZ = abs(mean_f - mean_m)
print "Mean of Z: %.4f SD of Mean of Samples: %.4f" % (UZ, sd_mean_z)


Mean of Z: 0.2892 SD of Mean of Samples: 0.1266


In [67]:
z_val = 1.96
if UZ < z_val*sd_mean_z:
    print "Fail to reject NULL HYPOTHESIS: no significant difference between male and female."
else:
    print "Reject NULL HYPOTHESIS: Male Statistics is different from female."

Reject NULL HYPOTHESIS: Male Statistics is different from female.
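
As a cross-check on the male/female comparison, the z statistic and two-sided p-value can be computed from the summary statistics printed earlier. This is a minimal sketch assuming the two groups are independent, so the standard error of the difference is the square root of the sum of the squared standard errors. The function name `two_sample_z` is illustrative, and the female count of 65 is inferred from 130 total readings minus 65 male readings.

```python
import math

def two_sample_z(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Return (z, two-sided p-value) for H0: the two population means are equal."""
    # For independent samples, variances of the two sample means add.
    se_diff = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    z = (mean_a - mean_b) / se_diff
    p = math.erfc(abs(z) / math.sqrt(2.0))  # P(|Z| >= |z|) for a standard normal
    return z, p

# Female statistics first, then male, as printed earlier in this notebook.
# n = 65 for females is inferred (130 total - 65 males), not printed directly.
z_stat, p_value = two_sample_z(98.3938461538, 0.743487752731, 65,
                               98.1046153846, 0.698755762327, 65)
print("z = %.4f, p = %.4f" % (z_stat, p_value))
```

With these inputs |z| exceeds 1.96 and the p-value falls below 0.05, which indicates a statistically significant difference at the 5% level under the stated assumptions.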
