# What is the True Normal Human Body Temperature? 

#### Background

The mean normal body temperature was held to be 37$^{\circ}$C or 98.6$^{\circ}$F for more than 120 years after Carl Wunderlich first conceptualized and reported it in a famous 1868 book. But is this value statistically correct?

<h3>Exercises</h3>

<p>In this exercise, you will analyze a dataset of human body temperatures and employ the concepts of hypothesis testing, confidence intervals, and statistical significance.</p>

<p>Answer the following questions <b>in this notebook below and submit to your Github account</b>.</p> 

<ol>
<li>  Is the distribution of body temperatures normal? 
    <ul>
    <li> Although this is not a requirement for CLT to hold (read CLT carefully), it gives us some peace of mind that the population may also be normally distributed if we assume that this sample is representative of the population.
    </ul>
<li>  Is the sample size large? Are the observations independent?
    <ul>
    <li> Remember that this is a condition for the CLT, and hence the statistical tests we are using, to apply.
    </ul>
<li>  Is the true population mean really 98.6 degrees F?
    <ul>
    <li> Would you use a one-sample or two-sample test? Why?
    <li> In this situation, is it appropriate to use the $t$ or $z$ statistic? 
    <li> Now try using the other test. How is the result different? Why?
    </ul>
<li>  Draw a small sample of size 10 from the data and repeat both tests. 
    <ul>
    <li> Which one is the correct one to use? 
    <li> What do you notice? What does this tell you about the difference in application of the $t$ and $z$ statistic?
    </ul>
<li>  At what temperature should we consider someone's temperature to be "abnormal"?
    <ul>
    <li> Start by computing the margin of error and confidence interval.
    </ul>
<li>  Is there a significant difference between males and females in normal temperature?
    <ul>
    <li> What test did you use and why?
    <li> Write a story with your conclusion in the context of the original problem.
    </ul>
</ol>

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet

#### Resources

+ Information and data sources: http://www.amstat.org/publications/jse/datasets/normtemp.txt, http://www.amstat.org/publications/jse/jse_data_archive.htm
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet

****

In [112]:
import pandas as pd

df = pd.read_csv('human_temp/data/human_body_temperature.csv')

In [113]:
# Your work here.
import pandas as pd
import scipy.stats as stats
import matplotlib.pyplot as plt
import numpy as np

# 1. Is the distribution of body temperatures normal?
df = pd.read_csv('human_temp/data/human_body_temperature.csv')
temp_mean = np.mean(df.temperature)
temp_std = np.std(df.temperature)
# Reference draws from a normal distribution with the sample's mean and std
samples = np.random.normal(temp_mean, temp_std, size=10000)
n_bins = int(np.sqrt(len(df.temperature)))

# Histogram of the observed temperatures (density=True replaces the removed normed=True)
plt.hist(df.temperature, bins=n_bins, density=True)
plt.xlabel('DF Temperature')
plt.ylabel('PDF')
plt.show()

# Histogram of the simulated normal draws, for visual comparison
plt.hist(samples, bins=n_bins, density=True)
plt.xlabel('Simulated Normal Sample Temperature')
plt.ylabel('PDF')
plt.show()
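
Comparing two histograms is only a visual check. A more formal check might look like the minimal sketch below. It assumes synthetic stand-in data: the array `temps` is hypothetical, drawn with roughly the sample's mean and spread rather than read from the CSV, and `ecdf` is an illustrative helper. `scipy.stats.normaltest` tests the null hypothesis that the data come from a normal distribution, and the ECDF is compared against the fitted normal CDF.

```python
import numpy as np
from scipy import stats

# Hypothetical stand-in for df.temperature (assumption: not the real data)
rng = np.random.default_rng(0)
temps = rng.normal(loc=98.25, scale=0.73, size=130)

def ecdf(data):
    """Return sorted values and their empirical cumulative probabilities."""
    x = np.sort(data)
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y

# D'Agostino-Pearson test: H0 is that the sample comes from a normal distribution
stat, p_value = stats.normaltest(temps)

# Largest vertical gap between the ECDF and the fitted normal CDF
x, y = ecdf(temps)
max_gap = np.max(np.abs(y - stats.norm.cdf(x, np.mean(temps), np.std(temps))))

print("normaltest p-value:", p_value)
print("largest ECDF gap:", max_gap)
```

A large p-value means the test found no evidence against normality; it does not prove the data are normal.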

In [114]:
# 2. Large sample size? Independent observations?
print(len(df), "observations")

# Separate the data into male and female subsets
sample_male = df[df.gender == 'M']
sample_female = df[df.gender == 'F']

# Confirm both groups have the same number of observations
print(len(sample_male), len(sample_female))

# Per-gender mean and standard deviation
male_temp_mean = np.mean(sample_male.temperature)
male_temp_std = np.std(sample_male.temperature)

female_temp_mean = np.mean(sample_female.temperature)
female_temp_std = np.std(sample_female.temperature)

print(male_temp_mean,female_temp_mean)

# Offset of each group mean from the overall mean, and each group's spread
print(-temp_mean+male_temp_mean, "for male")
print(round(male_temp_std, 4), "std for male")

print(-temp_mean+female_temp_mean, "for female")
print(round(female_temp_std, 4), "std for female")

130 observations
65 65
98.1046153846154 98.39384615384613
-0.1446153846153777 for male
0.6934 std for male
0.14461538461534929 for female
0.7377 std for female


In [115]:
print("The sample mean is ", temp_mean)
print("sample size (130) is well above 30, so z closely approximates t;",
      "t is the exact choice because the population std dev is unknown")
test_samples = np.random.normal(temp_mean,temp_std,size=10000)

print("use one sample test since we are comparing to a known value",
      "and there's no control group ",
      "and/or testing group and to maintain independence we can only use",
      "each measurement once")
print("null hypothesis:",
      temp_mean, "sample mean vs. 98.6 degree mean")
print("sample std dev", temp_std,
      "# of observations", len(df))

The sample mean is  98.24923076923078
sample size (130) is well above 30, so z closely approximates t; t is the exact choice because the population std dev is unknown
use one sample test since we are comparing to a known value and there's no control group  and/or testing group and to maintain independence we can only use each measurement once
null hypothesis: 98.24923076923078 sample mean vs. 98.6 degree mean
sample std dev 0.7303577789050377 # of observations 130


In [116]:
a = np.array(df.temperature)

# One-sample t-test of H0: population mean = 98.6
t = stats.ttest_1samp(a, 98.6)
print(t, " t-statistic")

# One-sample z-test: the same standardized distance, judged against the
# standard normal. The sample std (ddof=1) stands in for the unknown population std.
z = (np.mean(a) - 98.6) / (np.std(a, ddof=1) / np.sqrt(len(a)))
p_z = 2 * stats.norm.sf(abs(z))
print("z stat = %.2f, p-value = %.1e" % (z, p_z))


Ttest_1sampResult(statistic=-5.4548232923645195, pvalue=2.4106320415561276e-07)  t-statistic
z stat = -5.45, p-value = 4.9e-08


In [117]:
# 4. Small sample of size 10: every 13th observation
resample_df = a[::13]

# One-sample t-test on the small sample
t = stats.ttest_1samp(resample_df, 98.6)
print(t)

# One-sample z-test on the same sample, using the sample std (ddof=1)
z = (np.mean(resample_df) - 98.6) / (np.std(resample_df, ddof=1) / np.sqrt(len(resample_df)))
p_z = 2 * stats.norm.sf(abs(z))
print("z stat = %.2f, p-value = %.2f" % (z, p_z))


Ttest_1sampResult(statistic=-1.5401210485736909, pvalue=0.15791725790106409)
z stat = -1.54, p-value = 0.12
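
The two tests diverge on a small sample because they use different reference distributions: the t distribution has heavier tails when the degrees of freedom are few. The minimal sketch below compares two-sided 5% critical values. The sample sizes (10, 30, 130) and the name `t_crits` are illustrative choices, not part of the original analysis.

```python
from scipy import stats

alpha = 0.05

# Critical value of the standard normal (does not depend on sample size)
z_crit = stats.norm.ppf(1 - alpha / 2)

# Critical values of the t distribution for several sample sizes
t_crits = {}
for n in (10, 30, 130):
    t_crits[n] = stats.t.ppf(1 - alpha / 2, df=n - 1)
    print(n, round(t_crits[n], 3), round(z_crit, 3))
```

The t critical value is largest at n = 10 and approaches the normal value of about 1.96 as n grows, so with a small sample the z test rejects too readily.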


In [118]:
# 5. Standard error of the mean and empirical percentiles
print(stats.sem(a))
percentiles=np.array([2.5,25,50,75,97.5])

# Percentiles of the full sample
ptiles_a=np.percentile(a,percentiles)

print(ptiles_a)
# Same summaries for the 10-observation subsample
print(stats.sem(resample_df))
ptiles_resample_df=np.percentile(resample_df,percentiles)
print(ptiles_resample_df)
print("below about 96.7 or above about 99.5 would be abnormal")

0.0643044168379
[ 96.7225  97.8     98.3     98.7     99.4775]
0.331142802361
[ 96.625   97.45    97.8     98.65    99.8425]
below about 96.7 or above about 99.5 would be abnormal
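
Question 5 asks for a margin of error and a confidence interval. The minimal sketch below uses only the summary values printed earlier in this notebook (the sample mean, the standard error from `stats.sem(a)`, and n = 130). The 95% level is an assumption. This interval describes the population mean, whereas the percentiles above describe individual readings.

```python
from scipy import stats

mean = 98.24923076923078   # sample mean printed earlier in the notebook
sem = 0.0643044168379      # standard error printed by stats.sem(a)
n = 130
confidence = 0.95          # assumed confidence level

# Margin of error: critical t value times the standard error
t_crit = stats.t.ppf((1 + confidence) / 2, df=n - 1)
margin = t_crit * sem

# Confidence interval for the population mean
ci = (mean - margin, mean + margin)
print("margin of error:", margin)
print("95% CI for the mean:", ci)
```

The upper end of this interval lies below 98.6, which agrees with the one-sample test rejecting 98.6 as the population mean.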


In [120]:
a_m=np.array(sample_male.temperature)
a_f=np.array(sample_female.temperature)

def diff_of_means(data_1, data_2):
    """Difference in means of two arrays."""
    return np.mean(data_1) - np.mean(data_2)

# Observed difference in mean temperature (female minus male)
empirical_diff_means = diff_of_means(a_f, a_m)

# Two-sample t-test, since male and female readings are independent groups
t_stat, p = stats.ttest_ind(a_f, a_m)

# Print the result
print('p-value =', round(p, 3))

p-value = 0.024
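
A permutation test is another way to compare the two groups: pool the readings, reshuffle the group labels many times, and count how often the shuffled difference in means is at least as extreme as the observed one. The minimal sketch below runs on synthetic stand-in arrays (hypothetical data, not the notebook's), and `permutation_p_value` is an illustrative helper rather than a library function.

```python
import numpy as np

def diff_of_means(data_1, data_2):
    """Difference in means of two arrays."""
    return np.mean(data_1) - np.mean(data_2)

def permutation_p_value(data_1, data_2, n_perm=10000, seed=0):
    """Two-sided permutation p-value for a difference in means."""
    rng = np.random.default_rng(seed)
    observed = diff_of_means(data_1, data_2)
    pooled = np.concatenate([data_1, data_2])
    count = 0
    for _ in range(n_perm):
        # Shuffle the pooled readings and split them back into two groups
        shuffled = rng.permutation(pooled)
        perm_diff = diff_of_means(shuffled[:len(data_1)], shuffled[len(data_1):])
        if abs(perm_diff) >= abs(observed):
            count += 1
    return count / n_perm

# Hypothetical stand-in groups (assumption: not the real male/female readings)
rng = np.random.default_rng(1)
group_f = rng.normal(98.4, 0.74, size=65)
group_m = rng.normal(98.1, 0.70, size=65)

p = permutation_p_value(group_f, group_m)
print("permutation p-value:", p)
```

With the notebook's data, the call would be `permutation_p_value(a_f, a_m)`. The result varies slightly with the seed and the number of permutations.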
