# What is the True Normal Human Body Temperature? 

#### Background

For more than 120 years, the mean normal body temperature was held to be 37$^{\circ}$C (98.6$^{\circ}$F), the value Carl Wunderlich first conceptualized and reported in a famous 1868 book. But is this value statistically correct?

<div class="span5 alert alert-info">
<h3>Exercises</h3>

<p>In this exercise, you will analyze a dataset of human body temperatures and employ the concepts of hypothesis testing, confidence intervals, and statistical significance.</p>

<p>Answer the following questions <b>in this notebook below and submit to your Github account</b>.</p> 

<ol>
<li>  Is the distribution of body temperatures normal? 
    <ul>
    <li> Although this is not a requirement for CLT to hold (read CLT carefully), it gives us some peace of mind that the population may also be normally distributed if we assume that this sample is representative of the population.
    </ul>
<li>  Is the sample size large? Are the observations independent?
    <ul>
    <li> Remember that this is a condition for the CLT, and hence the statistical tests we are using, to apply.
    </ul>
<li>  Is the true population mean really 98.6 degrees F?
    <ul>
    <li> Would you use a one-sample or two-sample test? Why?
    <li> In this situation, is it appropriate to use the $t$ or $z$ statistic? 
    <li> Now try using the other test. How is the result different? Why?
    </ul>
<li>  At what temperature should we consider someone's temperature to be "abnormal"?
    <ul>
    <li> Start by computing the margin of error and confidence interval.
    </ul>
<li>  Is there a significant difference between males and females in normal temperature?
    <ul>
    <li> What test did you use and why?
    <li> Write a story with your conclusion in the context of the original problem.
    </ul>
</ol>

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet

#### Resources

+ Information and data sources: http://www.amstat.org/publications/jse/datasets/normtemp.txt, http://www.amstat.org/publications/jse/jse_data_archive.htm
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet

****
</div>

In [3]:
import pandas as pd

# Load the body temperature dataset into a dataframe
df = pd.read_csv('data/human_body_temperature.csv')

In [65]:
# Your work here.
# Import the packages needed for plotting, numerics, and statistical tests
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as stats

# df.head() returns the first five rows by default, which also shows the column names
df.head()

Unnamed: 0,temperature,gender,heart_rate
0,99.3,F,68.0
1,98.4,F,81.0
2,97.8,M,73.0
3,99.2,F,66.0
4,98.0,F,73.0


t = df.temperature
binwidth = 0.2
# Histogram scaled to a density (total area 1) so it is comparable to a pdf;
# density=True replaces the removed normed argument
n, bins, patches = plt.hist(t, bins=np.arange(min(t), max(t) + binwidth, binwidth), density=True)
mu = df.temperature.mean()
var = df.temperature.var()
sigma = np.sqrt(var)
# Overlay the normal pdf with the sample mean and standard deviation
plt.plot(bins, stats.norm.pdf(bins, loc=mu, scale=sigma), color='r')



In [66]:
print ("Sample mean is: ", df.temperature.mean())
print ("Sample variance is: ", df.temperature.var())

Sample mean is:  98.24923076923078
Sample variance is:  0.5375575432319613


In [67]:
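# Q1: Is the distribution of human body temperature normal?
# Null hypothesis H0: body temperature follows a normal distribution
# We assume each temperature is an independent observation and apply a chi-square goodness-of-fit test
# Caveat: n holds histogram densities rather than counts, expected holds bin probabilities,
# and the sparse tail bins have tiny expected values, so the statistic below is not a reliable chi-square value
# cdf = cumulative distribution function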
expected = (stats.norm.cdf((bins[1:]-mu)/sigma) - stats.norm.cdf((bins[:-1]-mu)/sigma))
chi = sum((n-expected)**2/expected)
degree = len(bins)-1
print ("the value of chi is:", chi)
print ("the degree of the freedom is:", degree)


the value of chi is: 121436377.258
the degree of the freedom is: 23


In [68]:
stats.chisquare(n, expected)

Power_divergenceResult(statistic=121436377.25844333, pvalue=0.0)
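
A chi-square goodness-of-fit test is usually computed on bin counts, with expected counts that are not too small and with the degrees of freedom reduced by the number of estimated parameters. The following is a minimal sketch under those assumptions, not the notebook's own analysis: the `sample` array is synthetic data standing in for `df.temperature`, and the choice of ten equal-probability bins is illustrative.

```python
import numpy as np
from scipy import stats

# Hypothetical stand-in for df.temperature (synthetic, not the real dataset)
rng = np.random.default_rng(0)
sample = rng.normal(loc=98.25, scale=0.73, size=130)

# Fit the normal distribution to the sample
mu_hat = sample.mean()
sigma_hat = sample.std(ddof=1)

# Ten equal-probability bins under the fitted normal, so every bin expects 13 observations
n_bins = 10
inner_edges = stats.norm.ppf(np.arange(1, n_bins) / n_bins, loc=mu_hat, scale=sigma_hat)
bin_index = np.searchsorted(inner_edges, sample)
observed = np.bincount(bin_index, minlength=n_bins)
expected = np.full(n_bins, sample.size / n_bins)

# ddof=2 accounts for the two parameters (mean and standard deviation) estimated from the data
chi2_stat, p_value = stats.chisquare(observed, expected, ddof=2)
dof = n_bins - 1 - 2

print("chi-square statistic:", chi2_stat)
print("degrees of freedom:", dof)
print("p-value:", p_value)
```

Working with counts keeps observed and expected values on the same scale, and equal-probability bins avoid the near-zero expected values that occur in the tails of fixed-width bins.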

In [69]:
stats.normaltest(df.temperature)

NormaltestResult(statistic=2.7038014333192031, pvalue=0.2587479863488254)

In [70]:
# The p-value of the normality test (about 0.26) is larger than the 0.05 significance level
# Thus, we cannot reject the null hypothesis that the temperatures are normally distributed
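
The decision rule for a normality test can be made explicit and complemented with a normal probability (Q-Q) fit. The sketch below assumes synthetic data in place of `df.temperature`; the variable names and the 0.05 threshold are illustrative choices.

```python
import numpy as np
from scipy import stats

# Hypothetical stand-in for df.temperature (synthetic, not the real dataset)
rng = np.random.default_rng(7)
sample = rng.normal(loc=98.25, scale=0.73, size=130)

# D'Agostino-Pearson test: H0 is that the sample comes from a normal distribution
k2, p_value = stats.normaltest(sample)
alpha = 0.05
reject_normality = p_value < alpha

# Normal probability plot data: r close to 1 means the ordered values track normal quantiles
(theoretical_q, ordered_values), (slope, intercept, r) = stats.probplot(sample, dist="norm")

print("p-value:", p_value, "reject H0:", reject_normality)
print("probability-plot correlation r:", r)
```

A large p-value does not prove normality; it only means the data give no strong evidence against it.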

In [71]:
# Q2: Is the sample large enough?
# df.shape[0] returns the number of rows in the dataframe
print ("the sample size is :", df.shape[0])

the sample size is : 130


In [72]:
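# A common rule of thumb is that the CLT (central limit theorem) applies when the sample size
# is larger than 30, so a sample of 130 is large enough.
# Assuming each reading comes from a different person, the observations can be treated as independent.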
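
The rule of thumb can be checked by simulation. The sketch below is an illustration under assumptions, not part of the original analysis: it draws repeated samples of size 130 from a strongly skewed exponential distribution (a hypothetical population) and looks at how the sample means behave.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 130            # same size as the temperature sample
n_repeats = 5000   # number of simulated samples (illustrative choice)

# Exponential population with mean 1 and standard deviation 1; its skewness is 2
draws = rng.exponential(scale=1.0, size=(n_repeats, n))
sample_means = draws.mean(axis=1)

def skewness(x):
    """Moment-based sample skewness."""
    centered = x - x.mean()
    return (centered ** 3).mean() / (centered ** 2).mean() ** 1.5

# The CLT predicts a standard error of sigma / sqrt(n) for the sample mean
theoretical_se = 1.0 / np.sqrt(n)

print("skewness of raw draws:", skewness(draws.ravel()))
print("skewness of sample means:", skewness(sample_means))
print("observed vs. theoretical SE:", sample_means.std(ddof=1), theoretical_se)
```

Even though the population is far from normal, the distribution of sample means is nearly symmetric at this sample size and its spread matches the predicted standard error.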

In [73]:
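# Q3: Is the true population mean 98.6 degrees F?
# This is a one-sample test, because a single sample mean is compared with a fixed reference value
# The null hypothesis is H0: mu = 98.6 against H1: mu != 98.6; the significance level is set to 5%
# With n = 130 the z statistic is a reasonable approximation; because the population standard
# deviation is unknown, the t statistic is the stricter choice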

In [74]:
mu = df.temperature.mean()
s = df.temperature.std()
n = len(df.temperature)
s_mean= s/(np.sqrt(n))

In [75]:
print ("standard error of the mean:",s_mean)

standard error of the mean: 0.0643044168379


In [76]:
# calculate the Z value
z = (mu-98.6)/s_mean
print ("the Z value is:", z)

the Z value is: -5.45482329236


In [77]:
# calculate the corresponding two-sided p value
p = stats.norm.sf(abs(z))*2
print("the p value is equal to:",p)

the p value is equal to: 4.90215701411e-08


In [78]:
# The p value is much smaller than the significance level 0.05, so we reject the null hypothesis that the population mean is 98.6 degrees F.
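
The exercise also asks what happens with the other test. A minimal sketch of the comparison, assuming only the summary statistics printed earlier in this notebook (sample mean, sample variance, and n = 130) rather than the raw data:

```python
import math
from scipy import stats

# Summary statistics copied from the notebook output above
sample_mean = 98.24923076923078
sample_var = 0.5375575432319613
n = 130
mu0 = 98.6  # hypothesized population mean

# The test statistic has the same value for the z and t tests
se = math.sqrt(sample_var / n)
statistic = (sample_mean - mu0) / se

# Two-sided p-values: standard normal versus Student's t with n - 1 degrees of freedom
p_z = 2 * stats.norm.sf(abs(statistic))
p_t = 2 * stats.t.sf(abs(statistic), df=n - 1)

print("test statistic:", statistic)
print("z-test p-value:", p_z)
print("t-test p-value:", p_t)
```

The t distribution has heavier tails, so its p-value is somewhat larger than the z-based one, but with 129 degrees of freedom both remain far below 0.05 and the conclusion does not change.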

In [79]:
# Q4: At what temperature should we consider a person's temperature abnormal?
# Start with the margin of error and the 95% confidence interval for the mean
conf = 0.95
# Two-sided interval: (1 - conf)/2 goes in each tail, so use the 0.975 quantile
conf_z = 1 - (1 - conf) / 2
conf_mean = stats.norm.ppf(conf_z) * s_mean
CI = (mu - conf_mean, mu + conf_mean)
print("The 95%% confidence interval for the mean is: (%.3f, %.3f)." % CI)
print("A population mean outside this range is implausible; individual readings vary more widely.")

The 95% confidence interval for the mean is: (98.123, 98.375).
A population mean outside this range is implausible; individual readings vary more widely.
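
A confidence interval for the mean shrinks as the sample grows, so it describes where the population mean lies rather than how much individual readings spread. The sketch below contrasts the two, assuming the summary statistics printed earlier in this notebook and an approximately normal population; it uses the standard library's `NormalDist` in place of `stats.norm.ppf`, and the "typical range" is an approximation, not a clinical threshold.

```python
import math
from statistics import NormalDist

# Summary statistics copied from the notebook output above
sample_mean = 98.24923076923078
sample_var = 0.5375575432319613
n = 130
conf = 0.95

# Two-sided critical value: the 0.975 quantile of the standard normal
z_crit = NormalDist().inv_cdf(1 - (1 - conf) / 2)
sample_sd = math.sqrt(sample_var)

# Margin of error and confidence interval for the population mean
margin_mean = z_crit * sample_sd / math.sqrt(n)
ci_mean = (sample_mean - margin_mean, sample_mean + margin_mean)

# Approximate range covering 95% of individual readings, assuming normality
margin_single = z_crit * sample_sd
range_single = (sample_mean - margin_single, sample_mean + margin_single)

print("95%% CI for the mean: (%.3f, %.3f)" % ci_mean)
print("Approximate 95%% range for one reading: (%.2f, %.2f)" % range_single)
```

Under these assumptions, a single reading would be flagged as unusual only when it falls outside the wider second range.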


In [80]:
#Q5 Is there a significant difference between males and females in normal temperature?
mean_female = df[df.gender == "F"].temperature.mean()
mean_male = df[df.gender =="M"].temperature.mean()
var_female = df[df.gender == "F"].temperature.var()
var_male = df[df.gender =="M"].temperature.var()
n_female =len(df[df.gender == "F"].temperature)
n_male =len(df[df.gender == "M"].temperature)
SE = np.sqrt(var_female/n_female + var_male/n_male)
print("The standard error for the difference between male and female temperatures is:", SE)

The standard error for the difference between male and female temperatures is: 0.12655395042


In [81]:
z = (mean_female-mean_male)/SE
print ("The Z value is:",z)

The Z value is: 2.28543453817


In [82]:
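# Two-sided p value: double the upper-tail probability (not the square of it)
p = stats.norm.sf(abs(z))*2
print ("The p value for the temperature difference between males and females is: %.4f" % p)

The p value for the temperature difference between males and females is: 0.0223


In [83]:
# A two-sample z test is used because the means of two independent groups are compared
# The p value is smaller than the 5% significance level, so we reject the null hypothesis of equal means
# There is a statistically significant difference between male and female temperatures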
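
The same comparison can be run as Welch's two-sample t test, which uses the same statistic as the z calculation above but refers it to a t distribution. This is a sketch on synthetic data: the `female` and `male` arrays, their group sizes, and their parameters are hypothetical stand-ins for the two gender subsets of `df.temperature`.

```python
import numpy as np
from scipy import stats

# Hypothetical stand-ins for the female and male temperature subsets
rng = np.random.default_rng(1)
female = rng.normal(loc=98.4, scale=0.74, size=65)
male = rng.normal(loc=98.1, scale=0.70, size=65)

# Manual two-sample z statistic with unpooled variances
se = np.sqrt(female.var(ddof=1) / female.size + male.var(ddof=1) / male.size)
z = (female.mean() - male.mean()) / se
p_z = 2 * stats.norm.sf(abs(z))

# Welch's t test: same statistic, p-value from a t distribution
t_stat, p_t = stats.ttest_ind(female, male, equal_var=False)

print("z statistic:", z, "two-sided p:", p_z)
print("Welch t statistic:", t_stat, "two-sided p:", p_t)
```

With groups of this size the two p-values are close; the t-based value is slightly larger because it accounts for the variances being estimated from the data.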