# What is the True Normal Human Body Temperature? 

#### Background

The mean normal body temperature was held to be 37$^{\circ}$C or 98.6$^{\circ}$F for more than 120 years since it was first conceptualized and reported by Carl Wunderlich in a famous 1868 book. But, is this value statistically correct?

<div class="span5 alert alert-info">
<h3>Exercises</h3>

<p>In this exercise, you will analyze a dataset of human body temperatures and employ the concepts of hypothesis testing, confidence intervals, and statistical significance.</p>

<p>Answer the following questions <b>in this notebook below and submit to your Github account</b>.</p> 

<ol>
<li>  Is the distribution of body temperatures normal? 
    <ul>
    <li> Although this is not a requirement for CLT to hold (read CLT carefully), it gives us some peace of mind that the population may also be normally distributed if we assume that this sample is representative of the population.
    </ul>
<li>  Is the sample size large? Are the observations independent?
    <ul>
    <li> Remember that this is a condition for the CLT, and hence the statistical tests we are using, to apply.
    </ul>
<li>  Is the true population mean really 98.6 degrees F?
    <ul>
    <li> Would you use a one-sample or two-sample test? Why?
    <li> In this situation, is it appropriate to use the $t$ or $z$ statistic? 
    <li> Now try using the other test. How is the result different? Why?
    </ul>
<li>  At what temperature should we consider someone's temperature to be "abnormal"?
    <ul>
    <li> Start by computing the margin of error and confidence interval.
    </ul>
<li>  Is there a significant difference between males and females in normal temperature?
    <ul>
    <li> What test did you use and why?
    <li> Write a story with your conclusion in the context of the original problem.
    </ul>
</ol>

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet

#### Resources

+ Information and data sources: http://www.amstat.org/publications/jse/datasets/normtemp.txt, http://www.amstat.org/publications/jse/jse_data_archive.htm
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet

****
</div>

In [24]:
import pandas as pd
import matplotlib.pyplot as plt
import math
%matplotlib inline

df = pd.read_csv('data/human_body_temperature.csv')

In [4]:
print ("Sample size - " + str(len(df)))

Sample size - 130


### 1. Is the distribution of body temperatures normal?

In [9]:
data = df['temperature']

In [10]:
#importing stats
import scipy.stats as stats

According to the documentation, `scipy.stats.normaltest` takes normality as the null hypothesis. It is based on the D'Agostino-Pearson omnibus test, which combines the sample skewness and kurtosis into a single statistic that approximately follows a chi-squared distribution under the null hypothesis. The output contains two values -

1. the test statistic
2. the p-value for that statistic - if it is very low, we have to reject our hypothesis that the distribution is normal

In [11]:
stats.normaltest(data)

NormaltestResult(statistic=2.7038014333192031, pvalue=0.2587479863488254)

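The p-value is 0.2587. This means that if the data really came from a normal distribution, we would see a test statistic at least this large about 25.87% of the time. That is well above the usual 5% threshold, so let's not reject our assumption of normality.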
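
To get an intuition for what this test looks at, the sketch below computes sample skewness and excess kurtosis by hand using only the standard library. The helper names `central_moment`, `skewness` and `excess_kurtosis` and the simulated sample are illustrative assumptions, not part of `scipy` or of the dataset. For normally distributed data both quantities should be close to 0.

```python
import random

def central_moment(values, k):
    """k-th central moment of a sequence (population form, divides by n)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** k for v in values) / n

def skewness(values):
    """Sample skewness: third central moment scaled by variance^(3/2)."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Excess kurtosis: fourth central moment / variance^2, minus 3."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

# Hypothetical stand-in for the temperature column: 130 simulated normal draws.
rng = random.Random(0)
simulated = [rng.gauss(98.25, 0.73) for _ in range(130)]

print("skewness:", round(skewness(simulated), 3))
print("excess kurtosis:", round(excess_kurtosis(simulated), 3))
```

With only 130 observations, both values fluctuate noticeably from sample to sample, which is why a formal test is used rather than eyeballing them.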

**Answer - Yes**

### 2. Is the sample size large? Are the observations independent? 

The sample size is 130, which is reasonably large.

In [22]:
df.corr()['temperature']['heart_rate']

0.25365640272076428

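Also, the correlation between temperature and heart rate is weak (about 0.25). Note that this measures the relationship between two variables, not whether one person's measurement depends on another's. Independence of observations comes from how the data were collected: assuming each row is a different, randomly sampled person, and given that 130 people is far less than 10% of the population, let's assume the observations to be independent.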

### 3. Is the true population mean really 98.6 degrees F?
1. Would you use a one-sample or two-sample test? Why?
2. In this situation, is it appropriate to use the *t* or *z* statistic? 
3. Now try using the other test. How is the result different? Why?

I would use a one-sample test here, as we are comparing the mean of a single sample against a fixed reference value (98.6). Two-sample tests are used when we want to compare, say, means across two different populations such as male and female.

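We have a sample size of 130, which is well above the common rule-of-thumb threshold of 30. So, we can comfortably assume a normal distribution for the sampling distribution of the mean and use the z statistic here. Strictly speaking, the population standard deviation is unknown, which is the case the t statistic is designed for, but with this many observations the two give nearly identical results.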
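
The rule of thumb can be checked with a small simulation. This sketch draws repeated samples of size 130 from a deliberately skewed (exponential) distribution, which is an assumption chosen for illustration and not the temperature data. It shows that the sample means still cluster around the true mean with a spread close to sigma / sqrt(n).

```python
import math
import random
import statistics

rng = random.Random(42)
n = 130            # sample size used in this notebook
replicates = 2000  # number of simulated samples (arbitrary choice)

# Exponential with rate 1 has mean 1 and standard deviation 1, and is strongly skewed.
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(replicates)
]

observed_se = statistics.stdev(sample_means)
theoretical_se = 1.0 / math.sqrt(n)

print("mean of sample means:", round(statistics.fmean(sample_means), 3))
print("observed standard error:", round(observed_se, 4))
print("theoretical standard error:", round(theoretical_se, 4))
```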

**One-Sample Z Test**

Null hypothesis - The true population mean is 98.6 degrees F

Which makes -

Mean for the sampling distribution - 98.6

Standard deviation for the sampling distribution - std for sample / square root of sample size
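
Written out with the rounded summary statistics of this dataset, the statistic looks as follows. The symbols are notation introduced here for illustration: $\bar{x} \approx 98.249$ is the sample mean, $s \approx 0.733$ the sample standard deviation, $n = 130$ the sample size, and $\mu_0 = 98.6$ the hypothesized mean.

$$
Z = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} \approx \frac{98.249 - 98.6}{0.733 / \sqrt{130}} \approx \frac{-0.351}{0.0643} \approx -5.45
$$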

In [58]:
print ("Mean - 98.6")
mean = 98.6
sample_std = data.std()
print ("sample std - {}".format(sample_std))
# Standard error: divide by the square root of the actual sample size (130)
std = sample_std / math.sqrt(len(data))
print ("Std - {:.4f}".format(std))

print ("Let's find out the sample mean first.")
sample_mean = data.mean()
print ("Sample mean - " + str(sample_mean))

print ("Now we can calculate Z-statistic as follows = (sample mean - sampling distribution mean)")
print ("                                                / sampling distribution standard deviation")
Z = (sample_mean - mean)/std
print ("Z statistic - {:.4f}".format(Z))

Mean - 98.6
sample std - 0.7331831580389454
Std - 0.0643
Let's find out the sample mean first.
Sample mean - 98.24923076923078
Now we can calculate Z-statistic as follows = (sample mean - sampling distribution mean)
                                                / sampling distribution standard deviation
Z statistic - -5.4548


In [42]:
# Using statsmodels
from statsmodels.stats.weightstats import ztest, ttest_ind
ztest(data, x2=None, value=98.6, alternative='two-sided', usevar='pooled', ddof=1.0)

(-5.4548232923645195, 4.9021570141012155e-08)

As we can see, the p-value is about 4.9e-08: if the true population mean were 98.6, a sample mean this far from it would be extremely unlikely. So we have to reject our hypothesis. The alternate hypothesis, that the true population mean is not 98.6, should be preferred.
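
For reference, the two-sided p-value of a z-test can be reproduced from the statistic alone. The sketch below uses only the standard library; the function name `two_sided_p_from_z` is an illustrative assumption, and the input is the z statistic reported by `ztest` above.

```python
import math

def two_sided_p_from_z(z):
    """Two-sided p-value for a standard normal statistic.

    P(|Z| >= |z|) = erfc(|z| / sqrt(2)) for Z ~ N(0, 1).
    """
    return math.erfc(abs(z) / math.sqrt(2))

z_stat = -5.4548232923645195  # statistic reported by ztest in this notebook
p_value = two_sided_p_from_z(z_stat)
print("p-value: {:.3e}".format(p_value))
```

Using `math.erfc` rather than `1 - cdf` avoids losing precision when the tail area is this small.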

**Let's try using the t-test**

In [52]:
stats.ttest_1samp(data,98.6)

Ttest_1sampResult(statistic=-5.4548232923645195, pvalue=2.4106320415561276e-07)

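The t statistic is identical to the z statistic, because both are computed from the same sample mean and standard error. The p-value is slightly larger (about 2.4e-07 versus 4.9e-08) because the t distribution with 129 degrees of freedom has heavier tails than the normal distribution. As expected, it is still very low, so the conclusion does not change.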

### 4. At what temperature should we consider someone's temperature to be "abnormal"?

*Start by computing the margin of error and confidence interval.*

Assumptions -

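1. Sample mean is equal to the true population mean (98.25)
2. Sample standard deviation is equal to the true population standard deviation (0.73)
3. A temperature in the lowest 5% or the highest 5% of the distribution counts as 'abnormal'

A tail probability of 5% corresponds to a Z-value of approximately 1.64. So, if the difference between a temperature value and the mean is greater than 1.64*standard_deviation, we will consider it 'abnormal'. Note that with 5% in each tail, about 10% of people fall outside this range; a cutoff of 5% in total would use a Z-value of about 1.96 instead.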
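
Z-values like this can be looked up with the standard library's `statistics.NormalDist` (Python 3.8+). This is a small sketch and the variable names are illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Z-value that leaves 5% in the upper tail (so 10% in both tails combined).
z_five_percent_each_tail = standard_normal.inv_cdf(0.95)

# Z-value that leaves 2.5% in each tail (so 5% in both tails combined).
z_five_percent_total = standard_normal.inv_cdf(0.975)

print(round(z_five_percent_each_tail, 3))  # 1.645
print(round(z_five_percent_total, 3))      # 1.96
```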

Let's get to the calculations

In [57]:
zintostd = 1.64 * data.std()

lowthreshold = round(data.mean() - zintostd,2)
highthreshold = round(data.mean() + zintostd,2)

print ("The body temperatures lower than {} degree F and higher than {} degree F can be considered as abnormal.".format(lowthreshold,highthreshold))

The body temperatures lower than 97.05 degree F and higher than 99.45 degree F can be considered as abnormal.
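
The exercise hint also asks for the margin of error and confidence interval. Those describe the uncertainty in the estimated mean, which is different from the range of individual temperatures computed above. A minimal sketch, assuming a 95% confidence level and plugging in the summary statistics printed earlier in this notebook (the variable names are illustrative):

```python
import math
from statistics import NormalDist

n = 130                          # sample size printed earlier
sample_mean = 98.24923076923078  # sample mean printed earlier
sample_std = 0.7331831580389454  # sample standard deviation printed earlier

confidence = 0.95  # assumed confidence level
z_critical = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

standard_error = sample_std / math.sqrt(n)
margin_of_error = z_critical * standard_error
ci_low = sample_mean - margin_of_error
ci_high = sample_mean + margin_of_error

print("margin of error: {:.3f}".format(margin_of_error))
print("95% CI for the mean: ({:.2f}, {:.2f})".format(ci_low, ci_high))
```

Under these assumptions the interval is roughly 98.12 to 98.38 degrees F, which does not contain 98.6.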


### 5. Is there a significant difference between males and females in normal temperature?

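We will use a two-sample Z-test for this purpose, because we are comparing the means of two independent groups (males and females) and each group is reasonably large. We'll also try the T-test later.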

*Null Hypothesis* - There is no difference between males and females in mean body temperatures

*Alternate Hypothesis* - There is a difference between males and females in mean body temperatures

Threshold = 5%

In [67]:
df.head()
male = df['temperature'][df['gender'] == 'M']
female = df['temperature'][df['gender'] == 'F']

#Z-Test
z = ztest(male,female, value=0, alternative='two-sided', usevar='pooled')
probability = round(z[1]*100,2)
print("The probability value for Z-test - {} %. Hence, we can reject our null hypothesis.".format(probability))

The probability value for Z-test - 2.23 %. Hence, we can reject our null hypothesis.
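
To make explicit what a pooled two-sample z-test computes, here is a standard-library sketch. The function name `two_sample_z_test` and the two tiny samples are made up for illustration and are not the temperature data. It is intended to mirror the pooled-variance calculation, not to replace the `statsmodels` call used above.

```python
import math
import statistics

def two_sample_z_test(a, b):
    """Two-sided z-test for equal means using a pooled variance estimate."""
    n_a, n_b = len(a), len(b)
    var_a = statistics.variance(a)  # sample variance (ddof=1)
    var_b = statistics.variance(b)
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    z = (statistics.fmean(a) - statistics.fmean(b)) / standard_error
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return z, p_value

# Made-up samples, only to show the call.
group_a = [1, 2, 3, 4, 5]
group_b = [2, 3, 4, 5, 6]
z_stat, p_val = two_sample_z_test(group_a, group_b)
print("z = {:.3f}, p = {:.3f}".format(z_stat, p_val))
```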


In [71]:
#T-Test
t = ttest_ind(male,female)
# ttest_ind returns (t statistic, p-value, degrees of freedom); use the t-test p-value here
probability = round(t[1]*100,2)
print("The probability value for T-test - {} %. Hence, we can reject our null hypothesis.".format(probability))

The probability value for T-test - 2.39 %. Hence, we can reject our null hypothesis.


We have rejected, at the 5% level, the hypothesis that men and women have the same mean body temperature. A single true mean temperature for the total population would therefore contradict this result. Returning to the original problem, the sample suggests that 98.6 degrees F is too high as a population mean, and it may be more appropriate to define two reference mean body temperatures, one for men and one for women.