# What's Normal? -- Temperature, Gender, and Heart Rate

## The Data

- We will work with a dataset on body temperature, gender, and heart rate.
- We'll try to understand concepts such as
    - true means, 
    - confidence intervals, 
    - one sample t test, 
    - independent samples t test,
    - normality check, 
    - homogeneity of variance check (Levene's test),
    - correlation,
    - regression.


- The data were derived from an article in the Journal of the American Medical Association entitled "A Critical Appraisal of 98.6 Degrees F, the Upper Limit of the Normal Body Temperature, and Other Legacies of Carl Reinhold August Wunderlich" (Mackowiak, Wasserman, and Levine 1992).
- Source: http://jse.amstat.org/v4n2/datasets.shoemaker.html

## Data Column Reference

<style type="text/css">
.tg  {border-collapse:collapse;border-spacing:0;}
.tg td{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
  overflow:hidden;padding:10px 5px;word-break:normal;}
.tg th{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
  font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;}
.tg .tg-0lax{text-align:left;vertical-align:top}
.tg .tg-0pky{border-color:inherit;text-align:left;vertical-align:top}
</style>
<table class="tg">
<thead>
  <tr>
    <th class="tg-0lax">Variable</th>
    <th class="tg-0lax">Type</th>
    <th class="tg-0lax">Explanation</th>
  </tr>
</thead>
<tbody>
  <tr>
    <td class="tg-0pky">temperature</td>
    <td class="tg-0pky">Numeric</td>
    <td class="tg-0pky">Body Temperature (degrees Fahrenheit)</td>
  </tr>
  <tr>
    <td class="tg-0pky">gender</td>
    <td class="tg-0pky">Categorical</td>
    <td class="tg-0pky">Gender (1=Male, 2=Female)</td>
  </tr>
  <tr>
    <td class="tg-0lax">heart_rate</td>
    <td class="tg-0lax">Numeric</td>
    <td class="tg-0lax">Heart Rate (beats per minute)</td>
  </tr>
</tbody>
</table>

## Data Preparation

⭐ Import **pandas**, **scipy.stats**, **seaborn**, and **matplotlib.pyplot** libraries

In [None]:
import pandas as pd
from scipy import stats
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

⭐Ignore warnings

In [None]:
import warnings
warnings.filterwarnings('ignore')

⭐Run the following code to read in the "normtemp.dat.txt" file.

In [None]:
# sep=r"\s+" splits on any run of whitespace (delim_whitespace is deprecated in recent pandas)
df = pd.read_csv('http://jse.amstat.org/datasets/normtemp.dat.txt', sep=r"\s+", names=["temperature", "gender", "heart_rate"])

In [None]:
df.head()

In [None]:
df.info()

⭐Replace the gender levels [1, 2] with ["male", "female"]

In [None]:
df["gender"] = df["gender"].replace(to_replace=[1,2], value=["male", "female"])

In [None]:
df.head()

## Task-1. Is the Population Mean *Body Temperature* 98.6 Degrees F?

⭐What is the mean for body temperature?

⭐What is the standard deviation for body temperature?

⭐What is the standard error of the mean for body temperature?
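As a minimal sketch of these three quantities, the snippet below uses a handful of made-up readings (the array `temps` is illustrative and not taken from the dataset). With the notebook's data, the same calculations would apply to `df["temperature"]`.

```python
import numpy as np
from scipy import stats

# Hypothetical readings for illustration only; not values from normtemp.dat.txt
temps = np.array([98.4, 98.6, 97.9, 98.2, 98.8, 98.0, 98.3, 98.7])

mean = temps.mean()
sd = temps.std(ddof=1)            # sample standard deviation (n - 1 in the denominator)
sem = sd / np.sqrt(len(temps))    # standard error of the mean: s / sqrt(n)

print(f"mean={mean:.3f}, sd={sd:.3f}, sem={sem:.3f}")

# scipy.stats.sem also uses ddof=1 by default, so it should match the manual value
print(np.isclose(sem, stats.sem(temps)))
```

Note that NumPy's `std` defaults to `ddof=0` while pandas' `Series.std` defaults to `ddof=1`, so the explicit `ddof=1` above keeps the two libraries consistent.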

⭐Plot the distribution of body temperature. You can either use *Pandas* or *Seaborn*.

**KEY NOTE:**

The objective of statistics is often <u>*to make inferences about unknown population parameters*</u> based on information contained in sample data.

These inferences are phrased in one of two ways: 

 - as estimates of the respective parameters or 
 - as tests of hypotheses about their values.

**Statistical inference** covers how to estimate population attributes from sample data and how to test statistical hypotheses.

### Confidence Interval using the t Distribution

**Key Notes about Confidence Intervals**

💡A point estimate is a single number. 

💡A confidence interval, naturally, is an interval.

💡Confidence intervals are the typical way to present estimates as an interval range.

💡For the t-based interval used here, the point estimate is located exactly in the middle of the confidence interval.

💡However, confidence intervals provide much more information and are preferred when making inferences.  

💡The more data you have, the less variable a sample estimate will be.

💡The lower the level of confidence you can tolerate, the narrower the confidence interval will be.
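The last two notes can be checked numerically. The sketch below is only an illustration: the helper `margin_of_error` is a hypothetical name, and the standard deviation of 0.73 and the sample sizes are assumed values held fixed so that only one factor changes at a time.

```python
import numpy as np
from scipy import stats

def margin_of_error(sd, n, level):
    """Half-width of a t-based confidence interval for a mean."""
    t_star = stats.t.ppf(1 - (1 - level) / 2, df=n - 1)
    return t_star * sd / np.sqrt(n)

sd = 0.73  # assumed sample standard deviation, held fixed for illustration

# Effect of the confidence level at a fixed sample size
by_level = {level: margin_of_error(sd, 130, level) for level in (0.90, 0.95, 0.99)}
# Effect of the sample size at a fixed 95% confidence level
by_n = {n: margin_of_error(sd, n, 0.95) for n in (30, 130, 500)}

for level, m in by_level.items():
    print(f"n=130, level={level:.2f}: margin = {m:.3f}")
for n, m in by_n.items():
    print(f"n={n}, level=0.95: margin = {m:.3f}")
```

The margins should grow with the confidence level and shrink as the sample size increases.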

⭐Investigate the given task by calculating the confidence interval for this sample of 130 subjects. (Use 90%, 95% and 99% CIs)

In [None]:
#95% Confidence Interval


Our 95% confidence interval for μ is (98.12200290560803, 98.3764586328535).

In repeated sampling, approximately 95% of all intervals of the form X̄ ± t*(s/√n) include μ, the true mean body temperature.
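The formula X̄ ± t*(s/√n) can be sketched from summary statistics alone. The snippet assumes n = 130 together with the rounded sample mean (98.25) and standard deviation (0.73) quoted in this notebook, so it reproduces the interval (98.122, 98.376) only to about two decimals; with the raw data, `loc` and `scale` would come from `df["temperature"]` instead.

```python
import numpy as np
from scipy import stats

# Rounded summary statistics; treat them as approximations of the sample values
n, mean, sd = 130, 98.25, 0.73

se = sd / np.sqrt(n)                    # standard error of the mean
t_star = stats.t.ppf(0.975, df=n - 1)   # critical value for a two-sided 95% interval

manual = (mean - t_star * se, mean + t_star * se)
# Same interval from scipy; the first positional argument is the confidence level
builtin = stats.t.interval(0.95, n - 1, loc=mean, scale=se)

print(manual)
print(builtin)
```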

In [None]:
# Write a for loop to calculate the 90%, 95% and 99% CIs around the sample mean

lower = []
upper = []



### One Sample t Test

⭐**Investigate the given task by using One Sample t Test.**

**Key Notes about Hypothesis Testing (Significance Testing)**

💡Assumptions 

💡Null and Alternative Hypothesis

💡Test Statistic 

💡P-value

💡Conclusion

___🚀First, check the normality. *Use scipy.stats.shapiro*

<i>H</i><sub>0</sub>: "the variable is normally distributed"<br>
<i>H</i><sub>1</sub>: "the variable is not normally distributed"
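To see how the Shapiro-Wilk decision plays out, here is a small sketch on synthetic samples (both arrays are generated for illustration and are not the notebook's data). One sample is drawn from a normal distribution and one from a strongly skewed exponential distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)  # fixed seed so the sketch is reproducible

# Synthetic samples for illustration only (not the notebook's data)
bell_shaped = rng.normal(loc=98.25, scale=0.73, size=130)
skewed = rng.exponential(scale=1.0, size=130)

alpha = 0.05
for name, sample in [("bell_shaped", bell_shaped), ("skewed", skewed)]:
    stat, p = stats.shapiro(sample)
    decision = "fail to reject H0" if p > alpha else "reject H0"
    print(f"{name}: W={stat:.3f}, p={p:.3g} -> {decision}")
```

For the notebook's data, the call would presumably be `stats.shapiro(df["temperature"])`.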

___🚀Then, conduct the significance test. *Use scipy.stats.ttest_1samp*

The sample standard deviation is 0.73, so the standard error of the mean is 0.73/√130 ≈ 0.064. Thus the calculated t (using the sample mean of 98.25) is -5.45.
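The arithmetic behind this t statistic can be sketched directly from the summary numbers. The inputs below are the rounded values from the text (n = 130 is the sample size used in this notebook), so the result lands near -5.47 instead of exactly -5.45; the small gap is assumed to come from rounding the mean and standard deviation.

```python
import numpy as np
from scipy import stats

n, mean, sd = 130, 98.25, 0.73   # rounded summary statistics
mu0 = 98.6                       # hypothesized population mean

se = sd / np.sqrt(n)                               # standard error of the mean
t_stat = (mean - mu0) / se                         # one-sample t statistic
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)    # two-sided p-value

print(f"se={se:.3f}, t={t_stat:.2f}, p={p_value:.2e}")
```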

In [None]:
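# Run the one-sample t test against the hypothesized mean of 98.6
oneSamp = stats.ttest_1samp(df["temperature"], popmean=98.6)

# Compare p-value and alpha
alpha = 0.05

if oneSamp.pvalue < alpha:
    print("Reject the null")
else:
    print("Fail to reject the null")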

## Task-2. Is There a Significant Difference Between Males and Females in Normal Temperature?

H0: µ1 = µ2 ("the two population means are equal")

H1: µ1 ≠ µ2 ("the two population means are not equal")

⭐Show descriptives for 2 groups
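One way to get per-group descriptives is a `groupby` followed by `agg`. The frame `demo` below is a tiny made-up stand-in that only borrows the notebook's column names; with the real data, the same call would be applied to `df`.

```python
import pandas as pd

# Tiny made-up frame with the notebook's column names (values are illustrative)
demo = pd.DataFrame({
    "temperature": [98.0, 98.4, 97.8, 98.2, 98.6, 98.9, 98.3, 98.8],
    "gender": ["male"] * 4 + ["female"] * 4,
})

# Count, mean, sample standard deviation and standard error for each group
summary = demo.groupby("gender")["temperature"].agg(["count", "mean", "std", "sem"])
print(summary)
```

`demo.groupby("gender")["temperature"].describe()` is an alternative that also reports quartiles.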

⭐Plot the histogram for both groups side-by-side.

⭐Plot the box plot for both groups side-by-side.
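The two plotting tasks could be approached as in the sketch below, which uses plain Matplotlib on synthetic data (group sizes, means and spreads are arbitrary placeholders). The `Agg` backend line is only there so the code runs outside a notebook; inside Jupyter it can be dropped.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripts; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Synthetic data: group sizes, means and spreads are arbitrary placeholders
demo = pd.DataFrame({
    "temperature": np.concatenate([rng.normal(98.0, 0.7, 60), rng.normal(98.5, 0.7, 60)]),
    "gender": ["male"] * 60 + ["female"] * 60,
})
groups = {name: g["temperature"].to_numpy() for name, g in demo.groupby("gender")}

# Histograms side by side, sharing both axes so the panels are comparable
fig_hist, axes = plt.subplots(1, 2, figsize=(10, 4), sharex=True, sharey=True)
for ax, (name, values) in zip(axes, groups.items()):
    ax.hist(values, bins=12)
    ax.set_title(name)
    ax.set_xlabel("temperature (degrees F)")

# Box plots side by side on a single axis
fig_box, ax_box = plt.subplots(figsize=(5, 4))
ax_box.boxplot(list(groups.values()))
ax_box.set_xticks([1, 2])
ax_box.set_xticklabels(list(groups.keys()))
ax_box.set_ylabel("temperature (degrees F)")
```

Seaborn offers shorter equivalents, for example `sns.boxplot(x="gender", y="temperature", data=demo)`.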

⭐**Investigate the given task by using Independent Samples t Test.**

___🚀First, check the normality for both groups. *Use scipy.stats.shapiro*

In [None]:
# Check normality for the male group

stat, p = stats.shapiro(df.loc[df["gender"] == "male", "temperature"])

print('Statistics=%.3f, p=%.3f' % (stat, p))
# interpret
alpha = 0.05
if p > alpha:
    print('Sample looks Gaussian (fail to reject H0)')
else:
    print('Sample does not look Gaussian (reject H0)')

In [None]:
# Check normality for the female group

stat, p = stats.shapiro(df.loc[df["gender"] == "female", "temperature"])

print('Statistics=%.3f, p=%.3f' % (stat, p))
# interpret
alpha = 0.05
if p > alpha:
    print('Sample looks Gaussian (fail to reject H0)')
else:
    print('Sample does not look Gaussian (reject H0)')

___🚀Test the assumption of homogeneity of variance
*Hint: Levene’s Test*

The hypotheses for Levene’s test are: 

<i>H</i><sub>0</sub>: "the population variances of group 1 and 2 are equal"

<i>H</i><sub>1</sub>: "the population variances of group 1 and 2 are not equal"
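A brief sketch of Levene's test on synthetic groups follows (all three arrays are generated for illustration). One pair shares the same spread and the other pair differs sharply, which shows both outcomes of the test.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Synthetic groups for illustration: one pair with equal spread, one pair without
a = rng.normal(98.0, 0.7, 65)
b = rng.normal(98.0, 0.7, 65)
c = rng.normal(98.0, 3.0, 65)   # much larger spread than a

stat_same, p_same = stats.levene(a, b)   # scipy's default is center="median"
stat_diff, p_diff = stats.levene(a, c)

print(f"similar spread:   W={stat_same:.3f}, p={p_same:.3f}")
print(f"different spread: W={stat_diff:.3f}, p={p_diff:.3g}")
```

A large p-value means only that the data give no evidence against equal variances; it does not prove the variances are equal.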

In [None]:
stat, p = stats.levene(df.loc[df["gender"] == "male", "temperature"],
                       df.loc[df["gender"] == "female", "temperature"])

print('Statistics=%.3f, p=%.3f' % (stat, p))
# interpret
alpha = 0.05
if p > alpha:
    print('No evidence that the population variances of group 1 and 2 differ (fail to reject H0)')
else:
    print('The population variances of group 1 and 2 are not equal (reject H0)')

___🚀Conduct the significance test. *Use scipy.stats.ttest_ind*

H0: µ1 = µ2 ("the two population means are equal")

H1: µ1 ≠ µ2 ("the two population means are not equal")
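The pooled-variance t statistic behind `scipy.stats.ttest_ind` can be sketched by hand and compared with the library result. The two groups below are synthetic stand-ins with arbitrary means and spread; Welch's variant (`equal_var=False`) is shown as the usual alternative when the equal-variance assumption looks doubtful.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# Synthetic groups (arbitrary means and spread) standing in for the two gender groups
g1 = rng.normal(98.0, 0.7, 65)
g2 = rng.normal(98.5, 0.7, 65)

# Pooled-variance t statistic, appropriate when the equal-variance assumption holds
n1, n2 = len(g1), len(g2)
sp2 = ((n1 - 1) * g1.var(ddof=1) + (n2 - 1) * g2.var(ddof=1)) / (n1 + n2 - 2)
t_manual = (g1.mean() - g2.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))

result = stats.ttest_ind(g1, g2, equal_var=True)
welch = stats.ttest_ind(g1, g2, equal_var=False)   # alternative if variances differ

print(f"manual t={t_manual:.3f}, scipy t={result.statistic:.3f}, p={result.pvalue:.4g}")
print(f"Welch  t={welch.statistic:.3f}, p={welch.pvalue:.4g}")
```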

In [None]:
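twosample = stats.ttest_ind(
    df.loc[df["gender"] == "male", "temperature"],
    df.loc[df["gender"] == "female", "temperature"],
)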

alpha = 0.05
p_value = twosample.pvalue

if p_value<alpha:
    print('At {} level of significance, we can reject the null hypothesis in favor of the alternative hypothesis.'.format(alpha))
else:
    print('At {} level of significance, we fail to reject the null hypothesis.'.format(alpha))

## Task-3. Is There a Relationship Between Body Temperature and Heart Rate?

⭐Plot the scatter plot

⭐Check the normality for heart rate variable

In [None]:
stat, p = stats.shapiro(df["heart_rate"])

print('Statistics=%.3f, p=%.3f' % (stat, p))
# interpret
alpha = 0.05
if p > alpha:
    print('Sample looks Gaussian (fail to reject H0)')
else:
    print('Sample does not look Gaussian (reject H0)')

⭐**Conduct a correlation test**, report Pearson’s correlation coefficient and two-tailed p-value. *Use scipy.stats.pearsonr*

Two-tailed significance test:

H0: ρ = 0 ("the population correlation coefficient is 0; there is no association")

H1: ρ ≠ 0 ("the population correlation coefficient is not 0; a nonzero correlation could exist")
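As a rough illustration of the correlation test, the sketch below generates two loosely related synthetic variables (the baseline of 74, the slope of 2.0 and the noise level are arbitrary assumptions) and checks `pearsonr` against the covariance-based definition of r.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
# Synthetic, weakly related variables; baseline, slope and noise level are arbitrary
temperature = rng.normal(98.25, 0.73, 130)
heart_rate = 74 + 2.0 * (temperature - 98.25) + rng.normal(0, 7, 130)

r, p = stats.pearsonr(temperature, heart_rate)
print(f"r={r:.3f}, two-tailed p={p:.4g}")

# Pearson's r is the sample covariance scaled by both sample standard deviations
r_manual = np.cov(temperature, heart_rate)[0, 1] / (
    temperature.std(ddof=1) * heart_rate.std(ddof=1)
)
print(np.isclose(r, r_manual))
```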

In [None]:
r, p = stats.pearsonr(df["temperature"], df["heart_rate"])
print('r=%.3f, p=%.3f' % (r, p))

⭐**Find a regression equation** to predict heart rate from body temperature.

⭐Calculate the predicted heart rate of a person at the temperature 97 F.

⭐How much of the variation of the heart_rate variable is explained by the temperature variable? *Coefficient of determination (R-squared):*
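The three regression tasks above can be sketched with `scipy.stats.linregress`, which returns the slope, intercept and correlation in one call. The data here are synthetic (the baseline, slope and noise are arbitrary assumptions), so the printed numbers say nothing about the real dataset; with the notebook's data the arguments would be `df["temperature"]` and `df["heart_rate"]`.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
# Synthetic data; the baseline, slope and noise below are arbitrary choices
temperature = rng.normal(98.25, 0.73, 130)
heart_rate = 74 + 2.0 * (temperature - 98.25) + rng.normal(0, 7, 130)

fit = stats.linregress(temperature, heart_rate)   # predictor first, then response
print(f"heart_rate = {fit.intercept:.2f} + {fit.slope:.2f} * temperature")

# Prediction from the fitted line at a temperature of 97 F
predicted_at_97 = fit.intercept + fit.slope * 97
# In simple linear regression, R-squared is the squared Pearson correlation
r_squared = fit.rvalue ** 2

print(f"predicted heart rate at 97 F: {predicted_at_97:.1f}")
print(f"R-squared: {r_squared:.3f}")
```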