# Hypothesis Testing (One Sample)

In [1]:
import numpy as np
import pandas as pd
from scipy import stats

### One-sided Hypothesis Tests

#### Example: Pharmaceutical Company

A pharmaceutical company is testing a medication for lowering blood sugar and managing diabetes. A Hemoglobin A1c level below 5.7% is considered normal. The company has treated 100 study volunteers with the medication and would like to show that their mean A1c after treatment is below 5.7%. This is a one-sided (left-tailed) test of H0: μ ≥ 5.7 against H1: μ < 5.7.

In [2]:
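pop_mean = 5.7      # hypothesized mean A1c (%) under H0
sample_mean = 5.1
sample_std = 1.6
n = 100
# t statistic: distance of the sample mean from pop_mean, in standard errors
statistic = (sample_mean - pop_mean)/(sample_std/np.sqrt(n))
# left-tailed test (H1: mean < pop_mean), so the p-value is the lower-tail probability
pval = stats.t.cdf(statistic, n-1)
print(statistic)
print(pval)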

-3.750000000000003
0.0001489332089038242
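
The same calculation can be wrapped in a small helper that picks the tail from the direction of the alternative hypothesis. This is only a sketch: `t_test_from_summary` and its `alternative` argument are hypothetical names introduced here for illustration, not a SciPy function.

```python
import numpy as np
from scipy import stats

def t_test_from_summary(sample_mean, sample_std, n, pop_mean, alternative="two-sided"):
    """One-sample t test from summary statistics (hypothetical helper)."""
    statistic = (sample_mean - pop_mean) / (sample_std / np.sqrt(n))
    df = n - 1
    if alternative == "less":         # H1: mean < pop_mean -> lower tail
        pval = stats.t.cdf(statistic, df)
    elif alternative == "greater":    # H1: mean > pop_mean -> upper tail
        pval = stats.t.sf(statistic, df)
    elif alternative == "two-sided":  # H1: mean != pop_mean -> both tails
        pval = 2 * stats.t.sf(np.abs(statistic), df)
    else:
        raise ValueError("alternative must be 'less', 'greater' or 'two-sided'")
    return statistic, pval

# A1c example: H1 is that the mean is below 5.7
t_stat, p_less = t_test_from_summary(5.1, 1.6, 100, 5.7, alternative="less")
print(t_stat, p_less)
```

Taking the absolute value of the statistic before computing a one-sided p-value gives the right answer only when the statistic falls on the side named by the alternative. Choosing the tail explicitly avoids reporting a small p-value for a result in the wrong direction.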


In [4]:
# Confidence Interval
stats.t.interval(0.95, df=n-1, loc=sample_mean, scale=(sample_std/np.sqrt(n)))

(4.78252528775861, 5.417474712241389)

#### Example: Municipal Children's Home

Boys of a certain age are known to have a mean weight of μ = 85 pounds. A complaint is made that the boys living in a municipal children's home are underfed and therefore underweight, which calls for a one-sided (left-tailed) test. As one piece of evidence, n = 25 boys of the same age are weighed and found to have a mean weight of 80.94 pounds. The population standard deviation is known to be σ = 11.6 pounds (an unrealistic assumption in practice).  
Based on the available data, what should be concluded concerning the complaint?

In [5]:
# your code here

0.046447544473094286
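
One possible way to fill in this cell is sketched below; it is not necessarily the intended solution. Because σ is given as known, the statistic can be referred to the standard normal distribution (a z test). The value printed above (about 0.0464) is what a t distribution with n − 1 = 24 degrees of freedom gives for the same statistic, so both are computed for comparison. The name `pop_std` is introduced here for illustration.

```python
import numpy as np
from scipy import stats

pop_mean = 85        # known mean weight (pounds)
sample_mean = 80.94
pop_std = 11.6       # population standard deviation, assumed known
n = 25

# standardized distance of the sample mean from the hypothesized mean
z = (sample_mean - pop_mean) / (pop_std / np.sqrt(n))

p_z = stats.norm.cdf(z)       # left-tailed z test (sigma known)
p_t = stats.t.cdf(z, n - 1)   # same statistic referred to a t distribution
print(z, p_z, p_t)
```

Both p-values fall between 0.01 and 0.05, so the data support the complaint at the 5% significance level but not at the 1% level.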


In [6]:
# Confidence Interval


(80.27955246027904, 81.60044753972096)
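
A sketch of the interval under the stated assumption that σ = 11.6 is known, using the normal distribution with standard error σ/√n. The variable names are chosen here for illustration.

```python
import numpy as np
from scipy import stats

sample_mean = 80.94
pop_std = 11.6                    # population standard deviation, assumed known
n = 25
std_err = pop_std / np.sqrt(n)    # standard error of the mean, 2.32 pounds

# two-sided 95% z interval around the sample mean
ci_low, ci_high = stats.norm.interval(0.95, loc=sample_mean, scale=std_err)
print(ci_low, ci_high)
```

Under these assumptions the interval runs from roughly 76.4 to 85.5 pounds. The interval printed above has a half-width of about 0.66 pounds, which is smaller than the standard error of 2.32, so it was probably produced with a different scale (possibly a `sample_std` left over from an earlier cell) and is worth rechecking. Note also that this two-sided interval contains 85 even though the one-sided p-value is below 0.05: the two procedures place the 5% in different tails.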

### Two-sided Hypothesis Tests

#### Example: Honolulu Heart Study

It is assumed that the mean systolic blood pressure in the general population is μ = 120 mm Hg. In the Honolulu Heart Study, a sample of n = 100 people had an average systolic blood pressure of 130.1 mm Hg with a standard deviation of 21.21 mm Hg. Is this group significantly different from the general population with respect to systolic blood pressure? This is a two-sided test of H0: μ = 120 against H1: μ ≠ 120.

In [7]:
pop_mean = 120
sample_mean = 130.1
sample_std = 21.21
n = 100
statistic = (sample_mean - pop_mean)/(sample_std/np.sqrt(n))
# two-sided test: double the upper-tail probability of |t|
pval = stats.t.sf(np.abs(statistic), n-1)*2
print(statistic)
print(pval)

4.761904761904759
6.562701817208617e-06


In [8]:
# Confidence Interval
stats.t.interval(0.95, df=n-1, loc=sample_mean, scale=(sample_std/np.sqrt(n)))

(125.89147584585008, 134.30852415414992)
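
For a two-sided test, the p-value and the confidence interval lead to the same decision: the null value lies outside the (1 − α) interval exactly when the p-value is below α. The sketch below checks this for the Honolulu numbers; `alpha`, `std_err`, and the two `reject_*` flags are names introduced here for illustration.

```python
import numpy as np
from scipy import stats

pop_mean, sample_mean, sample_std, n = 120, 130.1, 21.21, 100
alpha = 0.05
std_err = sample_std / np.sqrt(n)

# two-sided t test
statistic = (sample_mean - pop_mean) / std_err
pval = 2 * stats.t.sf(np.abs(statistic), n - 1)

# matching (1 - alpha) confidence interval
ci_low, ci_high = stats.t.interval(1 - alpha, df=n - 1, loc=sample_mean, scale=std_err)

reject_by_pvalue = pval < alpha
reject_by_interval = not (ci_low <= pop_mean <= ci_high)
print(reject_by_pvalue, reject_by_interval)
```

Here both checks reject H0, since 120 lies well below the lower end of the interval.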

## Using data arrays

#### Generating 10 draws from a standard normal random variable

In [9]:
X = stats.norm(0, 1).rvs(size = 10)

#### Test whether the mean of the population X was drawn from is equal to 0

In [24]:
stats.ttest_1samp(X, 0)

Ttest_1sampResult(statistic=-1.376167707188766, pvalue=0.20204040098102202)
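
`stats.ttest_1samp` performs the same calculation as the summary-statistics formula used earlier, with the sample standard deviation computed using `ddof=1`. The sketch below checks this on a freshly drawn sample; the seed is arbitrary and only there for reproducibility, so the numbers will differ from the output above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)   # arbitrary seed, for reproducibility only
X = rng.standard_normal(10)      # 10 draws from a standard normal

result = stats.ttest_1samp(X, 0)

# the same test by hand: two-sided, n - 1 degrees of freedom
n = len(X)
t_manual = (X.mean() - 0) / (X.std(ddof=1) / np.sqrt(n))
p_manual = 2 * stats.t.sf(np.abs(t_manual), n - 1)

print(result.statistic, t_manual)
print(result.pvalue, p_manual)
```

The p-value returned by default is two-sided. Recent SciPy versions (1.6 and later) also accept an `alternative` argument (`'less'` or `'greater'`) for one-sided tests.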

#### Using actual data

In [None]:
data = pd.read_csv('../../../03_data-visualization/02_lab-matplotlib-seaborn/your-code/Fitbit2.csv') 
data.head()

In [None]:
data.describe()

In [None]:
stats.ttest_1samp(data['Distance'], 8.5)