# Hypothesis Testing
Hypothesis testing is a critical tool in inferential statistics. It lets us decide, from sample data, whether a claimed value of a population parameter is plausible.

Statistical hypothesis testing reflects **the scientific method, adapted to the setting of research involving data analysis**. In this framework, a researcher makes a precise statement about the population of interest, then aims to falsify the statement. 

In statistical hypothesis testing, the statement in question is the **null hypothesis**. If we reject the null hypothesis, we have falsified it (to some degree of confidence). 

According to the scientific method, falsifying a hypothesis should require an overwhelming amount of evidence against it. If the data we observe are ambiguous, or are only weakly contradictory to the null hypothesis, we do not reject the null hypothesis.

Every hypothesis test is built on two competing statements:

**Null Hypothesis: $H_0$**

**Alternative Hypothesis: $H_a$**

The most common settings for hypothesis testing are:

* One Population Proportion
* Difference in Population Proportions
* One Population Mean
* Difference in Population Means

The ***test statistic*** is computed as:

$$\text{test statistic} = \frac{\text{best estimate} - \text{hypothesized value}}{\text{standard error of the estimate}}$$

After computing this _test statistic_, we ask ourselves, "How likely is it to see a value at least this extreme if the null hypothesis is true?" This probability is the **p-value**.

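Depending on that probability, we either **reject or fail to reject the null hypothesis**. If the p-value falls below a pre-chosen significance level (commonly 0.05), we reject $H_0$.

Note that we never **accept** a hypothesis outright. Failing to reject $H_0$ only means the data are not strong enough evidence against it. Likewise, rejecting $H_0$ does not prove the alternative hypothesis with certainty, because we only ever observe a sample, never the entire population.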

### Type-I and Type-II errors

The framework of formal hypothesis testing defines two distinct types of errors:

- A **type I error (false positive)** occurs when the null hypothesis is true but is incorrectly rejected.
- A **type II error (false negative)** occurs when the null hypothesis is false but is not rejected.

Most traditional methods for statistical inference aim to strictly control the probability of a type I error, usually at 5%. While we also wish to minimize the probability of a type II error, this is a secondary priority to controlling the type I error.
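The 5% type I error rate can be checked empirically. The sketch below is an illustrative simulation, not part of the original analysis. It repeatedly draws samples for which $H_0$ is true and counts how often a one-sample t-test rejects $H_0$ at $\alpha = 0.05$. The population mean (50), standard deviation (10) and sample size (30) are arbitrary assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha = 0.05
n_experiments = 2000
rejections = 0

for _ in range(n_experiments):
    # H0 is true by construction: the data come from a population with mean 50.
    sample = rng.normal(loc=50, scale=10, size=30)
    result = stats.ttest_1samp(sample, popmean=50)
    if result.pvalue < alpha:
        rejections += 1  # each rejection here is a type I error

type_i_rate = rejections / n_experiments
print(type_i_rate)
```

The printed rejection rate should land close to 0.05, the chosen significance level.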

Two widely used hypothesis tests are:

- **t-test (Student's t-test)**
- **Z-test**

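**t-test**: The t-test (also called Student's t-test) can be used in two ways:

- to compare a sample mean with a hypothesized value, or
- to compare two sample means with each other.

It is used when the population standard deviation is unknown and must be estimated from the data. The resulting p-value tells you whether the observed difference is statistically significant.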

There are two main types of t-test:

- **One-sample t-test**
- **Two-sample t-test**

**One-sample t-test**: The one-sample t-test determines whether the sample mean is statistically different from a known or hypothesized population mean.

Let's create some dummy age data for two groups:

- the population of voters in the entire country (Senegal), and
- a sample of voters in Dakar.

We then test whether the average age of voters in Dakar differs from that of the population:

```python
%matplotlib inline
import statsmodels.api as sm
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt
import pandas as pd
```

```python
np.random.seed(6)

population_ages1 = stats.poisson.rvs(loc=18, mu=35, size=150000)
population_ages2 = stats.poisson.rvs(loc=18, mu=10, size=100000)
population_ages = np.concatenate((population_ages1, population_ages2))

Dakar_ages1 = stats.poisson.rvs(loc=18, mu=30, size=30)
Dakar_ages2 = stats.poisson.rvs(loc=18, mu=10, size=20)
Dakar_ages = np.concatenate((Dakar_ages1, Dakar_ages2))

print(population_ages.mean())
print(Dakar_ages.mean())
```

```
43.000112
39.26
```


Let's conduct a t-test at the 5% significance level (95% confidence). We want to see whether it correctly rejects the null hypothesis that the mean age of Dakar voters equals the population mean. To conduct a one-sample t-test, we can use the _**stats.ttest_1samp()**_ function:

```python
stats.ttest_1samp(a=Dakar_ages,                     # sample data
                  popmean=population_ages.mean())   # population mean
```

```
Ttest_1sampResult(statistic=-2.5742714883655027, pvalue=0.013118685425061678)
```

As the p-value (0.013) is less than 0.05, we reject the null hypothesis and conclude that the mean age of voters in Dakar differs from the population mean.
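To see what `stats.ttest_1samp()` computes, the sketch below applies the test-statistic formula by hand. It recreates the simulated data from the cells above using the same seed and draw order. It then computes $t = (\bar{x} - \mu_0)/(s/\sqrt{n})$ with $n - 1$ degrees of freedom and a two-sided p-value. The variable names `mu0`, `se` and `t_stat` are illustrative.

```python
import numpy as np
import scipy.stats as stats

# Recreate the simulated data from the cells above (same seed, same draw order).
np.random.seed(6)
population_ages = np.concatenate((stats.poisson.rvs(loc=18, mu=35, size=150000),
                                  stats.poisson.rvs(loc=18, mu=10, size=100000)))
Dakar_ages = np.concatenate((stats.poisson.rvs(loc=18, mu=30, size=30),
                             stats.poisson.rvs(loc=18, mu=10, size=20)))

mu0 = population_ages.mean()   # hypothesized value
n = len(Dakar_ages)
x_bar = Dakar_ages.mean()      # best estimate
s = Dakar_ages.std(ddof=1)     # sample standard deviation
se = s / np.sqrt(n)            # standard error of the mean

t_stat = (x_bar - mu0) / se
p_value = 2 * stats.t.sf(abs(t_stat), n - 1)  # two-sided p-value, n - 1 df

print(t_stat, p_value)
print(stats.ttest_1samp(Dakar_ages, popmean=mu0))
```

The hand-computed statistic and p-value should agree with the `ttest_1samp` result printed on the last line.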

**Two-Sample T-Test**

A two-sample t-test investigates whether the means of two independent data samples differ from one another. In a two-sample test, the null hypothesis is that the means of both groups are the same. Unlike the one-sample test, where we test against a known population parameter, the two-sample test involves only sample means. You can conduct a two-sample t-test with the _**stats.ttest_ind()**_ function.

#### Difference in Population Means

Let's generate a sample of voter age data for Kaolack and test it against the Dakar sample we made earlier:

```python
np.random.seed(12)
Kaolack_ages1 = stats.poisson.rvs(loc=18, mu=33, size=30)
Kaolack_ages2 = stats.poisson.rvs(loc=18, mu=13, size=20)
Kaolack_ages = np.concatenate((Kaolack_ages1, Kaolack_ages2))

print(Kaolack_ages.mean())
```

```
42.8
```


```python
stats.ttest_ind(a=Dakar_ages,
                b=Kaolack_ages,
                equal_var=False)  # do not assume equal variances
```

```
Ttest_indResult(statistic=-1.7083870793286842, pvalue=0.09073104343957748)
```

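The p-value is about 0.091. At a 95% confidence level we would fail to reject the null hypothesis, since the p-value is greater than the corresponding significance level of 5%. Setting `equal_var=False` runs Welch's t-test, which does not assume that the two groups have equal variances.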
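Welch's t-test can also be written out by hand. The sketch below recreates the Dakar and Kaolack samples with the seeds used above. It then computes the Welch statistic $t = (\bar{x}_1 - \bar{x}_2)/\sqrt{s_1^2/n_1 + s_2^2/n_2}$ and the Welch-Satterthwaite degrees of freedom. The helper variables `v1` and `v2` (the squared standard errors) are illustrative names.

```python
import numpy as np
import scipy.stats as stats

# Recreate the simulated samples from the cells above (same seeds, same draw order).
np.random.seed(6)
stats.poisson.rvs(loc=18, mu=35, size=150000)  # population draws, discarded here
stats.poisson.rvs(loc=18, mu=10, size=100000)
Dakar_ages = np.concatenate((stats.poisson.rvs(loc=18, mu=30, size=30),
                             stats.poisson.rvs(loc=18, mu=10, size=20)))
np.random.seed(12)
Kaolack_ages = np.concatenate((stats.poisson.rvs(loc=18, mu=33, size=30),
                               stats.poisson.rvs(loc=18, mu=13, size=20)))

n1, n2 = len(Dakar_ages), len(Kaolack_ages)
v1 = Dakar_ages.var(ddof=1) / n1    # squared standard error, group 1
v2 = Kaolack_ages.var(ddof=1) / n2  # squared standard error, group 2

t_stat = (Dakar_ages.mean() - Kaolack_ages.mean()) / np.sqrt(v1 + v2)
# Welch-Satterthwaite approximation to the degrees of freedom
df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
p_value = 2 * stats.t.sf(abs(t_stat), df)

print(t_stat, df, p_value)
```

Because the variances are estimated separately, the degrees of freedom are usually not a whole number. They are also smaller than the $n_1 + n_2 - 2$ used by the pooled test.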

### Difference in Population Proportions

#### Research Question

Is there a significant difference between the population proportions of parents of black children and parents of Hispanic children who report that their child has had some swimming lessons?
  
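**Parameter of Interest**: $p_1 - p_2$, where:

- $p_1$ is the proportion for parents of Black children, and
- $p_2$ is the proportion for parents of Hispanic children.

**Null Hypothesis:** $p_1 - p_2 = 0$  
**Alternative Hypothesis:** $p_1 - p_2 \neq 0$  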

**Data**: 247 Parents of Black Children. 36.8% of parents report that their child has had some swimming lessons. 
<br>308 Parents of Hispanic Children. 38.9% of parents report that their child has had some swimming lessons.

##### Use of `ttest_ind()` from `statsmodels`
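A difference in population proportions is usually tested with a two-proportion z-test. Here we take a different route:

- Each parent's answer is a 0/1 outcome, i.e. a Bernoulli variable (a binomial with $n = 1$).
- We simulate 0/1 responses with the reported sample sizes and (rounded) proportions.
- We pass the two simulated samples to a two-sample t-test.

With samples this large, the t-test gives nearly the same result as the z-test. Because the data are simulated without a fixed seed, your numbers will vary from run to run.

The function returns three values: (a) the test statistic, (b) the p-value of the t-test, and (c) the degrees of freedom used in the t-test.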

```python
n1 = 247
p1 = .37

n2 = 308
p2 = .39

# Simulated 0/1 responses (1 = child has had some swimming lessons)
population1 = np.random.binomial(1, p1, n1)
population2 = np.random.binomial(1, p2, n2)
```

```python
sm.stats.ttest_ind(population1, population2)
```

```
(-1.9078478143206699, 0.05692853139184596, 553.0)
```

#### Conclusion of the hypothesis test
Since the p-value (0.057) is greater than 0.05, we cannot reject the null hypothesis in this case. In other words, the difference in the population proportions is not statistically significant.
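For comparison, the usual two-proportion z-test can be run directly on the reported survey figures rather than on simulated draws. The sketch below assumes the counts are the reported percentages times the sample sizes, rounded to whole parents: 91 of 247 and 120 of 308. It uses the pooled proportion under $H_0: p_1 = p_2$ to compute the standard error.

```python
import numpy as np
from scipy import stats

# Counts reconstructed from the reported percentages (an assumption:
# 36.8% of 247 and 38.9% of 308, rounded to whole parents).
n1, n2 = 247, 308
x1, x2 = round(0.368 * n1), round(0.389 * n2)  # 91 and 120

p1_hat, p2_hat = x1 / n1, x2 / n2
p_pooled = (x1 + x2) / (n1 + n2)  # estimate of the common p under H0

se = np.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
z = (p1_hat - p2_hat) / se        # best estimate minus 0, over its standard error
p_value = 2 * stats.norm.sf(abs(z))  # two-sided alternative
print(z, p_value)
```

On these figures the test is also far from significant at the 5% level. statsmodels offers the same pooled test as `proportions_ztest` in `statsmodels.stats.proportion`.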

#### But what happens if we could survey a much larger number of people?
We do not change the proportions, only the number of survey participants in the two populations. With larger samples the standard error shrinks, so the same small difference in proportions can become statistically significant.

There is no guarantee that you will get a p-value < 0.05 every time you run the code, because the samples are randomly generated each time. With these settings, however, roughly half of the runs produce a p-value below 0.05.

```python
n1 = 5000
p1 = .37

n2 = 5000
p2 = .39

population1 = np.random.binomial(1, p1, n1)
population2 = np.random.binomial(1, p2, n2)
```

```python
sm.stats.ttest_ind(population1, population2)
```

```
(-3.1631433063323318, 0.0015654257479124668, 9998.0)
```

Here the p-value (0.0016) is less than 0.05, so we reject the null hypothesis.
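To see how often the larger survey detects the difference, the sketch below repeats the simulation many times. It records the share of runs with p < 0.05, which estimates the test's *power*. It uses SciPy's `stats.ttest_ind`, whose default pooled-variance test matches the default of `sm.stats.ttest_ind`. The seed and the number of repetitions are arbitrary choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_runs = 1000
significant = 0

for _ in range(n_runs):
    # Simulated 0/1 survey answers with the same true proportions as above.
    group1 = rng.binomial(1, 0.37, size=5000)
    group2 = rng.binomial(1, 0.39, size=5000)
    # Pooled-variance two-sample t-test (equal_var=True is SciPy's default).
    if stats.ttest_ind(group1, group2).pvalue < 0.05:
        significant += 1

power_estimate = significant / n_runs
print(power_estimate)
```

Rerunning this with the original sample sizes (247 and 308) gives a much lower rejection rate. That is why the first survey could not detect the difference.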

### Z-test

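A z-test is a statistical test used in two situations:

- to determine whether a population mean differs from a hypothesized value, or
- to determine whether two population means differ.

It applies when the population variances are known or the sample sizes are large. Under the null hypothesis, the test statistic is assumed to follow a standard normal distribution. Strictly, nuisance parameters such as the standard deviation should be known. With large samples the sample standard deviation is commonly substituted, which is what `sm.stats.ztest` does.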

#### One-sample z-test

The one-sample z-test is used to test whether the mean of the population is greater than, less than, or not equal to a specific value.

### One Population Mean

#### Research Question 

Suppose a cartwheeling competition was organized for some adults, and the following cartwheel distances (in inches) were recorded:

(80.57, 98.96, 85.28, 83.83, 69.94, 89.59, 91.09, 66.25, 91.21, 82.7 , 73.54, 81.99, 54.01, 82.89, 75.88, 98.32, 107.2 , 85.53, 79.08, 84.3 , 89.32, 86.35, 78.98, 92.26, 87.01)

Is the average cartwheel distance (in inches) for adults more than 80 inches?

**Population**: All adults  
**Parameter of Interest**: $\mu$, population mean cartwheel distance.

**Null Hypothesis:** $\mu = 80$
<br>**Alternative Hypothesis:** $\mu > 80$

**Data**:
<br>25 adult participants. 
<br>$\bar{x} = 83.84$
<br>$s = 10.72$

```python
cwdata = np.array([80.57, 98.96, 85.28, 83.83, 69.94, 89.59, 91.09, 66.25, 91.21, 82.7 , 73.54, 81.99, 54.01, 
                   82.89, 75.88, 98.32, 107.2 , 85.53, 79.08, 84.3 , 89.32, 86.35, 78.98, 92.26, 87.01])
```

```python
n = len(cwdata)
mean = cwdata.mean()
sd = cwdata.std()
(n, mean, sd)
```

```
(25, 83.84320000000001, 10.716018932420752)
```

```python
sm.stats.ztest(cwdata, value=80, alternative="larger")
```

```
(1.756973189172546, 0.039461189601168366)
```

#### Conclusion of the hypothesis test
Since the p-value (0.0394) is lower than the standard significance level of 0.05, we reject the null hypothesis that the mean cartwheel distance for adults (a population quantity) is 80 inches. The data support the alternative that it is greater than 80 inches.
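The same one-sided z-test can be reproduced from the formula. The sketch below divides the distance between the sample mean and 80 by the standard error. The standard error uses the sample standard deviation with `ddof=1`, which stands in for a known $\sigma$. This is also how `sm.stats.ztest` estimates the spread by default. Because the alternative is $\mu > 80$, the p-value is the upper-tail probability.

```python
import numpy as np
from scipy import stats

cwdata = np.array([80.57, 98.96, 85.28, 83.83, 69.94, 89.59, 91.09, 66.25, 91.21, 82.7,
                   73.54, 81.99, 54.01, 82.89, 75.88, 98.32, 107.2, 85.53, 79.08, 84.3,
                   89.32, 86.35, 78.98, 92.26, 87.01])

n = len(cwdata)
se = cwdata.std(ddof=1) / np.sqrt(n)  # sample SD (ddof=1) in place of a known sigma
z = (cwdata.mean() - 80) / se         # best estimate minus hypothesized value, over SE
p_value = stats.norm.sf(z)            # one-sided: H_a is mu > 80
print(z, p_value)
```

The hand-computed `z` and `p_value` should match the `sm.stats.ztest` output above.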

### Difference in Population Means

#### Research Question 

Considering adults in the [NHANES data](https://www.cdc.gov/nchs/nhanes/index.htm), is there a significant difference between the mean [Body Mass Index](https://www.cdc.gov/healthyweight/assessing/bmi/index.html) of males and females?

**Population**: Adults in the NHANES data.  
**Parameter of Interest**: $\mu_1 - \mu_2$, the difference in mean Body Mass Index between females ($\mu_1$) and males ($\mu_2$).  

**Null Hypothesis:** $\mu_1 = \mu_2$  
**Alternative Hypothesis:** $\mu_1 \neq \mu_2$

**Data:**

2976 Females  
$\bar{x}_1 = 29.94$  
$s_1 = 7.75$  

2759 Males  
$\bar{x}_2 = 28.78$  
$s_2 = 6.25$  

$\bar{x}_1 - \bar{x}_2 = 1.16$

```python
url = "https://raw.githubusercontent.com/kshedden/statswpy/master/NHANES/merged/nhanes_2015_2016.csv"
da = pd.read_csv(url)
da.head()
```


```python
# RIAGENDR is the NHANES gender code: 1 = male, 2 = female
females = da[da["RIAGENDR"] == 2]
male = da[da["RIAGENDR"] == 1]
```

```python
n1 = len(females)
mu1 = females["BMXBMI"].mean()
sd1 = females["BMXBMI"].std()

(n1, mu1, sd1)
```


```python
n2 = len(male)
mu2 = male["BMXBMI"].mean()
sd2 = male["BMXBMI"].std()

(n2, mu2, sd2)
```

```
(2759, 28.778072111846985, 6.252567616801485)
```

```python
sm.stats.ztest(females["BMXBMI"].dropna(), male["BMXBMI"].dropna(), alternative='two-sided')
```

```
(6.1755933531383205, 6.591544431126401e-10)
```

#### Conclusion of the hypothesis test
Since the p-value (6.59e-10) is extremely small, we reject the null hypothesis that the mean BMI of males is the same as that of females. Note that we used `alternative="two-sided"` in the z-test because the alternative hypothesis states only that the means differ, not which one is larger.
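A two-sample z-statistic can also be computed directly from the summary statistics listed under **Data**, without downloading the file. The sketch below assumes the unpooled standard error $\sqrt{s_1^2/n_1 + s_2^2/n_2}$. The helper `two_sample_z_from_summary` is a hypothetical name, not a library function. The result will be close to the cell output above but not identical, for two reasons:

- `sm.stats.ztest` uses a pooled variance by default.
- The listed statistics are rounded.

```python
import numpy as np
from scipy import stats


def two_sample_z_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Unpooled two-sample z statistic and two-sided p-value from summary statistics."""
    se = np.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    z = (mean1 - mean2) / se
    return z, 2 * stats.norm.sf(abs(z))


# Rounded summary statistics from the Data section (females first, then males).
z, p = two_sample_z_from_summary(29.94, 7.75, 2976, 28.78, 6.25, 2759)
print(z, p)
```

Either way, the z-statistic is around 6, and the conclusion is unchanged.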

## References

https://www.datavedas.com/inferential-statistics-in-python/

http://hamelg.blogspot.com/2015/11/python-for-data-analysis-part-24.html

https://towardsdatascience.com/statistical-inference-in-pyhton-using-pandas-numpy-part-i-c2ac0320dffe

https://github.com/tirthajyoti/Stats-Maths-with-Python/blob/master/Resources/Introduction%20to%20Hypothesis%20Testing.pdf

https://machinelearningmastery.com/how-to-code-the-students-t-test-from-scratch-in-python/