# Hypothesis Testing

Hypothesis testing (more formally "null hypothesis significance testing") is a central tool of statistical inference.

A formal hypothesis test involves two hypotheses, which must be stated up-front by the researcher:

**Null Hypothesis: $H_0$**

**Alternative Hypothesis: $H_a$**

The hypothesis tests that we have discussed in lecture are:

* Single population proportion test
* Difference between two population proportions test
* Single population mean test
* Difference between two population means test

In this tutorial, we will introduce how to carry out these hypothesis tests in Python using the Statsmodels package.

In [1]:
import statsmodels.api as sm
import numpy as np
import pandas as pd
import scipy.stats.distributions as dist

### Single population proportion test

_Example:_ In previous years, 52% of parents reported a belief that social media is the cause of their teenager’s lack of sleep. Do more parents today believe that their teenager’s lack of sleep is caused by social media? 

**Population**: Parents with a teenager (age 13-18)  
**Parameter of Interest**: p  
**Null Hypothesis:** p = 0.52  
**Alternative Hypothesis:** p > 0.52 (note that this is a one-sided test)

We interview 1018 parents (who constitute an IID sample from the population of interest), and 56% of these parents report that they believe that their teenager’s lack of sleep is caused by social media.

In this test, we take the null hypothesis proportion (0.52) to be a fixed, known value rather than an estimate obtained from data (which would be uncertain and inexact). The Statsmodels [proportions_ztest](https://www.statsmodels.org/dev/generated/statsmodels.stats.proportion.proportions_ztest.html) function returns a test statistic and the corresponding p-value. The test statistic is a Z-score: the number of standard errors by which the sample proportion differs from the null value. Because the alternative here is one-sided (p > 0.52), only large positive values of the statistic count as evidence against the null hypothesis. In this example, the p-value (about 0.005) is well below 0.05, which in most settings would be considered strong evidence against the null hypothesis.

In [4]:
n = 1018
pnull = .52
phat = .56
sm.stats.proportions_ztest(count=phat * n,       # number of successes (0.56 * 1018)
                           nobs=n,               # number of trials (observations)
                           value=pnull,          # proportion under the null hypothesis
                           alternative='larger', # one-sided alternative: prop > value
                           prop_var=pnull)       # proportion used in the variance (null value)

(2.5545334262132955, 0.005316510991822442)
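
The pair above can be checked by hand. The following is a minimal sketch, assuming the usual one-sample Z formula $z = (\hat{p} - p_0) / \sqrt{p_0 (1 - p_0) / n}$ with the standard error evaluated at the null value; the names `se_null`, `z`, and `pvalue` are chosen here for illustration. The result should agree with the Statsmodels output up to floating-point rounding.

```python
import numpy as np
import scipy.stats.distributions as dist

n = 1018       # sample size
pnull = 0.52   # proportion under the null hypothesis
phat = 0.56    # observed sample proportion

# Standard error of the sample proportion, evaluated at the null value
se_null = np.sqrt(pnull * (1 - pnull) / n)

# Z statistic: number of null standard errors by which phat exceeds pnull
z = (phat - pnull) / se_null

# One-sided p-value: upper-tail probability of a standard normal at z
pvalue = dist.norm.sf(z)

print(z, pvalue)
```

Using `dist.norm.sf` (the survival function) gives the upper-tail area directly, which matches the `alternative='larger'` setting used in the call above.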

### Difference between two population proportions test

_Example:_ Is there a difference between the population proportions of parents of Black children and parents of Hispanic children who report that their child has ever had swimming lessons?

**Populations**: All parents of Black children age 6-18 and all parents of Hispanic children age 6-18\
**Parameter of Interest**: p1 - p2, where p1 and p2 are the Black and Hispanic proportions, respectively \
**Null Hypothesis:** p1 - p2 = 0  \
**Alternative Hypothesis:** p1 - p2 $\neq$ 0

91 out of 247 (36.8%) sampled parents of Black children report that their child has had some swimming lessons.

120 out of 308 (39.0%) sampled parents of Hispanic children report that their child has had some swimming lessons.

The Statsmodels [test_proportions_2indep](https://www.statsmodels.org/dev/generated/statsmodels.stats.proportion.test_proportions_2indep.html) function returns a test statistic and p-value. In this example, the p-value is much greater than 0.05, so we fail to reject the null hypothesis: the data provide no evidence that parents of Black and Hispanic children report differing rates of swimming lesson participation.
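
One way the call might look for the counts given above is sketched below. The variable names are chosen here for illustration, and the `method` argument is left at the library default, so the exact statistic depends on which variance estimate that default uses.

```python
from statsmodels.stats import proportion

# Counts from the example: children with swimming lessons / parents sampled
count_black, nobs_black = 91, 247
count_hispanic, nobs_hispanic = 120, 308

# compare="diff" tests the null hypothesis p1 - p2 = 0;
# alternative="two-sided" matches the alternative p1 - p2 != 0
result = proportion.test_proportions_2indep(
    count_black, nobs_black,
    count_hispanic, nobs_hispanic,
    compare="diff",
    alternative="two-sided",
)

print(result.statistic, result.pvalue)
```

The statistic is negative because the sample proportion for parents of Black children (91/247) is slightly lower than that for parents of Hispanic children (120/308).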