# Hypothesis Testing

In general, a hypothesis test examines the validity of a null hypothesis versus an alternative hypothesis about some unknown parameter of interest.

For example, suppose we have a sample, $X_{1},\dots, X_{n}$, from some distribution parametrized by an unknown $\theta$, and we wish to test $$H_{0}: \theta = 0 \text{   vs.  } H_{1}: \theta \neq 0$$
Then, we establish some test statistic, $T(X_{1}, \dots, X_{n})$. If the observed test statistic looks unlikely to
have come from its null distribution (assuming $\theta = 0$), we reject in favor of the alternative. Otherwise, we fail to reject.

A simple hypothesis is one in which the parameter of interest can only take on one value. Above, the null hypothesis is a simple hypothesis.
A composite hypothesis is a null or alternative hypothesis in which the parameter of interest can take on more than one value. Above, the alternative is a composite hypothesis.

## Power


When we make a decision based on a hypothesis test, there are two undesirable situations:

1. Reject $H_{0}$ when $H_{0}$ is true (**Type I Error**)
2. Fail to reject $H_{0}$ when $H_{0}$ is false and should be rejected (**Type II Error**)

Fortunately, if we conduct a test at significance level $\alpha$, the probability of committing a Type I Error is controlled by $\alpha$: $\mathbb{P}(\text{Type I Error}) \leq \alpha$.

Thus, when $\alpha$ is low enough, the probability that we reject the null when we should not is low. Now, we want the probability of committing a Type II Error to be as low as possible. The power of a test is $\text{Power} = \mathbb{P}(\text{reject } H_{0} | H_{1}) = 1 - \mathbb{P}(\text{fail to reject } H_{0}|H_{1}) = 1 - \mathbb{P}(\text{Type II Error})$; thus, we want our power to be high.


### Example

Let $X_{1}, \dots, X_{n} \sim \text{Bernoulli}(p)$, with $0 < p < 1$ unknown.

Let our hypotheses be:
$$H_{0}: p = p_{0}$$ 
$$H_{1}: p > p_{0}$$
and our test statistic is:
$$T(X_{1}, \dots, X_{n}) = \frac{\hat{p} - p_{0}}{\sqrt{\frac{p_{0}(1-p_{0})}{n}}}$$
where $\hat{p} = \frac{1}{n}\sum_{i=1}^{n}X_{i}$. Under $H_{0}$, the central limit theorem gives $T \sim N(0,1)$ approximately for large $n$.

What are the type I and type II error rates?

The type I error rate is $\alpha$.

The type II error rate is: $\mathbb{P}(\text{Type II error}) = \mathbb{P}(\text{fail to reject } H_{0}|H_{1})$. Under $H_{1}$, $p$ can take on any value in the interval $(p_{0}, 1)$.

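Since $H_{1}$ is composite, the type II error rate depends on the true value of $p$. For a specific alternative $p = p_{1}$ with $p_{1} > p_{0}$, we have $\mathbb{P}(\text{Type II Error}) = \mathbb{P}(\text{fail to reject } H_{0}|p = p_{1})$. Since we reject when $T \geq z_{1-\alpha}$, where $z_{1-\alpha}$ is the $1-\alpha$ quantile of $N(0,1)$, we write the type II error rate as: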

$$\mathbb{P}(\text{Type II Error}) = \mathbb{P}(T(X_{1},\dots,X_{n}) < z_{1-\alpha}|p=p_{1}) = \mathbb{P}\left( \frac{\hat{p} - p_{0}}{\sqrt{\frac{p_{0}(1-p_{0})}{n}}} < z_{1-\alpha}|p=p_{1}\right)$$

Under the specific alternative hypothesis $p = p_{1}$, we know that $\frac{\hat{p} - p_{1}}{\sqrt{\frac{p_{1}(1-p_{1})}{n}}} \sim N(0, 1)$ approximately.

After some algebra, the event $T < z_{1-\alpha}$ is equivalent to $\frac{\hat{p} - p_{1}}{\sqrt{\frac{p_{1}(1-p_{1})}{n}}} < \frac{(p_{0} - p_{1})\sqrt{n}}{\sqrt{p_{1}(1-p_{1})}} + z_{1-\alpha} \sqrt{\frac{p_{0}(1-p_{0})}{p_{1}(1-p_{1})}}$ and as a result,

$$\mathbb{P}(\text{Type II Error}) = \mathbb{P}\left(  \frac{\hat{p} - p_{1}}{\sqrt{\frac{p_{1}(1-p_{1})}{n}}} < \frac{(p_{0} - p_{1})\sqrt{n}}{\sqrt{p_{1}(1-p_{1})}} + z_{1-\alpha} \sqrt{\frac{p_{0}(1-p_{0})}{p_{1}(1-p_{1})}} \bigg| \thinspace p=p_{1}\right) = \Phi \left( \frac{(p_{0} - p_{1})\sqrt{n}}{\sqrt{p_{1}(1-p_{1})}} + z_{1-\alpha} \sqrt{\frac{p_{0}(1-p_{0})}{p_{1}(1-p_{1})}}  \right)$$

where $\Phi$ is the standard normal CDF. Consequently, the power at $p = p_{1}$ is

$$\text{power }  = 1 - \Phi \left( \frac{(p_{0} - p_{1})\sqrt{n}}{\sqrt{p_{1}(1-p_{1})}} + z_{1-\alpha} \sqrt{\frac{p_{0}(1-p_{0})}{p_{1}(1-p_{1})}}  \right)$$
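The power formula above can be evaluated numerically. The following is a minimal sketch using only the Python standard library; the function name `bernoulli_z_test_power` and the example values of $p_{0}$, $p_{1}$, $n$, and $\alpha$ are illustrative assumptions, not part of the original notes, and the result inherits the normal approximation used in the derivation.

```python
from math import sqrt
from statistics import NormalDist


def bernoulli_z_test_power(p0, p1, n, alpha=0.05):
    """Approximate power at p = p1 of the one-sided test H0: p = p0 vs. H1: p > p0.

    Hypothetical helper: it evaluates the closed-form expression derived above.
    """
    std_normal = NormalDist()  # N(0, 1)
    z = std_normal.inv_cdf(1 - alpha)  # z_{1 - alpha}
    sd1 = sqrt(p1 * (1 - p1))
    shift = (p0 - p1) * sqrt(n) / sd1
    scale = sqrt(p0 * (1 - p0)) / sd1
    type_ii_error = std_normal.cdf(shift + z * scale)
    return 1 - type_ii_error


# Illustrative values only: the power grows with the sample size.
for n in (25, 100, 400):
    print(n, round(bernoulli_z_test_power(0.5, 0.6, n), 3))
```

When $p_{1} = p_{0}$ the expression reduces to $1 - \Phi(z_{1-\alpha}) = \alpha$, which is a quick sanity check on the formula.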

# Types of tests

Notice that for the following tests, we can modify the p-value calculation based on the alternative hypothesis; the p-values below are written for two-sided alternatives.


## Z-test


### One-sample z-test


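Consider a sample $X_{1},\dots, X_{n}$ with unknown mean $\mu$, where we are given the population standard deviation $\sigma$.
We want to test the following hypotheses: $H_{0}: \mu = \mu_{0}$ and $H_{1}: \mu \neq \mu_{0}$. Then, we can perform a z-test with $Z = \frac{ {\bar X} - \mu_{0} }{ \sigma / \sqrt{n} }$, which, under the null hypothesis, follows a $N(0, 1)$ distribution (exactly if the data are normal, approximately for large $n$ otherwise).

We reject $H_{0}$ at significance level $\alpha$ when $| Z | > z_{1-\alpha/2}$. The p-value is ${\mathbb P}\left( | Z |  \geq | z_{\text{obs}} | \bigg| H_{0} \right) = 2\left(1 - \Phi(| z_{\text{obs}} |)\right)$, where $z_{\text{obs}}$ is the observed value of the test statistic and $\Phi$ is the standard normal CDF.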
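As a rough illustration of the one-sample z-test, the sketch below computes the statistic and the two-sided p-value $2(1 - \Phi(|z_{\text{obs}}|))$. The function name `one_sample_z_test` and the sample values are hypothetical, and the population standard deviation is assumed to be known.

```python
from math import sqrt
from statistics import NormalDist, fmean


def one_sample_z_test(sample, mu0, sigma):
    """Return (z, two-sided p-value) for H0: mu = mu0 with known sigma.

    Hypothetical helper for illustration; sigma is assumed known.
    """
    n = len(sample)
    z = (fmean(sample) - mu0) / (sigma / sqrt(n))
    # Two-sided p-value: probability of a statistic at least as extreme as z.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Made-up data: the sample mean is 3, tested against mu0 = 2.
z, p = one_sample_z_test([1, 2, 3, 4, 5], mu0=2, sigma=sqrt(5))
print(z, p)
```

For a one-sided alternative such as $H_{1}: \mu > \mu_{0}$, the last step would instead use `1 - NormalDist().cdf(z)`.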




### Two-sample z-test

Consider a sample $X_{11},\dots, X_{1n_{1}}$ from one population and an independent sample $X_{21},\dots, X_{2n_{2}}$ from another population.

Use this test when we wish to determine if the means of two populations are equal and we know the population standard deviations. The hypotheses are $H_{0}: \mu_{1} = \mu_{2}$ and $H_{1}: \mu_{1} \neq \mu_{2}$.

Our test statistic is: $Z = \frac{ ({\bar X}_{1} - {\bar X}_{2}) - (\mu_{1} - \mu_{2} ) }{ \sqrt{ \frac{\sigma_{1}^{2}}{n_{1}} + \frac{ \sigma_{2}^{2} }{n_{2}} } }$

Under $H_{0}$, $\mu_{1} - \mu_{2} = 0$, and $Z$ follows a $N(0, 1)$.

The p-value is thus ${\mathbb P}\left( | Z |  \geq | z_{\text{obs}} | \bigg| H_{0} \right) = 2\left(1 - \Phi(| z_{\text{obs}} |)\right)$, where $z_{\text{obs}}$ is the observed value of the test statistic.
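The two-sample version can be sketched the same way. The helper name `two_sample_z_test` and the data are illustrative assumptions; both population standard deviations are assumed known, the samples are assumed independent, and the null value of $\mu_{1} - \mu_{2}$ is taken to be 0.

```python
from math import sqrt
from statistics import NormalDist, fmean


def two_sample_z_test(sample1, sample2, sigma1, sigma2):
    """Return (z, two-sided p-value) for H0: mu1 = mu2 with known sigmas.

    Hypothetical helper for illustration only.
    """
    n1, n2 = len(sample1), len(sample2)
    standard_error = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    # Under H0 the term (mu1 - mu2) in the numerator is zero.
    z = (fmean(sample1) - fmean(sample2)) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Made-up data with sample means 5 and 4.
z, p = two_sample_z_test([2, 4, 6, 8], [1, 3, 5, 7], sqrt(2), sqrt(2))
print(z, p)
```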


## T-test

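Consider a sample $X_{1},\dots, X_{n}$, but now we do not know the population standard deviation. We estimate $\sigma$ with the sample standard deviation $\hat \sigma$, which introduces another source of uncertainty. Then, we can perform a t-test with $T = \frac{ {\bar X} - \mu_{0} }{ {\hat \sigma} / \sqrt{n} }$, which, under the null hypothesis, follows a $t_{n-1}$ distribution.


The p-value is thus ${\mathbb P}\left( | T |  \geq | t_{\text{obs}} | \bigg| H_{0} \right)$, computed under the $t_{n-1}$ distribution, where $t_{\text{obs}}$ is the observed value of the test statistic. Equivalently, we reject $H_{0}$ at significance level $\alpha$ when $| T | > t_{1-\alpha/2, n-1}$.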


We can modify the p-value calculation based on the alternative hypothesis.

For the test statistic to have an exact t-distribution, the data need to be normal. If the data are not normal, the test statistic has an approximate t-distribution when the sample size is large; a common rule of thumb is a sample size of at least 30.
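A one-sample t-test can be sketched with the standard library alone. Since the standard library has no t-distribution CDF, this sketch integrates the t density numerically with Simpson's rule; that is a choice made for this example, and a statistics package would normally be used instead. The function names and data are hypothetical, and the degrees of freedom are taken to be $n - 1$, the usual value for a one-sample t-test.

```python
from math import exp, lgamma, pi, sqrt
from statistics import fmean, stdev


def t_pdf(t, df):
    """Density of the t-distribution with df degrees of freedom."""
    c = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def t_two_sided_p(t_obs, df, steps=2000):
    """P(|T| >= |t_obs|) for T ~ t_df, via Simpson's rule (steps must be even)."""
    b = abs(t_obs)
    if b == 0:
        return 1.0
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = 2 * total * h / 3  # P(-b <= T <= b), using symmetry
    return max(0.0, 1 - central_mass)


def one_sample_t_test(sample, mu0):
    """Return (t, two-sided p-value) for H0: mu = mu0 with unknown sigma."""
    n = len(sample)
    # stdev uses the n - 1 denominator; it must be nonzero here.
    t_obs = (fmean(sample) - mu0) / (stdev(sample) / sqrt(n))
    return t_obs, t_two_sided_p(t_obs, n - 1)


# Made-up data: sample mean 3 tested against mu0 = 2.
print(one_sample_t_test([1, 2, 3, 4, 5], mu0=2))
```

With one degree of freedom the t-distribution is the Cauchy distribution, for which $\mathbb{P}(|T| \geq 1) = 0.5$ exactly; this gives a simple check on the numerical integration.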


### Two-sample t-test (pooled variance)




### Two-sample t-test (unpooled variance)



## Likelihood Ratio Tests

In the case of simple hypotheses, we can create a likelihood ratio test between them. In other words, we can consider a simple-vs.-simple hypothesis test for parameter $\theta$: 

$H_{0}: \theta = \theta_{0}$

$H_{1}: \theta = \theta_{1}$

Then, we can write the likelihood-ratio test statistic as: $\Lambda = \frac{ {\mathcal L} ( \theta_{1} | x) }{ {\mathcal L} (\theta_{0} | x)}$. 
Here, we can think of this as comparing between two models. The likelihood-ratio test thus gives the following decision rule:  

- If $\Lambda > c$, reject $H_{0}$
- Otherwise, fail to reject $H_{0}$

Notice that other sources may consider the reciprocal of the $\Lambda$ defined here, in which case the decision rule is also reversed.


 - Neyman-Pearson Lemma: Among all tests of $H_{0}: \theta = \theta_{0}$ vs. $H_{1}: \theta = \theta_{1}$ with significance level $\alpha' \leq \alpha$, none has power greater than that of the LRT with significance level $\alpha$. In other words, when we have simple hypotheses, the LRT is the most powerful test at that significance level.


### Example

Let $X_{1},\dots, X_{n} \overset{i.i.d}{\sim} \text{Poisson}(\lambda)$ with $\lambda > 0$. Furthermore, let $H_{0}: \lambda = \lambda_{0}$ and  $H_{1}: \lambda = \lambda_{1}$. Also, assume that $\lambda_{1} > \lambda_{0}$.

Our likelihood function is, for some $\lambda$:

$\mathcal{L}(\lambda) = \prod_{i=1}^{n} \frac{e^{-\lambda}\lambda^{X_{i}}}{X_{i}!}= \frac{\exp \left\{ -n\lambda \right\} \lambda^{\sum_{i=1}^{n} X_{i}}}{ \prod_{i=1}^{n} X_{i} ! }$

Then, the likelihood-ratio test statistic is

$$\Lambda = \frac{ {\mathcal L} ( \lambda_{1} | x) }{ {\mathcal L} (\lambda_{0} | x)} = \frac{\exp \left\{ -n\lambda_{1} \right\} \lambda_{1}^{\sum_{i=1}^{n} X_{i}}}{ \prod_{i=1}^{n} X_{i} ! } \frac{ \prod_{i=1}^{n} X_{i} !  }{ \exp \left\{ -n\lambda_{0} \right\} \lambda_{0}^{\sum_{i=1}^{n} X_{i}}  } = \exp\left\{ -n (\lambda_{1} - \lambda_{0}) \right\} \left( \frac{\lambda_{1}}{\lambda_{0}} \right)^{\sum_{i=1}^{n} X_{i} }$$

Notice that when $\Lambda$ is high, the data are much more likely to have come from the alternative distribution than from the null distribution, so we want to reject when $\Lambda > b$ for some $b$. Since $\lambda_{1} > \lambda_{0}$, $\Lambda$ is increasing in $\sum_{i=1}^{n} X_{i}$, so this is equivalent to rejecting when $\sum_{i=1}^{n} X_{i} > c$ for some $c$. We know that the sum of $n$ i.i.d. $Pois(\lambda)$ random variables is also Poisson, with rate $n\lambda$: under $H_{0}$, the sum is $Pois(n\lambda_{0})$, and under $H_{1}$, it is $Pois(n\lambda_{1})$. Then, for a test at significance level $\alpha$, our rejection region is $\sum_{i=1}^{n} X_{i} > c$, where $c$ is the $1-\alpha$ quantile of a $Pois(n\lambda_{0})$. Because the Poisson distribution is discrete, the actual type I error rate may be strictly less than $\alpha$.
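The rejection region for this Poisson example can be computed directly. The sketch below finds the critical value $c$ as the $1-\alpha$ quantile of $Pois(n\lambda_{0})$ and then evaluates the rejection probability under either hypothesis; the function names and the numeric values of $n$, $\lambda_{0}$, $\lambda_{1}$, and $\alpha$ are illustrative assumptions.

```python
from math import exp, lgamma, log


def poisson_pmf(k, rate):
    """P(S = k) for S ~ Pois(rate), computed on the log scale for stability."""
    return exp(-rate + k * log(rate) - lgamma(k + 1))


def critical_value(n, lam0, alpha):
    """Smallest integer c with P(S <= c) >= 1 - alpha for S ~ Pois(n * lam0)."""
    rate = n * lam0
    c, cdf = 0, poisson_pmf(0, rate)
    while cdf < 1 - alpha:
        c += 1
        cdf += poisson_pmf(c, rate)
    return c


def rejection_probability(c, n, lam):
    """P(sum of X_i > c) when the true rate is lam."""
    cdf = sum(poisson_pmf(k, n * lam) for k in range(c + 1))
    return 1 - cdf


# Illustrative values only: n = 10, lambda_0 = 1, lambda_1 = 2, alpha = 0.05.
c = critical_value(10, 1.0, 0.05)
print("critical value:", c)
print("type I error rate:", rejection_probability(c, 10, 1.0))
print("power at lambda_1:", rejection_probability(c, 10, 2.0))
```

Evaluating `rejection_probability` at $\lambda_{0}$ gives the actual type I error rate, which is at most $\alpha$ because of discreteness; evaluating it at $\lambda_{1}$ gives the power.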




---

Note: There are different variations of the likelihood ratio test, including the generalized likelihood ratio test, which compares the maximized likelihood over the entire parameter space with the maximized likelihood over the set of values allowed under the null hypothesis.

Under the generalized likelihood ratio test, we have the following setup: $H_{0}: \theta \in \Theta_{0}$ versus $H_{1}: \theta \notin \Theta_{0}$, and our likelihood ratio test statistic is $\lambda = 2 \log\left( \frac{ \sup_{\theta \in \Theta} {\mathcal L}(\theta)  }{ \sup_{\theta \in \Theta_{0}} {\mathcal L}(\theta) } \right)=2 \log\left( \frac{ {\mathcal L}({\hat \theta}) }{ {\mathcal L}({\hat \theta}_{0}) } \right)$
where ${\hat \theta}$ is the MLE under the entire space and $\hat \theta_{0}$ is the MLE under the restricted space under the null.

Furthermore, there is a useful theorem about the limiting distribution of the LRT. 

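Let $\theta = (\theta_{1},\dots, \theta_{q}, \theta_{q+1},\dots, \theta_{r})$ and say we want to test the null $H_{0}: (\theta_{q+1},\dots, \theta_{r} ) = (\theta_{0, q+1},\dots, \theta_{0, r})$. Then, under $H_{0}$ and suitable regularity conditions, $\lambda$ converges in distribution to $\chi^{2}_{r-q}$, so we reject at significance level $\alpha$ when $\lambda$ exceeds the $1-\alpha$ quantile of the $\chi^{2}_{r-q}$ distribution.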
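As a small worked case of the limiting result, consider testing $H_{0}: \lambda = \lambda_{0}$ against $H_{1}: \lambda \neq \lambda_{0}$ for i.i.d. Poisson data, so that $r = 1$ and $q = 0$ and the reference distribution is $\chi^{2}_{1}$. The sketch below is an illustration under those assumptions; the function name and data are hypothetical. It uses the fact that for one degree of freedom, $\mathbb{P}(\chi^{2}_{1} > x) = \mathbb{P}(|Z| > \sqrt{x})$ with $Z \sim N(0,1)$, which can be written with the complementary error function.

```python
from math import erfc, log, sqrt


def poisson_glrt(sample, lam0):
    """Return (statistic, asymptotic p-value) for H0: rate = lam0.

    The statistic is 2 * (log-likelihood at the MLE minus log-likelihood at lam0).
    """
    n = len(sample)
    total = sum(sample)
    mle = total / n  # unrestricted MLE of the Poisson rate
    if total == 0:
        stat = 2 * n * lam0  # limit of the general formula as the MLE goes to 0
    else:
        stat = 2 * (total * log(mle / lam0) - n * (mle - lam0))
    stat = max(stat, 0.0)  # guard against tiny negative rounding error
    # Survival function of chi-square with 1 degree of freedom.
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value


# Made-up counts with sample mean 4, tested against lam0 = 2.
print(poisson_glrt([3, 5, 4, 6, 2], lam0=2.0))
```

The p-value is asymptotic, so for a sample this small it should be read as a rough approximation.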


---

## When to use

We use z-tests and t-tests when we want to do one-sample or two-sample tests, and if we know the variance, we opt for the z-test. Generally, however, we do not, so we estimate the variance and use a t-test to account for this added uncertainty. Likelihood ratio tests are used when we are comparing simple hypotheses and to compare the goodness-of-fit of two models. Generalized likelihood ratio tests are a general method for testing composite hypotheses.

Notice that z-tests and t-tests in particular are limited to comparing at most 2 groups. 

When we want to test for more than two groups, then we consider ANOVA, MANOVA, etc.

## ANOVA

ANOVA is used to determine whether or not there is a statistically significant difference between means of at least 2 independent groups. ANOVA is used when we are dealing with at least two categories and comparing based on a continuous outcome variable. For instance, perhaps we want to examine if at least 2 different classroom setups lead to different average exam scores. 
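As a rough sketch of how such a comparison is computed, one-way ANOVA compares the variability between group means with the variability within groups through an F statistic. The code below computes only the statistic and its degrees of freedom; the function name and the exam scores are made up for illustration, and converting the statistic to a p-value requires the F distribution, which is left to a statistics package.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """Return (F statistic, (df_between, df_within)) for a list of groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = fmean([x for g in groups for x in g])
    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: observations around their own group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within, (k - 1, n_total - k)


# Made-up exam scores for three classroom setups.
print(one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]]))
```

A large F statistic indicates that the group means differ by more than the within-group variability would suggest.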


### Example






## $\chi^{2}$ tests


### Goodness-of-fit $\chi^{2}$ test



### $\chi^{2}$ test for Independence


### $\chi^{2}$ test for Homogeneity

## Sources

Wasserman, L. (2010). All of statistics: a concise course in statistical inference. New York: Springer. ISBN: 9781441923226 1441923225