# Hypothesis testing

## 0. Intro

* **Null hypothesis** ($H_0$): the assumption that the observed effect is due to chance alone.

* **Alternative hypothesis** ($H_1$): the claim that the observations are the result of a real effect.

* **Level of significance** ($\alpha$): a probability threshold; the null hypothesis is rejected when the $p$-value falls below it.

#### How to run a test:

1. State the relevant null and alternative hypotheses.
2. Pick a test statistic $T$.
3. Derive the distribution of the statistic under the null hypothesis.
4. Select a significance level $α$. Common values are $5\%$ and $1\%$.
5. Compute the observed value $t_{obs}$ of the test statistic $T$.
6. Decide to either reject $H_0$ in favor of the alternative or not reject it.
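The six steps can be sketched on a small example. The data (16 heads in 20 coin flips), the one-sided alternative, and the helper name `binom_pmf` are assumptions made for illustration only.

```python
from math import comb

# Hypothetical data: 16 heads observed in 20 coin flips.
n, t_obs = 20, 16

# Step 1: H0: p = 0.5 (fair coin) against H1: p > 0.5 (biased towards heads).
p0 = 0.5

# Step 2: the test statistic T is the number of heads.
# Step 3: under H0, T ~ Bin(n, p0), so its distribution is known exactly.
def binom_pmf(k, n, p):
    """P(T = k) for T ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Step 4: significance level.
alpha = 0.05

# Step 5: t_obs is given above; compute P(T >= t_obs | H0).
p_value = sum(binom_pmf(k, n, p0) for k in range(t_obs, n + 1))

# Step 6: reject H0 if the p-value does not exceed alpha.
reject_h0 = p_value <= alpha
print(f"p-value = {p_value:.4f}, reject H0: {reject_h0}")
# prints "p-value = 0.0059, reject H0: True"
```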

#### Errors

* __Type I error__: rejecting a true null hypothesis. For a test with significance level $\alpha$, $P(Error_I) \leq \alpha$.
* __Type II error__: failing to reject a false null hypothesis.

__Power__ - the probability that the test rejects $H_0$ when a specific alternative $H_1$ is true:

$$\textrm{pow} = P(\textrm{reject }H_0| H_1) = 1 - P(\textrm{accept }H_0| H_1) = 1 - P(Error_{II})$$
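As a rough illustration, the power of a one-sided test on a binomial count can be computed exactly. The numbers below ($n = 20$, $p_0 = 0.5$ under $H_0$, $p_1 = 0.8$ under $H_1$) and the helper names `binom_sf` and `critical_value` are assumptions chosen for this sketch.

```python
from math import comb

def binom_sf(c, n, p):
    """P(T >= c) for T ~ Bin(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c, n + 1))

def critical_value(n, p0, alpha):
    """Smallest c such that P(T >= c | H0) <= alpha; the test rejects when T >= c."""
    for c in range(n + 2):  # c = n + 1 gives an empty rejection region
        if binom_sf(c, n, p0) <= alpha:
            return c

# Hypothetical setting: H0: p = 0.5 against the specific alternative H1: p = 0.8.
n, p0, p1, alpha = 20, 0.5, 0.8, 0.05

c = critical_value(n, p0, alpha)
type_i = binom_sf(c, n, p0)   # P(reject H0 | H0), at most alpha
power = binom_sf(c, n, p1)    # P(reject H0 | H1)
type_ii = 1 - power           # P(not reject H0 | H1)

print(f"reject when T >= {c}")
print(f"P(Type I) = {type_i:.4f}, power = {power:.4f}, P(Type II) = {type_ii:.4f}")
```

Because the statistic is discrete, the actual Type I error probability is usually strictly below $\alpha$ rather than equal to it.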

__How to pick a criterion__: among all tests whose Type I error probability does not exceed $\alpha$, select the one with the highest power.

__$p$-value__ - the probability, under the null hypothesis, of obtaining a result equal to or more extreme than what was actually observed:

$$p = P(T \geq t_{obs} | H_0)$$

__NB__: Statistical significance does not imply practical significance.

## 1. Binomial test for proportion

The test can be used when we have the results of a set of experiments with only 2 possible outcomes each. In this setting we can check hypotheses about the share $p$ of experiments with a positive outcome.

**Sample**: $X^n = (X_1, \dots, X_n)$

**$H_0$**: $X \sim Bernoulli(p_0)$

**$H_1$**: $X \sim Bernoulli(p)$, where $p < p_0$, $p \neq p_0$, or $p > p_0$

**Statistic**: $T(X^n) = \sum_{i=1}^{n} X_i$

**Null distribution**: $T(X^n) \sim Bin(n, p_0)$
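A minimal sketch of the exact test for the three kinds of alternative follows. The function name `binom_test` and the argument `p0` (the success probability under $H_0$) are assumptions of this example. For the two-sided case there are several conventions; the one used here sums the probabilities of all outcomes that are no more likely than the observed one.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(T = k) for T ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_test(t_obs, n, p0, alternative="two-sided"):
    """Exact p-value for H0: p = p0 given t_obs successes in n trials."""
    pmf = [binom_pmf(k, n, p0) for k in range(n + 1)]
    if alternative == "greater":   # H1: p > p0
        return sum(pmf[t_obs:])
    if alternative == "less":      # H1: p < p0
        return sum(pmf[: t_obs + 1])
    if alternative == "two-sided": # H1: p != p0
        # Outcomes at most as likely as the observed one;
        # the small tolerance guards against floating-point ties.
        threshold = pmf[t_obs] * (1 + 1e-7)
        return min(1.0, sum(q for q in pmf if q <= threshold))
    raise ValueError("alternative must be 'two-sided', 'less' or 'greater'")

# Hypothetical data: 16 successes in 20 trials, H0: p = 0.5.
for alt in ("greater", "less", "two-sided"):
    print(alt, round(binom_test(16, 20, 0.5, alt), 4))
```

With a symmetric null distribution ($p_0 = 1/2$) the two-sided p-value is twice the smaller one-sided p-value; for other $p_0$ this generally does not hold.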

## 2. $\chi^2$ test

With this test you can check if the sample $X^n$ has a particular distribution $F(x)$.

**Sample**: $X^n = (X_1, \dots, X_n)$

**$H_0$**: $X \sim F(x)$

**$H_1$**: $X \nsim F(x)$

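**Statistic**: $T(X^n) = \sum_{i=1}^{K} \frac{(n_i - np_i)^2}{np_i}$, where $K$ is the number of bins, $n_i$ is the number of elements in the $i$-th bin, and $p_i$ is the probability under $F$ of falling into the $i$-th bin

**Null distribution**: $T(X^n) \sim \chi^2_{K-1}$ approximately for large $n$, when $F$ is fully specified (each parameter of $F$ estimated from the sample reduces the degrees of freedom by one)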
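The statistic and its p-value can be sketched with the standard library only. The die-roll counts are invented for illustration, and `chi2_sf` and `chi2_gof` are hypothetical helper names. The sketch assumes a fully specified $F$, so the degrees of freedom are the number of bins minus one, and it evaluates the $\chi^2$ tail probability with a power series for the incomplete gamma function that is adequate only for moderate values of the statistic.

```python
import math

def chi2_sf(x, df, terms=500):
    """P(chi2_df >= x), via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 1.0
    a, z = df / 2, x / 2
    term = 1.0 / a
    total = term
    for n in range(1, terms):
        term *= z / (a + n)
        total += term
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def chi2_gof(counts, probs):
    """Pearson statistic and p-value for observed bin counts vs. bin probabilities."""
    n = sum(counts)
    stat = sum((n_i - n * p_i) ** 2 / (n * p_i) for n_i, p_i in zip(counts, probs))
    df = len(counts) - 1  # assumes no parameters of F were estimated from the data
    return stat, chi2_sf(stat, df)

# Hypothetical data: 120 rolls of a die, H0: all six faces are equally likely.
counts = [25, 17, 15, 23, 24, 16]
probs = [1 / 6] * 6

stat, p_value = chi2_gof(counts, probs)
print(f"chi2 = {stat:.3f}, p-value = {p_value:.4f}")
```

The approximation is usually considered reliable only when the expected counts $np_i$ are not too small; a common rule of thumb asks for at least 5 per bin.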

## 3. Relation to confidence intervals

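For a null hypothesis $H_0$: $\theta = \theta_0$ against the two-sided alternative $H_1$: $\theta \neq \theta_0$, the $p$-value can be found using confidence intervals: it is the largest $\alpha$ for which the $(1-\alpha)$ confidence interval still contains $\theta_0$.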

$$p\textrm{-value} = \max \left\{ \alpha: \theta_0 \in \textrm{CI}_{1-\alpha} \right\} $$
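This relation can be checked numerically. The sketch below assumes a two-sided $z$-test for a normal mean with known standard deviation, which is not part of the notes above; the sample values and the function names are hypothetical. It finds the largest $\alpha$ whose interval still contains $\theta_0$ by bisection and compares it with the directly computed p-value.

```python
from statistics import NormalDist

std_normal = NormalDist()

def z_confidence_interval(mean, sigma, n, alpha):
    """(1 - alpha) confidence interval for the mean with known sigma."""
    half_width = std_normal.inv_cdf(1 - alpha / 2) * sigma / n**0.5
    return mean - half_width, mean + half_width

def z_test_p_value(mean, sigma, n, theta0):
    """Two-sided p-value of the z-test for H0: theta = theta0."""
    z = (mean - theta0) / (sigma / n**0.5)
    return 2 * (1 - std_normal.cdf(abs(z)))

def p_value_from_ci(mean, sigma, n, theta0, tol=1e-10):
    """Largest alpha such that theta0 lies inside the (1 - alpha) interval."""
    lo, hi = 0.0, 1.0  # wide interval (alpha near 0) contains theta0, narrow one does not
    while hi - lo > tol:
        mid = (lo + hi) / 2
        left, right = z_confidence_interval(mean, sigma, n, mid)
        if left <= theta0 <= right:
            lo = mid
        else:
            hi = mid
    return lo

# Hypothetical sample: mean 10.4 over n = 25 observations, known sigma = 1, theta0 = 10.
p_direct = z_test_p_value(10.4, 1.0, 25, 10.0)
p_ci = p_value_from_ci(10.4, 1.0, 25, 10.0)
print(f"direct: {p_direct:.6f}, from confidence intervals: {p_ci:.6f}")
```

The bisection works because the interval shrinks monotonically as $\alpha$ grows, so $\theta_0$ drops out of it exactly once.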