## Some Basic Concepts

|  | $H_0$ is true | $H_0$ is false |
|---|---|---|
| Do not reject $H_0$ | Correct decision | Type II Error |
| Reject $H_0$ | Type I Error | Correct decision |

- $\alpha = P(\text{Type I Error})$ = level of significance = size of the test
- $\beta = P(\text{Type II Error})$ = 1 - power of the test
- $\alpha$ can be reduced by choosing a more restrictive (smaller) critical region, at the expense of increasing $\beta$ (intuitively, the test becomes more forgiving of $H_0$). The only way to decrease both $\alpha$ and $\beta$ at the same time is to increase the sample size.
- Wikipedia's definition of the $p$-value: the $p$-value is the probability, under a given statistical model (or under the null hypothesis), that the **test statistic** is equal to or more extreme than its observed value. **Under the null hypothesis, the $p$-value is itself a random variable, uniformly distributed on $[0, 1]$** (for a continuous test statistic and a simple null hypothesis).
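A quick simulation can make the last two bullets concrete. The sketch below assumes a two-sided one-sample Z-test with known $\sigma$ (all parameter values are arbitrary): when $H_0$ is true, the fraction of $p$-values below $\alpha$, i.e. the empirical Type I error rate, should be close to $\alpha$, and the $p$-values should look uniform on $[0, 1]$.

```python
import numpy as np
from scipy import stats


def simulate_null_pvalues(n_sims=20_000, n=30, mu0=0.0, sigma=1.0, seed=0):
    """Two-sided Z-test p-values for H0: mu = mu0, with data drawn under H0."""
    rng = np.random.default_rng(seed)
    # Each row is one sample of size n generated with H0 true
    samples = rng.normal(mu0, sigma, size=(n_sims, n))
    z = (samples.mean(axis=1) - mu0) / (sigma / np.sqrt(n))
    return 2 * stats.norm.sf(np.abs(z))


p_values = simulate_null_pvalues()
alpha = 0.05
type_i_rate = np.mean(p_values < alpha)  # empirical Type I error rate
print(f"empirical Type I error rate: {type_i_rate:.4f} (alpha = {alpha})")
# Kolmogorov-Smirnov comparison of the p-values against Uniform(0, 1)
print("KS p-value vs Uniform(0, 1):", stats.kstest(p_values, "uniform").pvalue)
```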

## Likelihood Ratio Test

The following is from [Wikipedia](https://en.wikipedia.org/wiki/Likelihood-ratio_test). The LR test applies to a wide range of problems involving MLEs, but in general it requires the **competing models to be nested**.
![image.png](attachment:image.png)
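As a small worked instance, consider testing $H_0: p = p_0$ for a binomial proportion, where the null model ($p$ fixed) is nested in the full model ($p$ free). The sketch below assumes $x$ successes in $n$ trials with $0 < x < n$, and relies on Wilks' theorem: $-2\log\Lambda$ is asymptotically $\chi^2$ with degrees of freedom equal to the number of parameters constrained by $H_0$, here 1. The function name and the counts are illustrative.

```python
import numpy as np
from scipy import stats


def binomial_lr_test(x, n, p0):
    """Likelihood-ratio test of H0: p = p0 for x successes in n trials (0 < x < n)."""
    p_hat = x / n  # MLE under the full (unrestricted) model

    def loglik(p):
        # Binomial log-likelihood without the constant log C(n, x) term,
        # which cancels in the ratio
        return x * np.log(p) + (n - x) * np.log(1 - p)

    # -2 log(Lambda) = 2 * [loglik(MLE) - loglik(restricted)]
    lr_stat = 2 * (loglik(p_hat) - loglik(p0))
    # One parameter is fixed under H0, so df = 1
    p_value = stats.chi2.sf(lr_stat, df=1)
    return lr_stat, p_value


lr_stat, p_value = binomial_lr_test(x=60, n=100, p0=0.5)
print(f"LR statistic = {lr_stat:.4f}, p-value = {p_value:.4f}")
```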

## Single Sample: Tests Concerning a Single Mean

Note that the following two tests implicitly assume either a normal population or a sample large enough for the CLT to apply, so that the sample mean is (approximately) normal and $(n-1)s^2/\sigma^2$ is (approximately) an independent $\chi^2_{n-1}$ variable. In fact, people may go as far as substituting the sample standard deviation $s$ for $\sigma$ in the first test when the variance is unknown, and still perform the Z-test. This approximation relies on the consistency of $s$ for $\sigma$, so that $s\approx\sigma$ when the sample size is large.

![mean-0.PNG](attachment:mean-0.PNG)

![mean-1.PNG](attachment:mean-1.PNG)
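The two statistics can be sketched in a few lines. This assumes a two-sided alternative $H_1: \mu \ne \mu_0$; the data and the helper name `one_sample_mean_tests` are hypothetical. When `sigma` is not supplied, the Z-statistic plugs in $s$, which is the large-sample approximation described above. The $t$-statistic can be cross-checked against `scipy.stats.ttest_1samp`.

```python
import numpy as np
from scipy import stats


def one_sample_mean_tests(x, mu0, sigma=None):
    """Two-sided Z- and t-tests of H0: mu = mu0."""
    x = np.asarray(x, dtype=float)
    n = x.size
    xbar = x.mean()
    s = x.std(ddof=1)  # sample standard deviation

    # Z-test: known sigma, or s substituted for sigma when n is large
    scale = sigma if sigma is not None else s
    z = (xbar - mu0) / (scale / np.sqrt(n))
    p_z = 2 * stats.norm.sf(abs(z))

    # t-test: sigma unknown, n - 1 degrees of freedom
    t = (xbar - mu0) / (s / np.sqrt(n))
    p_t = 2 * stats.t.sf(abs(t), df=n - 1)
    return z, p_z, t, p_t


x = np.array([10.2, 9.8, 10.6, 10.1, 10.4, 9.9, 10.7, 10.3])  # hypothetical data
z, p_z, t, p_t = one_sample_mean_tests(x, mu0=10.0)
ref = stats.ttest_1samp(x, popmean=10.0)
print(f"z = {z:.3f} (p = {p_z:.4f}), t = {t:.3f} (p = {p_t:.4f})")
print("scipy t-test:", ref.statistic, ref.pvalue)
```

With only 8 observations the Z-approximation gives a noticeably smaller $p$-value than the $t$-test, since the normal tails are thinner than those of $t_7$.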

## Two Samples: Tests on Two Means

As in the single-sample case, the tests in this section invoke the CLT and certain consistency results. We only show the cases where the variances (equal or not) are unknown, since known variances reduce these t-tests to Z-tests, as shown above.

![pool-mean.PNG](attachment:pool-mean.PNG)

![pool-unequal-mean.PNG](attachment:pool-unequal-mean.PNG)
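A sketch of both statistics for $H_0: \mu_1 - \mu_2 = 0$ with a two-sided alternative (a nonzero hypothesized difference $d_0$ would simply be subtracted in the numerator). The pooled version assumes equal variances; the unequal-variance version uses the Welch-Satterthwaite approximation for the degrees of freedom. The data are hypothetical, and `scipy.stats.ttest_ind` serves as a cross-check.

```python
import numpy as np
from scipy import stats


def pooled_t_test(x1, x2):
    """Two-sample t-test assuming equal (unknown) variances."""
    n1, n2 = len(x1), len(x2)
    v1, v2 = np.var(x1, ddof=1), np.var(x2, ddof=1)
    # Pooled variance estimate with n1 + n2 - 2 degrees of freedom
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t = (np.mean(x1) - np.mean(x2)) / np.sqrt(sp2 * (1 / n1 + 1 / n2))
    df = n1 + n2 - 2
    return t, df, 2 * stats.t.sf(abs(t), df)


def welch_t_test(x1, x2):
    """Two-sample t-test with unequal (unknown) variances."""
    n1, n2 = len(x1), len(x2)
    a = np.var(x1, ddof=1) / n1
    b = np.var(x2, ddof=1) / n2
    t = (np.mean(x1) - np.mean(x2)) / np.sqrt(a + b)
    # Welch-Satterthwaite approximate degrees of freedom
    df = (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))
    return t, df, 2 * stats.t.sf(abs(t), df)


x1 = np.array([5.1, 4.9, 5.6, 5.8, 6.0, 5.4, 5.3])  # hypothetical samples
x2 = np.array([4.6, 4.8, 5.0, 4.4, 5.2, 4.7, 4.9, 4.5])

t_p, df_p, p_p = pooled_t_test(x1, x2)
t_w, df_w, p_w = welch_t_test(x1, x2)
print(f"pooled: t = {t_p:.3f}, df = {df_p}, p = {p_p:.4f}")
print(f"Welch:  t = {t_w:.3f}, df = {df_w:.2f}, p = {p_w:.4f}")
```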

A special two-sample case is that of **paired observations**, where the hypothesis test is performed on the differences of the paired observations, reducing the problem back to the single-sample case.
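For example, with hypothetical before/after measurements on the same units, the paired test is just the one-sample $t$-test applied to the differences, so `scipy.stats.ttest_rel` and `scipy.stats.ttest_1samp` on the differences should agree.

```python
import numpy as np
from scipy import stats

# Hypothetical paired measurements on the same 6 units
before = np.array([72.0, 68.5, 75.2, 70.1, 69.8, 74.0])
after = np.array([70.4, 67.9, 73.1, 69.5, 68.2, 72.6])

d = after - before  # reduce to a single sample of differences
t_manual = d.mean() / (d.std(ddof=1) / np.sqrt(d.size))

paired = stats.ttest_rel(after, before)
one_sample = stats.ttest_1samp(d, popmean=0.0)
print(t_manual, paired.statistic, one_sample.statistic)
```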

## Visualization of Comparing Means

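Besides formal hypothesis testing, a graphical way to compare samples is to plot **box plots side by side**. Keep in mind that a standard box plot displays the median and quartiles rather than the mean, so it compares the centers and spreads of the samples only informally.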
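A minimal matplotlib sketch with two hypothetical groups. Passing `showmeans=True` to `Axes.boxplot` adds a marker at each group's mean on top of the median line. The `Agg` backend is selected only so that the snippet runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
group_a = rng.normal(10.0, 2.0, size=40)  # hypothetical data
group_b = rng.normal(11.5, 2.0, size=40)

fig, ax = plt.subplots()
# Side-by-side box plots; showmeans=True also marks each group's mean
bp = ax.boxplot([group_a, group_b], showmeans=True)
ax.set_xticks([1, 2])
ax.set_xticklabels(["A", "B"])
ax.set_ylabel("measurement")
ax.set_title("Comparing two groups with side-by-side box plots")
# fig.savefig("boxplots.png") or plt.show() to view the figure
```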

## One-Sample: Test on Single Proportion

![proportion-0.PNG](attachment:proportion-0.PNG)
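A sketch of the large-sample version of this test, assuming $x$ successes in $n$ trials and a two-sided alternative $H_1: p \ne p_0$; for small $n$ one would work with binomial probabilities directly instead. The helper name and the counts are hypothetical.

```python
import numpy as np
from scipy import stats


def one_proportion_z_test(x, n, p0):
    """Normal-approximation test of H0: p = p0 (two-sided)."""
    q0 = 1 - p0
    # Standardize the count under H0: mean n*p0, variance n*p0*q0
    z = (x - n * p0) / np.sqrt(n * p0 * q0)
    p_value = 2 * stats.norm.sf(abs(z))
    return z, p_value


z, p_value = one_proportion_z_test(x=60, n=100, p0=0.5)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")  # z = 2.000, p-value = 0.0455
```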

## Two Samples: Test on Two Proportions

This test is based on the normal approximation. In addition, when $H_0: p_1 = p_2$ is true, the two samples share a common proportion, so the variance is computed from the pooled estimate $\hat p = (x_1+x_2)/(n_1+n_2)$.

![proportion-1.PNG](attachment:proportion-1.PNG)

![proportion-2.PNG](attachment:proportion-2.PNG)
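A sketch of the pooled two-proportion $Z$-test (two-sided, hypothetical counts). As a sanity check, $z^2$ coincides with Pearson's chi-square statistic for the corresponding $2\times 2$ table when no continuity correction is applied, which ties this test to the contingency-table tests below.

```python
import numpy as np
from scipy import stats


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided Z-test of H0: p1 = p2 using the pooled proportion."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    # Under H0 both samples share one proportion, estimated by pooling
    p_pool = (x1 + x2) / (n1 + n2)
    q_pool = 1 - p_pool
    se = np.sqrt(p_pool * q_pool * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    return z, 2 * stats.norm.sf(abs(z))


x1, n1, x2, n2 = 120, 200, 240, 500  # hypothetical counts
z, p_value = two_proportion_z_test(x1, n1, x2, n2)

# Same data as a 2x2 table: rows = samples, columns = (successes, failures)
table = np.array([[x1, n1 - x1], [x2, n2 - x2]])
chi2_stat, chi2_p, dof, _ = stats.chi2_contingency(table, correction=False)
print(f"z = {z:.4f}, z^2 = {z**2:.4f}, Pearson chi2 = {chi2_stat:.4f}")
```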

## One- and Two-Sample Tests Concerning Variances

Under the assumption that the underlying population follows a **normal distribution**, the scaled sample variance $(n-1)s^2/\sigma^2$ follows a $\chi^2$ distribution with $n-1$ degrees of freedom. Hence the test statistic for $H_0: \sigma^2=\sigma_0^2$ is
\begin{align}
\chi^2 = \frac{(n-1)s^2}{\sigma_0^2},
\end{align}
where $n$ is the sample size, $s^2$ is the sample variance, and $\sigma_0^2$ is the value given by the null hypothesis. When $H_0$ is true, $\chi^2$ is a value of the chi-square distribution with $\nu=n-1$ degrees of freedom.
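A sketch of this statistic with a two-sided $p$-value, assuming the common convention of doubling the smaller tail probability (the chi-square distribution is not symmetric, so other conventions exist). The data and the helper name are hypothetical.

```python
import numpy as np
from scipy import stats


def chi2_variance_test(x, sigma0_sq):
    """Test H0: sigma^2 = sigma0_sq for a normal sample (two-sided)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    s2 = x.var(ddof=1)  # sample variance
    chi2_stat = (n - 1) * s2 / sigma0_sq
    df = n - 1
    lower = stats.chi2.cdf(chi2_stat, df)
    upper = stats.chi2.sf(chi2_stat, df)
    # Double the smaller tail for a two-sided alternative
    p_value = 2 * min(lower, upper)
    return chi2_stat, p_value


x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # hypothetical sample, s^2 = 2.5
chi2_stat, p_value = chi2_variance_test(x, sigma0_sq=2.5)
print(chi2_stat, p_value)  # statistic equals n - 1 = 4 when s^2 == sigma0^2
```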

To test the equality of the variances $\sigma_1^2$ and $\sigma_2^2$ of two populations that are **independent and approximately normally distributed**, the $f$-value for testing $H_0: \sigma_1^2=\sigma_2^2$ is the ratio
\begin{align}
f=\frac{s_1^2}{s_2^2},
\end{align}
where $s_1^2$ and $s_2^2$ are the sample variances computed from the two samples. When the null hypothesis is true, the $f$ ratio is a value of the $F$-distribution with $\nu_1=n_1-1$ and $\nu_2=n_2-1$ degrees of freedom.
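A sketch of the two-sided $F$-test on hypothetical data, doubling the smaller tail probability to get the two-sided $p$-value. Swapping the two samples inverts $f$ and swaps the degrees of freedom, which should leave the two-sided $p$-value unchanged.

```python
import numpy as np
from scipy import stats


def f_test_equal_variances(x1, x2):
    """Two-sided F-test of H0: sigma1^2 = sigma2^2 for independent normal samples."""
    s1_sq = np.var(x1, ddof=1)
    s2_sq = np.var(x2, ddof=1)
    f = s1_sq / s2_sq
    df1, df2 = len(x1) - 1, len(x2) - 1
    lower = stats.f.cdf(f, df1, df2)
    upper = stats.f.sf(f, df1, df2)
    # Double the smaller tail probability for a two-sided alternative
    return f, 2 * min(lower, upper)


x1 = np.array([12.1, 11.4, 13.0, 12.7, 11.9, 12.5, 13.3])  # hypothetical samples
x2 = np.array([12.0, 12.2, 11.9, 12.1, 12.3, 11.8])

f12, p12 = f_test_equal_variances(x1, x2)
f21, p21 = f_test_equal_variances(x2, x1)
print(f"f = {f12:.3f}, p = {p12:.4f}; swapped: f = {f21:.3f}, p = {p21:.4f}")
```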

Note that the $\chi^2$-test on a single variance and the $F$-test on two variances above are both **very sensitive to departures from normality**.

## Tests for the Equality of Several Variances

Suppose we have $k$ independent samples from normal populations, preferably of equal size, and we want to test the equality of their variances:
\begin{align}
H_0: \sigma_1^2=\sigma_2^2=\dots=\sigma_k^2
\end{align}

![bartlett-1.PNG](attachment:bartlett-1.PNG)

![bartlett-2.PNG](attachment:bartlett-2.PNG)
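A sketch of Bartlett's statistic in its chi-square form, which is the form `scipy.stats.bartlett` implements. The textbook procedure shown above may instead compare a related statistic against tabulated critical values, so treat this as an equivalent large-sample variant rather than a transcription of the figures. The samples are hypothetical.

```python
import numpy as np
from scipy import stats


def bartlett_chi2(*samples):
    """Bartlett's test of equal variances, chi-square approximation with k - 1 df."""
    k = len(samples)
    n = np.array([len(s) for s in samples], dtype=float)
    s2 = np.array([np.var(s, ddof=1) for s in samples])
    N = n.sum()
    # Pooled variance across the k samples
    sp2 = np.sum((n - 1) * s2) / (N - k)
    numerator = (N - k) * np.log(sp2) - np.sum((n - 1) * np.log(s2))
    correction = 1 + (np.sum(1 / (n - 1)) - 1 / (N - k)) / (3 * (k - 1))
    stat = numerator / correction
    return stat, stats.chi2.sf(stat, k - 1)


rng = np.random.default_rng(2)
a = rng.normal(0.0, 1.0, size=12)  # hypothetical samples
b = rng.normal(0.0, 1.0, size=12)
c = rng.normal(0.0, 2.0, size=12)

stat, p_value = bartlett_chi2(a, b, c)
ref = stats.bartlett(a, b, c)
print(f"Bartlett T = {stat:.4f}, p = {p_value:.4f}; scipy: {ref.statistic:.4f}, {ref.pvalue:.4f}")
```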

## Goodness-of-Fit Test: Applications to Testing Independence and Homogeneity

![goodness-of-fit.PNG](attachment:goodness-of-fit.PNG)
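A sketch of the statistic $\sum_i (o_i - e_i)^2/e_i$ for hypothetical die-roll counts under $H_0$: the die is fair. This assumes no parameters were estimated from the data, so the degrees of freedom are $k-1$; each estimated parameter would remove one more.

```python
import numpy as np
from scipy import stats

observed = np.array([20, 22, 17, 18, 19, 24])  # hypothetical counts, 120 rolls
expected = np.full(6, observed.sum() / 6)  # fair die: 20 expected per face

chi2_stat = np.sum((observed - expected) ** 2 / expected)
df = observed.size - 1  # k - 1, no parameters estimated
p_value = stats.chi2.sf(chi2_stat, df)

ref = stats.chisquare(observed, f_exp=expected)
print(f"chi2 = {chi2_stat:.3f}, df = {df}, p = {p_value:.4f}")  # chi2 = 1.700
```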

The above procedure can be used to test for **independence** by way of a **contingency table**.

![independence.PNG](attachment:independence.PNG)

![independence-2.PNG](attachment:independence-2.PNG)
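A sketch of the independence test on a hypothetical $2\times 3$ table: expected counts are (row total × column total) / grand total, and the degrees of freedom are $(r-1)(c-1)$. `scipy.stats.chi2_contingency` is used as a cross-check; `correction=False` disables Yates' continuity correction (which SciPy applies only when there is one degree of freedom anyway).

```python
import numpy as np
from scipy import stats

# Hypothetical contingency table: rows = groups, columns = categories
observed = np.array([[30, 45, 25],
                     [20, 35, 45]])

row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
grand_total = observed.sum()
# Expected counts under independence
expected = row_totals * col_totals / grand_total

chi2_stat = np.sum((observed - expected) ** 2 / expected)
r, c = observed.shape
df = (r - 1) * (c - 1)
p_value = stats.chi2.sf(chi2_stat, df)

ref_stat, ref_p, ref_df, ref_expected = stats.chi2_contingency(observed, correction=False)
print(f"chi2 = {chi2_stat:.4f}, df = {df}, p = {p_value:.4f}")
```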

The above independence test assumes a large sample size, so that the limiting chi-square distribution is a good approximation to the distribution of the test statistic. In small samples, **Fisher's exact test** may be more appropriate. In the following example taken from Wikipedia, say we want to test whether being a man or a woman is independent of studying or not studying. Conditional on the table margins, picture $n$ balls, of which $a+b$ (the students who study) are black, and pick the $a+c$ men by drawing $a+c$ balls without replacement, $a$ of which turn out black. Under independence, being a man does not affect whether a ball is black, so the number of black balls drawn, $a$, follows a hypergeometric distribution. As such, the probability $p$ of the observed table below is given exactly by the hypergeometric distribution, with no large-sample approximation.

![image.png](attachment:image.png)
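A sketch of the hypergeometric probability of a single $2\times 2$ table, assuming the Wikipedia layout with $a, b$ in the first row (studying) and $c, d$ in the second (not studying), so that there are $a+b$ black balls and $a+c$ draws (the men's column). The counts are illustrative. `scipy.stats.hypergeom(M, n, N)` takes the population size, the number of black balls, and the number of draws; `scipy.stats.fisher_exact` returns the two-sided $p$-value, which sums the probabilities of all tables with the same margins that are no more likely than the observed one.

```python
from math import comb

import numpy as np
from scipy import stats

# Illustrative 2x2 table: rows = (studying, not studying), columns = (men, women)
a, b = 1, 9
c, d = 11, 3
n = a + b + c + d

# Probability of the observed table given the margins (hypergeometric)
p_table = comb(a + b, a) * comb(c + d, c) / comb(n, a + c)
p_scipy = stats.hypergeom(M=n, n=a + b, N=a + c).pmf(a)

odds_ratio, p_two_sided = stats.fisher_exact(np.array([[a, b], [c, d]]))
print(f"P(observed table) = {p_table:.6f}, two-sided p = {p_two_sided:.6f}")
```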

The above procedure can also be used to test for **homogeneity**, e.g., the equality of several proportions.

![homogeneity-1.PNG](attachment:homogeneity-1.PNG)

![homogeneity-2.PNG](attachment:homogeneity-2.PNG)
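For $k$ proportions, the homogeneity statistic on the $2\times k$ table of successes and failures can also be written as $\sum_i (x_i - n_i\hat p)^2 / (n_i \hat p \hat q)$ with the pooled proportion $\hat p$, because the success and failure cells of group $i$ contribute $(x_i - n_i\hat p)^2/(n_i\hat p) + (x_i - n_i\hat p)^2/(n_i\hat q)$. The sketch below checks this identity numerically against `scipy.stats.chi2_contingency` on hypothetical counts.

```python
import numpy as np
from scipy import stats

# Hypothetical successes x_i out of n_i trials in k = 3 groups
x = np.array([45, 52, 38])
n = np.array([100, 110, 95])

p_pool = x.sum() / n.sum()  # pooled proportion under H0: p1 = p2 = p3
q_pool = 1 - p_pool
chi2_stat = np.sum((x - n * p_pool) ** 2 / (n * p_pool * q_pool))
df = x.size - 1
p_value = stats.chi2.sf(chi2_stat, df)

# Same data as a 2 x k table of successes and failures
table = np.vstack([x, n - x])
ref_stat, ref_p, ref_df, _ = stats.chi2_contingency(table, correction=False)
print(f"chi2 = {chi2_stat:.4f}, df = {df}, p = {p_value:.4f}")
```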

The above is very similar to, but conceptually subtly different from, **McNemar's test**, which is applied to paired nominal data, e.g., to check whether two classifiers in ML yield the same marginal distribution.

![McNemar.PNG](attachment:McNemar.PNG)
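A sketch of the McNemar statistic, assuming the usual layout in which $b$ and $c$ are the two discordant (off-diagonal) counts of the paired $2\times 2$ table, e.g. cases where exactly one of two classifiers is correct. Only the discordant pairs enter the statistic $(b-c)^2/(b+c)$, which is approximately $\chi^2_1$ under $H_0$; the optional Edwards continuity correction replaces $|b-c|$ by $|b-c|-1$. The helper name and counts are hypothetical.

```python
from scipy import stats


def mcnemar_test(b, c, continuity=False):
    """McNemar's chi-square test from the discordant counts b and c."""
    diff = abs(b - c)
    if continuity:
        diff = max(diff - 1, 0)  # Edwards' continuity correction
    stat = diff**2 / (b + c)
    return stat, stats.chi2.sf(stat, df=1)


# Hypothetical: model A right / model B wrong in 10 cases, the reverse in 2
stat, p_value = mcnemar_test(b=10, c=2)
stat_cc, p_value_cc = mcnemar_test(b=10, c=2, continuity=True)
print(f"chi2 = {stat:.4f} (p = {p_value:.4f}); corrected: {stat_cc:.4f} (p = {p_value_cc:.4f})")
```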

McNemar's test can be used to compare machine learning models; see [this notebook](../machine-learning/meta-learning/evaluation-metrics-and-information-criterions.ipynb).

## Testing for Skewness and Kurtosis against Gaussian: the Jarque-Bera Test

The Jarque–Bera test is a goodness-of-fit test of **whether sample data have skewness and kurtosis matching those of a Gaussian distribution**. The test statistic is given by
\begin{align}
JB = \frac{\hat{S}^2}{6/n} + \frac{(\hat{K}-3)^2}{24/n}.
\end{align}
Under the null hypothesis, $JB$ has an asymptotic $\chi^2$ distribution with 2 degrees of freedom. Here $\hat{S}$ and $\hat{K}$ are the sample skewness and kurtosis, whose population counterparts are defined as $S=E\left(\frac{(X-\mu)^3}{\sigma^3}\right)$ and $K=E\left(\frac{(X-\mu)^4}{\sigma^4}\right)$.
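A sketch of the statistic using the moment-based (biased) sample skewness and kurtosis, which is also what `scipy.stats.jarque_bera` uses; the simulated data are hypothetical. Note that `scipy.stats.kurtosis` returns excess kurtosis by default, so `fisher=False` is needed to get $\hat K$ itself.

```python
import numpy as np
from scipy import stats


def jarque_bera(x):
    """Jarque-Bera statistic and asymptotic chi-square(2) p-value."""
    x = np.asarray(x, dtype=float)
    n = x.size
    s_hat = stats.skew(x)  # biased moment estimator of skewness
    k_hat = stats.kurtosis(x, fisher=False)  # raw kurtosis, 3 for a Gaussian
    jb = s_hat**2 / (6 / n) + (k_hat - 3) ** 2 / (24 / n)
    return jb, stats.chi2.sf(jb, df=2)


rng = np.random.default_rng(3)
gaussian = rng.normal(size=500)  # hypothetical samples
heavy_tailed = rng.standard_t(df=3, size=500)

for name, sample in [("gaussian", gaussian), ("t(3)", heavy_tailed)]:
    jb, p_value = jarque_bera(sample)
    print(f"{name}: JB = {jb:.3f}, p = {p_value:.4f}")
```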

## Time Series Tests

See the [time-series](time-series-models.ipynb) notebook.

## Reference

- [*Probability and Statistics for Engineers and Scientists*](https://www.evernote.com/shard/s191/nl/21353936/00cb1a0b-3a5b-4aa4-95c0-7fc29aa3a032?title=Probability%20and%20Statistics%20for%20Engineers%20and%20Scientists%20Global%20Edition%209ed), Global Edition, 9th ed., Chapter 10.
- [Wikipedia: McNemar's test](https://en.wikipedia.org/wiki/McNemar%27s_test)
- Steven Kou's formula sheet