# $\chi^2$ hypothesis test

## Overview

In this section we introduce the <a href="https://en.wikipedia.org/wiki/Chi-squared_test">$\chi^2$ hypothesis test</a>. There are various applications of this test, such as [2]:

- Testing whether the data belong to a particular distribution
- Testing a family of distributions
- Testing independence of two factors

The $\chi^2$ test is based on observed and expected counts of categories.
It is always a one-sided, right-tail test. Low values of $\chi^2$ show that the observed counts are close to what we expect them to be under the null hypothesis, so they give no reason to reject it.
Large values of $\chi^2$, on the other hand, occur when the observed counts are far from the expected counts, so only the right tail provides evidence against the null hypothesis [2].

## The $\chi^2$ hypothesis test

The $\chi^2$ test is based on the following statistic 

$$\chi^2 = \sum_{i=1}^N \frac{\left(Obs\left(i\right)-E\left[i\right]\right)^2}{E\left[i\right]}$$

where $N$ is the number of categories (not the sample size), $Obs(i)$ is the observed number of sampling units in category $i$, and $E\left[i\right]$ is the expected number of sampling units in category $i$ if $H_0$ is true, i.e.

$$E\left[i\right] = E\left[Obs(i) | H_0\right]$$
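The statistic can be computed directly from the two vectors of counts. The snippet below is a minimal sketch, assuming NumPy is available; the helper name `chi2_statistic` is illustrative, and the counts are the ones used in the example later in this section. It sums the squared deviations between observed and expected counts, each scaled by the expected count.

```python
import numpy as np


def chi2_statistic(observed, expected):
    """Sum of squared deviations, each divided by the expected count."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    return float(np.sum((observed - expected) ** 2 / expected))


# Illustrative counts for N = 6 categories
obs = [16, 18, 16, 14, 12, 12]
exp = [16, 16, 16, 16, 16, 8]

stat = chi2_statistic(obs, exp)
print(stat)
```

Categories where the observed count equals the expected count contribute nothing to the sum, so the statistic is zero only when every category matches exactly.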

As noted in the overview, low values of $\chi^2$ show that the observed counts are close to what we expect them to be under the null hypothesis, so the $\chi^2$ test is always a one-sided, right-tail test.
Large values of $\chi^2$ occur when the observed counts $Obs(i)$ are far from the expected counts $E\left[i\right]$, which shows an inconsistency between the data and the null hypothesis and does not support $H_0$.

A level $\alpha$ rejection region is given by [2]

$$R = [\chi_{\alpha}^2, +\infty)$$

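where $\chi_{\alpha}^2$ is the critical value of the $\chi^2$ distribution with right-tail probability $\alpha$. The $p$-value is always calculated as the right-tail probability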

$$p = P(\chi^2 \geq \chi_{obs}^2)$$
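Both quantities can be read off the $\chi^2$ distribution. The sketch below assumes that the null distribution has $N - 1$ degrees of freedom, which holds when $H_0$ fully specifies the distribution and no parameters are estimated from the data. The values of `alpha`, `n_categories` and `chi2_obs` are made up for illustration. It uses `scipy.stats.chi2`, whose `ppf` method inverts the CDF and whose `sf` method returns the right-tail probability.

```python
from scipy.stats import chi2

alpha = 0.05        # significance level (illustrative)
n_categories = 6    # N, the number of categories (illustrative)
chi2_obs = 12.3     # observed value of the statistic (made up)

# Assumption: N - 1 degrees of freedom (no estimated parameters)
df = n_categories - 1

# Critical value: the point with right-tail probability alpha
chi2_alpha = chi2.ppf(1 - alpha, df)

# p-value: P(chi^2 >= chi2_obs)
p_value = chi2.sf(chi2_obs, df)

# The two decision rules lead to the same conclusion
reject_by_region = chi2_obs >= chi2_alpha
reject_by_p_value = p_value <= alpha
print(chi2_alpha, p_value, reject_by_region, reject_by_p_value)
```

Checking whether the statistic falls in the rejection region and checking whether the $p$-value is at most $\alpha$ are two ways of stating the same decision rule.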

## Example

Let's use Python to perform a $\chi^2$ test. The example below is taken from the official <a href="https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chisquare.html">scipy.stats</a> documentation.

```python
from scipy.stats import chisquare

# observed frequencies in each category
f_obs = [16, 18, 16, 14, 12, 12]

# expected frequencies in each category
f_exp = [16, 16, 16, 16, 16, 8]

chisquare(f_obs, f_exp)
```

```
Power_divergenceResult(statistic=3.5, pvalue=0.6233876277495822)
```

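The $p$-value is the right-tail probability under the $\chi^2$ distribution with $6 - 1 = 5$ degrees of freedom:

$$p = P(\chi^2 \geq 3.5) = 0.623$$

Such a large $p$-value gives no evidence against $H_0$: the observed counts are consistent with the expected ones.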
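To turn this result into a formal decision, the $p$-value is compared with a chosen significance level. The following sketch reuses the counts above and assumes $\alpha = 0.05$, a value not given in the text. It also recomputes the $p$-value from the $\chi^2$ distribution with 5 degrees of freedom, which is what `chisquare` uses by default (number of categories minus one, with `ddof=0`).

```python
from scipy.stats import chi2, chisquare

f_obs = [16, 18, 16, 14, 12, 12]
f_exp = [16, 16, 16, 16, 16, 8]

result = chisquare(f_obs, f_exp)

# Right-tail probability with k - 1 = 5 degrees of freedom
df = len(f_obs) - 1
p_manual = chi2.sf(result.statistic, df)

alpha = 0.05  # assumed significance level
reject_h0 = result.pvalue < alpha
print(p_manual, reject_h0)
```

When parameters of the hypothesized distribution are estimated from the data, the `ddof` argument of `chisquare` can be used to reduce the degrees of freedom accordingly.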

## Summary

In this section we introduced the $\chi^2$ hypothesis test. This is a one-sided, right-tail test. The test statistic is

$$\chi^2 = \sum_{i=1}^N \frac{\left(Obs\left(i\right)-E\left[i\right]\right)^2}{E\left[i\right]}$$

The chi-square test is a technique based on counts; we compare the counts expected under $H_0$ with the observed counts through the chi-square statistic.
This way we are able to test for goodness of fit and for the independence of two factors. Furthermore, contingency tables can
be used to detect significant relations between categorical variables [2].

In the next section we will review some of the applications of the $\chi^2$ test.

## References

1. <a href="https://en.wikipedia.org/wiki/Chi-squared_test">Chi-squared test</a>
2. Michael Baron, _Probability and statistics for computer scientists_, 2nd Edition, CRC Press