%run ../../common/import_all.py

from common.setup_notebook import set_css_style, setup_matplotlib, config_ipython
config_ipython()
setup_matplotlib()
set_css_style()

# The t-test

## What is it

The t-test tests whether two datasets are significantly different, specifically whether their means are, the null hypothesis being that the means are equal. Under the null hypothesis, the test statistic follows [Student's t distribution](../distributions/famous-distributions.ipynb#Student's-t), which has heavier tails than the normal distribution and approaches it as the number of degrees of freedom, and hence the sample size, grows.
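The convergence towards the normal distribution can be seen by comparing critical values. The short sketch below uses `scipy.stats` (assumed to be available, as in most scientific Python setups) to print the two-sided 5% critical value of Student's t for a few arbitrarily chosen degrees of freedom next to the corresponding normal value.

```python
from scipy import stats

# Two-sided 5% critical value of the standard normal distribution
z_crit = stats.norm.ppf(0.975)

# Same critical value for Student's t at increasing degrees of freedom
dfs = [2, 5, 10, 30, 100, 1000]
t_crits = [stats.t.ppf(0.975, df) for df in dfs]

for df, t_crit in zip(dfs, t_crits):
    print(f"df={df:>4}  t critical={t_crit:.4f}  normal critical={z_crit:.4f}")
```

The t critical values are always larger (heavier tails) and shrink towards the normal one as the degrees of freedom increase.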

<img src="../../imgs/ttest.jpg" width="300" align="left"/>

The t-test evaluates the difference between the means of the distributions relative to their spread (variability). In the figure, the distributions have the same difference in means but very different variabilities: the same gap between means is much more convincing evidence of a real difference when the spread is small.
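To make the figure's point concrete, here is a minimal sketch with made-up samples (the data are hypothetical): in both cases the sample means are 10 and 12, but the spread differs. The $t$ statistic is computed with the formula given in the next section, using only the standard library.

```python
import math
import statistics

def t_statistic(a, b):
    """Two-sample t statistic with the unpooled standard error of the difference of means."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se

# Hypothetical samples: in both cases the means are 10 and 12
low_spread_1 = [9.5, 10.0, 10.5, 9.8, 10.2]
low_spread_2 = [11.5, 12.0, 12.5, 11.8, 12.2]
high_spread_1 = [6.0, 14.0, 8.0, 12.0, 10.0]
high_spread_2 = [8.0, 16.0, 10.0, 14.0, 12.0]

t_low = t_statistic(low_spread_1, low_spread_2)
t_high = t_statistic(high_spread_1, high_spread_2)
print(f"low spread: t = {t_low:.2f}; high spread: t = {t_high:.2f}")
```

The same difference in means yields a much larger $|t|$ when the samples are tightly clustered.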

<img src="../../imgs/ttest-var.jpg" width="300" align="right"/>

It was published by W. S. Gosset, writing under the pseudonym Student, in *Biometrika* in 1908 [[1]](#1).

A typical application is in medicine, to test whether a treatment is effective or not. 

## How does it work

Given two independent samples indicated by indices $1$ and $2$, the $t$ statistic is calculated as

$$
t = \frac{\bar{x}_1 - \bar{x}_2}{s_{\bar{x}_1 - \bar{x}_2}} \ ,
$$

where $\bar{x}_i$ is the mean of sample $i$ and $s_{\bar{x}_1 - \bar{x}_2}$ is the standard error of the difference between the two means:

$$
s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \ ,
$$

where $s_i^2$ is the unbiased estimator of the variance of sample $i$ (computed with $n_i - 1$ in the denominator, so that $s_i$ is the sample standard deviation) and $n_i$ is the number of points in sample $i$. This unpooled form of the standard error is the one used in Welch's version of the test, which does not assume that the two populations have equal variances.
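The two formulas above translate directly into code. The sketch below uses hypothetical data and checks the hand computation against `scipy.stats.ttest_ind` with `equal_var=False`, which is SciPy's Welch (unpooled) test.

```python
import math
import statistics
from scipy import stats

# Hypothetical measurements for two independent groups
x1 = [5.1, 4.9, 5.6, 5.8, 6.0, 5.4, 5.2]
x2 = [4.6, 4.8, 5.0, 4.4, 5.1, 4.7]

n1, n2 = len(x1), len(x2)
s1_sq = statistics.variance(x1)  # unbiased sample variance (n - 1 in the denominator)
s2_sq = statistics.variance(x2)

# Standard error of the difference of the two means
se_diff = math.sqrt(s1_sq / n1 + s2_sq / n2)
t = (statistics.mean(x1) - statistics.mean(x2)) / se_diff

# SciPy's Welch test uses the same unpooled standard error
t_scipy = stats.ttest_ind(x1, x2, equal_var=False).statistic
print(f"by hand: {t:.4f}  scipy: {t_scipy:.4f}")
```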

The $t$ statistic so calculated is then compared with Student's t distribution, with the appropriate number of degrees of freedom, to obtain the p-value: if the p-value falls below the chosen significance threshold, the null hypothesis is rejected.
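In place of a printed table, the p-value can be computed from the t distribution directly. The text does not specify the degrees of freedom; for the unpooled standard error, the usual choice (assumed here) is the Welch-Satterthwaite approximation. This sketch uses hypothetical data and an assumed threshold `alpha = 0.05`, and checks the result against SciPy's Welch test.

```python
import math
import statistics
from scipy import stats

x1 = [5.1, 4.9, 5.6, 5.8, 6.0, 5.4, 5.2]  # hypothetical data
x2 = [4.6, 4.8, 5.0, 4.4, 5.1, 4.7]
alpha = 0.05  # chosen significance threshold (assumption)

v1 = statistics.variance(x1) / len(x1)  # s_1^2 / n_1
v2 = statistics.variance(x2) / len(x2)  # s_2^2 / n_2
t = (statistics.mean(x1) - statistics.mean(x2)) / math.sqrt(v1 + v2)

# Welch-Satterthwaite approximation of the degrees of freedom
df = (v1 + v2) ** 2 / (v1 ** 2 / (len(x1) - 1) + v2 ** 2 / (len(x2) - 1))

# Two-sided p-value: probability under H0 of a |t| at least as large as observed
p_value = 2 * stats.t.sf(abs(t), df)
reject_h0 = p_value < alpha
print(f"t = {t:.3f}, df = {df:.2f}, p = {p_value:.4f}, reject H0: {reject_h0}")
```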

The test described above is the *two-sample t-test*. In the *one-sample t-test*, we test the null hypothesis that the population mean is equal to a specified value $\mu_0$. In this case the t statistic is

$$
t = \frac{\bar{x} - \mu_0}{\frac{s}{\sqrt{n}}} \ ,
$$

with $s$ being the standard deviation of the sample and $n$ the sample size.
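A minimal sketch of the one-sample statistic on hypothetical data with an assumed $\mu_0 = 3$. The reference t distribution has $n - 1$ degrees of freedom (a standard fact, not stated above); the result is compared with `scipy.stats.ttest_1samp`.

```python
import math
import statistics
from scipy import stats

sample = [2.9, 3.1, 3.4, 3.0, 3.3, 3.6, 3.2, 3.5]  # hypothetical data
mu_0 = 3.0  # hypothesised population mean (assumption)

n = len(sample)
s = statistics.stdev(sample)  # sample standard deviation (n - 1 in the denominator)
t = (statistics.mean(sample) - mu_0) / (s / math.sqrt(n))

# Two-sided p-value from Student's t with n - 1 degrees of freedom
p_value = 2 * stats.t.sf(abs(t), n - 1)

result = stats.ttest_1samp(sample, popmean=mu_0)
print(f"by hand: t = {t:.4f}, p = {p_value:.4f}")
print(f"scipy:   t = {result.statistic:.4f}, p = {result.pvalue:.4f}")
```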

In the *paired t-test*, we compare two population means using two samples whose observations are paired.

For example, we may have observations on the same individuals before and after some intervention (students' results before and after a course, or the results of two medical treatments on the same patient). The observations are then not independent, so the two-sample t-test is not appropriate.

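A paired t-test is performed by running a one-sample t-test on the differences between the paired measurements, with $\mu_0 = 0$. When the differences cannot be assumed to be normally distributed, a common non-parametric alternative is the Wilcoxon signed-rank test, which only requires that, under the null hypothesis, the differences are distributed symmetrically around 0.

The steps of the signed-rank test are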
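A minimal sketch of the reduction of the paired t-test to a one-sample t-test on the differences, using hypothetical before/after scores. The hand computation is checked against both `scipy.stats.ttest_1samp` on the differences and `scipy.stats.ttest_rel`, SciPy's paired test (which uses the first argument minus the second).

```python
import math
import statistics
from scipy import stats

# Hypothetical scores of the same eight students before and after a course
before = [72, 65, 80, 58, 90, 77, 69, 84]
after = [75, 70, 79, 64, 94, 80, 72, 88]

# Paired differences; H0: their population mean is 0
diffs = [a - b for a, b in zip(after, before)]
n = len(diffs)
t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))

res_1samp = stats.ttest_1samp(diffs, popmean=0)
res_rel = stats.ttest_rel(after, before)
print(f"by hand: {t:.4f}  1-sample: {res_1samp.statistic:.4f}  paired: {res_rel.statistic:.4f}")
```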

1. $\forall i$, calculate $|x_{1, i} - x_{2, i}|$ and $sgn(x_{1, i} - x_{2, i})$
2. Exclude the pairs whose difference is 0, obtaining the reduced sample size $N_r$
3. Order the $N_r$ pairs by ascending absolute difference
4. Rank the pairs so that the smallest gets rank 1; tied pairs get the average of the ranks they span
5. Calculate the test statistic $w = \sum_{i=1}^{N_r} sgn(x_{1, i} - x_{2, i}) R_i$, where $R_i$ is the rank of pair $i$
6. Under the null hypothesis $H_0$, $w$ follows a specific distribution (with no simple closed-form expression) with expected value 0 and variance $\frac{N_r (N_r + 1)(2 N_r + 1)}{6}$, so $w$ can be compared with tabulated critical values, and $H_0$ is rejected if $|w| \geq W_{critical, N_r}$
7. As $N_r$ increases, the distribution of $w$ converges to a Gaussian, so a z-score can be calculated as $z = \frac{w}{\sigma_w}$, where $\sigma_w$ is the standard deviation of $w$ (the square root of the variance above); $H_0$ is rejected if $|z| \geq z_{critical}$
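The steps above compute the statistic of the Wilcoxon signed-rank test. Below is a standard-library sketch on hypothetical paired data (the last pair has a zero difference, to exercise step 2). It uses the normal approximation of step 7 rather than the exact tables of step 6, and the variance formula without a correction for ties, so with few pairs or many ties it is only approximate. A library routine such as `scipy.stats.wilcoxon` handles these cases, though for a two-sided test it reports the smaller of the positive-rank and negative-rank sums rather than $w$.

```python
import math
from statistics import NormalDist

def signed_rank_w(x1, x2):
    """Return (w, n_r): the signed-rank statistic and the reduced sample size."""
    # Steps 1-2: signed differences, excluding pairs whose difference is 0
    diffs = [a - b for a, b in zip(x1, x2) if a != b]
    n_r = len(diffs)
    # Step 3: indices of the pairs ordered by ascending absolute difference
    order = sorted(range(n_r), key=lambda k: abs(diffs[k]))
    # Step 4: rank from 1; tied values get the average of the ranks they span
    ranks = [0.0] * n_r
    i = 0
    while i < n_r:
        j = i
        while j + 1 < n_r and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    # Step 5: w = sum of sign(difference) * rank
    w = sum(math.copysign(1.0, d) * r for d, r in zip(diffs, ranks))
    return w, n_r

# Hypothetical paired measurements (the last pair has a zero difference)
before = [72, 65, 80, 58, 90, 77, 69, 84, 70]
after = [75, 70, 79, 64, 94, 80, 72, 88, 70]

w, n_r = signed_rank_w(after, before)
# Steps 6-7: normal approximation using the no-ties variance formula
sigma_w = math.sqrt(n_r * (n_r + 1) * (2 * n_r + 1) / 6)
z = w / sigma_w
z_critical = NormalDist().inv_cdf(0.975)  # two-sided test at the 5% level
print(f"w = {w}, N_r = {n_r}, z = {z:.3f}, reject H0: {abs(z) >= z_critical}")
```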

## References

1. <a name="1"></a> Student, [The probable error of a mean](http://seismo.berkeley.edu/~kirchner/eps_120/Odds_n_ends/Students_original_paper.pdf), *Biometrika*, 6:1, 1908