# T-Test Guide

![](https://i.ytimg.com/vi/yoVnu7ZCXcI/maxresdefault.jpg)

A __t-test__ is used in hypothesis testing to evaluate one or two sample means. We typically use a t-test to compare:
* a sample mean to a hypothesized value,
* the means of two independent samples, or
* the mean difference between paired samples


There are three main t-tests:
1. one-sample t-test
2. two-sample t-test
3. paired t-test

## One Sample t-test (example 2)

We use a one-sample t-test to determine whether our sample mean (observed average) differs statistically significantly from the population mean (expected average).

This is calculated as:
$$t = \dfrac{\bar{x} - \mu}{SE}$$

Where $\bar{x}$ is the `sample mean`, $\mu$ is the `population mean`, and the `standard error` is:
$$SE = \dfrac{s}{\sqrt{n}}$$

Here $s$ is the sample standard deviation and $n$ is the sample size.



Let’s say we measure the resting systolic blood pressure of 20 first-year resident female doctors and want to compare its mean to the general population mean of 120 mmHg.

The null hypothesis is that the mean blood pressure of the resident female doctors is the same as that of the general population.

* $H_0: \mu = 120$
* $H_a: \mu \neq 120$

Source: [Comparative Statistics in python using scipy](http://benalexkeen.com/comparative-statistics-in-python-using-scipy/)

In [1]:
from scipy import stats

female_doctor_bps = [128, 127, 118, 115, 144, 142, 133, 140, 132, 131, 
                     111, 132, 149, 122, 139, 119, 136, 129, 126, 128]

stats.ttest_1samp(female_doctor_bps, 120)

Ttest_1sampResult(statistic=4.512403659336718, pvalue=0.00023838063630967753)
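
To connect the SciPy result back to the formula, here is a minimal sketch that recomputes the t-statistic by hand using only the Python standard library. The variable names (`population_mean`, `standard_error`, `t_stat`) are illustrative choices, not part of SciPy; the value should agree with the `statistic` field reported above.

```python
import math
import statistics

female_doctor_bps = [128, 127, 118, 115, 144, 142, 133, 140, 132, 131,
                     111, 132, 149, 122, 139, 119, 136, 129, 126, 128]
population_mean = 120  # hypothesized mean (mu) under the null hypothesis

n = len(female_doctor_bps)
sample_mean = statistics.mean(female_doctor_bps)
# statistics.stdev uses the n - 1 denominator (sample standard deviation)
sample_sd = statistics.stdev(female_doctor_bps)

standard_error = sample_sd / math.sqrt(n)
t_stat = (sample_mean - population_mean) / standard_error

print(f"t = {t_stat:.3f} with {n - 1} degrees of freedom")
```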

From the following values we can calculate the `p-value`:
* t-statistic value is 4.512
* degrees of freedom (n-1) is 19

The `p-value` is 0.0002, which is far below the standard thresholds of 0.05 and 0.01. We therefore `reject the null hypothesis` and can say there is a statistically significant difference between the resting systolic blood pressure of the resident female doctors and that of the general population.
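
To illustrate how a t-statistic and its degrees of freedom turn into a two-sided p-value, the sketch below integrates the Student's t density numerically with Simpson's rule. This is only an illustration under stated assumptions: the function names `t_pdf` and `two_sided_p_value` and the `steps` parameter are hypothetical helpers, not SciPy API, and SciPy itself does not necessarily compute the p-value this way.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p_value(t_stat, df, steps=20000):
    """Two-sided p-value: probability mass outside [-|t|, |t|].

    Integrates the density from 0 to |t| with Simpson's rule
    (steps must be even), then uses the symmetry of the distribution.
    """
    upper = abs(t_stat)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2  # Simpson weights: 4 on odd nodes, 2 on even
        total += weight * t_pdf(i * h, df)
    half_area = total * h / 3       # area between 0 and |t|
    return 1 - 2 * half_area        # remaining area in both tails

p_value = two_sided_p_value(4.512403659336718, 19)
print(f"p = {p_value:.6f}")
```

The result should match the `pvalue` field from `stats.ttest_1samp` to several decimal places.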

## Two Sample t-test: Compare two means


A two-sample t-test is used to compare the means of two independent samples.

It is calculated as follows (assuming equal variances):

$$t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

Where $s_p^2$ is the pooled variance, calculated as follows:

$$s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

Our degrees of freedom in this case are $n_1 + n_2 - 2$.

Continuing our resting systolic blood pressure example, let's compare the blood pressure of male consultant doctors with that of the junior resident female doctors we explored above.

Our null hypothesis in this case is that there is no difference between the mean blood pressure of the male consultant doctors and that of the junior resident female doctors.

In [2]:
female_doctor_bps = [128, 127, 118, 115, 144, 142, 133, 140, 132, 131, 
                     111, 132, 149, 122, 139, 119, 136, 129, 126, 128]

male_consultant_bps = [118, 115, 112, 120, 124, 130, 123, 110, 120, 121,
                      123, 125, 129, 130, 112, 117, 119, 120, 123, 128]

stats.ttest_ind(female_doctor_bps, male_consultant_bps)

Ttest_indResult(statistic=3.5143256412718564, pvalue=0.0011571376404026158)

So the interpretation of these results is as follows:
* `t-statistic` value is 3.514 with 38 degrees of freedom, which together determine the p-value.
* `p-value` is 0.0012, which again is below the standard thresholds of 0.05 and 0.01, so we `reject the null hypothesis` and can say there is a statistically significant difference between the resting systolic blood pressure of the resident female doctors and that of the male consultant doctors.
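
As a check on the pooled-variance calculation, here is a minimal standard-library sketch that recomputes the two-sample statistic by hand. It assumes equal variances, as the formula in this section does; the variable names (`pooled_var`, `standard_error`, `t_stat`) are illustrative only.

```python
import math
import statistics

female_doctor_bps = [128, 127, 118, 115, 144, 142, 133, 140, 132, 131,
                     111, 132, 149, 122, 139, 119, 136, 129, 126, 128]
male_consultant_bps = [118, 115, 112, 120, 124, 130, 123, 110, 120, 121,
                       123, 125, 129, 130, 112, 117, 119, 120, 123, 128]

n1, n2 = len(female_doctor_bps), len(male_consultant_bps)
mean1 = statistics.mean(female_doctor_bps)
mean2 = statistics.mean(male_consultant_bps)
# statistics.variance uses the n - 1 denominator (sample variance)
var1 = statistics.variance(female_doctor_bps)
var2 = statistics.variance(male_consultant_bps)

# Pooled variance: each sample variance weighted by its degrees of freedom
df = n1 + n2 - 2
pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df

standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
t_stat = (mean1 - mean2) / standard_error

print(f"t = {t_stat:.3f} with {df} degrees of freedom")
```

The printed statistic should agree with the `statistic` field returned by `stats.ttest_ind` above.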

## Paired T-Test

In the prior examples, the samples have been independent of one another. If we instead want to compare two related samples, such as _before-and-after measurements_ on the same subjects, we use a __paired t-test__.

Calculation:

$$t = \dfrac{\bar{d}}{s_d / \sqrt{n}}$$

Where $\bar{d}$ is the mean of the paired differences, $s_d$ is their sample standard deviation, and $n$ is the number of pairs. The degrees of freedom are $n-1$.


#### Example
In this example we measure how long patients sleep before and after taking a soporific drug to help them sleep.

The null hypothesis is that the soporific drug has no effect on the sleep duration of the patients. 

In [3]:
control = [8.0, 7.1, 6.5, 6.7, 7.2, 5.4, 4.7, 8.1, 6.3, 4.8]
treatment = [9.9, 7.9, 7.6, 6.8, 7.1, 9.9, 10.5, 9.7, 10.9, 8.2]

stats.ttest_rel(control, treatment)

Ttest_relResult(statistic=-3.6244859951782136, pvalue=0.0055329408161001415)

The t-statistic is -3.62 with 9 degrees of freedom, which together determine the p-value. The sign is negative because the differences are taken as control minus treatment.

The `p-value` is 0.0055, which again is below the standard thresholds of 0.05 and 0.01, so we `reject the null hypothesis` and can say there is a statistically significant difference in sleep duration before and after taking the soporific drug.
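
The paired test is equivalent to a one-sample t-test on the per-patient differences against a mean of zero. The sketch below, using only the standard library and illustrative variable names, recomputes the statistic that way and should agree with the `stats.ttest_rel` result above.

```python
import math
import statistics

control = [8.0, 7.1, 6.5, 6.7, 7.2, 5.4, 4.7, 8.1, 6.3, 4.8]
treatment = [9.9, 7.9, 7.6, 6.8, 7.1, 9.9, 10.5, 9.7, 10.9, 8.2]

# Per-patient differences, in the same order as ttest_rel(control, treatment)
differences = [c - t for c, t in zip(control, treatment)]

n = len(differences)
mean_diff = statistics.mean(differences)
sd_diff = statistics.stdev(differences)  # sample standard deviation of the differences

t_stat = mean_diff / (sd_diff / math.sqrt(n))

print(f"mean difference = {mean_diff:.2f}, t = {t_stat:.3f}, df = {n - 1}")
```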