# Introduction to hypothesis testing
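Remember that:
* for samples of size $n$ drawn from a normally distributed population with mean $\mu$, the sample mean $\bar{x}$, once centred on $\mu$ and scaled by its estimated standard error $\frac{s}{\sqrt{n}}$, follows a $t$-distribution with $n - 1$ degrees of freedom.
* $t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$ is therefore from a $t$-distribution with a mean of 0 and $n - 1$ degrees of freedom. Its variance is $\frac{n - 1}{n - 3}$ (for $n > 3$), which is slightly greater than 1 and approaches 1 as $n$ grows.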
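
The behaviour of the $t$ statistic can be checked with a small simulation. The sketch below is only illustrative: the population parameters, the number of simulated samples and the seed are arbitrary assumptions, not values from a real experiment. It draws many samples from a normal population, computes $t$ for each one and counts how often $t$ falls inside the central 95 % of the $t$-distribution with $n - 1$ degrees of freedom.

```python
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible
mu, sigma, n = 2.0, 0.26, 6     # illustrative population parameters and sample size

# 20,000 simulated samples of size n, one sample per row
samples = rng.normal(loc=mu, scale=sigma, size=(20_000, n))
means = samples.mean(axis=1)
sds = samples.std(axis=1, ddof=1)  # sample standard deviation of each row
t_stats = (means - mu) / (sds / np.sqrt(n))

# fraction of simulated t values inside the central 95 % of t(n - 1)
upper = t.ppf(0.975, df=n - 1)
inside = np.mean(np.abs(t_stats) < upper)
print("Fraction of t values inside the 95 % limits: {:.3f}".format(inside))
```

The printed fraction should be close to 0.95, up to simulation noise.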

## The single-sample t-test
This test is used to check whether the mean $\mu$ of a normally distributed population equals a specific value $\mu_0$. The null hypothesis is that the population mean is $\mu_0$, written $H_0: \mu = \mu_0$. The alternative hypothesis is that $\mu$ is not equal to $\mu_0$, written $H_1: \mu \neq \mu_0$.

To test the null hypothesis, a random sample is taken from the population and the sample mean $\bar{x}$ and the sample standard deviation $s$ are calculated. The value of $t$ is then calculated with the above formula, using $\mu = \mu_0$. If the null hypothesis is true, there is a 95 % chance that $t$ lies between two critical values given by the percent point function (the inverse of the cumulative distribution function) of the $t$-distribution with $n-1$ degrees of freedom: $PPF(0.025)$ and $PPF(0.975)$. For a sample of size 10, $PPF(0.025)=-2.262$ and $PPF(0.975)=2.262$, so we reject the null hypothesis at the 5 % level of significance if $t < -2.262$ or if $t > 2.262$. This is an example of a **two-tailed test**, because before collecting the data we do not know whether $\bar{x}$ will be greater or less than $\mu_0$. To carry out a **one-tailed test**, $t$ would be compared to $PPF(0.05)$ or $PPF(0.95)$, depending on the direction of the alternative hypothesis. One could also directly evaluate the cumulative distribution function at the observed value: $1 - CDF(t)$ gives the probability of observing a larger $t$-value than the one obtained if the null hypothesis is true. This probability is called the **p-value**. For a two-tailed test, it is $2 \, (1 - CDF(|t|))$.
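
As a minimal sketch of these calculations with `scipy.stats.t`, the critical values for a sample of size 10 and the p-values for an observed statistic can be obtained as follows. The observed value `t_obs` is a made-up number used only for illustration.

```python
from scipy.stats import t

n = 10
df = n - 1

# critical values for a two-tailed test at the 5 % level
two_tailed = (t.ppf(0.025, df=df), t.ppf(0.975, df=df))
# critical value for a one-tailed (upper) test at the 5 % level
one_tailed_upper = t.ppf(0.95, df=df)

# p-values for a hypothetical observed statistic
t_obs = 2.5                          # illustrative value, not from real data
p_upper = t.sf(t_obs, df=df)         # P(T > t_obs), equal to 1 - t.cdf(t_obs, df)
p_two = 2 * t.sf(abs(t_obs), df=df)  # two-tailed p-value

print("Two-tailed critical values : {0:.3f} and {1:.3f}".format(*two_tailed))
print("One-tailed critical value : {:.3f}".format(one_tailed_upper))
print("One-tailed p-value : {:.4f}".format(p_upper))
print("Two-tailed p-value : {:.4f}".format(p_two))
```

The survival function `t.sf` is used instead of `1 - t.cdf` because it is numerically more accurate in the far tail; the two are otherwise equivalent.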

The following shows an example of a two-tailed single-sample $t$-test. The mean yield of the reference oat variety is 2.0 t/ha. A new variety is tested and its yield is measured on a sample of 6 plots (simulated here with random draws from a normal distribution).

In [1]:
import numpy as np
from scipy.stats import t
from math import sqrt

In [2]:
#reference yield
mu = 2.0

#sample of new variety yields
sample = np.random.normal(loc=2.3, scale=0.26, size=6)
n = len(sample)
x = np.mean(sample)
s = np.std(sample, ddof=1) #ddof is delta degrees of freedom so that df = n - ddof

#t associated with the sample and corresponding probability
t_value = (x - mu) / (s / sqrt(n))
prob = t.cdf(t_value, df=n-1)

#critical values of the t-distribution
crit = [t.ppf(0.025, df=n-1), t.ppf(0.975, df=n-1)]

print("The mean of the sample is : {:.3f}".format(x))
print("The standard deviation of the sample is : {:.3f}".format(s))
print("The t value associated with the sample is : {:.3f}".format(t_value))
print("The critical values of the t-distribution for a 95 % confidence interval are {0:.3f} and {1:.3f}.".format(crit[0], crit[1]))

if crit[0] < t_value < crit[1]:
    print("\nWe cannot reject the null hypothesis with a 5 % level of significance.")
else:
    print("\nWe can reject the null hypothesis with a 5 % level of significance.")

print("The two-tailed p-value is {:.2f} %.".format(2 * min(prob, 1 - prob) * 100)) #min() keeps this valid when t_value is negative

The mean of the sample is : 2.419
The standard deviation of the sample is : 0.258
The t value associated with the sample is : 3.983
The critical values of the t-distribution for a 95 % confidence interval are -2.571 and 2.571.

We can reject the null hypothesis with a 5 % level of significance.
The two-tailed p-value is 1.05 %.
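
The manual calculation can be cross-checked against `scipy.stats.ttest_1samp`, which performs a two-sided single-sample $t$-test by default. The sketch below uses its own seeded sample (an assumption made so that the result is reproducible), so its numbers will differ from the output above.

```python
import numpy as np
from scipy.stats import t, ttest_1samp

mu0 = 2.0                       # reference yield under the null hypothesis
rng = np.random.default_rng(1)  # arbitrary seed, chosen only for reproducibility
sample = rng.normal(loc=2.3, scale=0.26, size=6)
n = len(sample)

# manual t statistic and two-tailed p-value
t_manual = (sample.mean() - mu0) / (sample.std(ddof=1) / np.sqrt(n))
p_manual = 2 * t.sf(abs(t_manual), df=n - 1)

# same test with the library function
result = ttest_1samp(sample, popmean=mu0)

print("manual : t = {0:.3f}, p = {1:.4f}".format(t_manual, p_manual))
print("scipy  : t = {0:.3f}, p = {1:.4f}".format(result.statistic, result.pvalue))
```

Both lines should report the same statistic and p-value.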


## Estimation versus hypothesis testing
A hypothesis test only tells us whether a difference has been detected, so rejecting the null hypothesis is not very informative when a big difference was expected anyway. Calculation of confidence intervals conveys more information because it gives an indication of the size of the difference. The 95 % confidence interval for the population mean, based on the sample above, is obtained by adding to the sample mean each critical value multiplied by the standard error $\frac{s}{\sqrt{n}}$.

In [3]:
ci = [x + value * s / sqrt(n) for value in crit]
print("The 95 % confidence interval for the population mean is {0:.3f} to {1:.3f}.".format(ci[0], ci[1]))

The 95 % confidence interval for the population mean is 2.149 to 2.690.
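
The same interval can also be obtained with `scipy.stats.t.interval`, which takes the confidence level, the degrees of freedom, a location and a scale. The sketch below plugs in the rounded summary statistics printed earlier (mean 2.419, standard deviation 0.258, 6 plots), so it should agree with the interval above only to within rounding.

```python
from math import sqrt
from scipy.stats import t

# rounded summary statistics from the sample output above
x_bar, s, n = 2.419, 0.258, 6
se = s / sqrt(n)  # standard error of the mean

# confidence level is passed positionally; loc and scale shift and stretch the t-distribution
low, high = t.interval(0.95, n - 1, loc=x_bar, scale=se)

# manual equivalent: mean plus or minus the critical value times the standard error
crit = t.ppf(0.975, df=n - 1)
manual = (x_bar - crit * se, x_bar + crit * se)

print("t.interval : {0:.3f} to {1:.3f}".format(low, high))
print("manual     : {0:.3f} to {1:.3f}".format(*manual))
```

Because the interval does not contain the reference yield of 2.0 t/ha, it leads to the same conclusion as the two-tailed test at the 5 % level, while also showing how large the difference is likely to be.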


## Paired sample $t$-test
This test can be used to compare pairs of similar samples. For example, one could plant two varieties in adjacent plots and measure their yields. If the two varieties have equal population mean yields (null hypothesis), you would expect the mean of the within-pair differences to be zero, so the paired $t$-test is a single-sample $t$-test on the differences with $\mu_0 = 0$.
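
A minimal sketch of this equivalence is given below. The yields are hypothetical values invented for illustration (six pairs of adjacent plots), and `scipy.stats.ttest_rel` is used as the library implementation of the paired test.

```python
import numpy as np
from scipy.stats import ttest_1samp, ttest_rel

# hypothetical yields (t/ha) of two varieties grown in 6 pairs of adjacent plots
variety_a = np.array([2.1, 2.4, 1.9, 2.6, 2.2, 2.5])
variety_b = np.array([2.0, 2.1, 1.8, 2.3, 2.2, 2.1])

# paired t-test on the two sets of yields
paired = ttest_rel(variety_a, variety_b)

# single-sample t-test on the within-pair differences, with mu_0 = 0
differences = variety_a - variety_b
single = ttest_1samp(differences, popmean=0.0)

print("paired test        : t = {0:.3f}, p = {1:.4f}".format(paired.statistic, paired.pvalue))
print("test on differences: t = {0:.3f}, p = {1:.4f}".format(single.statistic, single.pvalue))
```

The two tests should give identical statistics and p-values, since they are the same calculation.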