# Student's t-testing

## Overview

The previous section introduced the Wald test for testing hypotheses of the form

\begin{equation}
H_0: \theta = \theta_0, \text{vs}~ H_{\alpha}: \theta \neq \theta_0
\end{equation}

The Wald test is appropriate, for example, for testing a hypothesis about the difference between the means of two populations when the data are assumed to be normally distributed and the sample size is large, i.e. $n>30$.
However, when the sample size is small, i.e. $n<30$, it is better to use the <a href="https://en.wikipedia.org/wiki/Student's_t-test">Student's t-test</a>.

## Student's t-testing

Consider testing the hypothesis


\begin{equation}
H_0: \mu = \mu_0, \text{vs}~ H_{\alpha}: \mu \neq \mu_0
\end{equation}

Further assume that the data are normally distributed. For a small sample size, i.e. $n<30$, we can use a $t$-test. The
$t$-test uses Student's $t$-statistic, defined as [1]


\begin{equation}
T = \frac{\sqrt{n}\left(\bar{x} - \mu_0\right)}{S}
\end{equation}

where $S$ is the sample standard deviation and $\bar{x}$ is the sample mean.

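For large samples $T\approx N(0,1)$ under $H_0$ [1]. However, the exact distribution of $T$ under the null hypothesis is
$t_{n-1}$, the $t$ distribution with $n-1$ degrees of freedom [1]. We reject $H_0$ at significance level $\alpha$ when $|T|$ exceeds the upper $\alpha/2$ quantile of $t_{n-1}$, that is, when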

\begin{equation}
|T| > t_{n-1,\alpha/2}
\end{equation}
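
The rejection rule can be checked by hand. The following is a minimal sketch, assuming a hypothetical sample `x`, a hypothesized mean `mu_0 = 10` and a significance level `alpha = 0.05`; none of these values come from the text above. It computes $T$ from the formula and obtains the critical value $t_{n-1,\alpha/2}$ with `scipy.stats.t.ppf`.

```python
import numpy as np
from scipy import stats

# Hypothetical inputs, chosen only for illustration
rng = np.random.default_rng(0)
mu_0 = 10.0
alpha = 0.05
x = rng.normal(loc=mu_0, scale=1.0, size=15)

n = x.size
x_bar = x.mean()
s = x.std(ddof=1)  # sample standard deviation (divides by n - 1)

# T = sqrt(n) * (x_bar - mu_0) / S
t_manual = np.sqrt(n) * (x_bar - mu_0) / s

# Upper alpha/2 quantile of the t distribution with n - 1 degrees of freedom
t_crit = stats.t.ppf(1.0 - alpha / 2.0, df=n - 1)

# Two-sided p-value from the survival function of t_{n-1}
p_manual = 2.0 * stats.t.sf(abs(t_manual), df=n - 1)

# Reject H0 when |T| exceeds the critical value
reject = abs(t_manual) > t_crit
print("T =", t_manual)
print("critical value =", t_crit)
print("p-value =", p_manual)
print("reject H0:", reject)
```

Rejecting when $|T| > t_{n-1,\alpha/2}$ is equivalent to rejecting when the two-sided p-value is below $\alpha$, so either quantity can be used to make the decision.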


----
**Remark**

When the sample size is moderately large, the $t$-test is essentially identical to the Wald test [1].

----
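
One way to see why the two tests agree for larger samples is to compare their critical values. The sketch below assumes a hypothetical significance level `alpha = 0.05` and an arbitrary list of sample sizes; it prints the $t_{n-1}$ critical value next to the standard normal one used by the Wald test.

```python
from scipy import stats

alpha = 0.05  # hypothetical significance level

# Critical value used by the Wald test (standard normal quantile)
z_crit = stats.norm.ppf(1.0 - alpha / 2.0)

# Critical values of the t-test for increasing sample sizes
sample_sizes = [5, 10, 30, 100, 1000]
t_crits = [stats.t.ppf(1.0 - alpha / 2.0, df=n - 1) for n in sample_sizes]

for n, t_crit in zip(sample_sizes, t_crits):
    gap = t_crit - z_crit
    print(f"n = {n:5d}  t critical = {t_crit:.4f}  gap to normal = {gap:.4f}")
print(f"normal critical = {z_crit:.4f}")
```

The $t$ critical value is always larger than the normal one, which makes the $t$-test more conservative for small samples, and the gap shrinks as $n$ grows.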

#### Example

In [None]:
import numpy as np  
from scipy import stats 

In [4]:
# set seed in order to be able to reproduce
# the experiment
np.random.seed(42)

# population mean 
mu = 10

# choose the sample size 
n1 = 21

# generate random sample from normal distribution
# with the given mean and standard deviation 1.0
x = np.random.normal(mu, scale=1.0, size=n1)

# Use scipy.stats to compute the one-sample t-statistic and two-sided p-value
t_stat, p_val = stats.ttest_1samp(a=x, popmean=mu)


print("t-statistic = " + str(t_stat))  
print("p-value = " + str(p_val)) 


t-statistic = -0.4271006547619592
p-value = 0.6738687532323435


### Comparing the means of two samples

The $t-$ test can also be used to test the hypothesis that the means of two independent samples are identical
assuming that the populations the samples are coming from have identical variances. In particular, we want to test
the hypothesis


\begin{equation}
H_0: \mu_1 = \mu_2, \text{vs}~ H_{\alpha}: \mu_1 \neq \mu_2
\end{equation}


The test statistic then becomes


\begin{equation}
T = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{S_{1}^2}{n_1} + \frac{S_{2}^2}{n_2}}}
\end{equation}

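where $n_i$ is the size of sample $i$ and $S_{i}^2$ is its sample variance. Note that this form estimates the two variances separately. When the population variances are assumed to be equal, the two estimates are usually replaced by a single pooled estimate.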
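
The sketch below computes the statistic exactly as written in the formula above and, for comparison, the pooled-variance form commonly used under the equal-variance assumption. The samples `x1` and `x2` are hypothetical and generated only for illustration. In SciPy, `stats.ttest_ind` pools the variances by default (`equal_var=True`), while `equal_var=False` keeps them separate (Welch's test).

```python
import numpy as np
from scipy import stats

# Hypothetical samples, chosen only for illustration
rng = np.random.default_rng(1)
x1 = rng.normal(loc=10.5, scale=1.0, size=21)
x2 = rng.normal(loc=9.5, scale=1.0, size=25)

n1, n2 = x1.size, x2.size
v1, v2 = x1.var(ddof=1), x2.var(ddof=1)  # sample variances S_1^2 and S_2^2
mean_diff = x1.mean() - x2.mean()

# Statistic from the formula: each variance is estimated separately
t_unpooled = mean_diff / np.sqrt(v1 / n1 + v2 / n2)

# Pooled-variance form, used when the variances are assumed equal
v_pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t_pooled = mean_diff / np.sqrt(v_pooled * (1.0 / n1 + 1.0 / n2))

# SciPy reference values for both variants
t_welch, p_welch = stats.ttest_ind(x1, x2, equal_var=False)
t_student, p_student = stats.ttest_ind(x1, x2)  # equal_var=True by default

print("unpooled:", t_unpooled, "SciPy (equal_var=False):", t_welch)
print("pooled:  ", t_pooled, "SciPy (default):        ", t_student)
```

When the sample sizes and sample variances are similar, as here, the two statistics are close, but they can differ noticeably when the variances or sample sizes are very unequal.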

#### Example

In [None]:
import numpy as np  
from scipy import stats 

In [5]:
# sample sizes
n1 = 21
n2 = 25

# normally distributed data with mean = 10.5 and var = 1  
x = np.random.normal(10.5, scale=1.0, size=n1)

# normally distributed data with mean = 9.5 and var = 1  
y = np.random.normal(9.5, scale=1.0, size=n2)

# Two-sample t-test with SciPy; by default ttest_ind assumes equal variances
t_stat, p_val = stats.ttest_ind(x, y)
print("t-statistic = " + str(t_stat))  
print("p-value = " + str(p_val))

t-statistic = 3.149095298366158
p-value = 0.002940881662651649


## Summary

In this section, we reviewed Student's $t$-test. This test is used when we want to test hypotheses of the form

\begin{equation}
H_0: \mu = \mu_0, \text{vs}~ H_{\alpha}: \mu \neq \mu_0
\end{equation}

but the sample size is small. Student's $t-$test uses the following statistic

\begin{equation}
T = \frac{\sqrt{n}\left(\bar{x} - \mu_0\right)}{S}
\end{equation}

and we reject $H_0$ when 

\begin{equation}
|T| > t_{n-1,\alpha/2}
\end{equation}

The $t$-test can compare the means of at most two groups.
However, in many cases we want to compare more than two groups.
In such cases, we have to resort to other tests such as ANOVA.

## References

1. Larry Wasserman, _All of Statistics. A Concise Course in Statistical Inference_, Springer 2003.