In [1]:
import numpy as np
import pandas as pd
from scipy import stats

## Paired T-test

There is a hypothesis that attending a learning module changes exam results (increases or decreases).

In [2]:
data = [[1, 18, 22],
        [2, 21, 25],
        [3, 16, 17],
        [4, 22, 24],
        [5, 19, 16],
        [6, 24, 29],
        [7, 17, 20],
        [8, 21, 23],
        [9, 23, 19],
        [10, 18, 20],
        [11, 14, 15],
        [12, 16, 15],
        [13, 16, 18],
        [14, 19, 26],
        [15, 18, 18],
        [16, 20, 24],
        [17, 12, 18],
        [18, 22, 25],
        [19, 15, 19],
        [20, 17, 16]
]

data = pd.DataFrame(data, columns =['student', 'score_before', 'score_after'])
data['diff'] = data['score_after'] - data['score_before']
data

,student,score_before,score_after,diff
0,1,18,22,4
1,2,21,25,4
2,3,16,17,1
3,4,22,24,2
4,5,19,16,-3
5,6,24,29,5
6,7,17,20,3
7,8,21,23,2
8,9,23,19,-4
9,10,18,20,2


The null hypothesis is that the learning module does not change the exam results
(the population mean $\mu_d$ of the score differences equals $\mu_0 = 0$).

$H_0$: $\mu_d = \mu_0$

$H_1$: $\mu_d \neq \mu_0$ (two-tailed test)

The test statistic is defined as

$$
t = \frac{\overline{X}_d - \mu_0}{\frac{s_d}{\sqrt{n}}}
$$

where $\overline{X}_d$ and $s_d$ are the mean and
standard deviation of the sample score differences, and $n$ is the number of pairs.
Under the null hypothesis, it follows Student's t-distribution with $n - 1$ degrees of freedom.

In [3]:
def paired_t_test(diffs, mu_0=0):
    diff_std = np.std(diffs, ddof=1)
    diff_mean = np.mean(diffs)
    n = len(diffs)

    return (diff_mean - mu_0) / (diff_std / np.sqrt(n))

In [4]:
t = paired_t_test(data['diff'])
print(t)

3.2312526655803127


The computed statistic is

$$
t = 3.231
$$

For the two-tailed test, we can compute the p-value
using the Student's t CDF with $n - 1 = 19$ degrees of freedom.

In [19]:
p = 2 * (1 - stats.t.cdf(abs(t), len(data) - 1))  # two-sided p-value, df = n - 1
print(p)

0.004394965993185673


In [17]:
# Cross-check with SciPy; the statistic's sign is flipped because the arguments are (before, after)
stats.ttest_rel(data['score_before'], data['score_after'], alternative='two-sided')

Ttest_relResult(statistic=-3.231252665580312, pvalue=0.004394965993185664)

We reject the null hypothesis at the $\alpha = 0.05$ significance level
as $p < \alpha$.

Let's now consider a right-tailed test where $H_1$
is that the module __improves__ the results. Here $\mu_d$ denotes the
population mean of the score differences.

$H_0$: $\mu_d = 0$

$H_1$: $\mu_d > 0$ (right-tailed test)

In [32]:
p = 1 - stats.t.cdf(t, 19)
print(p)

0.0021974829965928366


In [31]:
stats.ttest_rel(data['score_after'], data['score_before'], alternative='greater')

Ttest_relResult(statistic=3.231252665580312, pvalue=0.002197482996592832)

We again reject the null hypothesis at the $\alpha = 0.05$ significance level
as $p < \alpha$.
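
The two p-value computations above differ only in which tail of the t-distribution is used. As a minimal sketch, they can be wrapped in one helper; the function name `t_p_value` and its `alternative` argument are illustrative assumptions, not part of the original notebook. It uses the survival function `stats.t.sf`, which equals `1 - cdf`.

```python
from scipy import stats


def t_p_value(t, df, alternative="two-sided"):
    """p-value of a t statistic under Student's t with `df` degrees of freedom.

    Hypothetical helper for illustration only.
    """
    if alternative == "two-sided":
        # Probability mass in both tails beyond |t|
        return 2 * stats.t.sf(abs(t), df)
    if alternative == "greater":
        # Right tail: P(T >= t)
        return stats.t.sf(t, df)
    if alternative == "less":
        # Left tail: P(T <= t)
        return stats.t.cdf(t, df)
    raise ValueError(f"unknown alternative: {alternative!r}")


t_stat = 3.2312526655803127  # paired statistic computed earlier in the notebook
df = 19                      # n - 1 for the 20 students

p_two_sided = t_p_value(t_stat, df)
p_greater = t_p_value(t_stat, df, "greater")
print(p_two_sided, p_greater)
```

Taking `abs(t)` in the two-sided branch keeps the result valid when the statistic is negative, for example when the samples are passed in the opposite order.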

## T-test

We have two sets of measurements. The null hypothesis is that both
sets come from populations with equal means.
Unlike in the paired test, the individual measurements are not paired between
the two sets, and the sets can contain different numbers
of measurements.

In [35]:
X_a = [30.02, 29.99, 30.11, 29.97, 30.01, 29.99]
X_b = [29.89, 29.93, 29.72, 29.98, 30.02, 29.98]

$H_0$: $\mu_a = \mu_b$

$H_1$: $\mu_a \neq \mu_b$ (two-sided test)

The test statistic is

$$
t = 1.959
$$
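
The notebook states the value of the statistic without showing how it is obtained. The sketch below assumes the standard pooled-variance (equal-variance) form of the two-sample t statistic; the function name `pooled_t_statistic` is an illustrative assumption. Because both samples here have the same size, the unequal-variance (Welch) statistic has the same value.

```python
import numpy as np

X_a = [30.02, 29.99, 30.11, 29.97, 30.01, 29.99]
X_b = [29.89, 29.93, 29.72, 29.98, 30.02, 29.98]


def pooled_t_statistic(a, b):
    """Two-sample t statistic with a pooled variance estimate (illustrative helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n_a, n_b = len(a), len(b)

    # Pooled sample variance: weighted average of the two unbiased variances
    pooled_var = (
        (n_a - 1) * np.var(a, ddof=1) + (n_b - 1) * np.var(b, ddof=1)
    ) / (n_a + n_b - 2)

    # Standard error of the difference of the two sample means
    se = np.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (a.mean() - b.mean()) / se


t_ind = pooled_t_statistic(X_a, X_b)
print(round(t_ind, 3))
```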

When we assume that the two sets have the same population variance,
the number of degrees of freedom of the t-distribution is

$$
d = n_a + n_b - 2 = 10
$$

where $n_a = n_b = 6$ are the sample sizes.

In [48]:
p = 2 * (1 - stats.t.cdf(1.959, 10))
print(p)

0.07856653118709644


In [40]:
stats.ttest_ind(X_a, X_b, alternative='two-sided', equal_var=True)

Ttest_indResult(statistic=1.9590058081081436, pvalue=0.07856577385723071)

When we assume that the two sets have unequal population variances (Welch's t-test),
the number of degrees of freedom of the t-distribution is

$$
d \approx 7.031
$$
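
The fractional value of $d$ can be reproduced with the Welch-Satterthwaite approximation. This is a minimal sketch assuming that formula is the one behind the number above; the function name `welch_df` is an illustrative assumption.

```python
import numpy as np

X_a = [30.02, 29.99, 30.11, 29.97, 30.01, 29.99]
X_b = [29.89, 29.93, 29.72, 29.98, 30.02, 29.98]


def welch_df(a, b):
    """Welch-Satterthwaite approximate degrees of freedom (illustrative helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)

    # Estimated variance of each sample mean: s^2 / n
    v_a = np.var(a, ddof=1) / len(a)
    v_b = np.var(b, ddof=1) / len(b)

    return (v_a + v_b) ** 2 / (
        v_a ** 2 / (len(a) - 1) + v_b ** 2 / (len(b) - 1)
    )


d_welch = welch_df(X_a, X_b)
print(round(d_welch, 3))
```

The result is smaller than the pooled value of 10 because the two sample variances differ considerably, which makes the unequal-variance test more conservative.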

In [49]:
p = 2 * (1 - stats.t.cdf(1.959, 7.031))
print(p)

0.09077152899736807


In [50]:
stats.ttest_ind(X_a, X_b, alternative='two-sided', equal_var=False)

Ttest_indResult(statistic=1.9590058081081434, pvalue=0.09077332428566114)

We fail to reject the null hypothesis at the $\alpha = 0.05$ significance level
in both cases, as $p > \alpha$.