# Student's paired sample t-test

In this exercise, you will complete the code that computes the t statistic and the p-value of Student's paired-sample t-test.

$H_0: \mu_A = \mu_B$

$H_1: \mu_A \neq \mu_B$

where $\mu_A$ and $\mu_B$ denote the true mean scores of systems A and B.

In [None]:
%pip install ipytest

In [None]:
import ipytest
import numpy as np
import pytest
from scipy import stats
from typing import List

ipytest.autoconfig()

## t statistic

$$t = \frac{\bar{x}_D}{s_D / \sqrt{n}}$$

where $\bar{x}_D$ and $s_D$ are the mean and the sample standard deviation of the pairwise differences, and $n$ is the number of pairs.
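
Spelled out, and writing $d_i = a_i - b_i$ for the difference on the $i$-th pair (the symbol $d_i$ is introduced here for illustration only, and $s_D$ is assumed to be the sample standard deviation with the $n - 1$ denominator):

$$\bar{x}_D = \frac{1}{n}\sum_{i=1}^{n} d_i, \qquad s_D = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(d_i - \bar{x}_D\right)^2}$$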

In [None]:
def t_stat(a: List[float], b: List[float], n: int) -> float:
    """Computes the t statistic between two systems.
    
    Args:
      a: System A recorded metric for each topic.
      b: System B recorded metric for each topic.
      n: Size of the sample.

    Returns:
      t statistic for t-test between two systems.
    """
    n = min(len(a), n)
    x = np.array(a[:n]) - np.array(b[:n])

    x_D = np.mean(x)
    s_D = np.sqrt(np.sum((x - x_D) ** 2) / (n - 1))

    return x_D / (s_D/np.sqrt(n))

## p-value

$$\text{p-value} = P(T(X^*) \leq -|T(X_0)| \mid H_0) + P(T(X^*) \geq |T(X_0)| \mid H_0)$$

Each of the two probabilities that make up the p-value can be computed with the cumulative distribution function (CDF) of Student's t-distribution with $n - 1$ degrees of freedom. SciPy has an implementation of the [CDF](https://docs.scipy.org/doc/scipy-1.9.1/reference/generated/scipy.stats.t.html): `cdf(x, df, loc=0, scale=1)`.
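
As a minimal sketch of how the two tail probabilities relate, the snippet below evaluates each tail separately. The values `t_obs` and `df` are illustrative placeholders (a t statistic of about 0.897 with 10 pairs, as in the test at the end of this notebook), and `stats.t.sf` is SciPy's survival function, i.e. `1 - cdf`.

```python
from scipy import stats

t_obs = 0.897  # illustrative observed t statistic (placeholder value)
df = 9         # degrees of freedom, n - 1 for n = 10 pairs (placeholder value)

# Lower tail: P(T <= -|t|), read directly off the CDF.
lower_tail = stats.t.cdf(-abs(t_obs), df)

# Upper tail: P(T >= |t|) = 1 - CDF(|t|), computed with the survival function.
upper_tail = stats.t.sf(abs(t_obs), df)

# The t-distribution is symmetric around 0, so both tails are equal
# and the two-sided p-value is twice either one.
p_two_sided = lower_tail + upper_tail
print(lower_tail, upper_tail, p_two_sided)
```

Because of this symmetry, computing a single tail and doubling it is sufficient in practice.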

In [None]:
def p_value(n: int, t_stat: float) -> float:
    """Computes the p-value.
    
    Args:
      n: Size of the sample.
      t_stat: t statistic.
      
    Returns:
      p-value for t statistic.
    """
    df = n - 1
    p = (1.0 - stats.t.cdf(abs(t_stat), df)) * 2.0
    return p

## Tests

In [None]:
%%ipytest

def test_lecture_example():
    system_A = [0.2215, 0.3924, 0.654, 0.5611, 0.9186, 0.1104, 0.6086, 0.5062, 0.9688, 0.995]
    system_B = [0.0765, 0.0426, 0.5738, 0.1571, 0.9881, 0.7164, 0.7507, 0.435, 0.3959, 0.8709]
    n = len(system_A)
    t = t_stat(system_A, system_B, n)
    p = p_value(n, t)

    assert t.round(3) == 0.897
    assert p.round(3) == 0.393
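
As an optional sanity check, the hand-written computation can be compared against SciPy's built-in paired test, `scipy.stats.ttest_rel`. The sketch below is self-contained and recomputes the statistic inline rather than calling `t_stat` and `p_value`; the names `t_manual` and `p_manual` are introduced here for illustration only.

```python
import numpy as np
from scipy import stats

system_A = [0.2215, 0.3924, 0.654, 0.5611, 0.9186, 0.1104, 0.6086, 0.5062, 0.9688, 0.995]
system_B = [0.0765, 0.0426, 0.5738, 0.1571, 0.9881, 0.7164, 0.7507, 0.435, 0.3959, 0.8709]

# SciPy's paired-sample t-test (two-sided by default).
result = stats.ttest_rel(system_A, system_B)

# Same quantities computed from the formulas in this notebook.
d = np.array(system_A) - np.array(system_B)
n = len(d)
t_manual = np.mean(d) / (np.std(d, ddof=1) / np.sqrt(n))  # ddof=1 gives the n - 1 denominator
p_manual = 2.0 * stats.t.sf(abs(t_manual), n - 1)

print(result.statistic, t_manual)
print(result.pvalue, p_manual)
```

If both pairs of printed numbers agree up to floating-point error, the manual implementation matches SciPy's on this sample.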