# Permutation Testing


## Overview

In this section we review <a href="https://en.wikipedia.org/wiki/Permutation_test">permutation testing</a>, a nonparametric method for testing whether two distributions are the same.

## Permutation testing

A permutation test is an exact hypothesis test, meaning that it does not rely on large-sample 
approximations [1]. A permutation test involves two or more samples, and 
the null hypothesis is that all samples come from the same distribution. Specifically, assume that
$X_1, \dots, X_m \sim F_X$ and $Y_1, \dots, Y_n \sim F_Y$. In permutation testing we then test [1]

\begin{equation}
H_0: F_X = F_Y ~~ \text{versus} ~~ H_a: F_X \neq F_Y
\end{equation}

We would use this type of test, for example, to check whether a newly developed treatment differs from a placebo.


The permutation test considers all possible permutations of the combined data.
Specifically, let $N=m+n$ and consider all $N!$ permutations of the combined data
$X_1, \dots, X_m, Y_1, \dots, Y_n$. For each permutation, treat the first $m$ values as the $X$ sample and the
remaining $n$ values as the $Y$ sample, and compute the test statistic; denote the results by $T_1, \dots, T_{N!}$.
Under $H_0$, each of the $N!$ computed statistics is equally likely [1].
The distribution that puts mass $1/N!$ on each $T_j$ is called the permutation distribution of $T$ [1].
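
The construction above can be sketched as a small helper that enumerates every permutation and evaluates the statistic on it. This is an illustration, not code from [1]: the names `permutation_distribution` and `abs_mean_diff` are hypothetical, and the convention that the first $m$ entries of each permutation form the $X$ sample is an assumption.

```python
import itertools
from typing import Callable, Sequence


def permutation_distribution(
    x: Sequence[float],
    y: Sequence[float],
    statistic: Callable[[Sequence[float], Sequence[float]], float],
) -> list[float]:
    """Return T_1, ..., T_{N!}: the statistic on every permutation of the combined data.

    Assumed convention: the first m entries of each permutation play the role
    of the X sample and the remaining n entries the role of the Y sample.
    """
    m = len(x)
    combined = list(x) + list(y)
    return [
        statistic(perm[:m], perm[m:])
        for perm in itertools.permutations(combined)
    ]


def abs_mean_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Absolute difference of the sample means, |mean(a) - mean(b)|."""
    return abs(sum(a) / len(a) - sum(b) / len(b))


# Data of Example 1 below: X = (1, 9), Y = (3,)
t_values = permutation_distribution([1, 9], [3], abs_mean_diff)
print(len(t_values))  # 3! = 6 values, each carrying mass 1/6
```

Each returned value corresponds to one $T_j$; attaching weight $1/N!$ to each gives the permutation distribution.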


Now let $t_{obs}$ denote the observed value of the test statistic. With $I(\cdot)$ denoting the indicator function, the $p$-value is [1]


\begin{equation}
p\text{-value} = P_0(T > t_{obs}) = \frac{1}{N!}\sum_{j=1}^{N!} I(T_j > t_{obs})
\end{equation}
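
The formula translates directly into code: count the permutations whose statistic exceeds $t_{obs}$ and divide by the number of permutations. The sketch below is a hypothetical helper (`exact_p_value` is not from [1]); it assumes the first $m$ positions of each permutation form the $X$ sample and returns an exact `Fraction` so the result can be compared with hand calculations.

```python
import itertools
from fractions import Fraction


def exact_p_value(x, y, statistic):
    """Exact permutation p-value: (1/N!) * sum_j I(T_j > t_obs)."""
    m = len(x)
    combined = list(x) + list(y)
    t_obs = statistic(x, y)
    count = 0  # number of permutations with T_j > t_obs
    total = 0  # N!, counted while enumerating
    for perm in itertools.permutations(combined):
        total += 1
        # assumed convention: first m values -> X sample, rest -> Y sample
        if statistic(perm[:m], perm[m:]) > t_obs:
            count += 1
    return Fraction(count, total)


def abs_mean_diff(a, b):
    """Absolute difference of the sample means."""
    return abs(sum(a) / len(a) - sum(b) / len(b))


print(exact_p_value([1, 9], [3], abs_mean_diff))  # 2/3
```

This follows the strict inequality in the formula; using `>=` instead would also count ties with $t_{obs}$, including the observed arrangement itself.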

## Example 1

This example is taken from [1]. Consider the dataset $(X_1, X_2, Y_1)=(1,9,3)$, so that $m=2$ and $n=1$. The test statistic is 
the absolute difference in means, i.e. $T = |\bar{x}-\bar{y}|$. Let's use Python to form all possible permutations. We already know there
will be $3!=6$ of them.

```python
import numpy as np
import itertools
```

The observed value of the statistic is:

```python
# observed statistic: |mean(1, 9) - mean(3)|
t_obs = abs((1+9)*0.5 - 3)
print(t_obs)
```

```
2.0
```


```python
data = [1,9,3]
permutations = list(itertools.permutations(data))
print(permutations)
```

```
[(1, 9, 3), (1, 3, 9), (9, 1, 3), (9, 3, 1), (3, 1, 9), (3, 9, 1)]
```


For each permutation, compute the value of $T = |\bar{x} - \bar{y}|$:

```python
statistics = []
for item in permutations:
    # first two values form the X sample, the last one the Y sample
    item_mean_diff = abs((item[0]+item[1])*0.5 - item[2])
    statistics.append(item_mean_diff)

print(statistics)
```

```
[2.0, 7.0, 2.0, 5.0, 7.0, 5.0]
```


Compute the $p$-value:

```python
# fraction of permutations with T_j > t_obs
p_value = sum([1 if t > t_obs else 0 for t in statistics]) / len(statistics)
print(f"Computed p-value {p_value}")
```

```
Computed p-value 0.6666666666666666
```
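
Because $T$ depends only on which values land in the $X$ sample, not on their order within each sample, every split of the combined data into groups of sizes $m$ and $n$ appears exactly $m!\,n!$ times among the $N!$ permutations. Enumerating the $\binom{N}{m}$ splits with `itertools.combinations` should therefore give the same $p$-value with less work. The sketch below illustrates this standard shortcut (it is not taken from [1]); it selects positions rather than values so that repeated data values are handled correctly.

```python
import itertools

data = [1, 9, 3]
m = 2  # size of the X sample
t_obs = abs((1 + 9) * 0.5 - 3)

split_stats = []
# choose which positions of `data` form the X sample; the rest form Y
for x_idx in itertools.combinations(range(len(data)), m):
    x = [data[i] for i in x_idx]
    y = [data[i] for i in range(len(data)) if i not in x_idx]
    split_stats.append(abs(sum(x) / len(x) - sum(y) / len(y)))

print(split_stats)  # [2.0, 7.0, 5.0]: C(3, 2) = 3 splits instead of 3! = 6 permutations
p_value = sum(t > t_obs for t in split_stats) / len(split_stats)
print(f"Computed p-value {p_value}")  # Computed p-value 0.6666666666666666
```

Each of the three splits stands for $2!\,1! = 2$ of the six permutations listed earlier, so the two enumerations give the same fraction.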


## Summary

In this section we reviewed permutation testing, a methodology for testing 
whether two samples follow the 
same distribution. The test is exact, that is, it is not based on large-sample approximations. 
The test considers all possible permutations of the combined samples, i.e. $X_1, \dots, X_m, Y_1, \dots, Y_n$.
Since it may not be practical to compute all $N!$ permutations, we can approximate the $p$-value by sampling 
permutations at random. The fraction of sampled permutations with $T_j>t_{obs}$ approximates the $p$-value.
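
To see why sampling is needed, note that already for $m=n=10$ there are $20! \approx 2.4 \times 10^{18}$ permutations. A minimal Monte Carlo sketch is shown below; the function name `monte_carlo_p_value` and the parameters `n_resamples` and `seed` are hypothetical choices, and it assumes, as above, that the first $m$ entries of each shuffled array form the $X$ sample. It uses Python's `random.Random.shuffle`, which produces a uniformly random permutation.

```python
import random


def monte_carlo_p_value(x, y, statistic, n_resamples=10_000, seed=None):
    """Approximate (1/N!) * sum_j I(T_j > t_obs) by sampling random permutations."""
    rng = random.Random(seed)
    m = len(x)
    combined = list(x) + list(y)
    t_obs = statistic(x, y)  # observed statistic on the original split
    exceed = 0
    for _ in range(n_resamples):
        rng.shuffle(combined)  # uniformly random permutation of the combined data
        # assumed convention: first m values -> X sample, rest -> Y sample
        if statistic(combined[:m], combined[m:]) > t_obs:
            exceed += 1
    return exceed / n_resamples


def abs_mean_diff(a, b):
    """Absolute difference of the sample means."""
    return abs(sum(a) / len(a) - sum(b) / len(b))


# Example 1 data: the estimate should be close to the exact value 2/3
print(monte_carlo_p_value([1, 9], [3], abs_mean_diff, seed=0))
```

The approximation error shrinks as `n_resamples` grows, roughly in proportion to $1/\sqrt{\texttt{n\_resamples}}$.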

For large samples, the permutation test usually gives results similar to those of a test based
on large-sample theory [1]. A permutation test is therefore most useful when dealing with small samples.

## References

1. Larry Wasserman, _All of Statistics. A Concise Course in Statistical Inference_, Springer 2003.