# Permutation testing

Permutation testing estimates the distribution of the test statistic under the null hypothesis **without making any strong assumptions** about the data, in contrast to classical tests such as the t-test or the chi-square test.

### Steps involved:

![permutation test](img/permutation_test1.jpg)

A permutation test is a kind of **non-parametric** test. Its main assumption is that, under the null hypothesis, the observations are exchangeable between treatment groups, so the group labels carry no information.

Consider data from two groups, A and B. A permutation test typically involves the following steps:
- First, we determine the test statistic. This could be anything, but usually we would like to know the difference in the means of the two groups.
- Next, the observations are pooled and a new dataset is generated for every possible permutation of the labels in groups A and B.
- In practice, looking at every permutation is usually infeasible, so we use a random sample of the possible permutations.
- As a next step, we calculate the difference in means for each permuted dataset. This set of differences is the distribution of the difference in means under the null hypothesis that the group labels are irrelevant. It is exact if every permutation is used and an approximation if only a random sample is used.
- As a final step, we check where the observed test statistic falls in this distribution. For instance, if it falls within the central 95% of the null distribution, we cannot reject the null hypothesis at the 5% level, that is, the data show no statistically significant difference between groups A and B.
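
The steps above can be sketched on a toy dataset that is small enough to enumerate every relabeling, so the null distribution is exact. The data values and variable names below are hypothetical and chosen only for illustration; this is a minimal sketch, not a general-purpose implementation.

```python
from itertools import combinations

import numpy as np

# Hypothetical toy data, small enough to enumerate every relabeling
group_a = np.array([12.0, 15.0, 11.0, 14.0])
group_b = np.array([10.0, 9.0, 13.0, 8.0])

pooled = np.concatenate([group_a, group_b])
n_a = len(group_a)
observed = group_a.mean() - group_b.mean()

# Each choice of which pooled observations are labelled "A"
# corresponds to one relabeling of the data
null_diffs = []
for idx_a in combinations(range(len(pooled)), n_a):
    mask = np.zeros(len(pooled), dtype=bool)
    mask[list(idx_a)] = True
    null_diffs.append(pooled[mask].mean() - pooled[~mask].mean())
null_diffs = np.array(null_diffs)

# Two-sided p-value: share of relabelings at least as extreme as the observed one
# (the small tolerance guards against floating-point ties)
p_value = np.mean(np.abs(null_diffs) >= np.abs(observed) - 1e-12)
print(len(null_diffs), observed, p_value)
```

With two groups of four observations there are only 70 relabelings, which is why full enumeration is practical here and not for the larger datasets used later.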

You could even use this distribution to obtain a p-value if you please.  
Thus, permutation tests are quite simple and intuitive.

### Advantages:

- Very flexible
- No strict assumptions
- Widely accepted

### Disadvantages:

- Computationally expensive
- Custom coding required
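
To give a sense of the computational cost, the number of distinct relabelings can be counted directly. The sketch below uses `math.comb` from the Python standard library with group sizes of 500 and 700, the sizes used in the donation example in this text; the variable names are illustrative.

```python
import math

# Group sizes from the donation example in this text
len_A, len_B = 500, 700

# Number of distinct ways to assign len_A of the pooled observations to group A
n_relabelings = math.comb(len_A + len_B, len_A)
print("Number of digits in the count:", len(str(n_relabelings)))
```

A count of this size cannot be enumerated, which is why a random sample of permutations is used in practice.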

In the next three exercises, we will code up a permutation test from scratch.

Suppose that you are in charge of a non-profit, and are testing two different web page designs for your donation web site.  
You are interested in seeing whether these two designs make a difference in donations.

## Generating a single permutation

In the next few exercises, we will run a significance test using permutation testing. As discussed in the video, we want to see if there's any difference in the donations generated by the two designs, A and B. Suppose that you have been running both versions for a few days and have received 500 donations on A and 700 donations on B, stored in the variables donations_A and donations_B.

We first need to generate a null distribution for the difference in means. We will achieve this by generating multiple permutations of the dataset and calculating the difference in means for each case.

First, let's generate one permutation and calculate the difference in means for the permuted dataset.

In [1]:
import pickle
import numpy as np

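# Load the donation arrays; the with-blocks close the files after reading
with open('data/donations_A.pk', 'rb') as f:
    donations_A = pickle.load(f)
with open('data/donations_B.pk', 'rb') as f:
    donations_B = pickle.load(f)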

# Concatenate the two arrays donations_A and donations_B into data
len_A, len_B = len(donations_A), len(donations_B)
data = np.concatenate([donations_A, donations_B])
print('data shape: ', data.shape)

# Get a single permutation of the concatenated length
perm = np.random.permutation(len(donations_A) + len(donations_B))

# Calculate the permuted datasets and the difference in means
permuted_A = data[perm[:len(donations_A)]]
permuted_B = data[perm[len(donations_A):]]
print("Difference in the permuted mean values = {:.3f}.".format(permuted_A.mean() - permuted_B.mean()))

data shape:  (1200,)
Difference in the permuted mean values = 0.102.
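
Because the permutation is random, the printed difference changes from run to run. One way to make a single permutation reproducible is NumPy's seeded `Generator`. The sketch below is an assumption-laden illustration: the synthetic exponential data stand in for the real donation arrays, and the seed and distribution are arbitrary choices.

```python
import numpy as np

# Hypothetical stand-in data; the real donations_A and donations_B come from pickle files
rng = np.random.default_rng(seed=42)
donations_A = rng.exponential(scale=6.0, size=500)
donations_B = rng.exponential(scale=6.0, size=700)

data = np.concatenate([donations_A, donations_B])
len_A = len(donations_A)

# rng.permutation on an array returns a shuffled copy and leaves data unchanged
shuffled = rng.permutation(data)
permuted_A, permuted_B = shuffled[:len_A], shuffled[len_A:]
print("Difference in the permuted mean values = {:.3f}.".format(permuted_A.mean() - permuted_B.mean()))
```

Shuffling the pooled array directly is equivalent to indexing it with a permuted index array, as done in the cell above.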


## Hypothesis testing - Difference of means

We want to test the hypothesis that there is a difference in the average donations received from A and B. Previously, you learned how to generate one permutation of the data. Now, we will generate a null distribution of the difference in means and then calculate the p-value.

For the null distribution, we first generate multiple permuted datasets and store the difference in means for each one. We then calculate the test statistic as the difference in means of the original dataset. Finally, we calculate the two-sided p-value as twice the fraction of permuted differences that are greater than or equal to the absolute value of the test statistic. Doubling the upper tail in this way assumes that the null distribution is roughly symmetric about zero. A p-value below a chosen threshold, say 0.05, is then taken to indicate statistical significance.

In [2]:
# Generate permutations equal to the number of repetitions
reps = 1000
perm = np.array([np.random.permutation(len(donations_A) + len(donations_B)) for i in range(reps)])
print('shape of perm: ',perm.shape)
permuted_A_datasets = data[perm[:, :len(donations_A)]]
print('shape of permuted A: ',permuted_A_datasets.shape)
permuted_B_datasets = data[perm[:, len(donations_A):]]
print('shape of permuted B: ',permuted_B_datasets.shape)

# Calculate the difference in means for each of the datasets
samples = permuted_A_datasets.mean(axis=1) - permuted_B_datasets.mean(axis=1)
print('shape of samples: ', samples.shape)

# Calculate the test statistic and p-value
test_stat = donations_A.mean() - donations_B.mean()
print("p-value = {}".format(2*np.sum(samples >= np.abs(test_stat))/reps))

shape of perm:  (1000, 1200)
shape of permuted A:  (1000, 500)
shape of permuted B:  (1000, 700)
shape of samples:  (1000,)
p-value = 0.0
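
A p-value of exactly 0.0 here means that none of the 1000 permuted differences reached the observed one, not that the true p-value is zero. A commonly used alternative counts permuted differences that are at least as extreme in either tail and adds one to both the numerator and the denominator, so the estimate can never be exactly zero. The sketch below is one way to write this; the function name `two_sided_p_value` and the synthetic null distribution are hypothetical.

```python
import numpy as np

def two_sided_p_value(samples, test_stat):
    """Two-sided permutation p-value with the add-one correction."""
    samples = np.asarray(samples)
    # Count permuted statistics at least as extreme as the observed one, in either tail
    n_extreme = np.sum(np.abs(samples) >= np.abs(test_stat))
    # The observed labelling is counted as one more permutation
    return (n_extreme + 1) / (len(samples) + 1)

# Hypothetical null distribution standing in for `samples` from the cell above
rng = np.random.default_rng(0)
samples = rng.normal(loc=0.0, scale=1.0, size=1000)
print(two_sided_p_value(samples, test_stat=10.0))
```

With 1000 repetitions the smallest value this function can return is 1/1001, which makes the resolution limit of the test explicit.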


## Hypothesis testing - Non-standard statistics

In the previous two exercises, we ran a permutation test for the difference in mean values. Now let's look at non-standard statistics.

Suppose that you're interested in understanding the distribution of the donations received from websites A and B. For this, you want to see if there's a statistically significant difference in the median and the 80th percentile of the donations. Permutation testing gives you a wonderfully flexible framework for attacking such problems.

Let's go through running a test to see if there's a difference in the median and the 80th percentile of the distribution of donations. As before, you're given the donations from the websites A and B in the variables donations_A and donations_B respectively.

In [3]:
# Calculate the difference in 80th percentile and median for each of the permuted datasets (A and B)
samples_percentile = np.percentile(permuted_A_datasets, 80, axis=1) - np.percentile(permuted_B_datasets, 80, axis=1)
samples_median = np.median(permuted_A_datasets, axis=1) - np.median(permuted_B_datasets, axis=1)

# Calculate the test statistic from the original dataset and corresponding p-values
test_stat_percentile = np.percentile(donations_A, 80) - np.percentile(donations_B, 80)
test_stat_median = np.median(donations_A) - np.median(donations_B)

print("80th Percentile: test statistic = {0:.3f}, p-value = {1:.3f}".format(test_stat_percentile,
                                                                  2*np.sum(samples_percentile >= np.abs(test_stat_percentile))/reps))
print("Median: test statistic = {0:.3f}, p-value = {1:.3f}".format(test_stat_median,
                                                         2*np.sum(samples_median >= np.abs(test_stat_median))/reps))

80th Percentile: test statistic = 1.695, p-value = 0.022
Median: test statistic = 0.643, p-value = 0.010
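
Since only the statistic changes between these tests, the procedure can be wrapped in a function that accepts the statistic as an argument. The sketch below is a minimal version under stated assumptions: the function name `permutation_test`, its parameters, and the synthetic data are hypothetical, and the p-value uses the absolute-value count with an add-one correction instead of the doubled upper tail used in the cells above.

```python
import numpy as np

def permutation_test(a, b, statistic, reps=1000, seed=None):
    """Return (observed difference, two-sided p-value) for statistic(a) - statistic(b)."""
    rng = np.random.default_rng(seed)
    a, b = np.asarray(a), np.asarray(b)
    pooled = np.concatenate([a, b])
    observed = statistic(a) - statistic(b)

    # Build the null distribution by reshuffling the pooled data
    null_diffs = np.empty(reps)
    for i in range(reps):
        shuffled = rng.permutation(pooled)
        null_diffs[i] = statistic(shuffled[:len(a)]) - statistic(shuffled[len(a):])

    # Share of permuted differences at least as extreme as the observed one
    p_value = (np.sum(np.abs(null_diffs) >= np.abs(observed)) + 1) / (reps + 1)
    return observed, p_value

# Hypothetical stand-in data for the two designs
rng = np.random.default_rng(1)
group_a = rng.exponential(scale=6.0, size=500)
group_b = rng.exponential(scale=6.0, size=700)

for name, stat in [("median", np.median),
                   ("80th percentile", lambda x: np.percentile(x, 80))]:
    observed, p_value = permutation_test(group_a, group_b, stat, reps=500, seed=2)
    print(name, observed, p_value)
```

The explicit loop is slower than the vectorised indexing used earlier, but it works with any statistic that maps a one-dimensional array to a number.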
