# Permutation test session

In [None]:
import pandas as pd
import numpy as np
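import matplotlib.pyplot  # load the pyplot submodule so that plt.pyplot is available
import matplotlib as plt  # note: plt is the top-level package here, so plots use plt.pyplot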
import itertools
from scipy import stats
from scipy.special import comb

In [None]:
def get_t_statistic(data, ind, var):
    d = data.loc[data.index.isin(ind), var].values
    p = data.loc[~data.index.isin(ind), var].values
    return stats.ttest_ind(d,p)[0]

In [None]:
def get_mean_difference(data, ind, var):
    d = data.loc[data.index.isin(ind), var].values
    p = data.loc[~data.index.isin(ind), var].values
    return np.mean(d) - np.mean(p)

# Outline

- [Experimental situation](#1)
- [Intuitive idea](#2)
- [Permutation test in action: a t-test example](#3)
- [Permutation test in action: another statistic](#4)
- [Monte Carlo or approximate permutation test](#5)
- [Permutation test and AB testing](#6)

<a name='1'></a>
## Experimental situation

We are part of a team in charge of analysing a randomised study whose aim is to check whether there is a relationship between a drug (*D*) and the level of ethinyl estradiol (*EE*).

To carry out the study, a (random) sample of women is obtained. All of them are taking a contraceptive containing ethinyl estradiol (*EE*). We want to study the influence of the drug *D* on the levels of *EE*. This level is measured by the *area under the curve* (*AUC*) of *EE*, that is, the variable *EEAUC*.

In total, there are 16 women: 8 receive the drug *D* and 8 receive a placebo *P*.

In [None]:
data = pd.read_csv('./data/women.csv')
print(data.head())

In [None]:
print(data.treat.value_counts())

In [None]:
data.groupby('treat')['EEAUC'].mean()

In [None]:
data.boxplot(by='treat')

We could proceed with a *t-test* to check whether there is a difference between the two populations (compute the statistic, read off the *p-value*, and so on).

In [None]:
d = data.loc[data.treat == 'D']['EEAUC'].values
p = data.loc[data.treat == 'P']['EEAUC'].values
t_estimator, p_value = stats.ttest_ind(d,p)
print(f'The value of the estimator is {t_estimator}. The p-value is {p_value}')

An alternative way of analysing this data is by means of a **permutation test**.

<a name='2'></a>
# Intuitive idea

If the two groups that we are comparing (*D* and *P*) have the same distribution of *AUC*, then belonging to *D* or *P* is simply a label, unrelated to whether *AUC* is systematically large or small. With this statement we are **establishing $H_0$**.

As a consequence, under $H_0$ the observed sample is exactly as probable as any permutation of it in which the *AUC* values are assigned to *D* and *P* in a different way.

In [None]:
new_data = data.copy()
new_data['treat'] = np.random.permutation(new_data.treat)
print('Original data:\n', data.head(), '\n\n Permutation:\n', new_data.head())

A brute-force method consists of enumerating all possible permutations of the data, computing a suitable statistic for each permutation (for example, the *t-statistic*) and counting how many times this value is at least as extreme as the one observed in the original data.

In our case, there are *16!* permutations. Enumerating them all is not feasible! However, many of the *16!* permutations lead to the same split into *D* and *P*, so we can skip the repeats (why?). In the end, we need:

$$\frac{16!}{8!\cdot8!} = 12870$$
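The count above can be checked directly. This is a small sketch using only the standard library; the variable names are illustrative and not part of the notebook's data.

```python
import math

n_total = 16   # women in the study
n_drug = 8     # women labelled D

# Reordering the women inside D, or inside P, does not change any statistic,
# so the 16! orderings collapse into 16! / (8! * 8!) distinct D/P splits.
n_orderings = math.factorial(n_total)
n_splits = n_orderings // (math.factorial(n_drug) * math.factorial(n_total - n_drug))

print(n_orderings)                              # raw permutations
print(n_splits)                                 # distinct D/P splits
print(n_splits == math.comb(n_total, n_drug))   # same as "16 choose 8"
```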

<a name='3'></a>
# Permutation test in action: a t-test example

The permutation test proceeds as follows:

1. Compute the statistic for the original data
2. List all possible combinations of indices that could form group *D*
3. Compute the statistic for each of these reassignments
4. Finally, compute the *p-value*: the proportion of permutations whose statistic is equal to or more extreme than the one computed from the original data
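The four steps can be sketched end to end on a tiny made-up sample. This is only an illustration under simplifying assumptions: it uses the difference in means instead of the t-statistic so that it needs nothing beyond the standard library, the function names are hypothetical, and the toy numbers are not the study data.

```python
import itertools
import statistics

def mean_difference(a, b):
    return statistics.mean(a) - statistics.mean(b)

def exact_permutation_p_value(values, n_first, statistic):
    """One-sided exact permutation p-value.

    `values` lists the first group followed by the second group.
    """
    indices = range(len(values))
    observed = statistic(values[:n_first], values[n_first:])    # step 1

    count = 0
    total = 0
    for chosen in itertools.combinations(indices, n_first):     # step 2
        chosen_set = set(chosen)
        group_a = [values[i] for i in chosen]
        group_b = [values[i] for i in indices if i not in chosen_set]
        if statistic(group_a, group_b) >= observed:             # step 3
            count += 1
        total += 1
    return observed, count / total                              # step 4

# Made-up toy values: the first three play the role of D, the last three of P
toy = [12.0, 15.0, 14.0, 9.0, 8.0, 10.0]
observed, p_value = exact_permutation_p_value(toy, n_first=3, statistic=mean_difference)
print(observed, p_value)
```

With 6 values split 3 and 3 there are only 20 reassignments, so the smallest p-value this toy example can produce is 1/20.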

In [None]:
# 1. t-statistic for the original data
d = data.loc[data.treat == 'D']['EEAUC'].values
p = data.loc[data.treat == 'P']['EEAUC'].values
t_orig, _ = stats.ttest_ind(d,p)
print(f'The t-statistic for the original data is {t_orig}')

In [None]:
# 2. List of all possible combinations
all_combinations_d = itertools.combinations(range(len(data.index)), sum(data.treat == 'D'))

In [None]:
# What does itertools.combinations do?
all_comb = itertools.combinations(range(5), 3)
for c in all_comb:  # named c so that it does not shadow scipy.special.comb
    print(c)

In [None]:
# 3. Calculation of the statistic for each of these index reassignments
result = [get_t_statistic(data=data, ind=sample, var='EEAUC') for sample in all_combinations_d]

In [None]:
print(f'Total number of permutations: {len(result)}')

In [None]:
plt.pyplot.hist(result)

In [None]:
# 4. Finally, p-value
np.mean(np.array(result) >= t_orig)

Why is this *p-value* different from the *p-value* of the *t-test*?

<details><summary>CLICK ME</summary>
<p>
The p-value we have just obtained corresponds to a one-sided test in which we consider the alternative hypothesis:
    
$$H_1 : D > P$$

To perform the two-sided test with $H_1 : D \ne P$, the most appropriate statistic is the absolute value of the Student's t statistic, because differences far from zero in either direction, negative or positive, are evidence against the null hypothesis.
    
</p>
</details>

In [None]:
# Place the code here
# # 4.1 Finally, p-value

<a name='4'></a>
# Permutation test in action: another statistic

We have just seen a use case of the permutation test. However, we did not really need it there, because a *t-test* could have been used to test the same hypothesis.

**What are the benefits of permutation tests?**

One of the advantages of permutation tests is that we can use the statistic that best expresses the type of difference we want to detect. We are not forced to use a statistic whose distribution under the null hypothesis is mathematically easy to determine and has good statistical properties.

The permutation mechanism removes the need for good mathematical behavior, and computing the *p-value* is easy.
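As an illustration of that freedom, here is a minimal sketch that uses the difference in medians, a statistic whose null distribution has no convenient closed form. The helper names and the toy values are made up for this example and are not part of the notebook.

```python
import itertools
import statistics

def median_difference(a, b):
    return statistics.median(a) - statistics.median(b)

def permutation_distribution(values, n_first, statistic):
    """Statistic for every way of assigning n_first values to the first group."""
    indices = range(len(values))
    results = []
    for chosen in itertools.combinations(indices, n_first):
        chosen_set = set(chosen)
        group_a = [values[i] for i in chosen]
        group_b = [values[i] for i in indices if i not in chosen_set]
        results.append(statistic(group_a, group_b))
    return results

# Made-up values; the first four form group A (note the outlier 30.0)
toy = [12.0, 15.0, 14.0, 30.0, 9.0, 8.0, 10.0, 11.0]
observed = median_difference(toy[:4], toy[4:])
null = permutation_distribution(toy, n_first=4, statistic=median_difference)

# Two-sided p-value: compare absolute values on both sides
p_two_sided = sum(abs(s) >= abs(observed) for s in null) / len(null)
print(observed, len(null), p_two_sided)
```

Swapping `median_difference` for any other function of the two groups changes the test without touching the rest of the code.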

In this part we are going to analyse the same data but, instead of using *t-statistic*, we are going to use the following one:

$$\text{diff} = \text{mean}(d) - \text{mean}(p)$$

In [None]:
all_combinations_d = itertools.combinations(range(len(data.index)), sum(data.treat == 'D'))
result = [get_mean_difference(data=data, ind=sample, var='EEAUC') for sample in all_combinations_d]

In [None]:
plt.pyplot.hist(result)

In [None]:
index_d = data.loc[data.treat == 'D'].index
mean_original = get_mean_difference(data=data, ind=index_d, var='EEAUC')
print(f'The original statistic is: {mean_original}')

In [None]:
# Two-sided p-value: compare absolute values on both sides
np.mean(np.abs(result) >= np.abs(mean_original))

In [None]:
# Central interval of the permutation distribution of the mean difference (not the t-statistic):
level = 0.05
ci = np.quantile(result, [level/2, 1-level/2])
print(f'{100*(1-level)}% CI: ({ci[0]}, {ci[1]})')

<a name='5'></a>
# Monte Carlo or approximate permutation test

For medium or large sample sizes, an exact permutation test like the previous one is computationally infeasible. The usual solution is an approximate permutation test, also called a Monte Carlo test.

In this approach, a random sample of permutations is simulated, usually very large but much smaller than the whole 'population' of possible permutations, which can be huge. This yields an estimate of the *p-value* whose precision improves as the number of simulated permutations grows.
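A minimal sketch of the idea, assuming the difference in means as the statistic and a two-sided alternative. The function name, the default number of permutations, the seed and the toy values are all illustrative choices, not taken from the notebook.

```python
import random
import statistics

def monte_carlo_permutation_p_value(group_a, group_b, n_perms=2000, seed=0):
    """Estimate a two-sided permutation p-value from random relabellings."""
    rng = random.Random(seed)          # fixed seed only to make the sketch reproducible
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = statistics.fmean(group_a) - statistics.fmean(group_b)

    hits = 0
    for _ in range(n_perms):
        rng.shuffle(pooled)            # random relabelling, without replacement
        diff = statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            hits += 1
    # Proportion of sampled relabellings at least as extreme as the observed one
    return hits / n_perms

# Made-up values for two groups of eight
a = [5.1, 6.3, 7.0, 6.8, 5.9, 7.4, 6.1, 6.6]
b = [4.2, 5.0, 4.8, 5.5, 4.1, 5.2, 4.9, 5.3]
p = monte_carlo_permutation_p_value(a, b)
print(p)
```

Shuffling the pooled values and cutting them at `n_a` guarantees that every relabelling keeps the original group sizes.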

<a name='6'></a>
# Permutation test and AB testing
We decide to perform an *AB test* in order to check the performance of our new algorithm (**A group**). For this experiment, the following decisions are taken:
- Split the population randomly, with a 50% probability of falling in each group
- Same sample size in both groups (100 visits)

Once the experiment is finished, we take the data and analyse it. We have to decide whether the new algorithm performs better, which means **having a higher click rate**.

In [None]:
ab_result = pd.read_csv('./data/ab.csv')

In [None]:
ab_result.head(10)

In [None]:
ab_result.groupby('group')['click'].mean()

We are going to perform a permutation test to analyse the following hypothesis test:

$$\left\{
\begin{array}{ll}
      H_0 & \mu_A=\mu_B \\
      H_1 & \mu_A \ne \mu_B \\
\end{array} 
\right. $$

As the statistic we will use the mean difference. Notice that if we wanted to perform the exact permutation test we would need a huge number of permutations:

$$\frac{200!}{100!\cdot100!}.$$
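To get a feel for the size of that number, it can be evaluated exactly with Python's arbitrary-precision integers. A small sketch; `n_visits` and `n_group_a` are just illustrative names for the 200 visits and the 100 assigned to group A.

```python
import math

n_visits = 200
n_group_a = 100

# Number of distinct ways of choosing which 100 of the 200 visits belong to A
n_splits = math.comb(n_visits, n_group_a)

print(f'{n_splits:.3e}')
print(len(str(n_splits)), 'digits')
```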

We are going to use a *Monte Carlo permutation test* with 9999 permutations. With this approximation, the *p-value* is usually estimated as follows, counting the observed sample as one more permutation so that the estimate can never be exactly zero:

$$\frac{\#\{\text{simulations} \ge \text{original statistic}\} \mathbf{+ 1}}{\text{total perms} \mathbf{+ 1}}$$
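The effect of the two $+1$ terms is easiest to see when no simulated statistic reaches the observed one. The sketch below uses a handful of made-up simulated values and a hypothetical helper name; it compares absolute values, as a two-sided test would.

```python
def corrected_p_value(null_stats, observed):
    """(hits + 1) / (n + 1), comparing absolute values for a two-sided test."""
    hits = sum(abs(s) >= abs(observed) for s in null_stats)
    return (hits + 1) / (len(null_stats) + 1)

# Made-up simulated statistics and observed value
null_stats = [-0.02, 0.01, 0.03, -0.01, 0.00, 0.02, -0.03, 0.01, 0.00]
observed = 0.10

naive = sum(abs(s) >= abs(observed) for s in null_stats) / len(null_stats)
corrected = corrected_p_value(null_stats, observed)
print(naive)      # plain proportion of simulations
print(corrected)  # proportion with the +1 terms
```

The plain proportion would report a p-value of zero here, which nine simulations cannot justify, while the corrected value is bounded below by 1/(n + 1).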

In [None]:
# First, we calculate the statistic for the original sample
ind = ab_result.loc[ab_result.group == 'A'].index
orig_stat = get_mean_difference(data=ab_result, ind=ind, var='click')
print(f'The original statistic is: {orig_stat}')

In [None]:
# Generate permutations
total_perms = 9999
sample_size = len(ab_result)
len_a = sum(ab_result.group == 'A')
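# replace=False: a permutation assigns each visit to exactly one group, so indices must not repeat
ind_list = [np.random.choice(sample_size, len_a, replace=False) for _ in range(total_perms)]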
result = [get_mean_difference(data=ab_result, ind=ind, var='click') for ind in ind_list]
print(f'Total permutations: {len(result)}')

In [None]:
ind_list[:2]

In [None]:
p_value = (np.sum(np.abs(result) >= np.abs(orig_stat)) + 1) / (len(result) + 1)
print(f'p-value: {p_value}')

In [None]:
plt.pyplot.hist(result)

In [None]:
# Central interval of the permutation distribution of the mean difference (not the t-statistic):
level = 0.05
ci = np.quantile(result, [level/2, 1-level/2])
print(f'{100*(1-level)}% CI: ({ci[0]}, {ci[1]})')

In [None]:
data.index.isin([0,1,2])