# Assignment 5: Permutation Tests
Frontiers of Computational Journalism Fall 2018, Columbia Journalism School

In this assignment you will compute a one-tailed p-value for a difference of means by using a permutation test. This is a classic statistical problem that comes up in many different scenarios when comparing two groups. Examples include testing medications between treatment and control groups of patients, or testing an intervention like a different teaching method in two different classes.

I don't really recommend using p-values, or statistical significance testing in general -- I'd prefer to see these types of results reported as confidence intervals on the effect sizes. However, a great deal of research used and still uses p-values, so it's important that you understand what they are.

Normally this sort of p-value would be computed by using something like a t-test. I want you to do it using a permutation test because I think it's conceptually clearer. Also, randomization methods are cool, and often a lot simpler and more powerful than classical analytic methods.

References:

* Solve Every Statistics Problem with One Weird Trick - well, not really "every", but it's an entertaining five-minute introduction to randomization methods.
* A Brief Overview of Permutation Tests with Examples - a good intro. Example 1 is almost exactly what you'll be doing in this assignment.
* Permutation Methods: A Basis for Exact Inference - a more scholarly and technical discussion.

In [2]:
import numpy as np

In [3]:
# Here's your data
np.random.seed(42)
a = np.random.randn(20)*5+12
b = np.random.randn(15)*3+10

In [4]:
a

array([ 14.48357077,  11.30867849,  15.23844269,  19.61514928,
        10.82923313,  10.82931522,  19.89606408,  15.83717365,
         9.65262807,  14.71280022,   9.68291154,   9.67135123,
        13.20981136,   2.43359878,   3.37541084,   9.18856235,
         6.9358444 ,  13.57123666,   7.45987962,   4.93848149])

In [5]:
b

array([ 14.39694631,   9.3226711 ,  10.20258461,   5.72575544,
         8.36685183,  10.33276777,   6.54701927,  11.12709406,
         8.19808393,   9.12491875,   8.19488016,  15.55683455,
         9.95950833,   6.82686721,  12.46763474])

In [6]:
observed_mean = a.mean() - b.mean()
observed_mean

1.3868126559919833

In [7]:
# How many permutations to sample
nsamples = 10000

Your assignment is to write code that does the following:

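* Generates nsamples random permutations of the pooled elements of a and b. Note that a and b are different lengths, so each permutation must be split back into one group of len(a) elements and one group of len(b) elements
* Computes the difference of means between the two groups for each permutation
* Calculates the fraction of those differences that are greater than or equal to observed_mean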
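
To make the steps above concrete, here is a minimal sketch of the same procedure on made-up toy data, written with only the Python standard library so it doesn't hand you the numpy version. The function name `permutation_p_value`, its parameters `n_perm` and `seed`, and the lists `x` and `y` are all hypothetical names chosen for illustration; they are not part of the assignment, and this is one possible way to structure it rather than the required one.

```python
import random
from statistics import mean

def permutation_p_value(x, y, n_perm=10000, seed=0):
    """One-tailed permutation p-value for mean(x) - mean(y).

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)          # seeded so the result is repeatable
    observed = mean(x) - mean(y)       # the difference we actually saw
    pooled = list(x) + list(y)         # pool both groups together
    n_x = len(x)                       # the groups may have different sizes
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)            # one random relabelling of the data
        diff = mean(pooled[:n_x]) - mean(pooled[n_x:])
        if diff >= observed:           # at least as extreme as observed
            count += 1
    return count / n_perm

# Made-up toy data, not the assignment's a and b
x = [5.1, 6.3, 7.0, 5.8]
y = [4.2, 5.0, 4.9]
p = permutation_p_value(x, y, n_perm=2000)
print(p)
```

The key design choice is that the shuffled pool is always split at `len(x)`, so every relabelling keeps the original group sizes. In the notebook you would probably do the shuffling with numpy instead.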

In [8]:
# your code here

What is the computed p-value?