# Permutation tests

Goal: We often want to argue that two variables are related. For example, we might want to argue that one author uses a word more often than another, or that 19th-century novels use longer sentences than 20th-century novels. But with finite and often small samples, we may find patterns that are really just random chance. How do we tell whether two variables are actually related, or whether the similarity is only due to chance?

The key question is: what would I observe if there were no connection between the two variables? An experiment is *statistically* convincing if the pattern I saw would be sufficiently unlikely to arise by random chance alone. But what do "unlikely" and "sufficiently" mean?

We'll start by replicating the "Tea Tasting" experiment from R.A. Fisher's book "The Design of Experiments" (1935). Here the two variables are (1) whether milk was added to a cup before or after the tea and (2) whether a taster says the milk was added before or after.

The taster, the "lady" of what is often called the "lady tasting tea" experiment, was Dr. Muriel Bristol (1888--1950), who held a PhD in the study of algae. Here's a [longer description of the event](https://rss.onlinelibrary.wiley.com/doi/pdf/10.1111/j.1740-9713.2012.00620.x).
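To make "unlikely" concrete for the eight-cup design, here is a minimal sketch that enumerates what pure chance would produce. It assumes four cups of each kind and a taster who always labels exactly four cups as milk-first (the situation `guess_equal` below simulates). The variable names are illustrative, not part of the original notebook.

```python
from math import comb

n_cups = 8          # total cups tasted
n_milk_first = 4    # cups that really had milk added first

# Number of distinct ways to choose which four cups to call "milk first"
total_arrangements = comb(n_cups, n_milk_first)

# k = how many of the true milk-first cups the taster labels correctly.
# The remaining (n_milk_first - k) labels land on tea-first cups.
null_distribution = {}
for k in range(n_milk_first + 1):
    ways = comb(n_milk_first, k) * comb(n_cups - n_milk_first, n_milk_first - k)
    # Cups labelled correctly overall (both kinds) for this k
    correct = n_cups - 2 * n_milk_first + 2 * k
    null_distribution[correct] = ways / total_arrangements

for correct, prob in sorted(null_distribution.items()):
    print(f"{correct} correct: {prob:.4f}")
```

Under these assumptions only one of the 70 possible labelings gets every cup right, so a perfect score has probability 1/70 (about 0.014) if the taster is guessing. Whether that counts as "sufficiently" unlikely is a threshold we choose, not something the arithmetic decides.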

In [6]:
from collections import Counter
import random

eight_cups = [1, 1, 0, 0, 1, 0, 1, 0]

## Simulate a random guess with an equal number of positives/negatives
def guess_equal(correct, print_output=True):
    n = len(correct)
    
    ## Make a copy of the correct list, then shuffle it
    guess = list(correct)
    random.shuffle(guess)
    
    if print_output:
        print(correct)
        print(guess)
    
    correct_guesses = 0
    for i, j in zip(correct, guess):
        if i == j:
            correct_guesses += 1
            
    return correct_guesses

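## Simulate purely random guessing: each guess is an independent coin flip,
##  so the number of positives need not match the number in `correct`
def guess_randomly(correct, print_output=True):
    num_guesses = len(correct)
    
    guess = [ random.randint(0,1) for _ in range(num_guesses) ]
    
    ## Only print when asked, so repeated trials can run quietly
    if print_output:
        print(correct)
        print(guess)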
    
    correct_guesses = 0
    for i, j in zip(correct, guess):
        if i == j:
            correct_guesses += 1
            
    return correct_guesses


Run the `guess_equal` and `guess_randomly` functions once each. We'll collect the results together.

In [7]:
## Remember that in Python we can pass a function as an argument!
##  This effectively gives a function a new, temporary name.

def repeat_experiment(correct, guess_function, num_trials):
    results = Counter()
    
    ## Add code here to run the experiment `num_trials` times and record 
    ##  the results in `results`.
    
    return results

In [None]:
repeat_experiment(eight_cups, guess_equal, 1000)
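One possible way to fill in the repetition loop is sketched below. This is only one approach, not the notebook's intended solution. `repeat_experiment_sketch` and `shuffled_guess_score` are hypothetical names chosen so the sketch runs on its own without overwriting the functions above, and the fixed seed is there only to make the run repeatable.

```python
from collections import Counter
import random

eight_cups = [1, 1, 0, 0, 1, 0, 1, 0]

def shuffled_guess_score(correct):
    """Hypothetical quiet stand-in for guess_equal: shuffle a copy, count matches."""
    guess = list(correct)
    random.shuffle(guess)
    return sum(1 for i, j in zip(correct, guess) if i == j)

def repeat_experiment_sketch(correct, guess_function, num_trials):
    """Call guess_function num_trials times and tally each score."""
    results = Counter()
    for _ in range(num_trials):
        results[guess_function(correct)] += 1
    return results

random.seed(0)  # assumption: fixed seed, only so the sketch is repeatable
num_trials = 10000
results = repeat_experiment_sketch(eight_cups, shuffled_guess_score, num_trials)

# Report how often each number of correct guesses occurred
for score in sorted(results):
    print(score, results[score], results[score] / num_trials)
```

To use the notebook's own `guess_equal` with a loop like this, it helps to silence the printing, for example by passing `lambda c: guess_equal(c, print_output=False)` as the guess function. Otherwise every trial prints two lists.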