## The Euro problem

Adapted from "Teaching statistical inference with resampling," Copyright 2018 Allen Downey
License: http://creativecommons.org/licenses/by/4.0/

In *Information Theory, Inference, and Learning Algorithms*, David MacKay writes, "A statistical statement appeared in *The Guardian* on Friday January 4, 2002:

*When spun on edge 250 times, a Belgian one-euro coin came
up heads 140 times and tails 110. ‘It looks very suspicious
to me’, said Barry Blight, a statistics lecturer at the London
School of Economics. ‘If the coin were unbiased the chance of
getting a result as extreme as that would be less than 7%’.*

But do these data give evidence that the coin is biased rather than fair?"

In [None]:
# Configure Jupyter so figures appear in the notebook
%matplotlib inline

# Configure Jupyter to display the assigned value after an assignment
%config InteractiveShell.ast_node_interactivity='last_expr_or_assign'

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(4)

Let's unpack what Dr. Blight said:

"If the coin were unbiased the chance of getting a result as extreme as that would be less than 7%".

To see where that comes from, let's simulate the result of spinning an "unbiased" coin, meaning that the chance of heads is 50%.

Here's an example with 10 spins:

In [None]:
spins = np.random.random(10) < 0.5

`np.random.random` returns numbers uniformly distributed between 0 and 1, so each one has a 50% chance of being less than 0.5.

The sum of the array is the number of `True` elements, that is, the number of heads:

In [None]:
np.sum(spins)

Write a function that simulates `n` spins, each with probability `p` of heads, and returns the number of heads.

In [None]:
def spin(n, p):
    return 0
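One possible way to fill in the stub, offered as a sketch rather than the intended solution: it reuses the comparison trick from the 10-spin example above, with `p` in place of 0.5. The intermediate name `flips` is an assumption introduced here.

```python
import numpy as np

def spin(n, p):
    """Simulate n spins with probability p of heads; return the number of heads."""
    # Each uniform draw in [0, 1) falls below p with probability p
    flips = np.random.random(n) < p
    # Summing a boolean array counts its True elements
    return np.sum(flips)
```

With this definition, `spin(250, 0.5)` returns a different count near 125 on most calls.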

Here's an example with the actual sample size (250) and hypothetical probability (50%).

In [None]:
heads, tails = 140, 110
sample_size = heads + tails

In [None]:
hypo_prob = 0.5
spin(sample_size, hypo_prob)

Each time we run this simulated experiment, we get a different outcome.

Here's a function that runs the experiment (250 spins) many times, and collects the outcomes (number of heads) in an array.

In [None]:
def run_experiments(n, p, iters):
    t = [spin(n, p) for i in range(iters)]
    return np.array(t)

In [None]:
outcomes = run_experiments(sample_size, hypo_prob, 10000);

The result is an array of 10000 integers, each representing the number of heads in a simulated experiment.

What is the mean of `outcomes`?

In [None]:
np.mean(outcomes)

The expected number of heads is the product of the hypothetical probability and the sample size:

In [None]:
expected = hypo_prob * sample_size

How much do the values in `outcomes` differ from the expected value?

In [None]:
diffs = ...
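One way to answer the question above is to subtract the expected value from every simulated outcome. The sketch below is self-contained, so it assumes a stand-in for `run_experiments`: it draws the outcomes with `np.random.binomial`, which is not what the notebook uses but follows the same distribution.

```python
import numpy as np

sample_size, hypo_prob = 250, 0.5
expected = hypo_prob * sample_size

# Assumption: binomial draws stand in for run_experiments(sample_size, hypo_prob, 10000)
outcomes = np.random.binomial(sample_size, hypo_prob, size=10000)

# Broadcasting subtracts the scalar `expected` from each element
diffs = outcomes - expected
```

Positive values of `diffs` mean more heads than expected, and negative values mean fewer.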

The following function plots a histogram of a sequence of values:

In [None]:
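def plot_hist(values, low=None, high=None):
    # `low` and `high` are accepted but not used here
    options = dict(alpha=0.5, color='C0')
    # `density=True` replaces the `normed` argument, which Matplotlib has removed
    counts, edges, patches = plt.hist(values,
                                      density=True,
                                      histtype='step',
                                      linewidth=3,
                                      **options)

    plt.ylabel('Density')
    plt.tight_layout()
    # With histtype='step' the outline is a single patch
    return patches[0]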

Here's what the deviations from the expected value look like:

In [None]:
plot_hist(diffs)

plt.title('Sampling distribution (n=250)')
plt.xlabel('Deviation from expected number of heads');

This is the "sampling distribution" of deviations.  It shows how much variation we should expect between experiments with this sample size (n = 250).
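As a cross-check on the simulated spread, the number of heads under the hypothesis follows a binomial distribution, so its mean and standard deviation have closed forms. With $n = 250$ and $p = 0.5$:

$$\mu = np = 125, \qquad \sigma = \sqrt{np(1-p)} = \sqrt{62.5} \approx 7.9$$

So deviations of about 8 heads in either direction are typical, and the observed 140 heads, which is 15 above the expected count, sits roughly 1.9 standard deviations out.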

## P-values

Getting back to this line:

"If the coin were unbiased the chance of getting a result as extreme as that would be less than 7%".

Let's count how many times, in 10000 attempts, the outcome is "as extreme as" the observed outcome, 140 heads.

The observed deviation is the difference between the observed and expected number of heads:

In [None]:
observed_diff = heads - expected

How many times did the simulated `diffs` equal or exceed the observed deviation?

Dr. Blight said less than 7%. Where did that come from?

We only counted the cases where the outcome is *more* heads than expected.  What about the cases where the outcome is *fewer* than expected?

How many times did the simulated `diffs` fall at or below the negative of the observed deviation?

To get the total probability of a result "as extreme as that", we can use the absolute value of the simulated differences:
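The three counts asked for in this section can be sketched as follows. This is one possible approach, not necessarily the intended solution; the names `upper`, `lower`, and `both` are assumptions, and `np.random.binomial` again stands in for the simulated outcomes so the block runs on its own.

```python
import numpy as np

heads, sample_size, hypo_prob = 140, 250, 0.5
expected = hypo_prob * sample_size
observed_diff = heads - expected

# Assumption: binomial draws stand in for the notebook's simulated outcomes
diffs = np.random.binomial(sample_size, hypo_prob, size=10000) - expected

# The mean of a boolean array is the fraction of True elements
upper = np.mean(diffs >= observed_diff)          # at least 15 more heads than expected
lower = np.mean(diffs <= -observed_diff)         # at least 15 fewer heads than expected
both = np.mean(np.abs(diffs) >= observed_diff)   # at least 15 away in either direction
```

Because the two tails cannot overlap, `both` equals `upper + lower`, and each tail should hold roughly half of the total.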

Is that consistent with what Dr. Blight reported?
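To compare the simulated figure with an exact one, the same two-sided probability can be computed directly from the binomial distribution. This is a sketch using only the standard library; the names `n`, `k`, `upper_tail`, and `p_value` are assumptions introduced here.

```python
from math import comb

n, k = 250, 140

# P(X >= 140) for a fair coin: count the qualifying sequences out of 2**n
upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2**n

# With p = 0.5 the distribution is symmetric, so the lower tail has the same size
p_value = 2 * upper_tail
```

The simulated estimate will differ slightly from this value from run to run because it is based on 10000 random trials.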

The next function fills in the histogram between `low` and `high`:

In [None]:
def fill_hist(low, high, patch):
    options = dict(alpha=0.5, color='C0')
    fill = plt.axvspan(low, high, 
                       clip_path=patch,
                       **options)

Make a plot that shows the sampling distribution of `diffs` with two regions shaded.  These regions represent the chance that an unbiased coin yields a deviation from the expected value at least as extreme as 15.

In [None]:
patch = plot_hist(diffs)

low = ...
high = ...
fill_hist(low, high, patch)

low = ...
high = ...
fill_hist(low, high, patch)

plt.title('Sampling distribution (n=250)')
plt.xlabel('Deviation from expected number of heads');

What's your conclusion? Why?