In [None]:
from __future__ import division, print_function  # must come first in the cell
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np


# Bootstrap

- The simulation we used to weigh the evidence when comparing armA with
armB is an instance of a general technique called the **bootstrap**.

- The model for a trial is a box with 210 `1`'s and 1790 `0`'s. This
is the pooled data from both arms (2000 tickets in all).

- The object `armA` chooses 1000 items **with replacement** from `trial`.

- The object `armB` similarly chooses 1000 items **with replacement** from `trial`.

- The numbers of successes in arms A and B are therefore generated by exactly
the same mechanism. This is what we would expect if there were no real difference between the arms.

In [None]:
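from code.probability import BoxModel, Binomial  # course-provided helpers
# Pooled data: 210 successes and 1790 failures (2000 tickets in total)
trial = BoxModel([1]*210 + [0]*1790)
# Each arm draws 1000 tickets with replacement; event_spec=[1] counts the 1's as successes
armA = Binomial(1000, trial, event_spec=[1])
armB = Binomial(1000, trial, event_spec=[1])
def absolute_difference(armA, armB):
    return np.abs(armA.trial() - armB.trial())
# Simulate 10000 trials under the pooled (no-difference) model
sample_differences = np.array([absolute_difference(armA, armB) for _ in range(10000)])
# Fraction of simulated differences of at least 30
np.mean(sample_differences >= 30)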
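The `BoxModel` and `Binomial` helpers come from the course's own `code.probability` module and may not be available elsewhere. The sketch below reproduces the same simulation in plain NumPy. It assumes the same box of 210 ones and 1790 zeros and the same threshold of 30. The function name `bootstrap_abs_difference` is hypothetical, and the seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility

# Pooled box: 210 successes (1) and 1790 failures (0)
box = np.array([1] * 210 + [0] * 1790)

def bootstrap_abs_difference(box, n_per_arm, rng):
    """Draw both arms WITH replacement from the same box; return |successes A - successes B|."""
    arm_a = rng.choice(box, size=n_per_arm, replace=True)
    arm_b = rng.choice(box, size=n_per_arm, replace=True)
    return abs(int(arm_a.sum()) - int(arm_b.sum()))

n_sims = 10000
sample_differences = np.array(
    [bootstrap_abs_difference(box, 1000, rng) for _ in range(n_sims)]
)

# Fraction of simulated differences of at least 30
p_value = np.mean(sample_differences >= 30)
print(p_value)
```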

## Comparison to permutation

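- The permutation test also regenerates the numbers of successes in arms A and B from the pooled data.

- The permutation test draws `armA` and `armB` **without replacement**. It shuffles the 2000 pooled tickets and deals 1000 to each arm, so the two arms always contain 210 successes between them.

- The bootstrap draws `armA` and `armB` **with replacement**, independently of each other, so the total number of successes varies from one simulated trial to the next.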
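The contrast between the two schemes can be sketched in NumPy. This sketch assumes the same pooled box of 210 ones and 1790 zeros, and `permutation_split` and `bootstrap_split` are hypothetical helper names. The first helper shuffles the box and splits it in two, which is sampling without replacement. The second draws each arm independently with replacement. The sketch then compares the spread of the difference in successes under each scheme.

```python
import numpy as np

rng = np.random.default_rng(1)
box = np.array([1] * 210 + [0] * 1790)  # pooled data from both arms

def permutation_split(box, n_a, rng):
    """Shuffle the pooled box and deal it into two arms: sampling WITHOUT replacement."""
    shuffled = rng.permutation(box)
    return shuffled[:n_a], shuffled[n_a:]

def bootstrap_split(box, n_a, n_b, rng):
    """Draw each arm independently WITH replacement from the pooled box."""
    return (rng.choice(box, size=n_a, replace=True),
            rng.choice(box, size=n_b, replace=True))

perm_a, perm_b = permutation_split(box, 1000, rng)
print(perm_a.sum() + perm_b.sum())  # always 210: every ticket is used exactly once

boot_a, boot_b = bootstrap_split(box, 1000, 1000, rng)
print(boot_a.sum() + boot_b.sum())  # varies from draw to draw

# Null distribution of (successes A - successes B) under each scheme
n_sims = 5000
perm_diffs = np.empty(n_sims)
boot_diffs = np.empty(n_sims)
for i in range(n_sims):
    a, b = permutation_split(box, 1000, rng)
    perm_diffs[i] = a.sum() - b.sum()
    a, b = bootstrap_split(box, 1000, 1000, rng)
    boot_diffs[i] = a.sum() - b.sum()
print(perm_diffs.std(), boot_diffs.std())
```

In this example, the two standard deviations should both come out near 13.7. The theoretical variances are about 188 under both schemes, so the two methods lead to very similar conclusions here.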

# Other uses of the bootstrap

- Let's consider the proteomics data introduced in Module ??

In [None]:
# TODO: download the proteomics data here

## Assessing variability with the bootstrap

- Bootstrap estimate of the standard error (SE) of a statistic.

- Bootstrap percentile confidence interval.

- One-sample tests using the bootstrap.
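The proteomics data are not loaded in this notebook. The sketch below therefore uses a made-up placeholder sample `x` of 50 values simulated from a normal distribution. These are not real measurements. The sketch illustrates the three ideas above for the sample mean. The helper `bootstrap_stats` is a hypothetical name. The one-sample test follows one common recipe: shift the data so that the null hypothesis holds exactly, then resample from the shifted data.

```python
import numpy as np

rng = np.random.default_rng(2)

# Placeholder data: 50 simulated values, NOT the proteomics measurements.
x = rng.normal(loc=0.3, scale=1.0, size=50)

def bootstrap_stats(data, stat, n_boot, rng):
    """Hypothetical helper: recompute `stat` on n_boot resamples of `data` drawn with replacement."""
    n = len(data)
    idx = rng.integers(0, n, size=(n_boot, n))  # each row indexes one bootstrap resample
    return np.array([stat(data[row]) for row in idx])

boot_means = bootstrap_stats(x, np.mean, 5000, rng)

# 1. Bootstrap estimate of SE: standard deviation of the bootstrap replicates.
se_boot = boot_means.std(ddof=1)

# 2. 95% percentile interval: 2.5th and 97.5th percentiles of the replicates.
lower, upper = np.percentile(boot_means, [2.5, 97.5])

# 3. One-sample test of H0: population mean == mu0.
#    Shift the data so H0 holds exactly, resample from the shifted data, and count
#    how often a resampled mean lands at least as far from mu0 as the observed mean.
mu0 = 0.0
shifted = x - x.mean() + mu0
null_means = bootstrap_stats(shifted, np.mean, 5000, rng)
p_value = np.mean(np.abs(null_means - mu0) >= abs(x.mean() - mu0))

print('SE: %.3f  interval: (%.3f, %.3f)  p-value: %.4f' % (se_boot, lower, upper, p_value))
```

As a sanity check, `se_boot` should be close to the usual formula for the SE of a mean, `x.std(ddof=1) / np.sqrt(len(x))`. The same code works unchanged for statistics that have no simple SE formula, such as the median: pass `np.median` instead of `np.mean`.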

## Parametric bootstrap

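- Our A/B example was also a parametric bootstrap. Drawing with replacement from a box of 210 `1`'s and 1790 `0`'s is the same as drawing each arm's number of successes from a binomial distribution with success probability 210/2000, which is estimated from the pooled data.

- McNemar's test on the car cell phone data was also a parametric bootstrap.

- An example from the Wikipedia page on McNemar's test: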

<center>
<table style="text-align:center">
<tr>
<td></td>
<td><b>After:</b> present</td>
<td><b>After:</b> absent</td>
<td>Row total</td>
</tr>
<tr>
<td><b>Before:</b> present</td>
<td>101</td>
<td>121</td>
<td>222</td>
</tr>
<tr>
<td><b>Before:</b> absent</td>
<td>59</td>
<td>33</td>
<td>92</td>
</tr>
<tr>
<td>Column total</td>
<td>160</td>
<td>154</td>
<td>314</td>
</tr>
</table>
</center>


In [None]:
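# McNemar's test uses only the discordant pairs: 121 (present -> absent) and 59 (absent -> present).
# Under H0, each of the 121 + 59 = 180 discordant pairs is equally likely to go either way.
model = BoxModel([0, 1])
null_dbn = Binomial(180, model, event_spec=[1])
successes = np.array(null_dbn.sample(10000))
f = plt.figure(figsize=(8, 8))
ax = f.gca()
ax.hist(successes, bins=np.linspace(50, 130, 81))
ax.axvline(59, color='red')  # observed count
# Two-sided p-value: double the lower-tail probability (parenthesize before formatting)
print('The chances are: %0.1e' % (2 * np.mean(successes <= 59)))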
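Two checks are sketched below using only NumPy and the standard library. First, the null distribution in the McNemar example is Binomial(180, 1/2), so the tail probability can be computed exactly with `math.comb`. This helps here because the true probability is tiny, and 10000 simulated draws will usually contain no count as extreme as 59. Second, the parametric version of the A/B example draws each arm's success count directly from a binomial distribution with `p_hat = 210/2000`, instead of drawing tickets from the box. The seed and variable names are illustrative.

```python
import math
import numpy as np

# Exact McNemar tail: discordant pairs b = 121, c = 59; under H0, c ~ Binomial(b + c, 1/2).
b, c = 121, 59
n = b + c
lower_tail = sum(math.comb(n, k) for k in range(c + 1)) / 2 ** n
exact_p = 2 * lower_tail  # two-sided, using the symmetry of Binomial(n, 1/2)
print('Exact two-sided p-value: %0.1e' % exact_p)

# Parametric bootstrap for the A/B example: each arm's success count is drawn
# from Binomial(1000, p_hat), with p_hat estimated from the pooled data.
rng = np.random.default_rng(3)
p_hat = 210 / 2000
n_sims = 10000
succ_a = rng.binomial(1000, p_hat, size=n_sims)
succ_b = rng.binomial(1000, p_hat, size=n_sims)
parametric_p = np.mean(np.abs(succ_a - succ_b) >= 30)
print(parametric_p)
```

The parametric estimate should agree, up to simulation noise, with the box-model simulation from the first part of this notebook. The two procedures generate the success counts from the same distribution.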

## A more complicated example

- Two-sample test with the bootstrap (kind of unnatural).
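Below is a minimal sketch of one way to set up such a test, mirroring the A/B example. If the two samples come from the same distribution, they can be pooled. Both bootstrap samples are then drawn with replacement from the pool, and the observed difference in means is compared with the simulated ones. The function name `two_sample_bootstrap_test` and the data below are hypothetical, for illustration only.

```python
import numpy as np

def two_sample_bootstrap_test(x, y, n_boot=5000, rng=None):
    """Bootstrap test of H0: x and y come from the same distribution.

    Under H0 the samples can be pooled, so both bootstrap samples are drawn
    WITH replacement from the pooled data (as in the A/B example).
    Returns the two-sided p-value for the absolute difference in means.
    """
    rng = np.random.default_rng() if rng is None else rng
    pooled = np.concatenate([x, y])
    observed = abs(x.mean() - y.mean())
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        bx = rng.choice(pooled, size=len(x), replace=True)
        by = rng.choice(pooled, size=len(y), replace=True)
        diffs[i] = abs(bx.mean() - by.mean())
    return np.mean(diffs >= observed)

# Hypothetical data for illustration only.
rng = np.random.default_rng(4)
x = rng.normal(0.0, 1.0, size=40)
y = rng.normal(0.8, 1.0, size=35)
print(two_sample_bootstrap_test(x, y, rng=rng))
```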