# Bootstrap example for Stroke data from pp. 3-5 of Efron & Tibshirani

RTB wrote it 29 October 2016 (derived from BS_ex1.m)
RTB modified it 30 Jan 2017: combined etASAhypoth.m and etASAstats.m into
one file that is modular for my stats class.
RTB modified it to emphasize stroke data (13 September 2018). ERBB translated to Python 08 September 2021. 

**The scenario**:

 A study was done to see if low-dose aspirin would prevent
heart attacks in healthy middle-aged men. The study design was optimal:
controlled, randomized, double-blind. Subjects were randomly assigned to
receive aspirin (ASA) or placebo. The summary statistics:

aspirin group (n=11037): 104 heart attacks (MI); 10933 no MI
placebo group (n=11034): 189 heart attacks; 10845 no MI

Scientific question #1: Does aspirin help to prevent heart attacks?

In the same study, the researchers also recorded the number of strokes in
each group:

aspirin group (n=11037): 119 strokes; 10918 without stroke
placebo group (n=11034): 98 strokes; 10936 without stroke

Scientific question #2: Does aspirin increase the risk of having a
stroke?

We will start by addressing the 2nd question regarding strokes. The code
you generate here will then allow you to rapidly analyze the heart attack
data.

**What to do:** 

Log in to Learning Catalytics and join the session for the
module entitled "ASA Bootstrap". You will answer a series of
questions based on the guided programming below. Each section begins with
a '%%'. Read through the comments and follow the instructions provided.
In some cases you will be asked to answer a question, clearly indicated
by 'QUESTION'. In other cases, you will be asked to supply missing code,
indicated by 'TODO'. The corresponding question in Learning Catalytics
will be indicated in parentheses (e.g. Q1). If there is no 'Q#'
accompanying a 'QUESTION' just type your answer into this script and
discuss it with your team. Once you have supplied the required code, you
can execute that section by mouse-clicking in that section (The block
will turn yellow.) and then simultaneously hitting the 'ctrl' and 'enter'
keys (PC) or 'command' and 'enter' keys (Mac).


**Concepts covered:**
1. Test for proportions: odds ratio
2. Comparing resampling tests with Fisher's exact test
3. Std. error and confidence intervals through bootstrapping
4. Relationship between CI and hypothesis test
5. Permutation test for strong test of H0.
6. One-tailed vs. two-tailed tests.
7. CI with 'bootci' and an anonymous function
8. Making data tables with 'table' command


In [None]:
# Imports

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

In [None]:
# Constants: these would normally be passed as arguments to a function

n_boot = 10000
my_alpha = 0.05

# useful numbers for stroke data
n_rx = 11037  # total number of patients in the treatment group (ASA)
n_stroke_rx = 119  # number of strokes in the treatment group
n_ctrl = 11034  # total number of patients in the control group (placebo)
n_stroke_ctrl = 98  # number of strokes in the control group
n_total = n_rx + n_ctrl

##  Calculate the actual ratio of rates of disease: an odds ratio

This is defined as the ratio of 2 ratios: the numerator is the ratio of
the number of subjects in the treatment group who had a stroke divided by
the number who did not have a stroke. The denominator is the same, but
for the control group.
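
As a minimal sketch of the definition above, the snippet below computes an odds ratio for a small made-up 2x2 table. The counts and the helper name `odds_ratio_from_counts` are hypothetical and chosen only for illustration; they are not the study data.

```python
def odds_ratio_from_counts(n_event_rx, n_rx, n_event_ctrl, n_ctrl):
    """Odds ratio = (odds of the event in treatment) / (odds in control)."""
    odds_rx = n_event_rx / (n_rx - n_event_rx)        # events / non-events, treatment
    odds_ctrl = n_event_ctrl / (n_ctrl - n_event_ctrl)  # events / non-events, control
    return odds_rx / odds_ctrl

# Made-up counts: 30 of 100 treated subjects and 20 of 100 controls had the event
toy_or = odds_ratio_from_counts(30, 100, 20, 100)
print(round(toy_or, 4))  # (30/70) / (20/80) = 1.7143
```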


In [None]:
# TODO: calculate the odds ratio for this study
or_hat = ...

print(or_hat)

**QUESTION (Q1)**: What is your odds ratio to 4 decimal places?


## Create a population from which to resample:

The general approach in bootstrapping is to resample from our *original
sample*. But you've been given only proportions, so you have to
*re-create* the raw data based on the proportions. It's alarmingly
simple, but you might have to think about it for a bit. (Note that we won't use the tidy format here, to stay closer to the Matlab version.)

HINT: You should be able to re-calculate your original odds ratio with this formula:
`( data['rx_grp'].sum() / (data['rx_grp'] == 0).sum() ) / ( data['ctrl_grp'].sum() / (data['ctrl_grp'] == 0).sum() )`
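
One possible way to turn counts back into raw observations is to code each subject as 1 (event) or 0 (no event). The sketch below does this for made-up counts; the variable names are hypothetical, and this is not necessarily the approach the exercise intends.

```python
import numpy as np

# Made-up counts (not the study data): 3 events among 10 subjects
n_subjects = 10
n_events = 3

# One entry per subject: 1 = had the event, 0 = did not
toy_raw = np.concatenate([np.ones(n_events), np.zeros(n_subjects - n_events)])

# The counts can be recovered from the raw vector
print(toy_raw.sum(), (toy_raw == 0).sum())  # 3.0 7
```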

In [None]:
n_rows = np.max([n_ctrl, n_rx])

data = pd.DataFrame({'ctrl_grp': np.nan * np.ones((n_rows,)),
                     'rx_grp': np.nan * np.ones((n_rows, )) })

# TODO: recreate the raw data
... # your code here!!

data.head()

In [None]:
( data['rx_grp'].sum() / (data['rx_grp'] == 0).sum() ) / ( data['ctrl_grp'].sum() / (data['ctrl_grp'] == 0).sum() )

Note that we have some NaNs to make these columns the same size. To grab the non-NaN data, which you'll need to do below, use `data['ctrl_grp'].dropna()`.
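
To illustrate the padding, here is a small sketch with a made-up data frame (the name `toy` and its values are hypothetical). It also suggests why the odds-ratio formula in the hint still works on padded columns, while resampling needs the NaNs removed first.

```python
import numpy as np
import pandas as pd

# Two made-up groups of unequal size, padded with NaN to a common length
toy = pd.DataFrame({'ctrl_grp': [1.0, 0.0, 0.0, np.nan],
                    'rx_grp': [1.0, 1.0, 0.0, 0.0]})

ctrl_values = toy['ctrl_grp'].dropna()

print(len(toy['ctrl_grp']), len(ctrl_values))  # 4 3
# sum() skips NaN by default, and NaN == 0 is False, so both counts ignore the padding
print(toy['ctrl_grp'].sum(), (toy['ctrl_grp'] == 0).sum())  # 1.0 2
```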

## Generate bootstrap replicates of the odds ratio
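
A single bootstrap resample can be sketched as follows on a made-up binary sample. The names `toy_sample` and `toy_star` are hypothetical, and the sketch uses NumPy's `default_rng` generator rather than the `np.random.seed` call in the cell below, purely as an illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator so the sketch is reproducible

# Made-up binary sample (not the study data): 1 = event, 0 = no event
toy_sample = np.array([1, 0, 0, 1, 0, 0, 0, 1, 0, 0])

# One bootstrap resample: same size as the original, drawn WITH replacement
toy_star = rng.choice(toy_sample, size=toy_sample.size, replace=True)

# Any statistic can then be recomputed on the resample, e.g. the event count
n_events_star = toy_star.sum()
print(toy_star.size, n_events_star)
```

With a pandas Series, `Series.sample(frac=1, replace=True)` is an alternative way to draw a same-size resample.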


In [None]:
# holds each bootstrap calc. of the odds ratio
or_bootstrap = pd.DataFrame({'or_star': np.zeros((n_boot,))}) 

# set random seed
np.random.seed(123)

for k in range(n_boot):

    # TODO: Re-sample from each group WITH REPLACEMENT to create two new
    # samples: rx_star and ctrl_star. Then use these two bootstrap samples to
    # calculate an odds ratio and store it in or_bootstrap['or_star']
    rx_star = ...
    ctrl_star = ...
    or_bootstrap.loc[k, 'or_star'] = ...  

## Make a histogram of our bootstrap replicates of OR


In [None]:
fig, ax = plt.subplots(1, 1)

sns.histplot(data = or_bootstrap, x = 'or_star', ax = ax)

ax.plot([or_hat, or_hat], ax.get_ylim(), 'lime', lw = 2)
ax.set(xlabel = 'OR^*',
       ylabel = '#',
       title = 'Distribution of bootstrapped odds ratios');

## Calculate the standard error and the confidence intervals
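
As a rough sketch of both quantities, the snippet below uses simulated numbers as a stand-in for a column of bootstrap replicates (they are not real results, and the variable names are hypothetical). The bootstrap standard error is the standard deviation of the replicates, and the percentile interval cuts off alpha/2 of the replicates in each tail.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for a column of bootstrap replicates (simulated, not real results)
toy_replicates = rng.lognormal(mean=0.0, sigma=0.2, size=5000)

alpha = 0.05
# Bootstrap standard error: standard deviation of the replicates
# (ddof=1 matches the default used by pandas' .std())
toy_se = toy_replicates.std(ddof=1)
# Percentile CI: the alpha/2 and 1 - alpha/2 percentiles of the replicates
toy_ci = np.percentile(toy_replicates, [100 * alpha / 2, 100 * (1 - alpha / 2)])

print(toy_se, toy_ci)
```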


In [None]:
# TODO: Compute the bootstrap estimate of the standard error of the odds ratio
sem_boot = or_bootstrap['or_star'].std()

sem_boot

**QUESTION (Q2)**: What is the bootstrap estimate of the standard error of the odds ratio?

In [None]:
# Use the percentile method to determine the 95% confidence interval.

... # !! more code here

conf_interval = ...

print(conf_interval)

**QUESTION (Q3)**: What is the 95% CI based on your bootstrap distribution?

**QUESTION (Q4)**: What is the null value of the odds ratio?

**QUESTION (Q5)**: How can we use the known null value of the odds ratio to
perform a hypothesis test?

**QUESTION (Q6)**: Can we reject H0 at an alpha of 0.05?


## Plot CIs on histogram


In [None]:
fig, ax = plt.subplots(1, 1)
sns.histplot(data = or_bootstrap, x = 'or_star', ax = ax);

ylims = ax.get_ylim()

ax.plot([or_hat, or_hat], ylims, 'lime', lw = 2, label = 'or_hat')

# Add CIs
ax.plot([conf_interval[0], conf_interval[0]], ylims, 'r', label = '95% CI')
ax.plot([conf_interval[1], conf_interval[1]], ylims, 'r')

ax.legend()

ax.set(xlabel = 'OR^*',
       ylabel = '#',
       title = 'Distribution of bootstrapped odds ratios');

## Perform an explicit hypothesis test by modeling our OR under H0

In this case, we will use a permutation test, where we resample WITHOUT
replacement. The logic is that we are essentially randomly assigning each
patient to the treatment or control group, then recalculating our odds
ratio. Here, we are testing the most extreme version of H0, which is that
the two distributions are the SAME.
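
The shuffling logic described above can be sketched on two small made-up groups. All names here are hypothetical, and this is one possible implementation under those assumptions, not necessarily the intended one.

```python
import numpy as np

rng = np.random.default_rng(2)

# Made-up binary outcomes for two small groups (not the study data)
toy_rx = np.array([1, 1, 0, 0, 0, 0])
toy_ctrl = np.array([1, 0, 0, 0, 0])

# Under H0 the group labels are interchangeable, so pool everything
pooled = np.concatenate([toy_rx, toy_ctrl])

# Shuffle WITHOUT replacement, then split back into groups of the original sizes
shuffled = rng.permutation(pooled)
rx_perm = shuffled[:toy_rx.size]
ctrl_perm = shuffled[toy_rx.size:]

# Every observation is used exactly once, so the total event count is unchanged
print(rx_perm.sum() + ctrl_perm.sum() == pooled.sum())  # True
```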



In [None]:
# TODO: Perform resampling as though the patients all belonged to the
# same group (called H0_data), shuffle this data, then arbitrarily assign
# each patient to the treatment or control group and compute the odds ratio.
# Store each permuted odds ratio in or_bootstrap['or_perm']

# Pool all the data:
H0_data = pd.concat([data['rx_grp'].dropna(), data['ctrl_grp'].dropna()])

# Place to store our results:
or_bootstrap['or_perm'] = np.zeros((n_boot,))

# Set random seed
np.random.seed(123)

# Loop over bootstraps
for k in range(n_boot):

    # Shuffle and assign to rx_star and ctrl_star (ALL values are used!)
    ... # !! your code here
    or_bootstrap.loc[k, 'or_perm'] = ...

## Plot the distribution of permuted ORs


In [None]:
fig, ax = plt.subplots(1, 1)
sns.histplot(data = or_bootstrap, x = 'or_perm', ax = ax);

ylims = ax.get_ylim()

ax.plot([or_hat, or_hat], ylims, 'lime', lw = 2, label = 'or_hat')

ax.legend()

ax.set(xlabel = 'Permuted ORs',
       ylabel = '#',
       title = 'Distribution of ORs under H0');

## Calculate a 1-tailed p-value
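
As a minimal sketch, a one-tailed p-value from a resampled null distribution is the fraction of null values at least as extreme as the observed statistic, in the direction of interest. The numbers and names below are made up for illustration.

```python
import numpy as np

# Made-up null distribution of a statistic and a made-up observed value
toy_null = np.array([0.8, 0.9, 1.0, 1.0, 1.1, 1.2, 1.3, 0.7, 1.5, 0.95])
toy_observed = 1.4

# One-tailed p-value: fraction of null values at least as large as the observed one
toy_p_1t = np.mean(toy_null >= toy_observed)
print(toy_p_1t)  # 0.1
```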

In [None]:
# TODO: Calculate a one-tailed p-value based on your permuted samples (i.e.
# or_perm) and store it in a variable called 'p_val_1t'
p_val_1t = ...

p_val_1t

**QUESTION (Q7)**: What is our one-tailed p-value for the odds ratio?


## Calculate a 2-tailed p-value
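
There is more than one convention for a two-tailed p-value with a ratio statistic. One option, assumed here, treats a ratio of x and a ratio of 1/x as equally extreme by comparing absolute values on the log scale. The sketch uses made-up numbers and hypothetical names.

```python
import numpy as np

# Made-up null distribution of a ratio statistic and a made-up observed value
toy_null = np.array([0.8, 0.9, 1.0, 1.0, 1.1, 1.2, 1.3, 0.7, 1.5, 0.95])
toy_observed = 1.4

# Two-tailed p-value: fraction of null values at least as far from a ratio of 1
# (in either direction, on the log scale) as the observed value
toy_p_2t = np.mean(np.abs(np.log(toy_null)) >= np.abs(np.log(toy_observed)))
print(toy_p_2t)  # 0.2
```

Here both 0.7 and 1.5 count as at least as extreme as 1.4, so the two-tailed value is larger than the one-tailed value for the same data.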

In [None]:
# TODO: Calculate a two-tailed p-value based on your permuted samples (i.e.
# or_perm) and store it in a variable called 'p_val_2t'
p_val_2t = ...

p_val_2t

**QUESTION (Q8)**: What is our two-tailed p-value for the odds ratio?

Philosophical interlude: The difference in implementation between
2-tailed and 1-tailed p-value is pretty clear. The philosophical
difference, somewhat less. If you accidentally coded a 2-tailed test and
get a p-value of, say, 0.06, and then remember "Oh! A 1-tailed test was
actually more appropriate!" (and it really is in that instance, not for a
"p-hacky" reason) and obtain p ~ 0.03, there's a sudden shift in
perspective on the data. But it's the same data, and you're performing
more or less the same analysis. Does this seem even remotely reasonable?
This very subtle distinction would have a pretty heavy impact on a
statistics-naïve researcher. It can be helpful to think about edge cases
like this, where our arbitrary thresholding statistical procedure leads
to binarization of the same data into two categories which are
interpreted in very different ways, and how we should consider data of
this variety. Is it helpful to construct a new categorization, e.g.
"statistically significant (p small)," "unlikely to produce statistical
significance (p biggish)" and "of uncertain relationship (p kinda
small?)" or does that just move the problem?

See my article: https://www.eneuro.org/content/6/6/ENEURO.0456-19.2019

## Compare with the Fisher Exact Test for Stroke data


In [None]:
import scipy.stats

table = np.array([[n_stroke_rx, n_rx - n_stroke_rx], 
                  [n_stroke_ctrl, n_ctrl - n_stroke_ctrl]])


# TODO: Calculate a 2-tailed p-value and 95% confidence interval using Fisher's Exact Test
... # your code here

Unlike the Matlab version, `scipy.stats.fisher_exact` doesn't return a confidence interval, so I implemented an approximate one below, based on the normal approximation for the log odds ratio. See the equations [here](https://sphweb.bumc.bu.edu/otlt/MPH-Modules/PH717-QuantCore/PH717_ComparingFrequencies/PH717_ComparingFrequencies8.html).
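
For orientation, here is a hedged sketch of how Fisher's Exact Test and the approximate interval fit together, using a small made-up table rather than the study data. All `toy_` names are hypothetical; the interval uses the same normal approximation on the log odds ratio as the next cell.

```python
import numpy as np
import scipy.stats

# Small made-up 2x2 table (not the study data): rows = groups, columns = event / no event
toy_table = np.array([[8, 2],
                      [1, 5]])

# fisher_exact returns the sample odds ratio and the p-value
toy_odds_ratio, toy_p = scipy.stats.fisher_exact(toy_table, alternative='two-sided')

# Normal-approximation interval: standard error of the LOG odds ratio
toy_se_log_or = np.sqrt((1 / toy_table).sum())
z = scipy.stats.norm.ppf(1 - 0.05 / 2)
toy_ci = [toy_odds_ratio * np.exp(-z * toy_se_log_or),
          toy_odds_ratio * np.exp(z * toy_se_log_or)]

print(toy_odds_ratio, round(toy_p, 4))  # 20.0 0.035
```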


In [None]:
# Compute the 95% confidence interval (normal approximation on the log odds ratio)

# Compute the standard error of the log odds ratio
se = np.sqrt( 1 / table[0, 0] + 1 / table[0, 1] + 1 / table[1, 0] + 1 / table[1, 1])

# Get the confidence interval
# NOTE: `odds_ratio` must already be defined by your Fisher's Exact Test code above
norm_inv = scipy.stats.norm.ppf((1 - my_alpha / 2))
conf_interval_fisher = [odds_ratio*np.exp(-norm_inv*se), odds_ratio*np.exp(norm_inv*se)]

print(conf_interval_fisher)

**QUESTION (Q9)**: What p-value does Fisher's Exact Test give?

**QUESTION (Q10)**: What is the lower bound of the 95% CI from Fisher's Exact
Test?

Be sure to compare this with your values from the bootstrap!

**QUESTION (Q11)**: Save your final figure for the stroke data as a jpeg and
upload it to the LC site.


## Repeat calculations for the heart attack data

In [None]:
# Useful numbers for heart attack data
n_rx = 11037  # number of patients in the treatment group (ASA)
n_MI_rx = 104  # number of heart attacks (= MIs) in the treatment group
n_ctrl = 11034  # number of patients in the control group (placebo)
n_MI_ctrl = 189  # number of MIs in the control group

# Generate the raw data
n_rows = np.max([n_ctrl, n_rx])

data = pd.DataFrame({'ctrl_grp': np.nan * np.ones((n_rows,)),
                     'rx_grp': np.nan * np.ones((n_rows, )) })

... # your code here

data.head()

# Odds ratio for MI data
or_hat = ...

print(or_hat)

**QUESTION (Q12)**: What is the odds ratio, `or_hat`, for the heart attack data?

TODO: Repeat the above analysis for the MI data. If you wrote
everything in terms of `rx_grp` and `ctrl_grp`, then once you've generated
the corresponding raw values for the MI data, you should be able to just
re-run cells below the first dataframe creation/odds ratio computation without further edits to your code.
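
One way to make the re-run easier, in the spirit of the earlier remark that the constants "would normally be passed as arguments to a function", is to wrap the data-creation step in a helper. The function `make_group_frame` below is hypothetical and shown with made-up counts; it is a sketch, not the intended solution.

```python
import numpy as np
import pandas as pd

def make_group_frame(n_event_rx, n_rx, n_event_ctrl, n_ctrl):
    """Hypothetical helper: build 0/1 columns for both groups, NaN-padded."""
    rx = np.concatenate([np.ones(n_event_rx), np.zeros(n_rx - n_event_rx)])
    ctrl = np.concatenate([np.ones(n_event_ctrl), np.zeros(n_ctrl - n_event_ctrl)])
    # Building from Series lets pandas pad the shorter column with NaN
    return pd.DataFrame({'ctrl_grp': pd.Series(ctrl), 'rx_grp': pd.Series(rx)})

# Made-up counts (not the study data)
toy_data = make_group_frame(n_event_rx=2, n_rx=5, n_event_ctrl=1, n_ctrl=4)
print(toy_data.shape, toy_data['ctrl_grp'].isna().sum())  # (5, 2) 1
```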

**QUESTION (Q13)**: What is the bootstrap estimate of the 95% CI of the odds
ratio for the heart attack data?

**QUESTION (Q14)**: Can we reject H0 at an alpha of 0.05 for the heart attack data?

**QUESTION (Q15)**: Based on the permutation test, what is your 1-sided
p-value for the heart attack data?

Be sure to compare the confidence intervals that you obtain
through bootstrapping to those obtained with Fisher's Exact Test.