# Lecture 16: The Bootstrap 
***

We'll need NumPy, Matplotlib, Pandas, and scipy.stats for this notebook, so let's load them. 

In [3]:
import numpy as np 
from scipy import stats
import pandas as pd 
import matplotlib.pyplot as plt 
%matplotlib inline

### Exercise 1 - Bootstrapped Confidence Intervals for the Mean 
*** 

In this exercise you will experiment with empirical bootstrap techniques to compute confidence intervals for the mean of the exponential distribution with parameter $\lambda = 5$.  

**Part A**: Write down the expected value, variance, and standard deviation of $X \sim Exp(5)$.  You can look them up on [Wiki](https://en.wikipedia.org/wiki/Exponential_distribution) if you need to. 

**$E[X] = 1/\lambda = 0.2$, $\text{Var}(X) = 1/\lambda^2 = 0.04$, $\text{SD}(X) = 1/\lambda = 0.2$**
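As a quick sanity check, these moments can be read off SciPy's frozen `stats.expon` distribution. This is a sketch; the one thing to watch is that SciPy parameterizes the exponential by `scale` $= 1/\lambda$ rather than by the rate $\lambda$ (`exp5` is just an illustrative name).

```python
from scipy import stats

lam = 5
# SciPy's expon uses scale = 1/lambda (the mean), not the rate lambda
exp5 = stats.expon(scale=1 / lam)

print("E[X]   =", exp5.mean())   # should equal 1/lambda
print("Var(X) =", exp5.var())    # should equal 1/lambda^2
print("SD(X)  =", exp5.std())    # should equal 1/lambda
```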

**Part B**: The variable $\texttt{sample}$ below consists of $500$ samples from $Exp(5)$.  Complete the function bootstrapped_mean below to draw at least $5000$ bootstrapped resamples (with replacement) from the empirical distribution defined by $\texttt{sample}$ and compute a bootstrapped confidence interval for the mean at the 95% confidence level.  

In [17]:
sample = np.random.exponential(1/5, size=500)

In [23]:
# Percentile bootstrap CI for the mean (assumes a 95% confidence level)
def bootstrapped_mean(sample, num_boots=5000):
    # Resample the data num_boots times (with replacement, same size as the sample)
    # and record the mean of each resample
    resampled = np.zeros(num_boots)
    for ii in range(num_boots):
        resampled[ii] = np.mean(np.random.choice(sample, size=len(sample), replace=True))
    
    # The CI endpoints are the 2.5th and 97.5th percentiles of the resampled means
    L = np.percentile(resampled, 2.5)
    U = np.percentile(resampled, 97.5)
    
    return [L, U] 
    
print("Bootstrapped 95% CI for the Mean:", bootstrapped_mean(sample))

Bootstrapped 95% CI for the Mean: [0.17658079807151872, 0.2116200302421852]


**Part C**: Use the sample mean of $\texttt{sample}$ and the known standard deviation of the distribution to compute a traditional 95% confidence interval for the mean of the distribution.  Compare your traditional confidence interval to the bootstrapped confidence interval returned by your code. 

In [27]:
from scipy.stats import norm
xbar = np.mean(sample)
sigma = 0.2                 # known population SD: 1/lambda for Exp(5)
n = len(sample)
z_crit = norm.ppf(0.975)    # z_{alpha/2} for a 95% interval
SE = sigma / np.sqrt(n)
TCI = (xbar - z_crit*SE, xbar + z_crit*SE)
print("Traditional 95% CI for mean: ", TCI)

Traditional 95% CI for mean:  (0.17641584046420086, 0.21147674208726411)
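To make the comparison concrete, one option is to compute both intervals on the same data and compare their widths. The sketch below is an assumption-laden illustration, not the exercise's required solution: it uses a hypothetical seeded sample `demo` (so the notebook's `sample` is not overwritten) and replaces the Python loop with vectorized resampling through NumPy's `Generator.integers`.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical seeded demo sample, so the notebook's `sample` is left untouched
rng = np.random.default_rng(42)
demo = rng.exponential(scale=1/5, size=500)
n_demo = len(demo)

# Percentile bootstrap, vectorized: each row of the index matrix is one resample
boot_means = demo[rng.integers(0, n_demo, size=(5000, n_demo))].mean(axis=1)
boot_ci = np.percentile(boot_means, [2.5, 97.5])

# Traditional z-interval using the known sigma = 1/lambda = 0.2
z_crit = norm.ppf(0.975)
se = 0.2 / np.sqrt(n_demo)
z_ci = np.array([demo.mean() - z_crit * se, demo.mean() + z_crit * se])

print("bootstrap  CI:", boot_ci, " width:", boot_ci[1] - boot_ci[0])
print("z-interval CI:", z_ci, " width:", z_ci[1] - z_ci[0])
```

Both intervals target the sampling distribution of $\bar{x}$: the bootstrap estimates its spread from the data, while the traditional interval plugs in the known $\sigma$. With $n = 500$ the two should come out close, which matches the printed outputs above.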


**Part D**: Modify the code you wrote in **Part B** to also plot a histogram of the bootstrapped sample means along with some graphical representation of the calculated confidence interval. 
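A sketch of one way to approach Part D. `bootstrapped_mean_plot`, its `level` and `seed` parameters, and `demo_sample` are hypothetical names introduced here. The function uses the same percentile method as Part B (vectorized), shades the interval on a histogram of the resampled means, and returns the figure so it renders inline in the notebook.

```python
import numpy as np
import matplotlib.pyplot as plt

def bootstrapped_mean_plot(sample, num_boots=5000, level=0.95, seed=None):
    """Percentile-bootstrap CI for the mean, plus a histogram of the resampled means."""
    rng = np.random.default_rng(seed)
    sample = np.asarray(sample, dtype=float)
    n = len(sample)
    # Each row of the index matrix is one resample drawn with replacement
    boot_means = sample[rng.integers(0, n, size=(num_boots, n))].mean(axis=1)

    alpha = 1 - level
    L, U = np.percentile(boot_means, [100 * alpha / 2, 100 * (1 - alpha / 2)])

    fig, ax = plt.subplots(figsize=(8, 4))
    ax.hist(boot_means, bins=40, color="steelblue", edgecolor="white", alpha=0.8)
    # Shade the CI and mark its endpoints with dashed lines
    ax.axvspan(L, U, color="orange", alpha=0.2, label=f"{100 * level:.0f}% bootstrap CI")
    ax.axvline(L, color="orange", linestyle="--")
    ax.axvline(U, color="orange", linestyle="--")
    # Reference line at the observed sample mean
    ax.axvline(sample.mean(), color="black", label="sample mean")
    ax.set_xlabel("bootstrapped sample mean")
    ax.set_ylabel("count")
    ax.legend()
    return [L, U], fig, ax

# Hypothetical demo sample, so the notebook's `sample` is left untouched
demo_sample = np.random.default_rng(0).exponential(scale=1/5, size=500)
ci, fig, ax = bootstrapped_mean_plot(demo_sample, seed=1)
print("Bootstrapped 95% CI for the mean:", ci)
```

In the notebook you would call it on `sample` directly; returning `fig` and `ax` keeps the plot adjustable after the call.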

### Exercise 2 - Bootstrapped Confidence Intervals for the Variance
*** 

In this exercise you will experiment with empirical bootstrap techniques to compute confidence intervals for the variance of the exponential distribution with parameter $\lambda = 5$.  

**Part A**: Complete the function bootstrapped_var below to draw at least $5000$ bootstrapped resamples (with replacement) from the empirical distribution defined by $\texttt{sample}$ and compute a bootstrapped confidence interval for the variance at the 95% confidence level.  You should be able to copy and paste your code from Exercise 1. Use your function to find the 95% bootstrapped CI for the data stored in $\texttt{sample}$ from Exercise 1. 

In [31]:
def bootstrapped_var(sample, num_boots=5000):
    # Resample the data num_boots times (with replacement, same size as the sample)
    # and record the variance of each resample
    resampled = np.zeros(num_boots)
    for ii in range(num_boots):
        resampled[ii] = np.var(np.random.choice(sample, size=len(sample), replace=True))
    
    # The CI endpoints are the 2.5th and 97.5th percentiles of the resampled variances
    L = np.percentile(resampled, 2.5)
    U = np.percentile(resampled, 97.5)
    
    return [L, U] 
    
print("Bootstrapped 95% CI for the Var:", bootstrapped_var(sample))

Bootstrapped 95% CI for the Var: [0.03093265436592068, 0.055211020394035616]


**Part B**: Does your 95% bootstrapped confidence interval cover the true variance of the population? 

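**Yes.** The true variance is $1/\lambda^2 = 1/25 = 0.04$, which lies inside the bootstrapped interval $[0.0309, 0.0552]$. 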

### Exercise 3 - Empirical Coverage of Bootstrapped Confidence Intervals  
*** 

Complete the function CI_test below to test the coverage of the bootstrapped confidence intervals at the 95% confidence level for the mean of the population that $\texttt{sample}$ is drawn from.  Recall that the true population mean is $1/\lambda = 0.2$.  

In [None]:
def CI_test(sample, num_CIs=100, num_boots=5000):
    return 1.0 
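A sketch of one possible coverage test, under an assumption about the intended design: a single fixed sample either covers the true mean or not, so here `sample` is used only for its size, and each of the `num_CIs` trials draws a fresh sample from $Exp(\lambda)$, builds a percentile bootstrap CI, and checks whether it contains $1/\lambda$. The helper `percentile_ci_mean`, the extra `lam` and `seed` parameters, and `demo_ci` are hypothetical names introduced for illustration.

```python
import numpy as np

def percentile_ci_mean(x, num_boots, rng, level=0.95):
    # Vectorized percentile bootstrap: each row of the index matrix is one resample
    n = len(x)
    boot_means = x[rng.integers(0, n, size=(num_boots, n))].mean(axis=1)
    alpha = 1 - level
    return np.percentile(boot_means, [100 * alpha / 2, 100 * (1 - alpha / 2)])

def CI_test(sample, num_CIs=100, num_boots=5000, lam=5.0, seed=None):
    """Fraction of bootstrapped 95% CIs that contain the true mean 1/lam."""
    rng = np.random.default_rng(seed)
    n = len(sample)          # only the sample size is taken from `sample`
    true_mean = 1.0 / lam
    hits = 0
    for _ in range(num_CIs):
        # Fresh draw from the population for every trial
        fresh = rng.exponential(scale=1.0 / lam, size=n)
        L, U = percentile_ci_mean(fresh, num_boots, rng)
        hits += int(L <= true_mean <= U)
    return hits / num_CIs

# Hypothetical demo input with the same size as the notebook's `sample`
demo_ci = np.random.default_rng(0).exponential(scale=1/5, size=500)
coverage = CI_test(demo_ci, num_CIs=100, num_boots=2000, seed=0)
print("empirical coverage:", coverage)
```

The result should land near the nominal 0.95; percentile intervals for the mean of a skewed distribution like the exponential may run slightly below nominal, and more trials (`num_CIs`) reduce the Monte Carlo noise in the estimate.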

### Exercise 4 - Parametric Bootstrap for the Exponential Parameter 
*** 

In this exercise you will experiment with the parametric bootstrap technique to compute confidence intervals for various statistics of the exponential distribution with parameter $\lambda = 5$.  

**Part A**: Complete the function bootstrapped_lam below to draw at least $5000$ bootstrapped resamples (with replacement) from the empirical distribution defined by $\texttt{sample}$ and compute a bootstrapped confidence interval for the exponential parameter $\lambda$.  Recall from class that a good estimator for $\lambda$ is $1/\bar{x}$, where $\bar{x}$ is the mean of a sample assumed to come from $Exp(\lambda)$. 

In [None]:
def bootstrapped_lam(sample, num_boots=5000):
    CI = np.array([0,1])
    return CI 
    
bootstrapped_lam(sample, num_boots=5000)
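A sketch of `bootstrapped_lam` using the estimator $\hat{\lambda} = 1/\bar{x}$. By default it resamples the observed data, as Part A describes. The `parametric=True` option (a hypothetical parameter, as are `level`, `seed`, and `demo_lam`) instead draws each resample from the fitted $Exp(\hat{\lambda})$, which is the usual meaning of a parametric bootstrap; whether the exercise intends that variant is an assumption.

```python
import numpy as np

def bootstrapped_lam(sample, num_boots=5000, level=0.95, parametric=False, seed=None):
    """Percentile bootstrap CI for the exponential rate lambda, estimated by 1/xbar."""
    rng = np.random.default_rng(seed)
    x = np.asarray(sample, dtype=float)
    n = len(x)
    lam_hat = 1.0 / x.mean()
    if parametric:
        # Draw resamples from the fitted model; numpy's exponential takes scale = 1/lambda
        boots = rng.exponential(scale=1.0 / lam_hat, size=(num_boots, n))
    else:
        # Resample the observed data with replacement
        boots = x[rng.integers(0, n, size=(num_boots, n))]
    lam_boot = 1.0 / boots.mean(axis=1)      # lambda estimate from each resample
    alpha = 1 - level
    return np.percentile(lam_boot, [100 * alpha / 2, 100 * (1 - alpha / 2)])

# Hypothetical demo sample, so the notebook's `sample` is left untouched
demo_lam = np.random.default_rng(3).exponential(scale=1/5, size=500)
ci_resample = bootstrapped_lam(demo_lam, seed=1)
ci_model = bootstrapped_lam(demo_lam, parametric=True, seed=1)
print("resampling the data: ", ci_resample)
print("resampling Exp(lam_hat):", ci_model)
```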

**Part B**: Complete the function parametric_stdev below to draw at least $5000$ bootstrapped resamples (with replacement) from the empirical distribution defined by $\texttt{sample}$.  From each bootstrapped resample, estimate the exponential parameter $\lambda$, then transform that estimate into the variance of the exponential distribution, $\sigma^2 = 1/\lambda^2$.  With your bootstrapped estimates of $\sigma^2$, compute a 95% confidence interval for the variance.  How does this confidence interval compare to the one computed in Exercise 2? 


In [None]:
def parametric_stdev(sample, num_boots=5000):
    CI = np.array([0,1])
    return CI 
    
parametric_stdev(sample)
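A sketch of `parametric_stdev` following the Part B recipe: resample the data, estimate $\hat{\lambda}^* = 1/\bar{x}^*$ in each resample, and map it to the model-implied variance $1/\hat{\lambda}^{*2}$. The `level` and `seed` parameters and `demo_var` are hypothetical additions. Because $\sqrt{\cdot}$ is monotone, taking the square root of the endpoints gives a CI for $\sigma$ itself.

```python
import numpy as np

def parametric_stdev(sample, num_boots=5000, level=0.95, seed=None):
    """Bootstrap CI for the variance via the exponential model Var(X) = 1/lambda^2."""
    rng = np.random.default_rng(seed)
    x = np.asarray(sample, dtype=float)
    n = len(x)
    # Resample the observed data with replacement
    boots = x[rng.integers(0, n, size=(num_boots, n))]
    lam_boot = 1.0 / boots.mean(axis=1)      # lambda estimate per resample
    var_boot = 1.0 / lam_boot**2             # model-implied variance per resample
    alpha = 1 - level
    return np.percentile(var_boot, [100 * alpha / 2, 100 * (1 - alpha / 2)])

# Hypothetical demo sample, so the notebook's `sample` is left untouched
demo_var = np.random.default_rng(4).exponential(scale=1/5, size=500)
ci_var = parametric_stdev(demo_var, seed=2)
print("model-based 95% CI for the variance:", ci_var)
print("model-based 95% CI for the SD:", np.sqrt(ci_var))
```

For the comparison with Exercise 2: each resample's estimate here is $\bar{x}^{*2}$ rather than a sample variance. If the data really are exponential, a delta-method calculation gives $\text{Var}(\bar{x}^2) \approx 4/(n\lambda^4)$ versus $\text{Var}(s^2) \approx (\mu_4 - \sigma^4)/n = 8/(n\lambda^4)$, so this interval should be roughly $1/\sqrt{2}$ times as wide. The trade-off is that it is only trustworthy when the exponential model holds.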