# Estimation of standard errors with bootstrap

* [Overview](#overview) 
* [Bootstrap](#sec1)
    * [Monte Carlo simulations](#subsec1)
* [Summary](#sum)
* [References](#refs)

## Overview

In the previous section, we saw how to estimate the error associated with a parameter estimator. Specifically, we talked about the standard error associated with estimating the $\lambda$ of a Poisson distribution.

In many cases, an analytical formula for the standard deviation of an estimator does not exist. In such cases we can use the bootstrap, which relies on Monte Carlo simulations to estimate quantities, such as the standard error, that are too difficult to derive analytically.

## Bootstrap

The difficulty with evaluating the performance of parameter estimators, i.e. their standard errors, is that we observe an estimator $\hat{\theta}$ only once, because we have only one sample from the population. If we had many samples, we could compute $\hat{\theta}$ on each sample and then compute the variance of these estimates.
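To make the "many samples" idea concrete, here is a minimal sketch of that idealized situation, assuming we could draw repeatedly from a known Poisson population. The values of `lam`, `n` and `n_samples` are hypothetical and chosen only for illustration. In practice this is exactly what we cannot do, which is what motivates the bootstrap.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed, for reproducibility only

lam = 4.0          # hypothetical true Poisson parameter
n = 50             # hypothetical size of each sample
n_samples = 5000   # hypothetical number of independent samples

# Each row is one independent sample of size n from the population
samples = rng.poisson(lam=lam, size=(n_samples, n))

# The sample mean estimates lambda; compute it on every sample
estimates = samples.mean(axis=1)

# The spread of the estimates approximates the standard error of the estimator
empirical_se = estimates.std(ddof=1)
analytical_se = np.sqrt(lam / n)  # known formula for the Poisson sample mean

print("Empirical standard error: ", empirical_se)
print("Analytical standard error:", analytical_se)
```

With many independent samples, the empirical spread should land close to the analytical value.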

Bootstrap is a simple technique that consists of taking all possible samples from a given sample $S$, computing $\hat{\theta}$ on each of them, and then computing the sample variance of those estimates in order to obtain an estimate of the variance of the estimator.

----

**Remark**

Recall that sampling from the observed sample $S$ is called resampling. Thus bootstrap is a resampling method, and the samples obtained from $S$ are called bootstrap samples.

----

Thus, a bootstrap sample is a random sample of the same size as $S$, drawn with replacement from the observed sample $S$. The distribution of a statistic across bootstrap samples is called a **bootstrap distribution**. Similarly, an estimator that is computed on the basis of bootstrap samples is a **bootstrap estimator**.
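The definition above can be sketched in a few lines of NumPy. The sample values and the seed are illustrative assumptions; the key points are `replace=True` and a size equal to that of $S$.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed, for reproducibility only

# Illustrative observed sample S
S = np.array([3, 5, 8, 5, 5, 8, 5, 4, 2])

# One bootstrap sample: len(S) draws from S, with replacement
bootstrap_sample = rng.choice(S, size=len(S), replace=True)

print("Observed sample: ", S)
print("Bootstrap sample:", bootstrap_sample)
```

Because the draws are made with replacement, some observations typically appear several times in the bootstrap sample while others do not appear at all.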

### Monte Carlo simulations

For a sample of size $n$ there are $n^n$ possible bootstrap samples. Thus, for a large sample, computing all the bootstrap samples is infeasible. The alternative is to generate a large number of them at random.
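For a very small sample the full enumeration is still possible. The sketch below, which assumes a hypothetical sample of three values, lists all $n^n = 27$ ordered bootstrap samples and computes the exact bootstrap variance of the mean from them.

```python
import itertools
import numpy as np

small_sample = [2, 4, 9]  # hypothetical sample, kept tiny on purpose
n = len(small_sample)

# Every ordered sequence of n draws with replacement: n**n sequences in total
all_bootstrap_samples = list(itertools.product(small_sample, repeat=n))

# Compute the statistic (here the mean) on each bootstrap sample
all_means = [np.mean(s) for s in all_bootstrap_samples]

# All sequences are equally likely, so the plain variance over the
# complete list is the exact bootstrap variance of the mean
exact_variance = np.var(all_means)

print("Number of bootstrap samples:", len(all_bootstrap_samples))
print("Exact bootstrap std of the mean:", np.sqrt(exact_variance))
```

Already for $n = 10$ the list would contain $10^{10}$ samples, which is why random generation is used instead.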

The number of generated bootstrap samples, $n_{MC}$, can be very large. As $n_{MC} \rightarrow \infty$, the Monte Carlo estimate becomes as good as the one we would obtain from the complete list of bootstrap samples. Typically, thousands or tens of thousands of bootstrap samples are generated.
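In symbols, and assuming we write $\hat{\theta}^*_b$ for the value of the estimator on the $b$-th generated bootstrap sample (a notation introduced here only for illustration), the Monte Carlo bootstrap estimate of the standard error can be sketched as

$$
\bar{\theta}^* = \frac{1}{n_{MC}} \sum_{b=1}^{n_{MC}} \hat{\theta}^*_b,
\qquad
\widehat{\mathrm{se}}(\hat{\theta}) = \sqrt{\frac{1}{n_{MC} - 1} \sum_{b=1}^{n_{MC}} \left( \hat{\theta}^*_b - \bar{\theta}^* \right)^2}
$$

Some implementations divide by $n_{MC}$ instead of $n_{MC} - 1$ (NumPy's `np.std` does so by default); with thousands of bootstrap samples the difference is negligible.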

The following code snippet demonstrates a Monte Carlo bootstrap simulation for the mean of a small sample:

In [2]:
import numpy as np

In [3]:
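sample = [3, 5, 8, 5, 5, 8, 5, 4, 2]
n_mc = 10000  # number of Monte Carlo bootstrap samples
sample_mean = np.mean(sample)  # the statistic on the observed sample

In [4]:
print("Mean of sample={0}".format(sample_mean))

Mean of sample=5.0


In [9]:
means = []

for itr in range(n_mc):
    # draw a bootstrap sample: same size as the original, with replacement
    bootstrap_sample = np.random.choice(sample, size=len(sample), replace=True)
    means.append(np.mean(bootstrap_sample))

# average of the statistic over all bootstrap samples
bootstrap_mean = np.mean(means)

In [12]:
# no seed is set, so the numbers below vary slightly from run to run
print("Mean={0}".format(bootstrap_mean))
print("Bias of mean={0}".format(bootstrap_mean - sample_mean))
print("Std of mean={0}".format(np.std(means)))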

Mean=4.9919
Bias of mean=-0.008099999999999774
Std of mean=0.6346164934066114
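The same procedure works for any statistic, not only the mean. As a hedged sketch, the loop can be wrapped in a reusable function; `bootstrap_standard_error` and its parameters are hypothetical names introduced here, not part of NumPy, and the seed is arbitrary.

```python
import numpy as np

def bootstrap_standard_error(sample, statistic, n_mc=10000, seed=None):
    """Monte Carlo bootstrap estimate of the standard error of `statistic`."""
    rng = np.random.default_rng(seed)
    sample = np.asarray(sample)
    n = len(sample)
    # Draw all resamples at once: row b is the b-th bootstrap sample
    resamples = rng.choice(sample, size=(n_mc, n), replace=True)
    # Evaluate the statistic on every bootstrap sample
    estimates = np.array([statistic(row) for row in resamples])
    # Spread of the bootstrap distribution = standard error estimate
    return estimates.std(ddof=1)

sample = [3, 5, 8, 5, 5, 8, 5, 4, 2]

se_mean = bootstrap_standard_error(sample, np.mean, seed=42)
se_median = bootstrap_standard_error(sample, np.median, seed=42)

print("Bootstrap standard error of the mean:  ", se_mean)
print("Bootstrap standard error of the median:", se_median)
```

For the mean, the result can be checked against the plug-in formula $\hat{\sigma}/\sqrt{n}$. For the median no such simple formula is available, which is the kind of case where the bootstrap is most useful.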


## Summary

## References