This exercise demonstrates how bootstrapping is performed. I am using Python for the bootstrapping demo.

# Bootstrap method

The bootstrap method is a statistical technique for estimating quantities about a population by averaging estimates from multiple data samples. Each sample is built by drawing observations from the dataset and returning each drawn observation to the dataset before the next draw, so the same observation can appear more than once in a sample. This process is known as sampling with replacement.
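A minimal sketch of drawing one sample with replacement, using the standard library's `random.Random.choices`, which picks each of the `k` elements independently from the full list. The data values are made up for illustration:

```python
import random

data = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]

rng = random.Random(42)  # seeded so the draw is reproducible
# Each of the k draws picks from the whole list, so values can repeat
# and some values may not appear at all.
sample = rng.choices(data, k=len(data))
print(sample)
```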

The bootstrap method can be used to estimate a population quantity, such as the mean. This is done by repeatedly drawing samples with replacement, calculating the statistic on each sample, and averaging the calculated statistics.
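The estimation procedure just described can be sketched as follows. This is an illustrative sketch only: `bootstrap_estimate` is a hypothetical helper, the data are invented, and the sample mean is used as the example statistic.

```python
import random
import statistics

def bootstrap_estimate(data, statistic, n_repeats=1000, sample_size=None, seed=0):
    """Average a statistic over repeated samples drawn with replacement."""
    rng = random.Random(seed)
    size = len(data) if sample_size is None else sample_size
    estimates = []
    for _ in range(n_repeats):
        sample = rng.choices(data, k=size)   # draw with replacement
        estimates.append(statistic(sample))  # statistic of this sample
    return statistics.mean(estimates), estimates

data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9]
estimate, estimates = bootstrap_estimate(data, statistics.mean, n_repeats=1000)
print(f"bootstrap estimate: {estimate:.3f}, sample mean: {statistics.mean(data):.3f}")
```

`n_repeats` and `sample_size` correspond to the repetitions and sample size parameters of the method.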

Bootstrapping can also be used to estimate the skill of a machine learning model. The model is trained on each bootstrap sample and evaluated on the observations that were not included in that sample. These left-out observations are called the out-of-bag samples, or OOB for short.
The procedure is as follows:

1. Choose the number of bootstrap samples to perform.
2. Choose a sample size.
3. For each bootstrap sample:
   1. Draw a sample of the chosen size with replacement.
   2. Fit a model on the data sample.
   3. Estimate the skill of the model on the out-of-bag sample.
4. Calculate the mean of the sample of model skill estimates.
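The four steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the author's implementation: the "model" is a hypothetical toy least-squares line (`fit_line`), skill is measured as mean absolute error (`mean_abs_error`), `bootstrap_skill` is an illustrative name, and the data are synthetic.

```python
import random
import statistics

def fit_line(xs, ys):
    """Toy model (assumption): ordinary least-squares line y = intercept + slope * x."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0  # guard against a sample with one distinct x
    return my - slope * mx, slope

def mean_abs_error(model, xs, ys):
    intercept, slope = model
    return statistics.mean(abs(y - (intercept + slope * x)) for x, y in zip(xs, ys))

def bootstrap_skill(xs, ys, n_repeats=100, sample_size=None, seed=0):
    rng = random.Random(seed)
    n = len(xs)
    size = n if sample_size is None else sample_size      # step 2: sample size
    scores = []
    for _ in range(n_repeats):                             # steps 1 and 3
        idx = [rng.randrange(n) for _ in range(size)]      # 3.1: draw with replacement
        drawn = set(idx)
        oob = [i for i in range(n) if i not in drawn]      # rows never drawn
        if not oob:
            continue  # nothing left to evaluate on in this round
        model = fit_line([xs[i] for i in idx], [ys[i] for i in idx])  # 3.2: fit
        scores.append(mean_abs_error(model, [xs[i] for i in oob],
                                     [ys[i] for i in oob]))          # 3.3: OOB skill
    return statistics.mean(scores), scores                 # step 4: mean skill

# Synthetic data: y = 2x + 1 plus Gaussian noise (made up for this example)
gen = random.Random(1)
xs = [float(i) for i in range(20)]
ys = [2 * x + 1 + gen.gauss(0, 1) for x in xs]

skill, scores = bootstrap_skill(xs, ys, n_repeats=100)
print(f"mean OOB MAE over {len(scores)} repeats: {skill:.3f}")
```

Drawing row indices rather than values makes it easy to identify the out-of-bag rows exactly.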

Importantly, any data preparation before fitting the model, and any tuning of the model's hyperparameters, must happen inside the for-loop and use only the bootstrap sample. This avoids data leakage, where knowledge of the test (out-of-bag) data is used to improve the model, which in turn can result in an optimistic estimate of the model skill.
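As a sketch of keeping preparation inside the loop, the example below assumes standardization as the preparation step. Its parameters (mean and standard deviation) are learned from the bootstrap sample only and then applied unchanged to the out-of-bag rows. `fit_scaler` and `apply_scaler` are hypothetical helpers and the feature values are invented.

```python
import random
import statistics

def fit_scaler(values):
    """Learn standardization parameters (mean, std) from training values only."""
    mu = statistics.mean(values)
    sd = statistics.pstdev(values) or 1.0  # avoid division by zero
    return mu, sd

def apply_scaler(scaler, values):
    mu, sd = scaler
    return [(v - mu) / sd for v in values]

rng = random.Random(0)
data = [rng.gauss(50, 10) for _ in range(30)]  # made-up feature values
n = len(data)

for _ in range(3):  # a few bootstrap rounds for illustration
    idx = [rng.randrange(n) for _ in range(n)]
    drawn = set(idx)
    oob = [i for i in range(n) if i not in drawn]
    train = [data[i] for i in idx]
    test = [data[i] for i in oob]
    # Fit the preparation step on the bootstrap sample only ...
    scaler = fit_scaler(train)
    # ... then apply the same parameters to both sets; the OOB rows never
    # influence the mean/std, so no information leaks from the test data.
    train_scaled = apply_scaler(scaler, train)
    test_scaled = apply_scaler(scaler, test)
    # (a model would be fitted on train_scaled and scored on test_scaled here)
```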

One useful property of bootstrapping is that the distribution of the resulting estimates is often approximately Gaussian, so it can be summarized by its mean and standard deviation.
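Assuming the bootstrap estimates are roughly Gaussian, the sketch below summarizes them by mean and standard deviation and forms a normal-approximation 95% interval (mean ± 1.96 standard deviations). For comparison, it also computes a percentile interval with `statistics.quantiles`, which does not rely on the Gaussian assumption. The data are made up for illustration.

```python
import random
import statistics

rng = random.Random(7)
data = [rng.gauss(10, 2) for _ in range(50)]  # made-up observations

# Bootstrap distribution of the sample mean
estimates = [statistics.mean(rng.choices(data, k=len(data))) for _ in range(2000)]

mu = statistics.mean(estimates)
sd = statistics.stdev(estimates)
# Normal-approximation 95% interval, relying on the roughly Gaussian shape
normal_ci = (mu - 1.96 * sd, mu + 1.96 * sd)
# Percentile 95% interval: with n=40, the first and last cut points
# are the 2.5th and 97.5th percentiles
cuts = statistics.quantiles(estimates, n=40)
percentile_ci = (cuts[0], cuts[-1])

print(f"mean={mu:.3f} sd={sd:.3f}")
print(f"normal 95% CI: ({normal_ci[0]:.3f}, {normal_ci[1]:.3f})")
print(f"percentile 95% CI: ({percentile_ci[0]:.3f}, {percentile_ci[1]:.3f})")
```

When the estimates really are close to Gaussian, the two intervals should nearly coincide.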

## Parameters

Two parameters must be set for bootstrapping:
1. Sample size: the number of observations drawn (with replacement) for each bootstrap sample, which is used for training.
2. Repetitions: the number of train/test iterations to perform. This should be large enough to derive meaningful statistics, such as the mean and standard deviation.


## Bootstrapping with the scikit-learn package

The scikit-learn package provides `resample()` in `sklearn.utils`, which draws a single bootstrap sample from a dataset. An example is given below. To perform multiple repetitions, this step must be repeated inside a loop.

```python
from sklearn.utils import resample

# data sample
data = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
# draw 4 observations with replacement; random_state fixes the seed
boot = resample(data, replace=True, n_samples=4, random_state=1)
print('Bootstrap Sample: %s' % boot)
# out-of-bag observations: values that were never drawn
oob = [x for x in data if x not in boot]
print('OOB Sample: %s' % oob)
```

Output:

```
Bootstrap Sample: [0.6, 0.4, 0.5, 0.1]
OOB Sample: [0.2, 0.3]
```
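The out-of-bag list above is found by comparing values, which works here because every value in `data` is unique. If the data contained duplicate values, an undrawn observation whose value matches a drawn one would be wrongly left out of the OOB set. A minimal alternative sketch, using only the standard library rather than scikit-learn, draws row indices instead and derives the OOB rows from the indices that were never drawn. The data list with a repeated `0.2` is invented to show the case.

```python
import random

data = [0.1, 0.2, 0.2, 0.4, 0.5, 0.6]  # note: 0.2 appears twice
rng = random.Random(1)

# Draw row indices with replacement instead of values
boot_idx = [rng.randrange(len(data)) for _ in range(4)]
drawn = set(boot_idx)
# OOB rows are the indices that were never drawn, regardless of their values
oob_idx = [i for i in range(len(data)) if i not in drawn]

boot = [data[i] for i in boot_idx]
oob = [data[i] for i in oob_idx]
print('Bootstrap Sample: %s' % boot)
print('OOB Sample: %s' % oob)
```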
