# Bootstrapping data

## Introduction

## What it means to bootstrap

Say we have observations one through thirteen.

In [2]:
import numpy as np
one_to_thirteen = np.arange(1, 14)
one_to_thirteen

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13])

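First, we can draw a sample without replacement, so that no observation is selected more than once.

In [3]: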
np.random.seed(21)
np.random.choice(one_to_thirteen, 5, replace = False)

array([6, 8, 2, 7, 3])
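
Two properties follow from `replace = False`: no value can repeat, and the sample can never be larger than the data itself. The sketch below checks both. The seed is an arbitrary choice for this illustration, so the values drawn will differ from the output shown above.

```python
import numpy as np

one_to_thirteen = np.arange(1, 14)
np.random.seed(0)  # arbitrary seed, chosen only for this illustration

# Without replacement, every drawn value is distinct.
sample = np.random.choice(one_to_thirteen, 5, replace=False)
all_distinct = len(np.unique(sample)) == len(sample)

# Asking for 14 values from 13 observations cannot be satisfied
# without replacement, so NumPy raises a ValueError.
try:
    np.random.choice(one_to_thirteen, 14, replace=False)
    oversample_failed = False
except ValueError:
    oversample_failed = True

print(all_distinct, oversample_failed)
```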

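Bootstrapping instead samples with replacement: every draw is made from the full set of observations, so the same value can be selected more than once. In the output below, 1 and 9 each appear twice.

In [4]: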
np.random.seed(21)
np.random.choice(one_to_thirteen, 8, replace = True)

array([10,  9,  5,  1,  1,  9,  4, 13])
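
Because the same observation can be drawn repeatedly, a sample drawn with replacement also tends to miss some observations entirely. Here is a minimal sketch, assuming we draw a sample as large as the original data (a common bootstrap convention, not something the cells above do) and using an arbitrary seed, that tallies the repeats and the omissions.

```python
import numpy as np

one_to_thirteen = np.arange(1, 14)
np.random.seed(0)  # arbitrary seed, chosen only for this illustration

# Draw a bootstrap sample the same size as the data.
boot = np.random.choice(one_to_thirteen, len(one_to_thirteen), replace=True)

# Count how often each value was drawn.
values, counts = np.unique(boot, return_counts=True)
repeated = values[counts > 1]

# Observations that were never drawn.
left_out = np.setdiff1d(one_to_thirteen, boot)

print("bootstrap sample:", np.sort(boot))
print("drawn more than once:", repeated)
print("never drawn:", left_out)
```

When the sample is the same size as the data, each repeated draw displaces exactly one observation, so the number of extra draws always equals the number of values left out.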

### Applying Bootstrapping to Our Models

Now that we understand the operation behind bootstrapping, let's see what happens when we apply it to our training sets. To keep things simple, we'll continue to use our list of numbers, one through thirteen.

In [5]:
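from sklearn.model_selection import train_test_split
one_to_thirteen = np.arange(1, 14)
# The default test_size of 0.25 holds out 4 values and leaves 9 for training.
# No random_state is set, so the split differs from run to run.
x_train, x_test = train_test_split(one_to_thirteen)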

And now, mimicking the procedure of a random forest, let's begin by taking subsamples of our data, first without replacement. Each time, we select six of the nine training observations, or about two thirds of the training set.

In [12]:
import numpy as np
np.random.seed(23)
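# Each row is one subsample of six training values.
datasets = np.vstack([np.random.choice(x_train, 6, replace=False) for _ in range(5)])
# Sort within each row so the subsamples are easier to compare.
datasets.sort(axis=1)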
datasets

array([[ 4,  5,  6,  7, 10, 12],
       [ 1,  4,  9, 10, 12, 13],
       [ 4,  5,  7,  9, 12, 13],
       [ 6,  7,  9, 10, 12, 13],
       [ 4,  5,  6, 10, 12, 13]])

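Now let's repeat the procedure with replacement, which is what bootstrapping does. Each dataset can now contain the same observation more than once.

In [13]: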
np.random.seed(23)
datasets = np.stack([np.random.choice(x_train, 6, replace=True) for _ in range(5)])
datasets.sort(axis=1)
datasets

array([[ 1,  4,  4,  7,  9,  9],
       [ 1,  5,  6,  9, 10, 10],
       [ 1,  5,  7, 10, 13, 13],
       [ 5,  5,  7,  9, 12, 12],
       [ 1,  1,  4,  6,  9, 13]])
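
Each bootstrapped row above repeats some training observations and omits others. In the context of random forests, the omitted ones are commonly called out-of-bag observations. The sketch below lists them for each row. Since the `train_test_split` call above is unseeded, it uses a fixed stand-in for `x_train` (the nine values that appear in the outputs above, in an assumed order), so its rows will not match the output shown.

```python
import numpy as np

# Stand-in for x_train: the nine training values seen in the outputs above.
# Their order here is an assumption made for reproducibility.
x_train = np.array([1, 4, 5, 6, 7, 9, 10, 12, 13])

np.random.seed(23)
datasets = np.stack([np.random.choice(x_train, 6, replace=True) for _ in range(5)])

# For each bootstrapped row, collect the training values it never drew.
out_of_bag = [np.setdiff1d(x_train, row) for row in datasets]

for row, oob in zip(datasets, out_of_bag):
    print(np.sort(row), "-> left out:", oob)
```

Six draws from nine values can cover at most six distinct observations, so every row here leaves out at least three.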

### Bootstrapping in Random Forests

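In scikit-learn, a random forest bootstraps the training set for each tree by default (`bootstrap = True`). Passing `bootstrap = False` fits every tree on the whole training set instead.

In [24]:
from sklearn.ensemble import RandomForestRegressor
RandomForestRegressor(bootstrap = False)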
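
To see what the `bootstrap` parameter changes, here is a rough sketch on synthetic data. The sine-shaped data, the number of trees, and the seeds are all assumptions made for illustration. With `bootstrap=False` and a single feature, every tree is fit on the identical training set, so the trees should agree with one another. With `bootstrap=True`, each tree sees its own resample and the per-tree predictions vary.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic one-feature regression data (illustrative only).
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 10, size=60)).reshape(-1, 1)
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=60)
X_new = np.linspace(0, 10, 25).reshape(-1, 1)


def tree_spread(bootstrap):
    """Largest gap between any two trees' predictions on X_new."""
    forest = RandomForestRegressor(
        n_estimators=10, bootstrap=bootstrap, random_state=0
    )
    forest.fit(X, y)
    per_tree = np.array([tree.predict(X_new) for tree in forest.estimators_])
    return (per_tree.max(axis=0) - per_tree.min(axis=0)).max()


spread_with = tree_spread(True)
spread_without = tree_spread(False)

print("spread with bootstrap:   ", spread_with)
print("spread without bootstrap:", spread_without)
```

The trees agree without bootstrapping here because there is only one feature. With several features, a forest can still vary between trees through the random choice of features considered at each split.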

### Resources

[Random Forest Top to Bottom](https://www.gormanalysis.com/blog/random-forest-from-top-to-bottom/)