
# Bagging - Bootstrap AGGregation

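Train many weak learners, each on a different bootstrap sample of the training data, and combine them into one model via voting.

Essentially, randomly pick rows (sampling *with replacement*), grow a separate decision tree (DT) on each sample, and let the trees vote! (Random forests, below, also pick the columns at random.)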

![](images/bagging.png)
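To make the voting idea concrete, here is a minimal from-scratch sketch of bagging, assuming scikit-learn's `DecisionTreeClassifier` as the weak learner and the iris data used later in this notebook. The helper names `fit_bagging` and `predict_bagging` are hypothetical, chosen for illustration; scikit-learn also ships a ready-made `BaggingClassifier`.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier


def fit_bagging(X, y, n_trees=25, seed=0):
    """Hypothetical helper: train n_trees trees, each on its own bootstrap sample."""
    rng = np.random.default_rng(seed)
    n = len(X)
    trees = []
    for _ in range(n_trees):
        # Bootstrap: draw n row indices *with replacement*
        idx = rng.integers(0, n, size=n)
        trees.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))
    return trees


def predict_bagging(trees, X):
    """Hypothetical helper: majority vote over the trees' predicted labels."""
    votes = np.stack([t.predict(X) for t in trees])  # shape (n_trees, n_samples)
    # For each sample (one column of votes), pick the most frequent label
    return np.array([np.bincount(col).argmax() for col in votes.T])


iris = load_iris()
X, y = iris["data"], iris["target"]
trees = fit_bagging(X, y)
preds = predict_bagging(trees, X)
print("training accuracy:", (preds == y).mean())
```

Because every tree sees a slightly different sample, the trees differ, and the vote smooths out their individual mistakes.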

# Random Forest

## The Good?

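- Super friendly! 
- High performance 
    + low variance: averaging many trees overfits less than a single deep tree
- Somewhat transparent
    + built from Decision Trees, and reports which features mattered (feature importances)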
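One rough way to see the "low variance" claim is to compare a single fully grown tree with a forest under 10-fold cross-validation. This sketch assumes the iris data; the fold-to-fold standard deviation is only a loose proxy for variance, and on a dataset this easy the two models may score similarly.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# 10-fold cross-validation: one fully grown tree vs. a forest of 100 trees
tree_scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=10)
forest_scores = cross_val_score(
    RandomForestClassifier(n_estimators=100, random_state=0), X, y, cv=10
)

# Average accuracy, and how much it moves from fold to fold
print(f"tree:   mean={tree_scores.mean():.3f}  std={tree_scores.std():.3f}")
print(f"forest: mean={forest_scores.mean():.3f}  std={forest_scores.std():.3f}")
```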

## The Bads?

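- We've got so many trees to plant...
- Computationally expensive: training and prediction time grow with the number of trees
- Memory
    + all trees are stored in memory
    + think back to k-Nearest Neighbors, which also has to keep its whole training set around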
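To see what "all trees stored in memory" means in practice, this sketch fits a forest on iris and counts the stored trees and their nodes via scikit-learn's `estimators_` and each tree's `tree_.node_count`. The pickled size is used here only as a rough stand-in for the memory footprint.

```python
import pickle

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Every fitted tree is kept in forest.estimators_, each with its full node structure
n_trees = len(forest.estimators_)
total_nodes = sum(est.tree_.node_count for est in forest.estimators_)

# Rough proxy for memory use: size of the serialized model
size_kb = len(pickle.dumps(forest)) / 1024
print(f"{n_trees} trees stored, {total_nodes} nodes in total, ~{size_kb:.0f} KB pickled")
```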

## Subspace Sampling Method

### Don't be like a banana tree

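Banana plants are susceptible to [Panama disease](https://en.wikipedia.org/wiki/Panama_disease). Commercial bananas are propagated as clones, so a disease that can infect one plant can infect them all.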

![Many individual yellow bananas](images/bananas.jpeg)
They're all clones!

Decision Trees trained on the same data with the same settings will all come out the same! (Clones!!!)
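A quick way to check the "clone" claim: train two trees on identical data with identical settings and compare their predictions on many random probe points. This sketch fixes `random_state`, because scikit-learn uses it to break ties between equally good splits; with different seeds, trees on the same data can occasionally differ in such ties.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Two trees, same data, same settings
tree_a = DecisionTreeClassifier(random_state=0).fit(X, y)
tree_b = DecisionTreeClassifier(random_state=0).fit(X, y)

# Probe both trees on 1000 random points spanning the range of each feature
rng = np.random.default_rng(42)
probe = rng.uniform(X.min(axis=0), X.max(axis=0), size=(1000, X.shape[1]))
same = np.array_equal(tree_a.predict(probe), tree_b.predict(probe))
print("identical predictions:", same)
```

Voting between clones adds nothing, which is why the forest needs a way to make its trees different.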

### Breed a variety of trees

Give each tree a different random part of the data (rows and columns) so the trees come out different.

Steps:

1. Draw a bootstrap sample of the rows (with replacement) for training: this is the **bag**. The rows that were never drawn (roughly a third of them) form the validation set, called **out-of-bag**
2. Randomly select a subset of the predictors (columns) for the training data (**bag**)
3. Grow/train your tree on the training data using just those features
4. Take the validation set (**out-of-bag**), keep only the columns used by the tree from the previous step, and predict with the tree on this *out-of-bag* data
5. Measure how well the tree did on those rows: this is the *out-of-bag error*
6. Repeat to grow new trees, and let all the trees "vote" on the final decision
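The six steps above can be sketched directly with NumPy and scikit-learn's `DecisionTreeClassifier`. This is an illustration under assumptions, not how `RandomForestClassifier` works internally: the tree count (50) and features per tree (2) are hypothetical choices. Picking the columns once per tree is the random subspace method, whereas scikit-learn's random forest re-samples candidate features at every split (its `max_features` parameter). The OOB error here is also tracked per tree rather than for the whole ensemble.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)
n_samples, n_features = X.shape
n_trees, n_sub = 50, 2  # hypothetical: 50 trees, 2 random features per tree

forest = []  # list of (tree, feature_indices)
oob_errors = []
for _ in range(n_trees):
    # Step 1: bootstrap the rows -> the bag; rows never drawn are out-of-bag
    bag = rng.integers(0, n_samples, size=n_samples)
    oob = np.setdiff1d(np.arange(n_samples), bag)
    # Step 2: randomly select a subset of predictors (columns) for this tree
    feats = rng.choice(n_features, size=n_sub, replace=False)
    # Step 3: grow the tree on the bag, using only those columns
    tree = DecisionTreeClassifier(random_state=0).fit(X[bag][:, feats], y[bag])
    # Steps 4-5: predict the out-of-bag rows (same columns) and record the error
    if len(oob) > 0:
        oob_errors.append(np.mean(tree.predict(X[oob][:, feats]) != y[oob]))
    forest.append((tree, feats))


def vote(forest, X_new):
    # Step 6: every tree votes with its own columns; the most common label wins
    votes = np.stack([t.predict(X_new[:, f]) for t, f in forest])
    return np.array([np.bincount(col).argmax() for col in votes.T])


print("mean per-tree OOB error:", np.mean(oob_errors))
print("ensemble training accuracy:", np.mean(vote(forest, X) == y))
```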

### 🧠Knowledge Check: Why would this be beneficial?

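Less overfitting! Each tree sees different rows and columns, so the trees make different mistakes, and those mistakes tend to cancel out when they vote. Variety is the spice of life!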
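One way to see the overfitting difference is to compare training and test accuracy for a single tree and a forest. This sketch assumes synthetic, noisy data from `make_classification`, where `flip_y=0.1` assigns the class of about 10% of samples at random. We would typically expect both models to fit the training set almost perfectly while the forest holds up better on the test set, but the exact numbers depend on the data and seeds.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with label noise (flip_y=0.1 randomizes ~10% of the labels)
X, y = make_classification(n_samples=1000, n_features=20, n_informative=5,
                           flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "single tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
}
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    train_acc = model.score(X_train, y_train)
    test_acc = model.score(X_test, y_test)
    results[name] = (train_acc, test_acc)
    # A large train/test gap is the signature of overfitting
    print(f"{name:13s} train={train_acc:.3f} test={test_acc:.3f}")
```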

## Code

[`RandomForestClassifier` documentation](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html#sklearn.ensemble.RandomForestClassifier)

In [None]:
from sklearn.ensemble import RandomForestClassifier

In [None]:
from sklearn.datasets import load_iris
iris = load_iris()

In [None]:
# 100 trees, each at most 2 levels deep; random_state makes the results reproducible
clf = RandomForestClassifier(n_estimators=100, max_depth=2,
                             random_state=0)

In [None]:
clf

In [None]:
clf.fit(iris["data"], iris["target"])

In [None]:
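# Predict the class (0 = setosa, 1 = versicolor, 2 = virginica) of two made-up flowers.
# Feature order: sepal length, sepal width, petal length, petal width (cm)
print(clf.predict([[0.1, 0.8, 0.3, 0.5], [0.1, 0.1, 0.1, 0.1]]))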
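scikit-learn can also compute the *out-of-bag* score from the steps above for us: with `oob_score=True` (which relies on the default `bootstrap=True`), each training sample is scored only by the trees that did not see it during training, and the resulting accuracy is stored in `oob_score_`. A minimal sketch on the same iris data, using a separate name `clf_oob` so the `clf` above is left untouched:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()

# oob_score=True: estimate accuracy using only out-of-bag predictions
clf_oob = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=0)
clf_oob.fit(iris["data"], iris["target"])
print("OOB accuracy:", clf_oob.oob_score_)
```

This gives a validation-style estimate without holding out a separate test set.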

## Other Cool Features 

### Feature Importance

In [None]:
# Impurity-based importance of each feature (the scores sum to 1)
for name, score in zip(iris["feature_names"], clf.feature_importances_):
    print(name, score)
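`feature_importances_` is impurity-based, and the scikit-learn documentation notes that such scores can be biased toward features with many distinct values. A common cross-check is permutation importance: shuffle one column of held-out data and measure how much the score drops. This sketch assumes a stratified train/test split of iris and uses a separate model name, `clf_split`, so the `clf` above is not overwritten.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris["data"], iris["target"], random_state=0, stratify=iris["target"]
)
clf_split = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# Impurity-based importances, sorted from most to least important
ranked = sorted(zip(iris["feature_names"], clf_split.feature_importances_),
                key=lambda pair: pair[1], reverse=True)
for name, score in ranked:
    print(f"impurity     {name:20s} {score:.3f}")

# Permutation importance: mean drop in test accuracy when a column is shuffled
perm = permutation_importance(clf_split, X_test, y_test, n_repeats=10, random_state=0)
for name, drop in zip(iris["feature_names"], perm.importances_mean):
    print(f"permutation  {name:20s} {drop:.3f}")
```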