# Chapter 7: Ensemble Learning and Random Forests

## Load dataset

In [22]:
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_digits
dataset = load_digits()
X = dataset.data
y = dataset.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

## Voting

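A simple way to build a stronger classifier from several existing ones is to aggregate their predictions. With hard voting, the ensemble predicts the class that gets the most votes. With soft voting (voting='soft', used in the cell below), it averages the predicted class probabilities and picks the most likely class. Soft voting requires every classifier to estimate class probabilities, which is why the SVC is created with probability=True.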

In [37]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
log_clf = LogisticRegression(solver='liblinear', multi_class='auto', max_iter=100)
rnd_clf = RandomForestClassifier(n_estimators=10)
svm_clf = SVC(gamma='scale', C=0.1, probability=True)
voting_clf = VotingClassifier(
    estimators=[('lr', log_clf), 
                ('rf', rnd_clf), 
                ('svc', svm_clf)],
    voting='soft'
)

from sklearn.metrics import accuracy_score
for clf in (log_clf, rnd_clf, svm_clf, voting_clf):
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    print(clf.__class__.__name__, accuracy_score(y_test, y_pred))

LogisticRegression 0.9611111111111111
RandomForestClassifier 0.9555555555555556
SVC 0.95
VotingClassifier 0.9722222222222222
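
The difference between hard voting (majority of predicted classes) and soft voting (average of predicted probabilities) can be sketched by hand. The probabilities below are made-up numbers for three hypothetical classifiers, not outputs of the models above. The sketch only illustrates the aggregation step and is not how VotingClassifier is implemented internally.

```python
import numpy as np

# Hypothetical class probabilities, shape (n_classifiers, n_samples, n_classes).
# The numbers are invented for illustration only.
probas = np.array([
    [[0.90, 0.10, 0.00], [0.2, 0.5, 0.3]],   # classifier A
    [[0.40, 0.60, 0.00], [0.1, 0.2, 0.7]],   # classifier B
    [[0.45, 0.55, 0.00], [0.3, 0.3, 0.4]],   # classifier C
])
n_classes = probas.shape[2]

# Hard voting: each classifier votes for its most likely class,
# and the class with the most votes wins.
votes = probas.argmax(axis=2)                # shape (n_classifiers, n_samples)
hard_pred = np.array([np.bincount(sample_votes, minlength=n_classes).argmax()
                      for sample_votes in votes.T])

# Soft voting: average the probabilities over classifiers,
# then pick the class with the highest mean probability.
soft_pred = probas.mean(axis=0).argmax(axis=1)

print("hard:", hard_pred)
print("soft:", soft_pred)
```

For the first sample the two schemes disagree: two of the three classifiers vote for class 1, but classifier A is so confident in class 0 that the averaged probabilities favour class 0.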


## Bagging and Pasting

Another approach is to use the same training algorithm for every predictor, but to train each one on a different random subset of the training set. When sampling is performed with replacement, so that the same instance can be drawn several times for one predictor, the method is called bagging (short for bootstrap aggregating). When sampling is performed without replacement, it is called pasting. In both cases the subsets drawn for different predictors may overlap.

#### Note
* In sklearn, BaggingClassifier has a bootstrap parameter: bootstrap=True gives bagging, bootstrap=False gives pasting.
* Keeping all training instances (i.e., bootstrap=False and max_samples=1.0) but sampling features (i.e., bootstrap_features=True and/or max_features smaller than 1.0) is called the Random Subspaces method.
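
The two sampling schemes can be illustrated on a toy index set. This is a minimal sketch that assumes a training set of 10 instances and 6 draws per predictor (both numbers are arbitrary). It shows the sampling idea only, not the internals of BaggingClassifier.

```python
import numpy as np

rng = np.random.default_rng(42)
n_instances = 10      # size of a toy training set (assumed for illustration)
max_samples = 6       # instances drawn for each predictor (assumed)

# Bagging: sample with replacement, so one predictor's subset may
# contain the same instance more than once.
bagging_idx = rng.choice(n_instances, size=max_samples, replace=True)

# Pasting: sample without replacement, so each instance appears at most
# once per predictor (subsets of different predictors can still overlap).
pasting_idx = rng.choice(n_instances, size=max_samples, replace=False)

print("bagging:", np.sort(bagging_idx))
print("pasting:", np.sort(pasting_idx))
```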

In [39]:
from sklearn.ensemble import BaggingClassifier 
from sklearn.tree import DecisionTreeClassifier
bag_clf = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=500,
    max_samples=100, bootstrap=True, n_jobs=-1
)
bag_clf.fit(X_train, y_train)
y_pred = bag_clf.predict(X_test)
print(bag_clf.__class__.__name__, accuracy_score(y_test, y_pred))

BaggingClassifier 0.9333333333333333


## Random forests

A Random Forest is an ensemble of Decision Trees, generally trained via the bagging method (or sometimes pasting), typically with max_samples set to the size of the training set.

In [54]:
from sklearn.ensemble import RandomForestClassifier
rnd_clf = RandomForestClassifier(n_estimators=100, max_leaf_nodes=500, n_jobs=-1)
rnd_clf.fit(X_train, y_train)
y_pred = rnd_clf.predict(X_test)
print(rnd_clf.__class__.__name__, accuracy_score(y_test, y_pred))

RandomForestClassifier 0.9666666666666667
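
To make the link with bagging concrete, the sketch below builds a roughly comparable ensemble from BaggingClassifier and decision trees that consider a random subset of features at each split. The settings max_features="sqrt" and random_state=42 are assumptions chosen for this illustration. The result approximates a random forest and is not guaranteed to match RandomForestClassifier exactly.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Reload the digits data with the same split as at the top of the chapter.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Bagging of trees that each look at a random subset of features per split.
# max_samples=1.0 draws as many instances as the training set contains.
bag_forest = BaggingClassifier(
    DecisionTreeClassifier(max_features="sqrt", max_leaf_nodes=500),
    n_estimators=100, max_samples=1.0, bootstrap=True, random_state=42)
bag_forest.fit(X_train, y_train)

bag_acc = accuracy_score(y_test, bag_forest.predict(X_test))
print("Bagged trees:", bag_acc)
```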


## AdaBoost

A first base classifier is trained and used to make predictions on the training set. The relative weights of the misclassified training instances are then increased. A second classifier is trained using the updated weights, it makes predictions on the training set, the weights are updated again, and so on.

Once all predictors are trained, the ensemble makes predictions very much like bagging or pasting, except that predictors have different weights depending on their overall accuracy on the weighted training set.
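
One round of this reweighting can be sketched with NumPy. The function name adaboost_step is hypothetical, and the formulas follow the classic binary AdaBoost update. The SAMME and SAMME.R variants used by sklearn for multiclass problems differ in the details, so treat this as an illustration of the idea rather than the library's implementation.

```python
import numpy as np

def adaboost_step(weights, misclassified, learning_rate=1.0):
    """One reweighting round (hypothetical helper, binary AdaBoost formulas).

    Returns the predictor weight and the updated instance weights.
    Assumes the weighted error rate is strictly between 0 and 1.
    """
    # Weighted error rate of the current predictor.
    r = weights[misclassified].sum() / weights.sum()
    # Predictor weight: larger when the predictor is more accurate.
    alpha = learning_rate * np.log((1 - r) / r)
    # Increase the weights of the misclassified instances, then normalize.
    new_weights = weights.copy()
    new_weights[misclassified] *= np.exp(alpha)
    return alpha, new_weights / new_weights.sum()

# Four instances with equal weights; only the last one is misclassified.
weights = np.full(4, 0.25)
misclassified = np.array([False, False, False, True])
alpha, new_weights = adaboost_step(weights, misclassified)

print("predictor weight:", alpha)
print("instance weights:", new_weights)
```

With an error rate of 0.25, the predictor weight is log(3) and the misclassified instance ends up with half of the total weight, so the next predictor concentrates on it.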

In [66]:
from sklearn.ensemble import AdaBoostClassifier
# algorithm="SAMME.R" is deprecated in recent scikit-learn releases
ada_clf = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1), n_estimators=200,
    algorithm="SAMME.R", learning_rate=0.5
)
ada_clf.fit(X_train, y_train)
# Predict with the AdaBoost ensemble before scoring it
y_pred = ada_clf.predict(X_test)
print(ada_clf.__class__.__name__, accuracy_score(y_test, y_pred))


## Gradient Boosting

Just like AdaBoost, Gradient Boosting works by sequentially adding predictors to an ensemble, each one correcting its predecessor. However, instead of tweaking the instance weights at every iteration like AdaBoost does, this method tries to fit the new predictor to the residual errors made by the previous predictor.

### By hand

In [100]:
from sklearn.tree import DecisionTreeRegressor
# load_boston was removed in scikit-learn 1.2; this cell needs an older release
from sklearn.datasets import load_boston
from sklearn.metrics import mean_squared_error
boston = load_boston()
X, y = boston.data, boston.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
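# Stage 1: fit a tree to the original targets
tree_reg1 = DecisionTreeRegressor(max_depth=8)
tree_reg1.fit(X_train, y_train)
# Stage 2: fit a second tree to the residual errors of the first
y2 = y_train - tree_reg1.predict(X_train)
tree_reg2 = DecisionTreeRegressor(max_depth=8)
tree_reg2.fit(X_train, y2)
# Stage 3: fit a third tree to the residual errors of the second
y3 = y2 - tree_reg2.predict(X_train)
tree_reg3 = DecisionTreeRegressor(max_depth=8)
tree_reg3.fit(X_train, y3)
# The ensemble prediction is the sum of the three trees' predictions
y_pred = sum(tree.predict(X_test) for tree in (tree_reg1, tree_reg2, tree_reg3))
print("GB by hand: ", mean_squared_error(y_test, y_pred))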

GB by hand:  17.15417597061351
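
The three hand-written stages generalize to a loop. The sketch below uses a synthetic noisy quadratic dataset rather than the Boston data, and the helper names fit_residual_trees and predict_sum are hypothetical. It shows the residual-fitting idea only and omits shrinkage and the other refinements a library implementation provides.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.tree import DecisionTreeRegressor

def fit_residual_trees(X, y, n_trees=3, max_depth=2, random_state=42):
    """Fit each tree to the residual errors left by the previous trees."""
    trees, residual = [], y.copy()
    for _ in range(n_trees):
        tree = DecisionTreeRegressor(max_depth=max_depth,
                                     random_state=random_state)
        tree.fit(X, residual)
        # What this tree could not explain becomes the next target.
        residual = residual - tree.predict(X)
        trees.append(tree)
    return trees

def predict_sum(trees, X):
    """Ensemble prediction: the sum of the individual trees' predictions."""
    return sum(tree.predict(X) for tree in trees)

# Synthetic data (assumed for illustration): a noisy quadratic curve.
rng = np.random.default_rng(42)
X_toy = rng.uniform(-0.5, 0.5, size=(200, 1))
y_toy = 3 * X_toy[:, 0] ** 2 + 0.05 * rng.normal(size=200)

trees = fit_residual_trees(X_toy, y_toy)
# Training error using the first 1, 2, and 3 trees.
train_mse = [mean_squared_error(y_toy, predict_sum(trees[:k], X_toy))
             for k in (1, 2, 3)]
print("training MSE by number of trees:", train_mse)
```

Each added tree can only lower the training error here, because a leaf that predicted zero would leave the residuals unchanged. Test error is a separate matter and can rise once the ensemble starts to overfit.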


### With library

In [101]:
from sklearn.ensemble import GradientBoostingRegressor
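# n_estimators=3 and learning_rate=1.0 mirror the three unshrunk trees above.
# By default the library starts from a constant prediction (the training mean)
# rather than from zero.
gbrt = GradientBoostingRegressor(max_depth=8, n_estimators=3, learning_rate=1.0)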
gbrt.fit(X_train, y_train)
y_pred = gbrt.predict(X_test)
print("GB with library: ", mean_squared_error(y_test, y_pred))

GB with library:  7.236793473706949
