### INTRO

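* ensemble methods combine the predictions of several base estimators, each built with a given learning algorithm, to improve on a single estimator
* family 1: "averaging methods" = build estimators independently, then average their predictions.
   * (bagging, tree forests,...)
* family 2: "boosting methods" = build weak models sequentially and combine them into a stronger one.
   * (AdaBoost, gradient tree boosting,...)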

### BAGGING

[API](http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.BaggingClassifier.html#sklearn.ensemble.BaggingClassifier) | [demo: bias-variance analysis](plot_bias_variance.ipynb)

* many variants, mostly differing in how the random subsets are drawn
* `max_samples`, `max_features` control subset size (samples and features)
* `bootstrap`, `bootstrap_features` control whether samples and features are drawn with or without replacement



In [11]:
from sklearn.ensemble import BaggingClassifier
from sklearn.neighbors import KNeighborsClassifier
bagging = BaggingClassifier(KNeighborsClassifier(),
                            max_samples=0.5, max_features=0.5)
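
The subset options listed above can be checked on a fitted ensemble. The following is a minimal sketch, assuming the iris data and 5 estimators (both chosen only for illustration). It draws half the samples and half the features without replacement, then inspects the subset used by the first estimator.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)  # 150 samples, 4 features

# Half of the samples and half of the features per estimator, both drawn
# WITHOUT replacement, so no index repeats inside a single subset.
bag = BaggingClassifier(
    KNeighborsClassifier(),
    n_estimators=5,          # illustrative choice
    max_samples=0.5,
    max_features=0.5,
    bootstrap=False,
    bootstrap_features=False,
    random_state=0,
).fit(X, y)

sample_idx = bag.estimators_samples_[0]    # row indices seen by estimator 0
feature_idx = bag.estimators_features_[0]  # column indices seen by estimator 0

print("samples drawn:", len(sample_idx), "unique:", len(np.unique(sample_idx)))
print("features drawn:", feature_idx)
```

Setting `bootstrap=True` instead would draw rows with replacement, so the number of unique indices would typically be smaller than the number drawn.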

### FORESTS OF RANDOMIZED TREES

[API (classifier)](http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html#sklearn.ensemble.RandomForestClassifier)

[demo](plot_calibration_multiclass.ipynb)

* two averaging algos: RandomForest, Extra-Trees

In [12]:
from sklearn.ensemble import RandomForestClassifier
X = [[0, 0], [1, 1]]
Y = [0, 1]
clf = RandomForestClassifier(n_estimators=10)
clf = clf.fit(X, Y)

### EXTREMELY RANDOMIZED TREES

[Classifier](http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.ExtraTreesClassifier.html#sklearn.ensemble.ExtraTreesClassifier) | [Regressor](http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.ExtraTreesRegressor.html#sklearn.ensemble.ExtraTreesRegressor) |
[demo:plot forest iris](plot_forest_iris.ipynb)

* The relative rank (depth) of a feature used as a decision node can be used to assess its importance for predicting the target. Features near the top of a tree contribute to the final decision for a larger fraction of the input samples.
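
A fitted forest exposes these scores through `feature_importances_` (an impurity-based measure averaged over the trees). The sketch below assumes a synthetic dataset in which only the first three columns are informative. The dataset parameters and the 100 trees are illustrative choices, not values from these notes.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# shuffle=False keeps the 3 informative features in columns 0..2
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=3, n_redundant=0,
    n_repeated=0, n_classes=2, shuffle=False, random_state=0)

forest = ExtraTreesClassifier(n_estimators=100, random_state=0).fit(X, y)

importances = forest.feature_importances_  # one score per feature, sums to 1
ranking = np.argsort(importances)[::-1]    # most important feature first

for idx in ranking:
    print("feature %d: %.3f" % (idx, importances[idx]))
```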

[demo:pixel importances](plot_forest_importances_faces.ipynb)

In [13]:
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_blobs(n_samples=10000, n_features=10, centers=100,
    random_state=0)

clf = DecisionTreeClassifier(max_depth=None, min_samples_split=2,
    random_state=0)
scores = cross_val_score(clf, X, y)
scores.mean()                             


clf = RandomForestClassifier(n_estimators=10, max_depth=None,
    min_samples_split=2, random_state=0)
scores = cross_val_score(clf, X, y)
scores.mean()                             


clf = ExtraTreesClassifier(n_estimators=10, max_depth=None,
    min_samples_split=2, random_state=0)
scores = cross_val_score(clf, X, y)
scores.mean() > 0.999

True

### ADABOOST

[Classifier](http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.AdaBoostClassifier.html#sklearn.ensemble.AdaBoostClassifier) |
[Regressor](http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.AdaBoostRegressor.html#sklearn.ensemble.AdaBoostRegressor)

* Fits a sequence of weak learners on repeatedly modified versions of the dataset.
* Predictions are combined via weighted majority vote (or sum)
* Each step: incorrectly predicted samples have their weights increased
* Difficult-to-predict samples get ever-increasing influence, so each subsequent weak learner is forced to concentrate on the previous misses.
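
The reweighting loop described above can be sketched by hand. This is the classic discrete AdaBoost update for two classes labeled -1/+1, written for illustration. It is not the code path of scikit-learn's `AdaBoostClassifier`. The dataset and the 20 rounds are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
y = 2 * y - 1                        # relabel {0, 1} as {-1, +1}

w = np.full(len(y), 1.0 / len(y))    # start with uniform sample weights
stumps, alphas = [], []
missed_weight = None                 # weight on the last stump's misses, after update

for _ in range(20):
    stump = DecisionTreeClassifier(max_depth=1, random_state=0)
    stump.fit(X, y, sample_weight=w)
    miss = stump.predict(X) != y
    err = w[miss].sum()              # weighted error (w sums to 1)
    if err == 0 or err >= 0.5:       # perfect or no better than chance: stop
        break
    alpha = 0.5 * np.log((1 - err) / err)        # this learner's vote weight
    w *= np.exp(alpha * np.where(miss, 1.0, -1.0))  # raise misses, lower hits
    w /= w.sum()                     # renormalize to a distribution
    missed_weight = w[miss].sum()
    stumps.append(stump)
    alphas.append(alpha)

# Weighted majority vote over all stumps
votes = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
ensemble_acc = np.mean(np.sign(votes) == y)
print("rounds:", len(stumps), "training accuracy: %.3f" % ensemble_acc)
```

After each update the samples the latest stump got wrong carry exactly half of the total weight, which is what forces the next learner to focus on them.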

[discrete vs real AdaBoost](plot_adaboost_hastie_10_2.ipynb) |
[multiclass AdaBoosted DT](plot_adaboost_multiclass.ipynb) |
[2class Adaboost](plot_adaboost_twoclass.ipynb) |
[Adaboost regression](plot_adaboost_regression.ipynb)

In [14]:
# example, 100 weak learners
from sklearn.model_selection import cross_val_score
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier

iris = load_iris()
clf = AdaBoostClassifier(n_estimators=100)
scores = cross_val_score(clf, iris.data, iris.target)
scores.mean()     

0.95996732026143794

### GRADIENT TREE BOOSTING

* Also: "gradient boosted regression trees" (GBRT)
* Can be used for classification & regression
* Supports adding estimators to an already-fitted model via `warm_start=True`
* Tree size is controlled with `max_depth=h` or `max_leaf_nodes=k`
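
As a rough illustration of the last two points, the sketch below fits 50 trees, then raises `n_estimators` and refits with `warm_start=True` so that only the extra 50 trees are trained. The sample size and `max_leaf_nodes=4` are arbitrary values picked for the example.

```python
from sklearn.datasets import make_hastie_10_2
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_hastie_10_2(n_samples=2000, random_state=0)

# warm_start=True keeps the fitted trees between calls to fit()
clf = GradientBoostingClassifier(
    n_estimators=50, max_leaf_nodes=4, warm_start=True, random_state=0)
clf.fit(X, y)
n_before = clf.estimators_.shape[0]

# Ask for 100 trees in total; the first 50 are reused, 50 more are added
clf.set_params(n_estimators=100)
clf.fit(X, y)
n_after = clf.estimators_.shape[0]

print(n_before, n_after)  # 50 100
```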

[wiki](https://en.wikipedia.org/wiki/Gradient_boosting) | [classifier](http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html#sklearn.ensemble.GradientBoostingClassifier) | [regressor](http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingRegressor.html#sklearn.ensemble.GradientBoostingRegressor)

[demo:regression](plot_gradient_boosting_regression.ipynb) |
[demo:out-of-bag estimates](plot_gradient_boosting_oob.ipynb)

In [15]:
# example: classifier with 100 decision stumps as weak learners
from sklearn.datasets import make_hastie_10_2
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_hastie_10_2(random_state=0)
X_train, X_test = X[:2000], X[2000:]
y_train, y_test = y[:2000], y[2000:]

clf = GradientBoostingClassifier(n_estimators=100, learning_rate=1.0,
    max_depth=1, random_state=0).fit(X_train, y_train)
clf.score(X_test, y_test)   

0.91300000000000003

In [16]:
# example: regressor
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_friedman1(
    n_samples=1200, 
    random_state=0, 
    noise=1.0)

X_train, X_test = X[:200], X[200:]
y_train, y_test = y[:200], y[200:]

# the default loss is least squares, so no `loss` argument is needed
est = GradientBoostingRegressor(
    n_estimators=100,
    learning_rate=0.1,
    max_depth=1,
    random_state=0).fit(X_train, y_train)

mean_squared_error(y_test, est.predict(X_test))  

5.0091548599603213

[example:effect of shrinkage & subsampling on model fit](plot_gradient_boosting_regularization.ipynb)

In [17]:
# how to "see" feature importance & partial dependence

from sklearn.datasets import make_hastie_10_2
from sklearn.ensemble import GradientBoostingClassifier
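# import path from older scikit-learn releases; later releases provide
# the partial dependence tools in sklearn.inspection
from sklearn.ensemble.partial_dependence import partial_dependence
from sklearn.ensemble.partial_dependence import plot_partial_dependence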

X, y = make_hastie_10_2(random_state=0)
clf = GradientBoostingClassifier(
    n_estimators=100, 
    learning_rate=1.0,
    max_depth=1, 
    random_state=0).fit(X, y)

print("importances\n",clf.feature_importances_ )

features = [0, 1, (0, 1)]
pdp,axes = partial_dependence(clf, [0], X=X)
print("pdp\n",pdp)
print("axes\n",axes)

importances
 [ 0.11  0.1   0.11  0.1   0.09  0.11  0.09  0.1   0.1   0.09]
pdp
 [[ 2.46643157  2.46643157  2.46643157  2.46643157  2.46643157  2.46643157
   1.15418258  1.15418258  1.15418258  1.15418258  1.15418258  0.61847569
   0.61847569  0.61847569  0.61847569  0.61847569  0.61847569  0.61847569
   0.61847569 -0.03524098 -0.03524098 -0.03524098 -0.03524098 -0.03524098
  -0.03524098 -0.03524098 -0.03524098 -0.03524098 -0.03524098 -0.03524098
  -0.41817365 -0.41817365 -0.41817365 -0.41817365 -0.41817365 -0.41817365
  -0.41817365 -0.41817365 -0.41817365 -0.41817365 -0.41817365 -0.41817365
  -0.41817365 -0.41817365 -0.41817365 -0.41817365 -0.41817365 -0.41817365
  -0.41817365 -0.41817365 -0.41817365 -0.41817365 -0.41817365 -0.41817365
  -0.41817365 -0.41817365 -0.41817365 -0.41817365 -0.41817365 -0.41817365
  -0.41817365 -0.41817365 -0.41817365 -0.41817365 -0.41817365 -0.41817365
  -0.41817365 -0.41817365 -0.41817365 -0.41817365 -0.03532577 -0.03532577
  -0.03532577 -0.03532577 -0.035

[demo:plot partial dependence](plot_partial_dependence.ipynb)

### VOTING CLASSIFIER

* idea: combine conceptually different classifiers, use voting scheme to predict labels
* hard voting: `voting`='hard' (majority)
* soft voting: `voting`='soft' (argmax of sum of predicted probabilities)
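
The two schemes can disagree. The toy calculation below uses made-up probabilities from three hypothetical classifiers for a single sample, with equal weights. One very confident classifier is outvoted under hard voting but wins under soft voting.

```python
import numpy as np

# Rows: classifiers, columns: P(class 0), P(class 1) for one sample.
# These numbers are invented for illustration.
probs = np.array([
    [0.90, 0.10],   # classifier 1: confident in class 0
    [0.40, 0.60],   # classifier 2: leans to class 1
    [0.45, 0.55],   # classifier 3: leans to class 1
])

# Hard voting: each classifier casts one vote for its argmax class
votes = probs.argmax(axis=1)
hard = np.bincount(votes, minlength=probs.shape[1]).argmax()

# Soft voting: average the probabilities, then take the argmax
mean_probs = probs.mean(axis=0)
soft = mean_probs.argmax()

print("hard vote:", hard)   # 1 (two votes against one)
print("soft vote:", soft)   # 0 (mean probabilities are about 0.583 vs 0.417)
```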

In [18]:
# example, majority rule
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import VotingClassifier

iris = datasets.load_iris()
X, y = iris.data[:, 1:3], iris.target

clf1 = LogisticRegression(random_state=1)
clf2 = RandomForestClassifier(random_state=1)
clf3 = GaussianNB()

eclf = VotingClassifier(estimators=[('lr', clf1), ('rf', clf2), ('gnb', clf3)], voting='hard')

for clf, label in zip([clf1, clf2, clf3, eclf], ['Logistic Regression', 'Random Forest', 'naive Bayes', 'Ensemble']):
    scores = cross_val_score(clf, X, y, cv=5, scoring='accuracy')
    print("Accuracy: %0.2f (+/- %0.2f) [%s]" % (scores.mean(), scores.std(), label))

Accuracy: 0.90 (+/- 0.05) [Logistic Regression]
Accuracy: 0.93 (+/- 0.05) [Random Forest]
Accuracy: 0.91 (+/- 0.04) [naive Bayes]
Accuracy: 0.95 (+/- 0.05) [Ensemble]


In [19]:
# example, soft voting
from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from itertools import product
from sklearn.ensemble import VotingClassifier

# Loading some example data
iris = datasets.load_iris()
X = iris.data[:, [0,2]]
y = iris.target

# Training classifiers
clf1 = DecisionTreeClassifier(max_depth=4)
clf2 = KNeighborsClassifier(n_neighbors=7)
clf3 = SVC(kernel='rbf', probability=True)
eclf = VotingClassifier(estimators=[('dt', clf1), ('knn', clf2), ('svc', clf3)], voting='soft', weights=[2,1,2])

clf1 = clf1.fit(X,y)
clf2 = clf2.fit(X,y)
clf3 = clf3.fit(X,y)
eclf = eclf.fit(X,y)

[demo](plot_voting_decision_regions.ipynb)

In [20]:
# example, using VotingClassifier with GridSearch to tune parameters
from sklearn.model_selection import GridSearchCV
clf1 = LogisticRegression(random_state=1)
clf2 = RandomForestClassifier(random_state=1)
clf3 = GaussianNB()
eclf = VotingClassifier(
    estimators=[
        ('lr', clf1), 
        ('rf', clf2), 
        ('gnb', clf3)], 
    voting='soft')

params = {
    'lr__C': [1.0, 100.0], 
    'rf__n_estimators': [20, 200],}

grid = GridSearchCV(estimator=eclf, param_grid=params, cv=5)
grid = grid.fit(iris.data, iris.target)