# Chapter 07 - Ensemble Learning 

## Voting Classifiers
Several algorithms (ideally ones that make different kinds of errors) each predict on the same instance. The output of the Voting Classifier is the class that receives the most votes.
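To make the voting rule concrete, here is a minimal sketch of hard (majority) voting done by hand with NumPy. The prediction arrays and the helper name `hard_vote` are made up for illustration; they are not part of scikit-learn, and the tie-breaking rule is an assumption of this sketch.

```python
import numpy as np

def hard_vote(predictions):
    """Return the most-voted class for each instance.

    predictions: array of shape (n_classifiers, n_instances) with integer class labels.
    """
    predictions = np.asarray(predictions)
    n_classes = int(predictions.max()) + 1
    # For each instance (one column), count the votes per class and keep the winner.
    # Ties go to the smallest class label (an assumption of this sketch).
    return np.array([
        np.bincount(column, minlength=n_classes).argmax()
        for column in predictions.T
    ])

# Hypothetical predictions from three classifiers on four instances
preds = np.array([
    [0, 1, 1, 0],  # classifier A
    [0, 1, 0, 0],  # classifier B
    [1, 1, 0, 1],  # classifier C
])
majority = hard_vote(preds)
print(majority)
```

Each column holds the three votes for one instance, so the ensemble can be right even where a single classifier is wrong, as long as the mistakes do not coincide.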

In [1]:
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split

X, y = make_moons(n_samples=1000, noise=0.2)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

In [2]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

log_clf = LogisticRegression()
rnd_clf = RandomForestClassifier()
svm_clf = SVC()
voting_clf = VotingClassifier(
    estimators=[('lr', log_clf), ('rf', rnd_clf), ('svc', svm_clf)],
    voting='hard')
voting_clf.fit(X_train, y_train)

VotingClassifier(estimators=[('lr', LogisticRegression()),
                             ('rf', RandomForestClassifier()), ('svc', SVC())])

In [3]:
from sklearn.metrics import accuracy_score

for clf in (log_clf, rnd_clf, svm_clf, voting_clf):
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    print(clf.__class__.__name__, accuracy_score(y_test, y_pred))

LogisticRegression 0.87
RandomForestClassifier 0.98
SVC 0.975
VotingClassifier 0.975


#### Well... it doesn't show much here (the random forest alone scored slightly higher in this run). But in the book, the VotingClassifier outperforms each of the individual classifiers.

## Bagging Classifier - _Bootstrap Aggregating Classifiers_
Basically, the same algorithm is trained on different random subsets of the training set: bagging = sampling with replacement; pasting = sampling without replacement.
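The difference between the two sampling schemes can be seen on a toy index set. This is only a sketch: the sizes and the seed are arbitrary choices for illustration. In `BaggingClassifier`, the choice is made through the `bootstrap` parameter (`True` for bagging, `False` for pasting).

```python
import numpy as np

rng = np.random.default_rng(42)
n_train, subset_size = 10, 6  # toy sizes, chosen only for illustration

# Bagging: sample indices WITH replacement, so duplicates are allowed
bagging_idx = rng.choice(n_train, size=subset_size, replace=True)
# Pasting: sample indices WITHOUT replacement, so every index is unique
pasting_idx = rng.choice(n_train, size=subset_size, replace=False)

print("bagging:", np.sort(bagging_idx))
print("pasting:", np.sort(pasting_idx))
```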

In [4]:
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

bag_clf = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=500,
    max_samples=100, bootstrap=True, n_jobs=-1)
bag_clf.fit(X_train, y_train)
y_pred = bag_clf.predict(X_test)

In [5]:
accuracy_score(y_test, y_pred)

0.975

OK... about 97.5%... not bad

In [6]:
# Out-of-bag (oob) evaluation
bag_clf = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=500,
    max_samples=100, bootstrap=True, n_jobs=-1,
    oob_score=True  # Set oob_score=True
)  

In [7]:
bag_clf.fit(X_train, y_train)

BaggingClassifier(base_estimator=DecisionTreeClassifier(), max_samples=100,
                  n_estimators=500, n_jobs=-1, oob_score=True)

In [8]:
bag_clf.oob_score_

0.96625
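Why out-of-bag evaluation works: each predictor only sees the instances it sampled, so the rest act as a validation set for that predictor. The sketch below estimates how many instances are out-of-bag for one predictor, assuming 800 training instances (80% of 1000) and `max_samples=100` as in the cells above. The seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, max_samples = 800, 100  # assumed sizes, matching the cells above

# Indices that one predictor would draw with replacement (bootstrap=True)
sampled = rng.choice(n_train, size=max_samples, replace=True)
oob_mask = np.ones(n_train, dtype=bool)
oob_mask[sampled] = False  # anything never drawn is out-of-bag for this predictor

oob_fraction = oob_mask.mean()
# Probability that a given instance is never drawn in max_samples draws
expected = (1 - 1 / n_train) ** max_samples
print(f"out-of-bag fraction: {oob_fraction:.3f} (expected about {expected:.3f})")
```

With such a small `max_samples`, most of the training set is out-of-bag for any single predictor, so every instance gets predictions from many predictors that never saw it.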

## Random Forests

In [9]:
from sklearn.ensemble import RandomForestClassifier

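A Random Forest is roughly equivalent to a BaggingClassifier of DecisionTreeClassifier estimators, with extra randomness: each split only considers a random subset of the features.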
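A rough way to check this equivalence is to build both ensembles and compare their predictions. This is a sketch under assumptions: the `_demo` variable names, the seeds, and `n_estimators=100` are illustrative choices, and `max_features="sqrt"` is used to mimic the per-split feature sampling of a random forest. The two models are not expected to match exactly.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X_demo, y_demo = make_moons(n_samples=1000, noise=0.2, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

# Bagging of trees that consider a random subset of features at each split
bag_demo = BaggingClassifier(
    DecisionTreeClassifier(max_features="sqrt"),
    n_estimators=100, bootstrap=True, random_state=42)
rnd_demo = RandomForestClassifier(n_estimators=100, random_state=42)

bag_demo.fit(X_tr, y_tr)
rnd_demo.fit(X_tr, y_tr)

# Fraction of test instances on which both ensembles agree
agreement = (bag_demo.predict(X_te) == rnd_demo.predict(X_te)).mean()
print(f"agreement: {agreement:.3f}")
```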

In [10]:
# Random Forests also expose feature importances
from sklearn.datasets import load_iris
iris = load_iris()

rf_clf = RandomForestClassifier(n_estimators=500, n_jobs=-1)
rf_clf.fit(iris["data"], iris["target"])

for name, score in zip(iris["feature_names"], rf_clf.feature_importances_):
    print(name, score)

sepal length (cm) 0.08942630642540082
sepal width (cm) 0.021051918647736402
petal length (cm) 0.4295513027005101
petal width (cm) 0.45997047222635273


OK. It seems that petal width is the most important feature in this dataset, closely followed by petal length.

## Boosting

In [12]:
# AdaBoost: the models are trained sequentially (in series). Each model focuses on correctly classifying the errors of the previous model
# Note: newer scikit-learn releases deprecate algorithm="SAMME.R"; omit the argument if it raises an error

from sklearn.ensemble import AdaBoostClassifier
ada_clf = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1), n_estimators=200,
    algorithm="SAMME.R", learning_rate=0.5)
ada_clf.fit(X_train, y_train)

AdaBoostClassifier(base_estimator=DecisionTreeClassifier(max_depth=1),
                   learning_rate=0.5, n_estimators=200)
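How AdaBoost "focuses on the errors" can be sketched with one round of instance-weight updates. This follows the textbook binary formulation and is not scikit-learn's implementation. The labels, predictions, and `learning_rate=1.0` are hypothetical values chosen so the arithmetic is easy to follow.

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0])
y_hat = np.array([1, 0, 0, 1, 0])  # hypothetical predictions: only instance 2 is wrong
weights = np.full(len(y_true), 1 / len(y_true))  # start with uniform instance weights
learning_rate = 1.0

wrong = y_hat != y_true
# Weighted error rate of this predictor
error_rate = weights[wrong].sum() / weights.sum()
# Predictor weight: the more accurate the predictor, the larger its say in the final vote
alpha = learning_rate * np.log((1 - error_rate) / error_rate)
# Boost the weights of the misclassified instances, then normalize so they sum to 1
new_weights = weights.copy()
new_weights[wrong] *= np.exp(alpha)
new_weights /= new_weights.sum()

print(new_weights)
```

After the update, the single misclassified instance carries half of the total weight, so the next predictor is pushed to get it right.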

### Gradient Boosting: Same concept, sequential training, but each model fits the residual error of the previous model



In [15]:
# First, gradient boosting by hand:
from sklearn.tree import DecisionTreeRegressor

# Seed model. Step 1
tree_reg1 = DecisionTreeRegressor(max_depth=2)
tree_reg1.fit(X, y)
# Step 2
y2 = y - tree_reg1.predict(X)  # This is the residual error: y - y_hat
tree_reg2 = DecisionTreeRegressor(max_depth=2)
tree_reg2.fit(X, y2)
# Step 3
y3 = y2 - tree_reg2.predict(X)
tree_reg3 = DecisionTreeRegressor(max_depth=2)
tree_reg3.fit(X, y3)

# At this point the 3 models have been trained
# Now make predictions by summing the predictions of the 3 regressors
X_new = X[:3]  # Simulate new observations
y_pred = sum(tree.predict(X_new) for tree in (tree_reg1, tree_reg2, tree_reg3))

In [16]:
y_pred

array([0.04982003, 0.11290391, 0.94094906])
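The residual-fitting steps above can also be written as a loop, which makes it easy to check that each added tree lowers (or at least does not raise) the training error. This is a sketch: the `_demo` names and the seeds are illustrative choices, not part of the original notebook.

```python
from sklearn.datasets import make_moons
from sklearn.metrics import mean_squared_error
from sklearn.tree import DecisionTreeRegressor

X_demo, y_demo = make_moons(n_samples=1000, noise=0.2, random_state=42)

trees, train_mse = [], []
residual = y_demo.astype(float)  # the first tree is fit on the targets themselves
prediction = 0.0
for _ in range(3):
    tree = DecisionTreeRegressor(max_depth=2, random_state=42)
    tree.fit(X_demo, residual)
    step = tree.predict(X_demo)
    prediction = prediction + step  # the ensemble prediction is the sum of all trees
    residual = residual - step      # the next tree is fit on what is still unexplained
    trees.append(tree)
    train_mse.append(mean_squared_error(y_demo, prediction))

print(train_mse)
```

Because each regression tree predicts the mean residual within its leaves, subtracting its output cannot increase the squared error on the training set.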

In [17]:
# This is roughly equivalent to (scikit-learn's version starts from a constant initial prediction):
from sklearn.ensemble import GradientBoostingRegressor
gbrt = GradientBoostingRegressor(max_depth=2, n_estimators=3, learning_rate=1.0)
gbrt.fit(X, y)

GradientBoostingRegressor(learning_rate=1.0, max_depth=2, n_estimators=3)

In [20]:
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

X_train, X_val, y_train, y_val = train_test_split(X, y)
gbrt = GradientBoostingRegressor(max_depth=2, n_estimators=120)
gbrt.fit(X_train, y_train)
errors = [mean_squared_error(y_val, y_pred) for y_pred in gbrt.staged_predict(X_val)]

bst_n_estimators = np.argmin(errors) + 1
gbrt_best = GradientBoostingRegressor(max_depth=2, n_estimators=bst_n_estimators)
gbrt_best.fit(X_train, y_train)

GradientBoostingRegressor(max_depth=2, n_estimators=113)

OK... the minimum validation error was found at 113 estimators, out of a maximum of 120
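The `+ 1` in `np.argmin(errors) + 1` is there because `staged_predict` yields one prediction per ensemble size, starting with a single tree, while `argmin` returns a 0-based position. A tiny sketch with made-up error values:

```python
import numpy as np

# Hypothetical validation errors after 1, 2, 3, 4 and 5 trees
errors_demo = [0.30, 0.22, 0.18, 0.19, 0.21]

best_index = int(np.argmin(errors_demo))  # 0-based position of the smallest error
best_n_estimators = best_index + 1        # position i corresponds to an ensemble of i + 1 trees
print(best_index, best_n_estimators)
```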

In [21]:
# Early stopping can also be done with warm_start=True
gbrt = GradientBoostingRegressor(max_depth=2, warm_start=True)  # Warm start: keep existing trees when fit() is called again

min_val_error = float("inf")
error_going_up = 0

for n_estimators in range(1, 120):
    gbrt.n_estimators = n_estimators
    gbrt.fit(X_train, y_train)
    y_pred = gbrt.predict(X_val)
    val_error = mean_squared_error(y_val, y_pred)
    if val_error < min_val_error:
        min_val_error = val_error
        error_going_up = 0
    else:
        error_going_up += 1
        if error_going_up == 5:  # Stop if the validation error has not improved for 5 consecutive iterations
            break  # early stopping

In [24]:
# Honorable mention: XGBoost
import xgboost
xgb_reg = xgboost.XGBRegressor()
xgb_reg.fit(X_train, y_train)
y_pred = xgb_reg.predict(X_val)

In [27]:

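# Note: recent XGBoost releases may expect early_stopping_rounds in the XGBRegressor constructor instead of fit()
xgb_reg.fit(X_train, y_train, eval_set=[(X_val, y_val)], early_stopping_rounds=3)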
y_pred = xgb_reg.predict(X_val)

[0]	validation_0-rmse:0.36606
[1]	validation_0-rmse:0.27278
[2]	validation_0-rmse:0.21293
[3]	validation_0-rmse:0.17580
[4]	validation_0-rmse:0.15112
[5]	validation_0-rmse:0.13755
[6]	validation_0-rmse:0.12808
[7]	validation_0-rmse:0.12427
[8]	validation_0-rmse:0.12175
[9]	validation_0-rmse:0.12103
[10]	validation_0-rmse:0.11930
[11]	validation_0-rmse:0.11912
[12]	validation_0-rmse:0.12009
[13]	validation_0-rmse:0.12249
[14]	validation_0-rmse:0.12346


XGBoost handles early stopping itself. In this case, it stops once the validation error has failed to improve for 3 consecutive rounds
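The stopping rule can be replayed on the RMSE values from the log above. This is a simplified model of patience-based early stopping, not XGBoost's actual implementation. The helper name `early_stop` is hypothetical.

```python
def early_stop(errors, patience):
    """Return (best_round, stop_round) for a sequence of validation errors.

    Training stops once `patience` consecutive rounds fail to improve on the best error.
    """
    best_error, best_round, rounds_without_improvement = float("inf"), -1, 0
    for round_idx, error in enumerate(errors):
        if error < best_error:
            best_error, best_round = error, round_idx
            rounds_without_improvement = 0
        else:
            rounds_without_improvement += 1
            if rounds_without_improvement == patience:
                return best_round, round_idx
    return best_round, len(errors) - 1  # never triggered: every round was run

# Validation RMSE values copied from the log above
rmse_log = [0.36606, 0.27278, 0.21293, 0.17580, 0.15112, 0.13755, 0.12808,
            0.12427, 0.12175, 0.12103, 0.11930, 0.11912, 0.12009, 0.12249, 0.12346]
best_round, stop_round = early_stop(rmse_log, patience=3)
print(best_round, stop_round)
```

The best score is at round 11, and rounds 12 to 14 are the three non-improving rounds that trigger the stop.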

## Stacking