# Boosting
* **Boosting:** Ensemble method combining several weak learners to form a strong learner.
* **Weak learner:** Model doing slightly better than random guessing.
* **Example of weak learner:** Decision stump (CART whose maximum depth is 1).
* Train an ensemble of predictors sequentially.
* Each predictor tries to correct its predecessor.
* Most popular boosting methods:
    * AdaBoost,
    * Gradient Boosting.
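
To make the weak-versus-strong distinction concrete, the sketch below compares a single decision stump with a boosted ensemble of stumps. The synthetic data from `make_classification` and every variable name are assumptions chosen for illustration; they are unrelated to the datasets used later in these notes.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data (illustrative assumption)
X_demo, y_demo = make_classification(n_samples=500, n_features=10,
                                     n_informative=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.3,
                                          stratify=y_demo, random_state=0)

# Weak learner: a decision stump (CART with max_depth=1)
stump = DecisionTreeClassifier(max_depth=1, random_state=0).fit(X_tr, y_tr)

# Boosted ensemble: 50 predictors trained sequentially
# (the default base learner of AdaBoostClassifier is a decision stump)
boosted = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)

stump_acc = stump.score(X_te, y_te)
boosted_acc = boosted.score(X_te, y_te)
print('Stump accuracy:   {:.2f}'.format(stump_acc))
print('Boosted accuracy: {:.2f}'.format(boosted_acc))
```

On most datasets the boosted ensemble scores noticeably higher than the lone stump, but the exact numbers depend on the data and the seed.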

# AdaBoost (Adaptive Boosting)

## AdaBoost
* Stands for Adaptive Boosting.
* Each predictor pays more attention to the instances wrongly predicted by its predecessor.
* Achieved by changing the weights of training instances.
* Each predictor is assigned a coefficient $\alpha$ that weighs its contribution in the ensemble's final prediction.
* $\alpha$ depends on the predictor's training error.
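
The reweighting loop described above can be sketched from scratch. This is a minimal illustration of one common binary formulation, assuming labels in {-1, +1} and $\alpha = \eta \log\frac{1 - r}{r}$ for weighted error $r$; the helper names `adaboost_fit` and `adaboost_predict` are hypothetical, and sklearn's own implementation differs in its details.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_estimators=20, eta=1.0):
    """Train stumps sequentially; y must contain only -1 and +1."""
    n_samples = len(y)
    weights = np.full(n_samples, 1.0 / n_samples)  # start with uniform weights
    ensemble = []
    for _ in range(n_estimators):
        stump = DecisionTreeClassifier(max_depth=1, random_state=0)
        stump.fit(X, y, sample_weight=weights)
        missed = stump.predict(X) != y
        # Weighted training error (weights sum to 1); clipped to avoid log(0)
        error = np.clip(weights[missed].sum(), 1e-10, 1 - 1e-10)
        # Coefficient alpha, shrunk by the learning rate eta
        alpha = eta * np.log((1.0 - error) / error)
        # Increase the weights of the misclassified instances, then renormalize
        weights[missed] *= np.exp(alpha)
        weights /= weights.sum()
        ensemble.append((alpha, stump))
    return ensemble

def adaboost_predict(ensemble, X):
    """Weighted majority vote of all stumps."""
    scores = sum(alpha * stump.predict(X) for alpha, stump in ensemble)
    return np.where(scores >= 0, 1, -1)

# Synthetic data (illustrative assumption), labels mapped from {0, 1} to {-1, +1}
X_toy, y01 = make_classification(n_samples=300, n_features=8, random_state=0)
y_toy = 2 * y01 - 1

ensemble = adaboost_fit(X_toy, y_toy, n_estimators=20)
y_hat = adaboost_predict(ensemble, X_toy)
train_acc = np.mean(y_hat == y_toy)
print('Training accuracy: {:.2f}'.format(train_acc))
```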

## Learning Rate

* An important training parameter is the learning rate, denoted $\eta$.
* $\eta$ is a number between 0 and 1; it is used to shrink the coefficient $\alpha$ of a trained predictor.
* It's important to note that there's a trade-off between $\eta$ and the number of estimators.
* A smaller value of $\eta$ should be compensated by a greater number of estimators.
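
As a small worked example of the shrinking effect, the snippet below evaluates $\alpha$ for a fixed training error under several learning rates. The formula $\alpha = \eta \log\frac{1 - r}{r}$ is one common choice and is an assumption here, as is the hypothetical helper name `predictor_weight`.

```python
import math

def predictor_weight(error, eta=1.0):
    """Coefficient alpha of a predictor with weighted training error `error`."""
    return eta * math.log((1.0 - error) / error)

# Same predictor (30% weighted error), different learning rates
for eta in (1.0, 0.5, 0.1):
    alpha = predictor_weight(0.3, eta)
    print('eta = {:.1f} -> alpha = {:.3f}'.format(eta, alpha))
```

Because each predictor contributes less when $\eta$ is small, more predictors are needed to accumulate the same total correction.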

## AdaBoost: Prediction
**Classification:**
* Weighted majority voting.
* In sklearn: AdaBoostClassifier.

**Regression:**
* Weighted average (sklearn's AdaBoostRegressor combines the predictors with a weighted median instead).
* In sklearn: AdaBoostRegressor.
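
The two combination rules can be illustrated with plain NumPy. The predictions and coefficients below are made-up numbers for illustration only; class labels are assumed to be encoded as -1 and +1.

```python
import numpy as np

# Coefficients alpha of three trained predictors (made-up values)
alphas = np.array([1.2, 0.5, 0.4])

# Classification: rows are predictors, columns are samples, labels in {-1, +1}
votes = np.array([[ 1,  1],
                  [-1,  1],
                  [-1, -1]])
weighted_votes = alphas @ votes
class_pred = np.where(weighted_votes >= 0, 1, -1)

# Regression: weighted average of the predictors' outputs
reg_outputs = np.array([[10.0, 20.0],
                        [12.0, 18.0],
                        [20.0, 30.0]])
reg_pred = np.average(reg_outputs, axis=0, weights=alphas)

print(class_pred)
print(reg_pred)
```

For the first sample, two of the three predictors vote -1, yet the weighted vote is +1 because the first predictor carries the largest coefficient.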

## AdaBoost Classification in sklearn (Breast Cancer dataset)

In [1]:
import pandas as pd
wbc = pd.read_csv('wbc.zip')
X = wbc.iloc[:, 2:-1]
y = pd.Categorical(wbc['diagnosis']).codes
X.shape

(569, 30)

In [2]:
# Import models and utility functions
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
# Set seed for reproducibility
SEED = 1
# Split data into 70% train and 30% test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify=y, random_state=SEED)

In [3]:
# Instantiate a classification-tree 'dt' (weak learner)
dt = DecisionTreeClassifier(max_depth=1, random_state=SEED)
# Instantiate an AdaBoost classifier 'adb_clf'
adb_clf = AdaBoostClassifier(estimator=dt, n_estimators=100)
# Fit 'adb_clf' to the training set
adb_clf.fit(X_train, y_train)
# Predict the test set probabilities of positive class
y_pred_proba = adb_clf.predict_proba(X_test)[:,1]
# Evaluate test-set roc_auc_score
adb_clf_roc_auc_score = roc_auc_score(y_test, y_pred_proba)

In [4]:
# Print adb_clf_roc_auc_score
print('ROC AUC score: {:.2f}'.format(adb_clf_roc_auc_score))

ROC AUC score: 0.99


## AdaBoost Classification in sklearn (Indian Liver Patient)

In [5]:
indian = pd.read_csv('indian_liver_patient.zip').dropna()
indian['Gender'] = pd.Categorical(indian['Gender']).codes
X = indian.iloc[:, :-1]
y = indian['Dataset']

In [6]:
# Instantiate dt
dt = DecisionTreeClassifier(max_depth=2, random_state=1)
# Instantiate ada
ada = AdaBoostClassifier(estimator=dt, n_estimators=180, random_state=1)

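In [7]:
# Split the liver data into train and test sets; this step is missing above,
# so the 80/20 stratified split and the seed are assumptions
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=1)
# Fit ada to the training set
ada.fit(X_train, y_train)

# Compute the probabilities of obtaining the positive class
y_pred_proba = ada.predict_proba(X_test)[:,1]

In [8]:
# Import roc_auc_score
from sklearn.metrics import roc_auc_score
# Evaluate test-set roc_auc_score
ada_roc_auc = roc_auc_score(y_test, y_pred_proba)
# Print roc_auc_score
print('ROC AUC score: {:.2f}'.format(ada_roc_auc))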


# Gradient Boosting (GB)

## Gradient Boosted Trees
* Sequential correction of predecessor's errors.
* Does not tweak the weights of training instances.
* Instead, each predictor is trained using its predecessor's residual errors as labels.
* Gradient Boosted Trees: a CART is used as a base learner.
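
The residual-fitting idea can be sketched in a few lines for squared-error regression. The toy data, the constant initial prediction (the mean of the targets), and all variable names are assumptions made for illustration, not a description of sklearn's internals.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Noisy sine wave as toy regression data (illustrative assumption)
rng = np.random.default_rng(0)
X_toy = rng.uniform(-3, 3, size=(200, 1))
y_toy = np.sin(X_toy[:, 0]) + rng.normal(scale=0.1, size=200)

eta = 0.1       # learning rate
n_trees = 100

# Start from a constant prediction
pred = np.full(len(y_toy), y_toy.mean())
initial_mse = np.mean((y_toy - pred) ** 2)

trees = []
for _ in range(n_trees):
    # Residual errors of the current ensemble become the next tree's labels
    residuals = y_toy - pred
    tree = DecisionTreeRegressor(max_depth=1, random_state=0)
    tree.fit(X_toy, residuals)
    # Add the new tree's shrunken prediction to the ensemble prediction
    pred += eta * tree.predict(X_toy)
    trees.append(tree)

final_mse = np.mean((y_toy - pred) ** 2)
print('Training MSE before boosting: {:.3f}'.format(initial_mse))
print('Training MSE after boosting:  {:.3f}'.format(final_mse))
```

Note that the instance weights are never touched; only the labels change from one tree to the next.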

## Shrinkage

* An important parameter used in training gradient boosted trees is shrinkage.
* In this context, shrinkage means that the prediction of each tree in the ensemble is shrunk by multiplying it by a learning rate $\eta$ (eta), a number between 0 and 1.
* Similarly to AdaBoost, there's a trade-off between $\eta$ and the number of estimators.
* Decreasing the learning rate $\eta$ needs to be compensated by increasing the number of estimators in order for the ensemble to reach a certain performance.
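
One way to observe this trade-off is to track the test error after each added tree with `staged_predict`. The sketch below uses synthetic data from `make_regression` and two arbitrary learning rates; these choices and the variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic regression data (illustrative assumption)
X_s, y_s = make_regression(n_samples=400, n_features=8, noise=15.0, random_state=0)
X_tr_s, X_te_s, y_tr_s, y_te_s = train_test_split(X_s, y_s, test_size=0.3,
                                                  random_state=0)

test_mse = {}
for eta in (1.0, 0.1):
    model = GradientBoostingRegressor(n_estimators=200, learning_rate=eta,
                                      max_depth=1, random_state=0)
    model.fit(X_tr_s, y_tr_s)
    # staged_predict yields the ensemble prediction after each added tree
    test_mse[eta] = [mean_squared_error(y_te_s, stage_pred)
                     for stage_pred in model.staged_predict(X_te_s)]
    best_n = int(np.argmin(test_mse[eta])) + 1
    print('eta = {:.1f}: lowest test MSE reached with {} trees'.format(eta, best_n))
```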

## Gradient Boosted Trees: Prediction
* Regression:
    $$y_{pred} = y_1 + \eta r_1 + \dots + \eta r_N$$
    * In sklearn: GradientBoostingRegressor
* Classification:
    * In sklearn: GradientBoostingClassifier
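
The additive form of the regression prediction can be checked against a fitted `GradientBoostingRegressor`. This is a sketch under two assumptions: the default squared-error loss, and the default initial prediction, which in sklearn is the mean of the training targets (playing the role of the first term in the formula). The synthetic data is illustrative.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic regression data (illustrative assumption)
X_reg, y_reg = make_regression(n_samples=200, n_features=5, noise=10.0,
                               random_state=0)

gbr = GradientBoostingRegressor(n_estimators=50, learning_rate=0.1,
                                max_depth=1, random_state=0)
gbr.fit(X_reg, y_reg)

# Rebuild the prediction by hand: initial value plus eta times each tree's output
manual_pred = np.full(len(y_reg), y_reg.mean())
for stage in gbr.estimators_:   # array of shape (n_estimators, 1)
    manual_pred += gbr.learning_rate * stage[0].predict(X_reg)

print(np.allclose(manual_pred, gbr.predict(X_reg)))
```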

## Gradient Boosting in sklearn (auto dataset)

In [9]:
auto = pd.read_csv('auto.zip')
# Copy the feature columns so that the assignment below does not act on a slice of 'auto'
X = auto.iloc[:, 1:].copy()
X['origin'] = pd.Categorical(X['origin']).codes
y = auto['mpg']

In [10]:
# Import models and utility functions
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error as MSE
# Set seed for reproducibility
SEED = 1
# Split dataset into 70% train and 30% test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=SEED)

In [11]:
# Instantiate a GradientBoostingRegressor 'gbt'
gbt = GradientBoostingRegressor(n_estimators=300, max_depth=1, random_state=SEED)
# Fit 'gbt' to the training set
gbt.fit(X_train, y_train)
# Predict the test set labels
y_pred = gbt.predict(X_test)
# Evaluate the test set RMSE
rmse_test = MSE(y_test, y_pred)**(1/2)
# Print the test set RMSE
print('Test set RMSE: {:.2f}'.format(rmse_test))

Test set RMSE: 4.00


## Gradient Boosting in sklearn (Bike Sharing Demand)

In [12]:
bikes = pd.read_csv('bikes.zip')
X = bikes.drop(columns=['cnt'])
y = bikes['cnt']

In [13]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

In [14]:
# Instantiate gb
gb = GradientBoostingRegressor(max_depth=4, n_estimators=200, random_state=2)

In [15]:
# Fit gb to the training set
gb.fit(X_train, y_train)
# Predict test set labels
y_pred = gb.predict(X_test)

In [16]:
# Compute MSE
mse_test = MSE(y_test, y_pred)
# Compute RMSE
rmse_test = mse_test ** (1/2)
# Print RMSE
print('Test set RMSE of gb: {:.3f}'.format(rmse_test))

Test set RMSE of gb: 43.113


# Stochastic Gradient Boosting (SGB)

## Gradient Boosting: Cons
* GB involves an exhaustive search procedure.
* Each CART is trained to find the best split points and features.
* This may lead to CARTs that use the same split points and possibly the same features.

## Stochastic Gradient Boosting
* Each tree is trained on a random subset of rows of the training data.
* The instances (40%-80% of the training set) are sampled without replacement.  
    `subsample` - The fraction of samples to be used for fitting the individual base learners. If smaller than 1.0 this results in Stochastic Gradient Boosting. subsample interacts with the parameter `n_estimators`. Choosing subsample < 1.0 leads to a reduction of variance and an increase in bias.
* Features are sampled (without replacement) when choosing split points.  
    `max_features` - The number of features to consider when looking for the best split.
* Result: further ensemble diversity.
* Effect: adds further variance to the ensemble of trees.
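
The two sources of randomness can be added to a hand-written boosting loop as follows. This is a minimal sketch assuming squared-error regression, a constant initial prediction, and toy data; the variable names are hypothetical, and the values of `subsample` and `max_features` are arbitrary.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy regression data (illustrative assumption)
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(300, 5))
y_toy = 2.0 * X_toy[:, 0] + np.sin(X_toy[:, 1]) + rng.normal(scale=0.1, size=300)

eta, n_trees = 0.1, 100
subsample = 0.8        # fraction of rows drawn for each tree
max_features = 0.6     # fraction of features considered at each split
n_rows = int(subsample * len(y_toy))

pred = np.full(len(y_toy), y_toy.mean())
initial_mse = np.mean((y_toy - pred) ** 2)

for i in range(n_trees):
    # Draw a random subset of rows without replacement
    rows = rng.choice(len(y_toy), size=n_rows, replace=False)
    residuals = y_toy - pred
    # Each split considers only a random subset of the features
    tree = DecisionTreeRegressor(max_depth=1, max_features=max_features,
                                 random_state=i)
    tree.fit(X_toy[rows], residuals[rows])
    # The update is applied to all rows, not only the sampled ones
    pred += eta * tree.predict(X_toy)

final_mse = np.mean((y_toy - pred) ** 2)
print('Training MSE before boosting: {:.3f}'.format(initial_mse))
print('Training MSE after boosting:  {:.3f}'.format(final_mse))
```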

## Stochastic Gradient Boosting in sklearn (auto dataset)

In [17]:
auto = pd.read_csv('auto.zip')
# Copy the feature columns so that the assignment below does not act on a slice of 'auto'
X = auto.iloc[:, 1:].copy()
X['origin'] = pd.Categorical(X['origin']).codes
y = auto['mpg']

In [18]:
# Import models and utility functions
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error as MSE
# Set seed for reproducibility
SEED = 1
# Split dataset into 70% train and 30% test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=SEED)

In [19]:
# Instantiate a stochastic GradientBoostingRegressor 'sgbt'
sgbt = GradientBoostingRegressor(max_depth=1,
                                 subsample=0.8,
                                 max_features=0.2,
                                 n_estimators=300,
                                 random_state=SEED)
# Fit 'sgbt' to the training set
sgbt.fit(X_train, y_train)
# Predict the test set labels
y_pred = sgbt.predict(X_test)

In [20]:
# Evaluate test set RMSE 'rmse_test'
rmse_test = MSE(y_test, y_pred)**(1/2)
# Print 'rmse_test'
print('Test set RMSE: {:.2f}'.format(rmse_test))

Test set RMSE: 4.11


## Stochastic Gradient Boosting in sklearn (Bike Sharing Demand)

In [21]:
bikes = pd.read_csv('bikes.zip')
X = bikes.drop(columns=['cnt'])
y = bikes['cnt']

In [22]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

In [23]:
# Instantiate sgbr
sgbr = GradientBoostingRegressor(max_depth=4,
                                 subsample=0.9,
                                 max_features=0.75,
                                 n_estimators=200,
                                 random_state=2)

In [24]:
# Fit sgbr to the training set
sgbr.fit(X_train, y_train)
# Predict test set labels
y_pred = sgbr.predict(X_test)

In [25]:
# Compute test set MSE
mse_test = MSE(y_test, y_pred)
# Compute test set RMSE
rmse_test = mse_test ** (1/2)
# Print rmse_test
print('Test set RMSE of sgbr: {:.3f}'.format(rmse_test))

Test set RMSE of sgbr: 45.143
