#### AdaBoost

boosting is an ensemble method in which several models are trained sequentially, with each model learning from the errors of its predecessors; it combines many weak learners (models that do only slightly better than random guessing) into a strong learner; two popular boosting methods are AdaBoost and Gradient Boosting

a decision stump is a CART with a maximum depth of 1 and is an example of a weak learner 

**adaboost** stands for adaptive boosting, each predictor pays more attention to the instances wrongly predicted by its predecessor, which is achieved by constantly changing the weights of the training instances, each predictor is assigned a coefficient **alpha** that depends on the predictor's training error and weighs its contribution to the ensemble's final prediction, the learning rate **eta** is a number between 0 and 1 that is used to shrink the coefficient alpha of a trained predictor, there's a tradeoff between eta and the number of estimators (a smaller eta value should be compensated by a greater number of estimators)
for classification the ensemble predicts by weighted majority voting (AdaBoostClassifier), for regression the predictions are combined by a weighted average (AdaBoostRegressor; sklearn's implementation takes the weighted median of the predictors)
individual predictors don't need to be CARTs but CARTs are used most of the time in boosting because of their high variance
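
to make alpha and the instance weights concrete, here is a minimal sketch of a single boosting round in plain python. it assumes the classic discrete AdaBoost formulas for binary labels in {-1, +1} (alpha = eta * 0.5 * ln((1 - err) / err), weights multiplied by exp(-alpha * y * prediction) and then renormalised); sklearn's implementation differs in its details, and the function name `adaboost_round` and the toy labels are made up for illustration

```python
import math

def adaboost_round(y_true, y_pred, weights, eta=1.0):
    """One round of discrete AdaBoost for labels in {-1, +1} (illustrative sketch).

    Returns the predictor's coefficient alpha and the updated, normalised weights.
    Assumes the weighted error is strictly between 0 and 1.
    """
    # weighted training error of the current predictor
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p) / sum(weights)
    # alpha grows as the error shrinks; the learning rate eta shrinks alpha
    alpha = eta * 0.5 * math.log((1 - err) / err)
    # misclassified instances (t * p == -1) get heavier, correct ones get lighter
    new_w = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(new_w)
    return alpha, [w / total for w in new_w]

y_true = [1, 1, -1, -1, 1]
y_pred = [1, 1, -1, -1, -1]   # the last instance is misclassified
weights = [0.2] * 5           # start from uniform weights

alpha, new_weights = adaboost_round(y_true, y_pred, weights)
print(round(alpha, 3))                      # 0.693
print([round(w, 3) for w in new_weights])   # [0.125, 0.125, 0.125, 0.125, 0.5]
```

after one round the single misclassified instance carries half of the total weight, so the next predictor is pushed to get it right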

In [None]:
# AdaBoost Classification in sklearn
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

SEED = 1

# split the dataset into 70% train, 30% test
# (X and y are assumed to be loaded already: feature matrix and class labels)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify=y, random_state=SEED)

# instantiate a classification tree, a stump hehe :)
dt = DecisionTreeClassifier(max_depth=1, random_state=SEED)

# instantiate an AdaBoost classifier with 100 decision stumps
# (sklearn >= 1.2 names this parameter `estimator`; older versions use `base_estimator`)
adb_clf = AdaBoostClassifier(estimator=dt, n_estimators=100)

# fit to the training set
adb_clf.fit(X_train, y_train)
# predict the test set probabilities of positive class
y_pred_proba = adb_clf.predict_proba(X_test)[:,1]

# you can now evaluate the roc auc score of the test set
adb_clf_roc_auc_score = roc_auc_score(y_test, y_pred_proba)

# print the result
print('ROC AUC score: {:.2f}'.format(adb_clf_roc_auc_score))

#### Gradient Boosting

in gradient boosting each predictor in the ensemble corrects its predecessor's errors (sequential correction of the predecessor's errors), unlike AdaBoost the weights of the training instances are not tweaked, instead each predictor is trained using the residual errors of its predecessor as labels, gradient boosted trees use a CART as the base learner
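
the residual-fitting loop can be sketched in plain python. this is a toy version under stated assumptions: squared-error loss, a single feature, and depth-1 regression stumps found by exhaustive threshold search; the helper names `fit_stump`, `gradient_boost` and `mse` and the toy data are made up here, and sklearn's implementation is far more general

```python
def fit_stump(xs, targets):
    """Fit a depth-1 regression tree on 1-D data by exhaustive threshold search.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for thr in sorted(set(xs))[:-1]:   # candidate split points
        left = [t for x, t in zip(xs, targets) if x <= thr]
        right = [t for x, t in zip(xs, targets) if x > thr]
        l_mean, r_mean = sum(left) / len(left), sum(right) / len(right)
        sse = sum((t - l_mean) ** 2 for t in left) + sum((t - r_mean) ** 2 for t in right)
        if best is None or sse < best[0]:
            best = (sse, thr, l_mean, r_mean)
    _, thr, l_mean, r_mean = best
    return lambda x: l_mean if x <= thr else r_mean

def gradient_boost(xs, ys, n_estimators=50, eta=0.1):
    """Boost stumps on residuals; eta shrinks each stump's contribution."""
    base = sum(ys) / len(ys)           # initial prediction: the mean of the labels
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_estimators):
        # the residuals of the current ensemble are the labels for the next stump
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + eta * stump(x) for x, p in zip(xs, preds)]
    return lambda x: base + eta * sum(s(x) for s in stumps)

xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]

def mse(model):
    """Mean squared error of a model on the toy training data."""
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(ys)

few = gradient_boost(xs, ys, n_estimators=5)
many = gradient_boost(xs, ys, n_estimators=100)
print('train MSE with 5 stumps: {:.4f}'.format(mse(few)))
print('train MSE with 100 stumps: {:.4f}'.format(mse(many)))
```

each extra stump fits what the ensemble still gets wrong, so the training error keeps dropping as stumps are added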

**shrinkage** is an important parameter used in training gradient boosted trees, shrinkage refers to the fact that the prediction of each tree in the ensemble is shrunk by multiplying it by a learning rate **eta**, which is a number between 0 and 1, just like with AdaBoost there's a tradeoff between eta and the number of estimators (decreasing the learning rate needs to be compensated by increasing the number of estimators), the class for a gradient boosting regressor in sklearn is GradientBoostingRegressor, a similar algorithm is used for classification problems, the class that implements gradient boosted classification in sklearn is GradientBoostingClassifier
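
a rough back-of-the-envelope view of the eta / number-of-estimators tradeoff, under a strong idealising assumption that real trees do not satisfy: if every tree fitted the current residual exactly, the residual would shrink by a factor (1 - eta) per round. the function name `rounds_needed` and the 1% target are made up for illustration

```python
import math

def rounds_needed(eta, remaining=0.01):
    """Rounds until the residual falls to `remaining` of its starting size.

    Idealised assumption: each tree fits the current residual exactly,
    so one round multiplies the residual by (1 - eta).
    """
    return math.ceil(math.log(remaining) / math.log(1 - eta))

for eta in (0.5, 0.1, 0.01):
    print(eta, rounds_needed(eta))
# 0.5 7
# 0.1 44
# 0.01 459
```

in this idealised picture, cutting eta by a factor of 10 calls for roughly 10 times as many estimators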

In [2]:
# gradient boosting in sklearn
# predict the mpg consumption of cars 
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error as MSE
from sklearn.model_selection import train_test_split

SEED = 1

# split the dataset into 70% train, 30% test
# (no stratify here: mpg is a continuous target and stratification needs class labels)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=SEED)

# instantiate a GBR consisting of 300 decision stumps
gbt = GradientBoostingRegressor(n_estimators=300, max_depth=1, random_state=SEED)

# fit to the training set
gbt.fit(X_train, y_train)
# predict the test set labels
y_pred = gbt.predict(X_test)

# evaluate the test set RMSE
rmse_test = MSE(y_test, y_pred)**(1/2)
print('Test set RMSE: {:.2f}'.format(rmse_test))

#### Stochastic Gradient Boosting (SGB)

gradient boosting cons:
- it involves an exhaustive search procedure
- each tree (CART) in the ensemble is trained to find the best split-points and the best features
- this procedure may lead to CARTs that use the same split-points and possibly the same features

to mitigate these cons you can use the stochastic gradient boosting algorithm, each tree (CART) is trained on a random subset of the rows of the training data, the subset is sampled without replacement and is typically 40-80% of the training set, at each node the features are also sampled (without replacement) when choosing split points, this creates further diversity in the ensemble and has the net effect of adding further variance to the ensemble of trees
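
the two sampling steps can be sketched in plain python. this only shows the sampling, not the tree fitting; the helper names `sample_rows` and `sample_features` and the toy sizes are made up for illustration, and fractions are turned into counts with max(1, int(fraction * n)) as an assumption of this sketch

```python
import random

def sample_rows(n_rows, subsample, rng):
    """Pick a fraction of row indices without replacement (one draw per tree)."""
    k = max(1, int(subsample * n_rows))
    return sorted(rng.sample(range(n_rows), k))

def sample_features(n_features, max_features, rng):
    """Pick a fraction of feature indices without replacement (one draw per split)."""
    k = max(1, int(max_features * n_features))
    return sorted(rng.sample(range(n_features), k))

rng = random.Random(1)        # seeded generator so the draws are repeatable
n_rows, n_features = 10, 5

for tree in range(3):
    rows = sample_rows(n_rows, subsample=0.8, rng=rng)
    feats = sample_features(n_features, max_features=0.4, rng=rng)
    print('tree {}: rows={} candidate features at the root={}'.format(tree, rows, feats))
```

every tree sees a different 80% of the rows and every split considers a different pair of features, which is what makes the trees in the ensemble less alike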

In [None]:
# stochastic gradient boosting in sklearn
# predict the mpg consumption of cars 
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error as MSE
from sklearn.model_selection import train_test_split

SEED = 1

# split the dataset into 70% train, 30% test
# (no stratify here: mpg is a continuous target and stratification needs class labels)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=SEED)

# instantiate a stochastic GradientBoostingRegressor consisting of 300 decision stumps
# subsample is the fraction of rows each tree is trained on,
# max_features is the fraction of features considered when looking for the best split
sgbt = GradientBoostingRegressor(max_depth=1, 
                                 subsample=0.8, 
                                 max_features=0.2, 
                                 n_estimators=300, 
                                 random_state=SEED)

# fit to the training set
sgbt.fit(X_train, y_train)
# predict the test set labels
y_pred = sgbt.predict(X_test)

# compute the test set RMSE
rmse_test = MSE(y_test, y_pred)**(1/2)
print('Test set RMSE: {:.2f}'.format(rmse_test))