# Boosting (Hypothesis Boosting)
 - Combine several weak learners into a strong learner.
 - Train predictors sequentially.
 
## AdaBoost / Adaptive Boosting

AdaBoost follows the general boosting recipe above:
 - Much like human learning, the algorithm learns from past mistakes by focusing on the difficult cases it got wrong in earlier rounds.
 - It pays more attention to the training instances that the previous predictors underfit.
 
 
 - Fit a sequence of weak learners (models that are only slightly better than random guessing, such as small decision trees) on repeatedly modified versions of the data.
 - The predictions from all of them are then combined through a weighted majority vote (or sum) to produce the final prediction.
 - The data modifications at each boosting iteration consist of applying weights $w_1, w_2, \ldots, w_N$ to each of the $N$ training samples.
 - Initially, those weights are all set to $w_i=1/N$, so that the first step simply trains a weak learner on the original data.
 - For each successive iteration, the sample weights are individually modified and the learning algorithm is reapplied to the reweighted data.
 - At a given step, those training examples that were incorrectly predicted by the boosted model induced at the previous step have their weights increased, whereas the weights are decreased for those that were predicted correctly.
 - As iterations proceed, examples that are difficult to predict receive ever-increasing influence. Each subsequent weak learner is thereby forced to concentrate on the examples that are missed by the previous ones in the sequence.
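
The reweighting described above can be sketched in a few lines. This is a minimal illustration that assumes the classic discrete AdaBoost update for labels in {-1, +1}; the notes do not state the exact formulas, and scikit-learn's implementation differs in its details. The helper names `adaboost_round` and `weighted_vote` are hypothetical.

```python
import math

def adaboost_round(weights, y_true, y_pred):
    """One reweighting step of discrete AdaBoost for labels in {-1, +1}.

    Assumes the weights sum to 1 and the weighted error is strictly
    between 0 and 1 (a weak learner should land below 0.5).
    """
    # Weighted error: total weight of the misclassified samples.
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    # The learner's say in the final vote; larger when the error is small.
    alpha = 0.5 * math.log((1 - err) / err)
    # t * p is -1 for a miss (weight scaled up) and +1 for a hit (scaled down).
    new_w = [w * math.exp(-alpha * t * p)
             for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(new_w)
    return alpha, [w / total for w in new_w]

def weighted_vote(alphas, predictions):
    """Sign of the alpha-weighted sum of the weak learners' +/-1 predictions."""
    score = sum(a * p for a, p in zip(alphas, predictions))
    return 1 if score >= 0 else -1

y_true = [1, 1, -1, -1, 1]
y_pred = [1, -1, -1, -1, 1]                 # the weak learner misses sample 1
weights = [1 / len(y_true)] * len(y_true)   # initial weights w_i = 1/N

alpha, weights = adaboost_round(weights, y_true, y_pred)
print(round(alpha, 4), [round(w, 4) for w in weights])
# prints: 0.6931 [0.125, 0.5, 0.125, 0.125, 0.125]
```

After one round the single misclassified sample carries half of the total weight, so the next weak learner is pushed to get it right.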

In [15]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
import warnings
warnings.filterwarnings("ignore")

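from sklearn import preprocessing

df = sns.load_dataset('titanic')
# Drop every row that has a missing value in any column (not only the three used below).
df.dropna(inplace=True)
# Copy the slice so the assignment below does not write into a view of df.
X = df[['pclass', 'sex', 'age']].copy()
# Encode sex as 0/1 (LabelBinarizer orders classes alphabetically: female=0, male=1).
lb = preprocessing.LabelBinarizer()
X['sex'] = lb.fit_transform(X['sex']).ravel()
y = df['survived']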

In [2]:
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score, cross_val_predict
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

In [3]:
def print_score(clf, X_train, y_train, X_test, y_test, train=True):
    """Print the accuracy score, classification report, and confusion matrix of a classifier.

    With train=True, report on the training set and add 10-fold
    cross-validated accuracy; otherwise report on the test set.
    """
    if train:
        # Training performance
        y_pred = clf.predict(X_train)
        print("Train Result:\n")
        print("accuracy score: {0:.4f}\n".format(accuracy_score(y_train, y_pred)))
        print("Classification Report:\n {} \n".format(classification_report(y_train, y_pred)))
        print("Confusion Matrix:  \n {} \n".format(confusion_matrix(y_train, y_pred)))

        # cross_val_score refits a clone of clf on each of the 10 folds
        res = cross_val_score(clf, X_train, y_train, cv=10, scoring='accuracy')
        print("Average Accuracy: \t {0:.4f}".format(np.mean(res)))
        print("Accuracy SD \t\t {0:.4f}".format(np.std(res)))
    else:
        # Test performance
        y_pred = clf.predict(X_test)
        print("Test Result:\n")
        print("accuracy score: {0:.4f}\n".format(accuracy_score(y_test, y_pred)))
        print("Classification Report:\n {} \n".format(classification_report(y_test, y_pred)))
        print("Confusion Matrix:  \n {} \n".format(confusion_matrix(y_test, y_pred)))

In [4]:
from sklearn.ensemble import AdaBoostClassifier

In [5]:
ada_clf = AdaBoostClassifier()

In [6]:
ada_clf.fit(X_train, y_train)

AdaBoostClassifier(algorithm='SAMME.R', base_estimator=None,
          learning_rate=1.0, n_estimators=50, random_state=None)

In [8]:
print_score(ada_clf, X_train, y_train, X_test, y_test, train=True)

Train Result:

accuracy score: 0.8819

Classification Report:
               precision    recall  f1-score   support

           0       0.81      0.86      0.84        44
           1       0.93      0.89      0.91        83

   micro avg       0.88      0.88      0.88       127
   macro avg       0.87      0.88      0.87       127
weighted avg       0.88      0.88      0.88       127
 

Confusion Matrix:  
 [[38  6]
 [ 9 74]] 

Average Accuracy: 	 0.7174
Accuracy SD 		 0.1236


In [9]:
print_score(ada_clf, X_train, y_train, X_test, y_test, train=False)

Test Result:

accuracy score: 0.7273

Classification Report:
               precision    recall  f1-score   support

           0       0.50      0.60      0.55        15
           1       0.84      0.78      0.81        40

   micro avg       0.73      0.73      0.73        55
   macro avg       0.67      0.69      0.68        55
weighted avg       0.75      0.73      0.73        55
 

Confusion Matrix:  
 [[ 9  6]
 [ 9 31]] 



#### AdaBoost with Random Forest

In [10]:
from sklearn.ensemble import RandomForestClassifier

In [16]:
ada_clf = AdaBoostClassifier(RandomForestClassifier())

In [17]:
ada_clf.fit(X_train, y_train)

AdaBoostClassifier(algorithm='SAMME.R',
          base_estimator=RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators='warn', n_jobs=None,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False),
          learning_rate=1.0, n_estimators=50, random_state=None)

In [18]:
print_score(ada_clf, X_train, y_train, X_test, y_test, train=True)

Train Result:

accuracy score: 0.9528

Classification Report:
               precision    recall  f1-score   support

           0       0.93      0.93      0.93        44
           1       0.96      0.96      0.96        83

   micro avg       0.95      0.95      0.95       127
   macro avg       0.95      0.95      0.95       127
weighted avg       0.95      0.95      0.95       127
 

Confusion Matrix:  
 [[41  3]
 [ 3 80]] 

Average Accuracy: 	 0.7650
Accuracy SD 		 0.1295


In [19]:
print_score(ada_clf, X_train, y_train, X_test, y_test, train=False)

Test Result:

accuracy score: 0.7273

Classification Report:
               precision    recall  f1-score   support

           0       0.50      0.60      0.55        15
           1       0.84      0.78      0.81        40

   micro avg       0.73      0.73      0.73        55
   macro avg       0.67      0.69      0.68        55
weighted avg       0.75      0.73      0.73        55
 

Confusion Matrix:  
 [[ 9  6]
 [ 9 31]] 



**Exercise:** Tune the random forest with grid search, then boost it with a larger `n_estimators`.

In [34]:
from sklearn import pipeline
from sklearn.model_selection import GridSearchCV

In [35]:
rf_clf = RandomForestClassifier(random_state=42)

In [52]:
params_grid = {'max_depth': [3, 4, 5, None],
               'min_samples_split': range(2, 10),
               'min_samples_leaf': range(2, 10),
               'criterion': ['gini', 'entropy'],
               'n_estimators': [500]}
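
Before launching the search, it helps to estimate its cost. The sketch below counts the candidates in this grid; the grid is repeated under the hypothetical name `grid` so that the snippet is self-contained, and `cv = 5` matches the value passed to `GridSearchCV` in the next cell.

```python
from math import prod

# Same values as params_grid above, repeated to keep the snippet self-contained.
grid = {'max_depth': [3, 4, 5, None],
        'min_samples_split': range(2, 10),
        'min_samples_leaf': range(2, 10),
        'criterion': ['gini', 'entropy'],
        'n_estimators': [500]}

# One candidate per combination of values: 4 * 8 * 8 * 2 * 1.
n_candidates = prod(len(values) for values in grid.values())
cv = 5
n_fits = n_candidates * cv   # every candidate is fitted once per fold
print(n_candidates, n_fits)  # prints: 512 2560
```

Each of those fits trains a 500-tree forest, which is why the search takes several minutes even on this small dataset.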

In [53]:
grid_search = GridSearchCV(rf_clf, params_grid,
                          n_jobs=-1, cv=5,
                          verbose=1, scoring='accuracy')

In [54]:
grid_search.fit(X_train, y_train)

Fitting 5 folds for each of 512 candidates, totalling 2560 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done  42 tasks      | elapsed:   22.2s
[Parallel(n_jobs=-1)]: Done 192 tasks      | elapsed:   60.0s
[Parallel(n_jobs=-1)]: Done 442 tasks      | elapsed:  2.0min
[Parallel(n_jobs=-1)]: Done 792 tasks      | elapsed:  3.5min
[Parallel(n_jobs=-1)]: Done 1242 tasks      | elapsed:  5.2min
[Parallel(n_jobs=-1)]: Done 1792 tasks      | elapsed:  7.4min
[Parallel(n_jobs=-1)]: Done 2442 tasks      | elapsed: 10.5min
[Parallel(n_jobs=-1)]: Done 2560 out of 2560 | elapsed: 11.0min finished


GridSearchCV(cv=5, error_score='raise-deprecating',
       estimator=RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators='warn', n_jobs=None,
            oob_score=False, random_state=42, verbose=0, warm_start=False),
       fit_params=None, iid='warn', n_jobs=-1,
       param_grid={'max_depth': [3, 4, 5, None], 'min_samples_split': range(2, 10), 'min_samples_leaf': range(2, 10), 'criterion': ['gini', 'entropy'], 'n_estimators': [500]},
       pre_dispatch='2*n_jobs', refit=True, return_train_score='warn',
       scoring='accuracy', verbose=1)

In [55]:
grid_search.best_estimator_.get_params()

{'bootstrap': True,
 'class_weight': None,
 'criterion': 'gini',
 'max_depth': None,
 'max_features': 'auto',
 'max_leaf_nodes': None,
 'min_impurity_decrease': 0.0,
 'min_impurity_split': None,
 'min_samples_leaf': 2,
 'min_samples_split': 2,
 'min_weight_fraction_leaf': 0.0,
 'n_estimators': 500,
 'n_jobs': None,
 'oob_score': False,
 'random_state': 42,
 'verbose': 0,
 'warm_start': False}

In [62]:
rf_clf = RandomForestClassifier(max_depth=None, min_samples_leaf=2,
                                min_samples_split=2, criterion='gini',
                                n_estimators=500)

In [69]:
ada_clf = AdaBoostClassifier(rf_clf, n_estimators=100)

In [70]:
ada_clf.fit(X_train, y_train)

AdaBoostClassifier(algorithm='SAMME.R',
          base_estimator=RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=2, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=500, n_jobs=None,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False),
          learning_rate=1.0, n_estimators=100, random_state=None)

In [71]:
print_score(ada_clf, X_train, y_train, X_test, y_test, train=True)

Train Result:

accuracy score: 0.9449

Classification Report:
               precision    recall  f1-score   support

           0       0.93      0.91      0.92        44
           1       0.95      0.96      0.96        83

   micro avg       0.94      0.94      0.94       127
   macro avg       0.94      0.94      0.94       127
weighted avg       0.94      0.94      0.94       127
 

Confusion Matrix:  
 [[40  4]
 [ 3 80]] 

Average Accuracy: 	 0.7900
Accuracy SD 		 0.1213


In [72]:
print_score(ada_clf, X_train, y_train, X_test, y_test, train=False)

Test Result:

accuracy score: 0.7273

Classification Report:
               precision    recall  f1-score   support

           0       0.50      0.60      0.55        15
           1       0.84      0.78      0.81        40

   micro avg       0.73      0.73      0.73        55
   macro avg       0.67      0.69      0.68        55
weighted avg       0.75      0.73      0.73        55
 

Confusion Matrix:  
 [[ 9  6]
 [ 9 31]] 



***

## Gradient Boosting / Gradient Boosting Machine (GBM)
Works for both regression and classification.

 - Predictors are added sequentially.
 - Each one corrects its predecessor.
 - Each new predictor is fitted to the residual errors of the ensemble so far.
 
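 **Step 1.** Fit a first model $F$ to the target:
 $$Y=F(x)+\epsilon$$
 **Step 2.** Fit a second model $G$ to the residual $\epsilon$:
 $$\epsilon=G(x)+\epsilon_2$$
 Substituting (2) into (1), we get:
 $$Y=F(x)+G(x)+\epsilon_2$$
 **Step 3.** Fit a third model $H$ to the remaining residual $\epsilon_2$:
 $$\epsilon_2=H(x)+\epsilon_3$$
 Now:
 $$Y=F(x)+G(x)+H(x)+\epsilon_3$$
 Finally, by weighting each model's contribution:
 $$Y=\alpha F(x)+\beta G(x)+\gamma H(x)+\epsilon_4$$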
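
The three steps can be sketched with one-split regression stumps standing in for $F$, $G$ and $H$. This is a simplified illustration under stated assumptions: squared-error loss, a one-dimensional input, and a single hypothetical `learning_rate` in place of the separate weights $\alpha$, $\beta$, $\gamma$. The helpers `fit_stump` and `boost` are made up for this example and are not part of any library.

```python
def fit_stump(x, r):
    """Best single-split regression stump on 1-D inputs x for targets r.

    Assumes x holds at least two distinct values.
    """
    best = None
    for threshold in sorted(set(x))[:-1]:
        left = [ri for xi, ri in zip(x, r) if xi <= threshold]
        right = [ri for xi, ri in zip(x, r) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        # Sum of squared errors when each side predicts its own mean.
        sse = (sum((ri - left_mean) ** 2 for ri in left)
               + sum((ri - right_mean) ** 2 for ri in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda xi: left_mean if xi <= threshold else right_mean

def boost(x, y, n_rounds=3, learning_rate=1.0):
    """Fit each stump to the residuals left by the sum of the previous ones."""
    learners = []
    residual = list(y)
    for _ in range(n_rounds):
        h = fit_stump(x, residual)
        learners.append(h)
        # What is still unexplained becomes the next stump's target.
        residual = [ri - learning_rate * h(xi) for xi, ri in zip(x, residual)]
    return lambda xi: sum(learning_rate * h(xi) for h in learners)

x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.2, 3.1, 2.9, 5.0, 5.2]

def mse(model):
    """Mean squared error of model on the toy training data."""
    return sum((model(xi) - yi) ** 2 for xi, yi in zip(x, y)) / len(x)

# Training error after 1, 2 and 3 rounds (F, then F+G, then F+G+H).
errors = [mse(boost(x, y, n_rounds=k)) for k in (1, 2, 3)]
print(errors)
```

On this toy data the training error shrinks with every added stump, mirroring how $\epsilon$, $\epsilon_2$ and $\epsilon_3$ are meant to get smaller at each step.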