# Gradient Boosting (Adaboost)

For this hands-on, we will introduce Gradient Boosting in the form of Adaboost. I assume that readers are already familiar with 1) Python, 2) decision tree models, and 3) the MNIST dataset.

OBJECTIVE:
    * Gradient Boosting (Adaboost) using Scikit-learn.
    * Applying the model to predict a binary digit (0 or 1).

## Gradient Boosting (Adaboost)

Adaboost is often a stronger classifier than the Logistic Regression or the SVM we learnt. Adaboost is highly non-linear, yet in practice it tends not to suffer much from overfitting. For this reason Adaboost is often regarded as an out-of-the-box machine learning method from which you can expect decent performance on a variety of challenging tasks.

The motivation behind the Adaboost algorithm is to "iteratively boost the performance of a model by penalizing misclassified observations more heavily". The keyword here is **iteratively**. Both Logistic Regression and SVM are one-shot classifiers: they fit a single model to predict the target variable $y$ and do nothing further about the misclassification errors $e = y - \hat{y}$.

As shown in the figure, Adaboost works with a set of weak classifiers $\{G_{m}\}_{m=1}^{M}$ (for this tutorial, the classifiers are simple tree models). The classifier at iteration $m$, $G_{m}$, tries to correct the errors made at the previous iteration, i.e. the observations where $y \neq G_{m-1}(x)$.

<p align="center">
    <img width="50%" src="https://lh4.googleusercontent.com/GLcjrWSUkjadvf0hTV_y7ATgh7l-UQQ12_UYluxQYxxWvSKoP5AJN6cvKS5s-uvO_kR3OVBlgL6Q0MATYoueKF59-eIO718Fz9KsVVcObbO54OhfIEkEYlWrn6vA2rr4qXfn2rbsIkMMnHuWEQ
"> 
</p>
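
To make the idea of "correcting the previous model's errors" concrete, here is a minimal toy sketch. The six-point dataset and the reweighting factor of 10 are made up for illustration (the algorithm itself derives the factor from the error rate). It fits one depth-1 tree with uniform weights, then a second one after the misclassified points have been given more weight.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical 1-D toy data: the point x = 5 breaks the otherwise clean pattern
X = np.arange(6).reshape(-1, 1)
y = np.array([-1, -1, -1, 1, 1, -1])

# Round 1: every observation has the same weight
w = np.ones(len(y))
stump1 = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
missed = stump1.predict(X) != y

# Round 2: increase the weight of the misclassified points
# (factor 10 is an arbitrary choice for this illustration)
w[missed] *= 10.0
stump2 = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)

print("missed by stump 1:", X[missed].ravel())
print("stump 2 on those points:", stump2.predict(X[missed]), "truth:", y[missed])
```

On this toy data the second stump should classify the previously missed point correctly, possibly at the expense of other points. That trade-off is why the final prediction combines all the weak classifiers instead of using only the last one.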

### Adaboost Algorithm

* Initialize the observation weights $w_i = 1/n, \; i = 1, \dots, n$.
* For $m = 1$ to $M$ (pre-chosen number of iterations):
    - Fit a classifier $G_m(x)$ to the training data using weights $w_i$.
    - Compute $err_{m} = \frac{ \sum_{i} w_{i} I \{ y_i \neq G_{m}(x_i) \} } {\sum_i w_i}$
    - Compute $\alpha_m = \log_e \left( \frac{1 - err_m}{err_m} \right)$
    - Update weights: $w_{i}^{new} \leftarrow w_{i} \cdot \exp [ \alpha_{m} \cdot I \{ y_i \neq G_m( x_i ) \} ]$

* Final prediction: $G(x) = \text{sign} [ \sum_{m=1}^{M} \alpha_m G_m (x) ]$, with the labels coded as $y_i \in \{-1, +1\}$.
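
The steps above can be sketched directly in NumPy. This is a minimal illustration, not the Scikit-learn implementation: the function names `adaboost_fit` and `adaboost_predict` are hypothetical, the weak classifiers are assumed to be depth-1 trees, the labels are assumed to be coded as -1/+1, and the clipping of the error rate is an extra safeguard that is not part of the listed steps.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def adaboost_fit(X, y, M=20):
    """Fit M decision stumps following the steps above; y must be in {-1, +1}."""
    n = len(y)
    w = np.full(n, 1.0 / n)                       # w_i = 1/n
    stumps, alphas = [], []
    for _ in range(M):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)          # fit G_m using weights w_i
        miss = stump.predict(X) != y              # I{y_i != G_m(x_i)}
        err = np.sum(w * miss) / np.sum(w)        # weighted error err_m
        err = np.clip(err, 1e-10, 1 - 1e-10)      # safeguard against log(0), not in the listed steps
        alpha = np.log((1.0 - err) / err)         # alpha_m
        w = w * np.exp(alpha * miss)              # up-weight the misclassified observations
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, np.array(alphas)


def adaboost_predict(X, stumps, alphas):
    """Weighted vote of all stumps: sign(sum_m alpha_m * G_m(x))."""
    votes = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
    return np.sign(votes)


# Synthetic data standing in for a real dataset
X, y01 = make_classification(n_samples=500, n_features=10, random_state=0)
y = 2 * y01 - 1                                   # recode labels from {0, 1} to {-1, +1}

stumps, alphas = adaboost_fit(X, y, M=20)
pred = adaboost_predict(X, stumps, alphas)
train_acc = np.mean(pred == y)
print("training accuracy:", train_acc)
```

Note that `np.sign` returns 0 when the weighted votes cancel exactly; a production implementation would need a tie-breaking rule for that case.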

### Pytorch Code

We are not using Pytorch for the model this time, because Pytorch is developed mainly for designing deep learning models. Adaboost (especially with tree models) has a very different flavor from the current deep learning standard, which assumes gradient-based optimization. Therefore we will use Scikit-learn, a package that contains a collection of useful machine learning tools. Pytorch and torchvision are used below only to download and preprocess MNIST.

In [2]:
# let's first import the packages we will use

import numpy as np

from sklearn.ensemble import AdaBoostClassifier
from sklearn.datasets import make_classification
from sklearn import metrics

import torch
from torch.utils.data import TensorDataset, DataLoader

from torchvision.datasets import MNIST
from torchvision import transforms

#### preprocess MNIST dataset

In [13]:
# download 
trainD = MNIST(".", download = True, train = True)
testD  = MNIST(".", download = True, train = False)

# to tensor
toTensor = lambda pair: (transforms.ToTensor()(pair[0]), pair[1])
trainD = map(toTensor, trainD)
testD  = map(toTensor, testD)

# subsetting 0 and 1
only01 = lambda pair : pair[1] in [0,1]
trainD = filter(only01, trainD)
testD  = filter(only01, testD)

# normalisation (with mean=0 and std=1 the values are left unchanged)
normalize = lambda pair: (transforms.Normalize(mean=[0], std=[1])(pair[0]), pair[1] )
trainD = map(normalize, trainD)
testD  = map(normalize, testD)

# flatten
flatten = lambda pair: (torch.flatten(pair[0]).detach().numpy(), pair[1] )
trainD = map(flatten, trainD)
testD  = map(flatten, testD)

# unzip the (features, label) pairs and stack them into NumPy arrays;
# wrapping the raw pairs in np.array fails on recent NumPy (ragged shapes)
XTrain, yTrain = map(np.array, zip(*trainD))
XTest,  yTest  = map(np.array, zip(*testD))
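
The chain of `map` and `filter` calls above processes one image at a time. As an alternative, the same three steps (keep digits 0 and 1, scale to [0, 1], flatten) can be sketched with vectorized NumPy operations. The random arrays below are only a stand-in for MNIST so that the snippet runs without a download; with the real data, one would presumably start from the dataset's `data` and `targets` tensors converted to NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for MNIST (assumption, not real data):
# 100 fake 28x28 uint8 images with labels in 0..9
images = rng.integers(0, 256, size=(100, 28, 28), dtype=np.uint8)
labels = rng.integers(0, 10, size=100)

mask = np.isin(labels, [0, 1])          # keep only the digits 0 and 1
X = images[mask].reshape(-1, 28 * 28)   # flatten each image to 784 features
X = X.astype(np.float32) / 255.0        # scale to [0, 1], as ToTensor does for uint8 images
y = labels[mask]

print(X.shape, y.shape)
```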

In [18]:
# Create the Adaboost classifier object
abc = AdaBoostClassifier(n_estimators=20)

# Train the Adaboost classifier
model = abc.fit(XTrain, yTrain)

In [23]:
# Predict the response for the test dataset
yPred = model.predict(XTest)

print("Accuracy:", metrics.accuracy_score(yTest, yPred))
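
Since the key idea of Adaboost is that performance is boosted iteratively, it can be instructive to look at the accuracy after each boosting round and not only at the end. The sketch below uses `staged_score` from Scikit-learn's `AdaBoostClassifier` for this. It runs on synthetic data from `make_classification` (an assumption made so the snippet is self-contained); the same call should work with the fitted MNIST model and `XTest`, `yTest`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Synthetic binary classification data standing in for the 0/1 digits
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

clf = AdaBoostClassifier(n_estimators=20, random_state=0).fit(X_tr, y_tr)

# staged_score yields the test accuracy after each boosting iteration
staged = list(clf.staged_score(X_te, y_te))
for m, acc in enumerate(staged, start=1):
    if m == 1 or m % 5 == 0:
        print(f"after {m:2d} weak classifiers: accuracy = {acc:.3f}")

final_acc = clf.score(X_te, y_te)
```

The last staged value corresponds to the accuracy of the full ensemble, so plotting `staged` against the iteration number gives a quick check of whether `n_estimators=20` is enough.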