# Boosting

Boosting is an ensemble learning technique that builds a robust machine learning model by compensating for the weaknesses of weak learners. It combines multiple weak learners, which have high bias but low variance, into a strong learner by reducing their errors step by step. Initially, each training sample is assigned an equal weight. A model is then fitted, and the weights of the samples it predicted incorrectly are increased. Each succeeding model tries to compensate for the weaknesses of its preceding model. In this way, the weak rules learned by the preceding models are combined to form a strong model.

For example, suppose the task at hand is to classify whether a given email is SPAM or NOT-SPAM. Say the individual learners have learned the following rules for classifying emails as SPAM:


    Classifier 1: Emails that contain only links are SPAM.

    Classifier 2: Emails that contain words like 'million', '$', 'congratulations', etc. are SPAM.

    Classifier 3: Emails from unknown senders are SPAM.



Individually, these rules are not enough to categorize an email as SPAM or NOT-SPAM. A model that learns a single rule is a weak model, or weak learner. However, together all three models form a strong rule for SPAM classification.

This is exactly what boosting does: it trains a different weak classifier in each iteration and combines them into a strong learner. To convert the weak learners into a strong learner, we can combine their predictions through

a) Averaging

b) Voting
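
As a rough illustration of the voting option, the three spam rules above can be written as tiny rule-based classifiers and combined by an unweighted majority vote. This is only a sketch: the function names, the `KNOWN_SENDERS` set and the sample emails are hypothetical and were made up for the example.

```python
# Hypothetical contact list used by the "unknown sender" rule.
KNOWN_SENDERS = {"alice@example.com"}

def only_links(email):
    """Rule 1: +1 (SPAM) if the body consists solely of links, else -1."""
    words = email["body"].split()
    is_link = lambda w: w.startswith(("http://", "https://"))
    return 1 if words and all(is_link(w) for w in words) else -1

def has_spam_words(email):
    """Rule 2: +1 (SPAM) if the body contains a typical spam token, else -1."""
    body = email["body"].lower()
    return 1 if any(t in body for t in ("million", "$", "congratulations")) else -1

def unknown_sender(email):
    """Rule 3: +1 (SPAM) if the sender is not a known contact, else -1."""
    return 1 if email["sender"] not in KNOWN_SENDERS else -1

RULES = (only_links, has_spam_words, unknown_sender)

def majority_vote(email):
    """Combine the three weak rules by unweighted voting."""
    total = sum(rule(email) for rule in RULES)
    return "SPAM" if total > 0 else "NOT-SPAM"

promo_email = {"sender": "promo@example.org", "body": "Congratulations! You won $1 million"}
friend_email = {"sender": "alice@example.com", "body": "Lunch at noon? https://example.com/menu"}
stranger_email = {"sender": "bob@example.net", "body": "Hello, I found your address on the course page."}

for email in (promo_email, friend_email, stranger_email):
    print(email["sender"], "->", majority_vote(email))
```

The last email triggers only the unknown-sender rule, so the vote keeps it as NOT-SPAM, which is the sense in which a single rule alone is too weak.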


## What are weak learners?

Weak learners are models that predict only slightly better than random guessing. Because of their high bias, they tend to underfit and cannot predict well on their own, whether on training data or on unseen data.

## How is training done?

Boosting creates an ensemble of such weak learners. Initially, the algorithm assigns an equal weight to every data sample. After the first weak learner is trained, higher weights are assigned to the samples it predicted incorrectly, and the reweighted data is fed to the second learner. The output of the second learner is analyzed in the same way, and the samples with incorrect predictions again receive higher weights. The newly weighted inputs are then fed to the third learner, and so on.

Steps:
1. First, an equal weight is assigned to each data sample and the first ML model is trained. The predictions from this model are analyzed.
2. The boosting algorithm assesses the output of the first model and increases the weights of the samples it predicted incorrectly. The data with the new weights is then fed to the second model, which makes predictions again. The output of the second model is analyzed and, as before, new weights are assigned to the samples with incorrect predictions.
3. The input with the new weights is given to the third model, and its predictions are assessed again.
4. This process continues until the error falls below the expected level.

The second model focuses mainly on the shortcomings of the first model. This is how the models improve performance: each one corrects the mistakes made by the preceding one.
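
The steps above can be sketched in plain Python. The example below assumes an AdaBoost-style weight update with one-feature decision stumps as weak learners, a made-up one-dimensional dataset, and a fixed number of rounds instead of an error threshold. All function names are hypothetical, and it is meant as an illustration rather than a reference implementation.

```python
import math

def stump_predict(x, threshold, polarity):
    """Weak learner: predict `polarity` when x < threshold, otherwise -polarity."""
    return polarity if x < threshold else -polarity

def best_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the lowest-error stump."""
    best = None
    for threshold in sorted(set(xs)) + [max(xs) + 1]:
        for polarity in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(x, threshold, polarity) != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best

def boost(xs, ys, n_rounds):
    """Train `n_rounds` stumps, reweighting the samples after each round."""
    n = len(xs)
    weights = [1.0 / n] * n                      # step 1: equal weights
    ensemble = []                                # (alpha, threshold, polarity)
    for _ in range(n_rounds):
        err, threshold, polarity = best_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)    # guard against log(0)
        alpha = math.log((1 - err) / err)        # vote weight of this stump
        ensemble.append((alpha, threshold, polarity))
        # steps 2-3: raise the weights of the misclassified samples only
        weights = [w * math.exp(alpha) if stump_predict(x, threshold, polarity) != y else w
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]   # renormalize to sum to 1
    return ensemble

def predict(ensemble, x):
    """Weighted vote of all stumps."""
    score = sum(alpha * stump_predict(x, t, p) for alpha, t, p in ensemble)
    return 1 if score >= 0 else -1

def training_accuracy(ensemble, xs, ys):
    return sum(predict(ensemble, x) == y for x, y in zip(xs, ys)) / len(xs)

# Toy data that no single threshold can separate.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

print(training_accuracy(boost(xs, ys, 1), xs, ys))  # 0.7 with a single stump
print(training_accuracy(boost(xs, ys, 3), xs, ys))  # 1.0 after three rounds
```

On this toy data a single stump gets 7 of 10 samples right, while three reweighted stumps together classify every training sample correctly.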

## Mathematical Model of Boosting
Consider a classifier C that predicts a label in {1, -1} for any input X. The error rate on the training sample is given as:

\begin{equation}
\overline{err} = \frac{1}{N} \sum_{i=1}^{N} I(y_i \neq C(x_i))
\end{equation}

and the expected error rate on future predictions is $E_{XY} I(Y \neq C(X))$, where $I(\cdot)$ is the indicator function (1 if its argument is true, 0 otherwise).
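
As a small numeric check, the training error rate is simply the fraction of samples whose prediction differs from the label. The sketch below uses made-up labels and predictions; the function name is hypothetical.

```python
def training_error_rate(y_true, y_pred):
    """Average of the indicator I(y_i != C(x_i)) over the N training samples."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(1 for y, p in zip(y_true, y_pred) if y != p) / len(y_true)

y_true = [1, -1, 1, 1, -1]   # made-up labels
y_pred = [1, 1, 1, -1, -1]   # made-up classifier output C(x_i)

print(training_error_rate(y_true, y_pred))  # 2 of 5 wrong -> 0.4
```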

The main purpose of the boosting technique is to sequentially apply the weak classification algorithm to repeatedly reweighted versions of the data, thereby producing a sequence of weak classifiers $C_m(x), m=1, 2,..., M$. The predictions from all of them are combined through a weighted vote to produce the final prediction of the strong classifier:
\begin{equation}
    C(x) = \mathrm{sign}\left(\sum_{m=1}^{M} \alpha_m C_m(x)\right)
\end{equation}

where $\alpha_m$ is computed by the boosting algorithm and weights the contribution of the respective $C_m(x)$.

Initially, all of the sample weights are set to $w_i = 1/N$, where $N$ is the number of training examples.
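
The combination rule and the initial weights can be illustrated with a few lines of Python. The classifier votes and the $\alpha_m$ values below are invented for the example, and the convention that a score of exactly zero maps to +1 is an assumption of this sketch.

```python
def initial_weights(n_samples):
    """Uniform starting weights w_i = 1/N."""
    return [1.0 / n_samples] * n_samples

def sign(value):
    """Return +1 for non-negative scores and -1 otherwise."""
    return 1 if value >= 0 else -1

def strong_classify(weak_predictions, alphas):
    """C(x) = sign(sum_m alpha_m * C_m(x)) for one input x."""
    return sign(sum(a * c for a, c in zip(alphas, weak_predictions)))

votes = [1, -1, -1]          # hypothetical outputs C_1(x), C_2(x), C_3(x)
alphas = [1.5, 0.4, 0.6]     # hypothetical contribution weights

print(initial_weights(4))                  # [0.25, 0.25, 0.25, 0.25]
print(strong_classify(votes, alphas))      # 1: the confident first classifier wins
print(strong_classify(votes, [1, 1, 1]))   # -1: plain majority vote
```

With equal weights the two negative votes win, whereas the weighted vote follows the first classifier because its $\alpha$ outweighs the other two combined.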

## Types of Boosting
Three widely used types of boosting are:
1. [Adaptive Boosting (AdaBoost)](https://colab.research.google.com/drive/1_cEMx22kpmLZ5vE63lxkq5AzMiaLdYyO#scrollTo=EGnqQmf2SK3t:~:text=1.%20adaptive%20boosting%20(adaboost))
2. Gradient Boosting
3. Extreme Gradient Boosting (XGBoost)


### 1. **Ada**ptive **Boost**ing (AdaBoost)
This is the earliest realization of the boosting technique, also called AdaBoost. It adapts to the weaknesses of the previous model and self-corrects in every iteration of the boosting process. The weak learners in AdaBoost are typically decision trees with a single split, which are called decision stumps. As described above, AdaBoost initially assigns the same weight to all the data samples in the dataset. The weights are then adjusted based on the predictions made by each model.

<img src="https://drive.google.com/uc?id=12ruVBqh5LW-Lz-EJq76sw-rEsN4hJAOM" alt="AdaBoost" />

**Notes**:
* Originally, decision trees were used as the weak learner. However, any machine learning algorithm that accepts weights on the training samples can be used.
* It was originally used for classification problems, but it can be used for both classification and regression problems.
* AdaBoost successively builds decision stumps that compensate for the weaknesses of the previous decision stumps. This process continues until it has built the number of stumps we asked for or until the expected fit is obtained.
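
The weight adjustment can be written out explicitly. The following is a sketch of the standard discrete AdaBoost update (often called AdaBoost.M1), expressed with the symbols of the mathematical model above. The symbol $err_m$ for the weighted error of the $m$-th classifier is introduced here for illustration. For each round $m = 1, 2, ..., M$, the weighted error of $C_m$ is

\begin{equation}
err_m = \frac{\sum_{i=1}^{N} w_i \, I(y_i \neq C_m(x_i))}{\sum_{i=1}^{N} w_i}
\end{equation}

its contribution to the final vote is

\begin{equation}
\alpha_m = \log\left(\frac{1 - err_m}{err_m}\right)
\end{equation}

and the sample weights are updated as

\begin{equation}
w_i \leftarrow w_i \exp\left(\alpha_m \, I(y_i \neq C_m(x_i))\right), \quad i = 1, 2, ..., N
\end{equation}

For instance, a stump with $err_m = 0.3$ gets $\alpha_m = \log(0.7/0.3) \approx 0.847$, and the weight of each sample it misclassified is multiplied by $0.7/0.3 \approx 2.33$, while correctly classified samples keep their weight.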

In [None]:
import pandas as pd
# Import Adaptive Boosting library from Scikit Learn
from sklearn.ensemble import AdaBoostClassifier

# Import Decision Tree Classifier
from sklearn.tree import DecisionTreeClassifier

# Import datasets
from sklearn.datasets import load_breast_cancer

# Import training and test data splitter
from sklearn.model_selection import train_test_split, cross_val_score

# Import confusion matrix
from sklearn.metrics import confusion_matrix, accuracy_score

In [None]:
# Load datasets
breast_cancer_ds = load_breast_cancer()

# Transform X dataframe
X = pd.DataFrame(breast_cancer_ds['data'], columns = breast_cancer_ds['feature_names'])

# Build a target dataframe
y = pd.DataFrame(breast_cancer_ds['target'], columns = ['cancer_type'])

# Check the shape of data
X.shape, y.shape

((569, 30), (569, 1))

In [None]:
# Perform Train test split of the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2)

# Create Decision Tree Classifier
dtree_clf = DecisionTreeClassifier()

# Fit the classifier
dtree_clf.fit(X_train, y_train)

# Predict on test samples
y_pred = dtree_clf.predict(X_test)

confusion_matrix(y_test, y_pred)

array([[32,  0],
       [ 7, 75]])

In [None]:
# Print the Accuracy Score
accuracy_score(y_test, y_pred)

0.9385964912280702

In [None]:
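# Create AdaBoost Classifier
# Note: `base_estimator` was renamed to `estimator` in scikit-learn 1.2
# (and removed in 1.4); on newer versions pass `estimator=dtree_clf`.
# Note: dtree_clf is a fully grown tree here, not a decision stump;
# DecisionTreeClassifier(max_depth=1) would give a stump.
adaboost_clf = AdaBoostClassifier(
    n_estimators=1000,
    base_estimator=dtree_clf,
    learning_rate=1
)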

# Fit the AdaBoost Classifier
# (y is flattened to 1-D, which avoids a DataConversionWarning)
adaboost_clf.fit(X_train, y_train.values.ravel())

# Predict on test samples
y_pred = adaboost_clf.predict(X_test)

confusion_matrix(y_test, y_pred)

array([[30,  2],
       [ 9, 73]])

In [None]:
accuracy_score(y_test, y_pred)

0.9035087719298246

### 2. Gradient Boosting


In [None]:
# Import Gradient Boosting Classifier
from sklearn.ensemble import GradientBoostingClassifier

gboosting_clf = GradientBoostingClassifier(
    n_estimators= 100,
    learning_rate = 0.1,
    max_depth = 1,
    random_state = 7
)
# Fit the classifier (y is flattened to 1-D, which avoids a DataConversionWarning)
gboosting_clf.fit(X_train, y_train.values.ravel())

# Predict using the classifier
y_pred = gboosting_clf.predict(X_test)

confusion_matrix(y_test, y_pred)

array([[31,  1],
       [ 1, 81]])

In [None]:
# print accuracy
accuracy_score(y_test, y_pred)

0.9824561403508771