# Boosting

## Introduction to Boosting

Boosting algorithms typically work on top of decision trees. The three most common boosting algorithms are adaptive boosting (AdaBoost), gradient boosting, and XGBoost (a variant of gradient boosting).

Boosting is an ensemble method that combines many weak learners (typically decision tree stumps) into a final, stronger model.

### Hold on. How is boosting different from bagging?

Boosting and bagging are the two major classes of ensemble methods. Both combine many base learners, typically decision trees, on the premise that a combination of weak learners can outperform a single strong learner.

Bagging is used when the goal is to reduce the variance (overfitting) of the classifier. In Random Forest, a popular bagging method, we draw several random bootstrap samples of the data (sampled with replacement) and train one decision tree on each sample; in addition, each tree considers only a randomly selected subset of features at every split. This double randomness (data and features) decorrelates the trees and helps reduce overfitting. Note that a single decision tree grown to a large depth is typically overfit (a high-variance model), and ensembling averages out this overfitting.
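The bagging procedure above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Random Forest itself: it omits per-split feature sampling, and `fit_1nn` is a hypothetical stand-in for a deep decision tree (a 1-nearest-neighbour rule on 1-D inputs which, like an unpruned tree, memorizes its training set). The parts that carry over are bootstrap sampling with replacement and the majority vote; `bagging_fit` and `bagging_predict` are also hypothetical names.

```python
import random
from collections import Counter

def fit_1nn(X, y):
    """Hypothetical high-variance base learner: 1-nearest-neighbour on 1-D inputs."""
    data = list(zip(X, y))
    def predict(x):
        # Label of the closest training point (memorizes the training set).
        return min(data, key=lambda p: abs(p[0] - x))[1]
    return predict

def bagging_fit(X, y, fit, n_models=25, seed=0):
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        # Bootstrap sample: n indices drawn with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        # Each model sees only its own sample, so models are independent.
        models.append(fit([X[i] for i in idx], [y[i] for i in idx]))
    return models

def bagging_predict(models, x):
    # Majority vote across the independently trained models.
    votes = Counter(m(x) for m in models)
    return votes.most_common(1)[0][0]
```

Because every model is trained on an independent sample, the loop in `bagging_fit` could be run in parallel.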

In boosting, learners are trained sequentially: early learners fit simple models to the data, and their errors are then analyzed. At every subsequent step a new tree is fit that takes into account, in some way, the errors/residuals of the previous steps (for example, in step 2 the weight of an example misclassified in step 1 is increased, so that the new tree corrects for it).
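One common way to take the residuals into account is to fit each new tree to whatever the current ensemble still gets wrong, which is the idea behind gradient boosting with squared loss. The sketch below assumes 1-D regression inputs and one-split regression stumps; `fit_regression_stump`, `boost_residuals` and the `learning_rate` of 0.3 are illustrative choices, not taken from any library. (AdaBoost, covered below, re-weights samples instead of fitting residuals.)

```python
def fit_regression_stump(X, r):
    """Fit a one-split regression stump to targets r on 1-D inputs X.

    Tries every midpoint between distinct sorted x values and keeps the split
    with the lowest squared error; each side predicts its mean target.
    """
    xs = sorted(set(X))
    best = None
    for a, b in zip(xs, xs[1:]):
        t = (a + b) / 2
        left = [ri for xi, ri in zip(X, r) if xi <= t]
        right = [ri for xi, ri in zip(X, r) if xi > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # all inputs identical: fall back to a constant
        mean = sum(r) / len(r)
        return lambda x: mean
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def boost_residuals(X, y, n_rounds=50, learning_rate=0.3):
    base = sum(y) / len(y)            # step 0: a constant model (the mean)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # Residuals left over by the ensemble built so far.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_regression_stump(X, residuals)
        stumps.append(stump)
        # Add a damped copy of the new stump to the running prediction.
        pred = [pi + learning_rate * stump(xi) for xi, pi in zip(X, pred)]

    def predict(x):
        return base + learning_rate * sum(s(x) for s in stumps)
    return predict
```

Each round needs the predictions accumulated by all earlier rounds, which is why the rounds of boosting cannot be trained in parallel the way bagged models can.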



1) In bagging, each tree is a high-variance model (it overfits), and ensembling reduces this variance. In boosting, each tree is typically a high-bias model (it underfits), and ensembling reduces this bias.
2) Bagging is a parallel process. The N bootstrap samples are drawn independently, so the N models can be trained in parallel, each independent of the others. Boosting is a sequential process: the tree at the second step depends on the model from the first step.


## Comparison of boosting techniques

## AdaBoost



In AdaBoost, at every iteration the observations with greater error (those the previous classifier got wrong) are up-weighted. In addition, each classifier is assigned its own weight based on its performance: more accurate classifiers receive more weight, and vice versa.

Each classifier is typically a decision stump (a tree with a single split, i.e. a highly underfit model).
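A stump used inside AdaBoost has to respect the current sample weights. A minimal sketch, assuming 1-D numeric inputs and labels in {-1, +1}; `fit_stump` and `stump_predict` are hypothetical helper names. It tries a threshold below all points plus every midpoint between neighbouring values, in both polarities, and keeps the split with the lowest weighted error.

```python
def fit_stump(X, y, w):
    """Weighted decision stump on 1-D inputs.

    X: floats, y: labels in {-1, +1}, w: non-negative sample weights.
    Returns (threshold, polarity, weighted_error).
    """
    xs = sorted(set(X))
    # Candidate thresholds: one below every point, then midpoints between neighbours.
    thresholds = [xs[0] - 1] + [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    best = None
    for t in thresholds:
        for polarity in (1, -1):
            # polarity=+1 predicts +1 for x > t and -1 otherwise; -1 flips that.
            err = sum(wi for xi, yi, wi in zip(X, y, w)
                      if (polarity if xi > t else -polarity) != yi)
            if best is None or err < best[2]:
                best = (t, polarity, err)
    return best

def stump_predict(stump, x):
    t, polarity, _ = stump
    return polarity if x > t else -polarity
```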

Mathematically, for any classifier i, the error rate ε_i is given by

![error](boosting_pic_1.PNG "error rate")

Note that the average error ε_i of classifier i is 1 if none of its predictions match the ground truth (GT), and 0 if all of them match.
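Since the equation is only available as an image, the sketch below assumes the standard weighted form ε_i = Σ_j w_j · [C_i(x_j) ≠ y_j] / Σ_j w_j, where w_j are the current sample weights. `weighted_error` is a hypothetical helper name.

```python
def weighted_error(y_true, y_pred, w):
    """Assumed form: total weight of misclassified points / total weight."""
    total = sum(w)
    return sum(wi for yt, yp, wi in zip(y_true, y_pred, w) if yt != yp) / total
```

With uniform weights this reduces to the plain misclassification rate.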

The importance α_i of a given classifier in the final output is given by

![importance](boosting_pic_2.PNG "classifier importance")

α_i is 0 when the classifier classifies half the points correctly and half incorrectly (ε_i = 0.5), i.e. when it is no better than random guessing.

![alpha_plot](boosting_pic_3.PNG "plot")

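The plot above shows α as a function of ε: α is positive and grows without bound as ε approaches 0, is 0 at ε = 0.5, and becomes negative for ε > 0.5 (a worse-than-random classifier gets its vote flipped).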
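Assuming the importance formula in the image is the common form α_i = ½ ln((1 − ε_i) / ε_i) (some texts omit the factor ½), it can be computed as below. The clipping constant `tiny` is an added assumption that keeps α finite when a classifier is perfect (ε_i = 0) or always wrong (ε_i = 1); `classifier_weight` is a hypothetical name.

```python
import math

def classifier_weight(eps, tiny=1e-10):
    """Assumed form: alpha = 0.5 * ln((1 - eps) / eps), with eps clipped to (0, 1)."""
    eps = min(max(eps, tiny), 1 - tiny)
    return 0.5 * math.log((1 - eps) / eps)
```

The formula is antisymmetric around 0.5: a classifier with ε = 0.1 gets the same magnitude of α as one with ε = 0.9, but with the opposite sign.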



In addition, the weight of every sample point is updated: misclassified samples are given higher weights.

The equation for that is

![image.png](attachment:bbf760b9-e989-491f-b799-9dce00b0fd46.png)



Note that for point j, if the prediction of classifier i is correct (C_i(x_j) = y_j), the weight is reduced in the next iteration,

whereas if the prediction of classifier i is incorrect, the weight is increased in the next iteration.

Zj is the normalization factor that ensures all weights sum to 1. It is the sum of all the un-normalized weights, so it is a single constant per iteration, shared by every point.
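Assuming labels y_j ∈ {-1, +1} and the standard exponential update w_j ← w_j · exp(−α_i · y_j · C_i(x_j)) / Z, the product y_j · C_i(x_j) is +1 for a correct prediction (the weight shrinks by a factor e^(−α_i)) and −1 for a mistake (the weight grows by e^(α_i)). A minimal sketch with a hypothetical `update_weights` helper:

```python
import math

def update_weights(w, y_true, y_pred, alpha):
    """Assumed form: w_j * exp(-alpha * y_j * C(x_j)), then divide by Z."""
    new_w = [wj * math.exp(-alpha * yt * yp) for wj, yt, yp in zip(w, y_true, y_pred)]
    z = sum(new_w)  # normalization factor: one constant for the whole iteration
    return [wj / z for wj in new_w]
```

For example, with four equally weighted points and one mistake (ε = 0.25, so α = ½ ln 3), the misclassified point's weight rises from 0.25 to 0.5 and each correct point drops to 1/6: after normalization the mistakes carry half of the total weight.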

The final prediction for each observation is a weighted vote: the predictions of all classifiers are combined, each weighted by its α_i (for labels in {-1, +1}, the ensemble predicts the sign of Σ_i α_i C_i(x)). AdaBoost can still overfit, so the number of trees (boosting rounds) should be monitored and limited.
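Putting the pieces together, a compact AdaBoost sketch might look like the following. It assumes 1-D inputs, labels in {-1, +1}, weighted decision stumps as the weak learners, the ½ ln((1 − ε)/ε) importance and the exponential weight update; all function names are hypothetical. `n_rounds` is the number of trees, the quantity the paragraph above suggests capping.

```python
import math

def fit_stump(X, y, w):
    """Weighted decision stump on 1-D inputs; returns (threshold, polarity, weighted_error)."""
    xs = sorted(set(X))
    thresholds = [xs[0] - 1] + [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    best = None
    for t in thresholds:
        for polarity in (1, -1):
            err = sum(wi for xi, yi, wi in zip(X, y, w)
                      if (polarity if xi > t else -polarity) != yi)
            if best is None or err < best[2]:
                best = (t, polarity, err)
    return best

def stump_predict(stump, x):
    t, polarity, _ = stump
    return polarity if x > t else -polarity

def adaboost_fit(X, y, n_rounds=10):
    n = len(X)
    w = [1.0 / n] * n                  # start from uniform sample weights
    ensemble = []                      # list of (alpha, stump) pairs
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w)
        # Weighted error, clipped so alpha stays finite.
        eps = min(max(stump[2] / sum(w), 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - eps) / eps)   # classifier importance
        ensemble.append((alpha, stump))
        # Exponential update: mistakes are up-weighted, correct points down-weighted.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        z = sum(w)                     # normalization factor for this round
        w = [wi / z for wi in w]
    return ensemble

def adaboost_predict(ensemble, x):
    # Sign of the alpha-weighted vote of all stumps.
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

X = [1, 2, 3, 4, 5, 6]
y = [1, 1, -1, -1, 1, 1]   # +, -, + pattern: no single stump separates it
ensemble = adaboost_fit(X, y, n_rounds=50)
print([adaboost_predict(ensemble, x) for x in X])  # → [1, 1, -1, -1, 1, 1]
```

The toy labels follow a pattern that no single stump can fit, but the weighted vote of several stumps can, which is the weak-to-strong idea behind boosting.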


### Advantages and disadvantages of AdaBoost

Advantages

1) Fewer hyperparameters, so it is easier to tune
2) Less prone to overfitting when stumps are used as the base learners
3) Initially built for binary classification, but it can now also be used for text and image classification

Disadvantages

1) Very sensitive to noisy data and outliers (misclassified outliers keep getting up-weighted)
2) Slower than XGBoost

