# Introduction
In this notebook we will cover the following topics:
- Bagging
- Boosting

These techniques are closely related and worth understanding because they apply to a wide range of problems. Both are ensemble methods: they are based on the idea that a combination of multiple models can produce a more powerful estimator than a single model.

In [39]:
from sklearn.svm import SVC
from sklearn.ensemble import BaggingClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Ensemble methods
Ensemble models are based on the idea that combining the predictions of several 'weak' models results in a 'strong' model.

An example would be combining several decision trees and taking a majority vote over their predictions (or, for regression, the average prediction). If there are 100 trees and 60 of them predict *yes*, then the final model will predict *yes*.
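The 100-tree example can be sketched in a few lines. This is only an illustration: the `votes` array is made-up data standing in for the predictions of 100 trees, with 1 meaning *yes* and 0 meaning *no*.

```python
import numpy as np

# Hypothetical votes from 100 trees: 60 say "yes" (1), 40 say "no" (0).
votes = np.array([1] * 60 + [0] * 40)

# Majority vote: the class that receives the most votes wins.
counts = np.bincount(votes, minlength=2)   # counts[0] = "no" votes, counts[1] = "yes" votes
majority_class = int(np.argmax(counts))

# With a 0/1 encoding, averaging the votes and thresholding at 0.5
# leads to the same decision.
mean_vote = votes.mean()
averaged_class = int(mean_vote > 0.5)

print(counts, majority_class, mean_vote, averaged_class)
```

Here both the vote count and the thresholded average select class 1 (*yes*).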

Given a complex dataset and a base model (weak learner), we can train the model on the dataset. Often a single model does not perform well because it has either high bias or high variance. The main idea is to reduce the bias or variance of the base models by combining several of them into an ensemble model that achieves better performance.

For a refresher on the bias-variance tradeoff, see the <a href="https://en.wikipedia.org/wiki/Bias%E2%80%93variance_tradeoff" target="_blank">Wikipedia article</a>.

In general, there are three major approaches to combining multiple base models:
- Bagging: learns homogeneous base models independently in parallel and combines them through an averaging process.
- Boosting: learns homogeneous base models sequentially, each one depending on the previous ones, and combines them (typically through a weighted vote or sum).
- Stacking: learns heterogeneous base models in parallel and combines them by training a meta-model to output a prediction.
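Bagging and boosting are demonstrated later in this notebook, but stacking is not, so here is a minimal sketch of it using scikit-learn's `StackingClassifier`. The choice of base models (an SVC and a shallow decision tree), the logistic-regression meta-model, and the dataset size are assumptions made for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, chosen only for this illustration.
X_demo, y_demo = make_classification(n_samples=200, n_features=4,
                                     n_informative=2, n_redundant=2,
                                     random_state=0)
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, random_state=0)

# Heterogeneous base models; a logistic regression acts as the meta-model
# that learns how to combine their outputs.
stack = StackingClassifier(
    estimators=[
        ("svc", SVC(random_state=0)),
        ("tree", DecisionTreeClassifier(max_depth=3, random_state=0)),
    ],
    final_estimator=LogisticRegression(),
)
stack.fit(Xd_train, yd_train)

stack_score = stack.score(Xd_test, yd_test)
print("stacking accuracy:", stack_score)
```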

A visualization of the idea (<a href="https://towardsdatascience.com/ensemble-methods-bagging-boosting-and-stacking-c9214a10a205">src</a>)

![img](../Images/ensemble.png)

# Bagging (Bootstrap Aggregating)
Bagging stands for bootstrap aggregating. It consists of training several models in parallel, each on a sample of the training data drawn with replacement, and aggregating their predictions. Let's get into bootstrapping first.

## Bootstrapping
Given a dataset $N$, generate bootstrap samples of size $B$ by randomly drawing $B$ observations with replacement.

$$N= \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}$$

Given $B = 3$

$$\text{Bootstrap samples} = \begin{bmatrix} 1 & 3 & 8 \\ 7 & 6 & 6 \\ \vdots & \vdots & \vdots \\ 9 & 3 & 2 \end{bmatrix}$$

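Bootstrapping approximates drawing new, independent samples from the true underlying distribution by treating the observed dataset as a stand-in for that distribution.

This is useful for estimating the variability of a quantity computed from a dataset, such as a mean or a model's predictions.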
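As a rough sketch of the procedure, the snippet below draws bootstrap samples of size $B = 3$ from the nine values of the matrix above using NumPy. The number of samples (1000), the seed, and the choice of the sample mean as the quantity of interest are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

data = np.arange(1, 10)   # the observations 1..9 from the matrix above
B = 3                     # size of each bootstrap sample
n_bootstrap = 1000        # number of bootstrap samples (arbitrary choice)

# Each row is one bootstrap sample: B observations drawn with replacement,
# so the same value can appear more than once in a row.
samples = rng.choice(data, size=(n_bootstrap, B), replace=True)

# The spread of a statistic (here the mean) across the bootstrap samples
# gives an estimate of how variable that statistic is.
sample_means = samples.mean(axis=1)
bootstrap_std = sample_means.std()

print(samples[:3])
print("std of the bootstrap means:", bootstrap_std)
```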

## Bagging
A trained model is subject to variability: had it been trained on a different dataset, we would have obtained a different model. Therefore, it is helpful to train several models on different bootstrap samples, which makes them approximately independent, and average their predictions. By averaging the base models, we obtain a final model with less variance.
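To make the mechanism concrete, here is a hand-rolled sketch of bagging: fit one model per bootstrap sample, then take a majority vote. It is not how `BaggingClassifier` is implemented internally; the base model (a decision tree), the number of models (25, odd to avoid ties), and the dataset size are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, chosen only for this illustration.
X_demo, y_demo = make_classification(n_samples=200, n_features=4,
                                     n_informative=2, n_redundant=2,
                                     random_state=0)
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, random_state=0)

rng = np.random.default_rng(0)
n_models = 25
all_preds = []

for _ in range(n_models):
    # Bootstrap sample: draw as many row indices as there are training
    # rows, with replacement.
    idx = rng.integers(0, len(Xd_train), size=len(Xd_train))
    tree = DecisionTreeClassifier(random_state=0)
    tree.fit(Xd_train[idx], yd_train[idx])
    all_preds.append(tree.predict(Xd_test))

all_preds = np.array(all_preds)   # shape: (n_models, n_test_samples)

# Aggregate: labels are 0/1, so a mean above 0.5 is a majority for class 1.
bagged_pred = (all_preds.mean(axis=0) > 0.5).astype(int)
bagged_accuracy = (bagged_pred == yd_test).mean()
print("bagged accuracy:", bagged_accuracy)
```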

In [35]:
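# Synthetic binary classification problem: 100 samples, 4 features
X, y = make_classification(n_samples=100, n_features=4,
                           n_informative=2, n_redundant=2,
                           random_state=0, shuffle=False)
# Bag 10 SVCs, each fitted on a bootstrap sample of the training data
# (the `estimator` argument requires scikit-learn >= 1.2)
clf = BaggingClassifier(estimator=SVC(), n_estimators=10, random_state=0)

# No random_state is set here, so the split and the scores below vary between runs
X_train, X_test, y_train, y_test = train_test_split(X, y)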

In [36]:
clf.fit(X_train, y_train)

In [37]:
pred = clf.predict(X_test)

In [38]:
accuracy_score(y_test, pred)

1.0

# Boosting
With boosting, models are fitted sequentially rather than independently. The resulting model is in general less biased than its base models.

Each new model in the sequence gives more importance to the observations that the previous models handled badly. This is done by adjusting the observation weights after each round.
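The weight adjustment can be illustrated with one round of the classic discrete AdaBoost update for labels in $\{-1, +1\}$. This is a sketch of the textbook formula, not necessarily what scikit-learn's `AdaBoostClassifier` does internally; the labels and predictions below are made up.

```python
import numpy as np

# Hypothetical labels and weak-learner predictions, both in {-1, +1}.
y_true = np.array([1, 1, -1, -1, 1])
y_hat = np.array([1, -1, -1, -1, 1])   # the second observation is misclassified

# Start with uniform observation weights.
w = np.full(len(y_true), 1 / len(y_true))

# Weighted error rate of the weak learner (here 0.2).
err = w[y_hat != y_true].sum()

# Weight of this learner in the final vote: larger when the error is smaller.
alpha = 0.5 * np.log((1 - err) / err)

# Increase the weight of misclassified observations (y_true * y_hat = -1),
# decrease the weight of correctly classified ones, then renormalise.
w_new = w * np.exp(-alpha * y_true * y_hat)
w_new /= w_new.sum()

print(w_new)   # the misclassified observation now carries weight 0.5
```

After the update, the single misclassified observation holds half of the total weight, so the next weak learner is pushed to classify it correctly.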

A visualization of the idea (<a href="https://towardsdatascience.com/ensemble-methods-bagging-boosting-and-stacking-c9214a10a205">src</a>)

![img](../Images/boosting.png)



In [41]:
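# AdaBoost with 100 boosting rounds; the default base estimator is a depth-1 decision tree (a stump)
clf = AdaBoostClassifier(n_estimators=100, random_state=0)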
clf.fit(X_train, y_train)

pred = clf.predict(X_test)
accuracy_score(y_test, pred)

0.96

# Conclusion
Every type of model has its own properties and works best with specific kinds of data. Bagging is often used to reduce variance, which can be very handy for noisy data. Boosting is often used to reduce bias.