# **Tutorial: Bagging**
### By Kostas Hatalis

**Prerequisite Notebooks:** *Decision Trees, Ensemble Learning*

___

Bagging, also known as bootstrap aggregation, was introduced in 1996 by Leo Breiman [1]. It is an ensemble method that trains the same algorithm many times, each time on a different subset of the training data sampled with replacement (known as bootstrap sampling). It then aggregates the individual predictions (typically by voting for classification or by averaging for regression) to form a final prediction. Such a meta-estimator is typically used to **reduce the variance** of a base estimator (e.g., a CART or ANN) by introducing randomization into its construction procedure and then making an ensemble out of it.
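The two ingredients, bootstrap sampling and aggregation, can be sketched from scratch in plain Python. This is a minimal illustration, not scikit-learn's implementation: `fit_stump` is a hypothetical one-threshold base learner on 1-D data, used only to keep the example short (in practice bagging is applied to strong models such as deep trees), and the toy data are made up.

```python
import random
from collections import Counter
from statistics import mean


def bootstrap_indices(n, rng):
    """Draw n row indices uniformly *with replacement* (one bootstrap sample)."""
    return [rng.randrange(n) for _ in range(n)]


def fit_stump(X, y):
    """Hypothetical base learner: best single threshold on 1-D inputs with 0/1 labels."""
    best = None
    for t in sorted(set(X)):
        for left, right in ((0, 1), (1, 0)):
            errors = sum((left if x <= t else right) != label for x, label in zip(X, y))
            if best is None or errors < best[0]:
                best = (errors, t, left, right)
    _, t, left, right = best
    return lambda x: left if x <= t else right


def bagging_fit(X, y, fit_base, n_estimators, seed=0):
    """Train one base model per bootstrap sample of (X, y)."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_estimators):
        idx = bootstrap_indices(len(X), rng)
        models.append(fit_base([X[i] for i in idx], [y[i] for i in idx]))
    return models


def aggregate_vote(predictions):
    """Classification: majority vote over the base models' labels."""
    return Counter(predictions).most_common(1)[0][0]


def aggregate_mean(predictions):
    """Regression: average of the base models' numeric predictions."""
    return mean(predictions)


def bagging_predict(models, x, aggregate):
    """Collect every base model's prediction for x and combine them."""
    return aggregate([m(x) for m in models])


# Toy 1-D data with one "noisy" label at x = 5
X = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 0, 1, 0, 1, 1, 1]
models = bagging_fit(X, y, fit_stump, n_estimators=25)
print(bagging_predict(models, 1.5, aggregate_vote))
print(bagging_predict(models, 7.5, aggregate_vote))
print(aggregate_mean([2.0, 3.0, 4.0]))  # → 3.0
```

Each stump sees a slightly different resampled dataset, so the noisy point at `x = 5` affects only some of them, and the vote smooths that out.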

*As they provide a way to reduce overfitting, **bagging methods work best with strong and complex models** (e.g., fully developed decision trees), in contrast with boosting methods, which usually work best with weak models (e.g., shallow decision trees).*

*When bagging with decision trees, we are less concerned about individual trees overfitting the training data. For this reason, and for efficiency, the individual decision trees are grown deep (i.e., few training samples at each leaf node) and are not pruned. These trees have high variance and low bias, which are important characteristics of sub-models whose predictions are combined by bagging. The only parameter to tune when bagging decision trees is the number of bootstrap samples, and hence the number of trees to include. It can be chosen by increasing the number of trees run after run until the accuracy stops improving.*

Bagging methods come in many flavours but mostly differ from each other by the way they draw random subsets of the training set:
- **Bagging**: samples are drawn with replacement.
- **Pasting**: samples are drawn without replacement (requires a large training set).
- **Random Subspaces**: random subsets of the features, rather than of the samples, are drawn.
- **Random Patches**: base estimators are built on random subsets of both samples and features.

**Advantages:**
- Reduces over-fitting of the model.
- Handles high-dimensional data well.
- Maintains accuracy for missing data.

**Disadvantages:**
- Because the final prediction is the average (or majority vote) of the subset trees' predictions, it smooths out the individual trees' outputs and may not give precise values for either classification or regression models.

**Random Forest** is an algorithm (explained in another notebook) that makes a small tweak to Bagging and results in a very powerful prediction method.

The process of bootstrapping and bagging is illustrated below (from [2]):

<img src="images/bagging.PNG" width="800">


```python
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score

# Load data
data = load_breast_cancer()
X, y = data.data, data.target

# Instantiate a classification tree 'dt'
dt = DecisionTreeClassifier(random_state=1)

# Instantiate a BaggingClassifier 'bc' with 300 trees
# (scikit-learn >= 1.2 names this parameter `estimator`; older versions used `base_estimator`)
bc = BaggingClassifier(estimator=dt, n_estimators=300)

# Get accuracy score using 5-fold CV for dt
scores = cross_val_score(dt, X, y, cv=5, scoring='accuracy')
print("Accuracy: %0.2f (+/- %0.2f) [%s]" % (scores.mean(), scores.std(), 'Decision Tree'))

# Get accuracy score using 5-fold CV for bc
scores = cross_val_score(bc, X, y, cv=5, scoring='accuracy')
print("Accuracy: %0.2f (+/- %0.2f) [%s]" % (scores.mean(), scores.std(), 'Bagging'))
```

```
Accuracy: 0.92 (+/- 0.02) [Decision Tree]
Accuracy: 0.96 (+/- 0.03) [Bagging]
```


___
## **Which is better:** Bagging or Boosting?

There is no outright winner; it depends on the data, the simulation, and the circumstances.
Both Bagging and Boosting decrease the variance of a single estimate because they combine several estimates from different models, so the result may be a model with higher stability.
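For the averaging case, a standard identity makes this variance reduction precise. Assume, as an idealization, that the $B$ base models' predictions at a point $x$ are identically distributed, each with variance $\sigma^2$, and have pairwise correlation $\rho$. Then the variance of their average is:

$$
\bar f(x) = \frac{1}{B}\sum_{b=1}^{B} \hat f_b(x), \qquad
\operatorname{Var}\big(\bar f(x)\big) = \frac{1}{B^2}\Big[B\sigma^2 + B(B-1)\rho\sigma^2\Big] = \rho\sigma^2 + \frac{1-\rho}{B}\sigma^2
$$

As $B$ grows, the second term vanishes but the first does not. Adding models therefore helps only until the correlated part $\rho\sigma^2$ dominates, which is consistent with accuracy eventually plateauing as more trees are added.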

If the problem is that the single model performs poorly (high bias), Bagging will rarely reduce that bias. Boosting, however, could generate a combined model with lower error, because each new model concentrates on the examples that the previous models got wrong.

By contrast, if the difficulty with the single model is over-fitting, then Bagging is the better option. Boosting, for its part, does not help to avoid over-fitting; in fact, the technique can suffer from over-fitting itself. For this reason, Bagging is more often the effective choice when over-fitting is the main concern.

___
## **References**

[1] Breiman, Leo. "Bagging predictors." *Machine Learning* 24.2 (1996): 123-140.

[2] https://campus.datacamp.com/courses/machine-learning-with-tree-based-models-in-python/

[3] https://machinelearningmastery.com/bagging-and-random-forest-ensemble-algorithms-for-machine-learning/

[4] https://scikit-learn.org/stable/modules/ensemble.html#bagging-meta-estimator

[5] https://en.wikipedia.org/wiki/Bootstrap_aggregating