# Ensemble learning

In this small notebook, we are going to take a look at three different ensemble learning techniques:

1. **Bagging:** Bagging is mostly used to *reduce the variance* of a model. A well-known example of bagging is the Random Forest algorithm.

2. **Boosting:** Boosting is mostly used to *reduce the bias* of a model. Examples of boosting algorithms are AdaBoost and Gradient Boosting.

3. **Stacking:** Stacking is mostly used to *increase the prediction accuracy* by combining several different models. To implement stacking you can use scikit-learn's `StackingClassifier` or the separate [mlxtend library](https://rasbt.github.io/mlxtend/), which is not part of scikit-learn.

This notebook is based on [this blog on medium.com](https://medium.com/@saugata.paul1010/ensemble-learning-bagging-boosting-stacking-and-cascading-classifiers-in-machine-learning-9c66cb271674). We have updated the Python code and added a more concise (and clearer) description of the techniques.

# Bagging classifiers

Bagging is a technique that involves creating multiple subsets of the training data through random sampling with replacement (bootstrap sampling). Each subset is used to train a separate base model, typically of the same type (e.g., decision trees). These base models are trained independently and in parallel. During the prediction phase, the outputs of all base models are combined, usually by averaging (for regression) or voting (for classification), to obtain the final prediction.
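
The procedure above can be sketched by hand in a few lines. This is a minimal sketch, assuming decision trees as the base model and majority voting as the combination rule; the helpers `fit_bagging` and `predict_bagging` are hypothetical names for illustration, not part of scikit-learn. `BaggingClassifier` does the same job (with extra options such as feature subsampling).

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def fit_bagging(X, y, n_estimators=25, seed=0):
    """Hypothetical helper: train n_estimators trees, each on its own bootstrap sample."""
    rng = np.random.default_rng(seed)
    n_samples = X.shape[0]
    models = []
    for _ in range(n_estimators):
        # Bootstrap sample: draw n_samples row indices *with* replacement
        idx = rng.integers(0, n_samples, size=n_samples)
        tree = DecisionTreeClassifier(random_state=0)
        models.append(tree.fit(X[idx], y[idx]))
    return models


def predict_bagging(models, X):
    """Hypothetical helper: combine the base models by majority vote."""
    # Shape (n_estimators, n_samples): one row of predicted labels per model
    all_preds = np.stack([m.predict(X) for m in models])
    # For each sample (column), pick the most frequent predicted label
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, all_preds)


X_iris, y_iris = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X_iris, y_iris, test_size=0.3, random_state=42, stratify=y_iris)

models = fit_bagging(X_train, y_train)
y_pred = predict_bagging(models, X_test)
print(f'Hand-rolled bagging accuracy: {np.mean(y_pred == y_test):.3f}')
```

Because each tree is trained independently, the loop in `fit_bagging` could run in parallel; `BaggingClassifier` exposes this through its `n_jobs` parameter.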

The key idea behind bagging is to reduce *variance*, i.e. how much a model's predictions change when it is trained on a different sample of the data. Since each base model is trained on a slightly different subset of the data, the models tend to make different errors. Combining their predictions lets many of these individual errors cancel out, which improves the overall predictive performance, especially for high-variance base models such as deep decision trees.
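
The variance-reduction argument can be made precise under a simplifying assumption (the symbols below are introduced here for illustration): suppose each of the $B$ base models' predictions $\hat f_b(x)$ at a point $x$ has the same variance $\sigma^2$, and every pair of them has correlation $\rho \ge 0$. Then the variance of their average is

$$
\operatorname{Var}\left(\frac{1}{B}\sum_{b=1}^{B}\hat f_b(x)\right)
= \frac{1}{B^2}\Big(B\sigma^2 + B(B-1)\rho\sigma^2\Big)
= \rho\sigma^2 + \frac{1-\rho}{B}\sigma^2 .
$$

Adding more models shrinks only the second term; the first term, $\rho\sigma^2$, remains no matter how large $B$ gets. Bagging therefore helps most when the base models are decorrelated, which is why Random Forests additionally consider only a random subset of the features at each split.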

<img src="../Images/more-bagging.jpeg" style="width:400px;"/>

Have a look at [the documentation of `sklearn.ensemble.BaggingClassifier`](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.BaggingClassifier.html) to see what is actually happening. Also, read the [BaggingClassifier user guide](https://scikit-learn.org/stable/modules/ensemble.html#bagging) for more information.

In [None]:
import numpy as np
from sklearn.datasets import load_iris

data = load_iris()
X, y = data['data'], data['target']
RANDOM_SEED = 42

In [None]:
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier, ExtraTreesClassifier, BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.linear_model import RidgeClassifier, LogisticRegression

from sklearn.model_selection import cross_val_score

clf_array = [ RandomForestClassifier(n_estimators=10, random_state=RANDOM_SEED),
    ExtraTreesClassifier(n_estimators=5, random_state=RANDOM_SEED),
    KNeighborsClassifier(n_neighbors=2),
    SVC(C=10000.0, kernel='rbf', random_state=RANDOM_SEED),
    RidgeClassifier(alpha=0.1, random_state=RANDOM_SEED),
    LogisticRegression(C=20000, penalty='l2', random_state=RANDOM_SEED),
    DecisionTreeClassifier(criterion='gini', max_depth=2, random_state=RANDOM_SEED),
    AdaBoostClassifier(n_estimators=5, learning_rate=0.001, random_state=RANDOM_SEED)]

labels = [clf.__class__.__name__ for clf in clf_array]

normal_accuracy = []
normal_std = []
bagging_accuracy = []
bagging_std = []

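for clf in clf_array:
    # 3-fold cross-validation of the plain classifier
    cv_scores = cross_val_score(clf, X, y, cv=3, n_jobs=-1)
    # Same classifier wrapped in bagging (10 base models by default): each base model
    # sees 40% of the training samples (drawn with replacement) and 3 of the 4 features
    bagging_clf = BaggingClassifier(clf, max_samples=0.4, max_features=3, random_state=RANDOM_SEED)
    bagging_scores = cross_val_score(bagging_clf, X, y, cv=3, n_jobs=-1)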
    
    n_acc = np.round(cv_scores.mean(), 4)
    n_std = np.round(cv_scores.std(), 4)
    
    b_acc = np.round(bagging_scores.mean(), 4)
    b_std = np.round(bagging_scores.std(), 4)
    
    normal_accuracy.append(n_acc)
    normal_std.append(n_std)
    
    bagging_accuracy.append(b_acc)
    bagging_std.append(b_std)
    
    print(f'Accuracy: {n_acc} (+/- {n_std}) [Normal {clf.__class__.__name__}]')
    print(f'Accuracy: {b_acc} (+/- {b_std}) [Bagging {clf.__class__.__name__}]')
    print()

In [None]:
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(20, 10))
n_groups = len(labels)  # one group of bars per classifier
index = np.arange(n_groups)
bar_width = 0.35
opacity = 0.7
error_config = {'ecolor': '0.3'}

# Error bars show one standard deviation of the cross-validation scores
ax.bar(index, normal_accuracy, bar_width, alpha=opacity, color='g',
       yerr=normal_std, error_kw=error_config, label='Normal classifier')
ax.bar(index + bar_width, bagging_accuracy, bar_width, alpha=opacity, color='c',
       yerr=bagging_std, error_kw=error_config, label='Bagging classifier')

ax.set_xlabel('Classifiers')
ax.set_ylabel('Mean CV accuracy (error bars: +/- 1 std)')
ax.set_title('Difference between normal and bagged classifiers')
ax.set_xticks(index + bar_width / 2)
ax.set_xticklabels(labels, rotation=30, ha='right')
ax.legend()
fig.tight_layout()
plt.show()

# Boosting classifiers

Boosting is another ensemble learning technique. It aims to improve the performance of a single model by iteratively training weak models and combining them into a strong model.

The main idea behind boosting is to train a series of weak models sequentially, where each new model tries to correct the mistakes made by the models before it. These weak models, called "weak learners," are typically simple models such as shallow decision trees.
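
To make "each model corrects the mistakes of the previous ones" concrete, here is a minimal sketch of gradient boosting for *regression* with squared-error loss, where the mistakes are simply the residuals. Assumptions: synthetic 1-D data, depth-2 regression trees as weak learners and a fixed learning rate of 0.1. It illustrates the idea only and is not how `GradientBoostingClassifier` is implemented internally (that class works with classification losses).

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data (assumption, for illustration only)
rng = np.random.default_rng(0)
X_reg = rng.uniform(-3, 3, size=(200, 1))
y_reg = np.sin(X_reg[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
n_rounds = 100

# Start from a constant model: the mean minimises the squared error
prediction = np.full_like(y_reg, y_reg.mean())
trees = []
mse_history = []

for _ in range(n_rounds):
    # The "mistakes" of the current ensemble: residuals, which equal the
    # negative gradient of the squared loss 1/2 * (y - F)^2 with respect to F
    residuals = y_reg - prediction
    # Fit a weak learner (a shallow tree) to those residuals
    tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X_reg, residuals)
    # Add a damped version of its correction to the ensemble
    prediction += learning_rate * tree.predict(X_reg)
    trees.append(tree)
    mse_history.append(np.mean((y_reg - prediction) ** 2))

print(f'Training MSE after 1 round:    {mse_history[0]:.4f}')
print(f'Training MSE after {n_rounds} rounds: {mse_history[-1]:.4f}')
```

The small learning rate means each tree only partially corrects the residuals, so many rounds are needed; this shrinkage trades training speed for better generalisation.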

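The key concept in boosting is the emphasis on the examples that the current ensemble gets wrong. AdaBoost does this by increasing the weights of misclassified samples before training the next weak learner, while gradient boosting fits each new learner to the residual errors (more precisely, the negative gradient of the loss) of the current ensemble. By iteratively focusing on difficult examples, boosting can turn many weak learners into one strong model. The most popular boosting algorithms include AdaBoost, Gradient Boosting, and XGBoost.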
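
The sample-reweighting idea behind AdaBoost can be sketched directly. This is a minimal sketch of the classic two-class (discrete) AdaBoost update, assuming labels recoded to {-1, +1}, decision stumps as weak learners and synthetic data from `make_classification`; scikit-learn's `AdaBoostClassifier` implements the multiclass SAMME variant rather than this exact code.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary data (assumption, for illustration only)
X_bin, y01 = make_classification(n_samples=300, n_features=5, random_state=0)
y_pm = np.where(y01 == 1, 1, -1)  # classic AdaBoost uses labels in {-1, +1}

n_rounds = 20
weights = np.full(len(y_pm), 1 / len(y_pm))  # start with uniform sample weights
stumps, alphas = [], []

for _ in range(n_rounds):
    # Weak learner: a depth-1 tree ("decision stump") trained on the *weighted* data
    stump = DecisionTreeClassifier(max_depth=1, random_state=0)
    stump.fit(X_bin, y_pm, sample_weight=weights)
    pred = stump.predict(X_bin)

    # Weighted error rate (weights sum to 1) and the stump's vote weight alpha
    err = np.clip(weights[pred != y_pm].sum(), 1e-10, 1 - 1e-10)
    alpha = 0.5 * np.log((1 - err) / err)

    # Up-weight misclassified samples (y * pred = -1), down-weight correct ones
    weights *= np.exp(-alpha * y_pm * pred)
    weights /= weights.sum()

    stumps.append(stump)
    alphas.append(alpha)


def adaboost_predict(X_new):
    # Strong classifier: sign of the alpha-weighted sum of the stumps' votes
    scores = sum(a * s.predict(X_new) for a, s in zip(alphas, stumps))
    return np.sign(scores)


train_acc = np.mean(adaboost_predict(X_bin) == y_pm)
print(f'Training accuracy of the hand-rolled AdaBoost: {train_acc:.3f}')
```

Note that a stump with a lower weighted error gets a larger `alpha`, so more accurate weak learners have a bigger say in the final vote.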

<img src="../Images/more-boosting.webp" style="width:400px;"/>

Have a look at [the documentation for the `GradientBoostingClassifier`](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html) and [the documentation for `AdaBoostClassifier`](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.AdaBoostClassifier.html) to see what is going on.

In [None]:
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier

ada_boost = AdaBoostClassifier(n_estimators=5, random_state=RANDOM_SEED)
grad_boost = GradientBoostingClassifier(n_estimators=10, random_state=RANDOM_SEED)
boosting_labels = ['Ada Boost', 'Gradient Boost']

for clf, label in zip([ada_boost, grad_boost], boosting_labels):
    scores = cross_val_score(clf, X, y, cv=3, scoring='accuracy')
    # Report mean accuracy and its standard deviation across the 3 folds
    print(f'Accuracy: {scores.mean():.3f} (+/- {scores.std():.3f}) [{label}]')

# Stacking classifiers

Stacking, also known as stacked generalization, is a technique that involves training multiple models in a hierarchical manner. It consists of two or more levels of models. In the first level, also known as the base level, several different models are trained on the training data. Then, their predictions are used as input features to train a meta-model, often called the "blender" or "meta-learner," in the second level.
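
A subtle point is *which* predictions the meta-model is trained on: if the base models predict on the same rows they were fit on, the meta-model sees overly optimistic inputs. A common remedy, which scikit-learn's `StackingClassifier` also uses internally, is to build the meta-model's training set from out-of-fold predictions via `cross_val_predict`. The sketch below does this by hand, assuming two base models and their class probabilities as meta-features.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X_s, y_s = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X_s, y_s, test_size=0.3, random_state=42, stratify=y_s)

# Level 0: the base models (an assumed, arbitrary choice)
base_models = [KNeighborsClassifier(n_neighbors=5),
               DecisionTreeClassifier(max_depth=3, random_state=42)]

# Meta-features for training: out-of-fold class probabilities, so each row's
# features come from a base model that did not see that row during fitting
meta_train = np.hstack([
    cross_val_predict(m, X_tr, y_tr, cv=5, method='predict_proba') for m in base_models
])

# Meta-features for new data: refit each base model on the full training set
meta_test = np.hstack([m.fit(X_tr, y_tr).predict_proba(X_te) for m in base_models])

# Level 1: the meta-learner learns how to combine the base models' outputs
meta_model = LogisticRegression(max_iter=1000).fit(meta_train, y_tr)
stacked_acc = np.mean(meta_model.predict(meta_test) == y_te)
print(f'Hand-rolled stacking accuracy: {stacked_acc:.3f}')
```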

The idea behind stacking is to learn how to combine the predictions of different models using another model. The base models capture different aspects of the data, and the meta-model learns how to weight their predictions to make the final prediction. Stacking allows for more complex relationships and interactions among the models, as the meta-model can learn to make predictions based on the patterns observed in the base models' outputs.
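
The weights the meta-model learns can be inspected directly. This sketch assumes a logistic-regression meta-model and `stack_method='predict_proba'`, so every base model contributes one meta-feature per class; the coefficients show how strongly each base model's probability for each class influences the final decision.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X_w, y_w = load_iris(return_X_y=True)

stack = StackingClassifier(
    estimators=[('knn', KNeighborsClassifier(n_neighbors=5)),
                ('tree', DecisionTreeClassifier(max_depth=3, random_state=42))],
    final_estimator=LogisticRegression(max_iter=1000),
    stack_method='predict_proba',  # meta-features = each base model's class probabilities
    cv=5,
).fit(X_w, y_w)

# Columns follow the base estimators in order: 3 knn probabilities, then 3 tree probabilities
coef = stack.final_estimator_.coef_
print('Meta-model coefficients (classes x meta-features):', coef.shape)
print(np.round(coef, 2))
```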

<img src="../Images/more-stacking.webp" style="width:400px;"/>

Have a look at [the documentation of `StackingClassifier`](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.StackingClassifier.html).

In [None]:
from sklearn.ensemble import StackingClassifier

# Level-0 (base) estimators; StackingClassifier expects (name, estimator) pairs
base_estimators = [
    ('rf', RandomForestClassifier(n_estimators=10, random_state=RANDOM_SEED)),
    ('et', ExtraTreesClassifier(n_estimators=5, random_state=RANDOM_SEED)),
    ('knn', KNeighborsClassifier(n_neighbors=2)),
    ('svc', SVC(C=10_000.0, kernel='rbf', random_state=RANDOM_SEED)),
    ('ridge', RidgeClassifier(alpha=0.1, random_state=RANDOM_SEED)),
    ('lr', LogisticRegression(C=20_000, penalty='l2', random_state=RANDOM_SEED)),
    ('dt', DecisionTreeClassifier(criterion='gini', max_depth=2, random_state=RANDOM_SEED)),
    ('ada', AdaBoostClassifier(n_estimators=100, random_state=RANDOM_SEED)),
]

# Meta classifier (level 1), trained on the base estimators' cross-validated predictions
stacking_clf = StackingClassifier(
    estimators=base_estimators,
    final_estimator=LogisticRegression(random_state=RANDOM_SEED),
)

# Evaluate each base estimator and the stacked ensemble with the same 3-fold CV
models = [(est.__class__.__name__, est) for _, est in base_estimators]
models.append(('StackingClassifier', stacking_clf))

acc_list = []
std_list = []

for label, clf in models:
    cv_scores = cross_val_score(clf, X, y, cv=3, scoring='accuracy')
    print(f'Accuracy: {cv_scores.mean():0.4f} (+/- {cv_scores.std():0.4f}) [{label}]')
    acc_list.append(np.round(cv_scores.mean(), 4))
    std_list.append(np.round(cv_scores.std(), 4))

The main difference between bagging and stacking is the way they combine multiple models. *Bagging* combines the predictions of independent models through averaging or voting, while *stacking* trains a meta-model to learn how to combine the predictions of different base models. Bagging focuses on reducing variance, while stacking aims to leverage the strengths of individual models and learn how to best combine them.