# Boosting
___

## Overview

Boosting is an ensemble learning technique in supervised machine learning that combines multiple weak learners, models that perform only slightly better than random guessing, into a single strong learner with higher predictive accuracy. AdaBoost (Adaptive Boosting), introduced by Yoav Freund and Robert Schapire in 1995, is one of the most popular boosting algorithms.

## Underlying Mathematics

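AdaBoost iteratively trains weak classifiers (often decision stumps) and reweights the training samples according to each classifier's errors, so that later classifiers concentrate on the samples that earlier ones misclassified. The formulas below describe the binary case, with labels $y_i \in \{-1, +1\}$ and weak-classifier outputs $k_m(x_i) \in \{-1, +1\}$. The key steps are: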

1. Sample Weighting: Initially, all samples have equal weights:

   $$w_i = \frac{1}{N}$$

   where N is the total number of data points.

2. Error Calculation: For each weak classifier, calculate the weighted error:

   $$\epsilon_m = \frac{\sum_{y_i \neq k_m(x_i)} w_i^{(m)}}{\sum_{i=1}^N w_i^{(m)}}$$

   where $k_m(x_i)$ is the prediction of the m-th classifier for sample $x_i$.

3. Classifier Weight: Determine the importance of each classifier:

   $$\alpha_m = \frac{1}{2} \ln\left(\frac{1-\epsilon_m}{\epsilon_m}\right)$$

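4. Weight Update: Adjust sample weights for the next iteration, increasing the weight of misclassified samples and decreasing the weight of correctly classified ones: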

   $$w_i^{(m+1)} = w_i^{(m)} \exp(-\alpha_m y_i k_m(x_i))$$

5. Final Classifier: The strong classifier is a weighted combination of weak classifiers:

   $$F(x) = \text{sign}\left(\sum_{m=1}^M \alpha_m k_m(x)\right)$$
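
The five steps above can be sketched in plain Python on a toy one-dimensional dataset. This is a minimal illustration rather than a reference implementation: the helper names (`fit_stump`, `stump_predict`, `adaboost_fit`, `adaboost_predict`), the threshold-style stump, and the ten-point dataset are all assumptions made for the example, and labels are assumed to lie in {-1, +1}.

```python
import math


def stump_predict(x, thr, polarity):
    """Weak classifier k_m: predict `polarity` if x <= thr, else -polarity."""
    return polarity if x <= thr else -polarity


def fit_stump(xs, ys, w):
    """Return (weighted_error, thr, polarity) of the best stump under weights w."""
    best = None
    total = sum(w)
    for thr in sorted(set(xs)):
        for polarity in (1, -1):
            # Step 2: weighted error = weight of misclassified samples / total weight
            err = sum(
                wi for wi, x, y in zip(w, xs, ys)
                if stump_predict(x, thr, polarity) != y
            ) / total
            if best is None or err < best[0]:
                best = (err, thr, polarity)
    return best


def adaboost_fit(xs, ys, n_rounds):
    n = len(xs)
    w = [1.0 / n] * n  # Step 1: equal initial weights
    ensemble = []
    for _ in range(n_rounds):
        err, thr, pol = fit_stump(xs, ys, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0); not part of the formulas
        alpha = 0.5 * math.log((1 - err) / err)  # Step 3: classifier weight
        # Step 4: misclassified samples (y * k_m = -1) gain weight, the rest lose weight.
        # No renormalization is needed because the error divides by the total weight.
        w = [
            wi * math.exp(-alpha * y * stump_predict(x, thr, pol))
            for wi, x, y in zip(w, xs, ys)
        ]
        ensemble.append((alpha, thr, pol))
    return ensemble


def adaboost_predict(ensemble, x):
    # Step 5: sign of the alpha-weighted vote
    score = sum(alpha * stump_predict(x, thr, pol) for alpha, thr, pol in ensemble)
    return 1 if score >= 0 else -1


# Toy data (hypothetical): no single threshold separates the two classes
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

ensemble = adaboost_fit(xs, ys, n_rounds=3)
predictions = [adaboost_predict(ensemble, x) for x in xs]
print(predictions == ys)
```

On this toy data the best single stump misclassifies three of the ten points, yet the weighted vote of three stumps classifies all ten correctly, which is the weak-to-strong effect the formulas describe.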

## Benefits

1. High Accuracy: AdaBoost can achieve high prediction accuracy by combining weak learners.
2. Reduced Bias: It effectively reduces bias in machine learning models.
3. Ease of Implementation: AdaBoost is relatively easy to implement and interpret.
4. Versatility: It can be used with various types of weak learners and for both classification and regression tasks.
5. Automatic Feature Selection: With decision stumps as weak learners, each boosting round splits on a single feature, so the ensemble concentrates on the most informative features.
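
The feature-selection effect can be inspected through the fitted ensemble's `feature_importances_` attribute. The sketch below is illustrative only: it assumes that scikit-learn's bundled `sklearn.datasets.load_wine` is an acceptable stand-in for the `em_el.datasets.load_wine` loader used later in this notebook, and it fits on the full dataset purely to rank features.

```python
from sklearn.datasets import load_wine  # assumption: stand-in for em_el.datasets.load_wine
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

data = load_wine()

# Decision stumps: each weak learner splits on exactly one feature
clf = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1),
    n_estimators=50,
    learning_rate=0.5,
    random_state=42,
)
clf.fit(data.data, data.target)

# Importances are averaged over the weak learners, weighted by each learner's vote
importances = clf.feature_importances_
ranked = sorted(zip(data.feature_names, importances), key=lambda p: p[1], reverse=True)

for name, score in ranked[:5]:
    print(f"{name:30s} {score:.3f}")
```

Features that no stump ever splits on receive an importance of zero, which is one way to read "selection" here.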

## Limitations

1. Sensitivity to Noisy Data: AdaBoost is highly sensitive to noisy data and outliers, which can lead to overfitting.
2. Computational Cost: It can be computationally expensive, especially for large datasets or complex base learners.
3. Sequential Nature: The algorithm's sequential nature makes it difficult to parallelize, limiting scalability for large datasets.
4. Potential for Overfitting: Although AdaBoost is often resistant to overfitting in practice, it can still overfit, especially on noisy datasets.
5. Slower Than Some Alternatives: AdaBoost can be slower than other boosting implementations such as XGBoost.
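
One practical way to watch for overfitting is to track accuracy after each boosting round with `staged_score`. The following is a rough sketch under assumptions: it uses scikit-learn's bundled `load_wine` in place of the notebook's `em_el` loader, and the stump settings simply mirror the starting configuration used below.

```python
from sklearn.datasets import load_wine  # assumption: stand-in for em_el.datasets.load_wine
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

clf = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1),
    n_estimators=50,
    learning_rate=0.5,
    random_state=42,
)
clf.fit(X_train, y_train)

# staged_score yields the accuracy of the partial ensemble after each boosting round
train_curve = list(clf.staged_score(X_train, y_train))
test_curve = list(clf.staged_score(X_test, y_test))

for m in range(0, len(test_curve), 10):
    print(f"round {m + 1:3d}: train={train_curve[m]:.3f}  test={test_curve[m]:.3f}")
```

A training curve that keeps rising while the test curve flattens or falls suggests that additional rounds are fitting noise.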


In [1]:
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import accuracy_score, classification_report

from em_el.datasets import load_wine
from em_el.utils import draw_confusion_matrix

In [2]:
wine = load_wine()

In [13]:
X = wine.drop('target', axis=1).to_numpy()
y = wine['target'].to_numpy()

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [6]:
ada_clf = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1), n_estimators=50,
                             algorithm="SAMME", learning_rate=0.5, random_state=42)

ada_clf.fit(X_train, y_train)
ada_y_pred = ada_clf.predict(X_test)

In [7]:
ada_acc = accuracy_score(y_test, ada_y_pred)
ada_clf_rep = classification_report(y_test, ada_y_pred)

print("Ada Accuracy: \n", ada_acc)
print("Ada Classification Report: \n", ada_clf_rep)

Ada Accuracy: 
 0.9166666666666666
Ada Classification Report: 
               precision    recall  f1-score   support

           0       0.93      1.00      0.97        14
           1       0.92      0.86      0.89        14
           2       0.88      0.88      0.88         8

    accuracy                           0.92        36
   macro avg       0.91      0.91      0.91        36
weighted avg       0.92      0.92      0.92        36



In [8]:
# Compare with Single Decision Tree
tree_clf = DecisionTreeClassifier(max_depth=15, random_state=42)
tree_clf.fit(X_train, y_train)
tree_y_pred = tree_clf.predict(X_test)

In [9]:
tree_acc = accuracy_score(y_test, tree_y_pred)
tree_clf_rep = classification_report(y_test, tree_y_pred)

print("Decision Tree Accuracy: \n", tree_acc)
print("Decision Tree Classification Report: \n", tree_clf_rep)

Decision Tree Accuracy: 
 0.9444444444444444
Decision Tree Classification Report: 
               precision    recall  f1-score   support

           0       0.93      0.93      0.93        14
           1       0.93      1.00      0.97        14
           2       1.00      0.88      0.93         8

    accuracy                           0.94        36
   macro avg       0.95      0.93      0.94        36
weighted avg       0.95      0.94      0.94        36



With these starting parameters, AdaBoost (test accuracy 0.917) does not outperform the single decision tree (0.944). Below, we tune the AdaBoost hyperparameters to try to improve performance.

In [10]:
# Hyperparameter Tuning with GridSearch

param_grid = {
    'n_estimators': [50, 100, 200, 300],
    'learning_rate': [0.1, 0.5, 1.0],
    'estimator__max_depth': [1, 3, 5],
    'estimator__min_samples_split': [2, 5, 10],
    'estimator__min_samples_leaf': [1, 5, 10]
}

base_estimator = DecisionTreeClassifier(max_depth=1, random_state=42)

# Initialize the AdaBoost classifier
ada_boost = AdaBoostClassifier(base_estimator, algorithm="SAMME", random_state=42)

# Initialize GridSearchCV
grid_search = GridSearchCV(estimator=ada_boost, param_grid=param_grid, scoring='accuracy', cv=3, n_jobs=-1)

# Fit the grid search to the training data
grid_search.fit(X_train, y_train)

# Extract the best parameters and the best model
best_params = grid_search.best_params_
best_model = grid_search.best_estimator_

print("Best Parameters:", best_params)
print("Best Accuracy:", grid_search.best_score_)

Best Parameters: {'estimator__max_depth': 5, 'estimator__min_samples_leaf': 10, 'estimator__min_samples_split': 2, 'learning_rate': 1.0, 'n_estimators': 50}
Best Accuracy: 0.9858156028368795


In [14]:
# Refit AdaBoost with the best parameters found by the grid search
ada_clf = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=5, min_samples_leaf=10, min_samples_split=2),
    n_estimators=50, algorithm="SAMME", learning_rate=1.0, random_state=42,
)

ada_clf.fit(X_train, y_train)
ada_y_pred = ada_clf.predict(X_test)

ada_acc = accuracy_score(y_test, ada_y_pred)
ada_clf_rep = classification_report(y_test, ada_y_pred)

print("Ada Accuracy: \n", ada_acc)
print("Ada Classification Report: \n", ada_clf_rep)

Ada Accuracy: 
 1.0
Ada Classification Report: 
               precision    recall  f1-score   support

           0       1.00      1.00      1.00        14
           1       1.00      1.00      1.00        14
           2       1.00      1.00      1.00         8

    accuracy                           1.00        36
   macro avg       1.00      1.00      1.00        36
weighted avg       1.00      1.00      1.00        36



With hyperparameter tuning, AdaBoost's test accuracy improves from 0.917 to 1.0 on the 36-sample test set, surpassing the single decision tree (0.944).
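
Because the held-out set contains only 36 samples, a single test score can swing noticeably with the split. As a hedged sanity check, the tuned configuration can be cross-validated. This sketch assumes scikit-learn's bundled `load_wine` in place of the notebook's `em_el` loader, picks five stratified folds arbitrarily, and omits the `algorithm` argument for compatibility across scikit-learn versions.

```python
from sklearn.datasets import load_wine  # assumption: stand-in for em_el.datasets.load_wine
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)

# Same tuned hyperparameters as reported by the grid search above
tuned = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=5, min_samples_leaf=10, min_samples_split=2, random_state=42),
    n_estimators=50,
    learning_rate=1.0,
    random_state=42,
)

# Five stratified folds (an arbitrary choice for this sketch)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(tuned, X, y, cv=cv, scoring="accuracy")

print("Fold accuracies:", scores.round(3))
print(f"Mean accuracy: {scores.mean():.3f} (std {scores.std():.3f})")
```

Since the hyperparameters were selected on part of this same data, these scores are somewhat optimistic; a nested cross-validation would give a cleaner estimate.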