# Lecture 9.2 Ensemble Boosting 

**Boosting** is an ensemble method that trains predictors sequentially, each trying to correct the errors made by its predecessor. In this lecture we consider two well-known boosting methods: **AdaBoost** and **gradient boosting**.

## 9.2.1 AdaBoost

### What is AdaBoost?

AdaBoost, short for **Adaptive Boosting**, is a powerful ensemble learning technique used for classification problems. It works by combining multiple weak learners (simple models that perform slightly better than random guessing) to create a strong learner that can make accurate predictions.

The main idea behind AdaBoost is to focus on the samples that the previous weak learners found difficult to classify. It does this by assigning higher weights to the misclassified samples, so that the next weak learner pays more attention to them during training.

### How Does AdaBoost Work?

AdaBoost works in an iterative manner, where weak learners are added to the ensemble one by one. Here's a step-by-step explanation of the process:

1. **Initialize weights**: Initially, all training samples are assigned equal weights.
2. **Train a weak learner**: A weak learner (e.g., a decision tree stump) is trained on the weighted training data.
3. **Calculate error rate**: The error rate of the weak learner is calculated based on the misclassified samples and their weights.
4. **Adjust weights**: The weights of the misclassified samples are increased, and the weights of the correctly classified samples are decreased. This way, the next weak learner will focus more on the difficult samples.
5. **Add weak learner to ensemble**: The weak learner is added to the ensemble, and its contribution is determined by its error rate. Weak learners with lower error rates contribute more to the final prediction.
6. **Repeat**: Steps 2-5 are repeated until a specified number of weak learners are added or the desired performance is achieved.
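
The steps above can be sketched from scratch in a few lines. This is a minimal illustration of the classic binary ("discrete") AdaBoost formulation with labels in {-1, +1}, not the exact algorithm scikit-learn runs internally. The helper names `fit_adaboost` and `predict_adaboost` and the synthetic data are assumptions made for this example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def fit_adaboost(X, y, n_rounds=20):
    """Discrete AdaBoost for labels in {-1, +1}, using decision stumps."""
    n_samples = X.shape[0]
    weights = np.full(n_samples, 1.0 / n_samples)  # step 1: equal weights
    stumps, alphas = [], []

    for _ in range(n_rounds):
        # Step 2: train a stump on the weighted training data.
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=weights)
        pred = stump.predict(X)

        # Step 3: weighted error rate (the weights always sum to 1).
        error = np.clip(weights[pred != y].sum(), 1e-10, 1 - 1e-10)

        # Step 5: the learner's say in the vote; lower error gives larger alpha.
        alpha = 0.5 * np.log((1.0 - error) / error)

        # Step 4: raise the weights of misclassified samples, lower the rest.
        weights *= np.exp(-alpha * y * pred)
        weights /= weights.sum()

        stumps.append(stump)
        alphas.append(alpha)

    return stumps, np.array(alphas)


def predict_adaboost(stumps, alphas, X):
    """Weighted vote: sign of the alpha-weighted sum of stump predictions."""
    votes = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
    return np.where(votes >= 0, 1, -1)


# Synthetic data (an assumption for illustration, not the lecture's dataset).
X_demo, y01 = make_classification(n_samples=300, n_features=6, random_state=0)
y_demo = 2 * y01 - 1  # map {0, 1} to {-1, +1}

stumps, alphas = fit_adaboost(X_demo, y_demo, n_rounds=20)
train_acc = np.mean(predict_adaboost(stumps, alphas, X_demo) == y_demo)
single_acc = np.mean(stumps[0].predict(X_demo) == y_demo)
print(f"first stump: {single_acc:.3f}, ensemble of 20: {train_acc:.3f}")
```

The error is clipped away from 0 and 1 only to keep the logarithm finite; a production implementation would handle a perfect weak learner more carefully.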

### Final Prediction

Once all the weak learners have been trained and added to the ensemble, the final prediction is made by taking a weighted vote of all the weak learners. Each weak learner's prediction is weighted by its contribution (determined by its error rate), and the class with the highest weighted vote is chosen as the final prediction.
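
For the classic binary formulation with labels $\pm 1$ (an assumption here; the multi-class variants used by scikit-learn differ in detail), the contribution of each weak learner and the weighted vote can be written as:

$$
\alpha_m = \frac{1}{2}\ln\frac{1-\varepsilon_m}{\varepsilon_m},
\qquad
H(x) = \operatorname{sign}\left(\sum_{m=1}^{M} \alpha_m\, h_m(x)\right)
$$

where $\varepsilon_m$ is the weighted error rate of the $m$-th weak learner $h_m$. As a worked example, three learners with error rates $0.1$, $0.3$, and $0.4$ receive weights of roughly $1.099$, $0.424$, and $0.203$. If the first votes $+1$ and the other two vote $-1$, the weighted sum is $1.099 - 0.424 - 0.203 = 0.472 > 0$, so the ensemble predicts $+1$: one accurate learner outvotes two weaker ones.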

### Purpose of AdaBoost

AdaBoost is a powerful technique for several reasons:

1. **Improved Accuracy**: By combining multiple weak learners, AdaBoost can achieve higher accuracy than any single weak learner alone.
2. **Robustness**: In practice AdaBoost often resists overfitting even as more weak learners are added. It is, however, sensitive to noisy labels and outliers, because samples that are repeatedly misclassified keep gaining weight.
3. **Feature Selection**: AdaBoost can be used for feature selection by analyzing the importance of features in the ensemble model.
4. **Flexibility**: AdaBoost can work with various types of weak learners, such as decision trees, neural networks, or even simple rules.
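
To illustrate the feature-selection point, the sketch below reads the `feature_importances_` attribute of a fitted `AdaBoostClassifier`. The synthetic data and the feature names `f0` to `f5` are assumptions for the example; on the diabetes data used later in this lecture, the column names of `X` would play the same role.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

# Synthetic data: with shuffle=False the first two columns are the informative ones.
X_demo, y_demo = make_classification(
    n_samples=500, n_features=6, n_informative=2, n_redundant=0,
    shuffle=False, random_state=0,
)
feature_names = [f"f{i}" for i in range(X_demo.shape[1])]  # hypothetical names

# The default weak learner is a depth-1 decision tree (a stump).
clf = AdaBoostClassifier(n_estimators=50, random_state=0)
clf.fit(X_demo, y_demo)

# Importances are averaged over the weak learners in the ensemble.
importances = clf.feature_importances_
for idx in np.argsort(importances)[::-1]:
    print(f"{feature_names[idx]}: {importances[idx]:.3f}")
```

Features with importance near zero are candidates for removal, although such a ranking is best confirmed on held-out data before dropping anything.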

AdaBoost has been successfully applied in many real-world applications, such as object detection, face recognition, spam filtering, and medical diagnosis.

---

We use the Pima Indians Diabetes dataset as an example. First, load the dataset and split it into training and test sets.

In [13]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report

# Load the dataset
file_path = 'https://raw.githubusercontent.com/npradaschnor/Pima-Indians-Diabetes-Dataset/master/diabetes.csv'
data = pd.read_csv(file_path)

# Split the data into features (X) and target (y)
X = data.drop('Outcome', axis=1)
y = data['Outcome']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


In [10]:

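# Create an AdaBoost Classifier with decision stumps (max_depth=1) as weak learners.
# Note: this call is written for an older scikit-learn release. From version 1.2 on,
# the `base_estimator` parameter is named `estimator`, and newer releases deprecate
# the "SAMME.R" algorithm option.
ada_clf = AdaBoostClassifier(
    base_estimator=DecisionTreeClassifier(max_depth=1, random_state=42),
    n_estimators=10,
    algorithm="SAMME.R",
    learning_rate=0.5
)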

# Train the AdaBoost Classifier
ada_clf.fit(X_train, y_train)

# Make predictions on the test set
ada_y_pred = ada_clf.predict(X_test)

# Print the classification report
print("AdaBoost Classification Report")
print(classification_report(y_test, ada_y_pred), "\n")

AdaBoost Classification Report
              precision    recall  f1-score   support

           0       0.80      0.87      0.83        99
           1       0.72      0.60      0.65        55

    accuracy                           0.77       154
   macro avg       0.76      0.73      0.74       154
weighted avg       0.77      0.77      0.77       154
 



---

## 9.2.2 Gradient Boosting


### What is Gradient Boosting?

Gradient Boosting is another powerful ensemble learning technique used for regression and classification problems. Like AdaBoost, it combines multiple weak learners (usually decision trees) to create a strong predictive model. However, the way it builds the ensemble is different from AdaBoost.

The main idea behind Gradient Boosting is to sequentially train weak learners on the residual errors of the previous learners, gradually improving the model's performance.

### How Does Gradient Boosting Work?

Gradient Boosting works in an iterative manner, where weak learners are added to the ensemble one by one. Here's a step-by-step explanation of the process:

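1. **Initialize model**: Start with a simple model, typically a constant value such as the mean of the targets, as the initial prediction.
2. **Calculate residuals**: Compute the residuals (errors) between the actual target values and the current prediction. For squared-error loss these are exactly the negative gradient of the loss, which is where the method gets its name; other losses use their own negative gradients ("pseudo-residuals").
3. **Fit a weak learner**: Train a weak learner (e.g., a decision tree) to predict the residuals from the original features, rather than to predict the original targets.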
4. **Update the model**: Update the ensemble model by adding the new weak learner with a small learning rate (shrinkage factor) to avoid overfitting.
5. **Repeat**: Steps 2-4 are repeated until a specified number of weak learners are added or the desired performance is achieved.
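
The loop above can be sketched from scratch for the simplest case, regression with squared-error loss, where the residuals are plain differences. This is a minimal illustration under that assumption, not scikit-learn's implementation. The helper names `fit_gradient_boosting` and `predict_gradient_boosting` and the synthetic data are made up for this example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_gradient_boosting(X, y, n_rounds=100, learning_rate=0.1, max_depth=2):
    """Gradient boosting for squared-error regression."""
    init = float(np.mean(y))              # step 1: constant initial model
    prediction = np.full(len(y), init)
    trees = []

    for _ in range(n_rounds):
        residuals = y - prediction        # step 2: errors of the current model
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residuals)            # step 3: fit the residuals
        # Step 4: add the new tree, shrunk by the learning rate.
        prediction += learning_rate * tree.predict(X)
        trees.append(tree)

    return init, learning_rate, trees


def predict_gradient_boosting(model, X):
    """Initial constant plus the shrunken sum of all tree outputs."""
    init, learning_rate, trees = model
    return init + learning_rate * sum(tree.predict(X) for tree in trees)


# Synthetic 1-D regression problem (illustrative only).
rng = np.random.default_rng(0)
X_demo = rng.uniform(-3, 3, size=(200, 1))
y_demo = np.sin(X_demo[:, 0]) + rng.normal(scale=0.1, size=200)

model = fit_gradient_boosting(X_demo, y_demo)
mse_start = np.mean((y_demo - model[0]) ** 2)
mse_end = np.mean((y_demo - predict_gradient_boosting(model, X_demo)) ** 2)
print(f"training MSE: {mse_start:.4f} -> {mse_end:.4f}")
```

Each round only needs the current predictions, so the ensemble is built greedily: earlier trees are never revisited once added.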

### Final Prediction

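Once all the weak learners have been trained and added to the ensemble, the final prediction is made by adding the initial prediction to the sum of the weak learners' predictions, each scaled by the learning rate. For classification, this sum is typically a score (such as the log-odds) that is then converted into a class probability.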
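
In symbols, and assuming a single learning rate $\eta$ shared by all $M$ weak learners $h_m$, the final model is:

$$
F_M(x) = F_0(x) + \eta \sum_{m=1}^{M} h_m(x)
$$

where $F_0$ is the initial prediction. Each $h_m$ is fit to the residuals of the previous model, $r_i = y_i - F_{m-1}(x_i)$. For the squared-error loss $L(y, F) = \frac{1}{2}(y - F)^2$, this residual is exactly the negative gradient:

$$
-\frac{\partial L(y_i, F)}{\partial F}\bigg|_{F = F_{m-1}(x_i)} = y_i - F_{m-1}(x_i) = r_i
$$

so each round takes a small step of size $\eta$ in the direction that reduces the loss.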

### Purpose of Gradient Boosting

Gradient Boosting is a powerful technique for several reasons:

1. **Improved Accuracy**: By combining multiple weak learners, Gradient Boosting can achieve higher accuracy than any single weak learner alone.
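2. **Robustness**: With a suitable loss function (for example, Huber loss for regression), Gradient Boosting can be made relatively robust to outliers and noisy data. With plain squared-error loss, large residuals from outliers can dominate training.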
3. **Feature Selection**: Gradient Boosting can be used for feature selection by analyzing the importance of features in the ensemble model.
4. **Flexibility**: Gradient Boosting can work with various types of weak learners, such as decision trees, neural networks, or even simple rules.

Gradient Boosting has been successfully applied in many real-world applications, such as ranking systems, recommendation engines, and predictive modeling.

### Differences from AdaBoost

While both AdaBoost and Gradient Boosting are ensemble techniques, they differ in their approach:

- AdaBoost focuses on misclassified samples and adjusts their weights, while Gradient Boosting focuses on the residual errors.
- AdaBoost combines weak learners using a weighted voting mechanism, while Gradient Boosting sums the predictions of the weak learners.
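- Both methods accept a learning rate (shrinkage factor) that limits the contribution of each weak learner. In Gradient Boosting it is one of the main controls against overfitting, together with the number and depth of the trees.

In general, Gradient Boosting is considered more powerful and flexible than AdaBoost, since it can optimize any differentiable loss function, but it can overfit if the learning rate, number of trees, and tree depth are not tuned carefully.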
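
One way to check whether a Gradient Boosting configuration overfits is to track held-out accuracy as trees are added. The sketch below uses `staged_predict`, which yields the ensemble's predictions after each boosting stage. The synthetic data and its noise level are assumptions for illustration, and in practice a separate validation set (not the final test set) would be used to choose the number of trees.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic, deliberately noisy data (flip_y randomly reassigns 10% of the labels).
X_demo, y_demo = make_classification(
    n_samples=600, n_features=8, flip_y=0.1, random_state=0
)
X_tr, X_val, y_tr, y_val = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0
)

gb = GradientBoostingClassifier(
    n_estimators=150, learning_rate=0.2, max_depth=2, random_state=0
)
gb.fit(X_tr, y_tr)

# staged_predict yields predictions after 1, 2, ..., n_estimators trees.
val_acc = np.array([accuracy_score(y_val, p) for p in gb.staged_predict(X_val)])
best_n = int(np.argmax(val_acc)) + 1

print(f"best number of trees: {best_n} (accuracy {val_acc[best_n - 1]:.3f})")
print(f"all 150 trees: accuracy {val_acc[-1]:.3f}")
```

If the best accuracy is reached well before the last stage, fewer trees or a smaller learning rate may generalize better.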

Here we train a Gradient Boosting classifier on the same training data:

In [12]:
from sklearn.ensemble import GradientBoostingClassifier

# Create a Gradient Boosting Classifier
gb_clf = GradientBoostingClassifier(
    max_depth=2,
    n_estimators=150,
    learning_rate=0.2,
    random_state=42
)

# Train the Gradient Boosting Classifier
gb_clf.fit(X_train, y_train)

# Make predictions on the test set
gb_y_pred = gb_clf.predict(X_test)

# Print the classification report
print("Gradient Boosting Classification Report")
print(classification_report(y_test, gb_y_pred), "\n")

Gradient Boosting Classification Report
              precision    recall  f1-score   support

           0       0.80      0.77      0.78        99
           1       0.61      0.65      0.63        55

    accuracy                           0.73       154
   macro avg       0.71      0.71      0.71       154
weighted avg       0.73      0.73      0.73       154
 

