# Module 1: Introduction to Scikit-Learn

## Part 16: AdaBoost (Adaptive Boosting)

In this section, we will explore AdaBoost (Adaptive Boosting), a popular ensemble learning algorithm used for classification tasks. AdaBoost combines multiple weak learners to create a strong predictive model.

### 16.1 Understanding AdaBoost

AdaBoost builds its ensemble sequentially from weak learners, models that perform only slightly better than random guessing, most often shallow decision trees (decision stumps). After each round, the algorithm increases the weights of the misclassified training instances so that the next weak learner concentrates on them, which improves the accuracy of the ensemble as a whole.

The idea behind AdaBoost is to train weak learners iteratively on reweighted versions of the full training data rather than on separate subsets. Every instance starts with the same weight. After each round, the weights of the instances that the current learner misclassified are increased and the weights of the correctly classified instances are decreased, effectively "boosting" the importance of the hard cases in the next round of training. The final model is an aggregation of the weak learners' predictions, with each learner's vote weighted by its performance: learners with a lower weighted error get more say.
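The reweighting loop described above can be sketched from scratch. This is a minimal illustration of the classic discrete AdaBoost update for binary labels recoded to -1 and +1, not the code Scikit-Learn runs internally. The helper names `fit_adaboost` and `predict_adaboost`, the 20 rounds, and the clipping constant are assumptions made for this example.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def fit_adaboost(X, y, n_rounds=20):
    """Discrete AdaBoost with decision stumps; y must contain -1 and +1."""
    n_samples = X.shape[0]
    weights = np.full(n_samples, 1.0 / n_samples)  # every instance starts equal
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1, random_state=0)
        stump.fit(X, y, sample_weight=weights)
        pred = stump.predict(X)
        # Weighted error of this stump, clipped to keep the log finite
        err = np.clip(weights[pred != y].sum(), 1e-10, 1 - 1e-10)
        # Vote strength: the lower the weighted error, the larger alpha
        alpha = 0.5 * np.log((1 - err) / err)
        # y * pred is +1 for correct and -1 for wrong predictions, so
        # misclassified instances are scaled up and correct ones down
        weights *= np.exp(-alpha * y * pred)
        weights /= weights.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, np.array(alphas)


def predict_adaboost(stumps, alphas, X):
    """Weighted majority vote of all stumps."""
    votes = np.zeros(X.shape[0])
    for stump, alpha in zip(stumps, alphas):
        votes += alpha * stump.predict(X)
    return np.where(votes >= 0, 1, -1)


X, y = load_breast_cancer(return_X_y=True)
y_signed = np.where(y == 1, 1, -1)  # recode labels 0/1 as -1/+1
X_train, X_test, y_train, y_test = train_test_split(
    X, y_signed, test_size=0.3, random_state=42
)

stumps, alphas = fit_adaboost(X_train, y_train, n_rounds=20)
y_pred = predict_adaboost(stumps, alphas, X_test)
accuracy = np.mean(y_pred == y_test)
print(f"From-scratch AdaBoost accuracy: {accuracy:.2f}")
```

In practice `AdaBoostClassifier` should be preferred; the sketch only makes the weight update and the weighted vote visible.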

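AdaBoost can also provide insight into feature importance. By aggregating the contribution of each feature across the weak learners in the ensemble, we can identify the most influential features in the predictive model. In Scikit-Learn, a fitted model exposes this as the `feature_importances_` attribute when the base learner supports it, as decision trees do.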
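As a small sketch of how feature importances can be read from a fitted ensemble, the snippet below fits an `AdaBoostClassifier` with its default base learner on the Breast Cancer dataset and prints the five highest-ranked features. Fitting on the full dataset, using 50 estimators, and showing the top five are choices made for this illustration only.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier

cancer = load_breast_cancer()

# Default base learner is a depth-1 decision tree (a decision stump)
model = AdaBoostClassifier(n_estimators=50, random_state=0)
model.fit(cancer.data, cancer.target)

# One importance value per input feature, aggregated over the ensemble
importances = model.feature_importances_

# Indices of the five largest importances, highest first
top_features = np.argsort(importances)[::-1][:5]
for idx in top_features:
    print(f"{cancer.feature_names[idx]:<25s} {importances[idx]:.3f}")
```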

### 16.2 Training and Evaluation

To train an AdaBoost model, we need a labeled dataset with the target variable and the corresponding feature values. The model learns by training weak learners one after another on the same training data, updating the instance weights after each round.

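AdaBoost models have hyperparameters that can be tuned to improve performance. These include the number of weak learners (`n_estimators`) and the learning rate (`learning_rate`), which scales down the contribution of each learner. The maximum depth of the trees is a hyperparameter of the base decision tree rather than of AdaBoost itself, but it is often tuned together with the other two.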
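To illustrate how the depth of the base tree interacts with the ensemble, the sketch below compares a few depths using 5-fold cross-validation. The fixed values (50 estimators, learning rate 0.5, depths 1 to 3) are assumptions chosen for the example, not recommended settings. The base tree is passed positionally so the snippet does not depend on the keyword name, which has changed between Scikit-Learn releases.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

scores_by_depth = {}
for depth in (1, 2, 3):
    # The depth belongs to the base tree, not to AdaBoost itself
    base_tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    model = AdaBoostClassifier(
        base_tree, n_estimators=50, learning_rate=0.5, random_state=0
    )
    # Mean accuracy over 5 cross-validation folds
    scores_by_depth[depth] = cross_val_score(model, X, y, cv=5).mean()
    print(f"max_depth={depth}: mean CV accuracy = {scores_by_depth[depth]:.3f}")
```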

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import accuracy_score, classification_report

# Load the dataset and hold out 30% of it for testing
cancer = load_breast_cancer()
X = cancer.data
y = cancer.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Weak learner: a decision stump (a tree with a single split)
base_classifier = DecisionTreeClassifier(max_depth=1)
adaboost_classifier = AdaBoostClassifier(base_classifier)

# Candidate values for the two AdaBoost hyperparameters
param_grid = {
    'n_estimators': [50, 100, 150],
    'learning_rate': [0.1, 0.5, 1.0]
}

# Evaluate every combination with 5-fold cross-validation on the training set
grid_search = GridSearchCV(adaboost_classifier, param_grid, cv=5)
grid_search.fit(X_train, y_train)
print("Best Hyperparameters:")
print(grid_search.best_params_)

# best_estimator_ is the best configuration refit on the whole training set
best_adaboost_classifier = grid_search.best_estimator_
y_pred = best_adaboost_classifier.predict(X_test)

# Score the tuned model on the held-out test set
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
classification_rep = classification_report(y_test, y_pred)
print("Classification Report:\n", classification_rep)
```

In this example, a breast cancer classification task is performed using the AdaBoost ensemble method with Decision Tree classifiers as weak learners. The Breast Cancer dataset is loaded, split into training and testing sets, and a base classifier (Decision Tree with max depth 1) is created. AdaBoost is then applied to boost the performance of this base classifier.

A grid search is conducted to find the best hyperparameters for AdaBoost, specifically the number of weak classifiers (n_estimators) and the learning rate. GridSearchCV is used to systematically evaluate different combinations of hyperparameters using 5-fold cross-validation on the training data.

In the run reported here, the best hyperparameters found were a learning rate of 1.0 and 150 weak classifiers. GridSearchCV refits a classifier with these hyperparameters on the full training set and exposes it as `best_estimator_`.

The model is evaluated on the held-out test data, reaching an accuracy of 98% in that run. The classification report provides additional metrics such as precision, recall, and F1-score for both classes (0 and 1). In Scikit-Learn's encoding of this dataset, class 0 is malignant and class 1 is benign, so the cancer cases correspond to class 0, not class 1. Recall on class 0 is therefore the metric to watch when the goal is to avoid missing malignant tumors.
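Because the classification report labels the classes only as 0 and 1, it can help to check how the dataset encodes them before interpreting per-class metrics. A short sketch using the dataset's `target_names` attribute:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()

# target_names[i] is the human-readable name of class label i
for label, name in enumerate(cancer.target_names):
    count = int(np.sum(cancer.target == label))
    print(f"class {label}: {name} ({count} samples)")
```

Passing `target_names=cancer.target_names` to `classification_report` prints these names in the report instead of the numeric labels.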

Overall, this example shows how AdaBoost can turn a very simple base learner, a Decision Tree with a single split, into an accurate classifier on the Breast Cancer dataset.
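One way to check this claim directly is to score a single decision stump and a boosted ensemble of stumps on the same train/test split. The sketch below assumes 100 estimators and the default learning rate, which are illustrative choices rather than the tuned values from the grid search.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Baseline: one decision stump on its own
stump = DecisionTreeClassifier(max_depth=1, random_state=0)
stump.fit(X_train, y_train)
stump_acc = accuracy_score(y_test, stump.predict(X_test))

# Boosted ensemble built from the same kind of stump
boosted = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1), n_estimators=100, random_state=0
)
boosted.fit(X_train, y_train)
boosted_acc = accuracy_score(y_test, boosted.predict(X_test))

print(f"Single stump accuracy:   {stump_acc:.3f}")
print(f"AdaBoost (100 stumps):   {boosted_acc:.3f}")
```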

### 16.3 Random Forests vs AdaBoost

AdaBoost and Random Forests are both ensemble learning methods used in machine learning, but they are quite different in terms of their underlying principles and how they build and combine individual base models.

Here are the key differences between AdaBoost and Random Forests:

1. Base Learners<br>
AdaBoost (Adaptive Boosting) uses weak learners, typically decision stumps, that are trained sequentially. Each new weak learner focuses on the instances that the previous learners misclassified, because the algorithm raises the weights of those instances, effectively "boosting" their importance in the next round of training.<br> Random Forests, on the other hand, use a collection of decision trees that are trained independently of one another. Each tree is built on a bootstrap sample of the data points and considers only a random subset of the features at each split. The results from all the trees are then aggregated to make predictions.

2. Weighting of Instances<br>
AdaBoost assigns different weights to training instances during each iteration to focus on the samples that were misclassified by previous learners. This adaptive weighting of instances is a key feature of AdaBoost.<br> In Random Forests, all training instances are treated equally during the construction of each tree. There is no specific emphasis on misclassified instances from previous trees.

3. Combining Predictions<br>
AdaBoost combines the predictions from all weak learners by giving more weight to those that perform better. It uses a weighted majority vote to make the final prediction.<br> Random Forests combine the predictions by averaging (for regression) or taking a majority vote (for classification) from all individual trees.

4. Parallelization<br>
AdaBoost is inherently sequential, as each new weak learner depends on the outcomes of previous learners.<br> Random Forests can be parallelized, as the individual trees are constructed independently. This makes Random Forests suitable for parallel and distributed computing.

In summary, while both AdaBoost and Random Forests are ensemble methods that aim to improve predictive performance, they differ in how they create and combine base models. AdaBoost focuses on adapting to the most challenging data points during training, while Random Forests rely on randomization (bootstrap samples and random feature subsets) to create diverse and robust base models that can be trained in parallel. The choice between the two depends on the problem at hand and the characteristics of the data.
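The two methods can be compared side by side with the same cross-validation setup. The sketch below is only an illustration on the Breast Cancer dataset, with 100 estimators for each model as an assumed setting; it is not a general benchmark. Setting `n_jobs=-1` on the Random Forest lets its independent trees be built in parallel, which AdaBoost's sequential training does not allow.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

models = {
    # Sequential: each stump depends on the weights left by the previous one
    "AdaBoost": AdaBoostClassifier(n_estimators=100, random_state=0),
    # Independent trees: n_jobs=-1 builds them on all available cores
    "Random Forest": RandomForestClassifier(
        n_estimators=100, n_jobs=-1, random_state=0
    ),
}

cv_means = {}
for name, model in models.items():
    # Mean accuracy over 5 cross-validation folds
    cv_means[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: mean CV accuracy = {cv_means[name]:.3f}")
```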

### 16.4 Summary

AdaBoost (Adaptive Boosting) is a powerful ensemble learning algorithm for classification tasks. It combines multiple weak learners to create a strong predictive model. Scikit-Learn provides the necessary classes to implement AdaBoost easily. Understanding the concepts, training, and evaluation techniques is crucial for effectively using AdaBoost in practice.

In the next part, we will explore Linear Discriminant Analysis (LDA) and Quadratic Discriminant Analysis (QDA), two classical classification algorithms with linear and quadratic decision boundaries, respectively.

Feel free to practice implementing AdaBoost using Scikit-Learn. Experiment with different hyperparameter settings, evaluation metrics, and techniques to gain a deeper understanding of the algorithm and its performance.