# Case Study: AdaBoost, Gradient Boosting, and XGBoost Classification using the Breast Cancer Dataset

In this case study, we will classify tumors in the Breast Cancer Wisconsin dataset (bundled with scikit-learn) as malignant or benign using three boosting algorithms:
1. **AdaBoost**
2. **Gradient Boosting**
3. **XGBoost**

We will go through the following steps:
1. **Importing Libraries**: Required libraries for data preprocessing, model training, and evaluation.
2. **Data Loading and Preprocessing**: Loading the dataset, splitting the data into training and test sets, and standardizing the features.
3. **Model Training**: Training the models using AdaBoost, Gradient Boosting, and XGBoost.
4. **Evaluation**: Making predictions and evaluating the models using confusion matrices and classification reports.

In [None]:
# Step 1: Import Libraries
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
import xgboost as xgb
from sklearn.metrics import classification_report, confusion_matrix

### Step 2: Load and Preprocess the Breast Cancer Dataset

In this step, we will:
- Load the Breast Cancer dataset.
- Extract the features (X) and labels (y).
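- Keep only the first ten features (the "mean" measurements) to work with a smaller feature set.
- Split the dataset into training and test sets.
- Standardize the features using `StandardScaler`, fitted on the training set only and then applied to the test set, so that the training features have zero mean and unit variance. Tree-based boosting models are largely insensitive to feature scale, so this step is optional here.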

In [None]:
# Load dataset
cancer = datasets.load_breast_cancer()
X, y = cancer.data, cancer.target  # Features & labels

# Keep only the first ten features (the "mean" measurements)
X = X[:, :10]

# Split dataset into train & test sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Display dataset shapes (a bare expression is only shown when it is the last line of a cell)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)

# Standardize the data
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
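
As a quick sanity check on the standardization step, the sketch below rebuilds the same split and confirms that the scaled training features have mean close to 0 and standard deviation close to 1. The names `X_train_scaled`, `X_test_scaled`, `train_means`, `train_stds`, and `test_means` are illustrative and not part of the notebook. The test set reuses the training statistics, so its means are only approximately zero.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Rebuild the same split as in the notebook so this snippet runs on its own
X, y = load_breast_cancer(return_X_y=True)
X = X[:, :10]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # statistics come from the training data only
X_test_scaled = scaler.transform(X_test)        # reuses the training mean and std

train_means = X_train_scaled.mean(axis=0)
train_stds = X_train_scaled.std(axis=0)
test_means = X_test_scaled.mean(axis=0)

print("Train means:", np.round(train_means, 3))
print("Train stds: ", np.round(train_stds, 3))
print("Test means: ", np.round(test_means, 3))
```

Fitting the scaler on the training set alone keeps information about the test set out of preprocessing.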

### Step 3: Train the Boosting Models

Now we will train the following models:
1. **AdaBoost**: Using AdaBoostClassifier.
2. **Gradient Boosting**: Using GradientBoostingClassifier.
3. **XGBoost**: Using XGBoost from the `xgboost` library.

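We will train each model on the training data with 50 boosting rounds (`n_estimators=50`) and a fixed `random_state` for reproducibility. Evaluation on the test data follows in Step 4.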

In [None]:
# AdaBoost Model
adaboost_model = AdaBoostClassifier(n_estimators=50, random_state=42)
adaboost_model.fit(X_train, y_train)

# Gradient Boosting Model
gradientboosting_model = GradientBoostingClassifier(n_estimators=50, random_state=42)
gradientboosting_model.fit(X_train, y_train)

# XGBoost Model
xgb_model = xgb.XGBClassifier(n_estimators=50, random_state=42)
xgb_model.fit(X_train, y_train)
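
Because boosting adds one weak learner per round, test accuracy can be tracked round by round. The following is a minimal sketch under the same split and hyperparameters as above, limited to the two scikit-learn models. It relies on `AdaBoostClassifier.staged_score` and `GradientBoostingClassifier.staged_predict`; the names `ada_curve` and `gb_curve` are illustrative and not part of the notebook.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Same data preparation as in Step 2
X, y = load_breast_cancer(return_X_y=True)
X = X[:, :10]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

ada = AdaBoostClassifier(n_estimators=50, random_state=42).fit(X_train, y_train)
gb = GradientBoostingClassifier(n_estimators=50, random_state=42).fit(X_train, y_train)

# Test accuracy after each boosting round
ada_curve = list(ada.staged_score(X_test, y_test))
gb_curve = [accuracy_score(y_test, y_stage) for y_stage in gb.staged_predict(X_test)]

for rounds in (1, 10, 25, 50):
    # AdaBoost may stop before 50 rounds, so clamp the index to the rounds actually run
    ada_acc = ada_curve[min(rounds, len(ada_curve)) - 1]
    gb_acc = gb_curve[min(rounds, len(gb_curve)) - 1]
    print(f"{rounds:>3} rounds  AdaBoost: {ada_acc:.3f}  Gradient Boosting: {gb_acc:.3f}")
```

A curve that flattens early suggests fewer estimators may be enough; one that is still rising suggests trying more.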

### Step 4: Evaluate the Models

We will evaluate the performance of each model using the confusion matrix and classification report.

1. **Confusion Matrix**: Count correct and incorrect predictions per class (rows are actual labels, columns are predicted labels).
2. **Classification Report**: Show detailed metrics such as precision, recall, and F1-score.

In [None]:
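# Function to plot a confusion matrix as an annotated heatmap
def plot_confusion_matrix(cm, title="Confusion Matrix"):
    # load_breast_cancer encodes 0 = malignant and 1 = benign, and
    # confusion_matrix orders rows and columns by sorted label value.
    class_names = ['Malignant', 'Benign']
    plt.figure(figsize=(5, 4))
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
                xticklabels=class_names, yticklabels=class_names)
    plt.xlabel('Predicted')
    plt.ylabel('Actual')
    plt.title(title)
    plt.show()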

# AdaBoost Evaluation
y_pred_adaboost = adaboost_model.predict(X_test)
cm_adaboost = confusion_matrix(y_test, y_pred_adaboost)
print("AdaBoost Classification Report:")
print(classification_report(y_test, y_pred_adaboost))
plot_confusion_matrix(cm_adaboost, title="AdaBoost Confusion Matrix")

# Gradient Boosting Evaluation
y_pred_gradientboosting = gradientboosting_model.predict(X_test)
cm_gradientboosting = confusion_matrix(y_test, y_pred_gradientboosting)
print("Gradient Boosting Classification Report:")
print(classification_report(y_test, y_pred_gradientboosting))
plot_confusion_matrix(cm_gradientboosting, title="Gradient Boosting Confusion Matrix")

# XGBoost Evaluation
y_pred_xgb = xgb_model.predict(X_test)
cm_xgb = confusion_matrix(y_test, y_pred_xgb)
print("XGBoost Classification Report:")
print(classification_report(y_test, y_pred_xgb))
plot_confusion_matrix(cm_xgb, title="XGBoost Confusion Matrix")
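
To compare the models side by side, the headline metrics can be collected into one small table. The sketch below is self-contained and covers only the two scikit-learn models; an `xgb.XGBClassifier` could be added to the `models` dictionary in the same way. The `models` and `summary` names are illustrative. It assumes malignant (label 0 in `load_breast_cancer`) is the class of interest, hence `pos_label=0`.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

cancer = load_breast_cancer()
print("Label encoding:", dict(enumerate(cancer.target_names)))

# Same data preparation as in Step 2
X, y = cancer.data[:, :10], cancer.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

models = {
    "AdaBoost": AdaBoostClassifier(n_estimators=50, random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(n_estimators=50, random_state=42),
}

summary = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    # Assumption: treat malignant (label 0) as the positive class
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_test, y_pred, average="binary", pos_label=0
    )
    summary[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

print(f"{'Model':<20}{'Acc':>8}{'Prec':>8}{'Rec':>8}{'F1':>8}")
for name, m in summary.items():
    print(f"{name:<20}{m['accuracy']:>8.3f}{m['precision']:>8.3f}"
          f"{m['recall']:>8.3f}{m['f1']:>8.3f}")
```

Recall on the malignant class is often the metric to watch in a screening setting, since a missed malignant case is typically costlier than a false alarm.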

### Summary

In this notebook, we demonstrated how to:
1. Load and preprocess the Breast Cancer dataset.
2. Train three boosting models: AdaBoost, Gradient Boosting, and XGBoost.
3. Evaluate the models' performance using confusion matrices and classification reports.

This example shows how three boosting algorithms can be applied to the same binary classification task and compared on the accuracy, precision, recall, and F1-score reported by each classification report.