# Ensemble Methods: Boosting

## Context
While *Bagging* (as in Random Forest) trains many independent models in parallel and averages their predictions, **Boosting** trains models **sequentially**.

In Boosting, each new model focuses on the mistakes the ensemble has made so far and tries to correct them. This turns a sequence of "weak learners" (such as very shallow decision trees) into a single "strong learner".

In SRE, boosting algorithms (Gradient Boosting and implementations such as XGBoost) are often a strong default for tabular telemetry tasks, such as predicting database locks or SLA breaches, because they capture non-linear interactions between metrics with little feature engineering.

## Objectives
- Generate a challenging SRE dataset: Predicting SLA Breaches based on Latency, Retries, and Error Rates.
- Train an **AdaBoost (Adaptive Boosting)** model.
- Train a **Gradient Boosting** model.
- Compare and contrast the two.

In [None]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.metrics import accuracy_score
import warnings
warnings.filterwarnings('ignore')

### 1. Generating SLA Breach Data
The task is to predict whether an API SLA (Service Level Agreement) will be breached, based on current telemetry.

In [None]:
np.random.seed(42)
n_samples = 1000

X = pd.DataFrame({
    # Clip at 0: latency and error rate cannot be negative
    'P99_Latency_ms': np.clip(np.random.normal(500, 200, n_samples), 0, None),
    'Error_Rate_pct': np.clip(np.random.normal(1.5, 1.0, n_samples), 0, None),
    'Retry_Count': np.random.poisson(lam=5, size=n_samples)
})

# SLA breach (1) if latency is very high, OR if error rate and retries are both elevated
y = ((X['P99_Latency_ms'] > 750) | ((X['Error_Rate_pct'] > 2.5) & (X['Retry_Count'] > 8))).astype(int)

# Flip 50 labels (5% of the samples) at random to simulate label noise
noise_idx = np.random.choice(n_samples, size=50, replace=False)
# .iloc makes the positional indexing explicit
y.iloc[noise_idx] = 1 - y.iloc[noise_idx]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
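
Given the thresholds above, breaches are expected to be the minority class, so raw accuracy can look high even for a model that never predicts a breach. The following is a minimal sketch of a majority-class baseline to compare against. The helper name `majority_baseline_accuracy` and the toy labels are assumptions for illustration, not part of the notebook's data.

```python
import numpy as np

def majority_baseline_accuracy(y_train, y_test):
    """Accuracy obtained by always predicting the most common training label."""
    y_train = np.asarray(y_train)
    y_test = np.asarray(y_test)
    values, counts = np.unique(y_train, return_counts=True)
    majority_label = values[np.argmax(counts)]
    return float(np.mean(y_test == majority_label))

# Toy labels for illustration only (hypothetical, not the notebook's data)
toy_train = np.array([0, 0, 0, 0, 1, 0, 1, 0])
toy_test = np.array([0, 1, 0, 0])

baseline = majority_baseline_accuracy(toy_train, toy_test)
print(f"Majority-class baseline accuracy: {baseline:.2%}")
```

In the notebook, calling the helper with `y_train` and `y_test` would give the accuracy that the boosted models need to beat.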

### 2. AdaBoost (Adaptive Boosting)
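AdaBoost builds its ensemble from "decision stumps": decision trees with only 1 split, e.g., `if Latency > 750: Breach else Normal`. scikit-learn's `AdaBoostClassifier` uses a depth-1 tree as its default base estimator.
After the first stump is trained, the data points it misclassified are given larger weights, so the next stump concentrates on those hard cases. This repeats for each round, and the final prediction is a weighted vote over all stumps, in which more accurate stumps get a larger say.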
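
The re-weighting loop described above can be sketched by hand. This is a minimal illustration of the classic discrete AdaBoost update with labels in {-1, +1}, run on hypothetical toy data (`X_toy`, `y_toy`). It is not the exact code path inside scikit-learn's `AdaBoostClassifier`.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: a diagonal boundary that no single stump can capture
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 2))
y_toy = np.where(X_toy[:, 0] + X_toy[:, 1] > 0, 1, -1)  # labels in {-1, +1}

n_rounds = 10
weights = np.full(len(y_toy), 1 / len(y_toy))  # start with uniform weights
stumps, alphas = [], []

for _ in range(n_rounds):
    # Fit a stump that pays more attention to heavily weighted points
    stump = DecisionTreeClassifier(max_depth=1)
    stump.fit(X_toy, y_toy, sample_weight=weights)
    pred = stump.predict(X_toy)

    # Weighted error of this stump (weights sum to 1)
    err = np.clip(np.sum(weights[pred != y_toy]), 1e-10, 1 - 1e-10)

    # The stump's say in the final vote: lower error gives a larger alpha
    alpha = 0.5 * np.log((1 - err) / err)

    # Increase weights of misclassified points, decrease the rest, renormalize
    weights *= np.exp(-alpha * y_toy * pred)
    weights /= weights.sum()

    stumps.append(stump)
    alphas.append(alpha)

# Final prediction: sign of the alpha-weighted sum of stump votes
score = sum(a * s.predict(X_toy) for a, s in zip(alphas, stumps))
ensemble_pred = np.where(score >= 0, 1, -1)
ensemble_acc = np.mean(ensemble_pred == y_toy)
print(f"Training accuracy of the hand-rolled ensemble: {ensemble_acc:.2%}")
```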

In [None]:
adaboost = AdaBoostClassifier(n_estimators=50, random_state=42)
adaboost.fit(X_train, y_train)

ada_pred = adaboost.predict(X_test)
print("AdaBoost Testing Accuracy: {:.2f}%".format(accuracy_score(y_test, ada_pred)*100))

### 3. Gradient Boosting
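Gradient Boosting also builds trees sequentially. However, instead of re-weighting misclassified points, each new tree is fit to the **residual error** of the ensemble built so far (for squared-error loss, the difference between the actual value and the current prediction). More generally, each tree is fit to the negative gradient of the loss function, which makes the procedure a form of gradient descent on the model's predictions.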

Because it works with any differentiable loss function, it is generally more flexible than AdaBoost.
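
To make the residual-fitting idea concrete, here is a minimal sketch of gradient boosting for regression with squared-error loss, where the negative gradient is exactly the residual. The toy data and variable names are assumptions for illustration. The classifier used in this notebook follows the same scheme but with a classification loss.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical toy regression data: a noisy sine wave
rng = np.random.default_rng(0)
X_toy = rng.uniform(0, 10, size=(300, 1))
y_toy = np.sin(X_toy[:, 0]) + rng.normal(scale=0.1, size=300)

learning_rate = 0.1
n_trees = 50

# Start from a constant prediction (the mean of the targets)
prediction = np.full_like(y_toy, y_toy.mean())
initial_mse = np.mean((y_toy - prediction) ** 2)

trees = []
for _ in range(n_trees):
    # Residuals are the negative gradient of 0.5 * (y - F)^2 with respect to F
    residuals = y_toy - prediction

    # Fit a shallow tree to the residuals of the current ensemble
    tree = DecisionTreeRegressor(max_depth=2, random_state=0)
    tree.fit(X_toy, residuals)

    # Take a small step in the direction the tree suggests
    prediction += learning_rate * tree.predict(X_toy)
    trees.append(tree)

final_mse = np.mean((y_toy - prediction) ** 2)
print(f"Training MSE before boosting: {initial_mse:.4f}")
print(f"Training MSE after {n_trees} trees: {final_mse:.4f}")
```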

In [None]:
# learning_rate shrinks the contribution of each tree: smaller values correct
# errors more gradually and usually need more trees (n_estimators) to compensate.
gb_model = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42)
gb_model.fit(X_train, y_train)

gb_pred = gb_model.predict(X_test)
print("Gradient Boosting Testing Accuracy: {:.2f}%".format(accuracy_score(y_test, gb_pred)*100))
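
A single accuracy number hides how the model improves as trees are added. As a rough sketch, `staged_predict` yields the ensemble's predictions after each boosting stage, which shows where test accuracy levels off. The synthetic data from `make_classification` and the names `X_demo`, `y_demo`, `X_tr`, `X_te` are stand-ins chosen so the example runs on its own.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption: not the SLA dataset from this notebook)
X_demo, y_demo = make_classification(
    n_samples=500, n_features=5, n_informative=3, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0
)

model = GradientBoostingClassifier(
    n_estimators=100, learning_rate=0.1, max_depth=3, random_state=0
)
model.fit(X_tr, y_tr)

# Test accuracy after 1 tree, 2 trees, ..., 100 trees
staged_acc = [accuracy_score(y_te, pred) for pred in model.staged_predict(X_te)]

best_n = int(np.argmax(staged_acc)) + 1  # stages are 1-indexed
print(f"Best test accuracy {max(staged_acc):.2%} reached with {best_n} trees")
```

The same loop should work with `gb_model`, `X_test`, and `y_test` from the cell above.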

### Summary Comparison

- **AdaBoost:**
  - Focuses on misclassified samples by increasing their weight.
  - Simpler, but sensitive to noisy labels and outliers: their weights keep growing, so later stumps can fixate on points that cannot be fixed.
- **Gradient Boosting:**
  - Minimizes a loss function by fitting each new tree to its negative gradient (the residuals, for squared error).
  - Highly flexible, captures complex relationships in SRE telemetry, and is often among the top performers on tabular data.
  - Prone to overfitting if `n_estimators` is too high for the chosen `learning_rate`; the two should be tuned together.
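
One way to limit the overfitting risk noted above is early stopping, which scikit-learn's `GradientBoostingClassifier` supports through `validation_fraction` and `n_iter_no_change`. The sketch below uses synthetic stand-in data, and the specific values (500 trees, patience of 10) are illustrative assumptions rather than tuned recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic stand-in data (assumption: not the SLA dataset from this notebook)
X_demo, y_demo = make_classification(
    n_samples=500, n_features=5, n_informative=3, random_state=0
)

early_stop_model = GradientBoostingClassifier(
    n_estimators=500,          # upper bound on the number of trees
    learning_rate=0.1,
    max_depth=3,
    validation_fraction=0.2,   # hold out 20% of the training data internally
    n_iter_no_change=10,       # stop after 10 stages without validation improvement
    random_state=0,
)
early_stop_model.fit(X_demo, y_demo)

# n_estimators_ reports how many trees were actually kept
print(f"Trees used: {early_stop_model.n_estimators_} of 500 allowed")
```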