# Ensemble Methods: Bagging (Bootstrap Aggregating)

## Context
In observability, models can easily overfit to noise. For instance, a single decision tree trained to predict whether a server will run out of memory might perfectly memorize past traffic spikes yet generalize poorly to new traffic in production.

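**Bagging (Bootstrap Aggregating)** reduces this overfitting by:
1. Drawing multiple bootstrap samples of the training data: random samples of the rows, drawn with replacement, each the same size as the original set (Bootstrapping).
2. Training a high-variance base model (typically a deep, unpruned Decision Tree) on each sample.
3. Aggregating their predictions (majority voting or averaged class probabilities for classification, averaging for regression).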
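The three steps above can be sketched from scratch in a few lines. This is a minimal illustration, not how scikit-learn implements bagging internally: it assumes a binary 0/1 target, uses toy data from `make_classification`, and the helper names `fit_bagged_trees` and `predict_majority` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

def fit_bagged_trees(X, y, n_estimators=25, seed=0):
    """Steps 1 and 2: fit one unpruned tree per bootstrap sample."""
    rng = np.random.default_rng(seed)
    n = len(X)
    trees = []
    for _ in range(n_estimators):
        idx = rng.integers(0, n, size=n)  # bootstrap: n row indices drawn with replacement
        tree = DecisionTreeClassifier(random_state=0)
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees

def predict_majority(trees, X):
    """Step 3: hard majority vote over 0/1 labels (an odd tree count avoids ties)."""
    votes = np.stack([t.predict(X) for t in trees])  # shape (n_estimators, n_samples)
    return (votes.mean(axis=0) > 0.5).astype(int)

# Hypothetical toy data: 3 features, 2 of them informative
X, y = make_classification(n_samples=300, n_features=3, n_informative=2,
                           n_redundant=0, random_state=0)
trees = fit_bagged_trees(X, y)
preds = predict_majority(trees, X)
print(preds[:10])
```

Because each tree sees a different bootstrap sample, the trees make partly different mistakes, and the vote tends to cancel out the ones they do not share.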

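The best-known bagging algorithm is the **Random Forest**, which adds one extra source of randomness: each split inside each tree considers only a random subset of the features.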

## Objectives
- Synthesize a dataset for predicting if a server will trigger an OOM (Out of Memory) alert.
- Train a standard `DecisionTreeClassifier` and observe its variance/overfitting.
- Train a `RandomForestClassifier` (a Bagging ensemble) and compare its stability.

In [None]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, BaggingClassifier
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')

### 1. Generating Infrastructure Telemetry
We will predict an OOM alert label `y` (0 = Healthy, 1 = Out of Memory) from `CPU_Usage`, `Memory_Usage`, and `Concurrent_Connections`.

In [None]:
np.random.seed(42)
n_samples = 500

X = pd.DataFrame({
    'CPU_Usage': np.random.normal(60, 20, n_samples),
    'Memory_Usage': np.random.normal(70, 15, n_samples),
    'Concurrent_Connections': np.random.normal(1500, 500, n_samples)
})

# Ground-truth rule: OOM when Memory > 85 AND Connections > 1800
y = ((X['Memory_Usage'] > 85) & (X['Concurrent_Connections'] > 1800)).astype(int)

# Flip 30 randomly chosen labels (6% of rows) to add label noise, so it isn't a perfect rule
flip_indices = np.random.choice(n_samples, size=30, replace=False)
y.iloc[flip_indices] = 1 - y.iloc[flip_indices]  # .iloc indexes by position, not by label

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
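Before training anything, it helps to check the class balance. By construction the OOM rule is rare: `Memory_Usage > 85` is one standard deviation above its mean (about 16% of rows) and `Concurrent_Connections > 1800` is 0.6 standard deviations above its mean (about 27%), so only around 4% of rows satisfy both before noise is added. Most of the 30 flipped labels therefore turn healthy rows into OOM rows, and accuracy should be read against a model that always predicts "Healthy". The sketch below rebuilds the same data so it runs on its own and uses scikit-learn's `DummyClassifier` as that baseline.

```python
import numpy as np
import pandas as pd
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Rebuild the same synthetic telemetry as above so this cell is self-contained
np.random.seed(42)
n_samples = 500
X = pd.DataFrame({
    'CPU_Usage': np.random.normal(60, 20, n_samples),
    'Memory_Usage': np.random.normal(70, 15, n_samples),
    'Concurrent_Connections': np.random.normal(1500, 500, n_samples)
})
y = ((X['Memory_Usage'] > 85) & (X['Concurrent_Connections'] > 1800)).astype(int)
flip_indices = np.random.choice(n_samples, size=30, replace=False)
y.iloc[flip_indices] = 1 - y.iloc[flip_indices]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Share of Healthy (0) vs OOM (1) rows
print(y.value_counts(normalize=True))

# Baseline: always predict the most frequent class seen in training
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
baseline_acc = accuracy_score(y_test, baseline.predict(X_test))
print("Majority-class baseline - Testing Accuracy: {:.2f}%".format(baseline_acc * 100))
```

If the tree and forest accuracies in the next cells sit only a few points above this baseline, most of their accuracy comes from predicting the majority class; per-class metrics such as recall on the OOM class say more about rare alerts.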

### 2. The Problem with a Single Decision Tree
A single decision tree with no depth limit keeps splitting until every leaf is pure, so it memorizes the training data, including the flipped (noisy) labels. This is overfitting: expect near-perfect training accuracy and a noticeably lower testing accuracy.

In [None]:
single_tree = DecisionTreeClassifier(random_state=42)
single_tree.fit(X_train, y_train)

print("Single Tree - Training Accuracy: {:.2f}%".format(accuracy_score(y_train, single_tree.predict(X_train))*100))
print("Single Tree - Testing Accuracy: {:.2f}%".format(accuracy_score(y_test, single_tree.predict(X_test))*100))
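The "variance" named in the objectives can be observed directly: train the same kind of model on two different samples of data and count how often the two resulting models disagree on the same test rows. This is a minimal sketch on a hypothetical `make_classification` dataset with `flip_y=0.1` label noise (not the telemetry above); the `disagreement` helper is illustrative, not a library function.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Toy noisy data: flip_y=0.1 assigns a random class to about 10% of samples
X, y = make_classification(n_samples=1000, n_features=3, n_informative=2,
                           n_redundant=0, flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Two disjoint halves of the (already shuffled) training rows act as two alternative training sets
half = len(X_train) // 2
halves = [(X_train[:half], y_train[:half]), (X_train[half:], y_train[half:])]

def disagreement(make_model):
    """Fraction of test rows where models fit on the two halves predict different labels."""
    preds = [make_model().fit(X_h, y_h).predict(X_test) for X_h, y_h in halves]
    return float(np.mean(preds[0] != preds[1]))

tree_disagreement = disagreement(lambda: DecisionTreeClassifier(random_state=0))
forest_disagreement = disagreement(lambda: RandomForestClassifier(n_estimators=100, random_state=0))
print("Single trees disagree on {:.1%} of test rows".format(tree_disagreement))
print("Random forests disagree on {:.1%} of test rows".format(forest_disagreement))
```

A high disagreement rate for single trees means their predictions depend heavily on which rows they happened to see; averaging many trees is meant to shrink that dependence.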


### 3. Random Forest (Bagged Decision Trees)
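Random Forest builds many decision trees (here, 100).
Each tree trains on a bootstrap sample of the rows, and at every split it considers only a random subset of the columns (controlled by `max_features`, which defaults to `"sqrt"` for classification). The final prediction averages the trees' predicted class probabilities, a soft vote across all 100 trees, so no single tree's memorized noise dominates.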
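One way to confirm how the forest combines its trees is to recompute the average by hand. The sketch below, on hypothetical `make_classification` data, stacks `predict_proba` from every fitted tree in the forest's `estimators_` attribute, averages them, and compares the result with the forest's own `predict_proba`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=3, n_informative=2,
                           n_redundant=0, random_state=0)

# max_features="sqrt" (the default): with 3 features, each split considers 1 random column
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0).fit(X, y)

# Average the per-tree class probabilities by hand
per_tree = np.stack([tree.predict_proba(X) for tree in rf.estimators_])  # (n_trees, n_samples, n_classes)
manual_proba = per_tree.mean(axis=0)

print("Matches forest.predict_proba:", np.allclose(manual_proba, rf.predict_proba(X)))
print("Trees in the forest:", len(rf.estimators_))
```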

In [None]:
rf_model = RandomForestClassifier(n_estimators=100, random_state=42)
rf_model.fit(X_train, y_train)

print("Random Forest - Training Accuracy: {:.2f}%".format(accuracy_score(y_train, rf_model.predict(X_train))*100))
print("Random Forest - Testing Accuracy: {:.2f}%".format(accuracy_score(y_test, rf_model.predict(X_test))*100))

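# Compare the testing accuracy with the single tree's: a forest usually generalizes better.
# Its training accuracy is typically near 100% as well, so judge generalization by the test score.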
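Because each bootstrap sample draws n rows with replacement, a given row is left out of a particular tree's sample with probability (1 - 1/n)^n, which approaches 1/e, or about 37%. Those left-out ("out-of-bag") rows act as a free validation set for each tree, and `oob_score=True` turns them into an accuracy estimate that, unlike training accuracy, is not inflated by memorization. Sketched here on hypothetical `make_classification` data rather than the telemetry above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=3, n_informative=2,
                           n_redundant=0, flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# oob_score=True scores each training row using only the trees whose bootstrap sample excluded it
rf = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=0)
rf.fit(X_train, y_train)

train_acc = accuracy_score(y_train, rf.predict(X_train))
test_acc = accuracy_score(y_test, rf.predict(X_test))
print("Training accuracy:   {:.2%}".format(train_acc))
print("Out-of-bag accuracy: {:.2%}".format(rf.oob_score_))
print("Testing accuracy:    {:.2%}".format(test_acc))
```

The out-of-bag score usually lands much closer to the testing accuracy than the training accuracy does, which makes it handy when there is too little data to hold out a separate test set.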

### 4. Generic Bagging Classifier
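You don't have to bag only Decision Trees: bagging works with any base model (e.g. Support Vector Machines or KNN). Scikit-learn provides the `BaggingClassifier` wrapper for this; its `estimator` parameter requires scikit-learn 1.2 or later (older versions call it `base_estimator`). Unlike Random Forest, the cell below bags plain decision trees: with the defaults, every tree sees all features at every split.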

In [None]:
# Here we explicitly bag 50 decision trees
bagging_model = BaggingClassifier(
    estimator=DecisionTreeClassifier(),
    n_estimators=50,
    random_state=42
)
bagging_model.fit(X_train, y_train)

print("Generic Bagging - Testing Accuracy: {:.2f}%".format(accuracy_score(y_test, bagging_model.predict(X_test))*100))

### Summary
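- **Pros of Bagging:** Reduces variance, which makes models much more robust to noise in telemetry data than a single deep tree. Often works well "out-of-the-box" with default hyperparameters.
- **Cons of Bagging:** Harder to interpret than a single decision tree (you can't easily visualize a forest of 100 trees), and training and prediction cost grow roughly in proportion to the number of estimators. Bagging also reduces variance, not bias, so bagging simple, underfitting models helps little.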
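Both cons can be softened in scikit-learn. Since the trees are independent, `n_jobs=-1` fits them in parallel on all available cores, and `feature_importances_` (impurity-based importances averaged over the trees) gives a rough global view of which inputs the forest relies on. A minimal sketch on hypothetical `make_classification` data with generic column names; impurity-based importances can be biased toward features with many distinct values, so treat them as a hint rather than an explanation.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=3, n_informative=2,
                           n_redundant=0, random_state=0)
feature_names = ["feature_{}".format(i) for i in range(X.shape[1])]  # generic placeholder names

# n_jobs=-1: build (and later predict with) the independent trees on all CPU cores
rf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0).fit(X, y)

# Impurity-based importances, normalized to sum to 1, listed from most to least important
ranked = sorted(zip(feature_names, rf.feature_importances_), key=lambda pair: -pair[1])
for name, score in ranked:
    print("{:<12s} {:.3f}".format(name, score))
```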