# Week 10 — Ensemble Methods: Boosting & Bagging

**Course:** Applied ML Foundations for SaaS Analytics  
**Week Focus:** Combine multiple models for robust predictions.

---

## 🎯 Learning Objectives

- Understand ensemble methods: Bagging, Boosting, Stacking
- Implement Gradient Boosting & XGBoost
- Tune ensemble hyperparameters
- Compare single vs ensemble model performance
- Build production-grade ensemble pipelines

In [None]:
import pandas as pd, numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, precision_score, recall_score

# Load & prepare data
subs = pd.read_csv('../data/subscriptions.csv')
feature_usage = pd.read_csv('../data/feature_usage.csv')
user_events = pd.read_csv('../data/user_events.csv')

# Quick feature engineering
engagement = feature_usage.groupby('user_id').agg({'usage_count': 'sum', 'feature_name': 'nunique'}).reset_index()
engagement.columns = ['user_id', 'total_usage', 'features_adopted']
events = user_events.groupby('user_id').size().reset_index(name='total_events')

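df = subs[['user_id', 'tenure_days', 'mrr', 'churn_date']].merge(engagement, on='user_id', how='left')
df = df.merge(events, on='user_id', how='left')
# Derive the label before filling NaNs: a blanket fillna(0) would also fill
# churn_date, so notna() would be True for every row (everyone labeled churned).
df['churned'] = df['churn_date'].notna().astype(int)
# Fill missing values in the feature columns only (e.g. users with no usage or event rows)
feature_cols = ['tenure_days', 'mrr', 'total_usage', 'features_adopted', 'total_events']
df[feature_cols] = df[feature_cols].fillna(0)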

X = df[['tenure_days', 'mrr', 'total_usage', 'features_adopted', 'total_events']]
y = df['churned']
# Stratify so the train and test sets keep the same churn rate
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

print(f"Dataset: {len(df)} customers | {y.sum()} churned ({100*y.mean():.1f}%)")
print(f"Train: {len(X_train)} | Test: {len(X_test)}")

## Part 1: Bagging vs Boosting

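**Bagging** (Random Forest): Train many models independently, each on a bootstrap sample of the training data (Random Forest also samples a random subset of features at each split), then average their predictions. Mainly reduces variance.

**Boosting** (Gradient Boosting): Train models sequentially, with each new model fit to correct the errors of the ensemble so far. AdaBoost does this by reweighting hard examples; gradient boosting fits each new tree to the negative gradient of the loss (the residuals, for squared error). Mainly reduces bias.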
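
Both recipes can be sketched by hand to show what the library classes automate. The block below is a simplified illustration, not how scikit-learn implements either method: it uses synthetic data from `make_classification` as a stand-in for the churn table, and the names `n_models`, `bagged_proba`, and `boosted_proba` are assumptions made for this example. The boosting half fits small regression trees to the negative gradient of the log loss and skips the per-leaf value optimization that real gradient boosting libraries perform.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Synthetic, mildly imbalanced data (illustrative stand-in for the churn table)
X_demo, y_demo = make_classification(n_samples=1000, n_features=8,
                                     weights=[0.8, 0.2], random_state=0)
Xa, Xb, ya, yb = train_test_split(X_demo, y_demo, test_size=0.25,
                                  stratify=y_demo, random_state=0)
n_models = 25

# --- Bagging: independent trees on bootstrap samples, predictions averaged ---
rng = np.random.default_rng(0)
probas = []
for _ in range(n_models):
    idx = rng.integers(0, len(Xa), size=len(Xa))  # sample rows with replacement
    tree = DecisionTreeClassifier(random_state=0).fit(Xa[idx], ya[idx])
    probas.append(tree.predict_proba(Xb)[:, 1])
bagged_proba = np.mean(probas, axis=0)

# --- Boosting: each shallow tree fits the current errors of the ensemble ---
learning_rate = 0.1
p0 = ya.mean()
F_train = np.full(len(Xa), np.log(p0 / (1 - p0)))  # start from the base-rate log-odds
F_test = np.full(len(Xb), np.log(p0 / (1 - p0)))
for _ in range(n_models):
    residual = ya - 1 / (1 + np.exp(-F_train))  # negative gradient of log loss
    step = DecisionTreeRegressor(max_depth=3, random_state=0).fit(Xa, residual)
    F_train += learning_rate * step.predict(Xa)
    F_test += learning_rate * step.predict(Xb)
boosted_proba = 1 / (1 + np.exp(-F_test))

print(f"Hand-rolled bagging AUC:  {roc_auc_score(yb, bagged_proba):.4f}")
print(f"Hand-rolled boosting AUC: {roc_auc_score(yb, boosted_proba):.4f}")
```

Note the structural difference: the bagging loop could run in any order (or in parallel), while each boosting iteration depends on the scores accumulated by all earlier ones.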

**💡 Depth Note:** When does boosting outperform bagging? Compare on imbalanced datasets.

In [None]:
# Bagging: Random Forest
rf = RandomForestClassifier(n_estimators=100, max_depth=10, random_state=42)
rf.fit(X_train, y_train)
rf_auc = roc_auc_score(y_test, rf.predict_proba(X_test)[:, 1])

# Boosting: Gradient Boosting
gb = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=5, random_state=42)
gb.fit(X_train, y_train)
gb_auc = roc_auc_score(y_test, gb.predict_proba(X_test)[:, 1])

print("="*60)
print("ENSEMBLE COMPARISON")
print("="*60)
print(f"Random Forest (Bagging) AUC: {rf_auc:.4f}")
print(f"Gradient Boosting AUC:       {gb_auc:.4f}")
print(f"Winner: {'Boosting' if gb_auc > rf_auc else 'Bagging'} (+{abs(gb_auc - rf_auc):.4f})")

print(f"\nTop features (GB):")
features = X.columns
for feat, imp in sorted(zip(features, gb.feature_importances_), key=lambda x: x[1], reverse=True)[:3]:
    print(f"  {feat:.<25} {imp:.2%}")

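## Part 2: Voting Classifier (Soft Voting)

Soft voting averages the class probabilities predicted by the base models. This is not the same as stacking, which trains a meta-learner on the base models' predictions (see Exercise 2).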

In [None]:
# Combine 3 different models
voting = VotingClassifier(
    estimators=[
        ('lr', LogisticRegression(random_state=42, max_iter=1000)),
        ('rf', RandomForestClassifier(n_estimators=50, random_state=42)),
        ('gb', GradientBoostingClassifier(n_estimators=50, random_state=42))
    ],
    voting='soft'
)
voting.fit(X_train, y_train)
voting_auc = roc_auc_score(y_test, voting.predict_proba(X_test)[:, 1])

print(f"Voting Classifier AUC: {voting_auc:.4f}")
print(f"Change vs. best of RF/GB from Part 1: {voting_auc - max(rf_auc, gb_auc):+.4f}")

print("\n💡 Ensemble combinations work best when:")
print("  - Models have different architectures")
print("  - Models make different types of errors")
print("  - Diversity is balanced against each model's individual accuracy")
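
To see what `voting='soft'` does with the base models, the following sketch checks that an unweighted soft-voting ensemble returns the plain mean of its fitted base models' probabilities. The synthetic data and the names `voter` and `manual_avg` are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression

# Synthetic data (illustrative only)
X_demo, y_demo = make_classification(n_samples=500, n_features=6, random_state=0)

voter = VotingClassifier(
    estimators=[
        ('lr', LogisticRegression(max_iter=1000)),
        ('rf', RandomForestClassifier(n_estimators=25, random_state=0)),
    ],
    voting='soft',
).fit(X_demo, y_demo)

# estimators_ holds the fitted clones of the base models
manual_avg = np.mean([est.predict_proba(X_demo) for est in voter.estimators_], axis=0)
matches = np.allclose(manual_avg, voter.predict_proba(X_demo))
print("Soft voting equals the mean of base-model probabilities:", matches)
```

Because the combination is a fixed average, a weak base model drags the ensemble down; the `weights` parameter of `VotingClassifier` is one way to down-weight it.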

## Part 3: Hyperparameter Tuning

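**💡 Depth Note:** Grid search over ensemble parameters. What is the trade-off between model complexity (more trees, deeper trees) and performance? Keep in mind that a lower learning rate usually needs more estimators to reach the same fit.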

In [None]:
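# Simple parameter sweep, scored with cross-validation on the training set.
# Picking hyperparameters by test-set AUC would leak the test set into model
# selection and make the reported score optimistic.
from sklearn.model_selection import cross_val_score

learning_rates = [0.01, 0.05, 0.1, 0.2]
n_estimators_list = [50, 100, 200]
results = []

for lr in learning_rates:
    for n in n_estimators_list:
        gb_test = GradientBoostingClassifier(learning_rate=lr, n_estimators=n, max_depth=5, random_state=42)
        cv_auc = cross_val_score(gb_test, X_train, y_train, cv=5, scoring='roc_auc').mean()
        results.append({'LR': lr, 'N': n, 'AUC': cv_auc})

results_df = pd.DataFrame(results)
best = results_df.loc[results_df['AUC'].idxmax()]
print("Best hyperparameters (5-fold CV on the training set):")
print(f"  Learning Rate: {best['LR']}")
print(f"  N Estimators: {int(best['N'])}")
print(f"  CV AUC: {best['AUC']:.4f}")

# Refit the selected configuration and score it once on the held-out test set
gb_best = GradientBoostingClassifier(learning_rate=best['LR'], n_estimators=int(best['N']), max_depth=5, random_state=42)
gb_best.fit(X_train, y_train)
print(f"  Test AUC: {roc_auc_score(y_test, gb_best.predict_proba(X_test)[:, 1]):.4f}")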
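
The nested loops above can also be expressed with scikit-learn's `GridSearchCV`, which scores every parameter combination by cross-validation on the training data and refits the best one, so the test set is touched only once at the end. This is a minimal sketch on synthetic data; the grid values, `cv=3`, and the variable names are assumptions chosen to keep it fast, not recommended settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic, mildly imbalanced data (illustrative only)
X_demo, y_demo = make_classification(n_samples=600, n_features=8,
                                     weights=[0.8, 0.2], random_state=0)
Xa, Xb, ya, yb = train_test_split(X_demo, y_demo, test_size=0.25,
                                  stratify=y_demo, random_state=0)

param_grid = {'learning_rate': [0.05, 0.1], 'n_estimators': [50, 100]}
search = GridSearchCV(
    GradientBoostingClassifier(max_depth=3, random_state=0),
    param_grid,
    scoring='roc_auc',  # same metric as the manual sweep
    cv=3,               # stratified folds are used for classifiers
)
search.fit(Xa, ya)      # refits the best configuration on all of Xa by default

print("Best params:", search.best_params_)
print(f"Best CV AUC:  {search.best_score_:.4f}")
print(f"Held-out AUC: {search.score(Xb, yb):.4f}")
```

`search.cv_results_` holds the per-combination scores if you want the same table the manual loop builds in `results_df`.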

## Hands-On Exercises

### Exercise 1: XGBoost Implementation
Compare XGBoost with scikit-learn's Gradient Boosting on both training time and AUC. If one is faster but less accurate, is the trade-off worth it?

### Exercise 2: Stacking Levels
Build a 2-level stacker: base models → meta-learner. How much does AUC improve over the best base model?
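
One possible starting point for Exercise 2 is scikit-learn's `StackingClassifier`, which trains the meta-learner on out-of-fold predictions from the base models. The sketch below runs on synthetic data; the choice of base models, the logistic-regression meta-learner, and `cv=5` are assumptions to adapt, not a reference solution.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, mildly imbalanced data (illustrative only)
X_demo, y_demo = make_classification(n_samples=600, n_features=8,
                                     weights=[0.8, 0.2], random_state=0)
Xa, Xb, ya, yb = train_test_split(X_demo, y_demo, test_size=0.25,
                                  stratify=y_demo, random_state=0)

stack = StackingClassifier(
    estimators=[  # level 1: base models
        ('rf', RandomForestClassifier(n_estimators=50, random_state=0)),
        ('gb', GradientBoostingClassifier(n_estimators=50, random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),  # level 2: meta-learner
    stack_method='predict_proba',  # feed base-model probabilities to the meta-learner
    cv=5,                          # out-of-fold predictions limit leakage into level 2
)
stack.fit(Xa, ya)
stack_auc = roc_auc_score(yb, stack.predict_proba(Xb)[:, 1])
print(f"Stacking AUC: {stack_auc:.4f}")
```

To answer the exercise question, fit each base model on its own and compare its test AUC with `stack_auc`.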

In [None]:
# TODO: Implement XGBoost
# TODO: Build 2-level stacking

## Key Takeaways

Takeaway checklist for this week...

## 🔜 Next Week: Deep Learning Fundamentals