# Lesson 7A: Ensemble Methods Theory

Master the art of combining models: Bagging, Boosting, and Stacking for superior performance.

## Introduction

Imagine you're diagnosing a rare disease. Would you trust one doctor, or get opinions from several specialists and combine their diagnoses?

Ensemble methods do the same with machine learning models: they combine multiple models to produce predictions that are often more accurate and more stable than those of any single member.

## Table of Contents

1. Why Ensembles Work
2. Bagging (Bootstrap Aggregating)
3. Random Forests Deep Dive
4. Boosting Algorithms
5. AdaBoost Theory
6. Gradient Boosting Theory
7. Stacking & Blending
8. When to Use Each Method

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import (RandomForestClassifier, AdaBoostClassifier,
                              GradientBoostingClassifier, VotingClassifier,
                              StackingClassifier)
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_breast_cancer, make_classification
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import accuracy_score, classification_report

np.random.seed(42)
print('✅ Libraries loaded')
```

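## Why Ensembles Work

**The Wisdom of Crowds:**

Suppose you have N binary classifiers, each with accuracy p > 0.5, whose errors are independent. Then the accuracy of their majority vote approaches 1.0 as N increases (Condorcet's jury theorem). Real models trained on the same data make correlated errors, so the gain in practice is smaller, which is why the principles below matter.

**Three Key Principles:**

1. **Diversity:** Models should make different errors.
2. **Independence:** Models are trained on different data or features, so their errors are less correlated.
3. **Quality:** Each individual model must be better than random guessing.

**Bias-Variance Tradeoff:**

- **Bagging:** Mainly reduces variance (e.g., Random Forests).
- **Boosting:** Mainly reduces bias (e.g., AdaBoost, XGBoost).
- **Stacking:** Can reduce both by learning how to combine the base models.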
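The wisdom-of-crowds claim can be checked numerically. The sketch below assumes N independent classifiers that all have the same accuracy p, and an odd N so that a majority always exists; the helper name `majority_vote_accuracy` is hypothetical. It sums the binomial probability that more than half of the models are correct.

```python
from math import comb

def majority_vote_accuracy(n_models: int, p: float) -> float:
    """P(the majority of n_models independent classifiers is correct), each correct w.p. p.

    Assumes n_models is odd so that ties cannot occur.
    """
    if n_models % 2 == 0:
        raise ValueError("use an odd number of models to avoid ties")
    k_min = n_models // 2 + 1  # smallest number of correct votes that wins the vote
    return sum(comb(n_models, k) * p**k * (1 - p) ** (n_models - k)
               for k in range(k_min, n_models + 1))

for n in (1, 5, 25, 101):
    print(f"N={n:>3}: majority-vote accuracy = {majority_vote_accuracy(n, 0.6):.3f}")
```

With p = 0.6 the printed accuracy climbs toward 1.0 as N grows. With p below 0.5 the same formula shrinks toward 0, which is why the Quality principle matters.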

## Bagging (Bootstrap Aggregating)

**Algorithm:**

1. Create B bootstrap samples (sample with replacement).
2. Train a model on each sample.
3. Average predictions (regression) or vote (classification).

**Math:** For classification with B models:

$\hat{y} = \text{mode}(h_1(x), h_2(x), \ldots, h_B(x))$

For regression:

$\hat{y} = \frac{1}{B}\sum_{i=1}^{B} h_i(x)$
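To make the three steps concrete, here is a minimal from-scratch sketch in plain Python. It assumes 1-D inputs and binary labels, and uses a hypothetical `fit_stump` helper (a one-threshold classifier) as the base learner; all function names are illustrative, not from any library. It also checks the well-known fact that a bootstrap sample of size n contains about $1 - 1/e \approx 63.2\%$ of the distinct original points.

```python
import random
from collections import Counter

def bootstrap_indices(n, rng):
    """Step 1: draw n indices uniformly *with replacement*."""
    return [rng.randrange(n) for _ in range(n)]

def fit_stump(xs, ys):
    """Hypothetical base learner: the single threshold on x with the fewest training errors."""
    best = None
    for t in sorted(set(xs)):
        for sign in (1, -1):  # predict 1 when x >= t (sign=1) or when x <= t (sign=-1)
            errors = sum((1 if sign * (x - t) >= 0 else 0) != y for x, y in zip(xs, ys))
            if best is None or errors < best[0]:
                best = (errors, t, sign)
    _, t, sign = best
    return lambda x: 1 if sign * (x - t) >= 0 else 0

def bagging_fit(xs, ys, n_models=25, seed=0):
    """Step 2: fit one base model per bootstrap sample."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        idx = bootstrap_indices(len(xs), rng)
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models

def bagging_predict(models, x):
    """Step 3 (classification): majority vote, i.e. the mode of h_1(x), ..., h_B(x)."""
    return Counter(m(x) for m in models).most_common(1)[0][0]

xs = list(range(10))
ys = [0] * 5 + [1] * 5
models = bagging_fit(xs, ys)
print([bagging_predict(models, x) for x in xs])

# Fraction of distinct points in one large bootstrap sample (theory: about 0.632)
n = 10_000
print(len(set(bootstrap_indices(n, random.Random(42)))) / n)
```

For regression, step 3 would replace the vote with the mean of `m(x)` over the models, matching the second formula above.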

```python
# Demonstrate variance reduction: one deep tree vs. a bagged ensemble of deep trees
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Single unpruned decision tree (low bias, high variance)
tree = DecisionTreeClassifier(random_state=42)
tree.fit(X_train, y_train)
tree_acc = accuracy_score(y_test, tree.predict(X_test))

# Bagging: 50 trees, each fit on a bootstrap sample; their class probabilities are averaged
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=42)
bagging.fit(X_train, y_train)
bagging_acc = accuracy_score(y_test, bagging.predict(X_test))

print(f'Single Tree Accuracy: {tree_acc:.3f}')
print(f'Bagging Accuracy: {bagging_acc:.3f}')
print(f'Improvement: {(bagging_acc - tree_acc)*100:.1f} percentage points')
print('\n✅ Bagging reduces variance, which curbs overfitting!')
```
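Why does averaging help, and why do Random Forests also subsample features? For B identically distributed predictors, each with variance $\sigma^2$ and pairwise correlation $\rho$, the variance of their average is $\rho\sigma^2 + \frac{1-\rho}{B}\sigma^2$, a standard result. The plain-Python sketch below checks it by simulation; the helper names and the way correlated predictions are generated (a shared Gaussian component) are assumptions for illustration.

```python
import random

def averaged_variance(sigma2, rho, n_models):
    """Theoretical variance of the average of n_models predictors, each with
    variance sigma2 and pairwise correlation rho."""
    return rho * sigma2 + (1 - rho) * sigma2 / n_models

def simulate_averaged_variance(rho, n_models, trials=20_000, seed=0):
    """Monte Carlo estimate using unit-variance predictors that share a common
    Gaussian component: sqrt(rho)*Z + sqrt(1-rho)*E_i has Var = 1 and Corr = rho."""
    rng = random.Random(seed)
    averages = []
    for _ in range(trials):
        shared = rng.gauss(0, 1)
        preds = [rho ** 0.5 * shared + (1 - rho) ** 0.5 * rng.gauss(0, 1)
                 for _ in range(n_models)]
        averages.append(sum(preds) / n_models)
    m = sum(averages) / trials
    return sum((a - m) ** 2 for a in averages) / trials

for rho in (0.0, 0.3, 0.9):
    theory = averaged_variance(1.0, rho, n_models=10)
    sim = simulate_averaged_variance(rho, n_models=10)
    print(f"rho={rho}: theory={theory:.3f}, simulated={sim:.3f}")
```

As $\rho \to 1$ the benefit of adding models vanishes, since the variance cannot drop below $\rho\sigma^2$. Bootstrap sampling, and in Random Forests the random feature subsets, exist to push $\rho$ down.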

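## Boosting

**Key Idea:** Train models sequentially, each one focusing on the mistakes of the ensemble built so far.

**AdaBoost Algorithm:**

1. Start with equal sample weights.
2. Train a weak learner on the weighted data.
3. Increase the weights of misclassified samples (and, after renormalizing, decrease those of correctly classified ones).
4. Repeat, then combine all learners by weighted voting, giving more accurate learners larger votes.

**Gradient Boosting:**

- Train each new model to predict the residuals (errors) of the current ensemble; more generally, the negative gradient of the loss.
- More flexible than AdaBoost: it works with any differentiable loss function, and AdaBoost can be viewed as the special case with exponential loss.
- Forms the basis of XGBoost, LightGBM, and CatBoost.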
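The gradient boosting idea can be sketched in a few lines of plain Python for regression with squared loss, where the negative gradient is exactly the residual $y - F(x)$. This is a minimal illustration, not how XGBoost or LightGBM are implemented: it assumes 1-D inputs, uses a hypothetical `fit_regression_stump` weak learner, and scales each step by a shrinkage factor `learning_rate` (set here, as an assumption, to 0.1).

```python
from statistics import fmean

def fit_regression_stump(xs, targets):
    """Hypothetical weak learner: one split on x, predicting the mean target on each side."""
    best = None
    for t in sorted(set(xs))[1:]:  # skip the smallest value so both sides are non-empty
        left = [r for x, r in zip(xs, targets) if x < t]
        right = [r for x, r in zip(xs, targets) if x >= t]
        left_mean, right_mean = fmean(left), fmean(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    _, t, left_mean, right_mean = best
    return lambda x: left_mean if x < t else right_mean

def gradient_boost(xs, ys, n_rounds=100, learning_rate=0.1):
    """Squared-loss gradient boosting: each stump is fit to the current residuals."""
    f0 = fmean(ys)  # F_0: the best constant prediction
    preds = [f0] * len(xs)
    stumps, mse_history = [], []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]  # negative gradient of (y - F)^2 / 2
        h = fit_regression_stump(xs, residuals)
        stumps.append(h)
        preds = [p + learning_rate * h(x) for p, x in zip(preds, xs)]  # F_m = F_{m-1} + lr * h_m
        mse_history.append(fmean([(y - p) ** 2 for y, p in zip(ys, preds)]))

    def predict(x):
        return f0 + learning_rate * sum(h(x) for h in stumps)

    return predict, mse_history

xs = [i / 10 for i in range(50)]
ys = [x * x for x in xs]
model, history = gradient_boost(xs, ys)
print(f"training MSE after 1 round: {history[0]:.3f}, after {len(history)} rounds: {history[-1]:.3f}")
```

Because each stump is a least-squares fit to the residuals and the step size is below 2, the training MSE never increases from one round to the next; a smaller `learning_rate` trades more rounds for smoother fits.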

```python
# AdaBoost example on the breast cancer dataset
data = load_breast_cancer()
X, y = data.data, data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Weak learner: a depth-1 tree, called a 'stump'
weak = DecisionTreeClassifier(max_depth=1, random_state=42)
weak.fit(X_train, y_train)
weak_acc = accuracy_score(y_test, weak.predict(X_test))

# AdaBoost fits up to 50 stumps sequentially, reweighting misclassified samples each round
ada = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1), n_estimators=50, random_state=42)
ada.fit(X_train, y_train)
ada_acc = accuracy_score(y_test, ada.predict(X_test))

print(f'Weak Learner (Stump): {weak_acc:.3f}')
print(f'AdaBoost: {ada_acc:.3f}')
print(f'Improvement: {(ada_acc - weak_acc)*100:.1f} percentage points')
print('\n✅ Boosting turns weak learners into strong ones!')
```
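`AdaBoostClassifier` hides the reweighting step described in the Boosting overview. The sketch below performs one round by hand for discrete two-class AdaBoost with labels in {-1, +1}, using the common formulation $\alpha = \frac{1}{2}\ln\frac{1-\varepsilon}{\varepsilon}$, where $\varepsilon$ is the weighted error. Another common formulation uses $\alpha = \ln\frac{1-\varepsilon}{\varepsilon}$ and multiplies only the misclassified weights by $e^{\alpha}$; after normalization the sample weights come out the same. The function name is hypothetical.

```python
from math import exp, log

def adaboost_reweight(weights, y_true, y_pred):
    """One round of discrete AdaBoost for labels in {-1, +1}.

    Returns (alpha, new_weights): the learner's vote weight and the renormalized sample weights.
    """
    total = sum(weights)
    eps = sum(w for w, y, p in zip(weights, y_true, y_pred) if y != p) / total  # weighted error
    if not 0 < eps < 0.5:
        raise ValueError("this sketch requires 0 < weighted error < 0.5")
    alpha = 0.5 * log((1 - eps) / eps)
    # y * p is +1 when correct (weight shrinks) and -1 when wrong (weight grows)
    new = [w * exp(-alpha * y * p) for w, y, p in zip(weights, y_true, y_pred)]
    norm = sum(new)
    return alpha, [w / norm for w in new]

weights = [0.25, 0.25, 0.25, 0.25]
y_true = [1, 1, -1, -1]
y_pred = [1, 1, -1, 1]  # the weak learner gets the last sample wrong
alpha, new_weights = adaboost_reweight(weights, y_true, y_pred)
print(f"alpha = {alpha:.4f}")
print([round(w, 4) for w in new_weights])
```

After one round the single misclassified sample carries half of the total weight. This is a general property of this update: the misclassified samples always end up holding exactly half the weight, so the next weak learner cannot simply repeat the previous one's mistakes.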

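## Stacking

**Meta-Learning Approach:**

1. Train multiple diverse base models.
2. Use their predictions as input features. These should be out-of-fold (cross-validated) predictions; otherwise the meta-model learns from predictions the base models made on data they were trained on, which are overly optimistic.
3. Train a meta-model to combine them.

**Advantages:**

- Learns the combination weights from data instead of fixing them by hand
- Can combine very different model types
- Frequently part of winning Kaggle solutions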
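Step 2 is where stacking most often goes wrong. The plain-Python sketch below contrasts in-sample and out-of-fold predictions for a base model that memorizes its training set; the 1-nearest-neighbour learner, the synthetic data with 20% label noise, and the helper names are all assumptions for illustration. (scikit-learn's `StackingClassifier` avoids the problem by fitting its final estimator on cross-validated predictions of the base estimators.)

```python
import random

def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and split it into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_1nn(xs, ys):
    """Hypothetical base model: 1-nearest-neighbour on 1-D inputs (memorizes its training set)."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def out_of_fold_predictions(fit, xs, ys, k=5):
    """Predict every sample with a model that never saw it during training."""
    oof = [None] * len(xs)
    for fold in kfold_indices(len(xs), k):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            oof[i] = model(xs[i])
    return oof

# Synthetic 1-D data: the label is x > 0.5, with about 20% of labels flipped
rng = random.Random(1)
xs = [rng.random() for _ in range(200)]
ys = [int(x > 0.5) if rng.random() > 0.2 else int(x <= 0.5) for x in xs]

in_sample = fit_1nn(xs, ys)
in_sample_acc = sum(in_sample(x) == y for x, y in zip(xs, ys)) / len(xs)
oof = out_of_fold_predictions(fit_1nn, xs, ys)
oof_acc = sum(p == y for p, y in zip(oof, ys)) / len(ys)
print(f"in-sample accuracy: {in_sample_acc:.2f}, out-of-fold accuracy: {oof_acc:.2f}")
```

The in-sample predictions are perfect and tell a meta-model nothing about how the base model behaves on new data; the out-of-fold predictions do, so they are what the meta-model should be trained on.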

```python
# Stacking example (reuses the breast cancer train/test split from the AdaBoost cell)
from sklearn.ensemble import StackingClassifier

# Diverse base models
estimators = [
    ('rf', RandomForestClassifier(n_estimators=50, random_state=42)),
    ('ada', AdaBoostClassifier(n_estimators=50, random_state=42)),
    ('gb', GradientBoostingClassifier(n_estimators=50, random_state=42)),
]

# Meta-model: by default StackingClassifier fits it on 5-fold
# cross-validated predictions of the base models
stacking = StackingClassifier(estimators=estimators, final_estimator=LogisticRegression())
stacking.fit(X_train, y_train)
stack_acc = accuracy_score(y_test, stacking.predict(X_test))

print(f'Stacking Accuracy: {stack_acc:.3f}')
print('✅ Stacking combines strengths of different models!')
```
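Stacking learns how to combine its base models. The simplest alternative is a fixed voting rule, which is what the `VotingClassifier` imported at the top of the lesson provides through its `voting='hard'` and `voting='soft'` options. The plain-Python sketch below (helper names hypothetical, probabilities made up for illustration) shows that the two rules can disagree: hard voting counts each model's predicted class, while soft voting averages predicted probabilities.

```python
from collections import Counter

def hard_vote(probas, threshold=0.5):
    """Majority vote over each model's predicted class (1 if P(class 1) > threshold)."""
    votes = [int(p > threshold) for p in probas]
    return Counter(votes).most_common(1)[0][0]

def soft_vote(probas, threshold=0.5):
    """Average the predicted probabilities, then threshold the average."""
    return int(sum(probas) / len(probas) > threshold)

# P(class 1) from three hypothetical models for one sample:
# two are barely above 0.5, one is confidently below
probas = [0.51, 0.51, 0.05]
print("hard vote:", hard_vote(probas))
print("soft vote:", soft_vote(probas))
```

Here hard voting follows the two lukewarm models, while soft voting lets the confident third model tip the average below 0.5. A stacking meta-model goes one step further and learns how much to trust each model from out-of-fold predictions.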

## Conclusion

**When to Use:**

**Bagging (Random Forests):**

- ✅ High-variance base models (deep trees)
- ✅ Want stability and reliability
- ✅ Parallel training

**Boosting (XGBoost, LightGBM):**

- ✅ Want maximum accuracy
- ✅ Have clean, well-prepared data (boosting, especially AdaBoost, is sensitive to label noise)
- ✅ Can tune hyperparameters carefully

**Stacking:**

- ✅ Kaggle competitions
- ✅ Have diverse strong models
- ✅ Computational resources available

**Next:** Lesson 7B - Production ensemble implementations!