# Bagging (Bootstrap Aggregating) 

In [2]:
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score

sns.set(style="whitegrid")

In [3]:
# Load Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

## 1. Baseline: Single Decision Tree

We first train a **single decision tree** as our baseline model.

In [4]:
# Single Decision Tree
dt = DecisionTreeClassifier(random_state=42)
dt.fit(X_train, y_train)
y_pred_dt = dt.predict(X_test)

print("Single Decision Tree Accuracy:", accuracy_score(y_test, y_pred_dt))

Single Decision Tree Accuracy: 1.0


## 2. Bagging with Decision Trees

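Now we train a **BaggingClassifier** of 50 decision trees. By default (`bootstrap=True`), each tree is fit on a bootstrap sample: rows drawn from the training set with replacement, as many rows as the training set contains (`max_samples=1.0`).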

In [5]:
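# Bagging with Decision Trees
# The base model is passed positionally: its keyword was `base_estimator`
# before scikit-learn 1.2 and is `estimator` from 1.2 onward.
bag_clf = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=50,
    random_state=42
)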
bag_clf.fit(X_train, y_train)
y_pred_bag = bag_clf.predict(X_test)

print("Bagging (Decision Trees) Accuracy:", accuracy_score(y_test, y_pred_bag))

Bagging (Decision Trees) Accuracy: 1.0
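
To make the mechanics concrete, here is a minimal hand-rolled sketch of bagging: draw bootstrap samples, fit one tree per sample, and combine the trees by majority vote. The helper names `fit_bagged_trees` and `predict_majority` are hypothetical and exist only for illustration. This is not how `BaggingClassifier` is implemented internally: scikit-learn averages predicted class probabilities when the base estimator provides them, whereas this sketch uses hard voting for simplicity.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)


def fit_bagged_trees(X, y, n_estimators=50, seed=42):
    """Fit one decision tree per bootstrap sample (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n_samples = X.shape[0]
    trees = []
    for _ in range(n_estimators):
        # Bootstrap: draw n_samples row indices with replacement.
        idx = rng.integers(0, n_samples, size=n_samples)
        tree = DecisionTreeClassifier(random_state=int(rng.integers(0, 2**31 - 1)))
        trees.append(tree.fit(X[idx], y[idx]))
    return trees


def predict_majority(trees, X, n_classes):
    """Combine the trees by majority vote (hypothetical helper)."""
    # votes has shape (n_trees, n_samples); labels are integers 0..n_classes-1.
    votes = np.stack([tree.predict(X) for tree in trees]).astype(int)
    # Count votes per class for every sample: shape (n_classes, n_samples).
    counts = np.apply_along_axis(np.bincount, 0, votes, minlength=n_classes)
    return counts.argmax(axis=0)


trees = fit_bagged_trees(X_train, y_train)
manual_pred = predict_majority(trees, X_test, n_classes=len(np.unique(y)))
print("Manual bagging accuracy:", accuracy_score(y_test, manual_pred))
```

Each bootstrap sample leaves out some training rows and repeats others, so the trees differ from one another; the vote smooths out their individual errors.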

## Key Takeaways

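- A single, fully grown decision tree has high variance and may overfit the training data.

- Bagging trains **many decision trees on different bootstrap samples** of the training set and combines their predictions.

- Combining many trees mainly reduces variance, which usually improves generalization. On this split both models score 1.0 on the 30-sample test set, so the benefit is not visible here.

- Bagging is the foundation of more advanced ensemble methods like **Random Forests**, which additionally consider only a random subset of features at each split.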
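
A single train-test split of Iris leaves only a handful of test samples, so one accuracy number per model is a coarse comparison. As a rough sketch, the models can instead be compared with stratified 5-fold cross-validation. The fold count, the shuffling seed, and the inclusion of a `RandomForestClassifier` are assumptions made for illustration and are not part of the experiment above.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

models = {
    "Single tree": DecisionTreeClassifier(random_state=42),
    # Base model passed positionally so the call works across scikit-learn versions.
    "Bagging (50 trees)": BaggingClassifier(
        DecisionTreeClassifier(), n_estimators=50, random_state=42
    ),
    "Random forest (50 trees)": RandomForestClassifier(
        n_estimators=50, random_state=42
    ),
}

# Stratified folds keep the three classes balanced in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

cv_scores = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    cv_scores[name] = scores
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Iris is an easy dataset, so the gaps between the models may be small; the spread across folds gives a sense of how much of any difference is noise.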


