## Can we use Bagging for regression problems?

Yes, Bagging works for regression by training multiple regressors on bootstrap samples and averaging predictions.

## Difference between multiple model training and single model training

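Single model training fits one estimator on the full training set; multiple model training fits several estimators (on different samples, features, or algorithms) and combines their predictions to reduce variance and improve stability.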

## Feature randomness in Random Forest

Each split considers a random subset of features, increasing diversity and reducing correlation between trees.
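A minimal sketch of how this is controlled in scikit-learn, assuming the `max_features` parameter of `RandomForestClassifier`. The dataset, `n_estimators=20`, and `random_state=0` are illustrative choices, not part of the original notes.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)  # 30 features

# max_features sets how many features are candidates at each split.
rf_sqrt = RandomForestClassifier(n_estimators=20, max_features="sqrt", random_state=0).fit(X, y)
# max_features=None considers every feature at every split (plain bagged trees).
rf_all = RandomForestClassifier(n_estimators=20, max_features=None, random_state=0).fit(X, y)

# Each fitted tree stores the resolved number of candidate features per split.
print(rf_sqrt.estimators_[0].max_features_)  # int(sqrt(30)) = 5
print(rf_all.estimators_[0].max_features_)   # 30
```

With `max_features=None` the only source of diversity left is the bootstrap sample, so the trees tend to be more correlated.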

## OOB (Out-of-Bag) Score

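Performance measured on the samples left out of each tree's bootstrap sample (about 37% of the training set per tree). Each sample is predicted only by the trees that did not see it, so the score acts like built-in cross-validation.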
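The share of out-of-bag samples can be checked with a quick simulation. This is a sketch using plain NumPy; the sample size and seed are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 10_000

# One bootstrap sample: draw n_samples indices with replacement.
boot_idx = rng.integers(0, n_samples, size=n_samples)

# Mark which original rows were drawn at least once.
in_bag = np.zeros(n_samples, dtype=bool)
in_bag[boot_idx] = True

# Fraction of rows never drawn, i.e. out-of-bag for this sample.
oob_fraction = 1.0 - in_bag.mean()
print(round(oob_fraction, 3))

# Theoretical value (1 - 1/n)^n, which approaches 1/e for large n.
print(round((1 - 1 / n_samples) ** n_samples, 3))  # 0.368
```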

## Feature importance in Random Forest

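Measured using Gini importance (mean decrease in impurity across all splits on a feature) or permutation importance (drop in score when a feature's values are shuffled).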
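Both measures can be computed side by side. The sketch below assumes scikit-learn's `feature_importances_` attribute and `sklearn.inspection.permutation_importance`. The train/test split, `n_repeats=5`, and the seeds are illustrative assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)

# Gini importance: computed from the training data during fitting.
gini_importance = rf.feature_importances_

# Permutation importance: computed here on held-out data by shuffling one column at a time.
perm = permutation_importance(rf, X_test, y_test, n_repeats=5, random_state=0)

# Indices of the five highest-ranked features under each measure.
print(gini_importance.argsort()[::-1][:5])
print(perm.importances_mean.argsort()[::-1][:5])
```

The two rankings need not agree: Gini importance reflects how the trees used a feature during training, while permutation importance reflects how much the held-out score depends on it.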

## Working principle of Bagging Classifier

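Creates multiple bootstrap datasets, trains one base estimator on each, and aggregates their predictions by majority vote (or by averaging predicted probabilities when the base estimator provides them).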
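The three steps can be written out by hand. This is a simplified sketch with hard majority voting, not scikit-learn's `BaggingClassifier` implementation; the dataset, 25 estimators, and seeds are assumptions.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

rng = np.random.default_rng(0)
n_estimators = 25
all_preds = []

for i in range(n_estimators):
    # Step 1: bootstrap dataset (sample rows with replacement).
    idx = rng.integers(0, len(X_train), size=len(X_train))
    # Step 2: train a base estimator on that dataset.
    tree = DecisionTreeClassifier(random_state=i).fit(X_train[idx], y_train[idx])
    all_preds.append(tree.predict(X_test))

all_preds = np.array(all_preds)  # shape: (n_estimators, n_test_samples)

# Step 3: majority vote over the estimators for each test sample.
majority = np.array([np.bincount(col, minlength=3).argmax() for col in all_preds.T])
accuracy = (majority == y_test).mean()
print("Accuracy:", accuracy)
```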

## Evaluate a Bagging Classifier

Use accuracy, precision, recall, F1-score, or confusion matrix.
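A short sketch of these metrics using `sklearn.metrics`. The breast cancer dataset, the stratified split, and the seeds are illustrative assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

model = BaggingClassifier(DecisionTreeClassifier(), n_estimators=25, random_state=0)
model.fit(X_train, y_train)
pred = model.predict(X_test)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, pred)
print(cm)

# Per-class precision, recall, and F1-score, plus overall accuracy.
print(classification_report(y_test, pred))
```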

## How a Bagging Regressor works

Trains regressors on bootstrap samples and averages outputs.
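The same idea written by hand for regression, as a sketch rather than the `BaggingRegressor` internals. The dataset, 50 trees, and seeds are assumptions.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

rng = np.random.default_rng(0)
tree_preds = []
for i in range(50):
    # Bootstrap sample of the training rows.
    idx = rng.integers(0, len(X_train), size=len(X_train))
    tree = DecisionTreeRegressor(random_state=i).fit(X_train[idx], y_train[idx])
    tree_preds.append(tree.predict(X_test))

# Aggregate by averaging the per-tree outputs.
bagged_pred = np.mean(tree_preds, axis=0)

# Baseline: one tree trained on the full training set.
single_pred = DecisionTreeRegressor(random_state=0).fit(X_train, y_train).predict(X_test)

bagged_mse = mean_squared_error(y_test, bagged_pred)
single_mse = mean_squared_error(y_test, single_pred)
print("Bagged MSE:", bagged_mse)
print("Single tree MSE:", single_mse)
```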

## Main advantage of ensemble techniques

Higher accuracy, reduced variance, and robust predictions.

## Main challenge of ensemble methods

High computation and less interpretability.

## Key idea behind ensemble techniques

Combine several models whose errors are at least partly uncorrelated, so that the combined prediction is stronger than any single one.

## Random Forest Classifier

Ensemble of decision trees using bagging and feature randomness.

## Types of ensemble techniques

Bagging, Boosting, Stacking, Voting.
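Voting and stacking can be sketched with scikit-learn's `VotingClassifier` and `StackingClassifier`. The choice of base models, the final estimator, and the seeds below are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, StackingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Heterogeneous base models shared by both ensembles (each ensemble clones them).
base = [
    ("tree", DecisionTreeClassifier(random_state=0)),
    ("rf", RandomForestClassifier(n_estimators=20, random_state=0)),
    ("lr", LogisticRegression(max_iter=1000)),
]

# Voting: each base model casts one vote for the predicted class.
voting = VotingClassifier(estimators=base, voting="hard").fit(X_train, y_train)

# Stacking: a final estimator learns how to combine the base models' outputs.
stacking = StackingClassifier(
    estimators=base, final_estimator=LogisticRegression(max_iter=1000)
).fit(X_train, y_train)

scores = {
    "voting": voting.score(X_test, y_test),
    "stacking": stacking.score(X_test, y_test),
}
print(scores)
```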

## Ensemble learning

Combines multiple models for improved performance.

## When to avoid ensemble methods

Avoid them when the dataset is small, when the model must be interpretable, or when compute is limited.

## How Bagging reduces overfitting

Reduces variance via bootstrap aggregation.
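The variance reduction from averaging can be illustrated with a toy simulation. It assumes a hypothetical idealized setup in which every model's error is independent unit-variance noise. Real bagged models are correlated, so the reduction in practice is smaller than the 1/B seen here.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_models = 20_000, 25

# Hypothetical setup: the true value is 0 and each model adds independent N(0, 1) noise.
single = rng.normal(0.0, 1.0, size=n_trials)

# Ensemble prediction: the average of n_models independent noisy predictions.
averaged = rng.normal(0.0, 1.0, size=(n_trials, n_models)).mean(axis=1)

var_single = single.var()
var_avg = averaged.var()
print("Single model variance:", var_single)   # close to 1
print("Averaged variance:", var_avg)          # close to 1 / 25 = 0.04
```

For identically distributed models with pairwise correlation ρ and variance σ², the variance of the average of B models is ρσ² + (1 − ρ)σ²/B, which is why decorrelating the models matters as much as adding more of them.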

## Why Random Forest is better than a single Decision Tree

Averaging many decorrelated trees gives lower variance and more stable performance than one fully grown tree, which tends to overfit.

## Role of bootstrap sampling in Bagging

Creates diverse datasets for diverse models.

## Real-world applications of ensemble techniques

Fraud detection, credit scoring, recommendation, diagnosis.

## Difference between Bagging and Boosting

Bagging reduces variance by training models independently on bootstrap samples; boosting reduces bias by training models sequentially, each one focusing on the errors of the previous ones.
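A rough side-by-side sketch, using `GradientBoostingClassifier` as one example of boosting (an assumption; the notes do not name a specific boosting algorithm). The dataset, depth-1 trees, and seeds are illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Bagging: 50 full-depth trees, each trained independently on a bootstrap sample.
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0)
bagging.fit(X_train, y_train)
bagging_acc = bagging.score(X_test, y_test)

# Boosting: 50 shallow trees (stumps) added one after another.
boosting = GradientBoostingClassifier(n_estimators=50, max_depth=1, random_state=0)
boosting.fit(X_train, y_train)

# staged_predict yields the ensemble's predictions after each boosting stage.
staged_acc = [(p == y_test).mean() for p in boosting.staged_predict(X_test)]

print("Bagging accuracy:", bagging_acc)
print("Boosting accuracy after 1 stage:", staged_acc[0])
print("Boosting accuracy after 50 stages:", staged_acc[-1])
```

The staged accuracies show the sequential nature of boosting: later stages build on earlier ones, whereas the bagged trees could be trained in any order or in parallel.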

## Bagging Classifier using Decision Trees

In [1]:
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

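# Load the iris dataset (150 samples, 4 features, 3 classes)
data = load_iris()
X, y = data.data, data.target

# Hold out 20% for testing; no random_state is set, so the score varies between runs
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# 50 decision trees, each trained on its own bootstrap sample
model = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50)
model.fit(X_train, y_train)

pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, pred))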

Accuracy: 1.0


## Bagging Regressor – MSE

In [2]:
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

model = BaggingRegressor(DecisionTreeRegressor(), n_estimators=50)
model.fit(X_train, y_train)

pred = model.predict(X_test)
print("MSE:", mean_squared_error(y_test, pred))

MSE: 3863.819847191011


## Random Forest Classifier – Feature Importance

In [3]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True)
model = RandomForestClassifier()
model.fit(X, y)

# Gini (impurity-based) importances, one value per feature, summing to 1
print(model.feature_importances_)

[0.03073657 0.01832235 0.06103862 0.03736485 0.00798306 0.00452134
 0.04425565 0.12000526 0.00262873 0.00299603 0.00959061 0.00402896
 0.01655065 0.03035065 0.00256894 0.00426386 0.00550855 0.00654252
 0.0034482  0.00599889 0.11662505 0.02262954 0.11892621 0.10548021
 0.01172715 0.00922915 0.02476114 0.15365065 0.01042811 0.00783848]


## Random Forest Regressor vs Single Tree

In [4]:
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_diabetes
from sklearn.metrics import r2_score

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

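# Default settings: one fully grown tree vs. a forest of 100 trees
tree = DecisionTreeRegressor()
rf = RandomForestRegressor()

tree.fit(X_train, y_train)
rf.fit(X_train, y_train)

# R2 on the held-out set; a negative value means worse than predicting the mean
print("Tree R2:", r2_score(y_test, tree.predict(X_test)))
print("RF R2:", r2_score(y_test, rf.predict(X_test)))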

Tree R2: -0.16717320093143462
RF R2: 0.4987935418156928


## Compute OOB Score

In [5]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

rf = RandomForestClassifier(oob_score=True, bootstrap=True)
rf.fit(X, y)

print("OOB Score:", rf.oob_score_)

OOB Score: 0.9533333333333334


## Bagging Classifier with SVM

In [6]:
from sklearn.ensemble import BaggingClassifier
from sklearn.svm import SVC
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

model = BaggingClassifier(SVC(), n_estimators=25)
model.fit(X_train, y_train)

print("Accuracy:", model.score(X_test, y_test))

Accuracy: 1.0


## Random Forest with different numbers of trees

In [7]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Fit a forest for each tree count and print test accuracy
for n in [10, 50, 100, 200]:
    rf = RandomForestClassifier(n_estimators=n)
    rf.fit(X_train, y_train)
    print(n, rf.score(X_test, y_test))

10 0.8666666666666667
50 0.8666666666666667
100 0.8666666666666667
200 0.8666666666666667
