# What is Ensemble Learning?

Ensemble Learning = Combining the predictions of multiple models to get one model that is usually more accurate and more robust than any single member.

Simple idea:

“Many weak learners together become a strong learner”

Why?

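* One model may make mistakes on some inputs
* When models make different mistakes, combining them cancels part of the error
* This often increases accuracy
* Averaging many models reduces variance, which helps against overfitting
* The combined model tends to generalize better to unseen data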
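
To make the "multiple models reduce error" point concrete, here is a minimal sketch. It assumes an idealized case that these notes do not state: a binary task where every model is correct with the same probability `p` and the models make their mistakes independently. The function name `majority_vote_accuracy` is made up for illustration.

```python
from math import comb


def majority_vote_accuracy(p: float, n_models: int) -> float:
    """Probability that a majority of n_models independent voters is correct.

    Assumes a binary task, an odd n_models (so there are no ties), and that
    each model is independently correct with probability p.
    """
    majority = n_models // 2 + 1
    return sum(
        comb(n_models, k) * p**k * (1 - p) ** (n_models - k)
        for k in range(majority, n_models + 1)
    )


# Each model alone is right 60% of the time; the vote improves as models are added.
for n in (1, 5, 11, 51):
    print(n, round(majority_vote_accuracy(0.6, n), 3))
```

Real models are never fully independent, so the gain in practice is smaller. It also only works when each model is better than chance: with `p` below 0.5 the vote gets worse as models are added.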

# Types of Ensemble Techniques

There are 3 main categories:

| Type | Meaning | Example |
| --- | --- | --- |
| Bagging | Train models independently on different samples | Random Forest |
| Boosting | Train models sequentially, next model fixes previous mistakes | AdaBoost, Gradient Boosting, XGBoost |
| Stacking | Combine predictions of multiple models using another model | Meta Learner |

# BAGGING (Bootstrap Aggregating)
🎯 Theory (Easy Words)

Steps:
1️⃣ Randomly pick different samples from the dataset (with replacement)
2️⃣ Train a separate model on each sample
3️⃣ Combine predictions by:
✔ Majority Voting (Classification)
✔ Averaging (Regression)
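
The sampling and combining steps above can be sketched in plain Python. This is only an illustration: the helper names (`bootstrap_sample`, `majority_vote`, `average`) are made up here, and the model training in step 2 is left out.

```python
import random
from collections import Counter
from statistics import mean


def bootstrap_sample(rows, rng):
    """Step 1: draw len(rows) items from rows with replacement."""
    return [rng.choice(rows) for _ in rows]


def majority_vote(predictions):
    """Step 3, classification: the most common label among the models."""
    return Counter(predictions).most_common(1)[0][0]


def average(predictions):
    """Step 3, regression: the mean of the models' predictions."""
    return mean(predictions)


rng = random.Random(42)
rows = list(range(10))  # stand-in for 10 training rows

# Three bootstrap samples: same size as the data, with repeats and gaps.
samples = [bootstrap_sample(rows, rng) for _ in range(3)]
for sample in samples:
    print(sorted(sample))

print(majority_vote(["cat", "dog", "cat"]))
print(average([2.0, 3.0, 4.0]))
```

Because each sample repeats some rows and misses others, every model sees a slightly different dataset, which is what makes their errors differ.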

Most popular bagging model = Random Forest

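Why Random Forest?

* Creates many Decision Trees
* Each tree is trained on a different bootstrap sample and considers a random subset of features at each split
* Final output = majority vote (classification) or average (regression)

✔ Reduces overfitting compared with a single deep tree
✔ Stable and accurate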
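
A side note on "each tree trained on different data": with sampling with replacement, the chance that a given row is never drawn in `n` draws is (1 - 1/n)^n, which approaches 1/e (about 0.368) as `n` grows. This is a standard fact about bootstrap sampling rather than something stated in these notes, and the function name below is made up. A quick check:

```python
import math


def left_out_fraction(n_rows: int) -> float:
    """Chance that one particular row is missing from a bootstrap sample."""
    return (1 - 1 / n_rows) ** n_rows


# 105 is the number of training rows in a 70/30 split of the 150-row Iris data.
for n in (10, 105, 10_000):
    print(n, round(left_out_fraction(n), 4))

print("limit 1/e:", round(math.exp(-1), 4))
```

So each tree sees only about 63% of the distinct training rows, and the rest can serve as a built-in validation set for that tree.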

In [1]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

In [4]:
data = load_iris()
X = data.data
y = data.target

In [5]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

In [10]:
model = RandomForestClassifier(
    n_estimators=200,       # number of trees in the forest
    max_depth=10,           # control tree growth
    min_samples_split=5,    # minimum samples needed to split a node
    min_samples_leaf=3,     # avoid leaves that are too small
    random_state=42
)

In [11]:
model.fit(X_train, y_train)

In [12]:
y_pred = model.predict(X_test)

In [13]:
print("Accuracy:", accuracy_score(y_test, y_pred))

Accuracy: 1.0
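
An accuracy of 1.0 here comes from a single 45-row test set, so it can shift with a different split. One hedged way to check is k-fold cross-validation. The sketch below uses scikit-learn's `cross_val_score` with the same hyperparameters as above; the choice of 5 folds is an assumption, not something from these notes.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

model = RandomForestClassifier(
    n_estimators=200,
    max_depth=10,
    min_samples_split=5,
    min_samples_leaf=3,
    random_state=42,
)

# For a classifier, cv=5 means stratified 5-fold splits:
# every row is used for testing exactly once.
scores = cross_val_score(model, X, y, cv=5)

print("Fold accuracies:", scores.round(3))
print("Mean accuracy:", scores.mean().round(3))
```

The mean over folds is usually a steadier estimate than one train/test split, especially on a dataset this small.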


# BOOSTING
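🎯 Theory (Very Simple)

Boosting trains models one after another, so each new model can focus on what the previous ones got wrong.

Working Idea:
1️⃣ Train the first weak model
2️⃣ Find its mistakes
3️⃣ Give more weight to the wrong samples (AdaBoost) or fit the next model to the remaining errors (Gradient Boosting)
4️⃣ Train the next model to correct those mistakes
5️⃣ Repeat
6️⃣ Final output = weighted vote (or weighted sum) of all models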
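
The "give more weight to wrong samples" step can be sketched with the classic binary AdaBoost update. This is a simplified illustration, not the exact code scikit-learn runs (its classifier also handles multiple classes); the function name `adaboost_round` is made up.

```python
import math


def adaboost_round(weights, is_correct):
    """One reweighting step of discrete AdaBoost for a binary problem.

    weights    : current sample weights (they sum to 1)
    is_correct : True where the weak model predicted that sample correctly
    Returns (alpha, new_weights), where alpha is the model's vote weight.
    """
    # Weighted error = total weight of the misclassified samples.
    error = sum(w for w, ok in zip(weights, is_correct) if not ok)
    if not 0 < error < 0.5:
        raise ValueError("weak model needs a weighted error in (0, 0.5)")

    # Lower error gives a larger say in the final weighted vote.
    alpha = 0.5 * math.log((1 - error) / error)

    # Shrink weights of correct samples, grow weights of wrong ones, renormalize.
    raw = [w * math.exp(-alpha if ok else alpha) for w, ok in zip(weights, is_correct)]
    total = sum(raw)
    return alpha, [w / total for w in raw]


weights = [0.2] * 5                              # start with equal weights
is_correct = [True, True, True, True, False]     # the last sample was misclassified
alpha, new_weights = adaboost_round(weights, is_correct)

print("alpha:", round(alpha, 3))
print("new weights:", [round(w, 3) for w in new_weights])
```

After the update the single wrong sample carries half of the total weight, so the next weak model is pushed to get it right.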

Boosting = focus on hard samples

Advantages:
✔ Often very high accuracy
✔ Works well for complex data
✔ Reduces bias
Popular Boosting Algorithms:

* AdaBoost
* Gradient Boosting
* XGBoost
* LightGBM
* CatBoost

# AdaBoost

In [24]:
from sklearn.ensemble import AdaBoostClassifier

In [20]:
model = AdaBoostClassifier(
    n_estimators=100,
    learning_rate=0.8
)

In [21]:
model.fit(X_train, y_train)



In [22]:
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))

Accuracy: 0.9777777777777777


# Gradient Boosting

In [23]:
from sklearn.ensemble import GradientBoostingClassifier

gb = GradientBoostingClassifier(
    n_estimators=150,
    learning_rate=0.1,
    max_depth=3
)

gb.fit(X_train, y_train)
y_pred = gb.predict(X_test)

print("Accuracy:", accuracy_score(y_test, y_pred))

Accuracy: 1.0
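
To show what "next model fixes previous mistakes" means for gradient boosting, here is a minimal sketch for regression with squared error, where the "mistakes" are simply the residuals. It is a simplified illustration under stated assumptions (a synthetic sine dataset, made-up function names), not what `GradientBoostingClassifier` does internally for classification.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_gradient_boosting(X, y, n_rounds=50, learning_rate=0.1, max_depth=2):
    """Fit small trees one after another, each on the current residuals."""
    base = float(np.mean(y))            # start from a constant prediction
    pred = np.full(len(y), base)
    trees = []
    for _ in range(n_rounds):
        residuals = y - pred            # what the ensemble still gets wrong
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        tree.fit(X, residuals)
        pred += learning_rate * tree.predict(X)   # take a small corrective step
        trees.append(tree)
    return base, learning_rate, trees


def predict_gradient_boosting(model, X):
    """Sum the constant start value and every tree's scaled correction."""
    base, learning_rate, trees = model
    pred = np.full(X.shape[0], base)
    for tree in trees:
        pred += learning_rate * tree.predict(X)
    return pred


# Synthetic data (an assumption for the demo): noisy sine curve.
rng = np.random.default_rng(0)
X_demo = rng.uniform(-3, 3, size=(200, 1))
y_demo = np.sin(X_demo[:, 0]) + rng.normal(0, 0.1, size=200)

model = fit_gradient_boosting(X_demo, y_demo)
mse_start = np.mean((y_demo - model[0]) ** 2)
mse_end = np.mean((y_demo - predict_gradient_boosting(model, X_demo)) ** 2)

print("Training MSE with constant only:", round(float(mse_start), 4))
print("Training MSE after boosting:", round(float(mse_end), 4))
```

A smaller `learning_rate` makes each correction more cautious, which is why it is usually paired with a larger `n_estimators`.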


# XGBoost

In [14]:
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score

xgb = XGBClassifier(
    n_estimators=150,
    learning_rate=0.1,
    max_depth=3,
    subsample=0.8,            # fraction of rows sampled for each tree
    colsample_bytree=0.8,     # fraction of features sampled for each tree
    random_state=42,
    eval_metric='mlogloss'    # multiclass log loss (Iris has 3 classes)
)

xgb.fit(X_train, y_train)
y_pred = xgb.predict(X_test)

print("Accuracy:", accuracy_score(y_test, y_pred))

Accuracy: 1.0


# STACKING

Stacking = combining the predictions of different model types with a final model

Example:

Model 1 → Logistic Regression

Model 2 → SVM

Model 3 → RandomForest

Their outputs are fed into
➡️ Final Model (Meta Model)
➡️ Makes the final decision

Why Stacking?
✔ Uses the strengths of multiple models
✔ Often matches or beats the best individual model, though not always

In [25]:
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

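# Base models (level 0)
estimators = [
    ('svm', SVC(probability=True)),     # probability=True enables predict_proba
    ('tree', DecisionTreeClassifier())
]

# Meta model (level 1), trained on the base models' predictions
final_model = LogisticRegression()

stack_model = StackingClassifier(
    estimators=estimators,
    final_estimator=final_model
)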

stack_model.fit(X_train, y_train)
y_pred = stack_model.predict(X_test)

print("Accuracy:", accuracy_score(y_test, y_pred))

Accuracy: 1.0
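
To show what "outputs are fed into the meta model" looks like, here is a hand-written sketch of the same idea. The key detail is that the meta model is trained on out-of-fold predictions, so it never sees predictions a base model made on its own training rows. This is an illustration with assumed settings (5 folds, `max_iter=1000`, fixed seeds), not a copy of `StackingClassifier`'s internals.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

base_models = [
    SVC(probability=True, random_state=0),
    DecisionTreeClassifier(random_state=0),
]

# Level 0: out-of-fold class probabilities for every training row.
# Each model contributes 3 columns (one per Iris class).
meta_train = np.hstack([
    cross_val_predict(m, X_train, y_train, cv=5, method="predict_proba")
    for m in base_models
])

# Level 1: the meta model learns how to weigh the base models' outputs.
meta_model = LogisticRegression(max_iter=1000)
meta_model.fit(meta_train, y_train)

# At prediction time the base models are refit on all training rows.
for m in base_models:
    m.fit(X_train, y_train)
meta_test = np.hstack([m.predict_proba(X_test) for m in base_models])

stack_accuracy = meta_model.score(meta_test, y_test)
print("Meta-feature shape:", meta_train.shape)
print("Accuracy:", stack_accuracy)
```

Training the meta model on in-sample predictions instead would let an overfit base model look perfect and dominate the final decision.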
