<a href="https://colab.research.google.com/github/samiha-mahin/A-Machine-Learning-Models-Repo/blob/main/Stacking_Ensemble.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Stacking Ensemble**



## 🌟 What is Stacking Ensemble?

**Stacking** is an ensemble learning method where **multiple different models** are trained, and then **another model (called a meta-model or blender)** learns to combine their predictions for a better final result.

> 🔸 It’s like forming a team where each member is good at different things, and a smart leader decides how much to trust each member’s opinion.

---

## 🏠 Real-Life Example: Predicting House Prices Again

Suppose you're trying to predict house prices based on features like:

* Area
* Number of rooms
* Location
* Age of building
* Nearby schools

You try different models:

* **Linear Regression** (simple but fast)
* **Decision Tree** (understands non-linear rules)
* **KNN** (looks at nearby examples)
* **XGBoost** (powerful but might overfit)

Each model gives its **own prediction** for a house price.

---

## 💡 Here's Where Stacking Comes In:

Instead of choosing the **best one**, stacking says:

> "Why not use **all of them** and then **train another model** to learn from their predictions?"

---

### 🔧 How Stacking Works (Step-by-Step):

#### 🔹 Step 1: Base Models (Level 0)

Train different models independently on your training data. For example:

* Model 1: Linear Regression
* Model 2: Decision Tree
* Model 3: XGBoost

Each of these models gives its **own prediction** on the data.

#### 🔹 Step 2: Meta-Model (Level 1)

Now you train a **new model** (like Linear Regression for a regression task such as house prices, or Logistic Regression for a classification task) on the **predictions of those models**.

This model learns:

* When to trust model 1 more
* When to rely on model 2 or 3
* How to combine them to make the **best possible prediction**

This final model is called the **meta-model** or **blender**.
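
The two steps above can be sketched with scikit-learn's `StackingRegressor`. This is a minimal illustration under assumptions, not the notebook's own pipeline: the data is synthetic (`make_regression` stands in for a house-price table), KNN replaces XGBoost to avoid an extra dependency, and `Ridge` is an arbitrary choice of meta-model.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for a house-price table (assumption: not real data).
X_demo, y_demo = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)

# Step 1 (Level 0): base models of different types.
base_models = [
    ("linear", LinearRegression()),
    ("tree", DecisionTreeRegressor(max_depth=4, random_state=0)),
    ("knn", KNeighborsRegressor(n_neighbors=5)),
]

# Step 2 (Level 1): a meta-model trained on the base models' predictions.
stack = StackingRegressor(estimators=base_models, final_estimator=Ridge(), cv=5)
stack.fit(X_tr, y_tr)

# One learned coefficient per base model.
meta_weights = dict(zip([name for name, _ in base_models], stack.final_estimator_.coef_))
r2 = stack.score(X_te, y_te)
print(meta_weights)
print(f"R^2 on held-out rows: {r2:.3f}")
```

`meta_weights` holds one coefficient per base model, which is the meta-model's learned answer to "how much should I trust each one?".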

---

## 🧠 Intuition with Example

Imagine you’re the judge of a painting contest 🎨.

You ask:

* An **art teacher** (Model 1)
* A **student artist** (Model 2)
* An **AI model** (Model 3)

They each give a score out of 100 for a painting.

You (Meta-Model) don’t just average them. You’ve learned over time:

* The teacher is very accurate for realism.
* The student is great for creativity.
* The AI is balanced but sometimes off.

So you **learn** how much to trust each score based on context.
That’s exactly what stacking does!

---

## 🔁 Stacking vs Other Ensembles

| Method       | How It Works                                                                                   | Models Used                        |
| ------------ | ---------------------------------------------------------------------------------------------- | ---------------------------------- |
| Bagging      | Trains many **same-type** models on different bootstrap samples of the data (e.g., Random Forest = many decision trees) | Same model type                    |
| Boosting     | Trains models **sequentially**, each fixing the last one’s errors                              | Same model type                    |
| **Stacking** | Trains **different models** independently, then a meta-model blends their predictions          | Different model types + meta-model |
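
To make the three rows concrete, here is a small sketch that builds one ensemble of each kind with scikit-learn and scores them on the same synthetic dataset. The dataset, the hyperparameters, and the choice of `GradientBoostingClassifier` as the boosting example are all assumptions made for illustration; the scores say nothing about which method is better in general.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    BaggingClassifier,
    GradientBoostingClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data (assumption: illustration only).
X_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=0)

ensembles = {
    # Bagging: many decision trees, each fit on a bootstrap sample.
    "bagging": BaggingClassifier(DecisionTreeClassifier(), n_estimators=20, random_state=0),
    # Boosting: trees added one after another, each correcting the previous ones.
    "boosting": GradientBoostingClassifier(n_estimators=50, random_state=0),
    # Stacking: different model types blended by a meta-model.
    "stacking": StackingClassifier(
        estimators=[
            ("tree", DecisionTreeClassifier(max_depth=3, random_state=0)),
            ("knn", KNeighborsClassifier()),
        ],
        final_estimator=LogisticRegression(),
        cv=5,
    ),
}

# Mean 3-fold cross-validated accuracy for each ensemble.
scores = {name: cross_val_score(model, X_demo, y_demo, cv=3).mean()
          for name, model in ensembles.items()}
print(scores)
```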

---

## 📌 Summary Table

| Feature                | Explanation                               |
| ---------------------- | ----------------------------------------- |
| Goal                   | Combine strengths of different models     |
| Uses Different Models? | ✅ Yes                                     |
| Learns How to Combine? | ✅ Yes, using a meta-model                 |
| Powerful?              | ✅ Often beats any single base model, at the cost of extra training time |
| Example Models Used    | Linear, Tree, SVM, XGBoost + a meta-model |
| Good For?              | Tabular data, competitions, robust models |



# **K-Fold Stacking** and **Multi-Layer Stacking**

## 1. What is **K-Fold Stacking**?

### ✨ Quick Reminder:

In **normal stacking**, the base models are trained on the training set, and their predictions on that same training set are used to train the meta-model.

But this has a **problem**:

> If the same data is used to train the base models **and** the meta-model, the stack can **overfit**: base-model predictions on rows they have already seen look better than they really are, so the meta-model learns to over-trust them.
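
The size of this effect can be seen in a small sketch. It assumes a synthetic dataset with some label noise and an unpruned decision tree as the base model; both are illustrative choices. The tree's predictions on its own training rows are perfect, while its predictions on held-out rows are not, so a meta-model fed the first kind would see a misleadingly reliable input.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with 10% randomly flipped labels (assumption: illustration only).
X_demo, y_demo = make_classification(n_samples=300, n_features=10, flip_y=0.1, random_state=0)

# An unpruned tree can memorize its training set.
tree = DecisionTreeClassifier(random_state=0)

# Predictions on the very rows the tree was trained on.
in_sample = tree.fit(X_demo, y_demo).predict(X_demo)

# Predictions where each row is scored by a tree that never saw it.
out_of_fold = cross_val_predict(tree, X_demo, y_demo, cv=5)

in_sample_acc = accuracy_score(y_demo, in_sample)
oof_acc = accuracy_score(y_demo, out_of_fold)
print(f"in-sample accuracy:   {in_sample_acc:.3f}")
print(f"out-of-fold accuracy: {oof_acc:.3f}")
```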

### ✅ Solution: **K-Fold Stacking**

This fixes the overfitting problem by using **cross-validation**.

---

### 🧠 How It Works:

#### 🔄 Step-by-Step:

1. Split your training data into **K folds** (e.g., K = 5).
2. For each fold:

   * Train the base models on K-1 folds.
   * Make predictions on the 1 fold left out (validation fold).
3. Collect all those predictions across the K folds.
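4. Now use these predictions as input to train the **meta-model**.
5. Finally, refit each base model on the **full training set** so it can feed the meta-model when predicting on new data.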

> 🔁 This way, the meta-model only sees **out-of-fold predictions** (predictions for rows the base model was not trained on), making it more honest and reliable.
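
The steps above can be written out by hand. This is a sketch under assumptions rather than a reference implementation: the data is synthetic, the two base models are arbitrary, and class-1 probabilities are used as the meta-model's inputs.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic data and two arbitrary base models (assumptions for illustration).
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=0)
base_models = [DecisionTreeClassifier(max_depth=3, random_state=0), KNeighborsClassifier()]

# One column of out-of-fold predictions per base model.
oof = np.zeros((len(X_demo), len(base_models)))
kfold = KFold(n_splits=5, shuffle=True, random_state=0)

for train_idx, valid_idx in kfold.split(X_demo):
    for col, model in enumerate(base_models):
        # Train a fresh copy on K-1 folds, predict the fold left out.
        fitted = clone(model).fit(X_demo[train_idx], y_demo[train_idx])
        oof[valid_idx, col] = fitted.predict_proba(X_demo[valid_idx])[:, 1]

# The meta-model is trained only on out-of-fold predictions.
meta_model = LogisticRegression().fit(oof, y_demo)

# For new data, base models are refit on all training rows.
final_base = [clone(model).fit(X_demo, y_demo) for model in base_models]
print(oof[:3])
```

scikit-learn's `StackingClassifier` performs the same bookkeeping internally when given a `cv` argument, so the manual loop is mainly useful for seeing what happens.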

---

### 📦 Simple Example (5-Fold Stacking):

| Fold | Base Model Trained On | Predicts For |
| ---- | --------------------- | ------------ |
| 1    | Folds 2–5             | Fold 1       |
| 2    | Folds 1,3,4,5         | Fold 2       |
| 3    | Folds 1,2,4,5         | Fold 3       |
| 4    | Folds 1,2,3,5         | Fold 4       |
| 5    | Folds 1–4             | Fold 5       |

Then these predictions are stacked and used to train the meta-model.
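
The same train/predict pattern can be printed with `KFold`. The sketch below assumes a toy dataset of 10 rows and no shuffling, so each fold is simply two consecutive rows.

```python
import numpy as np
from sklearn.model_selection import KFold

# Ten toy row indices (assumption: stands in for a real training set).
rows = np.arange(10)

fold_table = []
for fold, (train_idx, valid_idx) in enumerate(KFold(n_splits=5).split(rows), start=1):
    fold_table.append((fold, train_idx.tolist(), valid_idx.tolist()))
    print(f"Fold {fold}: train on rows {train_idx.tolist()}, predict rows {valid_idx.tolist()}")
```

Every row appears in exactly one "predict" set, which is why the collected predictions cover the whole training set.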

---

## 🌟 Why K-Fold Stacking is Awesome:

| Benefit             | Why It Matters                                  |
| ------------------- | ----------------------------------------------- |
| 🧠 Less Overfitting | Meta-model never trains on predictions for rows the base models already saw |
| 📊 Better Accuracy  | Uses out-of-fold predictions                    |
| 💡 More Reliable    | Mimics real test-time conditions                |

---

## 🌠 2. What is **Multi-Layer Stacking** (a.k.a. Deep Stacking)?

### ✨ Idea:

Just like neural networks have multiple **layers**, **stacking can also be multi-layered**.

> In **multi-layer stacking**, the output of one stack becomes the **input to the next level of stacking**.

---

### 🧠 How It Works:

Let’s say you use 3 levels:

* **Level 0**: Base models (e.g., Decision Tree, KNN, XGBoost) → Each gives a prediction
* **Level 1**: Meta-model (e.g., Logistic Regression) learns from Level 0 predictions
* **Level 2**: Another model (e.g., Random Forest) learns from Level 1's predictions

Each layer tries to **learn from the mistakes** or **patterns** missed by the previous layer.
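
In scikit-learn, one way to get several layers is to pass a `StackingClassifier` as the `final_estimator` of another `StackingClassifier`. The sketch below mirrors the three levels described above on synthetic data; the dataset, the hyperparameters, and the omission of XGBoost from Level 0 are assumptions made to keep the example small.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data (assumption: illustration only).
X_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)

# Levels 1 and 2: logistic regression feeds a random forest.
upper_levels = StackingClassifier(
    estimators=[("logreg", LogisticRegression())],
    final_estimator=RandomForestClassifier(n_estimators=20, random_state=0),
    cv=5,
)

# Level 0: base models whose predictions feed the upper levels.
deep_stack = StackingClassifier(
    estimators=[
        ("tree", DecisionTreeClassifier(max_depth=3, random_state=0)),
        ("knn", KNeighborsClassifier()),
    ],
    final_estimator=upper_levels,
    cv=5,
)

deep_stack.fit(X_tr, y_tr)
deep_acc = deep_stack.score(X_te, y_te)
print(f"held-out accuracy: {deep_acc:.3f}")
```

Each extra layer runs its own cross-validation, so training time grows quickly, and on a small dataset the added layers may not improve on a single meta-model.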

---

### 🏠 Simple Real-Life Analogy:

Think of:

* **Level 0**: School teachers giving you feedback
* **Level 1**: Your private tutor combining all that feedback
* **Level 2**: A coach helping you apply everything effectively in exams

Each level **builds smarter predictions** based on the previous layer’s knowledge.

---

## 🔁 Comparison Table

| Feature             | K-Fold Stacking                  | Multi-Layer Stacking                 |
| ------------------- | -------------------------------- | ------------------------------------ |
| Solves Overfitting? | ✅ Yes (uses out-of-fold preds)   | ⚠️ Only if combined with K-fold; extra layers add overfitting risk otherwise |
| Complexity          | Medium                           | High                                 |
| Layers              | Two levels (base + meta)         | Multiple layers (like deep learning) |
| Best For            | Avoiding overfitting in stacking | Very complex problems, competitions  |

---

## ✅ Summary

| Term                     | Meaning in Simple Words                                                 |
| ------------------------ | ----------------------------------------------------------------------- |
| **K-Fold Stacking**      | Makes stacking smarter by using cross-validation                        |
| **Multi-Layer Stacking** | Stacking inside stacking! One level learns from another, layer by layer |




In [1]:
import numpy as np
import pandas as pd

In [2]:
df = pd.read_csv('/content/heart.csv')
df.head()

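|     | age | sex | cp  | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca  | thal | target |
| --- | --- | --- | --- | -------- | ---- | --- | ------- | ------- | ----- | ------- | ----- | --- | ---- | ------ |
| 0   | 63  | 1   | 3   | 145      | 233  | 1   | 0       | 150     | 0     | 2.3     | 0     | 0   | 1    | 1      |
| 1   | 37  | 1   | 2   | 130      | 250  | 0   | 1       | 187     | 0     | 3.5     | 0     | 0   | 2    | 1      |
| 2   | 41  | 0   | 1   | 130      | 204  | 0   | 0       | 172     | 0     | 1.4     | 2     | 0   | 2    | 1      |
| 3   | 56  | 1   | 1   | 120      | 236  | 0   | 1       | 178     | 0     | 0.8     | 2     | 0   | 2    | 1      |
| 4   | 57  | 0   | 0   | 120      | 354  | 0   | 1       | 163     | 1     | 0.6     | 2     | 0   | 2    | 1      |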


In [3]:
X = df.drop(columns=['target'])
y = df['target']

In [4]:
X

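|     | age | sex | cp  | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca  | thal |
| --- | --- | --- | --- | -------- | ---- | --- | ------- | ------- | ----- | ------- | ----- | --- | ---- |
| 0   | 63  | 1   | 3   | 145      | 233  | 1   | 0       | 150     | 0     | 2.3     | 0     | 0   | 1    |
| 1   | 37  | 1   | 2   | 130      | 250  | 0   | 1       | 187     | 0     | 3.5     | 0     | 0   | 2    |
| 2   | 41  | 0   | 1   | 130      | 204  | 0   | 0       | 172     | 0     | 1.4     | 2     | 0   | 2    |
| 3   | 56  | 1   | 1   | 120      | 236  | 0   | 1       | 178     | 0     | 0.8     | 2     | 0   | 2    |
| 4   | 57  | 0   | 0   | 120      | 354  | 0   | 1       | 163     | 1     | 0.6     | 2     | 0   | 2    |
| ... | ... | ... | ... | ...      | ...  | ... | ...     | ...     | ...   | ...     | ...   | ... | ...  |
| 298 | 57  | 0   | 0   | 140      | 241  | 0   | 1       | 123     | 1     | 0.2     | 1     | 0   | 3    |
| 299 | 45  | 1   | 3   | 110      | 264  | 0   | 1       | 132     | 0     | 1.2     | 1     | 0   | 3    |
| 300 | 68  | 1   | 0   | 144      | 193  | 1   | 1       | 141     | 0     | 3.4     | 1     | 2   | 3    |
| 301 | 57  | 1   | 0   | 130      | 131  | 0   | 1       | 115     | 1     | 1.2     | 1     | 1   | 3    |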


In [5]:
y

|     | target |
| --- | ------ |
| 0   | 1      |
| 1   | 1      |
| 2   | 1      |
| 3   | 1      |
| 4   | 1      |
| ... | ...    |
| 298 | 0      |
| 299 | 0      |
| 300 | 0      |
| 301 | 0      |


In [6]:
from sklearn.model_selection import train_test_split

# Hold out 20% of the rows for testing; random_state fixes the split.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=8)

In [7]:
X_train.shape

(242, 13)

In [8]:
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

In [13]:
# Level 0: three base models of different types.
estimators = [
    ('rf', RandomForestClassifier(n_estimators=10, random_state=42)),
    ('knn', KNeighborsClassifier(n_neighbors=10)),
    ('gbdt', GradientBoostingClassifier())
]

In [14]:
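from sklearn.ensemble import StackingClassifier

# Level 1: logistic regression blends the base models' predictions.
# cv=10 trains it on 10-fold out-of-fold predictions (K-fold stacking).
clf = StackingClassifier(
    estimators=estimators,
    final_estimator=LogisticRegression(),
    cv=10
)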

In [15]:
clf.fit(X_train,y_train)

In [16]:
y_pred = clf.predict(X_test)

In [17]:
from sklearn.metrics import accuracy_score
accuracy_score(y_test,y_pred)

0.8688524590163934
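
A single accuracy number does not show whether stacking helped. One way to check is to score each base model on its own next to the stacked model and to look at the meta-model's coefficients. The sketch below does this on synthetic data, because `heart.csv` is not bundled here; the `demo_` names, the dataset shape, and the added `random_state` on the gradient boosting model are assumptions, and the numbers it prints are not results for the heart dataset.

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in with the same shape as the notebook's feature table (assumption).
X_demo, y_demo = make_classification(n_samples=303, n_features=13, n_informative=6, random_state=8)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=8)

demo_estimators = [
    ("rf", RandomForestClassifier(n_estimators=10, random_state=42)),
    ("knn", KNeighborsClassifier(n_neighbors=10)),
    ("gbdt", GradientBoostingClassifier(random_state=42)),
]
demo_clf = StackingClassifier(
    estimators=demo_estimators,
    final_estimator=LogisticRegression(),
    cv=10,
)

# Score every base model alone, then the stacked model.
results = {
    name: accuracy_score(y_te, clone(model).fit(X_tr, y_tr).predict(X_te))
    for name, model in demo_estimators
}
results["stack"] = accuracy_score(y_te, demo_clf.fit(X_tr, y_tr).predict(X_te))

# One meta-model coefficient per base model.
meta_coef = demo_clf.final_estimator_.coef_
print(results)
print(meta_coef)
```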