### Stacking Method 

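**Stacking** is an **ensemble learning** method in which the predictions of several base models are fed to a **meta-model**, which learns how to combine them into the final prediction. This often gives better accuracy than any single base model. 🚀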

**Stacking (Stacked Ensemble) 📊🔥**  

✅ **How does it work?**  
1️⃣ **Train Different Models** 🏋️‍♂️  
   - Train several different models (e.g., Decision Tree, SVM, Random Forest) on the same training data. These are the **base models**.  

2️⃣ **Meta-Model Training** 🤖  
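   - The base models' predictions are used as **input features** for a **meta-model** (e.g., Logistic Regression).  
   - To avoid leakage, these predictions are normally produced with cross-validation (out-of-fold), so the meta-model never trains on predictions a base model made for rows it was itself trained on.  
   - The meta-model learns how much to trust each base model and how to combine their predictions.  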

3️⃣ **Final Prediction** 🎯  
   - The meta-model makes the final decision based on the stacked outputs.  
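The three steps above can be sketched by hand. This is a minimal illustration, not the notebook's code: it assumes synthetic data from `make_classification` and two illustrative base models (a depth-5 Decision Tree and a 7-neighbor KNN). Out-of-fold probabilities from `cross_val_predict` become the meta-model's training features, which is the same idea scikit-learn's `StackingClassifier` applies internally.

```python
# Hand-rolled stacking sketch on synthetic data (make_classification and the
# chosen base models are illustrative assumptions, not the notebook's setup).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

base_models = [
    DecisionTreeClassifier(max_depth=5, random_state=0),
    KNeighborsClassifier(n_neighbors=7),
]

# Steps 1 + 2: out-of-fold positive-class probabilities become meta-features,
# so no base model scores rows it was trained on.
meta_train = np.column_stack([
    cross_val_predict(m, X_train, y_train, cv=5, method="predict_proba")[:, 1]
    for m in base_models
])

# Refit every base model on the full training set for prediction time.
for m in base_models:
    m.fit(X_train, y_train)

meta_model = LogisticRegression().fit(meta_train, y_train)

# Step 3: base-model outputs on new data -> meta-model -> final prediction.
meta_test = np.column_stack([m.predict_proba(X_test)[:, 1] for m in base_models])
y_pred = meta_model.predict(meta_test)
acc = (y_pred == y_test).mean()
print(f"stacked accuracy: {acc:.3f}")
```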

🔹 **Why use it?**  
✅ Can outperform simple voting, because the combination is learned from data instead of fixed  
✅ Learns which base models to trust more  
✅ Can compensate for the weaknesses of individual models 🚀
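Whether stacking actually beats voting depends on the data, so it is worth measuring rather than assuming. A minimal sketch, assuming synthetic `make_classification` data and three illustrative base models, that scores soft voting and stacking with the same 5-fold cross-validation:

```python
# Compare soft voting and stacking on identical base models (synthetic data
# and model choices are illustrative assumptions).
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=1)

base = [
    ("dt", DecisionTreeClassifier(max_depth=4, random_state=1)),
    ("knn", KNeighborsClassifier()),
    ("lr", LogisticRegression(max_iter=1000)),
]

# Voting averages predicted probabilities with fixed, equal weights;
# stacking fits a LogisticRegression to learn how to weight them.
voting = VotingClassifier(estimators=base, voting="soft")
stacking = StackingClassifier(estimators=base, final_estimator=LogisticRegression())

scores = {}
for name, clf in [("voting", voting), ("stacking", stacking)]:
    scores[name] = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name}: {scores[name]:.3f}")
```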

In [1]:
import warnings
import pandas as pd 
import numpy as np 

# models 
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression 

# model selection 
from sklearn.model_selection import train_test_split 
from sklearn.model_selection import GridSearchCV 

# stacking
from sklearn.ensemble import StackingClassifier

# metric 
from sklearn.metrics import accuracy_score, confusion_matrix

# visualization 
import matplotlib.pyplot as plt 
import seaborn as sns

warnings.filterwarnings('ignore')

In [2]:
# import dataset 

df = pd.read_csv('./preprocessedData/cleanData.csv')

In [3]:
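# features (X)
independent = df.drop(columns = 'target')

# target (y) as a 1-D Series; a one-column DataFrame triggers scikit-learn's column-vector y warning
dependent = df['target']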

In [4]:
x_train, x_test, y_train, y_test = train_test_split(independent, dependent, test_size = 0.3, stratify = dependent, random_state = 42)
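`stratify = dependent` keeps the class proportions the same in the training and test sets, which matters when classes are imbalanced. A small sketch with hypothetical labels (70 negatives, 30 positives) to show the effect:

```python
# Hypothetical imbalanced labels: 70 zeros and 30 ones.
import numpy as np
from sklearn.model_selection import train_test_split

y = np.array([0] * 70 + [1] * 30)
X = np.arange(100).reshape(-1, 1)  # dummy single-feature matrix

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# With stratify, both splits keep roughly the original 30% positive rate.
print("train positive rate:", y_tr.mean())
print("test positive rate:", y_te.mean())
```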

In [5]:
# params 

rf_params = {
    "n_estimators": [50, 100, 200, 500],  
    "max_depth": [10, 20, 30], 
    "min_samples_split": [10, 20, 30],  
    "min_samples_leaf": [10, 20, 30], 
    "bootstrap": [True, False] 
}

dt_params = {
    "criterion": ["gini", "entropy"], 
    "max_depth": [10, 20, 30],  
    "min_samples_split": [2, 5, 10],  
    "min_samples_leaf": [1, 2, 4]  
}

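# each penalty must be paired with a solver that supports it: lbfgs handles only l2,
# and elasticnet needs saga plus an l1_ratio (0.5 here is an assumed value)
lg_params = [
    {"penalty": ["l1"], "C": [0.1, 1, 10, 100], "solver": ["liblinear", "saga"]},
    {"penalty": ["l2"], "C": [0.1, 1, 10, 100], "solver": ["liblinear", "lbfgs", "saga"]},
    {"penalty": ["elasticnet"], "C": [0.1, 1, 10, 100], "solver": ["saga"], "l1_ratio": [0.5]},
]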

knn_params = {
    "n_neighbors": [3, 5, 7, 9],  
    "weights": ["uniform", "distance"],  
    "metric": ["euclidean", "manhattan", "minkowski"] 
}
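Grid sizes multiply quickly, and every candidate is fitted once per CV fold. A minimal sketch using `ParameterGrid` to count the work for the Random Forest, Decision Tree and KNN grids above (copied here so the snippet runs on its own), assuming the `cv = 5` used in the tuning loop:

```python
# Count grid-search candidates and fits before running GridSearchCV.
from sklearn.model_selection import ParameterGrid

grids = {
    "Random Forest": {
        "n_estimators": [50, 100, 200, 500],
        "max_depth": [10, 20, 30],
        "min_samples_split": [10, 20, 30],
        "min_samples_leaf": [10, 20, 30],
        "bootstrap": [True, False],
    },
    "Decision Tree": {
        "criterion": ["gini", "entropy"],
        "max_depth": [10, 20, 30],
        "min_samples_split": [2, 5, 10],
        "min_samples_leaf": [1, 2, 4],
    },
    "K-Neighbors": {
        "n_neighbors": [3, 5, 7, 9],
        "weights": ["uniform", "distance"],
        "metric": ["euclidean", "manhattan", "minkowski"],
    },
}

CV_FOLDS = 5  # matches cv = 5 in the tuning loop
fit_counts = {}
for name, grid in grids.items():
    n_candidates = len(ParameterGrid(grid))
    fit_counts[name] = n_candidates * CV_FOLDS  # plus one final refit per search
    print(f"{name}: {n_candidates} candidates -> {fit_counts[name]} fits")
```

The Random Forest grid alone accounts for 1080 fits, so trimming it is the quickest way to speed up tuning.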

In [6]:
# models 

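# fixed random_state makes the tree-based models reproducible
rf_model = RandomForestClassifier(random_state = 42)

dt_model = DecisionTreeClassifier(random_state = 42)

# extra iterations give lbfgs/saga room to converge
lg_model = LogisticRegression(max_iter = 1000)

knn_model = KNeighborsClassifier()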

In [7]:
modelsPreforms = [
    ("Random Forest", rf_params, rf_model),
    ("Decision Tree", dt_params, dt_model),
    ("Logistic Regression", lg_params, lg_model),
    ("K-Neighbors", knn_params, knn_model),
]

In [8]:
models = []

# tune each model with 5-fold grid search and keep its best estimator
for name, params, estimator in modelsPreforms:
    search = GridSearchCV(estimator = estimator, param_grid = params, cv = 5, n_jobs = -1)
    search.fit(x_train, y_train)
    models.append((name, search.best_estimator_))
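Besides `best_estimator_`, a fitted `GridSearchCV` exposes `best_params_` and `best_score_` (mean cross-validated score of the best candidate), which are worth logging. Note that `StackingClassifier` fits clones of its estimators, so what carries over into the stack is the tuned hyperparameters, not the fitted state. A minimal sketch on synthetic data with a small, illustrative grid:

```python
# Inspect a grid search result (synthetic data and the tiny grid are assumptions).
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=6, random_state=0)

search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [2, 4, 8]},
    cv=5,
)
search.fit(X, y)

print("best params:", search.best_params_)
print(f"mean CV accuracy: {search.best_score_:.3f}")

# refit=True (the default) retrains the best candidate on all of X, y
best = search.best_estimator_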

In [9]:
estimators = [
    ('Random Forest',  models[0][1]),
    ('Decision Tree', models[1][1]),
    ('lg', models[2][1]),
    ('knn', models[3][1]),
]

In [10]:
meta_model = LogisticRegression()

In [11]:
# the meta-model is trained on 5-fold out-of-fold predictions of the base models
stackModel = StackingClassifier(estimators = estimators, final_estimator = meta_model, cv = 5)
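What the meta-model actually sees can be inspected with `transform`. For binary classification with `predict_proba`, scikit-learn keeps only the positive-class column, so each base model contributes one meta-feature; `passthrough=True` appends the original features as well. A minimal sketch on synthetic data with 5 features (an assumption) and the same four model types as this notebook:

```python
# Inspect the meta-features a StackingClassifier builds (synthetic binary data).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

estimators = [
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("dt", DecisionTreeClassifier(random_state=0)),
    ("lg", LogisticRegression()),
    ("knn", KNeighborsClassifier()),
]

stack = StackingClassifier(
    estimators=estimators,
    final_estimator=LogisticRegression(),
    cv=5,
    stack_method="predict_proba",
).fit(X, y)
print(stack.transform(X).shape)  # (200, 4): one probability column per base model

stack_pt = StackingClassifier(
    estimators=estimators,
    final_estimator=LogisticRegression(),
    cv=5,
    stack_method="predict_proba",
    passthrough=True,
).fit(X, y)
print(stack_pt.transform(X).shape)  # (200, 9): 4 meta-features + 5 original features
```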

In [12]:
stackModel.fit(x_train, y_train)

In [13]:
# accuracy on the held-out test set

predT = stackModel.predict(x_test)
accuracy_score(y_true = y_test, y_pred = predT)

0.8916666666666667
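The test accuracy alone does not show whether stacking helped; the fitted base models are available in `named_estimators_`, so they can be scored on the same split. A minimal sketch, assuming synthetic data and illustrative base models; in this notebook the same loop would run over `stackModel.named_estimators_` with `x_test` and `y_test`:

```python
# Compare each fitted base model with the stack on one test split
# (synthetic data and model settings are illustrative assumptions).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, n_informative=5, random_state=42)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=42)),
        ("dt", DecisionTreeClassifier(max_depth=5, random_state=42)),
        ("lg", LogisticRegression(max_iter=1000)),
        ("knn", KNeighborsClassifier()),
    ],
    final_estimator=LogisticRegression(),
).fit(x_train, y_train)

# named_estimators_ holds the base models refitted on the full training set
results = {name: accuracy_score(y_test, est.predict(x_test))
           for name, est in stack.named_estimators_.items()}
results["stack"] = accuracy_score(y_test, stack.predict(x_test))

for name, acc in results.items():
    print(f"{name:>6}: {acc:.3f}")
```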

In [14]:
# training-set accuracy: compare with the test accuracy above to gauge overfitting
predT = stackModel.predict(x_train)
accuracy_score(y_true = y_train, y_pred = predT)

0.9071428571428571
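Training accuracy (about 0.907) is only about 1.5 percentage points above test accuracy (about 0.892), which suggests the stack is not heavily overfitting. A single 70/30 split is still a noisy estimate, though. A minimal sketch, assuming synthetic `make_classification` data and illustrative base models, of reporting a mean and spread across stratified folds with `cross_val_score`:

```python
# Cross-validated accuracy of a stacked model (data and models are assumptions).
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=7)

stack = StackingClassifier(
    estimators=[
        ("dt", DecisionTreeClassifier(max_depth=5, random_state=7)),
        ("knn", KNeighborsClassifier()),
        ("lg", LogisticRegression(max_iter=1000)),
    ],
    final_estimator=LogisticRegression(),
)

# Shuffled, stratified folds keep class balance in every outer split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=7)
scores = cross_val_score(stack, X, y, cv=cv, scoring="accuracy")
print(f"accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```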