<p style="background-color:mediumseagreen;font-family:newtimeroman;font-size:200%;color:black;text-align:center;border-radius:60px 20px;"><b>Stacking Method - Titanic Dataset</b></p>

# <font color='mediumseagreen'> <b>Introduction</b><font color='black'>  

In machine learning, combining multiple models to create a stronger and more generalizable model is a common approach. Techniques that do this are known as **ensemble methods**, and one of the most popular among them is **Stacking**.

In this notebook, we will explore the stacking method and apply it to a classification problem on the **Titanic** dataset.

---

**What is Stacking?**

**Stacking** is an ensemble method that aims to improve predictive performance by training a second-level model on the predictions of several first-level machine learning models, so that the combined model is more robust than any of its parts.

**The Logic of Stacking:**
- Different machine learning algorithms may make different predictions on the data.
- Each algorithm has its strengths and weaknesses. Stacking aims to leverage the strengths of these models.
- First, several base models are trained, and their predictions are then used as input features by a **meta model** (a second-level model).
- The meta model combines the predictions of the base models to make the final prediction.

**Stacking Steps:**
1. **Base Models:** Various machine learning algorithms are trained in the first layer.
2. **Meta Model:** In the second layer, a meta model is trained that takes the predictions of the base models as input and makes the final prediction based on them.
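
The two steps above can be sketched by hand. This is a minimal illustration, not the method used later in the notebook: it assumes synthetic data from `make_classification` instead of the Titanic data, and the choice of base models, the meta model, and all variable names (`oof_features`, `meta_model`, and so on) are illustrative assumptions. The meta model is trained on out-of-fold predictions so that it never sees a prediction made on a row the base model was fitted on.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption; not the Titanic data)
X_demo, y_demo = make_classification(n_samples=400, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.25, random_state=0)

# Layer 1: base models
base_models = [
    KNeighborsClassifier(),
    DecisionTreeClassifier(max_depth=3, random_state=0),
]

# Out-of-fold probability of class 1 for every training row, one column per base model
oof_features = np.column_stack([
    cross_val_predict(m, X_tr, y_tr, cv=5, method="predict_proba")[:, 1]
    for m in base_models
])

# Layer 2: the meta model learns from the base-model predictions
meta_model = LogisticRegression()
meta_model.fit(oof_features, y_tr)

# For new data, each base model is refitted on the full training split first
test_features = np.column_stack([
    m.fit(X_tr, y_tr).predict_proba(X_te)[:, 1] for m in base_models
])
stacked_pred = meta_model.predict(test_features)
print(oof_features.shape, stacked_pred.shape)
```

scikit-learn's `StackingClassifier` packages this same pattern behind a single `fit`/`predict` interface.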

**Advantages of Stacking:**
- **Generalization Ability:** Can reduce the risk of overfitting that a single model may suffer from.
- **Performance:** By combining the strengths of different models, stacking can achieve higher accuracy.
- **Flexibility:** Different model types (tree-based models, linear models, etc.) can be used together.

**Popular Stacking Applications:**
- **Kaggle Competitions:** Stacking is widely used by Kaggle competitors.
- **Big Data and Complex Problems:** Stacking can be effective for complex datasets and difficult classification/regression problems.


# <font color='mediumseagreen'> <b>Loading The Dataset</b><font color='black'> 

The **Titanic** dataset consists of three files:

- **train.csv:** labeled data used to train the model.
- **test.csv:** unlabeled data used to make predictions.
- **gender_submission.csv:** a sample file showing the format in which the prediction results should be presented.

In [1]:
# Importing necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Suppressing warnings
import warnings
warnings.filterwarnings('ignore')

In [3]:
train_df = pd.read_csv("train.csv")
train_df.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


In [5]:
train_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   PassengerId  891 non-null    int64  
 1   Survived     891 non-null    int64  
 2   Pclass       891 non-null    int64  
 3   Name         891 non-null    object 
 4   Sex          891 non-null    object 
 5   Age          714 non-null    float64
 6   SibSp        891 non-null    int64  
 7   Parch        891 non-null    int64  
 8   Ticket       891 non-null    object 
 9   Fare         891 non-null    float64
 10  Cabin        204 non-null    object 
 11  Embarked     889 non-null    object 
dtypes: float64(2), int64(5), object(5)
memory usage: 83.7+ KB


# <font color='mediumseagreen'> <b>Data Preprocessing</b><font color='black'> 

In this section, we handle the missing values and prepare the dataset for the machine learning models.

In [7]:
train_df.isnull().sum()

PassengerId      0
Survived         0
Pclass           0
Name             0
Sex              0
Age            177
SibSp            0
Parch            0
Ticket           0
Fare             0
Cabin          687
Embarked         2
dtype: int64

In [9]:
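# Fill missing ages with the median (assignment avoids the chained inplace fillna, which recent pandas versions warn about)
train_df['Age'] = train_df['Age'].fillna(train_df['Age'].median())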

In [11]:
# Fill missing embarkation ports with the most frequent value
train_df['Embarked'] = train_df['Embarked'].fillna(train_df['Embarked'].mode()[0])
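
The cells above compute the median and the mode on the whole training file before the train/validation split. A possible alternative, sketched here under stated assumptions, is to compute the fill values on the training part only and reuse them for the validation part. The `toy` frame, its values, and the fixed 6/2 row split are hypothetical and only serve to keep the example small.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature frame with the same two incomplete columns
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, 35.0, 54.0, 30.0, np.nan, 27.0],
    "Embarked": ["S", "C", np.nan, "S", "Q", "S", np.nan, "C"],
})

# Fixed split for illustration: first six rows train, last two validation
toy_train = toy.iloc[:6].copy()
toy_val = toy.iloc[6:].copy()

# Fill values are learned from the training rows only
age_median = toy_train["Age"].median()
port_mode = toy_train["Embarked"].mode()[0]

# The same values are applied to both parts
for part in (toy_train, toy_val):
    part["Age"] = part["Age"].fillna(age_median)
    part["Embarked"] = part["Embarked"].fillna(port_mode)

print(age_median, port_mode)
print(toy_val)
```

On a dataset of this size the difference is likely small, but the pattern keeps information from the validation rows out of the preprocessing step.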

In [15]:
train_df.drop(columns=["Cabin"], inplace=True)
train_df.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,S


In [21]:
X = train_df.drop(columns= ["PassengerId", "Survived", "Name", "Ticket"])
y = train_df["Survived"]

In [25]:
from sklearn.model_selection import train_test_split

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

In [42]:
X.head()

Unnamed: 0,Pclass,Sex,Age,SibSp,Parch,Fare,Embarked
0,3,male,22.0,1,0,7.25,S
1,1,female,38.0,1,0,71.2833,C
2,3,female,26.0,0,0,7.925,S
3,1,female,35.0,1,0,53.1,S
4,3,male,35.0,0,0,8.05,S


In [31]:
categorical_features = ["Sex", "Embarked"]

In [33]:
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
preprocessor = ColumnTransformer(
    transformers=[
        ('cat', OneHotEncoder(drop='first', sparse_output=False), categorical_features)
    ],
    remainder=StandardScaler()  # Applies StandardScaler to non-categorical columns
)

In [35]:
pipeline = Pipeline(steps=[('preprocessor', preprocessor)])

In [37]:
X_train = pipeline.fit_transform(X_train)
X_val = pipeline.transform(X_val)

In [39]:
print("Transformed Training Data:")
print(X_train)
print("Transformed Validation Data:")
print(X_val)

Transformed Training Data:
[[ 1.          0.          1.         ... -0.47072241 -0.47934164
  -0.07868358]
 [ 1.          0.          1.         ... -0.47072241 -0.47934164
  -0.37714494]
 [ 1.          0.          1.         ... -0.47072241 -0.47934164
  -0.47486697]
 ...
 [ 1.          0.          1.         ...  1.23056874 -0.47934164
  -0.35580399]
 [ 0.          0.          1.         ...  0.37992316  2.04874166
   1.68320121]
 [ 1.          0.          1.         ... -0.47072241  0.78470001
   0.86074761]]
Transformed Validation Data:
[[ 1.          0.          0.         ...  0.37992316  0.78470001
  -0.33390078]
 [ 1.          0.          1.         ... -0.47072241 -0.47934164
  -0.42528387]
 [ 1.          0.          1.         ... -0.47072241 -0.47934164
  -0.47486697]
 ...
 [ 0.          0.          1.         ...  0.37992316  5.8408666
  -0.02308312]
 [ 0.          0.          1.         ... -0.47072241 -0.47934164
  -0.42528387]
 [ 0.          0.          1.         ...  

In [44]:
train_df.isnull().sum()

PassengerId    0
Survived       0
Pclass         0
Name           0
Sex            0
Age            0
SibSp          0
Parch          0
Ticket         0
Fare           0
Embarked       0
dtype: int64

# <font color='mediumseagreen'> <b>Training Base Learners</b><font color='black'> 

In this section, two sets of base learners are trained: V1 (Logistic Regression, Random Forest and K-Nearest Neighbors) and V2 (CatBoost, XGBoost and LightGBM).

In [46]:
# V1 Base Learners - Logistic Regression, RandomForest, KNN

from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier

In [48]:
# Define and train V1 base learners
log_clf = LogisticRegression()
log_clf.fit(X_train, y_train)

In [50]:
rf_clf = RandomForestClassifier()
rf_clf.fit(X_train, y_train)

In [52]:
knn_clf = KNeighborsClassifier()
knn_clf.fit(X_train, y_train)

In [54]:
# V2 Base Learners - CatBoost, XGBoost, LightGBM

from catboost import CatBoostClassifier
import xgboost as xgb
import lightgbm as lgb

In [56]:
# Define and train V2 base learners
catboost_clf = CatBoostClassifier(verbose=0)
catboost_clf.fit(X_train, y_train)

<catboost.core.CatBoostClassifier at 0x2ae13277f50>

In [58]:
xgboost_clf = xgb.XGBClassifier()
xgboost_clf.fit(X_train, y_train)

In [60]:
lightgbm_clf = lgb.LGBMClassifier()
lightgbm_clf.fit(X_train, y_train)

[LightGBM] [Info] Number of positive: 268, number of negative: 444
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000517 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 198
[LightGBM] [Info] Number of data points in the train set: 712, number of used features: 8
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.376404 -> initscore=-0.504838
[LightGBM] [Info] Start training from score -0.504838


# <font color='mediumseagreen'> <b>Training the Meta Learner</b><font color='black'> 

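In this section, the meta learner is trained using scikit-learn's `StackingClassifier`. The meta learner makes the final predictions from the outputs of the base models. Note that with an integer `cv`, as used here, `StackingClassifier` fits its own clones of the estimators it is given, so the fitted base learners from the previous section are not reused.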
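
The following sketch illustrates how `StackingClassifier` treats the estimators passed to it. It assumes synthetic data and two simple base models chosen only for illustration. It checks that the fitted base model stored inside the stack is a different object from the one passed in, and that the meta model receives one input column per base model in this binary setting.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption; not the Titanic data)
X_demo, y_demo = make_classification(n_samples=200, n_features=6, random_state=0)

tree = DecisionTreeClassifier(max_depth=2, random_state=0)
stack = StackingClassifier(
    estimators=[("tree", tree), ("lr", LogisticRegression())],
    final_estimator=LogisticRegression(),
    cv=5,
)
stack.fit(X_demo, y_demo)

fitted_tree = stack.named_estimators_["tree"]
print(fitted_tree is tree)                    # the stack holds a fitted clone
print(hasattr(tree, "tree_"))                 # the original object stays unfitted
print(stack.final_estimator_.n_features_in_)  # inputs seen by the meta model
```

Because of this cloning, calling `fit` on the base learners beforehand is not required for the stacking classifier to work.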

In [62]:
# Stacking Classifier for V1
from sklearn.ensemble import StackingClassifier, GradientBoostingClassifier

In [64]:
# Define the stacking classifier for V1
estimators_v1 = [
    ('lr', log_clf),
    ('rf', rf_clf),
    ('knn', knn_clf)
]

In [90]:
stacking_clf_v1 = StackingClassifier(
    estimators=estimators_v1,
    final_estimator=GradientBoostingClassifier(),
    cv=15
)

In [92]:
# Train V1 stacking classifier
stacking_clf_v1.fit(X_train, y_train)

In [94]:
# Stacking Classifier for V2
from sklearn.ensemble import StackingClassifier, GradientBoostingClassifier
# Define the stacking classifier for V2
estimators_v2 = [
    ('catboost', catboost_clf),
    ('xgboost', xgboost_clf),
    ('lightgbm', lightgbm_clf)
]

In [96]:
stacking_clf_v2 = StackingClassifier(
    estimators=estimators_v2,
    final_estimator=GradientBoostingClassifier(),
    cv=15
)

In [98]:
# Train V2 stacking classifier
stacking_clf_v2.fit(X_train, y_train)

[LightGBM] [Info] Number of positive: 268, number of negative: 444
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000062 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 198
[LightGBM] [Info] Number of data points in the train set: 712, number of used features: 8
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.376404 -> initscore=-0.504838
[LightGBM] [Info] Start training from score -0.504838
[LightGBM] [Info] Number of positive: 250, number of negative: 414
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000055 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 194
[LightGBM] [Info] Number of data points in the train set: 664, number of used features: 8
[LightGBM] [Info] [binary:BoostFro

# <font color='mediumseagreen'> <b>Model Performance Evaluation</b><font color='black'> 

In this section, the performance of the stacking models is evaluated on the validation set with accuracy, precision, recall and F1 score.
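
As a reminder of how these scores relate to each other, here is a small worked example on hand-made labels (the `y_true` and `y_pred` lists are hypothetical, not notebook outputs). It derives precision, recall, F1 and accuracy from the four confusion-matrix counts so they can be compared with the scikit-learn functions.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Hypothetical labels: 4 positives, 6 negatives
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

# For binary labels, ravel() yields the counts in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

precision_manual = tp / (tp + fp)            # share of predicted positives that are correct
recall_manual = tp / (tp + fn)               # share of actual positives that are found
f1_manual = 2 * precision_manual * recall_manual / (precision_manual + recall_manual)
accuracy_manual = (tp + tn) / (tp + tn + fp + fn)

print(tn, fp, fn, tp)
print(accuracy_manual, precision_manual, recall_manual, f1_manual)
```

A model can gain precision while losing recall, which is why the comparisons in this section report all four numbers.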

In [100]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Predict and evaluate V1 on validation set
y_val_pred_v1 = stacking_clf_v1.predict(X_val)

# Evaluation metrics for V1
accuracy_v1 = accuracy_score(y_val, y_val_pred_v1)
precision_v1 = precision_score(y_val, y_val_pred_v1)
recall_v1 = recall_score(y_val, y_val_pred_v1)
f1_v1 = f1_score(y_val, y_val_pred_v1)

In [102]:
# Print V1 metrics
print("V1 Metrics:")
print(f"Accuracy: {accuracy_v1:.4f}")
print(f"Precision: {precision_v1:.4f}")
print(f"Recall: {recall_v1:.4f}")
print(f"F1 Score: {f1_v1:.4f}")

V1 Metrics:
Accuracy: 0.8324
Precision: 0.8143
Recall: 0.7703
F1 Score: 0.7917


In [104]:
# Predict and evaluate V2 on validation set
y_val_pred_v2 = stacking_clf_v2.predict(X_val)

In [106]:
# Evaluation metrics for V2
accuracy_v2 = accuracy_score(y_val, y_val_pred_v2)
precision_v2 = precision_score(y_val, y_val_pred_v2)
recall_v2 = recall_score(y_val, y_val_pred_v2)
f1_v2 = f1_score(y_val, y_val_pred_v2)

In [108]:
# Print V2 metrics
print("V2 Metrics:")
print(f"Accuracy: {accuracy_v2:.4f}")
print(f"Precision: {precision_v2:.4f}")
print(f"Recall: {recall_v2:.4f}")
print(f"F1 Score: {f1_v2:.4f}")

V2 Metrics:
Accuracy: 0.8101
Precision: 0.8333
Recall: 0.6757
F1 Score: 0.7463


In [110]:
print("Comparison between V1 and V2:")
print(f"Accuracy V1: {accuracy_v1:.4f} | V2: {accuracy_v2:.4f}")
print(f"Precision V1: {precision_v1:.4f} | V2: {precision_v2:.4f}")
print(f"Recall V1: {recall_v1:.4f} | V2: {recall_v2:.4f}")
print(f"F1 Score V1: {f1_v1:.4f} | V2: {f1_v2:.4f}")

Comparison between V1 and V2:
Accuracy V1: 0.8324 | V2: 0.8101
Precision V1: 0.8143 | V2: 0.8333
Recall V1: 0.7703 | V2: 0.6757
F1 Score V1: 0.7917 | V2: 0.7463
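
To put the gap between V1 and V2 into perspective, the accuracies can be converted back into counts of correctly classified passengers. The sketch below uses only numbers reported in this notebook (891 rows in total, 712 training rows in the LightGBM log, and the two accuracies printed above); the variable names are illustrative.

```python
# Sizes reported earlier in the notebook
n_total, n_train = 891, 712
n_val = n_total - n_train  # rows in the validation split

# Accuracies printed in the comparison above
reported = {"V1": 0.8324, "V2": 0.8101}

# Number of correctly classified validation rows implied by each accuracy
correct = {name: round(acc * n_val) for name, acc in reported.items()}
gap = correct["V1"] - correct["V2"]

print(n_val, correct, gap)
```

With a validation split of this size, a difference of a few passengers can easily come from the random split or from model randomness, so the ranking of V1 and V2 should be read with caution.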


# <font color='mediumseagreen'> <b>GridSearchCV</b><font color='black'> 

## <font color='lightgreen'> <b>V1</b><font color='black'>  

In [112]:
from sklearn.model_selection import GridSearchCV
# Defining V1 base learners
base_learners_v1 = [
    ('lr', LogisticRegression()),
    ('rf', RandomForestClassifier()),
    ('knn', KNeighborsClassifier())
]

In [114]:
# Defining the stacking classifier for V1
stacking_clf_v1 = StackingClassifier(
    estimators=base_learners_v1,
    final_estimator=GradientBoostingClassifier(),
    cv=3
)

In [122]:
# Defining the parameter grid for GridSearchCV for V1
param_grid_v1 = {
    'rf__n_estimators': [50, 100],  # Random Forest hyperparameters
    'rf__max_depth': [3, 5, 10, 15],
    'knn__n_neighbors': [3, 5, 7, 9],  # KNN hyperparameters
    'final_estimator__learning_rate': [0.01, 0.1, 0.2],  # Gradient Boosting hyperparameters
    'final_estimator__n_estimators': [50, 100]
}
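
The keys in this grid follow scikit-learn's `name__parameter` convention: the prefix is either the name given to a base learner in the estimators list or `final_estimator`. The sketch below rebuilds an equivalent stacking classifier (restated here only so the example is self-contained), checks that every key is a valid parameter, and counts how many combinations the grid search will try with `cv=5`.

```python
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import ParameterGrid
from sklearn.neighbors import KNeighborsClassifier

# Same structure as the V1 stacking classifier defined above
stack = StackingClassifier(
    estimators=[
        ("lr", LogisticRegression()),
        ("rf", RandomForestClassifier()),
        ("knn", KNeighborsClassifier()),
    ],
    final_estimator=GradientBoostingClassifier(),
    cv=3,
)

grid = {
    "rf__n_estimators": [50, 100],
    "rf__max_depth": [3, 5, 10, 15],
    "knn__n_neighbors": [3, 5, 7, 9],
    "final_estimator__learning_rate": [0.01, 0.1, 0.2],
    "final_estimator__n_estimators": [50, 100],
}

# Every grid key must appear among the parameters the estimator exposes
valid_keys = stack.get_params()
unknown = [key for key in grid if key not in valid_keys]

# Number of parameter combinations and of stacking fits with 5-fold search
n_candidates = len(ParameterGrid(grid))
n_fits = n_candidates * 5

print(unknown, n_candidates, n_fits)
```

Each of these fits trains a full stacking classifier, which itself runs an internal 3-fold cross-validation, so the search is considerably more expensive than the candidate count alone suggests.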

In [124]:
# Using GridSearchCV for hyperparameter optimization for V1
grid_search_v1 = GridSearchCV(estimator=stacking_clf_v1, param_grid=param_grid_v1, cv=5, n_jobs=-1, verbose=0)

In [126]:
# Fit the grid search for V1
grid_search_v1.fit(X_train, y_train)

In [128]:
# Get the best parameters and score for V1
print("V1 Best parameters found: ", grid_search_v1.best_params_)
print("V1 Best cross-validation score: {:.4f}".format(grid_search_v1.best_score_))

V1 Best parameters found:  {'final_estimator__learning_rate': 0.1, 'final_estimator__n_estimators': 50, 'knn__n_neighbors': 7, 'rf__max_depth': 5, 'rf__n_estimators': 50}
V1 Best cross-validation score: 0.8286


In [130]:
# Evaluate the optimized model on the validation set for V1
best_model_v1 = grid_search_v1.best_estimator_
y_val_pred_v1 = best_model_v1.predict(X_val)

In [132]:
# Performance evaluation for V1
accuracy_v1 = accuracy_score(y_val, y_val_pred_v1)
precision_v1 = precision_score(y_val, y_val_pred_v1)
recall_v1 = recall_score(y_val, y_val_pred_v1)
f1_v1 = f1_score(y_val, y_val_pred_v1)

In [134]:
# Print evaluation metrics
print("V1 Metrics:")
print(f"Accuracy: {accuracy_v1:.4f}")
print(f"Precision: {precision_v1:.4f}")
print(f"Recall: {recall_v1:.4f}")
print(f"F1 Score: {f1_v1:.4f}")

V1 Metrics:
Accuracy: 0.8101
Precision: 0.8125
Recall: 0.7027
F1 Score: 0.7536


## <font color='lightgreen'> <b>V2</b><font color='black'>  

In [136]:
# Defining V2 base learners
base_learners_v2 = [
    ('catboost', CatBoostClassifier(verbose=0)),
    ('xgboost', xgb.XGBClassifier()),
    ('lightgbm', lgb.LGBMClassifier())
]

In [138]:
# Defining the stacking classifier for V2
stacking_clf_v2 = StackingClassifier(
    estimators=base_learners_v2,
    final_estimator=GradientBoostingClassifier(),
    cv=3
)

In [140]:
# Defining the parameter grid for GridSearchCV for V2
param_grid_v2 = {
    'xgboost__n_estimators': [50, 100],  # XGBoost hyperparameters
    'xgboost__max_depth': [5, 10],
    'lightgbm__n_estimators': [50, 100],  # LightGBM hyperparameters
    'final_estimator__learning_rate': [0.01, 0.1],  # Gradient Boosting hyperparameters
    'final_estimator__n_estimators': [50, 100]
}

In [142]:
# Using GridSearchCV for hyperparameter optimization for V2
grid_search_v2 = GridSearchCV(estimator=stacking_clf_v2, param_grid=param_grid_v2, cv=3, n_jobs=-1, verbose=2)

In [144]:
# Fit the grid search for V2
grid_search_v2.fit(X_train, y_train)

Fitting 3 folds for each of 32 candidates, totalling 96 fits
[LightGBM] [Info] Number of positive: 268, number of negative: 444
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000057 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 198
[LightGBM] [Info] Number of data points in the train set: 712, number of used features: 8
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.376404 -> initscore=-0.504838
[LightGBM] [Info] Start training from score -0.504838
[LightGBM] [Info] Number of positive: 178, number of negative: 296
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000046 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 165
[LightGBM] [Info] Number of data points in the train set: 474, 

In [146]:
# Get the best parameters and score for V2
print("V2 Best parameters found: ", grid_search_v2.best_params_)
print("V2 Best cross-validation score: {:.4f}".format(grid_search_v2.best_score_))

V2 Best parameters found:  {'final_estimator__learning_rate': 0.01, 'final_estimator__n_estimators': 100, 'lightgbm__n_estimators': 100, 'xgboost__max_depth': 10, 'xgboost__n_estimators': 100}
V2 Best cross-validation score: 0.8174


In [148]:
# Evaluate the optimized model on the validation set for V2
best_model_v2 = grid_search_v2.best_estimator_
y_val_pred_v2 = best_model_v2.predict(X_val)

In [150]:
# Performance evaluation for V2
accuracy_v2 = accuracy_score(y_val, y_val_pred_v2)
precision_v2 = precision_score(y_val, y_val_pred_v2)
recall_v2 = recall_score(y_val, y_val_pred_v2)
f1_v2 = f1_score(y_val, y_val_pred_v2)

In [152]:
# Print evaluation metrics
print("V2 Metrics:")
print(f"Accuracy: {accuracy_v2:.4f}")
print(f"Precision: {precision_v2:.4f}")
print(f"Recall: {recall_v2:.4f}")
print(f"F1 Score: {f1_v2:.4f}")

V2 Metrics:
Accuracy: 0.7989
Precision: 0.8276
Recall: 0.6486
F1 Score: 0.7273


In [154]:
print("Comparison between V1 and V2 with hyperparameter tuning:")
print(f"Accuracy V1: {accuracy_v1:.4f} | V2: {accuracy_v2:.4f}")
print(f"Precision V1: {precision_v1:.4f} | V2: {precision_v2:.4f}")
print(f"Recall V1: {recall_v1:.4f} | V2: {recall_v2:.4f}")
print(f"F1 Score V1: {f1_v1:.4f} | V2: {f1_v2:.4f}")

Comparison between V1 and V2 with hyperparameter tuning:
Accuracy V1: 0.8101 | V2: 0.7989
Precision V1: 0.8125 | V2: 0.8276
Recall V1: 0.7027 | V2: 0.6486
F1 Score V1: 0.7536 | V2: 0.7273


# <font color='mediumseagreen'> <b>Comparison of Multiple Classification Algorithms</b><font color='black'> 

In [156]:
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier, GradientBoostingClassifier
import lightgbm as lgb
import xgboost as xgb
from catboost import CatBoostClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

In [158]:
# Define all models
models = {
    'Logistic Regression': LogisticRegression(),
    'KNN': KNeighborsClassifier(),
    'SVM': SVC(),
    'Decision Tree': DecisionTreeClassifier(),
    'Random Forest': RandomForestClassifier(),
    'AdaBoost': AdaBoostClassifier(),
    'Gradient Boosting': GradientBoostingClassifier(),
    'LightGBM': lgb.LGBMClassifier(),
    'XGBoost': xgb.XGBClassifier(),
    'CatBoost': CatBoostClassifier(verbose=0)
}

In [162]:
# Empty list to store performance results for other models
performance_data = []

# Calculate performance metrics for other models
for name, model in models.items():
    # Train the model
    model.fit(X_train, y_train)
    # Make predictions on the validation set
    y_val_pred = model.predict(X_val)
    # Calculate performance metrics
    accuracy = accuracy_score(y_val, y_val_pred)
    precision = precision_score(y_val, y_val_pred)
    recall = recall_score(y_val, y_val_pred)
    f1 = f1_score(y_val, y_val_pred)
    # Append the results to the list
    performance_data.append([name, accuracy, precision, recall, f1])

[LightGBM] [Info] Number of positive: 268, number of negative: 444
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000061 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 198
[LightGBM] [Info] Number of data points in the train set: 712, number of used features: 8
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.376404 -> initscore=-0.504838
[LightGBM] [Info] Start training from score -0.504838


In [168]:
# Add v1 stacking results
performance_data.append([
    'Stacking (V1)',
    accuracy_v1,
    precision_v1,
    recall_v1,
    f1_v1
])

# Add v2 stacking results
performance_data.append([
    'Stacking (V2)',
    accuracy_v2,
    precision_v2,
    recall_v2,
    f1_v2
])

# Convert the list to a DataFrame
df = pd.DataFrame(performance_data, columns=['Model', 'Accuracy', 'Precision', 'Recall', 'F1 Score'])

# Display the DataFrame
df

Unnamed: 0,Model,Accuracy,Precision,Recall,F1 Score
0,Logistic Regression,0.810056,0.785714,0.743243,0.763889
1,KNN,0.804469,0.782609,0.72973,0.755245
2,SVM,0.815642,0.80597,0.72973,0.765957
3,Decision Tree,0.776536,0.723684,0.743243,0.733333
4,Random Forest,0.815642,0.788732,0.756757,0.772414
5,AdaBoost,0.804469,0.767123,0.756757,0.761905
6,Gradient Boosting,0.804469,0.819672,0.675676,0.740741
7,LightGBM,0.826816,0.794521,0.783784,0.789116
8,XGBoost,0.815642,0.788732,0.756757,0.772414
9,CatBoost,0.821229,0.85,0.689189,0.761194


<p style="background-color:mediumseagreen;font-family:newtimeroman;font-size:200%;color:black;text-align:center;border-radius:60px 20px;"><b>THANK YOU!</b></p>