# 🚀 Hyperparameter Tuning & Final Model Selection for Telco Churn Prediction

This notebook performs advanced hyperparameter optimization for the top 3 baseline models and selects the best model for deployment.

## 📂 Load Data
Load the feature-engineered training dataset and prepare features (X) and target (y).

In [1]:
# Import libraries
import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Load data
data = pd.read_csv('../Data/output/feature_engineered_train.csv')
print(f'Dataset shape: {data.shape}')

# Separate features and target
X = data.drop(columns=['customerID', 'Churn'])
y = data['Churn']
if y.dtype == 'object' or y.dtype.name == 'category':
    y = LabelEncoder().fit_transform(y)

# Check for missing values
missing = data.isnull().sum().sum()
print(f'Missing values: {missing}')
assert missing == 0, 'There are missing values in the data!'
print(f'Features shape: {X.shape}, Target shape: {y.shape}')

Dataset shape: (5625, 22)
Missing values: 0
Features shape: (5625, 20), Target shape: (5625,)
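
The precision and recall scorers used later treat label `1` as the positive class, so it helps to confirm what `LabelEncoder` assigns to each class. The sketch below assumes the `Churn` column holds the strings `'Yes'` and `'No'` (not shown in this notebook) and uses a small made-up series in place of `data['Churn']`.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in for data['Churn']; the real column values are an assumption
churn = pd.Series(['No', 'Yes', 'No', 'No', 'Yes'])

encoder = LabelEncoder()
encoded = encoder.fit_transform(churn)

# classes_ is sorted alphabetically, so position 0 is 'No' and position 1 is 'Yes'
mapping = {label: int(code) for code, label in enumerate(encoder.classes_)}
print(mapping)
print(encoded.tolist())
```

Under that assumption, churners are encoded as `1`, which is the class that `scoring='precision'` and `scoring='recall'` report on.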


## 🎯 Top 3 Models for Tuning
We will tune the following models based on baseline results:
- Gradient Boosting Classifier
- CatBoost Classifier
- AdaBoost Classifier

In [2]:
# Install required packages
%pip install catboost optuna
from sklearn.ensemble import GradientBoostingClassifier, AdaBoostClassifier
from catboost import CatBoostClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, StratifiedKFold, cross_val_score
from sklearn.metrics import make_scorer, roc_auc_score, accuracy_score, precision_score, recall_score
import optuna
import time



## 🚀 Hyperparameter Search Spaces
Defined for each model as follows:

In [3]:
# Gradient Boosting
gb_params = {
    'n_estimators': [100, 200, 300],
    'max_depth': [3, 4, 5],
    'learning_rate': [0.01, 0.1, 0.2],
    'subsample': [0.8, 1.0]
}
# CatBoost
cb_params = {
    'iterations': [100, 200],
    'depth': [4, 6, 8],
    'learning_rate': [0.01, 0.05, 0.1],
    'l2_leaf_reg': [1, 3, 5]
}
# AdaBoost
ada_params = {
    'n_estimators': [50, 100, 200],
    'learning_rate': [0.01, 0.1, 1.0]
}
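
The size of each grid determines how many fits an exhaustive search needs. A minimal sketch of that arithmetic, using only the standard library (the helper name `grid_size` is illustrative, not part of scikit-learn):

```python
from math import prod

# Search spaces copied from the cell above
gb_params = {
    'n_estimators': [100, 200, 300],
    'max_depth': [3, 4, 5],
    'learning_rate': [0.01, 0.1, 0.2],
    'subsample': [0.8, 1.0],
}
cb_params = {
    'iterations': [100, 200],
    'depth': [4, 6, 8],
    'learning_rate': [0.01, 0.05, 0.1],
    'l2_leaf_reg': [1, 3, 5],
}
ada_params = {
    'n_estimators': [50, 100, 200],
    'learning_rate': [0.01, 0.1, 1.0],
}


def grid_size(space):
    """Number of parameter combinations an exhaustive grid search would try."""
    return prod(len(values) for values in space.values())


n_folds = 5
sizes = {
    'GradientBoosting': grid_size(gb_params),
    'CatBoost': grid_size(cb_params),
    'AdaBoost': grid_size(ada_params),
}
for name, n_candidates in sizes.items():
    print(f'{name}: {n_candidates} candidates, {n_candidates * n_folds} fits')
```

These counts can be checked against the `Fitting 5 folds for each of ... candidates` lines that the search cells print.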

## 🔍 Tuning Gradient Boosting Classifier
We use GridSearchCV and RandomizedSearchCV with 5-fold stratified CV and AUC-ROC as the main metric.
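
Stratification keeps the class balance of every fold close to that of the full dataset, which matters when the positive class is the minority. The sketch below illustrates this on synthetic labels; the 26.5% positive rate and the `*_demo` names are made up for the example and are not taken from the churn data.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Synthetic, imbalanced labels: 265 positives out of 1000 (illustrative only)
y_demo = np.array([1] * 265 + [0] * 735)
X_demo = np.zeros((len(y_demo), 1))  # feature values do not affect the split

cv_demo = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Positive rate inside each held-out fold
fold_rates = [
    float(y_demo[test_idx].mean())
    for _, test_idx in cv_demo.split(X_demo, y_demo)
]

print('overall positive rate:', float(y_demo.mean()))
print('per-fold positive rate:', fold_rates)
```

Reusing one `StratifiedKFold` object with a fixed `random_state`, as the tuning cells do with `cv`, also means every model is scored on the same splits.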

In [12]:
gbc = GradientBoostingClassifier(random_state=42)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# GridSearchCV
start = time.time()
gbc_grid = GridSearchCV(gbc, gb_params, scoring='roc_auc', cv=cv, n_jobs=-1, verbose=1)
gbc_grid.fit(X, y)
grid_time = time.time() - start

# RandomizedSearchCV
start = time.time()
gbc_rand = RandomizedSearchCV(gbc, gb_params, n_iter=10, scoring='roc_auc', cv=cv, n_jobs=-1, random_state=42, verbose=1)
gbc_rand.fit(X, y)
rand_time = time.time() - start

# Best model
gbc_best = gbc_grid.best_estimator_ if gbc_grid.best_score_ >= gbc_rand.best_score_ else gbc_rand.best_estimator_
gbc_best_params = gbc_grid.best_params_ if gbc_grid.best_score_ >= gbc_rand.best_score_ else gbc_rand.best_params_
gbc_best_time = grid_time if gbc_grid.best_score_ >= gbc_rand.best_score_ else rand_time

# Cross-validated metrics
gbc_scores = {
    'AUC-ROC': cross_val_score(gbc_best, X, y, cv=cv, scoring='roc_auc'),
    'Accuracy': cross_val_score(gbc_best, X, y, cv=cv, scoring='accuracy'),
    'Precision': cross_val_score(gbc_best, X, y, cv=cv, scoring='precision'),
    'Recall': cross_val_score(gbc_best, X, y, cv=cv, scoring='recall')
}
print('GradientBoosting Best Params:', gbc_best_params)
print('Mean/Std (AUC-ROC):', gbc_scores['AUC-ROC'].mean(), gbc_scores['AUC-ROC'].std())
print('Mean/Std (Accuracy):', gbc_scores['Accuracy'].mean(), gbc_scores['Accuracy'].std())
print('Mean/Std (Precision):', gbc_scores['Precision'].mean(), gbc_scores['Precision'].std())
print('Mean/Std (Recall):', gbc_scores['Recall'].mean(), gbc_scores['Recall'].std())
print('Training time (s):', gbc_best_time)

Fitting 5 folds for each of 54 candidates, totalling 270 fits
Fitting 5 folds for each of 10 candidates, totalling 50 fits
GradientBoosting Best Params: {'learning_rate': 0.01, 'max_depth': 4, 'n_estimators': 300, 'subsample': 0.8}
Mean/Std (AUC-ROC): 0.8411865216581503 0.008264120198961965
Mean/Std (Accuracy): 0.7953777777777777 0.004546386436460221
Mean/Std (Precision): 0.6627881944008334 0.008149543051864126
Mean/Std (Recall): 0.4688963210702341 0.03360496772463002
Training time (s): 131.3987877368927


## 🔍 Tuning CatBoost Classifier
We use GridSearchCV and RandomizedSearchCV with 5-fold stratified CV and AUC-ROC as the main metric.

In [None]:
cbc = CatBoostClassifier(random_state=42, verbose=0)

# GridSearchCV
start = time.time()
cbc_grid = GridSearchCV(cbc, cb_params, scoring='roc_auc', cv=cv, n_jobs=-1, verbose=1)
cbc_grid.fit(X, y)
grid_time = time.time() - start

# RandomizedSearchCV
start = time.time()
cbc_rand = RandomizedSearchCV(cbc, cb_params, n_iter=10, scoring='roc_auc', cv=cv, n_jobs=-1, random_state=42, verbose=1)
cbc_rand.fit(X, y)
rand_time = time.time() - start

cbc_best = cbc_grid.best_estimator_ if cbc_grid.best_score_ >= cbc_rand.best_score_ else cbc_rand.best_estimator_
cbc_best_params = cbc_grid.best_params_ if cbc_grid.best_score_ >= cbc_rand.best_score_ else cbc_rand.best_params_
cbc_best_time = grid_time if cbc_grid.best_score_ >= cbc_rand.best_score_ else rand_time

cbc_scores = {
    'AUC-ROC': cross_val_score(cbc_best, X, y, cv=cv, scoring='roc_auc'),
    'Accuracy': cross_val_score(cbc_best, X, y, cv=cv, scoring='accuracy'),
    'Precision': cross_val_score(cbc_best, X, y, cv=cv, scoring='precision'),
    'Recall': cross_val_score(cbc_best, X, y, cv=cv, scoring='recall')
}
print('CatBoost Best Params:', cbc_best_params)
print('Mean/Std (AUC-ROC):', cbc_scores['AUC-ROC'].mean(), cbc_scores['AUC-ROC'].std())
print('Mean/Std (Accuracy):', cbc_scores['Accuracy'].mean(), cbc_scores['Accuracy'].std())
print('Mean/Std (Precision):', cbc_scores['Precision'].mean(), cbc_scores['Precision'].std())
print('Mean/Std (Recall):', cbc_scores['Recall'].mean(), cbc_scores['Recall'].std())
print('Training time (s):', cbc_best_time)

Fitting 5 folds for each of 54 candidates, totalling 270 fits
Fitting 5 folds for each of 10 candidates, totalling 50 fits
CatBoost Best Params: {'depth': 6, 'iterations': 100, 'l2_leaf_reg': 1, 'learning_rate': 0.05}
Mean/Std (AUC-ROC): 0.8419979431033227 0.007000831926725443
Mean/Std (Accuracy): 0.7960888888888891 0.004757009123647853
Mean/Std (Precision): 0.6611600519127133 0.006680873340373868
Mean/Std (Recall): 0.4775919732441472 0.03231537746938104
Training time (s): 42.93901515007019


## 🔍 Tuning AdaBoost Classifier
We use GridSearchCV and RandomizedSearchCV with 5-fold stratified CV and AUC-ROC as the main metric.
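
The AdaBoost grid has only nine combinations, so a randomized search that asks for ten iterations cannot draw ten distinct settings. When every parameter is given as a list, scikit-learn samples without replacement and falls back to the full grid, which is why both searches report nine candidates. A small sketch with `ParameterSampler`, the sampler that `RandomizedSearchCV` uses internally:

```python
import warnings

from sklearn.model_selection import ParameterSampler

# Search space copied from the cell that defines ada_params
ada_params = {
    'n_estimators': [50, 100, 200],
    'learning_rate': [0.01, 0.1, 1.0],
}

with warnings.catch_warnings():
    # scikit-learn warns that the grid is smaller than n_iter; silence it here
    warnings.simplefilter('ignore', UserWarning)
    sampled = list(ParameterSampler(ada_params, n_iter=10, random_state=42))

# Count distinct settings by turning each dict into a hashable tuple
unique_settings = {tuple(sorted(params.items())) for params in sampled}

print('requested: 10, sampled:', len(sampled), 'unique:', len(unique_settings))
```

For this model the randomized search therefore evaluates the same candidates as the grid search, so comparing the two mainly compares their run times.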

In [6]:
ada = AdaBoostClassifier(random_state=42)

# GridSearchCV
start = time.time()
ada_grid = GridSearchCV(ada, ada_params, scoring='roc_auc', cv=cv, n_jobs=-1, verbose=1)
ada_grid.fit(X, y)
grid_time = time.time() - start

# RandomizedSearchCV
start = time.time()
ada_rand = RandomizedSearchCV(ada, ada_params, n_iter=10, scoring='roc_auc', cv=cv, n_jobs=-1, random_state=42, verbose=1)
ada_rand.fit(X, y)
rand_time = time.time() - start

ada_best = ada_grid.best_estimator_ if ada_grid.best_score_ >= ada_rand.best_score_ else ada_rand.best_estimator_
ada_best_params = ada_grid.best_params_ if ada_grid.best_score_ >= ada_rand.best_score_ else ada_rand.best_params_
ada_best_time = grid_time if ada_grid.best_score_ >= ada_rand.best_score_ else rand_time

scores = {
    'AUC-ROC': cross_val_score(ada_best, X, y, cv=cv, scoring='roc_auc'),
    'Accuracy': cross_val_score(ada_best, X, y, cv=cv, scoring='accuracy'),
    'Precision': cross_val_score(ada_best, X, y, cv=cv, scoring='precision'),
    'Recall': cross_val_score(ada_best, X, y, cv=cv, scoring='recall')
}
print('AdaBoost Best Params:', ada_best_params)
print('Mean/Std (AUC-ROC):', scores['AUC-ROC'].mean(), scores['AUC-ROC'].std())
print('Mean/Std (Accuracy):', scores['Accuracy'].mean(), scores['Accuracy'].std())
print('Mean/Std (Precision):', scores['Precision'].mean(), scores['Precision'].std())
print('Mean/Std (Recall):', scores['Recall'].mean(), scores['Recall'].std())
print('Training time (s):', ada_best_time)

Fitting 5 folds for each of 9 candidates, totalling 45 fits




Fitting 5 folds for each of 9 candidates, totalling 45 fits
AdaBoost Best Params: {'learning_rate': 0.1, 'n_estimators': 200}
Mean/Std (AUC-ROC): 0.8371241507203188 0.008334270054189174
Mean/Std (Accuracy): 0.788088888888889 0.005722125134019947
Mean/Std (Precision): 0.6905307860047716 0.03115869602513254
Mean/Std (Recall): 0.37257525083612036 0.04505005549293039
Training time (s): 5.8091208934783936


## 📊 Model Comparison Table
Summarize the best results for each model.

In [8]:
# Collect results
# Each model reads from its own score dictionary. `scores` still holds the
# AdaBoost results from the previous cell, so it must not be reused for the others.
results = [
    {
        'Model': 'GradientBoosting',
        'AUC-ROC': gbc_scores['AUC-ROC'].mean(),
        'Accuracy': gbc_scores['Accuracy'].mean(),
        'Precision': gbc_scores['Precision'].mean(),
        'Recall': gbc_scores['Recall'].mean(),
        'Best Params': gbc_best_params
    },
    {
        'Model': 'CatBoost',
        'AUC-ROC': cbc_scores['AUC-ROC'].mean(),
        'Accuracy': cbc_scores['Accuracy'].mean(),
        'Precision': cbc_scores['Precision'].mean(),
        'Recall': cbc_scores['Recall'].mean(),
        'Best Params': cbc_best_params
    },
    {
        'Model': 'AdaBoost',
        'AUC-ROC': scores['AUC-ROC'].mean(),
        'Accuracy': scores['Accuracy'].mean(),
        'Precision': scores['Precision'].mean(),
        'Recall': scores['Recall'].mean(),
        'Best Params': ada_best_params
    }
]
results_df = pd.DataFrame(results)
display(results_df)

Unnamed: 0,Model,AUC-ROC,Accuracy,Precision,Recall,Best Params
0,GradientBoosting,0.841187,0.795378,0.662788,0.468896,"{'learning_rate': 0.01, 'max_depth': 4, 'n_est..."
1,CatBoost,0.841998,0.796089,0.66116,0.477592,"{'depth': 6, 'iterations': 100, 'l2_leaf_reg':..."
2,AdaBoost,0.837124,0.788089,0.690531,0.372575,"{'learning_rate': 0.1, 'n_estimators': 200}"


## 🏆 Final Model Selection
The code below picks the model with the highest mean AUC-ROC. Efficiency and interpretability are weighed in the justification that follows.

In [9]:
# Example: Select CatBoost if it has the highest AUC-ROC
best_model = cbc_best if results_df.loc[1, 'AUC-ROC'] == results_df['AUC-ROC'].max() else (
    gbc_best if results_df.loc[0, 'AUC-ROC'] == results_df['AUC-ROC'].max() else ada_best
)
# Compare by identity (`is`): `==` on estimator objects is not a reliable way to tell them apart
best_model_name = 'CatBoost' if best_model is cbc_best else ('GradientBoosting' if best_model is gbc_best else 'AdaBoost')
print(f'✅ Final model selected: {best_model_name}')

✅ Final model selected: CatBoost


## 📝 Model Selection Justification
The final model is selected based on the following criteria:
- **AUC-ROC (primary):** The model with the highest mean AUC-ROC across 5-fold CV is preferred.
- **Computational efficiency:** Training time and resource usage are considered, especially for large datasets.
- **Model complexity:** Simpler models are preferred if performance is similar, and interpretability/ease of tuning is considered.
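
One way to make "performance is similar" concrete is a tolerance rule: treat every model whose mean AUC-ROC lies within one standard deviation of the leader as tied, then break the tie on cost. The sketch below is a hypothetical rule for illustration, not the selection logic this notebook uses. The means, standard deviations, and search times are the rounded values printed by the first run of the tuning cells.

```python
# Rounded values from the first run of the tuning cells (AUC-ROC mean/std, search time in seconds)
candidates = {
    'GradientBoosting': {'auc_mean': 0.8412, 'auc_std': 0.0083, 'time_s': 131.4},
    'CatBoost': {'auc_mean': 0.8420, 'auc_std': 0.0070, 'time_s': 42.9},
    'AdaBoost': {'auc_mean': 0.8371, 'auc_std': 0.0083, 'time_s': 5.8},
}

# Leader on the primary metric
best_name = max(candidates, key=lambda name: candidates[name]['auc_mean'])
best = candidates[best_name]

# Hypothetical tolerance: within one standard deviation of the leader's mean
threshold = best['auc_mean'] - best['auc_std']
similar = [name for name, c in candidates.items() if c['auc_mean'] >= threshold]

# Among the tied models, prefer the cheapest to search
preferred = min(similar, key=lambda name: candidates[name]['time_s'])

print('highest AUC-ROC:', best_name)
print('within one std of the leader:', similar)
print('cheapest among those:', preferred)
```

Under this rule all three models count as tied on AUC-ROC, so the outcome depends entirely on the secondary criterion. A rule based on AUC-ROC and time alone also ignores recall, where AdaBoost (0.373) trails the other two models (0.469 and 0.478) by a wide margin.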

**Example justification:**
CatBoost was selected as the final model because it achieved the highest mean AUC-ROC among the candidates (0.8420, versus 0.8412 for Gradient Boosting and 0.8371 for AdaBoost). Its margin over Gradient Boosting is smaller than the fold-to-fold standard deviation (about 0.007 to 0.008), so AUC-ROC alone does not separate the two; CatBoost also had the highest recall (0.478) and a shorter search time (about 43 s versus 131 s). Its native support for categorical variables and ease of hyperparameter tuning make it well suited to this dataset. If computational efficiency or interpretability were a higher priority and performance was similar, AdaBoost or Gradient Boosting could be considered.

In [13]:
# CatBoost
print("\n" + "="*50)
print("CatBoost Hyperparameter Tuning")
print("="*50)

cbc_grid = GridSearchCV(cbc, cb_params, cv=cv, scoring='roc_auc', n_jobs=-1, verbose=1)
start = time.time()
cbc_grid.fit(X, y)
grid_time = (time.time() - start) / 60

cbc_rand = RandomizedSearchCV(cbc, cb_params, cv=cv, scoring='roc_auc', n_jobs=-1, 
                             verbose=1, n_iter=50, random_state=42)
start = time.time()
cbc_rand.fit(X, y)
rand_time = (time.time() - start) / 60

cbc_best_params = cbc_grid.best_params_ if cbc_grid.best_score_ >= cbc_rand.best_score_ else cbc_rand.best_params_
cbc_best = CatBoostClassifier(**cbc_best_params, random_state=42, verbose=False)
cbc_best_time = grid_time if cbc_grid.best_score_ >= cbc_rand.best_score_ else rand_time

cbc_scores = {
    'AUC-ROC': cross_val_score(cbc_best, X, y, cv=cv, scoring='roc_auc'),
    'Accuracy': cross_val_score(cbc_best, X, y, cv=cv, scoring='accuracy'),
    'Precision': cross_val_score(cbc_best, X, y, cv=cv, scoring='precision'),
    'Recall': cross_val_score(cbc_best, X, y, cv=cv, scoring='recall')
}
print('CatBoost Best Params:', cbc_best_params)
print('Mean/Std (AUC-ROC):', cbc_scores['AUC-ROC'].mean(), cbc_scores['AUC-ROC'].std())

CatBoost Hyperparameter Tuning
Fitting 5 folds for each of 54 candidates, totalling 270 fits
Fitting 5 folds for each of 50 candidates, totalling 250 fits
CatBoost Best Params: {'depth': 6, 'iterations': 100, 'l2_leaf_reg': 1, 'learning_rate': 0.05}
Mean/Std (AUC-ROC): 0.8419979431033227 0.007000831926725443


In [14]:
# AdaBoost
print("\n" + "="*50)
print("AdaBoost Hyperparameter Tuning")
print("="*50)

ada_grid = GridSearchCV(ada, ada_params, cv=cv, scoring='roc_auc', n_jobs=-1, verbose=1)
start = time.time()
ada_grid.fit(X, y)
grid_time = (time.time() - start) / 60

ada_rand = RandomizedSearchCV(ada, ada_params, cv=cv, scoring='roc_auc', n_jobs=-1, 
                             verbose=1, n_iter=50, random_state=42)
start = time.time()
ada_rand.fit(X, y)
rand_time = (time.time() - start) / 60

ada_best_params = ada_grid.best_params_ if ada_grid.best_score_ >= ada_rand.best_score_ else ada_rand.best_params_
ada_best = AdaBoostClassifier(**ada_best_params, random_state=42)
ada_best_time = grid_time if ada_grid.best_score_ >= ada_rand.best_score_ else rand_time

ada_scores = {
    'AUC-ROC': cross_val_score(ada_best, X, y, cv=cv, scoring='roc_auc'),
    'Accuracy': cross_val_score(ada_best, X, y, cv=cv, scoring='accuracy'),
    'Precision': cross_val_score(ada_best, X, y, cv=cv, scoring='precision'),
    'Recall': cross_val_score(ada_best, X, y, cv=cv, scoring='recall')
}
print('AdaBoost Best Params:', ada_best_params)
print('Mean/Std (AUC-ROC):', ada_scores['AUC-ROC'].mean(), ada_scores['AUC-ROC'].std())

AdaBoost Hyperparameter Tuning
Fitting 5 folds for each of 9 candidates, totalling 45 fits




Fitting 5 folds for each of 9 candidates, totalling 45 fits
AdaBoost Best Params: {'learning_rate': 0.1, 'n_estimators': 200}
Mean/Std (AUC-ROC): 0.8371241507203188 0.008334270054189174


In [17]:
# Model Comparison Results
print("\n" + "="*60)
print("HYPERPARAMETER TUNING RESULTS COMPARISON")
print("="*60)

results = [
    {
        'Model': 'Gradient Boosting',
        'AUC-ROC': gbc_scores['AUC-ROC'].mean(),
        'Accuracy': gbc_scores['Accuracy'].mean(),
        'Precision': gbc_scores['Precision'].mean(),
        'Recall': gbc_scores['Recall'].mean(),
        # gbc_best_time was measured in seconds; convert to minutes like the other rows
        'Training Time (min)': gbc_best_time / 60,
        'Best Params': gbc_best_params
    },
    {
        'Model': 'CatBoost',
        'AUC-ROC': cbc_scores['AUC-ROC'].mean(),
        'Accuracy': cbc_scores['Accuracy'].mean(),
        'Precision': cbc_scores['Precision'].mean(),
        'Recall': cbc_scores['Recall'].mean(),
        'Training Time (min)': cbc_best_time,
        'Best Params': cbc_best_params
    },
    {
        'Model': 'AdaBoost',
        'AUC-ROC': ada_scores['AUC-ROC'].mean(),
        'Accuracy': ada_scores['Accuracy'].mean(),
        'Precision': ada_scores['Precision'].mean(),
        'Recall': ada_scores['Recall'].mean(),
        'Training Time (min)': ada_best_time,
        'Best Params': ada_best_params
    }
]

results_df = pd.DataFrame(results)
display(results_df)

# Select best model based on AUC-ROC score; name and estimator come from the same row
best_idx = results_df['AUC-ROC'].idxmax()
best_model_name = results_df.loc[best_idx, 'Model']
best_model = {
    'Gradient Boosting': gbc_best,
    'CatBoost': cbc_best,
    'AdaBoost': ada_best
}[best_model_name]
print(f"\n🏆 Best performing model: {best_model_name}")
print(f"Best AUC-ROC Score: {results_df['AUC-ROC'].max():.4f}")

# cbc_best and ada_best were rebuilt from their best params and never fitted
# (cross_val_score only fits clones), so fit the selected model before saving it
best_model.fit(X, y)

# Save the final model to data output folder
import joblib
import json
import os

# Ensure the data output directory exists
output_dir = '../Data/output'
os.makedirs(output_dir, exist_ok=True)

# Save the final model
model_path = os.path.join(output_dir, 'final_model.pkl')
joblib.dump(best_model, model_path)
print(f"\n✅ Final model saved as '{model_path}'")

# Also save model metadata for reference
metadata = {
    'model_name': best_model_name,
    'model_type': type(best_model).__name__,
    'best_params': results_df.loc[best_idx, 'Best Params'],
    'performance_metrics': {
        'AUC-ROC': results_df.loc[best_idx, 'AUC-ROC'],
        'Accuracy': results_df.loc[best_idx, 'Accuracy'],
        'Precision': results_df.loc[best_idx, 'Precision'],
        'Recall': results_df.loc[best_idx, 'Recall'],
        'Training Time (min)': results_df.loc[best_idx, 'Training Time (min)']
    },
    'selection_date': pd.Timestamp.now().strftime('%Y-%m-%d %H:%M:%S')
}

metadata_path = os.path.join(output_dir, 'final_model_metadata.json')
with open(metadata_path, 'w') as f:
    json.dump(metadata, f, indent=2, default=str)
print(f"📋 Model metadata saved as '{metadata_path}'")

HYPERPARAMETER TUNING RESULTS COMPARISON


Unnamed: 0,Model,AUC-ROC,Accuracy,Precision,Recall,Training Time (min),Best Params
0,Gradient Boosting,0.841187,0.795378,0.662788,0.468896,2.18998,"{'learning_rate': 0.01, 'max_depth': 4, 'n_est..."
1,CatBoost,0.841998,0.796089,0.66116,0.477592,0.729742,"{'depth': 6, 'iterations': 100, 'l2_leaf_reg':..."
2,AdaBoost,0.837124,0.788089,0.690531,0.372575,0.124253,"{'learning_rate': 0.1, 'n_estimators': 200}"


🏆 Best performing model: CatBoost
Best AUC-ROC Score: 0.8420

✅ Final model saved as '../Data/output\final_model.pkl'
📋 Model metadata saved as '../Data/output\final_model_metadata.json'


## 📊 Final Model Selection

### Key Findings:

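1. **Performance Comparison**: Mean AUC-ROC was close across the three tuned models (CatBoost 0.8420, Gradient Boosting 0.8412, AdaBoost 0.8371). Recall differed more (0.478, 0.469, and 0.373 respectively).
2. **Best Model**: CatBoost was selected automatically because it had the highest mean AUC-ROC.
3. **Training Efficiency**: The selected search for each model finished in roughly two minutes or less (about 131 s for Gradient Boosting, 44 s for CatBoost, and 7 s for AdaBoost).
4. **Robustness**: All metrics are means over 5-fold stratified cross-validation; the AUC-ROC standard deviation across folds was about 0.007 to 0.008.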

### Next Steps:
- The selected model will be used in the final evaluation notebook
- Further validation on the test set will confirm generalization capability
- The tuned model is ready for deployment consideration
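
Before the evaluation notebook relies on the saved artifacts, it can read the metadata back and check that it describes the expected model. The following is a minimal sketch using only the standard library. The helper `is_usable` and the 0.80 AUC-ROC floor are hypothetical, and the metadata dictionary is a trimmed stand-in written to a temporary directory rather than the real `final_model_metadata.json`.

```python
import json
import os
import tempfile

# Trimmed stand-in for the saved metadata, filled with values reported in this notebook
metadata = {
    'model_name': 'CatBoost',
    'model_type': 'CatBoostClassifier',
    'best_params': {'depth': 6, 'iterations': 100, 'l2_leaf_reg': 1, 'learning_rate': 0.05},
    'performance_metrics': {'AUC-ROC': 0.841998},
}


def is_usable(meta, expected_type, min_auc):
    """Hypothetical sanity check: right estimator type and AUC-ROC above a chosen floor."""
    auc = meta.get('performance_metrics', {}).get('AUC-ROC', 0.0)
    return meta.get('model_type') == expected_type and auc >= min_auc


# Write and re-read the file in a temporary directory so the example is self-contained
with tempfile.TemporaryDirectory() as tmp_dir:
    metadata_path = os.path.join(tmp_dir, 'final_model_metadata.json')
    with open(metadata_path, 'w') as f:
        json.dump(metadata, f, indent=2)
    with open(metadata_path) as f:
        loaded = json.load(f)

ok = is_usable(loaded, 'CatBoostClassifier', min_auc=0.80)
print('metadata matches expectations:', ok)
```

In the evaluation notebook the same check would run against the real metadata path, followed by `joblib.load` on `final_model.pkl`.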

**Note**: This corrected version uses separate score variables (`gbc_scores`, `cbc_scores`, `ada_scores`) to ensure accurate model comparison, fixing the variable overwriting issue identified in the error analysis.