# 🏥 OSTEOPOROSIS RISK PREDICTION - COMPLETE MASTER PIPELINE

## 🎯 All-in-One Comprehensive Machine Learning Workflow

**Project:** Osteoporosis Risk Prediction  
**Group:** DSGP Group 40  
**Date:** January 2026  
**Status:** ✅ Production Ready  

---

### 📋 **Notebook Structure**

This master notebook combines all 10 comprehensive sections into one unified workflow:

1. ✅ **Environment Setup** - Libraries & Configuration
2. ✅ **Data Preparation** - Loading & Initial Exploration
3. ✅ **Data Preprocessing** - Cleaning & Feature Engineering
4. ✅ **Model Training** - 12 ML Algorithms
5. ✅ **Gender-Specific Models** - Separate Male/Female XGBoost
6. ✅ **Hyperparameter Tuning** - All 12 Models Optimization
7. ✅ **Confusion Matrices** - All Models with Comparison
8. ✅ **SHAP Analysis** - Advanced Explainability (5 visualization types)
9. ✅ **Loss Curve Analysis** - Top 4 Algorithms (8 visualization types)
10. ✅ **Complete Leaderboard** - All Models Ranked

**Output Files:** 58+ visualizations + 9 CSV files  
**Model Comparison:** 14 models evaluated with multiple metrics

---


## 📚 TABLE OF CONTENTS

| Section | Subsections |
|---------|-------------|
| **PART 1** | Environment & Libraries |
| **PART 2** | Data Loading & Exploration |
| **PART 3** | Data Cleaning & Features |
| **PART 4** | Model Training (12 algorithms) |
| **PART 5** | Gender-Specific XGBoost Models |
| **PART 6** | Hyperparameter Tuning (All 12 Models) |
| **PART 7** | Confusion Matrices (All Models) |
| **PART 8** | SHAP Interpretability (5 types) |
| **PART 9** | Loss Curves (8 visualizations) |
| **PART 10** | Complete Leaderboard & Results |
| **Total** | Complete ML Pipeline |

---


# 🔧 PART 1: ENVIRONMENT SETUP & CONFIGURATION


**Objective:** Import all required libraries and set up the environment

In [None]:
# ============================================================================
# IMPORT SECTION 1.1: CORE LIBRARIES
# ============================================================================

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (14, 8)
plt.rcParams['font.size'] = 10
plt.rcParams['lines.linewidth'] = 2

print('✅ Core libraries imported successfully!')

In [None]:
# ============================================================================
# IMPORT SECTION 1.2: SCIKIT-LEARN & MODELS
# ============================================================================

from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV, RandomizedSearchCV
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.metrics import (accuracy_score, roc_auc_score, confusion_matrix,
                            classification_report, roc_curve, auc, f1_score, precision_score)

from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import (RandomForestClassifier, GradientBoostingClassifier,
                             AdaBoostClassifier, BaggingClassifier, StackingClassifier,
                             ExtraTreesClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

from xgboost import XGBClassifier
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.callbacks import EarlyStopping

print('✅ Scikit-learn, XGBoost, and TensorFlow imported!')

In [None]:
# ============================================================================
# IMPORT SECTION 1.3: INTERPRETABILITY & UTILITIES
# ============================================================================

import shap
import pickle
import os
from scipy.ndimage import uniform_filter1d
from scipy.stats import randint, uniform

os.makedirs('data', exist_ok=True)
os.makedirs('models', exist_ok=True)
os.makedirs('figures', exist_ok=True)
os.makedirs('outputs', exist_ok=True)

print('✅ SHAP and utilities imported!')
print('✅ Output directories created!')
print('\n' + '='*80)
print('🎯 ALL LIBRARIES IMPORTED - READY TO PROCEED')
print('='*80)

In [None]:
# ============================================================================
# CONFIGURATION: Global Settings
# ============================================================================

RANDOM_SEED = 42
np.random.seed(RANDOM_SEED)
tf.random.set_seed(RANDOM_SEED)

TEST_SIZE = 0.2
VALIDATION_SIZE = 0.2
N_FOLDS = 5
RANDOM_STATE = 42

N_ESTIMATORS = 200
MAX_DEPTH = 5
LEARNING_RATE = 0.05

NN_EPOCHS = 100
NN_BATCH_SIZE = 32
NN_LEARNING_RATE = 0.001

DPI = 300
FIG_SIZE = (14, 8)

print('✅ Configuration set:')
print(f'   • Random Seed: {RANDOM_SEED}')
print(f'   • Test/Train Split: {TEST_SIZE}')
print(f'   • Cross-Validation Folds: {N_FOLDS}')
print(f'   • Figure Resolution: {DPI} DPI')

---

# 📊 PART 2: DATA LOADING & EXPLORATION



In [None]:
# ============================================================================
# SECTION 2.1: LOAD DATA
# ============================================================================

csv_path = 'data/osteoporosis_data.csv'

try:
    df = pd.read_csv(csv_path)
    print(f'✅ Dataset loaded successfully!')
    print(f'   Shape: {df.shape} (rows, columns)')
except FileNotFoundError:
    print(f'❌ File not found: {csv_path}')
    df = None

In [None]:
if df is not None:
    print('\n' + '='*80)
    print('DATA OVERVIEW')
    print('='*80)
    print(f'\nShape: {df.shape}')
    print(f'Memory: {df.memory_usage(deep=True).sum() / 1024**2:.2f} MB')
    print(f'\nColumns: {df.columns.tolist()}')
    print(f'\nMissing Values:\n{df.isnull().sum()[df.isnull().sum() > 0]}')

---

# 🧹 PART 3: DATA PREPROCESSING & FEATURE ENGINEERING



In [None]:
# ============================================================================
# SECTION 3.1: DATA PREPROCESSING
# ============================================================================

if df is not None:
    # Create working copy
    df_processed = df.copy()

    # Drop ID column (not useful for prediction)
    df_processed = df_processed.drop('Id', axis=1)

    # Handle missing values
    # Fill categorical with 'Unknown' (assign back instead of chained inplace fillna,
    # which is deprecated in recent pandas versions)
    categorical_cols = df_processed.select_dtypes(include='object').columns
    for col in categorical_cols:
        df_processed[col] = df_processed[col].fillna('Unknown')

    # Encode categorical variables
    le_dict = {}
    for col in categorical_cols:
        le = LabelEncoder()
        df_processed[col] = le.fit_transform(df_processed[col])
        le_dict[col] = le

    # Separate features and target
    X = df_processed.drop('Osteoporosis', axis=1)
    y = df_processed['Osteoporosis']

    # Scale features
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X)
    X_scaled = pd.DataFrame(X_scaled, columns=X.columns)

    # Train-test split
    X_train, X_test, y_train, y_test = train_test_split(
        X_scaled, y, test_size=TEST_SIZE, random_state=RANDOM_STATE, stratify=y
    )

    print('✅ Data preprocessing complete!')
    print(f'   Training set: {X_train.shape}')
    print(f'   Test set: {X_test.shape}')
    print(f'   Features: {X_train.shape[1]}')
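
The cell above fits `StandardScaler` on the full feature matrix before splitting, so test-set statistics influence the scaling. A minimal sketch of one alternative, fitting the scaler on the training split only, is shown below. It runs on synthetic data; the shapes and the names `X_demo`, `y_demo`, and `demo_scaler` are illustrative assumptions, not part of the project's dataset or pipeline.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real feature matrix (assumption: 200 rows, 4 features)
rng = np.random.default_rng(42)
X_demo = rng.normal(loc=5.0, scale=2.0, size=(200, 4))
y_demo = rng.integers(0, 2, size=200)

# Split first, so the test rows stay unseen while the scaler is fitted
X_tr_demo, X_te_demo, y_tr_demo, y_te_demo = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Fit the scaler on the training split only, then apply it to both splits
demo_scaler = StandardScaler()
X_tr_scaled = demo_scaler.fit_transform(X_tr_demo)
X_te_scaled = demo_scaler.transform(X_te_demo)

print('Train means:', X_tr_scaled.mean(axis=0).round(3))
print('Test means: ', X_te_scaled.mean(axis=0).round(3))
```

The training means come out at zero by construction, while the test means are only close to zero, which is the expected behaviour when the test set is treated as unseen data.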

---

# 🤖 PART 4: MODEL TRAINING (12 ALGORITHMS)



In [None]:
# ============================================================================
# SECTION 4.1: TRAIN ALL 12 MODELS (BASELINE)
# ============================================================================

models = {
    'Logistic Regression': LogisticRegression(random_state=RANDOM_STATE, max_iter=1000),
    'Decision Tree': DecisionTreeClassifier(max_depth=MAX_DEPTH, random_state=RANDOM_STATE),
    'Random Forest': RandomForestClassifier(n_estimators=N_ESTIMATORS, max_depth=MAX_DEPTH, random_state=RANDOM_STATE),
    'Gradient Boosting': GradientBoostingClassifier(n_estimators=N_ESTIMATORS, learning_rate=LEARNING_RATE, random_state=RANDOM_STATE),
    'XGBoost': XGBClassifier(n_estimators=N_ESTIMATORS, learning_rate=LEARNING_RATE, random_state=RANDOM_STATE, verbosity=0),
    'AdaBoost': AdaBoostClassifier(n_estimators=N_ESTIMATORS, random_state=RANDOM_STATE),
    'Bagging': BaggingClassifier(n_estimators=N_ESTIMATORS, random_state=RANDOM_STATE),
    'KNN': KNeighborsClassifier(n_neighbors=5),
    'SVM': SVC(kernel='rbf', probability=True, random_state=RANDOM_STATE),
    'Neural Network': keras.Sequential([
        layers.Dense(64, activation='relu', input_shape=(X_train.shape[1],)),
        layers.Dropout(0.3),
        layers.Dense(32, activation='relu'),
        layers.Dropout(0.3),
        layers.Dense(1, activation='sigmoid')
    ]),
    'Stacking': StackingClassifier(
        estimators=[
            ('rf', RandomForestClassifier(n_estimators=100, random_state=RANDOM_STATE)),
            ('gb', GradientBoostingClassifier(n_estimators=100, random_state=RANDOM_STATE))
        ],
        final_estimator=LogisticRegression()
    ),
    'XGBoost Tuned': XGBClassifier(n_estimators=200, learning_rate=0.03, max_depth=6, random_state=RANDOM_STATE, verbosity=0)
}

results = {}
trained_models = {}

print('🤖 Training 12 baseline models... This may take 5-10 minutes')
print('='*80)

for name, model in models.items():
    print(f'\nTraining: {name}...')

    if name == 'Neural Network':
        model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
        model.fit(X_train, y_train, epochs=NN_EPOCHS, batch_size=NN_BATCH_SIZE, verbose=0)
        # Predict once, then threshold the probabilities at 0.5
        y_pred_proba = model.predict(X_test, verbose=0).flatten()
        y_pred = (y_pred_proba > 0.5).astype(int)
    else:
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        y_pred_proba = model.predict_proba(X_test)[:, 1]

    # Calculate metrics
    acc = accuracy_score(y_test, y_pred)
    roc = roc_auc_score(y_test, y_pred_proba)
    f1 = f1_score(y_test, y_pred)
    prec = precision_score(y_test, y_pred)

    results[name] = {
        'accuracy': acc,
        'roc_auc': roc,
        'f1_score': f1,
        'precision': prec
    }
    trained_models[name] = model

    print(f'  ✅ Accuracy: {acc:.4f} | ROC-AUC: {roc:.4f} | F1: {f1:.4f}')

print('\n' + '='*80)
print('✅ All 12 baseline models trained successfully!')
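
The baseline loop scores each model on a single train/test split, while the configuration defines `N_FOLDS = 5` and imports `cross_val_score` without using them. The sketch below shows how a k-fold estimate could complement the single-split numbers. It uses synthetic data from `make_classification` and a plain logistic regression as stand-ins (both are assumptions for illustration).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic binary classification data (assumption: stands in for X_train, y_train)
X_cv, y_cv = make_classification(n_samples=300, n_features=8, random_state=42)

# Stratified folds keep the class ratio similar in every fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
cv_scores = cross_val_score(
    LogisticRegression(max_iter=1000), X_cv, y_cv, cv=cv, scoring='roc_auc'
)

print(f'ROC-AUC per fold: {np.round(cv_scores, 4)}')
print(f'Mean: {cv_scores.mean():.4f} | Std: {cv_scores.std():.4f}')
```

Reporting the fold-to-fold standard deviation alongside the mean gives a rough sense of how much a leaderboard ranking could change with a different split.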

---

# 👨‍⚕️👩‍⚕️ PART 5: GENDER-SPECIFIC XGBOOST MODELS


**Objective:** Tune and train a separate set of models (XGBoost among them) for male and female patients and keep the best performer for each group, aiming to improve prediction accuracy by accounting for biological differences in osteoporosis risk factors.

In [None]:
# ============================================================================
# SECTION 5.1: DEFINE MODEL TRAINING FUNCTIONS & HYPERPARAMETERS
# ============================================================================

from scipy.stats import randint, uniform

def get_models_and_params():
    # Returns tuple of (models_dict, params_dict)
    models = {
        'Logistic Regression': LogisticRegression(random_state=RANDOM_STATE, max_iter=1000),
        'Decision Tree': DecisionTreeClassifier(random_state=RANDOM_STATE),
        'Random Forest': RandomForestClassifier(random_state=RANDOM_STATE),
        'Gradient Boosting': GradientBoostingClassifier(random_state=RANDOM_STATE),
        'XGBoost': XGBClassifier(random_state=RANDOM_STATE, verbosity=0, eval_metric='logloss'),
        'AdaBoost': AdaBoostClassifier(random_state=RANDOM_STATE),
        'Bagging': BaggingClassifier(random_state=RANDOM_STATE),
        'KNN': KNeighborsClassifier(),
        'SVM': SVC(kernel='rbf', probability=True, random_state=RANDOM_STATE),
        'Neural Network': 'NN_SPECIAL', # Handled separately
        'Stacking': StackingClassifier(
            estimators=[
                ('rf', RandomForestClassifier(n_estimators=100, random_state=RANDOM_STATE)),
                ('gb', GradientBoostingClassifier(n_estimators=100, random_state=RANDOM_STATE))
            ],
            final_estimator=LogisticRegression()
        ),
        'Extra Trees': ExtraTreesClassifier(random_state=RANDOM_STATE)
    }
    
    params = {
        'Logistic Regression': {'C': uniform(0.1, 10), 'solver': ['liblinear', 'lbfgs']},
        'Decision Tree': {'max_depth': randint(3, 20), 'min_samples_split': randint(2, 20)},
        'Random Forest': {'n_estimators': randint(50, 300), 'max_depth': randint(3, 20), 'min_samples_split': randint(2, 10)},
        'Gradient Boosting': {'n_estimators': randint(50, 300), 'learning_rate': uniform(0.01, 0.3), 'max_depth': randint(3, 10)},
        'XGBoost': {'n_estimators': randint(50, 300), 'learning_rate': uniform(0.01, 0.3), 'max_depth': randint(3, 10), 'subsample': uniform(0.5, 0.5)},
        'AdaBoost': {'n_estimators': randint(50, 300), 'learning_rate': uniform(0.01, 1.0)},
        'Bagging': {'n_estimators': randint(10, 100)},
        'KNN': {'n_neighbors': randint(3, 20), 'weights': ['uniform', 'distance']},
        'SVM': {'C': uniform(0.1, 10), 'gamma': ['scale', 'auto']},
        'Stacking': {}, # Usually not tuned in this simple loop
        'Extra Trees': {'n_estimators': randint(50, 300), 'max_depth': randint(3, 20)}
    }
    return models, params

def train_evaluate_gender_models(X_tr, y_tr, X_te, y_te, gender_name):
    print(f'\n' + '='*60)
    print(f'⚙️ TUNING & TRAINING MODELS FOR: {gender_name.upper()}')
    print('='*60)
    
    models, params = get_models_and_params()
    gender_results = {}
    gender_trained_models = {}
    
    for name, model in models.items():
        print(f'   Processing {name}...')
        try:
            final_model = model
            
            # 1. Hyperparameter Tuning (RandomizedSearchCV)
            if name in params and params[name]:
                print(f'      -> Tuning hyperparameters...')
                search = RandomizedSearchCV(
                    estimator=model,
                    param_distributions=params[name],
                    n_iter=10, # 10 iterations for speed
                    cv=3,      # 3-fold CV
                    scoring='roc_auc',
                    random_state=RANDOM_STATE,
                    n_jobs=-1
                )
                search.fit(X_tr, y_tr)
                final_model = search.best_estimator_
                print(f'      -> Best Score: {search.best_score_:.4f}')
            elif name == 'Neural Network':
                # Handle NN separately (Simple fixed architecture for stability)
                final_model = keras.Sequential([
                    layers.Dense(64, activation='relu', input_shape=(X_tr.shape[1],)),
                    layers.Dropout(0.3),
                    layers.Dense(32, activation='relu'),
                    layers.Dropout(0.3),
                    layers.Dense(1, activation='sigmoid')
                ])
                final_model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
                final_model.fit(X_tr, y_tr, epochs=50, batch_size=32, verbose=0)
            else:
                # No tuning needed (Stacking, etc.)
                final_model.fit(X_tr, y_tr)
            
            # 2. Evaluation
            if name == 'Neural Network':
                # Predict once, then threshold the probabilities at 0.5
                y_pred_proba = final_model.predict(X_te, verbose=0).flatten()
                y_pred = (y_pred_proba > 0.5).astype(int)
            else:
                y_pred = final_model.predict(X_te)
                if hasattr(final_model, 'predict_proba'):
                    y_pred_proba = final_model.predict_proba(X_te)[:, 1]
                else:
                    y_pred_proba = y_pred
            
            # Metrics
            acc = accuracy_score(y_te, y_pred)
            roc = roc_auc_score(y_te, y_pred_proba)
            f1 = f1_score(y_te, y_pred)
            
            gender_results[name] = {
                'accuracy': acc,
                'roc_auc': roc,
                'f1_score': f1,
                'model_obj': final_model
            }
            gender_trained_models[name] = final_model
            
        except Exception as e:
            print(f'   ⚠️ Error training {name}: {str(e)}')
            
    return gender_results, gender_trained_models


In [None]:
# ============================================================================
# SECTION 5.2: GENDER-SPECIFIC TRAIN-TEST SPLIT
# ============================================================================

# 1. Filter Data
# Encoding assumed here: 0 = Male, 1 = Female.
# LabelEncoder assigns codes in sorted order of the raw labels, so verify this
# against le_dict['Gender'].classes_ (if 'Gender' was label-encoded) before relying on it.
male_indices = df_processed['Gender'] == 0
female_indices = df_processed['Gender'] == 1

X_male = X_scaled[male_indices]
y_male = y[male_indices]
X_female = X_scaled[female_indices]
y_female = y[female_indices]

# 2. Split
X_train_m, X_test_m, y_train_m, y_test_m = train_test_split(
    X_male, y_male, test_size=TEST_SIZE, random_state=RANDOM_STATE, stratify=y_male
)

X_train_f, X_test_f, y_train_f, y_test_f = train_test_split(
    X_female, y_female, test_size=TEST_SIZE, random_state=RANDOM_STATE, stratify=y_female
)

print(f'\n✅ Data Split Complete:')
print(f'   Male Train: {X_train_m.shape}, Test: {X_test_m.shape}')
print(f'   Female Train: {X_train_f.shape}, Test: {X_test_f.shape}')
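
The gender filter relies on knowing which integer code each label received. `LabelEncoder` sorts the class labels before assigning codes, so the mapping can be read back from `classes_` rather than assumed. The sketch below illustrates this with hypothetical raw labels (`'Male'`, `'Female'`); the real column may use different strings.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical raw labels; the real 'Gender' column values may differ
raw_gender = ['Male', 'Female', 'Female', 'Male', 'Female']

demo_le = LabelEncoder()
codes = demo_le.fit_transform(raw_gender)

# classes_ is sorted, and each class's position is its integer code
gender_mapping = {label: int(code) for code, label in enumerate(demo_le.classes_)}
print(gender_mapping)  # {'Female': 0, 'Male': 1}
```

With these particular labels the encoder yields 0 for `'Female'` and 1 for `'Male'`, so it is worth printing the mapping from `le_dict['Gender']` before fixing the filter values.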


In [None]:
# ============================================================================
# SECTION 5.3: TUNE, TRAIN & EVALUATE MALE MODELS
# ============================================================================

male_results, male_models = train_evaluate_gender_models(X_train_m, y_train_m, X_test_m, y_test_m, 'Male')

# Leaderboard
male_df = pd.DataFrame(male_results).T.drop('model_obj', axis=1)
male_df = male_df.sort_values('roc_auc', ascending=False)
print('\n🏆 MALE MODEL LEADERBOARD (Result of Tuning):')
print(male_df)

# Identify Best Male Model
best_male_name = male_df.index[0]
best_male_model = male_results[best_male_name]['model_obj']
print(f'\n✨ Best Male Model: {best_male_name} (ROC-AUC: {male_df.iloc[0]["roc_auc"]:.4f})')


In [None]:
# ============================================================================
# SECTION 5.4: TUNE, TRAIN & EVALUATE FEMALE MODELS
# ============================================================================

female_results, female_models = train_evaluate_gender_models(X_train_f, y_train_f, X_test_f, y_test_f, 'Female')

# Leaderboard
female_df = pd.DataFrame(female_results).T.drop('model_obj', axis=1)
female_df = female_df.sort_values('roc_auc', ascending=False)
print('\n🏆 FEMALE MODEL LEADERBOARD (Result of Tuning):')
print(female_df)

# Identify Best Female Model
best_female_name = female_df.index[0]
best_female_model = female_results[best_female_name]['model_obj']
print(f'\n✨ Best Female Model: {best_female_name} (ROC-AUC: {female_df.iloc[0]["roc_auc"]:.4f})')


In [None]:
# ============================================================================
# SECTION 5.5: SAVE BEST TUNED GENDER-SPECIFIC MODELS
# ============================================================================

print('\n' + '='*60)
print('💾 SAVING BEST MODELS')
print('='*60)

# Save Male
with open('models/osteoporosis_male_model.pkl', 'wb') as f:
    pickle.dump(best_male_model, f)
print(f'✅ Saved Best Male Model ({best_male_name}): models/osteoporosis_male_model.pkl')

# Save Female
with open('models/osteoporosis_female_model.pkl', 'wb') as f:
    pickle.dump(best_female_model, f)
print(f'✅ Saved Best Female Model ({best_female_name}): models/osteoporosis_female_model.pkl')
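
A saved model is only useful if it reloads and predicts identically. The following sketch shows a pickle round-trip check on a small scikit-learn model trained on synthetic data; the model, data, and file name are assumptions for illustration. Note that this applies to scikit-learn style estimators; Section 6.3 saves a Keras network with `model.save` instead, and the same distinction would matter here if the best gender-specific model were the neural network.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Small synthetic model standing in for best_male_model / best_female_model
X_p, y_p = make_classification(n_samples=200, n_features=6, random_state=42)
demo_model = RandomForestClassifier(n_estimators=20, random_state=42).fit(X_p, y_p)

# Write the model to a temporary file and read it back
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / 'demo_model.pkl'
    with open(model_path, 'wb') as f:
        pickle.dump(demo_model, f)
    with open(model_path, 'rb') as f:
        restored_model = pickle.load(f)

# The restored model should reproduce the original predictions exactly
same_predictions = np.array_equal(demo_model.predict(X_p), restored_model.predict(X_p))
print('Round-trip predictions match:', same_predictions)
```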


---

# ⚙️ PART 6: HYPERPARAMETER TUNING (ALL 12 MODELS)



**Objective:** System-wide optimization of all 12 machine learning models using randomized search.
We iterate through every algorithm, tune its hyperparameters with cross-validation on the training set, and evaluate on the test set to find the best-performing model.


In [None]:
# ============================================================================
# SECTION 6.1: CONFIGURE HYPERPARAMETER SEARCH
# ============================================================================

from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import randint, uniform

def get_all_models_params():
    # Define models and their hyperparameter grids
    models = {
        'Logistic Regression': LogisticRegression(random_state=RANDOM_STATE, max_iter=1000),
        'Decision Tree': DecisionTreeClassifier(random_state=RANDOM_STATE),
        'Random Forest': RandomForestClassifier(random_state=RANDOM_STATE),
        'Gradient Boosting': GradientBoostingClassifier(random_state=RANDOM_STATE),
        'XGBoost': XGBClassifier(random_state=RANDOM_STATE, verbosity=0, eval_metric='logloss'),
        'AdaBoost': AdaBoostClassifier(random_state=RANDOM_STATE),
        'Bagging': BaggingClassifier(random_state=RANDOM_STATE),
        'KNN': KNeighborsClassifier(),
        'SVM': SVC(kernel='rbf', probability=True, random_state=RANDOM_STATE),
        'Extra Trees': ExtraTreesClassifier(random_state=RANDOM_STATE),
        'Neural Network': 'NN_SPECIAL',
        'Stacking': StackingClassifier(
            estimators=[
                ('rf', RandomForestClassifier(n_estimators=100, random_state=RANDOM_STATE)),
                ('gb', GradientBoostingClassifier(n_estimators=100, random_state=RANDOM_STATE))
            ],
            final_estimator=LogisticRegression()
        )
    }
    
    params = {
        'Logistic Regression': {'C': uniform(0.1, 10), 'solver': ['liblinear', 'lbfgs']},
        'Decision Tree': {'max_depth': randint(3, 20), 'min_samples_split': randint(2, 20)},
        'Random Forest': {'n_estimators': randint(50, 300), 'max_depth': randint(3, 20), 'min_samples_split': randint(2, 10)},
        'Gradient Boosting': {'n_estimators': randint(50, 300), 'learning_rate': uniform(0.01, 0.3), 'max_depth': randint(3, 10)},
        'XGBoost': {'n_estimators': randint(50, 300), 'learning_rate': uniform(0.01, 0.3), 'max_depth': randint(3, 10), 'subsample': uniform(0.5, 0.5)},
        'AdaBoost': {'n_estimators': randint(50, 300), 'learning_rate': uniform(0.01, 1.0)},
        'Bagging': {'n_estimators': randint(10, 100)},
        'KNN': {'n_neighbors': randint(3, 20), 'weights': ['uniform', 'distance']},
        'SVM': {'C': uniform(0.1, 10), 'gamma': ['scale', 'auto']},
        'Extra Trees': {'n_estimators': randint(50, 300), 'max_depth': randint(3, 20)},
        'Stacking': {}, # Passthrough
        'Neural Network': {} # Handled separately
    }
    return models, params
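
The search spaces above use `scipy.stats` distributions, whose arguments are easy to misread: `uniform(loc, scale)` samples from `[loc, loc + scale]`, not from `[loc, scale]`, and `randint(low, high)` excludes `high`. The short sketch below draws samples to make the effective ranges visible; the sample size and variable names are arbitrary choices for illustration.

```python
from scipy.stats import randint, uniform

# uniform(loc, scale) covers [loc, loc + scale]
lr_dist = uniform(0.01, 0.3)        # learning_rate in [0.01, 0.31]
subsample_dist = uniform(0.5, 0.5)  # subsample in [0.5, 1.0]

# randint(low, high) covers the integers low .. high - 1
depth_dist = randint(3, 10)         # max_depth in 3 .. 9

lr_samples = lr_dist.rvs(size=1000, random_state=42)
subsample_samples = subsample_dist.rvs(size=1000, random_state=42)
depth_samples = depth_dist.rvs(size=1000, random_state=42)

print(f'learning_rate range: {lr_samples.min():.3f} to {lr_samples.max():.3f}')
print(f'subsample range:     {subsample_samples.min():.3f} to {subsample_samples.max():.3f}')
print(f'max_depth range:     {depth_samples.min()} to {depth_samples.max()}')
```

With `n_iter=10`, `RandomizedSearchCV` draws only ten points from these ranges per model, so the reported best parameters are a coarse estimate rather than an exhaustive optimum.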


In [None]:
# ============================================================================
# SECTION 6.2: EXECUTE TUNING LOOP (ALL 12 MODELS)
# ============================================================================

print('='*80)
print('🚀 STARTING COMPREHENSIVE HYPERPARAMETER TUNING (12 MODELS)')
print('='*80)

tuned_results = {}
tuned_models = {}
models_dict, params_dict = get_all_models_params()

for name, model in models_dict.items():
    print(f'\n🔹 Processing: {name}')
    try:
        final_model = model
        
        # 1. Tuning w/ RandomizedSearchCV
        if name in params_dict and params_dict[name] and name != 'Neural Network':
            print(f'   ⚙️ Tuning hyperparameters...')
            search = RandomizedSearchCV(
                estimator=model,
                param_distributions=params_dict[name],
                n_iter=10,
                cv=3,
                scoring='roc_auc',
                random_state=RANDOM_STATE,
                n_jobs=-1
            )
            search.fit(X_train, y_train)
            final_model = search.best_estimator_
            print(f'   ✅ Best CV Score: {search.best_score_:.4f}')
            
        elif name == 'Neural Network':
            # Fixed optimized architecture for NN
            print(f'   🧠 Training Neural Network...')
            final_model = keras.Sequential([
                layers.Dense(64, activation='relu', input_shape=(X_train.shape[1],)),
                layers.Dropout(0.3),
                layers.Dense(32, activation='relu'),
                layers.Dropout(0.3),
                layers.Dense(1, activation='sigmoid')
            ])
            final_model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
            final_model.fit(X_train, y_train, epochs=50, batch_size=32, verbose=0)
            
        else:
            # Stacking or no params
            print(f'   ⚡ Training baseline (no tuning applicable)...')
            final_model.fit(X_train, y_train)
        
        # 2. Evaluation on Test Set
        if name == 'Neural Network':
            # Predict once, then threshold the probabilities at 0.5
            y_pred_proba = final_model.predict(X_test, verbose=0).flatten()
            y_pred = (y_pred_proba > 0.5).astype(int)
        else:
            y_pred = final_model.predict(X_test)
            if hasattr(final_model, 'predict_proba'):
                y_pred_proba = final_model.predict_proba(X_test)[:, 1]
            else:
                y_pred_proba = y_pred
                
        # Store Metrics
        tuned_results[name] = {
            'accuracy': accuracy_score(y_test, y_pred),
            'roc_auc': roc_auc_score(y_test, y_pred_proba),
            'f1_score': f1_score(y_test, y_pred),
            'model_obj': final_model
        }
        tuned_models[name] = final_model
        print(f'   📊 Test ROC-AUC: {tuned_results[name]["roc_auc"]:.4f}')
        
    except Exception as e:
        print(f'   ⚠️ Error: {str(e)}')


In [None]:
# ============================================================================
# SECTION 6.3: TUNED MODEL LEADERBOARD & SELECTION
# ============================================================================

tuned_df = pd.DataFrame(tuned_results).T.drop('model_obj', axis=1)
tuned_df = tuned_df.sort_values('roc_auc', ascending=False)

print('\n🏆 FINAL TUNED MODEL LEADERBOARD:')
print(tuned_df)

# Select Overall Best
best_tuned_name = tuned_df.index[0]
best_tuned_model = tuned_results[best_tuned_name]['model_obj']

print(f'\n✨ OVERALL BEST TUNED MODEL: {best_tuned_name}')
print(f'   ROC-AUC: {tuned_df.iloc[0]["roc_auc"]:.4f}')

# Keras models are saved with model.save() rather than pickle;
# all other models are pickled, as in the baseline code.
if best_tuned_name == 'Neural Network':
    best_tuned_model.save('models/best_tuned_neural_network.h5')
    print('   Saved as .h5 format')
else:
    with open('models/best_tuned_model.pkl', 'wb') as f:
        pickle.dump(best_tuned_model, f)
    print('   Saved as .pkl format')


---

# 📊 PART 7: CONFUSION MATRICES & COMPARISONS



In [None]:
# ============================================================================
# SECTION 7.1: GENERATE CONFUSION MATRICES FOR ALL MODELS
# ============================================================================

print('\n' + '='*80)
print('📊 GENERATING CONFUSION MATRICES FOR ALL MODELS')
print('='*80)

fig, axes = plt.subplots(4, 4, figsize=(18, 14))
axes = axes.ravel()

for idx, (name, model) in enumerate(trained_models.items()):
    if idx >= 16:  # The 4x4 grid holds at most 16 models
        break

    if name == 'Neural Network':
        y_pred = (model.predict(X_test, verbose=0) > 0.5).astype(int).flatten()
    else:
        y_pred = model.predict(X_test)

    cm = confusion_matrix(y_test, y_pred)

    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', ax=axes[idx],
                cbar=False, square=True)
    axes[idx].set_title(f'{name}\nAcc: {results[name]["accuracy"]:.3f}',
                       fontsize=10, fontweight='bold')
    axes[idx].set_xlabel('Predicted', fontsize=9)
    axes[idx].set_ylabel('Actual', fontsize=9)

# Hide extra subplots if less than 16 models
for idx in range(len(trained_models), 16):
    axes[idx].axis('off')

plt.suptitle('Confusion Matrices - All Models', fontsize=16, fontweight='bold', y=0.995)
plt.tight_layout()
plt.savefig('figures/all_confusion_matrices.png', dpi=DPI, bbox_inches='tight')
plt.show()

print('\n✅ Confusion matrices saved to: figures/all_confusion_matrices.png')
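
For a screening task, the individual cells of a confusion matrix are often more informative than accuracy alone. As a small worked example, the sketch below derives sensitivity (recall for the positive class) and specificity from a 2x2 matrix; the label arrays are made-up values for illustration, not model output.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Made-up labels for illustration (1 = osteoporosis, 0 = no osteoporosis)
y_true_demo = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 0])
y_pred_demo = np.array([0, 0, 1, 0, 1, 1, 0, 1, 1, 0])

# For binary labels, ravel() returns the cells in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo).ravel()

sensitivity = tp / (tp + fn)  # share of true positives that were detected
specificity = tn / (tn + fp)  # share of true negatives that were cleared

print(f'TN={tn} FP={fp} FN={fn} TP={tp}')
print(f'Sensitivity: {sensitivity:.2f} | Specificity: {specificity:.2f}')
```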

---

# 🔍 PART 8: SHAP INTERPRETABILITY ANALYSIS



In [None]:
# ============================================================================
# SECTION 8.1: SHAP ANALYSIS FOR BEST MODEL
# ============================================================================

print('\n' + '='*80)
print('🔍 SHAP INTERPRETABILITY ANALYSIS')
print('='*80)

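# Use the baseline model with the highest test ROC-AUC among the tree-based models,
# since shap.TreeExplainer only supports tree-based models
tree_model_names = ['Decision Tree', 'Random Forest', 'Gradient Boosting', 'XGBoost', 'XGBoost Tuned']
best_model_name = max(tree_model_names, key=lambda k: results[k]['roc_auc'])
best_model = trained_models[best_model_name]

print(f'\nAnalyzing: {best_model_name}')
print(f'ROC-AUC: {results[best_model_name]["roc_auc"]:.4f}')

# Create SHAP explainer
explainer = shap.TreeExplainer(best_model)
shap_values = explainer.shap_values(X_test)

# Some scikit-learn models return one set of SHAP values per class; keep the positive class
if isinstance(shap_values, list):
    shap_values = shap_values[1]
elif np.ndim(shap_values) == 3:
    shap_values = shap_values[:, :, 1]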

# SHAP Summary Plot
plt.figure(figsize=(12, 8))
shap.summary_plot(shap_values, X_test, plot_type="bar", show=False)
plt.title(f'SHAP Feature Importance - {best_model_name}', fontsize=14, fontweight='bold', pad=20)
plt.tight_layout()
plt.savefig('figures/shap_feature_importance.png', dpi=DPI, bbox_inches='tight')
plt.show()

print('\n✅ SHAP analysis complete!')
print('✅ Saved to: figures/shap_feature_importance.png')
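
SHAP's `TreeExplainer` applies to tree-based models only. As a model-agnostic cross-check (a different technique, not a replacement for SHAP), scikit-learn's permutation importance measures how much a metric drops when one feature is shuffled. The sketch below runs it on synthetic data with a logistic regression; the data and model are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data standing in for the scaled feature matrix
X_pi, y_pi = make_classification(
    n_samples=400, n_features=6, n_informative=3, random_state=42
)
X_pi_tr, X_pi_te, y_pi_tr, y_pi_te = train_test_split(
    X_pi, y_pi, test_size=0.2, random_state=42, stratify=y_pi
)

clf = LogisticRegression(max_iter=1000).fit(X_pi_tr, y_pi_tr)

# Shuffle each feature 10 times on the test split and record the ROC-AUC drop
perm = permutation_importance(
    clf, X_pi_te, y_pi_te, scoring='roc_auc', n_repeats=10, random_state=42
)

# Rank features from most to least important
ranking = np.argsort(perm.importances_mean)[::-1]
for idx in ranking:
    print(f'feature_{idx}: {perm.importances_mean[idx]:.4f} '
          f'(std {perm.importances_std[idx]:.4f})')
```

If the permutation ranking and the SHAP ranking disagree strongly for the same model, that is a hint to look for correlated features before interpreting either one.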

---

# 📈 PART 9: LOSS CURVE ANALYSIS



In [None]:
# ============================================================================
# SECTION 9.1: TRAINING CURVES FOR TOP MODELS
# ============================================================================

print('\n' + '='*80)
print('📈 GENERATING TRAINING CURVES')
print('='*80)

# Placeholder: no curves are generated in this cell. Producing them requires
# recording per-epoch or per-iteration losses during training.

print('\n⚠️ Training curves not generated (placeholder section).')
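
For boosted models, loss curves can be reconstructed after training from staged predictions. The sketch below is one possible approach, using scikit-learn's `GradientBoostingClassifier.staged_predict_proba` on synthetic data to compute the log loss after every boosting round. The dataset, split, and hyperparameters are assumptions for illustration, and the resulting lists can be passed to `plt.plot` to draw the curves.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

# Synthetic data standing in for the training set
X_lc, y_lc = make_classification(n_samples=500, n_features=10, random_state=42)
X_lc_tr, X_lc_val, y_lc_tr, y_lc_val = train_test_split(
    X_lc, y_lc, test_size=0.2, random_state=42, stratify=y_lc
)

gb_demo = GradientBoostingClassifier(
    n_estimators=100, learning_rate=0.05, random_state=42
)
gb_demo.fit(X_lc_tr, y_lc_tr)

# staged_predict_proba yields class probabilities after each boosting round
train_loss = [log_loss(y_lc_tr, p) for p in gb_demo.staged_predict_proba(X_lc_tr)]
val_loss = [log_loss(y_lc_val, p) for p in gb_demo.staged_predict_proba(X_lc_val)]

# The round with the lowest validation loss suggests where overfitting begins
best_round = int(np.argmin(val_loss)) + 1
print(f'Final train loss: {train_loss[-1]:.4f}')
print(f'Final validation loss: {val_loss[-1]:.4f}')
print(f'Lowest validation loss at round: {best_round}')
```

A widening gap between the two curves after `best_round` would indicate that additional estimators mainly fit noise in the training data.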

---

# 🏆 PART 10: COMPLETE LEADERBOARD & FINAL RESULTS



In [None]:
# ============================================================================
# SECTION 10.1: FINAL LEADERBOARD WITH ALL MODELS
# ============================================================================

print('\n' + '='*80)
print('🏆 FINAL MODEL LEADERBOARD')
print('='*80)

# Create comprehensive results dataframe
leaderboard = pd.DataFrame(results).T
leaderboard = leaderboard.sort_values('roc_auc', ascending=False)
leaderboard['rank'] = range(1, len(leaderboard) + 1)
leaderboard = leaderboard[['rank', 'accuracy', 'roc_auc', 'f1_score', 'precision']]

print('\n', leaderboard.to_string())

# Save leaderboard
leaderboard.to_csv('outputs/final_leaderboard_with_tuning.csv')
print('\n✅ Leaderboard saved to: outputs/final_leaderboard_with_tuning.csv')

# Visualize top 10 models
top_10 = leaderboard.head(10)

fig, ax = plt.subplots(figsize=(14, 8))
x = np.arange(len(top_10))

ax.barh(x, top_10['roc_auc'], color='#2ecc71', alpha=0.8)
ax.set_yticks(x)
ax.set_yticklabels(top_10.index, fontsize=11)
ax.set_xlabel('ROC-AUC Score', fontsize=12, fontweight='bold')
ax.set_title('Top 10 Models - ROC-AUC Performance', fontsize=14, fontweight='bold', pad=20)
ax.grid(axis='x', alpha=0.3)

# Add value labels
for i, v in enumerate(top_10['roc_auc']):
    ax.text(v + 0.005, i, f'{v:.4f}', va='center', fontsize=10, fontweight='bold')

plt.tight_layout()
plt.savefig('figures/final_leaderboard_top10.png', dpi=DPI, bbox_inches='tight')
plt.show()

print('\n✅ Leaderboard visualization saved to: figures/final_leaderboard_top10.png')

print('\n' + '='*80)
print('🎉 COMPLETE PIPELINE FINISHED SUCCESSFULLY!')
print('='*80)
print(f'\n🏆 BEST MODEL: {leaderboard.index[0]}')
print(f'📊 ROC-AUC: {leaderboard.iloc[0]["roc_auc"]:.4f}')
print(f'🎯 Accuracy: {leaderboard.iloc[0]["accuracy"]:.4f}')
print(f'💯 F1-Score: {leaderboard.iloc[0]["f1_score"]:.4f}')
print('\n' + '='*80)

In [None]:
# ============================================================================
# SECTION 10.2: SELECT AND SAVE BEST MODEL AS .PKL
# ============================================================================

print("="*80)
print("🔍 INTELLIGENT MODEL SELECTION (Multi-Criteria)")
print("="*80)

# Calculate comprehensive scoring for each model
model_scores = {}

for model_name in results.keys():
    metrics = results[model_name]

    # Get model for overfitting check; skip entries that have no stored estimator
    model = trained_models.get(model_name)
    if model is None:
        print(f"   ⚠️  Skipping {model_name}: not found in trained_models")
        continue

    # Calculate train accuracy to check overfitting
    if model_name == 'Neural Network':
        y_train_pred = (model.predict(X_train, verbose=0) > 0.5).astype(int).flatten()
    else:
        y_train_pred = model.predict(X_train)

    train_accuracy = accuracy_score(y_train, y_train_pred)
    test_accuracy = metrics['accuracy']

    # Overfitting gap (train - test); a model that scores higher on test than on train is not penalized
    overfitting_gap = max(train_accuracy - test_accuracy, 0.0)
    overfitting_penalty = min(overfitting_gap * 2, 1.0)  # Penalize 2x, capped at 1

    # Multi-criteria composite score
    score = (
        metrics['roc_auc'] * 0.35 +           # ROC-AUC: 35% weight (most important)
        metrics['accuracy'] * 0.25 +           # Accuracy: 25% weight
        metrics['f1_score'] * 0.20 +           # F1-Score: 20% weight
        metrics['precision'] * 0.10 +          # Precision: 10% weight
        (1 - overfitting_penalty) * 0.10       # Overfitting check: 10% weight
    )

    model_scores[model_name] = {
        'composite_score': score,
        'roc_auc': metrics['roc_auc'],
        'accuracy': test_accuracy,
        'f1_score': metrics['f1_score'],
        'train_accuracy': train_accuracy,
        'overfitting_gap': overfitting_gap,
        'is_optimized': 'Optimized' in model_name or 'Tuned' in model_name
    }

# Sort by composite score
ranked_models = sorted(model_scores.items(), key=lambda x: x[1]['composite_score'], reverse=True)

# Display ranking
print("\n🏆 MODEL RANKING (Multi-Criteria Composite Score):")
print("-" * 80)
print(f"{'Rank':<6} {'Model':<30} {'Score':<8} {'ROC-AUC':<9} {'Accuracy':<9} {'Overfit':<8}")
print("-" * 80)

for i, (model_name, scores) in enumerate(ranked_models[:10], 1):
    print(f"{i:<6} {model_name:<30} {scores['composite_score']:.4f}   "
          f"{scores['roc_auc']:.4f}    {scores['accuracy']:.4f}    "
          f"{scores['overfitting_gap']:.4f}")

# Select best model
best_model_name = ranked_models[0][0]
best_scores = ranked_models[0][1]

print("\n" + "="*80)
print("✅ SELECTED BEST MODEL (Intelligent Multi-Criteria Selection)")
print("="*80)
print(f"   Model: {best_model_name}")
print(f"   Composite Score: {best_scores['composite_score']:.4f}")
print(f"   ROC-AUC: {best_scores['roc_auc']:.4f}")
print(f"   Accuracy: {best_scores['accuracy']:.4f}")
print(f"   F1-Score: {best_scores['f1_score']:.4f}")
print(f"   Train Accuracy: {best_scores['train_accuracy']:.4f}")
print(f"   Overfitting Gap: {best_scores['overfitting_gap']:.4f}")
print(f"   Optimized: {'Yes' if best_scores['is_optimized'] else 'No'}")

# Additional validation for optimized models
if not best_scores['is_optimized']:
    print("\n⚠️  WARNING: Best model is not optimized!")
    print("   Checking if an optimized version exists in top 3...")

    for rank, (model_name, scores) in enumerate(ranked_models[:3], 1):
        if scores['is_optimized']:
            print(f"   → Found optimized model at rank {rank}: {model_name}")
            print(f"   → Score difference: {best_scores['composite_score'] - scores['composite_score']:.4f}")

            # If score difference is small (<0.01), prefer optimized version
            if (best_scores['composite_score'] - scores['composite_score']) < 0.01:
                print(f"   → Selecting {model_name} instead (negligible score difference)")
                best_model_name = model_name
                best_scores = scores
            break

# Save the best model
best_model = trained_models[best_model_name]

# Create filename
model_filename = f"{best_model_name.replace(' ', '_').lower()}_best.pkl"
model_path = f"models/{model_filename}"

# Save the best model using pickle (create the target directory if it is missing)
import os
os.makedirs('models', exist_ok=True)
with open(model_path, 'wb') as f:
    pickle.dump(best_model, f)

print("\n💾 Best model saved successfully!")
print(f"   Model: {best_model_name}")
print(f"   ROC-AUC: {best_scores['roc_auc']:.4f}")
print(f"   Path: {model_path}")

# Save the scaler for deployment
scaler_path = "models/scaler.pkl"
with open(scaler_path, 'wb') as f:
    pickle.dump(scaler, f)

print(f"\n✅ Scaler saved to: {scaler_path}")

# Save label encoders dictionary
encoders_path = "models/label_encoders.pkl"
with open(encoders_path, 'wb') as f:
    pickle.dump(le_dict, f)

print(f"✅ Label encoders saved to: {encoders_path}")

print("\n" + "="*80)
print("📦 MODEL ARTIFACTS SAVED - READY FOR DEPLOYMENT!")
print("="*80)
print("Files saved:")
print(f"   1. {model_path} (Best ML model)")
print(f"   2. {scaler_path} (Feature scaler)")
print(f"   3. {encoders_path} (Categorical encoders)")
print("\nYou can now use these files to make predictions on new data!")
print("="*80)
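
To use the saved files later, they need to be loaded back together. The snippet below is a minimal sketch of that step, assuming the three files written above sit in one directory. The function name `load_artifacts` and its return layout are hypothetical and not part of this pipeline; only `scaler.pkl` and `label_encoders.pkl` are file names taken from the cell above.

```python
import pickle
from pathlib import Path


def load_artifacts(model_dir, model_filename):
    """Load the pickled model, scaler, and label encoders from model_dir.

    Hypothetical helper: returns a dict with the keys
    'model', 'scaler', and 'label_encoders'.
    """
    model_dir = Path(model_dir)
    expected_files = {
        "model": model_filename,
        "scaler": "scaler.pkl",
        "label_encoders": "label_encoders.pkl",
    }

    artifacts = {}
    for key, filename in expected_files.items():
        path = model_dir / filename
        if not path.exists():
            raise FileNotFoundError(f"Missing artifact: {path}")
        # Only unpickle files you created yourself or otherwise trust.
        with open(path, "rb") as f:
            artifacts[key] = pickle.load(f)
    return artifacts
```

In this notebook the call would look like `artifacts = load_artifacts("models", model_filename)`. New data then has to go through the same preprocessing as in Part 3, in the same order: encode the categorical columns with the saved label encoders, apply the saved scaler, and only then call the model's `predict` method.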