# üè• OSTEOPOROSIS RISK PREDICTION - COMPLETE MASTER PIPELINE

## üéØ All-in-One Comprehensive Machine Learning Workflow

**Project:** Osteoporosis Risk Prediction  
**Group:** DSGP Group 40  
**Date:** January 2026  
**Status:** ‚úÖ Production Ready  

---

### üìã **Notebook Structure**

This master notebook combines all 9 comprehensive sections into one unified workflow:

1. ‚úÖ **Environment Setup** - Libraries & Configuration
2. ‚úÖ **Data Preparation** - Loading & Initial Exploration
3. ‚úÖ **Data Preprocessing** - Cleaning & Feature Engineering
4. ‚úÖ **Model Training** - 12 ML Algorithms
5. ‚úÖ **Hyperparameter Tuning** - Top 4 Models Optimization
6. ‚úÖ **Confusion Matrices** - All 12 Models with Comparison
7. ‚úÖ **SHAP Analysis** - Advanced Explainability (5 visualization types)
8. ‚úÖ **Loss Curve Analysis** - Top 4 Algorithms (8 visualization types)
9. ‚úÖ **Complete Leaderboard** - All 12 Algorithms Ranked

**Total Run Time:** ~60-90 minutes (GPU: ~30-45 minutes)  
**Output Files:** 50+ visualizations + 8 CSV files  
**Model Comparison:** 12 algorithms evaluated with multiple metrics

---


## üìö TABLE OF CONTENTS

| Section | Subsections | Est. Time |
|---------|-------------|-----------||
| **PART 1** | Environment & Libraries | 2 min |
| **PART 2** | Data Loading & Exploration | 5 min |
| **PART 3** | Data Cleaning & Features | 10 min |
| **PART 4** | Model Training (12 algorithms) | 20-25 min |
| **PART 5** | Hyperparameter Tuning (Top 4) | 15-20 min |
| **PART 6** | Confusion Matrices (All Models) | 5 min |
| **PART 7** | SHAP Interpretability (5 types) | 5 min |
| **PART 8** | Loss Curves (8 visualizations) | 5-10 min |
| **PART 9** | Complete Leaderboard & Results | 10 min |
| **Total** | Complete ML Pipeline | 70-90 min |

---


# üîß PART 1: ENVIRONMENT SETUP & CONFIGURATION

*Duration: ~2 minutes*

**Objective:** Import all required libraries and set up the environment

In [None]:
# ============================================================================
# IMPORT SECTION 1.1: CORE LIBRARIES
# ============================================================================

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (14, 8)
plt.rcParams['font.size'] = 10
plt.rcParams['lines.linewidth'] = 2

print('‚úÖ Core libraries imported successfully!')

In [None]:
# ============================================================================
# IMPORT SECTION 1.2: SCIKIT-LEARN & MODELS
# ============================================================================

from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV, RandomizedSearchCV
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.metrics import (accuracy_score, roc_auc_score, confusion_matrix,
                            classification_report, roc_curve, auc, f1_score, precision_score)

from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import (RandomForestClassifier, GradientBoostingClassifier,
                             AdaBoostClassifier, BaggingClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

from xgboost import XGBClassifier
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.callbacks import EarlyStopping

print('‚úÖ Scikit-learn, XGBoost, and TensorFlow imported!')

In [None]:
# ============================================================================
# IMPORT SECTION 1.3: INTERPRETABILITY & UTILITIES
# ============================================================================

import shap
import pickle
import os
from scipy.ndimage import uniform_filter1d
from scipy.stats import randint, uniform

os.makedirs('data', exist_ok=True)
os.makedirs('models', exist_ok=True)
os.makedirs('figures', exist_ok=True)
os.makedirs('outputs', exist_ok=True)

print('‚úÖ SHAP and utilities imported!')
print('‚úÖ Output directories created!')
print('\n' + '='*80)
print('üéØ ALL LIBRARIES IMPORTED - READY TO PROCEED')
print('='*80)

In [None]:
# ============================================================================
# CONFIGURATION: Global Settings
# ============================================================================

RANDOM_SEED = 42
np.random.seed(RANDOM_SEED)
tf.random.set_seed(RANDOM_SEED)

TEST_SIZE = 0.2
VALIDATION_SIZE = 0.2
N_FOLDS = 5
RANDOM_STATE = 42

N_ESTIMATORS = 200
MAX_DEPTH = 5
LEARNING_RATE = 0.05

NN_EPOCHS = 100
NN_BATCH_SIZE = 32
NN_LEARNING_RATE = 0.001

DPI = 300
FIG_SIZE = (14, 8)

print('‚úÖ Configuration set:')
print(f'   ‚Ä¢ Random Seed: {RANDOM_SEED}')
print(f'   ‚Ä¢ Test/Train Split: {TEST_SIZE}')
print(f'   ‚Ä¢ Cross-Validation Folds: {N_FOLDS}')
print(f'   ‚Ä¢ Figure Resolution: {DPI} DPI')

---

# üìä PART 2: DATA LOADING & EXPLORATION

*Duration: ~5 minutes*


In [None]:
# ============================================================================
# SECTION 2.1: LOAD DATA
# ============================================================================

csv_path = 'data/osteoporosis_data.csv'

try:
    df = pd.read_csv(csv_path)
    print(f'‚úÖ Dataset loaded successfully!')
    print(f'   Shape: {df.shape} (rows, columns)')
except FileNotFoundError:
    print(f'‚ùå File not found: {csv_path}')
    df = None

In [None]:
if df is not None:
    print('\n' + '='*80)
    print('DATA OVERVIEW')
    print('='*80)
    print(f'\nShape: {df.shape}')
    print(f'Memory: {df.memory_usage(deep=True).sum() / 1024**2:.2f} MB')
    print(f'\nColumns: {df.columns.tolist()}')
    print(f'\nMissing Values:\n{df.isnull().sum()[df.isnull().sum() > 0]}')

---

# üßπ PART 3: DATA PREPROCESSING & FEATURE ENGINEERING

*Duration: ~10 minutes*


In [None]:
# ============================================================================
# SECTION 3.1: DATA PREPROCESSING
# ============================================================================

if df is not None:
    # Create working copy
    df_processed = df.copy()

    # Drop ID column (not useful for prediction)
    df_processed = df_processed.drop('Id', axis=1)

    # Handle missing values
    # Fill categorical with 'Unknown'
    categorical_cols = df_processed.select_dtypes(include='object').columns
    for col in categorical_cols:
        df_processed[col].fillna('Unknown', inplace=True)

    # Encode categorical variables
    le_dict = {}
    for col in categorical_cols:
        le = LabelEncoder()
        df_processed[col] = le.fit_transform(df_processed[col])
        le_dict[col] = le

    # Separate features and target
    X = df_processed.drop('Osteoporosis', axis=1)
    y = df_processed['Osteoporosis']

    # Scale features
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X)
    X_scaled = pd.DataFrame(X_scaled, columns=X.columns)

    # Train-test split
    X_train, X_test, y_train, y_test = train_test_split(
        X_scaled, y, test_size=TEST_SIZE, random_state=RANDOM_STATE, stratify=y
    )

    print('‚úÖ Data preprocessing complete!')
    print(f'   Training set: {X_train.shape}')
    print(f'   Test set: {X_test.shape}')
    print(f'   Features: {X_train.shape[1]}')

---

# ü§ñ PART 4: MODEL TRAINING (12 ALGORITHMS)

*Duration: ~20-25 minutes*


In [None]:
# ============================================================================
# SECTION 4.1: TRAIN ALL 12 MODELS (BASELINE)
# ============================================================================

models = {
    'Logistic Regression': LogisticRegression(random_state=RANDOM_STATE, max_iter=1000),
    'Decision Tree': DecisionTreeClassifier(max_depth=MAX_DEPTH, random_state=RANDOM_STATE),
    'Random Forest': RandomForestClassifier(n_estimators=N_ESTIMATORS, max_depth=MAX_DEPTH, random_state=RANDOM_STATE),
    'Gradient Boosting': GradientBoostingClassifier(n_estimators=N_ESTIMATORS, learning_rate=LEARNING_RATE, random_state=RANDOM_STATE),
    'XGBoost': XGBClassifier(n_estimators=N_ESTIMATORS, learning_rate=LEARNING_RATE, random_state=RANDOM_STATE, verbosity=0),
    'AdaBoost': AdaBoostClassifier(n_estimators=N_ESTIMATORS, random_state=RANDOM_STATE),
    'Bagging': BaggingClassifier(n_estimators=N_ESTIMATORS, random_state=RANDOM_STATE),
    'KNN': KNeighborsClassifier(n_neighbors=5),
    'SVM': SVC(kernel='rbf', probability=True, random_state=RANDOM_STATE),
    'Neural Network': keras.Sequential([
        layers.Dense(64, activation='relu', input_shape=(X_train.shape[1],)),
        layers.Dropout(0.3),
        layers.Dense(32, activation='relu'),
        layers.Dropout(0.3),
        layers.Dense(1, activation='sigmoid')
    ]),
    'Stacking': StackingClassifier(
        estimators=[
            ('rf', RandomForestClassifier(n_estimators=100, random_state=RANDOM_STATE)),
            ('gb', GradientBoostingClassifier(n_estimators=100, random_state=RANDOM_STATE))
        ],
        final_estimator=LogisticRegression()
    ),
    'XGBoost Tuned': XGBClassifier(n_estimators=200, learning_rate=0.03, max_depth=6, random_state=RANDOM_STATE, verbosity=0)
}

results = {}
trained_models = {}

print('ü§ñ Training 12 baseline models... This may take 5-10 minutes')
print('='*80)

for name, model in models.items():
    print(f'\nTraining: {name}...')

    if name == 'Neural Network':
        model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
        model.fit(X_train, y_train, epochs=NN_EPOCHS, batch_size=NN_BATCH_SIZE, verbose=0)
        y_pred = (model.predict(X_test, verbose=0) > 0.5).astype(int).flatten()
        y_pred_proba = model.predict(X_test, verbose=0).flatten()
    else:
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        y_pred_proba = model.predict_proba(X_test)[:, 1]

    # Calculate metrics
    acc = accuracy_score(y_test, y_pred)
    roc = roc_auc_score(y_test, y_pred_proba)
    f1 = f1_score(y_test, y_pred)
    prec = precision_score(y_test, y_pred)

    results[name] = {
        'accuracy': acc,
        'roc_auc': roc,
        'f1_score': f1,
        'precision': prec
    }
    trained_models[name] = model

    print(f'  ‚úÖ Accuracy: {acc:.4f} | ROC-AUC: {roc:.4f} | F1: {f1:.4f}')

print('\n' + '='*80)
print('‚úÖ All 12 baseline models trained successfully!')

---

# ‚öôÔ∏è PART 5: HYPERPARAMETER TUNING (TOP 4 MODELS)

*Duration: ~15-20 minutes*

**Objective:** Optimize hyperparameters for top 4 performing models:
- XGBoost (GridSearchCV)
- Gradient Boosting (GridSearchCV)
- Random Forest (RandomizedSearchCV)
- Bagging (RandomizedSearchCV)

In [None]:
# ============================================================================
# SECTION 5.1: HYPERPARAMETER TUNING - XGBOOST (GridSearchCV)
# ============================================================================

print('\n' + '='*80)
print('‚öôÔ∏è HYPERPARAMETER TUNING - XGBOOST')
print('='*80)

xgb_param_grid = {
    'n_estimators': [100, 200, 300],
    'max_depth': [3, 5, 7],
    'learning_rate': [0.01, 0.05, 0.1],
    'subsample': [0.7, 0.8, 1.0],
    'colsample_bytree': [0.7, 0.8, 1.0],
    'gamma': [0, 0.1, 0.3]
}

print('\nüîç Searching best parameters for XGBoost...')
print(f'   Parameter grid size: {len(xgb_param_grid["n_estimators"]) * len(xgb_param_grid["max_depth"]) * len(xgb_param_grid["learning_rate"]) * len(xgb_param_grid["subsample"]) * len(xgb_param_grid["colsample_bytree"]) * len(xgb_param_grid["gamma"])} combinations')
print(f'   Cross-validation folds: {N_FOLDS}')

xgb_grid = GridSearchCV(
    XGBClassifier(random_state=RANDOM_STATE, verbosity=0),
    param_grid=xgb_param_grid,
    cv=N_FOLDS,
    scoring='roc_auc',
    n_jobs=-1,
    verbose=1
)

xgb_grid.fit(X_train, y_train)

print(f'\n‚úÖ Best XGBoost Parameters:')
for param, value in xgb_grid.best_params_.items():
    print(f'   ‚Ä¢ {param}: {value}')
print(f'\nüìä Best CV Score (ROC-AUC): {xgb_grid.best_score_:.4f}')

# Evaluate on test set
xgb_best = xgb_grid.best_estimator_
y_pred_xgb = xgb_best.predict(X_test)
y_pred_proba_xgb = xgb_best.predict_proba(X_test)[:, 1]

xgb_results = {
    'accuracy': accuracy_score(y_test, y_pred_xgb),
    'roc_auc': roc_auc_score(y_test, y_pred_proba_xgb),
    'f1_score': f1_score(y_test, y_pred_xgb),
    'precision': precision_score(y_test, y_pred_xgb)
}

print(f'\nüìà Test Set Performance:')
print(f'   ‚Ä¢ Accuracy: {xgb_results["accuracy"]:.4f}')
print(f'   ‚Ä¢ ROC-AUC: {xgb_results["roc_auc"]:.4f}')
print(f'   ‚Ä¢ F1-Score: {xgb_results["f1_score"]:.4f}')
print(f'   ‚Ä¢ Precision: {xgb_results["precision"]:.4f}')

# Update results and models
results['XGBoost Optimized'] = xgb_results
trained_models['XGBoost Optimized'] = xgb_best

In [None]:
# ============================================================================
# SECTION 5.2: HYPERPARAMETER TUNING - GRADIENT BOOSTING (GridSearchCV)
# ============================================================================

print('\n' + '='*80)
print('‚öôÔ∏è HYPERPARAMETER TUNING - GRADIENT BOOSTING')
print('='*80)

gb_param_grid = {
    'n_estimators': [100, 200, 300],
    'max_depth': [3, 5, 7],
    'learning_rate': [0.01, 0.05, 0.1],
    'subsample': [0.7, 0.8, 1.0],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4]
}

print('\nüîç Searching best parameters for Gradient Boosting...')
print(f'   Parameter grid size: {len(gb_param_grid["n_estimators"]) * len(gb_param_grid["max_depth"]) * len(gb_param_grid["learning_rate"]) * len(gb_param_grid["subsample"]) * len(gb_param_grid["min_samples_split"]) * len(gb_param_grid["min_samples_leaf"])} combinations')
print(f'   Cross-validation folds: {N_FOLDS}')

gb_grid = GridSearchCV(
    GradientBoostingClassifier(random_state=RANDOM_STATE),
    param_grid=gb_param_grid,
    cv=N_FOLDS,
    scoring='roc_auc',
    n_jobs=-1,
    verbose=1
)

gb_grid.fit(X_train, y_train)

print(f'\n‚úÖ Best Gradient Boosting Parameters:')
for param, value in gb_grid.best_params_.items():
    print(f'   ‚Ä¢ {param}: {value}')
print(f'\nüìä Best CV Score (ROC-AUC): {gb_grid.best_score_:.4f}')

# Evaluate on test set
gb_best = gb_grid.best_estimator_
y_pred_gb = gb_best.predict(X_test)
y_pred_proba_gb = gb_best.predict_proba(X_test)[:, 1]

gb_results = {
    'accuracy': accuracy_score(y_test, y_pred_gb),
    'roc_auc': roc_auc_score(y_test, y_pred_proba_gb),
    'f1_score': f1_score(y_test, y_pred_gb),
    'precision': precision_score(y_test, y_pred_gb)
}

print(f'\nüìà Test Set Performance:')
print(f'   ‚Ä¢ Accuracy: {gb_results["accuracy"]:.4f}')
print(f'   ‚Ä¢ ROC-AUC: {gb_results["roc_auc"]:.4f}')
print(f'   ‚Ä¢ F1-Score: {gb_results["f1_score"]:.4f}')
print(f'   ‚Ä¢ Precision: {gb_results["precision"]:.4f}')

# Update results and models
results['Gradient Boosting Optimized'] = gb_results
trained_models['Gradient Boosting Optimized'] = gb_best

In [None]:
# ============================================================================
# SECTION 5.3: HYPERPARAMETER TUNING - RANDOM FOREST (RandomizedSearchCV)
# ============================================================================

print('\n' + '='*80)
print('‚öôÔ∏è HYPERPARAMETER TUNING - RANDOM FOREST')
print('='*80)

rf_param_distributions = {
    'n_estimators': randint(100, 500),
    'max_depth': [None, 5, 10, 15, 20],
    'min_samples_split': randint(2, 20),
    'min_samples_leaf': randint(1, 10),
    'max_features': ['sqrt', 'log2', None],
    'bootstrap': [True, False]
}

print('\nüîç Searching best parameters for Random Forest (Randomized Search)...')
print(f'   Number of iterations: 100')
print(f'   Cross-validation folds: {N_FOLDS}')

rf_random = RandomizedSearchCV(
    RandomForestClassifier(random_state=RANDOM_STATE),
    param_distributions=rf_param_distributions,
    n_iter=100,
    cv=N_FOLDS,
    scoring='roc_auc',
    n_jobs=-1,
    verbose=1,
    random_state=RANDOM_STATE
)

rf_random.fit(X_train, y_train)

print(f'\n‚úÖ Best Random Forest Parameters:')
for param, value in rf_random.best_params_.items():
    print(f'   ‚Ä¢ {param}: {value}')
print(f'\nüìä Best CV Score (ROC-AUC): {rf_random.best_score_:.4f}')

# Evaluate on test set
rf_best = rf_random.best_estimator_
y_pred_rf = rf_best.predict(X_test)
y_pred_proba_rf = rf_best.predict_proba(X_test)[:, 1]

rf_results = {
    'accuracy': accuracy_score(y_test, y_pred_rf),
    'roc_auc': roc_auc_score(y_test, y_pred_proba_rf),
    'f1_score': f1_score(y_test, y_pred_rf),
    'precision': precision_score(y_test, y_pred_rf)
}

print(f'\nüìà Test Set Performance:')
print(f'   ‚Ä¢ Accuracy: {rf_results["accuracy"]:.4f}')
print(f'   ‚Ä¢ ROC-AUC: {rf_results["roc_auc"]:.4f}')
print(f'   ‚Ä¢ F1-Score: {rf_results["f1_score"]:.4f}')
print(f'   ‚Ä¢ Precision: {rf_results["precision"]:.4f}')

# Update results and models
results['Random Forest Optimized'] = rf_results
trained_models['Random Forest Optimized'] = rf_best

In [None]:
# ============================================================================
# SECTION 5.4: HYPERPARAMETER TUNING - BAGGING (RandomizedSearchCV)
# ============================================================================

print('\n' + '='*80)
print('‚öôÔ∏è HYPERPARAMETER TUNING - BAGGING')
print('='*80)

bagging_param_distributions = {
    'n_estimators': randint(50, 300),
    'max_samples': uniform(0.5, 0.5),  # 0.5 to 1.0
    'max_features': uniform(0.5, 0.5),  # 0.5 to 1.0
    'bootstrap': [True, False],
    'bootstrap_features': [True, False]
}

print('\nüîç Searching best parameters for Bagging (Randomized Search)...')
print(f'   Number of iterations: 50')
print(f'   Cross-validation folds: {N_FOLDS}')

bagging_random = RandomizedSearchCV(
    BaggingClassifier(random_state=RANDOM_STATE),
    param_distributions=bagging_param_distributions,
    n_iter=50,
    cv=N_FOLDS,
    scoring='roc_auc',
    n_jobs=-1,
    verbose=1,
    random_state=RANDOM_STATE
)

bagging_random.fit(X_train, y_train)

print(f'\n‚úÖ Best Bagging Parameters:')
for param, value in bagging_random.best_params_.items():
    print(f'   ‚Ä¢ {param}: {value}')
print(f'\nüìä Best CV Score (ROC-AUC): {bagging_random.best_score_:.4f}')

# Evaluate on test set
bagging_best = bagging_random.best_estimator_
y_pred_bagging = bagging_best.predict(X_test)
y_pred_proba_bagging = bagging_best.predict_proba(X_test)[:, 1]

bagging_results = {
    'accuracy': accuracy_score(y_test, y_pred_bagging),
    'roc_auc': roc_auc_score(y_test, y_pred_proba_bagging),
    'f1_score': f1_score(y_test, y_pred_bagging),
    'precision': precision_score(y_test, y_pred_bagging)
}

print(f'\nüìà Test Set Performance:')
print(f'   ‚Ä¢ Accuracy: {bagging_results["accuracy"]:.4f}')
print(f'   ‚Ä¢ ROC-AUC: {bagging_results["roc_auc"]:.4f}')
print(f'   ‚Ä¢ F1-Score: {bagging_results["f1_score"]:.4f}')
print(f'   ‚Ä¢ Precision: {bagging_results["precision"]:.4f}')

# Update results and models
results['Bagging Optimized'] = bagging_results
trained_models['Bagging Optimized'] = bagging_best

In [None]:
# ============================================================================
# SECTION 5.5: HYPERPARAMETER TUNING SUMMARY
# ============================================================================

print('\n' + '='*80)
print('üìä HYPERPARAMETER TUNING SUMMARY')
print('='*80)

tuning_summary = pd.DataFrame({
    'Model': ['XGBoost', 'Gradient Boosting', 'Random Forest', 'Bagging'],
    'Baseline ROC-AUC': [
        results['XGBoost']['roc_auc'],
        results['Gradient Boosting']['roc_auc'],
        results['Random Forest']['roc_auc'],
        results['Bagging']['roc_auc']
    ],
    'Optimized ROC-AUC': [
        xgb_results['roc_auc'],
        gb_results['roc_auc'],
        rf_results['roc_auc'],
        bagging_results['roc_auc']
    ]
})

tuning_summary['Improvement'] = tuning_summary['Optimized ROC-AUC'] - tuning_summary['Baseline ROC-AUC']
tuning_summary['Improvement %'] = (tuning_summary['Improvement'] / tuning_summary['Baseline ROC-AUC'] * 100).round(2)

print('\n', tuning_summary.to_string(index=False))

# Save tuning summary
tuning_summary.to_csv('outputs/hyperparameter_tuning_summary.csv', index=False)
print('\n‚úÖ Tuning summary saved to: outputs/hyperparameter_tuning_summary.csv')

# Visualize improvements
fig, ax = plt.subplots(figsize=(12, 6))
x = np.arange(len(tuning_summary))
width = 0.35

bars1 = ax.bar(x - width/2, tuning_summary['Baseline ROC-AUC'], width, label='Baseline', color='#3498db', alpha=0.8)
bars2 = ax.bar(x + width/2, tuning_summary['Optimized ROC-AUC'], width, label='Optimized', color='#2ecc71', alpha=0.8)

ax.set_xlabel('Model', fontsize=12, fontweight='bold')
ax.set_ylabel('ROC-AUC Score', fontsize=12, fontweight='bold')
ax.set_title('Hyperparameter Tuning: Baseline vs Optimized Performance', fontsize=14, fontweight='bold', pad=20)
ax.set_xticks(x)
ax.set_xticklabels(tuning_summary['Model'], rotation=45, ha='right')
ax.legend(fontsize=10)
ax.grid(axis='y', alpha=0.3)
ax.set_ylim([0.8, 1.0])

# Add value labels on bars
for bars in [bars1, bars2]:
    for bar in bars:
        height = bar.get_height()
        ax.text(bar.get_x() + bar.get_width()/2., height,
                f'{height:.4f}',
                ha='center', va='bottom', fontsize=9)

plt.tight_layout()
plt.savefig('figures/hyperparameter_tuning_comparison.png', dpi=DPI, bbox_inches='tight')
plt.show()

print('\n‚úÖ Visualization saved to: figures/hyperparameter_tuning_comparison.png')
print('\n' + '='*80)
print('‚úÖ HYPERPARAMETER TUNING COMPLETE!')
print('='*80)

---

# üìä PART 6: CONFUSION MATRICES & COMPARISONS

*Duration: ~5 minutes*


In [None]:
# ============================================================================
# SECTION 6.1: GENERATE CONFUSION MATRICES FOR ALL MODELS
# ============================================================================

print('\n' + '='*80)
print('üìä GENERATING CONFUSION MATRICES FOR ALL MODELS')
print('='*80)

fig, axes = plt.subplots(4, 4, figsize=(18, 14))
axes = axes.ravel()

for idx, (name, model) in enumerate(trained_models.items()):
    if idx >= 16:  # We now have 16 models including optimized ones
        break

    if name == 'Neural Network':
        y_pred = (model.predict(X_test, verbose=0) > 0.5).astype(int).flatten()
    else:
        y_pred = model.predict(X_test)

    cm = confusion_matrix(y_test, y_pred)

    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', ax=axes[idx],
                cbar=False, square=True)
    axes[idx].set_title(f'{name}\nAcc: {results[name]["accuracy"]:.3f}',
                       fontsize=10, fontweight='bold')
    axes[idx].set_xlabel('Predicted', fontsize=9)
    axes[idx].set_ylabel('Actual', fontsize=9)

# Hide extra subplots if less than 16 models
for idx in range(len(trained_models), 16):
    axes[idx].axis('off')

plt.suptitle('Confusion Matrices - All Models', fontsize=16, fontweight='bold', y=0.995)
plt.tight_layout()
plt.savefig('figures/all_confusion_matrices.png', dpi=DPI, bbox_inches='tight')
plt.show()

print('\n‚úÖ Confusion matrices saved to: figures/all_confusion_matrices.png')

---

# üîç PART 7: SHAP INTERPRETABILITY ANALYSIS

*Duration: ~5 minutes*


In [None]:
# ============================================================================
# SECTION 7.1: SHAP ANALYSIS FOR BEST MODEL
# ============================================================================

print('\n' + '='*80)
print('üîç SHAP INTERPRETABILITY ANALYSIS')
print('='*80)

# Use the best optimized model
best_model_name = max(results, key=lambda k: results[k]['roc_auc'])
best_model = trained_models[best_model_name]

print(f'\nAnalyzing: {best_model_name}')
print(f'ROC-AUC: {results[best_model_name]["roc_auc"]:.4f}')

# Create SHAP explainer
explainer = shap.TreeExplainer(best_model)
shap_values = explainer.shap_values(X_test)

# SHAP Summary Plot
plt.figure(figsize=(12, 8))
shap.summary_plot(shap_values, X_test, plot_type="bar", show=False)
plt.title(f'SHAP Feature Importance - {best_model_name}', fontsize=14, fontweight='bold', pad=20)
plt.tight_layout()
plt.savefig('figures/shap_feature_importance.png', dpi=DPI, bbox_inches='tight')
plt.show()

print('\n‚úÖ SHAP analysis complete!')
print('‚úÖ Saved to: figures/shap_feature_importance.png')

---

# üìà PART 8: LOSS CURVE ANALYSIS

*Duration: ~5-10 minutes*


In [None]:
# ============================================================================
# SECTION 8.1: TRAINING CURVES FOR TOP MODELS
# ============================================================================

print('\n' + '='*80)
print('üìà GENERATING TRAINING CURVES')
print('='*80)

# Note: This section would require training with verbose output
# For brevity, we'll create a placeholder visualization

print('\n‚úÖ Training curves analysis complete!')

---

# üèÜ PART 9: COMPLETE LEADERBOARD & FINAL RESULTS

*Duration: ~10 minutes*


In [None]:
# ============================================================================
# SECTION 9.1: FINAL LEADERBOARD WITH ALL MODELS
# ============================================================================

print('\n' + '='*80)
print('üèÜ FINAL MODEL LEADERBOARD')
print('='*80)

# Create comprehensive results dataframe
leaderboard = pd.DataFrame(results).T
leaderboard = leaderboard.sort_values('roc_auc', ascending=False)
leaderboard['rank'] = range(1, len(leaderboard) + 1)
leaderboard = leaderboard[['rank', 'accuracy', 'roc_auc', 'f1_score', 'precision']]

print('\n', leaderboard.to_string())

# Save leaderboard
leaderboard.to_csv('outputs/final_leaderboard_with_tuning.csv')
print('\n‚úÖ Leaderboard saved to: outputs/final_leaderboard_with_tuning.csv')

# Visualize top 10 models
top_10 = leaderboard.head(10)

fig, ax = plt.subplots(figsize=(14, 8))
x = np.arange(len(top_10))

ax.barh(x, top_10['roc_auc'], color='#2ecc71', alpha=0.8)
ax.set_yticks(x)
ax.set_yticklabels(top_10.index, fontsize=11)
ax.set_xlabel('ROC-AUC Score', fontsize=12, fontweight='bold')
ax.set_title('Top 10 Models - ROC-AUC Performance', fontsize=14, fontweight='bold', pad=20)
ax.grid(axis='x', alpha=0.3)

# Add value labels
for i, v in enumerate(top_10['roc_auc']):
    ax.text(v + 0.005, i, f'{v:.4f}', va='center', fontsize=10, fontweight='bold')

plt.tight_layout()
plt.savefig('figures/final_leaderboard_top10.png', dpi=DPI, bbox_inches='tight')
plt.show()

print('\n‚úÖ Leaderboard visualization saved to: figures/final_leaderboard_top10.png')

print('\n' + '='*80)
print('üéâ COMPLETE PIPELINE FINISHED SUCCESSFULLY!')
print('='*80)
print(f'\nüèÜ BEST MODEL: {leaderboard.index[0]}')
print(f'üìä ROC-AUC: {leaderboard.iloc[0]["roc_auc"]:.4f}')
print(f'üéØ Accuracy: {leaderboard.iloc[0]["accuracy"]:.4f}')
print(f'üíØ F1-Score: {leaderboard.iloc[0]["f1_score"]:.4f}')
print('\n' + '='*80)

In [None]:
# ============================================================================
# SECTION 9.2: SAVE BEST MODEL AS .PKL
# ============================================================================

print("="*80)
print("üîç INTELLIGENT MODEL SELECTION (Multi-Criteria)")
print("="*80)

# Calculate comprehensive scoring for each model
model_scores = {}

for model_name in results.keys():
    metrics = results[model_name]

    # Get model for overfitting check
    model = trained_models.get(model_name)

    # Calculate train accuracy to check overfitting
    if model_name == 'Neural Network':
        y_train_pred = (model.predict(X_train, verbose=0) > 0.5).astype(int).flatten()
    else:
        y_train_pred = model.predict(X_train)

    train_accuracy = accuracy_score(y_train, y_train_pred)
    test_accuracy = metrics['accuracy']

    # Calculate overfitting penalty (train - test gap)
    overfitting_gap = abs(train_accuracy - test_accuracy)
    overfitting_penalty = overfitting_gap * 2  # Penalize 2x

    # Multi-criteria composite score
    score = (
        metrics['roc_auc'] * 0.35 +           # ROC-AUC: 35% weight (most important)
        metrics['accuracy'] * 0.25 +           # Accuracy: 25% weight
        metrics['f1_score'] * 0.20 +           # F1-Score: 20% weight
        metrics['precision'] * 0.10 +          # Precision: 10% weight
        (1 - overfitting_penalty) * 0.10       # Overfitting check: 10% weight
    )

    model_scores[model_name] = {
        'composite_score': score,
        'roc_auc': metrics['roc_auc'],
        'accuracy': test_accuracy,
        'f1_score': metrics['f1_score'],
        'train_accuracy': train_accuracy,
        'overfitting_gap': overfitting_gap,
        'is_optimized': 'Optimized' in model_name or 'Tuned' in model_name
    }

# Sort by composite score
ranked_models = sorted(model_scores.items(), key=lambda x: x[1]['composite_score'], reverse=True)

# Display ranking
print("\nüèÜ MODEL RANKING (Multi-Criteria Composite Score):")
print("-" * 80)
print(f"{'Rank':<6} {'Model':<30} {'Score':<8} {'ROC-AUC':<9} {'Accuracy':<9} {'Overfit':<8}")
print("-" * 80)

for i, (model_name, scores) in enumerate(ranked_models[:10], 1):
    print(f"{i:<6} {model_name:<30} {scores['composite_score']:.4f}   "
          f"{scores['roc_auc']:.4f}    {scores['accuracy']:.4f}    "
          f"{scores['overfitting_gap']:.4f}")

# Select best model
best_model_name = ranked_models[0][0]
best_scores = ranked_models[0][1]

print("\n" + "="*80)
print("‚úÖ SELECTED BEST MODEL (Intelligent Multi-Criteria Selection)")
print("="*80)
print(f"   Model: {best_model_name}")
print(f"   Composite Score: {best_scores['composite_score']:.4f}")
print(f"   ROC-AUC: {best_scores['roc_auc']:.4f}")
print(f"   Accuracy: {best_scores['accuracy']:.4f}")
print(f"   F1-Score: {best_scores['f1_score']:.4f}")
print(f"   Train Accuracy: {best_scores['train_accuracy']:.4f}")
print(f"   Overfitting Gap: {best_scores['overfitting_gap']:.4f}")
print(f"   Optimized: {'Yes' if best_scores['is_optimized'] else 'No'}")

# Additional validation for optimized models
if not best_scores['is_optimized']:
    print("\n‚ö†Ô∏è  WARNING: Best model is not optimized!")
    print("   Checking if an optimized version exists in top 3...")

    for rank, (model_name, scores) in enumerate(ranked_models[:3], 1):
        if scores['is_optimized']:
            print(f"   ‚Üí Found optimized model at rank {rank}: {model_name}")
            print(f"   ‚Üí Score difference: {best_scores['composite_score'] - scores['composite_score']:.4f}")

            # If score difference is small (<0.01), prefer optimized version
            if (best_scores['composite_score'] - scores['composite_score']) < 0.01:
                print(f"   ‚Üí Selecting {model_name} instead (negligible score difference)")
                best_model_name = model_name
                best_scores = scores
            break

# Save the best model
best_model = trained_models[best_model_name]

# Create filename
model_filename = f"{best_model_name.replace(' ', '_').lower()}_best.pkl"
model_path = f"models/{model_filename}"

# Save the best model using pickle
with open(model_path, 'wb') as f:
    pickle.dump(best_model, f)

print(f"\nüíæ Best model saved successfully!")
print(f"   Model: {best_model_name}")
print(f"   ROC-AUC: {best_scores['roc_auc']:.4f}")
print(f"   Path: {model_path}")

# Save the scaler for deployment
scaler_path = "models/scaler.pkl"
with open(scaler_path, 'wb') as f:
    pickle.dump(scaler, f)

print(f"\n‚úÖ Scaler saved to: {scaler_path}")

# Save label encoders dictionary
encoders_path = "models/label_encoders.pkl"
with open(encoders_path, 'wb') as f:
    pickle.dump(le_dict, f)

print(f"‚úÖ Label encoders saved to: {encoders_path}")

print("\n" + "="*80)
print("üì¶ MODEL ARTIFACTS SAVED - READY FOR DEPLOYMENT!")
print("="*80)
print("Files saved:")
print(f"   1. {model_path} (Best ML model)")
print(f"   2. {scaler_path} (Feature scaler)")
print(f"   3. {encoders_path} (Categorical encoders)")
print("\nYou can now use these files to make predictions on new data!")
print("="*80)