# Model Training and Evaluation

In this notebook, we train a baseline machine learning model to predict credit default.
The goal is to build a model that balances performance with interpretability and business relevance.


This notebook covers model training using Gradient Boosting.

In [1]:
import sys
import os
sys.path.append(os.path.abspath('../src'))


## Why Gradient Boosting?

Gradient Boosting Machines (GBMs) such as XGBoost and LightGBM are popular for tabular data tasks.
They offer:
- Strong accuracy on structured data
- Support for handling missing values and mixed feature types
- Many hyperparameters to tune (number of trees, learning rate, tree depth)

We start with scikit-learn's `GradientBoostingClassifier` and can switch to XGBoost when GPU acceleration is needed.
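
As a minimal sketch of the scikit-learn starting point, the snippet below fits a `GradientBoostingClassifier` on synthetic data. The dataset, the hyperparameter values, and the `demo_*` names are illustrative assumptions, not the settings used by `train_model`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced binary data standing in for the credit dataset (assumption)
X_demo, y_demo = make_classification(
    n_samples=1000, n_features=10, weights=[0.9, 0.1], random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Illustrative hyperparameters; these three are the usual first tuning knobs
demo_model = GradientBoostingClassifier(
    n_estimators=50, learning_rate=0.1, max_depth=3, random_state=42
)
demo_model.fit(X_tr, y_tr)

# Predicted probability of the positive class for each test row
demo_proba = demo_model.predict_proba(X_te)[:, 1]
```

Ranking metrics such as AUC are computed from these probabilities rather than from hard class labels.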


## Model Evaluation: AUC

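We use **ROC AUC (area under the receiver operating characteristic curve)** to evaluate how well the model separates defaulters from non-defaulters.
It is widely used in financial modeling because it is threshold-independent and, unlike accuracy, is not inflated by a large majority class, which matters on imbalanced datasets.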
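
One way to read ROC AUC is as the fraction of (defaulter, non-defaulter) pairs in which the defaulter receives the higher score. The sketch below checks this on made-up labels and scores; the toy values are assumptions chosen only for illustration.

```python
from sklearn.metrics import roc_auc_score

# Toy labels and scores (made up): 1 = default, 0 = no default
y_true = [0, 0, 0, 0, 1, 1]
y_score = [0.10, 0.20, 0.30, 0.80, 0.70, 0.90]

auc_sklearn = roc_auc_score(y_true, y_score)

# Same quantity by counting correctly ranked pairs; a tie counts as half a pair
pos = [s for s, t in zip(y_score, y_true) if t == 1]
neg = [s for s, t in zip(y_score, y_true) if t == 0]
correct = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
auc_pairs = correct / (len(pos) * len(neg))

print(auc_sklearn, auc_pairs)
```

Here 7 of the 8 pairs are ranked correctly, so both calculations give 0.875. A value of 0.5 corresponds to random ranking and 1.0 to perfect separation.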


In [5]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from preprocessing import preprocess_data
from modeling import train_model, evaluate_model, train_model_xgboost


In [None]:

# Load dataset
df = pd.read_csv('../data/raw/application_train.csv')

# Drop rows with missing target
df = df.dropna(subset=['TARGET'])

# Preprocess
df = preprocess_data(df)

# Train/test split
X = df.drop('TARGET', axis=1)
y = df['TARGET']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
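
The split above is purely random. On an imbalanced target, one option is to pass `stratify=y` so that the train and test sets keep the same default rate. The sketch below shows the effect on a toy frame; the `toy` data and its 10% positive rate are assumptions for illustration, not properties of `application_train.csv`.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy frame with a 10% positive rate (assumption, for illustration only)
toy = pd.DataFrame({"feature": range(100), "TARGET": [1] * 10 + [0] * 90})
X_toy = toy.drop("TARGET", axis=1)
y_toy = toy["TARGET"]

# stratify=y_toy preserves the class proportions in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42, stratify=y_toy
)

train_rate = y_tr.mean()
test_rate = y_te.mean()
print(train_rate, test_rate)
```

Switching the notebook's split to a stratified one would change which rows land in the test set, so the reported AUC would no longer be directly comparable with earlier runs.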


In [9]:

# Train model
model = train_model(X_train, y_train)

# Evaluate
auc = evaluate_model(model, X_test, y_test)
print(f"AUC: {auc:.4f}")


      Iter       Train Loss   Remaining Time 
         1           0.5531            3.99m
         2           0.5470            3.76m
         3           0.5421            3.78m
         4           0.5383            3.72m
         5           0.5350            3.61m
         6           0.5322            3.56m
         7           0.5298            3.51m
         8           0.5277            3.49m
         9           0.5259            3.50m
        10           0.5241            3.47m
        20           0.5135            3.06m
        30           0.5069            2.73m
        40           0.5029            2.33m
        50           0.4999            1.92m
        60           0.4977            1.52m
        70           0.4960            1.13m
        80           0.4947           44.91s
        90           0.4936           22.35s
       100           0.4925            0.00s
AUC: 0.7519
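
The `modeling` module is not shown in this notebook. The sketch below is a hypothetical reconstruction of what `train_model` and `evaluate_model` might look like, based only on the log format above (which matches scikit-learn's `verbose=1` output over 100 iterations) and on the AUC metric. The function names ending in `_sketch` and all parameter values are assumptions.

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score


def train_model_sketch(X_train, y_train):
    """Hypothetical stand-in for modeling.train_model."""
    # verbose=1 prints the Iter / Train Loss / Remaining Time table
    model = GradientBoostingClassifier(
        n_estimators=100, verbose=1, random_state=42
    )
    model.fit(X_train, y_train)
    return model


def evaluate_model_sketch(model, X_test, y_test):
    """Hypothetical stand-in for modeling.evaluate_model: ROC AUC on held-out data."""
    # AUC needs scores, so use the positive-class probability, not predict()
    proba = model.predict_proba(X_test)[:, 1]
    return roc_auc_score(y_test, proba)
```

Passing hard labels from `predict()` to `roc_auc_score` would collapse the ranking to two values and usually understate the AUC, which is why the sketch uses `predict_proba`.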


In [10]:
# Save model
import os
import joblib

os.makedirs('../models', exist_ok=True)  # create the output directory if it is missing
joblib.dump(model, '../models/credit_risk_model.pkl')
print("Model saved to models/credit_risk_model.pkl")


Model saved to models/credit_risk_model.pkl
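
To reuse a saved model later, `joblib.load` restores it from disk. The sketch below round-trips a small stand-in model through a temporary directory so it runs on its own; the stand-in model, the synthetic data, and the temporary path are assumptions, not the notebook's artifacts.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Small stand-in model and data (assumptions, not the notebook's model)
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)
small_model = GradientBoostingClassifier(n_estimators=10, random_state=0)
small_model.fit(X_demo, y_demo)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "credit_risk_model.pkl")
    joblib.dump(small_model, model_path)   # save
    reloaded = joblib.load(model_path)     # restore

# The reloaded model should reproduce the original predictions
same = np.array_equal(
    reloaded.predict_proba(X_demo), small_model.predict_proba(X_demo)
)
print(same)
```

Pickled scikit-learn models are best loaded with the same library version that saved them, and only from trusted sources, since unpickling can execute arbitrary code.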
