# Credit Card Behaviour Score Prediction
### Forward-Looking Risk Classification using Financial Behavior Features

**Author:** Aditya Kumar Verma
**Institution:** B.Tech (Electronics and Communication), IIT Roorkee
**Contact:** aditya_kv@ece.iitr.ac.in

**Project Objective:** 
Predict whether a customer will default on their credit card payment in the next billing cycle. The model is optimized for **recall** and the **$F_2$ score**, which weights recall more heavily than precision. This lets likely defaulters be flagged early for proactive risk management (target $F_2$: 0.6092).
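
For reference, $F_\beta$ combines precision $P$ and recall $R$ as $F_\beta = \frac{(1+\beta^2)\,P\,R}{\beta^2 P + R}$. With $\beta = 2$, recall is treated as twice as important as precision. The sketch below computes the score by hand on made-up labels and checks the result against scikit-learn's `fbeta_score`. The labels and the helper name `f_beta` are illustrative assumptions, not part of the project.

```python
import numpy as np
from sklearn.metrics import fbeta_score, precision_score, recall_score


def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """F-beta from precision and recall; beta > 1 favors recall (hypothetical helper)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical labels: 1 = default, 0 = no default
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 1, 1, 0, 0, 0, 0])

p = precision_score(y_true, y_pred)  # 3 true positives / 5 predicted positives
r = recall_score(y_true, y_pred)     # 3 true positives / 4 actual positives
manual = f_beta(p, r, beta=2)
print(f"P={p:.2f}, R={r:.2f}, F2={manual:.4f}")  # P=0.60, R=0.75, F2=0.7143

# The hand-computed value should agree with scikit-learn's implementation
assert np.isclose(manual, fbeta_score(y_true, y_pred, beta=2))
```

Because recall (0.75) exceeds precision (0.60) in this toy case, $F_2$ (about 0.714) lands above the corresponding $F_1$ of about 0.667.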

In [None]:
# 1. Setup and Libraries
!pip install lightgbm xgboost imbalanced-learn

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score, precision_score, fbeta_score, classification_report
from lightgbm import LGBMClassifier
import warnings
warnings.filterwarnings('ignore')

In [None]:
# 2. Load Data and Imputation
train_df = pd.read_csv("train_dataset_final1.csv")
val_df = pd.read_csv("validate_dataset_final.csv")

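# Impute missing ages with the training-set median and reuse that value for the
# validation set, so no statistics are computed from validation data.
# (Assignment instead of inplace=True avoids pandas' chained-assignment pitfall.)
age_median = train_df['age'].median()
train_df['age'] = train_df['age'].fillna(age_median)
val_df['age'] = val_df['age'].fillna(age_median)
print(f"Data loaded. Train shape: {train_df.shape}, validation shape: {val_df.shape}")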

## 3. Feature Engineering
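Three behavioral features summarize each customer's billing and repayment history:
* **Utilization_Ratio**: average bill amount divided by the credit limit (`AVG_Bill_amt / LIMIT_BAL`).
* **Delinquency_Streak**: number of months with a late payment, i.e. repayment status $PAY \ge 1$. Despite the name, this is a total count over the six `pay_*` months, not a run of consecutive months.
* **Repayment_Std**: standard deviation of the six monthly repayment amounts (`pay_amt1` to `pay_amt6`), capturing repayment volatility.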

In [None]:
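pay_cols = ['pay_0', 'pay_2', 'pay_3', 'pay_4', 'pay_5', 'pay_6']
pay_amt_cols = ['pay_amt1', 'pay_amt2', 'pay_amt3', 'pay_amt4', 'pay_amt5', 'pay_amt6']

# Apply identical feature engineering to the training and validation sets
for df in (train_df, val_df):
    # Average bill as a fraction of the credit limit (a zero limit gives NaN, not inf)
    df['Utilization_Ratio'] = df['AVG_Bill_amt'] / df['LIMIT_BAL'].replace(0, np.nan)
    # Months (out of six) with a repayment delay of one month or more
    df['Delinquency_Streak'] = (df[pay_cols] >= 1).sum(axis=1)
    # Volatility of the six repayment amounts (sample standard deviation)
    df['Repayment_Std'] = df[pay_amt_cols].std(axis=1)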
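
As a sanity check on the feature definitions, the sketch below applies the same three formulas to two hypothetical customers whose values are easy to verify by hand. The column names mirror the notebook; the numbers and the `toy` frame are invented for illustration.

```python
import numpy as np
import pandas as pd

pay_cols = ['pay_0', 'pay_2', 'pay_3', 'pay_4', 'pay_5', 'pay_6']
pay_amt_cols = ['pay_amt1', 'pay_amt2', 'pay_amt3', 'pay_amt4', 'pay_amt5', 'pay_amt6']

# Repayment status per month for two hypothetical customers (>= 1 means late)
status = np.array([[-1, 0, 0, -1, 0, 0],
                   [2, 1, 0, 1, 0, 3]])
# Repayment amounts per month: customer A is steady, customer B alternates
amounts = np.array([[1000] * 6,
                    [0, 2000, 0, 2000, 0, 2000]], dtype=float)

toy = pd.concat([
    pd.DataFrame({'AVG_Bill_amt': [5000.0, 30000.0], 'LIMIT_BAL': [50000.0, 40000.0]}),
    pd.DataFrame(status, columns=pay_cols),
    pd.DataFrame(amounts, columns=pay_amt_cols),
], axis=1)

# Same formulas as the feature-engineering cell
toy['Utilization_Ratio'] = toy['AVG_Bill_amt'] / toy['LIMIT_BAL']
toy['Delinquency_Streak'] = (toy[pay_cols] >= 1).sum(axis=1)
toy['Repayment_Std'] = toy[pay_amt_cols].std(axis=1)
print(toy[['Utilization_Ratio', 'Delinquency_Streak', 'Repayment_Std']])
```

Customer B is late in four of the six months but never more than two months in a row, which shows that `Delinquency_Streak` counts late months rather than measuring a consecutive streak. B's repayment volatility is $\sqrt{6 \cdot 1000^2 / 5} \approx 1095.4$ under pandas' default sample standard deviation.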

## 4. Model Training and Evaluation
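**LightGBM** is trained with `class_weight='balanced'`, so errors on the rarer class carry more weight during training. Predicted probabilities are then converted to labels with a decision threshold of **0.36** instead of the default 0.5. This is the threshold reported as maximizing the $F_2$ score.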
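
The `'balanced'` mode sets each class weight to `n_samples / (n_classes * np.bincount(y))`. LightGBM's `LGBMClassifier` documentation describes this formula, and it is the same one used by scikit-learn. The sketch below computes these weights for a hypothetical 8:2 label vector and checks them against scikit-learn's `compute_class_weight`. The LightGBM documentation also cautions that such reweighting yields poor estimates of individual class probabilities, which is one reason to tune the decision threshold rather than keep 0.5.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical labels: 8 non-defaulters (0) and 2 defaulters (1)
y = np.array([0] * 8 + [1] * 2)

# 'balanced' weighting: n_samples / (n_classes * count_of_each_class)
counts = np.bincount(y)
manual = len(y) / (len(counts) * counts)
print({cls: float(w) for cls, w in enumerate(manual)})  # {0: 0.625, 1: 2.5}

# scikit-learn's helper implements the same formula
sk = compute_class_weight(class_weight='balanced', classes=np.array([0, 1]), y=y)
assert np.allclose(manual, sk)
```

Each minority-class sample ends up weighted four times as heavily as a majority-class sample (2.5 vs 0.625), mirroring the 8:2 imbalance.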

In [None]:
X = train_df.drop(['Customer_ID', 'next_month_default'], axis=1)
y = train_df['next_month_default']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)

lgbm = LGBMClassifier(class_weight='balanced', random_state=42)
lgbm.fit(X_train, y_train)

# Apply the decision threshold of 0.36 reported as F2-optimal, instead of the default 0.5
y_probs = lgbm.predict_proba(X_test)[:, 1]
threshold = 0.36
y_pred = (y_probs >= threshold).astype(int)

print(f"Recall: {recall_score(y_test, y_pred):.2f}")
print(f"F2 Score: {fbeta_score(y_test, y_pred, beta=2):.4f}")
print("\nClassification Report:")
print(classification_report(y_test, y_pred))
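
The notebook does not show how the 0.36 threshold was obtained. One common approach, sketched below as an assumption, is to sweep candidate thresholds over held-out predicted probabilities and keep the one with the highest $F_2$. The helper `best_fbeta_threshold` and the synthetic scores are hypothetical. Selecting the threshold on the same split that is later used to report scores gives an optimistic estimate, so a separate validation split or cross-validation is the safer place for the search.

```python
import numpy as np
from sklearn.metrics import fbeta_score


def best_fbeta_threshold(y_true, y_prob, beta=2.0, grid=None):
    """Return (threshold, score) maximizing F-beta over a grid of cut-offs.

    Hypothetical helper, not part of the original notebook.
    """
    if grid is None:
        grid = np.arange(0.05, 0.96, 0.01)
    scores = [
        fbeta_score(y_true, (y_prob >= t).astype(int), beta=beta, zero_division=0)
        for t in grid
    ]
    best = int(np.argmax(scores))
    return float(grid[best]), float(scores[best])


# Synthetic demo: positives tend to receive higher scores than negatives
rng = np.random.default_rng(0)
y_true = rng.binomial(1, 0.25, size=2000)
y_prob = np.clip(rng.normal(0.3 + 0.3 * y_true, 0.15), 0, 1)

t, s = best_fbeta_threshold(y_true, y_prob, beta=2)
print(f"Best threshold: {t:.2f}, F2: {s:.4f}")
```

In the notebook, the same call would take `y_test` and `y_probs` (ideally from a separate validation split) in place of the synthetic arrays.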