# Q3: How Robust Are Credit Risk Models Over Time?

This notebook compares Naive Bayes, Support Vector Machines, Decision Trees, and [INSERT 2 NEURAL NET CLASSIFIERS] on credit risk prediction when the training and test data come from different economic periods. The dataset is the Lending Club data from https://www.kaggle.com/datasets/wordsforthewise/lending-club/data, which covers loans issued from 2007 to 2018. It is split in half by issue year: 2007-2012 is period 1 and 2013-2018 is period 2. Every model is trained on period 1 and tested on period 2, and the results are then compared to assess the temporal stability of each model. Each model classifies an applicant as low or high credit risk.

## Imports

In [None]:
import numpy as np
import pandas as pd
import os
import sys
import subprocess
from pathlib import Path
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report

import warnings

# Add project root to path
PROJECT_ROOT = Path().resolve().parent
sys.path.append(str(PROJECT_ROOT))
warnings.filterwarnings("ignore")

from data_processings.datasets import LendingClubDataset
from data_processings.feature_engineering import process_q3_features

## Pre-Processing of Data

Loading Dataset for Accepted Loans

In [None]:
num_samples = 100000
dataloader = LendingClubDataset()
accepted_df = dataloader.load(num_samples)

Feature Construction

In [None]:
# Retain relevant columns and build new features from existing features
accepted_df = process_q3_features(accepted_df)
accepted_df

In [None]:
# Issue Year 
accepted_df["issue_year"].unique()

Feature Type Conversion

In [None]:
# Binary Mapping for Binary Categorical Features 
accepted_df = accepted_df.apply(lambda x: x.str.strip() if x.dtype == "object" else x) # Remove leading and trailing spaces
accepted_df['loan_status'] = accepted_df['loan_status'].map({'Fully Paid': 0, 'Charged Off': 1})
accepted_df['application_type'] = accepted_df['application_type'].map({'Individual': 0, 'Joint App': 1})
accepted_df['term'] = accepted_df['term'].map({'36 months': 0, '60 months': 1})
accepted_df
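`Series.map` with a dictionary returns `NaN` for any value that is missing from the dictionary, so a status other than the two mapped above would silently become a missing label. A minimal sketch of a guard follows; the helper `map_strict` and the toy series are hypothetical and not part of the project code.

```python
import pandas as pd

def map_strict(series: pd.Series, mapping: dict) -> pd.Series:
    """Map values with `mapping`, raising if any non-null value is not covered."""
    unmapped = set(series.dropna().unique()) - set(mapping)
    if unmapped:
        raise ValueError(f"Unmapped values in '{series.name}': {sorted(unmapped)}")
    return series.map(mapping)

# Toy data (made up), not the Lending Club sample
toy_status = pd.Series(["Fully Paid", "Charged Off", "Fully Paid"], name="loan_status")
toy_encoded = map_strict(toy_status, {"Fully Paid": 0, "Charged Off": 1})
print(toy_encoded.tolist())
```

Whether such a check is needed here depends on what `process_q3_features` already filters out, which this notebook does not show.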

In [None]:
# One Hot Encoding for Non-Binary Categorical Features
categorical_features = ["purpose", "home_ownership", "emp_length", "verification_status"]
accepted_df = pd.get_dummies(accepted_df, columns=categorical_features, drop_first=True, dtype=int)
accepted_df
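With `drop_first=True`, `pd.get_dummies` drops the first level of each feature (in sorted order), and that level becomes the implicit baseline encoded as all zeros. Because the encoding above runs before the data is split by period, both periods end up with the same columns. A small illustration on made-up values:

```python
import pandas as pd

# Toy frame (made up) with three levels of one categorical feature
toy = pd.DataFrame({"home_ownership": ["RENT", "OWN", "MORTGAGE", "RENT"]})
toy_encoded = pd.get_dummies(toy, columns=["home_ownership"], drop_first=True, dtype=int)

# "MORTGAGE" sorts first, so it is dropped and represented by zeros in both columns
print(toy_encoded.columns.tolist())
print(toy_encoded.iloc[2].tolist())
```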

In [None]:
# Verify Types
for col, dtype in accepted_df.dtypes.items():
    print(f"{col}: {dtype}")

Train-Test Split Based on Economic Periods: Period 1 (Train) and Period 2 (Test)

where Period 1 covers 2007-2012 and Period 2 covers 2013-2018

In [None]:
year_indicator_col = 'issue_year'
target_col = 'loan_status'
train_df = accepted_df[accepted_df[year_indicator_col] <= 2012].copy()  # Period 1: 2007-2012
test_df = accepted_df[accepted_df[year_indicator_col] >= 2013].copy()  # Period 2: 2013-2018 (2013 included)
X_train = train_df.drop(columns=[target_col])
y_train = train_df[target_col]
X_test = test_df.drop(columns=[target_col])
y_test = test_df[target_col]
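Using a single cutoff year for both sides of the split guarantees that every row lands in exactly one period. A minimal sketch on a toy frame; the helper name `split_by_period` is hypothetical:

```python
import pandas as pd

def split_by_period(df: pd.DataFrame, year_col: str, last_train_year: int):
    """Rows with year <= last_train_year go to train; all later rows go to test."""
    train = df[df[year_col] <= last_train_year].copy()
    test = df[df[year_col] > last_train_year].copy()
    return train, test

# Toy data (made up) straddling the period boundary
toy = pd.DataFrame({"issue_year": [2011, 2012, 2013, 2014],
                    "loan_status": [0, 1, 0, 1]})
toy_train, toy_test = split_by_period(toy, "issue_year", last_train_year=2012)

# No row is lost at the boundary
assert len(toy_train) + len(toy_test) == len(toy)
print(toy_test["issue_year"].tolist())  # → [2013, 2014]
```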

Scaling with MinMax

In [None]:
MinMax_scaler = MinMaxScaler()
X_train_scaled = MinMax_scaler.fit_transform(X_train)
X_test_scaled = MinMax_scaler.transform(X_test)
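The scaler is fitted on period 1 only, so period 2 values outside the period 1 range map outside [0, 1]. The sketch below shows this on made-up numbers; the values are illustrative and not taken from the dataset.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up single feature: training range is 1000-9000
toy_train = np.array([[1000.0], [5000.0], [9000.0]])
toy_test = np.array([[500.0], [13000.0]])  # both values fall outside the training range

scaler = MinMaxScaler()
train_scaled = scaler.fit_transform(toy_train)  # learns min and max from training data only
test_scaled = scaler.transform(toy_test)        # reuses the training min and max

print(test_scaled.ravel())
```

Here the test values scale to roughly -0.06 and 1.5. `MinMaxScaler` also accepts `clip=True` to force transformed values into the fitted range; whether clipping is appropriate for this comparison is a modelling choice.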

## Model Training and Evaluation

In [None]:
class_mapping = {0: 'Low Risk',
                 1: 'High Risk'}

**Model 1: Naive Bayes**

Training

In [None]:
GNB = GaussianNB()
GNB.fit(X_train, y_train)

Evaluation

In [None]:
GNB_preds = GNB.predict(X_test)
# output_dict=True returns a dict, so individual metrics can be looked up by key
GNB_summary = classification_report(y_true=y_test, y_pred=GNB_preds, labels=list(class_mapping.keys()), target_names=list(class_mapping.values()), output_dict=True)
print(f"Naive Bayes Accuracy: {GNB_summary['accuracy']}")
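`classification_report` returns a formatted string unless `output_dict=True` is passed, in which case it returns a nested dictionary keyed by the target names. Accuracy alone can hide poor performance on the minority class, which in credit data is often the high-risk class, so per-class recall is worth reading alongside it. A sketch on made-up labels:

```python
from sklearn.metrics import classification_report

# Made-up labels: 0 = Low Risk, 1 = High Risk
toy_true = [0, 0, 0, 1, 1]
toy_pred = [0, 0, 1, 1, 0]

report = classification_report(
    toy_true,
    toy_pred,
    labels=[0, 1],
    target_names=["Low Risk", "High Risk"],
    output_dict=True,  # nested dict instead of a formatted string
)

print(f"Accuracy: {report['accuracy']:.2f}")                  # 3 of 5 correct
print(f"High Risk recall: {report['High Risk']['recall']:.2f}")  # 1 of 2 high-risk cases found
```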

**Model 2: Support Vector Machine**

Training

In [None]:
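C_val = 0.1
SVM = SVC(kernel='linear', C=C_val, random_state=10)
# SVMs are sensitive to feature scale, so fit on the MinMax-scaled features
SVM.fit(X_train_scaled, y_train)

Evaluation

In [None]:
# Predict on test features scaled with the scaler fitted on the training data
SVM_preds = SVM.predict(X_test_scaled)
SVM_summary = classification_report(y_true=y_test, y_pred=SVM_preds, labels=list(class_mapping.keys()), target_names=list(class_mapping.values()), output_dict=True)
print(f"Support Vector Machine Accuracy: {SVM_summary['accuracy']}")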

**Model 3: Decision Tree**

Training

In [None]:
DTC = DecisionTreeClassifier(random_state=10, criterion="entropy")
DTC.fit(X_train, y_train)

Evaluation

In [None]:
DTC_preds = DTC.predict(X_test)
# output_dict=True returns a dict, so individual metrics can be looked up by key
DTC_summary = classification_report(y_true=y_test, y_pred=DTC_preds, labels=list(class_mapping.keys()), target_names=list(class_mapping.values()), output_dict=True)
print(f"Decision Tree Accuracy: {DTC_summary['accuracy']}")
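One way to quantify temporal stability is to compare a model's accuracy on held-out period 1 data with its accuracy on period 2. The sketch below does this on synthetic data with an artificial drift. The data generator, the name `stability_gap`, and the choice of Gaussian Naive Bayes are illustrative assumptions, not this notebook's procedure or results.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(10)

def make_period(n, shift):
    """Synthetic two-class data; `shift` moves the class-1 mean toward class 0 to mimic drift."""
    y = rng.integers(0, 2, size=n)
    X = rng.normal(loc=y[:, None] * (2.0 - shift), scale=1.0, size=(n, 2))
    return X, y

X_p1, y_p1 = make_period(2000, shift=0.0)  # stands in for period 1
X_p2, y_p2 = make_period(2000, shift=1.5)  # stands in for period 2, with drift

# Hold out part of period 1 to obtain an in-period baseline
X_fit, X_hold, y_fit, y_hold = train_test_split(
    X_p1, y_p1, test_size=0.25, random_state=10, stratify=y_p1
)

model = GaussianNB().fit(X_fit, y_fit)
in_period_acc = accuracy_score(y_hold, model.predict(X_hold))
out_period_acc = accuracy_score(y_p2, model.predict(X_p2))
stability_gap = in_period_acc - out_period_acc  # larger gap = less stable over time

print(f"in-period={in_period_acc:.3f}, out-of-period={out_period_acc:.3f}, gap={stability_gap:.3f}")
```

A gap near zero suggests the model transfers across periods. Applying the same idea to the real data would mean holding out part of `train_df` before fitting each model.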

**Model 4: Neural Net**

**Model 5: Neural Net**