## IS453 Financial Analytics
## Week 11 - Credit Scoring Lab Data

### Credit risk scorecard construction with scorecardpy

## HMEQ Dataset

The HMEQ data set reports characteristics and delinquency information for 5,960 home equity loans. A home equity loan is a loan where the obligor uses the equity in their home as the underlying collateral.
The data was originally taken from the website of the book *Credit Risk Analytics: Measurement Techniques, Applications, and Examples in SAS*: https://www.bartbaesens.com/book/6/credit-risk-analytics.
A cleaner version of the data is available on Kaggle: https://www.kaggle.com/akhil14shukla/loan-defaulter-prediction/data


**Variables definition**

1. BAD: Binary response variable (this is the class column)
    - 1 = applicant defaulted on loan or seriously delinquent
    - 0 = applicant paid loan or customer is current on loan payments
2. LOAN: Requested loan amount
3. MORTDUE: Amount due on existing mortgage
4. VALUE: Value of current property
5. REASON: Reason for the loan
    - DebtCon = debt consolidation (customer uses the home equity loan to pay back high-interest loans)
    - HomeImp = home improvement
6. JOB: Occupational categories
    - ProfExe
    - Mgr
    - Office
    - Self
    - Sales
    - Other
7. YOJ: Years at present job
8. DEROG: Number of major derogatory reports (issued for past loans when the customer failed to honor the contract or repay on time)
9. DELINQ: Number of delinquent credit lines
10. CLAGE: Age of oldest credit line in months
11. NINQ: Number of recent credit inquiries
12. CLNO: Number of credit lines
13. DEBTINC: Debt-to-income ratio in percent

**Install scorecardpy**
This is a Python version of the R package `scorecard`. The links below have more information:

https://pypi.org/project/scorecardpy/

https://github.com/shichenxie/scorecardpy/

https://cran.r-project.org/web/packages/scorecard/scorecard.pdf

In [None]:
# make sure you are running Python 3.9 or later

# depending on your environment, either pip install or conda install the following packages
# !pip install pandas==2.1.1
# !pip install scorecardpy==0.1.9.7

# after downloading, restart your kernel

In [None]:
# suppress scorecardpy compatibility warnings (set the filter before importing the library)
import warnings
warnings.filterwarnings('ignore')

import pandas as pd, numpy as np
from sklearn.linear_model import LogisticRegression
import matplotlib.pyplot as plt
from sklearn import linear_model, metrics
import scorecardpy as sc
import pprint

**Read in the original hmeq_data.csv file**

The file contains missing values. That is fine, because the binning step later in this lab handles them.

In [None]:
# sample code
hmeq_data = pd.read_csv('hmeq_data.csv')

# use a copy of hmeq_data for credit risk model
hmeq_data_forsc = hmeq_data.copy()

# check for missing values
hmeq_data_forsc.isnull().sum()

Drop MORTDUE because it is highly correlated with VALUE.


In [None]:
# sample code

hmeq_data_forsc.drop(columns='MORTDUE', inplace=True)
hmeq_data_forsc.info()

**Do train-test split**

`sc.split_df` returns a dictionary containing the train and test datasets. It uses a fixed default random seed, so the split is reproducible across runs.

In [None]:
# sample code

# split data into 70% train and 30% test
train, test = sc.split_df(hmeq_data_forsc, y = 'BAD', ratio = .7).values()
print(train.shape)
print(test.shape)

**Generate WOE bins**

`sc.woebin()` generates the bins as a Python dictionary keyed by variable name, and `sc.woebin_plot()` can plot the WOE for each bin. The binning optimizes for IV but does not attempt to make the WOE trend monotonic.

Scorecardpy handles categorical variables as part of the binning process, so it is not necessary to one-hot encode them in advance.

It also places missing values in their own bin, so there is no need to impute or remove them.

*Ignore any Python warning messages.*

In [None]:
# automatically calculate bin ranges, bins is a dictionary
bins = sc.woebin(train, y = 'BAD')

for variables, bindetails in bins.items():
    print(variables, " : ")
    display(bindetails)
    print("--"*50)
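
To make the `woe` and IV columns in the bin tables less of a black box, here is a minimal sketch of the arithmetic for one variable. The bin names and counts are hypothetical (not taken from HMEQ), and the sign convention WOE = ln(share of bads / share of goods) is an assumption about how scorecardpy reports it; some textbooks use the reciprocal, which flips the sign of WOE but leaves IV unchanged.

```python
import math

# Hypothetical good/bad counts per bin for a single variable (not HMEQ data).
bin_counts = {
    "missing":  {"good": 50,  "bad": 30},
    "[-inf,1)": {"good": 700, "bad": 70},
    "[1,inf)":  {"good": 250, "bad": 100},
}

total_good = sum(c["good"] for c in bin_counts.values())
total_bad = sum(c["bad"] for c in bin_counts.values())

woe = {}
iv = 0.0
for name, c in bin_counts.items():
    good_share = c["good"] / total_good   # share of all goods that fall in this bin
    bad_share = c["bad"] / total_bad      # share of all bads that fall in this bin
    # Assumed convention: positive WOE means the bin is riskier than average.
    woe[name] = math.log(bad_share / good_share)
    # Each bin's IV contribution is non-negative under either sign convention.
    iv += (bad_share - good_share) * woe[name]

for name, value in woe.items():
    print(f"{name:>10}: WOE = {value:+.3f}")
print(f"IV = {iv:.3f}")
```

In this toy table the missing bin holds 15% of the bads but only 5% of the goods, so its WOE is ln(3), the riskiest of the three bins.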

### Logistic regression with WOE encoding

Use `sc.woebin_ply` to replace each original value with the WOE of its bin.

Then fit the logistic regression model on the WOE-encoded values.

In [None]:
# sample code

# prepare a dataset with the WOE values for Logistic Regression training
# woebin_ply() converts original values of input data into woe
train_woe = sc.woebin_ply(train, bins)
test_woe = sc.woebin_ply(test, bins)
train_woe

In [None]:
# sample code

# create the X, y parts of data for train and test
y_train = train_woe.loc[:, 'BAD']
X_train = train_woe.loc[:, train_woe.columns != 'BAD']
y_test = test_woe.loc[:, 'BAD']
X_test = test_woe.loc[:, test_woe.columns != 'BAD']

# create a logistic regression model object
lr = linear_model.LogisticRegression(class_weight='balanced')
lr.fit(X_train, y_train)
pd.Series(np.concatenate([lr.intercept_, lr.coef_[0]]),
          index = np.concatenate([['intercept'], lr.feature_names_in_]) )

### Generate scorecard

Use `sc.scorecard` to generate the scorecard

In [None]:
# sample code

# generate a card from the model and bins. The scores will be based on probability of default from the model
# bins = bins created from sc.woebin
# lr = fitted logistic regression model
# align target odds with probability of default = 5%
# odds = p/(1-p) = 0.05/(1-0.05) = 0.0526 ~= 1/19
card = sc.scorecard(bins, lr, X_train.columns, points0 = 600, odds0 = 1/19, pdo = 20, basepoints_eq0 = True)

pprint.pprint(card)
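
The `points0`, `odds0` and `pdo` arguments define a log-odds scale. The sketch below shows that scaling under the assumption that scorecardpy uses the usual "points to double the odds" formula, with `odds0` read as bad-to-good odds. The function names are illustrative and are not part of the scorecardpy API.

```python
import math

def scaling_constants(points0=600, odds0=1/19, pdo=20):
    """Return (offset, factor) so that score = offset - factor * ln(bad odds)."""
    factor = pdo / math.log(2)                    # points per doubling of the odds
    offset = points0 + factor * math.log(odds0)   # pins points0 to odds0
    return offset, factor

def score_from_pd(p_default, points0=600, odds0=1/19, pdo=20):
    """Map a probability of default to a score on the assumed scale."""
    offset, factor = scaling_constants(points0, odds0, pdo)
    bad_odds = p_default / (1 - p_default)
    return offset - factor * math.log(bad_odds)

print(round(score_from_pd(0.05), 1))    # PD of 5%, bad odds 1/19 -> 600.0
print(round(score_from_pd(4 / 23), 1))  # bad odds 4/19, two doublings -> 560.0
```

Every drop of `pdo` = 20 points doubles the odds of default, so a score of 560 corresponds to bad odds of 4/19, or roughly 5:1 good-to-bad.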

**Ex Q1. Calculate the approval status for a new application**

Manually calculate the score and approval status for an application with the following information, using a cutoff score of 600:
- LOAN = 88,900
- VALUE = 57,264
- REASON = DebtCon
- JOB = Other
- YOJ = 16.0
- DEROG = 0
- DELINQ = 0
- CLAGE = 221.8
- NINQ = 0
- CLNO = 16
- DEBTINC = 36.1

Your answer here

Use `sc.scorecard_ply` to score a new application with the same values

In [None]:
# sample code

# calculate credit score for new application
col = ['LOAN','VALUE','REASON','JOB','YOJ','DEROG','DELINQ','CLAGE','NINQ','CLNO','DEBTINC']
val = [[88900,57264,'DebtCon','Other',16.0,0.0,0.0,221.8,0.0,16.0,36.1]]
new_appl = pd.DataFrame(val, columns = col)

new_appl_score = sc.scorecard_ply(new_appl, card, only_total_score = False).transpose()
new_appl_score.index = new_appl_score.index.str.replace('_points', '')

summary = pd.concat([new_appl.transpose(), new_appl_score], axis=1)
summary.columns = ['App Value', 'Points']
print(summary)
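
A manual score is the sum of the points of the bin that each application value falls into. The sketch below illustrates that lookup on a hypothetical two-variable card. The breakpoints and points are made up, so read the real ones from the `card` printed earlier. It assumes left-closed bins such as `[30,40)` and zero base points, as with `basepoints_eq0 = True`.

```python
from bisect import bisect_right

# Hypothetical card: breakpoints and points are invented for illustration only.
toy_card = {
    "DEBTINC": {"breaks": [30.0, 40.0], "points": [70, 55, 20]},
    "DELINQ":  {"breaks": [1.0],        "points": [65, 30]},
}
BASE_POINTS = 0   # assumed: base points are spread across the variables
CUTOFF = 600

def bin_points(variable, value):
    """Return the points of the left-closed bin that contains value."""
    spec = toy_card[variable]
    # bisect_right sends a value equal to a breakpoint into the bin on its right,
    # which matches bins written as [-inf,30), [30,40), [40,inf).
    idx = bisect_right(spec["breaks"], value)
    return spec["points"][idx]

def total_score(application):
    """Sum the bin points over all variables in the application."""
    return BASE_POINTS + sum(bin_points(v, x) for v, x in application.items())

application = {"DEBTINC": 36.1, "DELINQ": 0}
score = total_score(application)
print(score, "approve" if score >= CUTOFF else "decline")
```

With only two toy variables the total is far below 600. A full card adds the points for all eleven variables before comparing with the cutoff.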

### Score all the test and train data

Use `sc.scorecard_ply` to score all the test and train data

In [None]:
# sample code

# credit score for samples in test and train
train_score = sc.scorecard_ply(train, card)
test_score = sc.scorecard_ply(test, card)

### Evaluate the model's performance

**Calculate Percentage Correctly Classified measures on the scorecard model**


In [None]:
# sample code

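# check model performance at a cutoff of 560, roughly 5:1 good-to-bad odds
# (560 = 600 - 2*pdo, so the odds of default quadruple from 1/19 to 4/19, about 1/5)
cutoff = 560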

# create sets of predicted bad to compare with actual bad
predicted_bad_train = (train_score < cutoff)
predicted_bad_train_list = predicted_bad_train.astype(int).values.flatten().tolist()
predicted_bad_test = (test_score < cutoff)
predicted_bad_test_list = predicted_bad_test.astype(int).values.flatten().tolist()

print('*** Training Data Performance ***')
print('Confusion matrix:')
print(metrics.confusion_matrix(y_train, predicted_bad_train_list))
print('PCC measures:')
print(metrics.classification_report(y_train, predicted_bad_train_list))

print('*** Test Data Performance ***')
print('Confusion matrix:')
print(metrics.confusion_matrix(y_test, predicted_bad_test_list))
print('PCC measures:')
print(metrics.classification_report(y_test, predicted_bad_test_list))
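
`classification_report` does not print a row labelled specificity; it is the recall of class 0. Both measures can be read from the confusion matrix, which scikit-learn lays out with actual classes in rows and predicted classes in columns. The counts below are hypothetical, and the helper name is illustrative.

```python
def recall_and_specificity(confusion):
    """Compute recall and specificity from a 2x2 matrix [[tn, fp], [fn, tp]]."""
    (tn, fp), (fn, tp) = confusion
    recall = tp / (tp + fn)        # share of actual bads predicted bad
    specificity = tn / (tn + fp)   # share of actual goods predicted good
    return recall, specificity

# Hypothetical counts in the layout returned by metrics.confusion_matrix
toy_confusion = [[800, 200],
                 [60, 140]]

recall, specificity = recall_and_specificity(toy_confusion)
print(f"recall = {recall:.2f}, specificity = {specificity:.2f}")
```

The same function accepts the array returned by `metrics.confusion_matrix(y_test, predicted_bad_test_list)`, since a 2x2 array unpacks row by row.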

**Ex Q2. Compare the train vs test model performance**

- How do the f1-scores for the training and test dataset compare?
- How do the recall and specificity compare?
- Does the model appear to be overfitting the training data? 

Your answer here

### Evaluate effect of changing the cutoff score

Examine the distribution of the scores

In [None]:
# combine scores for train and test data to assess distribution for entire population
combined_score = pd.concat([train_score, test_score], ignore_index=True)

# plot distribution of scores on combined data
combined_score.hist(figsize = (7, 4), bins = 60)
plt.tight_layout()

In [None]:
# sample code
cutoff = 560

# approve scores at or above the cutoff, consistent with predicting bad when score < cutoff
approval_count = train_score[train_score["score"]>=cutoff].count()['score']
approval_rate = approval_count/train_score.shape[0]
print(f'Cutoff score of {cutoff:.0f}: {approval_count:,.0f} applications approved ({approval_rate:.1%})')

In [None]:
# sample code

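# calculate expected number of defaults, conservatively applying the
# default probability at the cutoff to every approved application
# good-to-bad odds at the cutoff (about 5:1 at a score of 560)
odds_at_cutoff = 5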

default_prob = 1/(1+odds_at_cutoff)
defaults = default_prob*approval_count
print(f'Cutoff score of {cutoff:.0f}: {defaults:.0f} defaults expected')
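
The odds at any other cutoff follow from the scorecard scaling. The sketch below assumes the scale set earlier (`points0 = 600`, `odds0 = 1/19`, `pdo = 20`) and assumes the model's scores are calibrated to those odds. The function names are illustrative.

```python
def good_to_bad_odds(cutoff, points0=600, odds0=1/19, pdo=20):
    """Good-to-bad odds implied by a score on the assumed scale."""
    # Each pdo points below points0 doubles the bad-to-good odds.
    bad_odds = odds0 * 2 ** ((points0 - cutoff) / pdo)
    return 1 / bad_odds

def expected_defaults(approved, cutoff):
    """Expected defaults if every approved application sat exactly at the cutoff."""
    odds = good_to_bad_odds(cutoff)
    default_prob = 1 / (1 + odds)
    return approved * default_prob

print(good_to_bad_odds(560))          # about 4.75, which the lab rounds to 5
print(expected_defaults(1000, 600))   # about 50 defaults per 1,000 approvals
```

Because approved applications score at or above the cutoff, applying the cutoff's default probability to all of them gives a conservative (high) estimate.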

**Ex Q3. Evaluate the effect of adjusting the cutoff score**

Change the cutoff score to 640
- What is the number of applications approved?
- What is the number of defaults expected? 
- How do the recall and specificity change?

In [None]:
# your code here

Your answer here

### DIY

**Use scorecardpy for your group assignment**