# Understanding LGD scoring models

This is my workbook accompanying this LinkedIn [Pulse article](https://www.linkedin.com/pulse/understanding-lgd-risk-denis-burakov) by [Denis Burakov](https://linktr.ee/deburky).

- Data: [loan_data_defaults.csv](https://github.com/shawn-y-sun/Credit_Risk_Model_LoanDefaults/blob/main/loan_data_defaults.csv), from <https://github.com/shawn-y-sun/Credit_Risk_Model_LoanDefaults>.
- GitHub repo for code reference: <https://github.com/deburky/lgd-scoring-models>.

## 1 Methodologies

Loss Given Default (LGD) models are widely used in risk management to quantify the share of an exposure that is lost when a borrower defaults.

## 2 Modeling

Required: `pandas`, `numpy`, `matplotlib`, `scikit-learn`, `scipy`, `lightgbm`

```python
import numpy as np
import pandas as pd
import time
import matplotlib.pyplot as plt

%config InlineBackend.figure_format = 'retina'
```

### 2.1 Dataset

Loading dataset

```python
loan_data = pd.read_csv('loan_data_defaults.csv', index_col=0, low_memory=False)
loan_data = loan_data.drop(columns=['Unnamed: 0', 'Unnamed: 0.1'])
loan_data.head(5)
```

| Unnamed: 0 | id | member_id | loan_amnt | funded_amnt | funded_amnt_inv | term | int_rate | installment | grade | sub_grade | ... | addr_state:VT | addr_state:WA | addr_state:WI | addr_state:WV | addr_state:WY | initial_list_status:f | initial_list_status:w | good_bad | recovery_rate | CCF |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1077430 | 1314167 | 2500 | 2500 | 2500.0 | 60 months | 15.27 | 59.83 | C | C4 | ... | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0.046832 | 0.817416 |
| 8 | 1071795 | 1306957 | 5600 | 5600 | 5600.0 | 60 months | 21.28 | 152.39 | F | F2 | ... | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0.033761 | 0.971068 |
| 9 | 1071570 | 1306721 | 5375 | 5375 | 5350.0 | 60 months | 12.69 | 121.45 | B | B5 | ... | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0.0501 | 0.874701 |
| 12 | 1064687 | 1298717 | 9000 | 9000 | 9000.0 | 36 months | 13.49 | 305.38 | C | C1 | ... | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0.049367 | 0.860429 |
| 14 | 1069057 | 1303503 | 10000 | 10000 | 10000.0 | 36 months | 10.65 | 325.74 | B | B2 | ... | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0.06451 | 0.456653 |


Data dictionary (for the variables used in this workbook)

- `int_rate`: Interest rate on the loan
- `MRP`: Maximum recovery period (assumed to be 36 months)
- `recoveries`: Post charge-off gross recovery
- `collection_recovery_fee`: Post charge-off collection fee, collected from the obligor
- `total_rec_prncp`: Principal received to date
- `total_rec_int`: Interest received to date
- `total_rec_late_fee`: Late fees received to date
- `funded_amnt`: The total amount committed to that loan at that point in time


### 2.2 Preprocessing

Discounting the recovery cash flows

```python
# interest rate: convert from percent to a fraction (run once; re-running divides again)
loan_data['int_rate'] /= 100

# maximum recovery period in months - assuming 3 years
MRP = 36

# discount factor: monthly compounding at the loan rate over the recovery period
loan_data['discount_factor'] = (1 + loan_data['int_rate'] / 12) ** MRP

# recovery cash flows
loan_data['recovery_cf'] = (loan_data['recoveries']
                            + loan_data['collection_recovery_fee'] # collected from the obligor on collateral assets
                            + loan_data['total_rec_prncp'] # principal recovered
                            + loan_data['total_rec_int'] # interest recovered on the loan
                            + loan_data['total_rec_late_fee'] # late fees recovered on the loan
                            )
```
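
The cell above builds the discount factor and the gross recovery cash flow but does not combine them. A minimal sketch of one way to do that follows, using a tiny made-up frame in place of `loan_data`. The column names `recovery_cf_disc`, `recovery_rate_disc` and `lgd`, the use of `funded_amnt` as the exposure, and the clipping to [0, 1] are assumptions for illustration and may differ from the referenced article.

```python
import pandas as pd

MRP = 36  # maximum recovery period in months, as assumed above

# Made-up rows standing in for loan_data (values are illustrative only)
toy = pd.DataFrame({
    "int_rate": [0.12, 0.24],         # already divided by 100
    "recovery_cf": [600.0, 3000.0],   # summed recovery cash flows
    "funded_amnt": [1000.0, 2000.0],  # assumed exposure at default
})

# Monthly compounding over the recovery period
toy["discount_factor"] = (1 + toy["int_rate"] / 12) ** MRP

# Present value of the recovery cash flows (hypothetical column name)
toy["recovery_cf_disc"] = toy["recovery_cf"] / toy["discount_factor"]

# Discounted recovery rate relative to the funded amount, kept within [0, 1]
toy["recovery_rate_disc"] = (toy["recovery_cf_disc"] / toy["funded_amnt"]).clip(0, 1)

# LGD is the complement of the recovery rate
toy["lgd"] = 1 - toy["recovery_rate_disc"]
```

Dividing by the discount factor shrinks late recoveries more for high-rate loans, so two loans with the same nominal recovery can end up with different LGDs.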


### 2.3 Train/Test split

```python
from sklearn.model_selection import train_test_split
from scipy.stats import spearmanr
```
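
The split itself is not shown in this workbook. As a rough sketch of how `train_test_split` is typically applied, the example below uses synthetic data; the feature names, the 30% test share and the `random_state` are placeholders rather than the settings used in the referenced repo.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Synthetic stand-in for the modeling frame: two features and an LGD-like target
data = pd.DataFrame({
    "feature_a": rng.normal(size=200),
    "feature_b": rng.normal(size=200),
    "lgd": rng.uniform(0, 1, size=200),
})

X = data.drop(columns="lgd")
y = data["lgd"]

# Hold out 30% of the rows; random_state fixes the shuffle for reproducibility
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
```

Fixing `random_state` keeps the train and test rows identical across runs, which makes later model comparisons like-for-like.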

### 2.4 Linear regression

### 2.5 Logistic regression on WOE

### 2.6 Boosting

## 3 Testing

### 3.1 Discrimination testing
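
Since `spearmanr` is imported above, one plausible discrimination check is the rank correlation between observed and predicted LGD. The sketch below runs it on synthetic data only; the variable names and the noise level are assumptions, and the workbook's actual test may differ.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

# Synthetic observed LGDs and two sets of predictions
lgd_observed = rng.uniform(0, 1, size=500)
pred_informative = lgd_observed + rng.normal(scale=0.1, size=500)  # tracks the target
pred_random = rng.uniform(0, 1, size=500)                          # unrelated to the target

# spearmanr returns the rank correlation and a p-value
rho_informative, _ = spearmanr(lgd_observed, pred_informative)
rho_random, _ = spearmanr(lgd_observed, pred_random)

print(f"informative: {rho_informative:.3f}, random: {rho_random:.3f}")
```

Spearman's rho only looks at ordering, so it rewards a model for ranking high-loss loans above low-loss ones even when the predicted levels are miscalibrated.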

### 3.2 Visualization of discrimination (CLAR curve)
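
A minimal sketch of one simplified, tie-free reading of the Cumulative LGD Accuracy Ratio (CLAR) follows: for each k, it counts how many of the k highest predicted LGDs are also among the k highest observed LGDs. The function name `clar_curve` and this exact construction are assumptions for illustration; the referenced article may bucket LGD values or handle ties differently.

```python
import numpy as np

def clar_curve(y_true, y_pred):
    """Return x, y points of a tie-free CLAR curve and the CLAR value."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    n = len(y_true)

    # Row positions ordered from highest to lowest LGD
    order_pred = np.argsort(-y_pred, kind="stable")
    order_true = np.argsort(-y_true, kind="stable")

    # rank_true[i] = position of row i in the observed ordering (0 = highest LGD)
    rank_true = np.empty(n, dtype=int)
    rank_true[order_true] = np.arange(n)

    # Observed ranks, listed in predicted order
    ranks_in_pred_order = rank_true[order_pred]

    # For each k, count rows that are in both the top-k predicted and top-k observed
    hits = np.array([(ranks_in_pred_order[:k] < k).sum() for k in range(1, n + 1)])

    x = np.concatenate([[0.0], np.arange(1, n + 1) / n])
    y = np.concatenate([[0.0], hits / n])

    # Area under the curve by the trapezoid rule; CLAR is twice that area
    area = np.sum((x[1:] - x[:-1]) * (y[1:] + y[:-1]) / 2)
    return x, y, 2 * area

# Example on made-up values: a perfectly ranked prediction lies on the diagonal
observed = np.array([0.9, 0.1, 0.5, 0.3])
x, y, clar = clar_curve(observed, observed)
```

Under this construction a perfect ranking traces the diagonal and gives a CLAR of 1. To visualize, plot `x` against `y` with `plt.plot(x, y)` and add the diagonal as the reference line. The loop is quadratic in the number of rows, so for large samples a vectorized count would be preferable.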