# Expected losses with CatBoost

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/gaarutyunov/credit-risk/blob/master/notebooks/colab_cat_boost_el.ipynb)

## Environment settings

For better performance, change the Colab runtime type to GPU.

In [13]:
import numpy as np
import scipy.stats
!git clone https://github.com/gaarutyunov/credit-risk.git

Cloning into 'credit-risk'...
remote: Enumerating objects: 267, done.
remote: Counting objects: 100% (267/267), done.
remote: Compressing objects: 100% (171/171), done.
remote: Total 267 (delta 158), reused 198 (delta 89), pack-reused 0
Receiving objects: 100% (267/267), 2.89 MiB | 8.02 MiB/s, done.
Resolving deltas: 100% (158/158), done.


In [10]:
%cd credit-risk

[Errno 2] No such file or directory: 'credit-risk'
/content/credit-risk/credit-risk


In [15]:
!pip install -r requirements.txt

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting wing
  Cloning https://github.com/sberbank-ai/wing.git (to revision master) to /tmp/pip-install-6d5raia5/wing_74980b844825470a98b6c81a30adc786
  Running command git clone -q https://github.com/sberbank-ai/wing.git /tmp/pip-install-6d5raia5/wing_74980b844825470a98b6c81a30adc786


To get your Kaggle username and API key, follow the instructions in the [Kaggle API readme](https://github.com/Kaggle/kaggle-api).

In [None]:
%env KAGGLE_USERNAME=<username>
%env KAGGLE_KEY=<key>
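
Before starting the download, it can help to confirm that both variables are actually set. The snippet below is one possible check, not part of the Kaggle API or of this repository; `missing_kaggle_vars` and `REQUIRED_VARS` are hypothetical names introduced here for illustration.

```python
import os

# Environment variables used above for Kaggle authentication
REQUIRED_VARS = ["KAGGLE_USERNAME", "KAGGLE_KEY"]


def missing_kaggle_vars(env=None):
    """Return the names of required Kaggle variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_kaggle_vars()
if missing:
    print("Missing Kaggle credentials:", ", ".join(missing))
```

An empty list means both credentials are visible to the notebook process.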

In [17]:
!kaggle datasets download wordsforthewise/lending-club

Downloading lending-club.zip to /content/credit-risk/credit-risk
 99% 1.25G/1.26G [00:07<00:00, 198MB/s]
100% 1.26G/1.26G [00:07<00:00, 178MB/s]


In [18]:
!unzip lending-club.zip

Archive:  lending-club.zip
  inflating: accepted_2007_to_2018Q4.csv.gz  
  inflating: accepted_2007_to_2018q4.csv/accepted_2007_to_2018Q4.csv  
  inflating: rejected_2007_to_2018Q4.csv.gz  
  inflating: rejected_2007_to_2018q4.csv/rejected_2007_to_2018Q4.csv  


In [19]:
!mkdir data

In [20]:
!mv accepted_2007_to_2018q4.csv/accepted_2007_to_2018Q4.csv data/accepted_2007_to_2018Q4.csv
!mv rejected_2007_to_2018q4.csv/rejected_2007_to_2018Q4.csv data/rejected_2007_to_2018Q4.csv

## Preprocessing

In [None]:
from pipeline import get_pipeline

preprocessing = get_pipeline(
    name="cat_boost",
    group="preprocessing",
    overrides=[
        "preprocessing_pipeline=raw_data"
    ],
    debug=True,
)

In [None]:
X = preprocessing.fit_transform([], y=[])

In [None]:
import pandas as pd

X['issue_d'] = pd.to_datetime(X['issue_d'])

In [None]:
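# Keep loans issued from 2017 onward; the ISO date is unambiguous and .copy() avoids SettingWithCopyWarning
X = X[X['issue_d'] >= '2017-01-01'].copy()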

In [None]:
X['issue_d'] = X['issue_d'].dt.strftime('%b-%Y')

## Evaluation

In [None]:
from pipeline import get_pipeline

prediction = get_pipeline(
    name="cat_boost",
    group="prediction",
    debug=True,
)

In [None]:
X["PD"] = prediction.predict_proba(X.drop(columns=["funded_amnt", "issue_d"]))[:, 1]

In [None]:
# Loss given default of 100%: the full funded amount is lost on default
LGD = 1.0

In [58]:
X["EL"] = LGD * X["PD"] * X["funded_amnt"]

print(f"Expected losses: {X['EL'].sum():.2f}")

Expected losses: 1788118282.23
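
The expected loss of each loan is the product of its probability of default, the loss given default, and the funded amount. As a small illustration on made-up numbers (the three loans and the helper `expected_loss` are assumptions, not data from the Lending Club set):

```python
# Toy portfolio: (probability of default, funded amount); values are made up
loans = [(0.02, 10_000.0), (0.10, 5_000.0), (0.25, 2_000.0)]
LGD = 1.0  # same assumption as the notebook: the full amount is lost on default


def expected_loss(pd_, lgd, exposure):
    """Expected loss of one loan: PD x LGD x exposure."""
    return pd_ * lgd * exposure


per_loan = [expected_loss(p, LGD, amount) for p, amount in loans]
total_el = sum(per_loan)
print(f"Expected losses: {total_el:.2f}")
```

The three loans contribute 200, 500 and 500, so the portfolio total is 1200.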


In [None]:
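def vasicek(PD, rho, alpha):
    """Worst-case default rate at confidence `alpha` under the Vasicek one-factor model."""
    z = (scipy.stats.norm.ppf(PD) + np.sqrt(rho) * scipy.stats.norm.ppf(alpha)) / np.sqrt(1 - rho)
    return scipy.stats.norm.cdf(z)

In [None]:
# VaR in currency units: worst-case default rate times LGD times exposure
X["VaR"] = LGD * vasicek(X["PD"], 0, .999) * X["funded_amnt"]
X["VaR_6"] = LGD * vasicek(X["PD"], .06, .999) * X["funded_amnt"]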
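
The Vasicek one-factor formula maps an unconditional PD to a worst-case default rate at confidence level `alpha`, given an asset correlation `rho`: the standard normal CDF is applied to `(ppf(PD) + sqrt(rho) * ppf(alpha)) / sqrt(1 - rho)`. A minimal standard-library sketch on a single made-up PD of 5% follows; `worst_case_pd` is a hypothetical name, and `statistics.NormalDist` stands in for `scipy.stats.norm` so the snippet has no third-party dependencies.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def worst_case_pd(pd_, rho, alpha):
    """Conditional default rate at confidence alpha in the one-factor model.

    Phi((Phi^-1(PD) + sqrt(rho) * Phi^-1(alpha)) / sqrt(1 - rho))
    Requires 0 < pd_ < 1, 0 <= rho < 1 and 0 < alpha < 1.
    """
    z = (_STD_NORMAL.inv_cdf(pd_) + sqrt(rho) * _STD_NORMAL.inv_cdf(alpha)) / sqrt(1 - rho)
    return _STD_NORMAL.cdf(z)


pd_example = 0.05  # made-up probability of default
uncorrelated = worst_case_pd(pd_example, 0.0, 0.999)
correlated = worst_case_pd(pd_example, 0.06, 0.999)
print(f"rho=0.00: {uncorrelated:.4f}")
print(f"rho=0.06: {correlated:.4f}")
```

With zero correlation the formula collapses to the original PD, so only a positive `rho` pushes the worst-case rate above it.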

In [63]:
print(f"VaR at 99.9% (no correlation): {X['VaR'].sum():.2f}")
print(f"VaR at 99.9% (6% correlation): {X['VaR_6'].sum():.2f}")

VaR at 99.9%: 14501883904.00


In [None]:
print(f"Required capital (no correlation): {(X['VaR'] - X['EL']).sum():.2f}")
print(f"Required capital (6% correlation): {(X['VaR_6'] - X['EL']).sum():.2f}")
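
Required capital here is the gap between the worst-case loss and the expected loss, summed over loans. The sketch below repeats that calculation on a toy portfolio; the loan values, the helper `capital`, and its defaults (`alpha=0.999`, `lgd=1.0`) are assumptions chosen to mirror the cells above, and the standard library's `NormalDist` is used in place of SciPy.

```python
from math import sqrt
from statistics import NormalDist

N = NormalDist()  # standard normal distribution


def capital(pd_, exposure, rho, alpha=0.999, lgd=1.0):
    """Unexpected loss of one loan: (worst-case default rate - PD) x LGD x exposure."""
    z = (N.inv_cdf(pd_) + sqrt(rho) * N.inv_cdf(alpha)) / sqrt(1 - rho)
    return (N.cdf(z) - pd_) * lgd * exposure


loans = [(0.02, 10_000.0), (0.10, 5_000.0), (0.25, 2_000.0)]  # made-up (PD, amount)
for rho in (0.0, 0.06):
    total = sum(capital(p, amount, rho) for p, amount in loans)
    print(f"Required capital (rho={rho:.2f}): {total:.2f}")
```

Without correlation the worst-case rate equals the PD, so the capital requirement is zero up to rounding; it becomes positive once `rho` is greater than zero.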