# Expected losses with CatBoost

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/gaarutyunov/credit-risk/blob/master/notebooks/colab_cat_boost_el.ipynb)

## Environment settings

For better performance, change the Colab runtime type to GPU.

In [13]:
!git clone https://github.com/gaarutyunov/credit-risk.git

Cloning into 'credit-risk'...
remote: Enumerating objects: 267, done.
remote: Counting objects: 100% (267/267), done.
remote: Compressing objects: 100% (171/171), done.
remote: Total 267 (delta 158), reused 198 (delta 89), pack-reused 0
Receiving objects: 100% (267/267), 2.89 MiB | 8.02 MiB/s, done.
Resolving deltas: 100% (158/158), done.


In [10]:
%cd credit-risk

[Errno 2] No such file or directory: 'credit-risk'
/content/credit-risk/credit-risk


In [15]:
!pip install -r requirements.txt

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting wing
  Cloning https://github.com/sberbank-ai/wing.git (to revision master) to /tmp/pip-install-6d5raia5/wing_74980b844825470a98b6c81a30adc786
  Running command git clone -q https://github.com/sberbank-ai/wing.git /tmp/pip-install-6d5raia5/wing_74980b844825470a98b6c81a30adc786


To get your Kaggle username and API key, follow the instructions in the [Kaggle API readme](https://github.com/Kaggle/kaggle-api).

In [None]:
%env KAGGLE_USERNAME=<username>
%env KAGGLE_KEY=<key>
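
Before starting the 1.26 GB download, it can help to confirm that both variables are actually set. The following is a minimal sketch, assuming the Kaggle CLI picks up `KAGGLE_USERNAME` and `KAGGLE_KEY` from the environment as set in the cell above. `missing_kaggle_credentials` is a hypothetical helper and is not part of the repository.

```python
import os


def missing_kaggle_credentials(env=None):
    """Return the names of Kaggle credential variables that are unset or empty.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    required = ("KAGGLE_USERNAME", "KAGGLE_KEY")
    return [name for name in required if not env.get(name)]


missing = missing_kaggle_credentials()
if missing:
    print("Set these variables before downloading:", ", ".join(missing))
```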

In [17]:
!kaggle datasets download wordsforthewise/lending-club

Downloading lending-club.zip to /content/credit-risk/credit-risk
 99% 1.25G/1.26G [00:07<00:00, 198MB/s]
100% 1.26G/1.26G [00:07<00:00, 178MB/s]


In [18]:
!unzip lending-club.zip

Archive:  lending-club.zip
  inflating: accepted_2007_to_2018Q4.csv.gz  
  inflating: accepted_2007_to_2018q4.csv/accepted_2007_to_2018Q4.csv  
  inflating: rejected_2007_to_2018Q4.csv.gz  
  inflating: rejected_2007_to_2018q4.csv/rejected_2007_to_2018Q4.csv  


In [19]:
!mkdir -p data

In [20]:
!mv accepted_2007_to_2018q4.csv/accepted_2007_to_2018Q4.csv data/accepted_2007_to_2018Q4.csv
!mv rejected_2007_to_2018q4.csv/rejected_2007_to_2018Q4.csv data/rejected_2007_to_2018Q4.csv

## Preprocessing

In [3]:
from pipeline import get_pipeline

preprocessing = get_pipeline(
    name="cat_boost",
    group="preprocessing",
    overrides=[
        "preprocessing_pipeline=raw_data"
    ],
    debug=True,
)

_target_: pipeline.ReaderPipeline
memory: ./cache/preprocessing/raw
steps:
- - CSVReader
  - _target_: pipeline.CSVReader
    _convert_: all
    file: data/accepted_2007_to_2018Q4.csv
    columns:
    - loan_amnt
    - term
    - int_rate
    - emp_title
    - emp_length
    - home_ownership
    - annual_inc
    - verification_status
    - loan_status
    - purpose
    - addr_state
    - dti
    - earliest_cr_line
    - fico_range_high
    - inq_last_6mths
    - revol_bal
    - initial_list_status
    - out_prncp
    - total_rec_late_fee
    - collection_recovery_fee
    - last_fico_range_low
    - collections_12_mths_ex_med
    - application_type
    - tot_coll_amt
    - avg_cur_bal
    - bc_open_to_buy
    - chargeoff_within_12_mths
    - delinq_amnt
    - mo_sin_old_il_acct
    - mo_sin_old_rev_tl_op
    - mo_sin_rcnt_tl
    - mort_acc
    - mths_since_recent_bc
    - num_accts_ever_120_pd
    - num_actv_bc_tl
    - num_bc_tl
    - num_il_tl
    - num_sats
    - num_tl_120dpd_2m
   

In [4]:
X = preprocessing.fit_transform([], y=[])

In [5]:
import pandas as pd

X['issue_d'] = pd.to_datetime(X['issue_d'])

In [6]:
# Copy the filtered frame so later column assignments modify X itself, not a slice
X = X[X['issue_d'] >= '2017-01-01'].copy()

In [7]:
X['issue_d'] = X['issue_d'].dt.strftime('%b-%Y')


## Evaluation

In [12]:
from pipeline import get_pipeline

prediction = get_pipeline(
    name="cat_boost",
    group="prediction",
    debug=True,
)

_target_: sklearn.pipeline.Pipeline
memory: ./cache/prediction/cat_boost
steps:
- - Predictor
  - _target_: pipeline.CatBoostLoader
    _convert_: all
    load: models/cat_boost
    cat_features:
    - term
    - emp_title
    - emp_length
    - home_ownership
    - verification_status
    - purpose
    - addr_state
    - earliest_cr_line
    - initial_list_status
    - application_type
    - disbursement_method



In [56]:
X["PD"] = prediction.predict_proba(X.drop(columns=["funded_amnt", "issue_d"]))[:, 1]


In [57]:
# Loss given default: assume the full funded amount is lost on default
LGD = 1.0

In [58]:
# Per-loan expected loss: LGD * PD * exposure (funded_amnt)
EL = LGD * X["PD"] * X["funded_amnt"]

print(f"Expected losses: {EL.sum():.2f}")

Expected losses: 1788118282.23
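
The expected-loss cell above multiplies three quantities per loan and sums the result. A minimal sketch of the same arithmetic on a toy portfolio follows. The three loans and their values are made up for illustration and do not come from the Lending Club data.

```python
import pandas as pd

# Toy portfolio: values are invented for illustration only
loans = pd.DataFrame({
    "PD": [0.02, 0.10, 0.50],                     # predicted probability of default
    "funded_amnt": [10_000.0, 5_000.0, 2_000.0],  # exposure per loan
})
LGD = 1.0  # loss given default, same assumption as in the notebook

# Per-loan expected loss: EL_i = LGD * PD_i * funded_amnt_i
loans["EL"] = LGD * loans["PD"] * loans["funded_amnt"]
total_el = loans["EL"].sum()

print(f"Expected losses: {total_el:.2f}")
```

With `LGD = 1.0` the total is simply the PD-weighted sum of the funded amounts, so any lower LGD would scale the figure down proportionally.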


In [63]:
# 99.9% quantile of the predicted PDs, applied as a stressed PD to every loan
quant = X["PD"].quantile(.999)

VaR = LGD * quant * X["funded_amnt"]

print(f"VaR at 99.9%: {VaR.sum():.2f}")

VaR at 99.9%: 14501883904.00


In [67]:
K = VaR.sum() - EL.sum()

print(f"Required capital: {K:.2f}")

Required capital: 12713765621.77
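
The VaR and capital cells above can be sketched end to end on a toy portfolio. This mirrors the notebook's approach, in which the 99.9% quantile of the predicted PDs is applied to every loan as a stressed PD. The data are invented for illustration, and the variable names `stressed_pd`, `el`, `var`, and `capital` are assumptions of this sketch.

```python
import pandas as pd

# Toy portfolio: values are invented for illustration only
loans = pd.DataFrame({
    "PD": [0.02, 0.10, 0.50],
    "funded_amnt": [10_000.0, 5_000.0, 2_000.0],
})
LGD = 1.0  # loss given default, same assumption as in the notebook

# Stressed PD: 99.9% quantile of the predicted PDs (pandas interpolates linearly)
stressed_pd = loans["PD"].quantile(0.999)

# Expected loss uses each loan's own PD; VaR uses the stressed PD for all loans
el = (LGD * loans["PD"] * loans["funded_amnt"]).sum()
var = (LGD * stressed_pd * loans["funded_amnt"]).sum()

# Required capital covers the gap between the stressed and the expected loss
capital = var - el

print(f"EL: {el:.2f}, VaR at 99.9%: {var:.2f}, required capital: {capital:.2f}")
```

Because the stressed PD is at least as large as almost every individual PD, `var` exceeds `el` here and the capital figure is positive.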
