# CatBoost for default prediction

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/gaarutyunov/credit-risk/blob/master/notebooks/colab_cat_boost.ipynb)

## Environment settings

For better performance, change the Colab runtime type to GPU.

In [None]:
!git clone https://github.com/gaarutyunov/credit-risk.git

In [None]:
%cd credit-risk

In [None]:
!pip install -r requirements.txt

To obtain your Kaggle username and API key, follow the instructions in the [Kaggle API readme](https://github.com/Kaggle/kaggle-api).

In [None]:
%env KAGGLE_USERNAME=<username>
%env KAGGLE_KEY=<key>
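
As an alternative to typing credentials into the cell, they could be set from Python so the key is never stored in the notebook. This is a minimal sketch: `set_kaggle_credentials` is a hypothetical helper, and it relies only on the `KAGGLE_USERNAME` and `KAGGLE_KEY` environment variables used above.

```python
import os


def set_kaggle_credentials(username: str, key: str) -> None:
    """Expose Kaggle credentials through the environment variables used above."""
    os.environ["KAGGLE_USERNAME"] = username
    os.environ["KAGGLE_KEY"] = key


# Possible usage in a notebook cell (prompts instead of hardcoding the key):
# from getpass import getpass
# set_kaggle_credentials(input("Kaggle username: "), getpass("Kaggle API key: "))
```

Shell commands started with `!` inherit the kernel's environment, so the download cell that follows should see these values.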

In [6]:
!kaggle datasets download wordsforthewise/lending-club

Downloading lending-club.zip to /content/credit-risk
 98% 1.24G/1.26G [00:04<00:00, 282MB/s]
100% 1.26G/1.26G [00:05<00:00, 269MB/s]


In [7]:
!unzip lending-club.zip

Archive:  lending-club.zip
  inflating: accepted_2007_to_2018Q4.csv.gz  
  inflating: accepted_2007_to_2018q4.csv/accepted_2007_to_2018Q4.csv  
  inflating: rejected_2007_to_2018Q4.csv.gz  
  inflating: rejected_2007_to_2018q4.csv/rejected_2007_to_2018Q4.csv  


In [8]:
!mkdir -p data

In [9]:
!mv accepted_2007_to_2018q4.csv/accepted_2007_to_2018Q4.csv data/accepted_2007_to_2018Q4.csv
!mv rejected_2007_to_2018q4.csv/rejected_2007_to_2018Q4.csv data/rejected_2007_to_2018Q4.csv
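
The two shell steps above can also be sketched in plain Python, which makes the cell safe to re-run after a partial execution. The helper name `stage_lending_club` and its parameters are hypothetical; the file and folder names are the ones shown in the unzip output.

```python
import shutil
from pathlib import Path


def stage_lending_club(root: Path = Path("."), data_dir: Path = Path("data")) -> list:
    """Move the extracted CSV files into `data_dir`, skipping files already moved."""
    data_dir.mkdir(parents=True, exist_ok=True)
    staged = []
    for name in ("accepted_2007_to_2018Q4.csv", "rejected_2007_to_2018Q4.csv"):
        # The archive extracts each CSV into a folder named like the file, in lower case.
        source = root / name.lower() / name
        target = data_dir / name
        if source.exists():
            shutil.move(str(source), str(target))
        staged.append(target)
    return staged


# Possible usage from the repository root:
# stage_lending_club()
```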

## Preprocessing

In [None]:
from pipeline import get_pipeline

preprocessing = get_pipeline(
    name="cat_boost",
    group='preprocessing',
    debug=True,
)

In [None]:
X = preprocessing.fit_transform([], y=[])
y = preprocessing.label_transformer.label

In [None]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42, shuffle=True
)
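
Default labels are usually imbalanced, so one optional variation on the split above is to pass `stratify`, which keeps the share of positives roughly equal in both parts. The sketch below uses synthetic stand-in data (`X_demo`, `y_demo` are made up for illustration) and is not the notebook's own split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the loan data: 10% positives (defaults).
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(1000, 3))
y_demo = np.array([1] * 100 + [0] * 900)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.33, random_state=42, shuffle=True, stratify=y_demo
)

# With stratify, both parts keep roughly the original share of positives.
print(f"train positives: {y_tr.mean():.3f}, test positives: {y_te.mean():.3f}")
```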

## Training

To train on CPU instead, remove the `overrides` argument, which sets the classifier's `task_type` to GPU.

In [None]:
from pipeline import get_pipeline

classifier = get_pipeline(
    name="cat_boost",
    group='classifier',
    overrides=["+classifier_pipeline.steps_config.0.Classifier.task_type=GPU"],
    debug=True,
)
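
To switch between GPU and CPU without editing the call, the override list could be built from a flag. This is a sketch under assumptions: `build_overrides` and `gpu_available` are hypothetical helpers, the `nvidia-smi` check is only a heuristic, and it assumes `get_pipeline` treats an empty `overrides` list the same as omitting the argument.

```python
import shutil


def gpu_available() -> bool:
    """Heuristic only: treat a visible `nvidia-smi` binary as a sign of a GPU runtime."""
    return shutil.which("nvidia-smi") is not None


def build_overrides(use_gpu: bool) -> list:
    """Return the override list for the classifier pipeline; empty keeps the default config."""
    if use_gpu:
        return ["+classifier_pipeline.steps_config.0.Classifier.task_type=GPU"]
    return []


# Possible usage (assumes an empty list behaves like omitting `overrides`):
# classifier = get_pipeline(
#     name="cat_boost",
#     group="classifier",
#     overrides=build_overrides(gpu_available()),
#     debug=True,
# )
```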

In [None]:
classifier.fit(X_train, y_train)

In [None]:
y_pred = classifier.predict(X_test)

In [None]:
from sklearn.metrics import accuracy_score

accuracy_score(y_test, y_pred)

In [None]:
from sklearn.metrics import balanced_accuracy_score

balanced_accuracy_score(y_test, y_pred)
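
Both metrics are reported because they can disagree sharply on imbalanced labels. The pure-Python sketch below uses made-up labels and a trivial "always predict 0" classifier. Here accuracy is the share of correct predictions and balanced accuracy is the mean of the per-class recalls.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    recalls = []
    for label in sorted(set(y_true)):
        indices = [i for i, t in enumerate(y_true) if t == label]
        recalls.append(sum(y_pred[i] == label for i in indices) / len(indices))
    return sum(recalls) / len(recalls)


# Made-up example: 90 non-defaults, 10 defaults, and a model that always predicts 0.
y_true_demo = [0] * 90 + [1] * 10
y_pred_demo = [0] * 100

print(accuracy(y_true_demo, y_pred_demo))           # 0.9
print(balanced_accuracy(y_true_demo, y_pred_demo))  # 0.5
```

The trivial model looks strong on accuracy but scores no better than chance on balanced accuracy, which is why the second number is the more informative one for rare defaults.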

In [None]:
from sklearn.metrics import RocCurveDisplay

RocCurveDisplay.from_predictions(y_test, y_pred)

In [None]:
from sklearn.metrics import PrecisionRecallDisplay

PrecisionRecallDisplay.from_predictions(y_test, y_pred)

In [None]:
from sklearn.metrics import average_precision_score

average_precision_score(y_test, y_pred)
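
Ranking metrics such as ROC AUC and average precision are normally computed from continuous scores. Hard 0/1 predictions collapse each curve to a single operating point. The sketch below shows the difference on made-up values. Whether the pipeline here exposes `predict_proba` is an assumption, so that call appears only as a comment.

```python
from sklearn.metrics import average_precision_score, roc_auc_score

# Made-up labels and scores for illustration only.
y_true_demo = [0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.6, 0.4, 0.9]
hard_labels = [int(s >= 0.5) for s in scores]  # thresholded at 0.5

auc_from_scores = roc_auc_score(y_true_demo, scores)
auc_from_labels = roc_auc_score(y_true_demo, hard_labels)
ap_from_scores = average_precision_score(y_true_demo, scores)
ap_from_labels = average_precision_score(y_true_demo, hard_labels)

print(f"ROC AUC: scores={auc_from_scores:.3f}, hard labels={auc_from_labels:.3f}")
print(f"AP:      scores={ap_from_scores:.3f}, hard labels={ap_from_labels:.3f}")

# Assumption: if the pipeline exposes predict_proba, scores could be obtained with
# y_score = classifier.predict_proba(X_test)[:, 1]
```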

In [None]:
!mkdir -p models
classifier.steps[0][1].save_model("models/cat_boost")