# skorecard 

##### Building (traditional) credit risk models in Python

# Traditional credit risk modelling

When building a credit risk model, the most commonly used algorithm is logistic regression, due to its simplicity and interpretability.<br>
<br>
Logistic regression is a model that assumes a linear relationship between the variables (aka risk drivers or features) and the log-odds of the target (the default flag).<br>
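
To make the linearity assumption concrete, here is a minimal sketch on synthetic data (the data and the `_demo` names are made up for illustration). It checks that the log-odds recovered from the predicted probabilities equal the intercept plus the weighted sum of the features.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic data (hypothetical): two features and a binary default flag
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(500, 2))
true_log_odds = -1.0 + 1.5 * X_demo[:, 0] - 0.8 * X_demo[:, 1]
y_demo = (rng.random(500) < 1 / (1 + np.exp(-true_log_odds))).astype(int)

lr_demo = LogisticRegression().fit(X_demo, y_demo)

# Log-odds computed by hand: intercept + weighted sum of the features
manual_log_odds = lr_demo.intercept_[0] + X_demo @ lr_demo.coef_[0]

# Log-odds recovered from the predicted probability of default
p = lr_demo.predict_proba(X_demo)[:, 1]
log_odds_from_proba = np.log(p / (1 - p))

print(np.allclose(manual_log_odds, log_odds_from_proba))
```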

    


- The input variables are usually bucketed (according to a statistical process or business knowledge) in order to capture non-linear relationships between the variables and the target


- This generates a set of buckets:
    - each feature value gets assigned to one bucket
    - the buckets are used to train the model
    - using the model output, each bucket is then assigned a score - the final model becomes a scorecard
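
The bucketing idea above can be sketched with plain numpy and pandas. The data, the `age` feature and the bucket boundaries are all hypothetical; the point is only that each value lands in exactly one bucket and that the default rate per bucket can follow a non-linear (here U-shaped) pattern.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: default risk is U-shaped in the feature,
# so a single linear term cannot describe it
rng = np.random.default_rng(42)
age = rng.uniform(20, 70, size=2000)
p_default = 0.05 + 0.30 * ((age - 45) / 25) ** 2
default = (rng.random(2000) < p_default).astype(int)

# Hand-picked bucket boundaries (an assumption for this demo)
boundaries = [30, 40, 50, 60]
bucket = np.digitize(age, boundaries)  # bucket ids 0..4

# Each value lands in exactly one bucket; the per-bucket default rate
# is the pattern a model trained on the buckets can pick up
summary = (
    pd.DataFrame({"bucket": bucket, "default": default})
    .groupby("bucket")["default"]
    .agg(count="count", default_rate="mean")
)
print(summary)
```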

# Some practical examples in Python

- intro to scikit-learn
- skorecard examples with sklearn integration
- sneak preview of the next features in the package

### Load Data

`skorecard` ships a demo dataset with 4 features (2 categorical and 2 numerical) for demos and testing.<br>
We'll use it here.

In [None]:
import pandas as pd
import numpy as np
from skorecard import datasets

df = datasets.load_uci_credit_card(as_frame=True)

X = df.drop(columns=["default"])
y = df["default"]
num_cols = ["LIMIT_BAL", "BILL_AMT1"]
cat_cols = ["EDUCATION", "MARRIAGE"]

df.head()

### Quick intro to scikit-learn

- scikit-learn (sklearn) is the package that defined the machine learning workflow in Python.
- scikit-learn is a very extensive package. In the upcoming slides we introduce the concepts of `transformer`, `model` and `pipeline`, as these are what `skorecard` builds on


### sklearn transformers

- `transformers` are classes in sklearn whose function is to perform a transformation on the data.<br>
- In general, a `transformer` preserves the number of rows in a dataset.<br>
- `transformers` are characterized by two main functions:
    - `fit(X, y=None)` performs the necessary calculations
    - `transform(X)` applies the transformation to the (new) dataset
    
Example: `MinMaxScaler` is a transformer that rescales each input feature in X to a predefined range (0 to 1 by default; -1 to 1 is another common choice), depending on the use case.

In [None]:
from sklearn.preprocessing import MinMaxScaler

In [None]:
mms = MinMaxScaler(feature_range=(0, 1)).fit(X)
X_transformed = mms.transform(X)
X_transformed

In [None]:
X_transformed[:,0].min(), X_transformed[:,0].max()

If we change the range, for example, the transformation changes accordingly.

In [None]:
mms = MinMaxScaler(feature_range=(-2, 2)).fit(X)
X_transformed = mms.transform(X)
X_transformed[:,0].min(), X_transformed[:,0].max()
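
The split between `fit` and `transform` matters most when the transformer is applied to data it has not seen. A small sketch with made-up numbers: the minimum and maximum are learned during `fit`, so new values outside the training range are mapped outside the target range.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical training data and new data for a single feature
X_train_demo = np.array([[10.0], [20.0], [30.0]])
X_new_demo = np.array([[15.0], [40.0]])

scaler = MinMaxScaler(feature_range=(0, 1))
scaler.fit(X_train_demo)  # learns min=10 and max=30 from the training data only

print(scaler.data_min_, scaler.data_max_)
print(scaler.transform(X_new_demo))  # 15 -> 0.25, 40 -> 1.5 (outside [0, 1])
```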

## sklearn models

- Models are classes that implement the (ML) algorithms and everything that comes with them.
- A model has three main functions:
    - `fit(X, y)` - runs the optimization for the specific algorithm
    - `predict(X)` - returns the predictions for a new dataset
    - `predict_proba(X)` - returns the class probabilities predicted by the fitted model

Example: `LogisticRegression`

In [None]:
from sklearn.linear_model import LogisticRegression

lr = (
    LogisticRegression()
    .fit(X,y)
)
X_proba = lr.predict_proba(X)
X_proba
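
To illustrate how `predict` and `predict_proba` relate, here is a short sketch on synthetic data (all names are hypothetical). `predict_proba` returns one column per class, and `predict` returns the class with the highest probability.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic data (hypothetical): the first feature drives the target
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(300, 3))
y_demo = (X_demo[:, 0] + rng.normal(scale=0.5, size=300) > 0).astype(int)

clf = LogisticRegression().fit(X_demo, y_demo)

proba = clf.predict_proba(X_demo)  # shape (n_samples, 2); columns follow clf.classes_
labels = clf.predict(X_demo)       # hard 0/1 labels

# Each row of probabilities sums to 1
print(np.allclose(proba.sum(axis=1), 1.0))
# predict picks the class with the highest probability
print(np.array_equal(labels, clf.classes_[proba.argmax(axis=1)]))
```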

## sklearn pipeline - putting it all together

A pipeline chains transformers and a model into a single sequential object.<br>
It can contain any number of transformers and typically ends with a model.

In [None]:
from sklearn.pipeline import Pipeline, make_pipeline

pipe = make_pipeline(
    MinMaxScaler(),
    LogisticRegression()
)

pipe.fit(X,y)

In [None]:
X_proba = pipe.predict_proba(X)
X_proba
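
As a sanity check on what a pipeline does, the sketch below (synthetic data, hypothetical `_demo` names) fits the same two steps once through `make_pipeline` and once by hand, and compares the predicted probabilities.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic data (hypothetical)
rng = np.random.default_rng(2)
X_demo = rng.uniform(0, 1000, size=(400, 2))
y_demo = (X_demo[:, 0] > 500).astype(int)

# Pipeline: scaler and model fitted in one call
pipe_demo = make_pipeline(MinMaxScaler(), LogisticRegression()).fit(X_demo, y_demo)

# The same steps done by hand
scaler_demo = MinMaxScaler().fit(X_demo)
model_demo = LogisticRegression().fit(scaler_demo.transform(X_demo), y_demo)

print(np.allclose(
    pipe_demo.predict_proba(X_demo),
    model_demo.predict_proba(scaler_demo.transform(X_demo)),
))
```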

## Skorecard - and how it fits in the sklearn API

The bucketing process fits naturally into the concept of sklearn transformers.<br>
Therefore, in skorecard we implemented a set of transformers that map the input data to a set of buckets.

Example: bucketing with decision trees

In [None]:
from skorecard.bucketers import DecisionTreeBucketer
from sklearn.preprocessing import OneHotEncoder

skorecard_pipeline = make_pipeline(
    DecisionTreeBucketer(variables=num_cols, max_n_bins=6, min_bin_size=0.1),
    OneHotEncoder(),
    LogisticRegression()
)

In [None]:
skorecard_pipeline.fit(X,y)
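
To give an intuition for tree-based bucketing, here is a sketch using plain scikit-learn on synthetic data. It is an illustration of the idea, not skorecard's implementation: a shallow decision tree is fitted on a single feature and its split thresholds are used as bucket boundaries. The `limit` feature and its relationship with the target are made up.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Synthetic feature (hypothetical): lower limits default more often
rng = np.random.default_rng(3)
limit = rng.uniform(10_000, 500_000, size=3000)
p_default = np.where(limit < 100_000, 0.35, np.where(limit < 250_000, 0.20, 0.08))
default = (rng.random(3000) < p_default).astype(int)

# At most 6 leaves, each holding at least 10% of the rows
tree = DecisionTreeClassifier(max_leaf_nodes=6, min_samples_leaf=0.1, random_state=0)
tree.fit(limit.reshape(-1, 1), default)

# Internal nodes carry a split threshold; leaves are marked with feature == -2
is_split = tree.tree_.feature >= 0
boundaries = np.sort(tree.tree_.threshold[is_split])

# Map each value to the bucket delimited by those boundaries
buckets = np.digitize(limit, boundaries)
print(boundaries)
print(np.bincount(buckets) / len(limit))
```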

## Get the details of the bucketers

Generate a report of the bucketing process 

In [None]:
binner = skorecard_pipeline.steps[0][1] # get the first element of the pipeline, which is our bucketer
oh_encoder = skorecard_pipeline.steps[1][1] # get the second element of the pipeline, which is the one hot encoder
model = skorecard_pipeline.steps[2][1] # get the third element of the pipeline, which is our model

In [None]:
binner.features_bucket_mapping_['LIMIT_BAL']

In [None]:
from skorecard.reporting import create_report

create_report(X, y, num_cols[0], binner, verbose=True)

### Checking the model

In [None]:
model

In [None]:
print(f'Coefficients: {model.coef_}\n')
print(f'Intercept : {model.intercept_}\n')
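
The coefficients of a model trained on one-hot encoded buckets can be turned into scorecard points. The sketch below uses synthetic data and one common scaling convention, which is an assumption here and not necessarily what skorecard does: 20 points double the good:bad odds, and odds of 50:1 map to 600 points.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder

# Hypothetical bucketed feature with 4 buckets of decreasing risk
rng = np.random.default_rng(4)
bucket_ids = rng.integers(0, 4, size=(2000, 1))
p_default = np.array([0.30, 0.20, 0.10, 0.05])[bucket_ids[:, 0]]
default = (rng.random(2000) < p_default).astype(int)

encoder = OneHotEncoder()
X_onehot = encoder.fit_transform(bucket_ids)
clf = LogisticRegression().fit(X_onehot, default)

# Assumed scaling convention (not taken from skorecard)
pdo, base_score, base_odds = 20, 600, 50
factor = pdo / np.log(2)
offset = base_score - factor * np.log(base_odds)

# The model predicts log-odds of default, so higher log-odds means fewer points
base_points = offset - factor * clf.intercept_[0]
bucket_points = -factor * clf.coef_[0]

for bucket, points in zip(encoder.categories_[0], bucket_points):
    print(f"bucket {bucket}: {points:+.1f} points")

# Score of one observation = base points + points of the bucket it falls in
scores = base_points + X_onehot @ bucket_points
```

With this convention, the riskiest bucket receives the fewest points and the score is a linear rescaling of the model's log-odds.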

# Fine and coarse classing (WIP)

Right now, we have shown an example where:
- the binning is defined by a single transformer, which might not be optimal
- ideally, one would start with many bins (fine classing) and then merge those that are similar enough (coarse classing)

#### skorecard supports both automatic bin merging (based on statistical properties) and manual merging
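
As a rough illustration of coarse classing, the sketch below greedily merges adjacent fine bins whose default rates are close. The function `merge_similar_bins`, its `max_rate_gap` parameter and the counts are hypothetical; skorecard's bucketers may use a different criterion.

```python
import numpy as np

def merge_similar_bins(counts, defaults, max_rate_gap=0.02):
    """Greedily merge adjacent bins whose default rates differ by less than max_rate_gap.

    Returns a list of lists: each inner list holds the fine-bin indices of one coarse bin.
    """
    groups = [[0]]
    n, d = counts[0], defaults[0]  # running totals of the current coarse bin
    for i in range(1, len(counts)):
        rate_current = d / n
        rate_next = defaults[i] / counts[i]
        if abs(rate_next - rate_current) < max_rate_gap:
            groups[-1].append(i)  # similar enough: extend the current coarse bin
            n, d = n + counts[i], d + defaults[i]
        else:
            groups.append([i])    # different: start a new coarse bin
            n, d = counts[i], defaults[i]
    return groups

# Hypothetical fine classing: 6 bins with their row counts and number of defaults
counts = np.array([200, 180, 220, 210, 190, 200])
defaults = np.array([60, 55, 30, 28, 10, 9])

coarse_groups = merge_similar_bins(counts, defaults)
print(coarse_groups)  # [[0, 1], [2, 3], [4, 5]]
```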

In [None]:
from skorecard.bucketers import OptimalBucketer

opti_skorecard_pipeline = make_pipeline(
    OptimalBucketer(variables=num_cols, max_n_bins=6, min_bin_size=0.1),
    LogisticRegression()
)

opti_skorecard_pipeline.fit(X,y)

In [None]:
opti_binner = opti_skorecard_pipeline.steps[0][1] # get the first element of the pipeline, which is our bucketer

### Sneak preview into the manual bucketing

To perform the manual bucketing, the steps are the following:

- The user defines the desired fine classing
- Optionally, the user can then also run the statistical optimization
- Once this is done, the whole pipeline is passed to the `tweak_buckets` function
- This launches a web UI (which can run in a notebook as well as in the browser), where the user can merge the buckets according to the desired logic
- If the statistical optimization was performed, a suggested merging is presented
- After the buckets are adapted, the user can store the object and immediately continue with the pipeline
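
Conceptually, a manual merge is just a lookup from fine bucket ids to coarse bucket ids. The sketch below shows that mapping with numpy; the bucket ids and the merge decision are made up, and this is not how `tweak_buckets` stores its result.

```python
import numpy as np

# Hypothetical fine bucket ids (0..5) for seven observations
fine_bucket = np.array([3, 0, 5, 1, 4, 2, 2])

# Manual decision: merge fine buckets 0-1 and 2-3-4, keep 5 on its own
fine_to_coarse = np.array([0, 0, 1, 1, 1, 2])

# Fancy indexing applies the lookup to every observation at once
coarse_bucket = fine_to_coarse[fine_bucket]
print(coarse_bucket)  # [1 0 2 0 1 1 1]
```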

In [None]:
from skorecard.pipeline import BucketingPipeline, tweak_buckets

prebucket_pipeline = make_pipeline(DecisionTreeBucketer(variables=num_cols, max_n_bins=100, min_bin_size=0.05))
bucket_pipeline = BucketingPipeline(make_pipeline(
    OptimalBucketer(variables=num_cols, max_n_bins=10, min_bin_size=0.05),
    OptimalBucketer(variables=cat_cols, max_n_bins=10, min_bin_size=0.05),
))
pipe = make_pipeline(prebucket_pipeline, bucket_pipeline)
pipe.fit(X, y)

pipe.transform(X).head()


Launch a web app where the manual tweaking can be done (this is still WIP)

In [None]:
tweak_buckets(pipe, X, y)

http://127.0.0.1:8050/