# Making a skorecard pipeline

A pipeline designed to build a scorecard follows a well-defined structure:

1) bucketer (maps input features into independent buckets/categories)
    - DecisionTreeBucketer, EqualFrequencyBucketer, EqualWidthBucketer... for numerical features
    - OrdinalCategoricalBucketer for categorical features (optional)
    - **Bucketing process (known in credit risk slang as "fine and coarse classing")**
    - UserInputBucketer (where the bucket
2) encoder (encodes the categories in a way that makes sense to the classifier)
    - WoeEncoder
    - One-Hot Encoder
    - ...
3) model
    - Logistic Regression
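To make the bucketing step more concrete, here is a minimal plain-Python sketch of how equal-width and equal-frequency cut points differ. It does not use skorecard and is not how `EqualWidthBucketer` or `EqualFrequencyBucketer` are implemented; the function names and the sample values are assumptions made for illustration.

```python
from statistics import quantiles


def equal_width_boundaries(values, n_bins):
    """Split the value range into n_bins intervals of identical width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    # Only the n_bins - 1 interior cut points are needed.
    return [lo + width * i for i in range(1, n_bins)]


def equal_frequency_boundaries(values, n_bins):
    """Pick cut points so each bucket holds roughly the same number of rows."""
    return quantiles(values, n=n_bins, method="inclusive")


# Hypothetical, skewed values (not taken from the demo data).
limits = [10, 20, 30, 40, 50, 60, 70, 80, 1000]

width_cuts = equal_width_boundaries(limits, 3)
freq_cuts = equal_frequency_boundaries(limits, 3)

print(width_cuts)  # [340.0, 670.0]
print(freq_cuts)
```

With a single outlier, the equal-width cut points leave almost every row in the first bucket, while the equal-frequency cut points stay inside the bulk of the data.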

First, load the demo data:

In [1]:
from skorecard.datasets import load_uci_credit_card, load_credit_card

X, y = load_uci_credit_card(return_X_y=True)

cat_feat = ['EDUCATION', 'MARRIAGE']
num_feat = ['LIMIT_BAL', 'BILL_AMT1']

X.head(4)

| | EDUCATION | MARRIAGE | LIMIT_BAL | BILL_AMT1 |
|---|---|---|---|---|
| 0 | 1 | 2 | 400000.0 | 201800.0 |
| 1 | 2 | 2 | 80000.0 | 80610.0 |
| 2 | 1 | 2 | 500000.0 | 499452.0 |
| 3 | 1 | 1 | 140000.0 | 450.0 |


# Pipelines with default bucketers

Here is an example of a complete pipeline built from the default bucketers:

In [2]:
from skorecard.bucketers import DecisionTreeBucketer, OrdinalCategoricalBucketer
from skorecard.preprocessing import WoeEncoder
from skorecard.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline, Pipeline

In [3]:
pipe = Pipeline(
    [
        ('Categorical-bucketer', OrdinalCategoricalBucketer(variables=cat_feat)),
        ('Numerical-Bucketer', DecisionTreeBucketer(variables=num_feat)),
        ('woe', WoeEncoder()),
        ('lr', LogisticRegression()),
    ]
)

pipe.fit(X,y)

pipe.predict_proba(X)[:,1]

array([0.12936288, 0.21996592, 0.12936288, ..., 0.14723382, 0.30415108,
       0.20196604])
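As a rough illustration of what the `woe` step in the pipeline above does, the sketch below computes a weight of evidence per bucket in plain Python. It assumes the convention WoE = ln(share of non-events / share of events); libraries differ on the sign convention and on how empty buckets are smoothed, so skorecard's `WoeEncoder` may not match these numbers. The function name and the data are hypothetical.

```python
import math
from collections import Counter


def weight_of_evidence(buckets, y):
    """Return {bucket: WoE} using ln(share of non-events / share of events)."""
    events = Counter(b for b, target in zip(buckets, y) if target == 1)
    non_events = Counter(b for b, target in zip(buckets, y) if target == 0)
    total_events = sum(events.values())
    total_non_events = sum(non_events.values())
    woe = {}
    for b in sorted(set(buckets)):
        # Flooring the shares at a small epsilon avoids log(0) for buckets
        # that contain no events or no non-events.
        share_events = max(events[b] / total_events, 1e-4)
        share_non_events = max(non_events[b] / total_non_events, 1e-4)
        woe[b] = math.log(share_non_events / share_events)
    return woe


# Hypothetical bucket indices and default flags (1 = default).
buckets = [0, 0, 0, 0, 1, 1, 1, 1]
y = [0, 0, 0, 1, 0, 1, 1, 1]

woe = weight_of_evidence(buckets, y)
encoded = [woe[b] for b in buckets]  # what the classifier would receive
print(woe)
```

Under this convention, bucket 0 (mostly non-defaults) gets a positive value and bucket 1 (mostly defaults) gets the same value with the opposite sign.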

# Make a pipeline with manually defined buckets

Let's manually define the bucket structure in a dictionary format (it can be saved to and loaded from a JSON or YAML file).

In [4]:
bucket_maps = {
    'EDUCATION':{
        "feature_name":'EDUCATION', 
        "type":'categorical', 
        "missing_treatment":'separate', 
        "map":{2: 0, 1: 1, 3: 2}, 
        "right":True, 
        "specials":{}
    },
    'LIMIT_BAL':{
        "feature_name":'LIMIT_BAL', 
        "type":'numerical', 
        "missing_treatment":'separate', 
        "map":[ 25000.,  55000.,  105000., 225000., 275000., 325000.], 
        "right":True, 
        "specials":{}
    },
    'BILL_AMT1':{
        "feature_name":'BILL_AMT1', 
        "type":'numerical', 
        "missing_treatment":'separate', 
        "map":[ 800.,  12500.,  50000.,  77800., 195000.],
        "right":True, 
        "specials":{}
    }
}
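One detail to watch when storing such a dictionary as JSON: JSON object keys are always strings, so the integer category codes in a categorical `"map"` come back as strings. The sketch below uses only the standard `json` module; `example_maps` is a trimmed copy of the EDUCATION entry and `restore_int_keys` is a hypothetical helper that assumes integer category codes.

```python
import json

# A trimmed copy of the EDUCATION entry, for illustration only.
example_maps = {
    "EDUCATION": {
        "feature_name": "EDUCATION",
        "type": "categorical",
        "missing_treatment": "separate",
        "map": {2: 0, 1: 1, 3: 2},
        "right": True,
        "specials": {},
    }
}

serialized = json.dumps(example_maps, indent=2)
loaded = json.loads(serialized)

# The integer keys 2, 1, 3 have become strings.
print(list(loaded["EDUCATION"]["map"]))  # ['2', '1', '3']


def restore_int_keys(maps):
    """Convert categorical map keys back to int (assumes integer codes)."""
    for spec in maps.values():
        if spec["type"] == "categorical":
            spec["map"] = {int(k): v for k, v in spec["map"].items()}
    return maps


restored = restore_int_keys(loaded)
print(restored == example_maps)  # True
```

Numerical entries are unaffected, since their `"map"` is a plain list of boundaries.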


In [5]:
from skorecard.bucketers import UserInputBucketer

pipe = Pipeline(
    [
        ('User-Bucketer', UserInputBucketer(bucket_maps)),
        ('woe', WoeEncoder()),
        ('lr', LogisticRegression()),
    ]
)

pipe.fit(X,y)

pipe.predict_proba(X)[:,1]

array([0.13334997, 0.25160162, 0.13334997, ..., 0.17496323, 0.31553387,
       0.18799205])
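To build intuition for the numerical `"map"` lists used above, the following sketch turns a value into a bucket index. It assumes the boundaries are read as right-closed intervals when `"right"` is `True`, in the same way as `numpy.digitize(..., right=True)`; missing values and specials are not covered, and `bucket_index` is a hypothetical helper rather than part of skorecard.

```python
from bisect import bisect_left

# Boundaries copied from the LIMIT_BAL entry of bucket_maps.
limit_bal_boundaries = [25000.0, 55000.0, 105000.0, 225000.0, 275000.0, 325000.0]


def bucket_index(value, boundaries):
    """Index of the right-closed interval (a, b] that contains value."""
    return bisect_left(boundaries, value)


for v in (25000.0, 25000.01, 80000.0, 400000.0):
    print(v, "->", bucket_index(v, limit_bal_boundaries))
# 25000.0 -> 0
# 25000.01 -> 1
# 80000.0 -> 2
# 400000.0 -> 6
```

Six boundaries therefore yield seven buckets, and a value equal to a boundary falls into the bucket on its left.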

# Make a pipeline with the bucketing process

Last but not least, a bucketing process can also be integrated in a pipeline.

Start by defining the bucketing process object as usual.

In [6]:
from skorecard.pipeline import BucketingProcess
from skorecard.bucketers import OptimalBucketer, DecisionTreeBucketer, OrdinalCategoricalBucketer

# Special values can optionally be passed here, e.g.
# BucketingProcess(specials={'LIMIT_BAL': {'=400000.0': [400000.0]}})
bucketing_process = BucketingProcess()

# Prebucketing ("fine classing"): many small buckets per feature
bucketing_process.register_prebucketing_pipeline(
    DecisionTreeBucketer(variables=num_feat, max_n_bins=100, min_bin_size=0.05),
    OrdinalCategoricalBucketer(variables=cat_feat, tol=0),
)

# Bucketing ("coarse classing"): merge the prebuckets into a few final buckets
bucketing_process.register_bucketing_pipeline(
    OptimalBucketer(variables=num_feat, max_n_bins=4, min_bin_size=0.05),
    OptimalBucketer(
        variables=cat_feat,
        variables_type='categorical',
        max_n_bins=10,
        min_bin_size=0.05,
    ),
)



Then include it in the pipeline like any other bucketer (i.e. as the first step).

In [7]:
pipe = Pipeline(
    [
        ('bucketing_process', bucketing_process),
        ('woe', WoeEncoder()),
        ('lr', LogisticRegression()),
    ]
)

pipe.fit(X,y)

pipe.predict_proba(X)[:,1]

array([0.11117531, 0.22957669, 0.11117531, ..., 0.16448902, 0.29547037,
       0.1669574 ])
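The two-stage idea behind the bucketing process (fine classing followed by coarse classing) can be pictured as composing two lookups. The sketch below is plain Python with made-up boundaries and a made-up grouping; it does not reproduce how `OptimalBucketer` decides which prebuckets to merge, and it assumes right-closed intervals.

```python
from bisect import bisect_left

# Hypothetical fine cut points, as a prebucketing step might produce.
fine_boundaries = [10_000, 20_000, 30_000, 40_000, 50_000]

# Hypothetical grouping of the six fine buckets into three coarse ones.
fine_to_coarse = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2, 5: 2}


def coarse_bucket(value):
    """Apply the fine lookup first, then the fine-to-coarse grouping."""
    fine = bisect_left(fine_boundaries, value)
    return fine_to_coarse[fine]


# Because only adjacent fine buckets are merged, the two stages collapse
# into a single lookup on the boundaries that survive the merge.
coarse_boundaries = [20_000, 40_000]

samples = [5_000, 20_000, 20_001, 35_000, 40_000, 45_000, 90_000]
for v in samples:
    print(v, coarse_bucket(v), bisect_left(coarse_boundaries, v))
```

Both columns printed for each sample agree, which is why the result of a two-stage process can be summarised as one set of boundaries per feature.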