# Manually defined buckets

Skorecard lets you define buckets manually.

These definitions can usually be loaded from a JSON or YAML file.

Start by loading the demo data:

In [1]:
from skorecard.datasets import load_uci_credit_card

X, y = load_uci_credit_card(return_X_y=True)

X.head(4)

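|   | EDUCATION | MARRIAGE | LIMIT_BAL | BILL_AMT1 |
|---|-----------|----------|-----------|-----------|
| 0 | 1 | 2 | 400000.0 | 201800.0 |
| 1 | 2 | 2 | 80000.0 | 80610.0 |
| 2 | 1 | 2 | 500000.0 | 499452.0 |
| 3 | 1 | 1 | 140000.0 | 450.0 |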


## Define the buckets

Define the buckets in a Python dictionary, with one entry per feature.

Each feature entry supports the following keys. Keys marked mandatory must be present; the others can be omitted.

- `feature_name` (mandatory): must match the column name in the dataframe
- `type` (mandatory): type of feature (categorical or numerical)
- `missing_treatment` (optional, defaults to `separate`): define the missing treatment strategy
- `map` (mandatory): contains the actual mapping for the bins.
    - categorical features: expect a dictionary `{value:bin_index}`
    - numerical features: expect a list of bin boundaries in ascending order, e.g. `[25000., 55000., 105000.]`
- `right` (optional, defaults to True): flag that indicates whether each bucket includes its upper bound (True) or its lower bound (False). Applies only to numerical features
- `specials` (optional, defaults to {}): dictionary of special values
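
To make the meaning of a boundary list and the `right` flag concrete, here is a minimal sketch in plain Python. It is not skorecard's implementation: `bucket_index` is a hypothetical helper, and the sketch assumes that the bucket index equals the number of boundaries lying below the value, with `right=True` meaning that a value equal to a boundary stays in the lower bucket. Missing values and `specials` are not handled.

```python
from bisect import bisect_left, bisect_right


def bucket_index(value, boundaries, right=True):
    """Return the index of the bucket that `value` falls into.

    Hypothetical helper for illustration only.
    right=True  -> buckets are (lower, upper]: a value equal to a boundary
                   belongs to the bucket that ends at that boundary.
    right=False -> buckets are [lower, upper): a value equal to a boundary
                   belongs to the bucket that starts at that boundary.
    """
    if right:
        return bisect_left(boundaries, value)
    return bisect_right(boundaries, value)


# Example boundaries for LIMIT_BAL; six boundaries produce seven buckets (0 to 6).
limit_bal_boundaries = [25000.0, 55000.0, 105000.0, 225000.0, 275000.0, 325000.0]

# The four LIMIT_BAL values from the demo data shown earlier.
for value in [400000.0, 80000.0, 500000.0, 140000.0]:
    print(value, bucket_index(value, limit_bal_boundaries))

# A value that sits exactly on a boundary shows the effect of `right`.
print(bucket_index(25000.0, limit_bal_boundaries, right=True))
print(bucket_index(25000.0, limit_bal_boundaries, right=False))
```

Under these assumptions, the four demo values land in buckets 6, 2, 6 and 3, and the boundary value 25000 moves from bucket 0 to bucket 1 when `right` is switched to False.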



In [2]:
bucket_maps = {
    'EDUCATION':{
        "feature_name":'EDUCATION', 
        "type":'categorical', 
        "missing_treatment":'separate', 
        "map":{2: 0, 1: 1, 3: 2}, 
        "right":True, 
        "specials":{} # optional field
    },
    'LIMIT_BAL':{
        "feature_name":'LIMIT_BAL', 
        "type":'numerical', 
        "missing_treatment":'separate', 
        "map":[ 25000.,  55000.,  105000., 225000., 275000., 325000.], 
        "right":True, 
        "specials":{}
    },
    'BILL_AMT1':{
        "feature_name":'BILL_AMT1', 
        "type":'numerical', 
        "missing_treatment":'separate', 
        "map":[800., 12500., 50000., 77800., 195000.],
        "right":True, 
        "specials":{}
    }
}
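
Since the bucket definition is a plain dictionary, it can be stored in a JSON file and loaded back. The sketch below uses only the standard library `json` module and a shortened copy of the definition. One detail to watch: JSON object keys are always strings, so the integer keys of a categorical `map` come back as `"2"`, `"1"`, `"3"`. `restore_int_keys` is a hypothetical helper, not part of skorecard, and it assumes that all categorical values are integers.

```python
import json

# Shortened copy of the bucket definition, kept small for illustration.
example_maps = {
    "EDUCATION": {
        "feature_name": "EDUCATION",
        "type": "categorical",
        "missing_treatment": "separate",
        "map": {2: 0, 1: 1, 3: 2},
    },
    "LIMIT_BAL": {
        "feature_name": "LIMIT_BAL",
        "type": "numerical",
        "missing_treatment": "separate",
        "map": [25000.0, 55000.0, 105000.0, 225000.0, 275000.0, 325000.0],
        "right": True,
    },
}


def restore_int_keys(bucket_maps):
    """Convert categorical map keys back to int after a JSON round trip.

    Hypothetical helper; assumes every categorical value is an integer.
    """
    restored = {}
    for feature, spec in bucket_maps.items():
        spec = dict(spec)  # shallow copy so the input is left untouched
        if spec["type"] == "categorical":
            spec["map"] = {int(key): index for key, index in spec["map"].items()}
        restored[feature] = spec
    return restored


# json.dumps / json.loads stand in for writing to and reading from a file
# with json.dump / json.load.
serialized = json.dumps(example_maps, indent=2)
loaded = json.loads(serialized)

print(list(loaded["EDUCATION"]["map"]))  # keys are strings after loading
restored = restore_int_keys(loaded)
print(restored == example_maps)
```

If the categories are strings rather than integers, no key conversion is needed and the loaded dictionary can be used as it is.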



Import the `UserInputBucketer` and pass the dictionary to its constructor.

In [3]:
from skorecard.bucketers import UserInputBucketer

uib = UserInputBucketer(bucket_maps)

Because the bins are already defined, `UserInputBucketer` does not require a fit step, so `transform` can be called directly.

In [4]:
uib.transform(X).head(4)

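|   | EDUCATION | MARRIAGE | LIMIT_BAL | BILL_AMT1 |
|---|-----------|----------|-----------|-----------|
| 0 | 1 | 2 | 6 | 5 |
| 1 | 0 | 2 | 2 | 4 |
| 2 | 1 | 2 | 6 | 5 |
| 3 | 1 | 1 | 3 | 0 |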
