<center><img src=https://raw.githubusercontent.com/feast-dev/feast/master/docs/assets/feast_logo.png width=400/></center>

# Credit Risk Model Training

### Introduction

AI models play a central role in modern credit risk assessment systems. In this example, we develop a credit risk model to predict whether a future loan will be good or bad, given some context data (presumably supplied from the loan application process). We use the modeling process to demonstrate how Feast can serve data for both training and inference use cases.

In this notebook, we train our AI model. We will use the popular scikit-learn library (sklearn) to train a RandomForestClassifier, as this is a relatively easy choice for a baseline model.

### Setup

*The following code assumes that you have read the example README.md file and that you have set up an environment where the code can be run. Please make sure you have met the prerequisites.*

In [1]:
# Imports
import warnings
import datetime
import feast
import joblib
import pandas as pd

from feast import FeatureStore, RepoConfig
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

In [2]:
# Ignore warnings
warnings.filterwarnings(action="ignore")

In [3]:
# Random seed
SEED = 142

This notebook assumes that you have already done the following:

1. Run the [01_Credit_Risk_Data_Prep.ipynb](01_Credit_Risk_Data_Prep.ipynb) notebook to prepare the data.
2. Run the [02_Deploying_the_Feature_Store.ipynb](02_Deploying_the_Feature_Store.ipynb) notebook to configure the feature stores and launch the feature store servers.

If you have not completed the above steps, please go back and do so before continuing. This notebook relies on the data prepared in step 1, and it uses the Feast offline server started in step 2.

### Get Label (Outcome) Data

From our previous data exploration, remember that the label data represents whether the loan was classed as "good" (1) or "bad" (0). Let's pull the labels for training, as we will use them as our "entity dataframe" when pulling features.

This is also a good time to remember that the label timestamps are lagged by 30-90 days from the context data records.

In [4]:
labels = pd.read_parquet("Feature_Store/data/train_y.parquet")

In [5]:
labels.head()

| | ID | class | outcome_timestamp |
|---|---|---|---|
| 0 | 18 | 0.0 | 2023-11-25 06:50:13 |
| 1 | 764 | 1.0 | 2023-11-03 09:10:13 |
| 2 | 504 | 0.0 | 2023-11-30 10:06:03 |
| 3 | 454 | 0.0 | 2023-11-17 19:37:19 |
| 4 | 453 | 1.0 | 2023-12-01 00:01:48 |


### Pull Feature Data from Feast Offline Store

In order to pull feature data from the offline store, we create a FeatureStore object that connects to the offline server (continuously running in the previous notebook).

In [6]:
# Create FeatureStore object
# (connects to the offline server deployed in 02_Deploying_the_Feature_Store.ipynb) 
store = FeatureStore(config=RepoConfig(
    project="loan_applications",
    provider="local",
    registry="Feature_Store/data/registry.db",
    offline_store={
        "type": "remote",
        "host": "localhost",
        "port": 8815
    },
    entity_key_serialization_version=2
))

Now, we can retrieve feature data by supplying our entity dataframe and feature specifications to the `get_historical_features` function. Note that this function performs a fuzzy lookback ("point-in-time") join, matching the lagged outcome timestamp to the closest application timestamp (per ID) in the context data; it also joins the "a" and "b" features that we had previously split into two tables.
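The idea behind a point-in-time join can be illustrated without Feast. The following is a minimal plain-Python sketch, not Feast's implementation: the function name `point_in_time_join`, the `application_timestamp` field, and all sample values are hypothetical and chosen only for illustration. For each entity row, it picks the latest feature row with the same ID whose timestamp is not later than the entity's timestamp.

```python
from datetime import datetime

# Hypothetical context (feature) rows: one ID can have several timestamped records.
feature_rows = [
    {"ID": 18, "application_timestamp": datetime(2023, 10, 1), "duration": 24.0},
    {"ID": 18, "application_timestamp": datetime(2023, 12, 15), "duration": 36.0},
    {"ID": 764, "application_timestamp": datetime(2023, 12, 1), "duration": 12.0},
]

# Hypothetical entity rows: the labels with their (lagged) outcome timestamps.
entity_rows = [
    {"ID": 18, "outcome_timestamp": datetime(2023, 11, 25)},
    {"ID": 764, "outcome_timestamp": datetime(2023, 11, 3)},
]


def point_in_time_join(entities, features):
    """Attach, per entity row, the latest feature row for the same ID that is
    not newer than the entity's timestamp (None if there is no such row)."""
    joined = []
    for entity in entities:
        candidates = [
            f for f in features
            if f["ID"] == entity["ID"]
            and f["application_timestamp"] <= entity["outcome_timestamp"]
        ]
        latest = max(candidates, key=lambda f: f["application_timestamp"], default=None)
        joined.append({**entity, "duration": latest["duration"] if latest else None})
    return joined


joined = point_in_time_join(entity_rows, feature_rows)
for row in joined:
    print(row["ID"], row["duration"])
```

In this toy data, ID 18 receives the October record rather than the December one, because the December record is newer than the outcome timestamp. ID 764 receives no value, because its only record is newer than its outcome. Feast's own join also takes each feature view's `ttl` into account when deciding how far back to look, which this sketch ignores.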

In [7]:
# Get feature data
# (Joins a and b data, and selects the right timestamp--in this case there is only one each)
features = store.get_historical_features(
    entity_df=labels,
    features=[
        "data_a:duration",
        "data_a:credit_amount",
        "data_a:installment_commitment",
        "data_a:checking_status_ord",
        "data_b:residence_since",
        "data_b:age",
        "data_b:existing_credits",
        "data_b:num_dependents",
        "data_b:housing_ord"
    ]
).to_df()



Using outcome_timestamp as the event timestamp. To specify a column explicitly, please name it event_timestamp.


In [8]:
# Check the data info (800 training records)
features.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 800 entries, 0 to 799
Data columns (total 12 columns):
 #   Column                  Non-Null Count  Dtype              
---  ------                  --------------  -----              
 0   ID                      800 non-null    int64              
 1   class                   800 non-null    float64            
 2   outcome_timestamp       800 non-null    datetime64[ns, UTC]
 3   duration                800 non-null    float64            
 4   credit_amount           800 non-null    float64            
 5   installment_commitment  800 non-null    float64            
 6   checking_status_ord     800 non-null    float64            
 7   residence_since         800 non-null    float64            
 8   age                     800 non-null    float64            
 9   existing_credits        800 non-null    float64            
 10  num_dependents          800 non-null    float64            
 11  housing_ord             800 non-null    float64            

Let's list some records. The retrieval does reorder the data, so we use the labels ID column here to filter the same IDs from above.

In [9]:
# Check some data records
features.loc[features.ID.isin(labels.ID.head())]

| | ID | class | outcome_timestamp | duration | credit_amount | installment_commitment | checking_status_ord | residence_since | age | existing_credits | num_dependents | housing_ord |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 230 | 764 | 1.0 | 2023-11-03 09:10:13+00:00 | 24.0 | 2463.0 | 4.0 | 3.0 | 3.0 | 27.0 | 2.0 | 1.0 | 1.0 |
| 500 | 504 | 0.0 | 2023-11-30 10:06:03+00:00 | 24.0 | 1207.0 | 4.0 | 1.0 | 4.0 | 24.0 | 1.0 | 1.0 | 2.0 |
| 538 | 18 | 0.0 | 2023-11-25 06:50:13+00:00 | 24.0 | 12579.0 | 4.0 | 0.0 | 2.0 | 44.0 | 1.0 | 1.0 | 0.0 |
| 584 | 453 | 1.0 | 2023-12-01 00:01:48+00:00 | 24.0 | 2670.0 | 4.0 | 3.0 | 4.0 | 35.0 | 1.0 | 1.0 | 1.0 |
| 614 | 454 | 0.0 | 2023-11-17 19:37:19+00:00 | 24.0 | 4817.0 | 2.0 | 1.0 | 3.0 | 31.0 | 1.0 | 1.0 | 1.0 |


Let's also retrieve the test data (used to validate the model).

In [10]:
# Pull the test data
test_labels = pd.read_parquet("Feature_Store/data/test_y.parquet")
test_features = store.get_historical_features(
    entity_df=test_labels,
    features=[
        "data_test:duration",
        "data_test:credit_amount",
        "data_test:installment_commitment",
        "data_test:checking_status_ord",
        "data_test:residence_since",
        "data_test:age",
        "data_test:existing_credits",
        "data_test:num_dependents",
        "data_test:housing_ord"
    ]
).to_df()



Using outcome_timestamp as the event timestamp. To specify a column explicitly, please name it event_timestamp.


### Train the Model

Now that we have pulled the feature data, we are ready to train the AI model. We train a random forest classifier with a few hyperparameters set. In particular, we have used the suggested class weights from the data set description that make a "bad" loan more important to identify.
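To get a feel for what a 5:1 weighting does, the arithmetic below applies it to the training class counts shown in this notebook's training classification report (239 bad, 561 good). This is a plain-Python illustration of the effective class balance, not scikit-learn's internal code. It assumes each sample's weight simply equals its class weight.

```python
# Training class counts taken from the training classification report.
class_counts = {0: 239, 1: 561}  # 0 = bad loan, 1 = good loan
class_weight = {0: 5, 1: 1}      # same weights passed to RandomForestClassifier

# Weighted "mass" of each class: its count multiplied by its class weight.
weighted = {label: class_counts[label] * class_weight[label] for label in class_counts}

raw_bad_share = class_counts[0] / sum(class_counts.values())
weighted_bad_share = weighted[0] / sum(weighted.values())

print(f"raw share of bad loans:      {raw_bad_share:.2f}")       # 0.30
print(f"weighted share of bad loans: {weighted_bad_share:.2f}")  # 0.68
```

Under this assumption, bad loans go from roughly 30% of the samples to roughly 68% of the weighted total. That is consistent with a fitted model that leans toward predicting the bad class.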

In [11]:
# Specify the model
model = RandomForestClassifier(
    n_estimators=400,
    criterion="entropy",
    max_depth=4,
    min_samples_leaf=10,
    class_weight={0:5, 1:1},
    random_state=SEED
)

In [12]:
# Remove non-feature columns from context data
feature_cols = [ 
    "duration", "credit_amount", "installment_commitment", "checking_status_ord",
    "residence_since", "age", "existing_credits", "num_dependents", "housing_ord"
]
X = features.loc[:, feature_cols]
y = features["class"]

# Fit the model
model.fit(X, y)

### Evaluate the Model

Let's evaluate our base model performance. With credit risk, recall is going to be an important measure to look at. Let's view the performance on the training data, as well as on the test data.

In [13]:
# Evaluate training set performance
train_preds = model.predict(X)
print(classification_report(y, train_preds))

              precision    recall  f1-score   support

         0.0       0.44      0.91      0.59       239
         1.0       0.93      0.51      0.66       561

    accuracy                           0.63       800
   macro avg       0.69      0.71      0.62       800
weighted avg       0.78      0.63      0.64       800



In [14]:
# Evaluate test data performance
# (y_test comes from test_labels, so this assumes test_features kept the same row order)
X_test = test_features.loc[:, feature_cols]
y_test = test_labels["class"]

print(classification_report(y_test, model.predict(X_test)))

              precision    recall  f1-score   support

         0.0       0.29      0.61      0.39        61
         1.0       0.67      0.35      0.45       139

    accuracy                           0.42       200
   macro avg       0.48      0.48      0.42       200
weighted avg       0.55      0.42      0.44       200
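
Because the retrieval can reorder rows (as noted earlier for the training data), it may be worth confirming that `X_test` and `y_test` refer to the same loans row by row. The toy example below uses made-up IDs and labels, in plain Python, to show how pairing by position can differ from pairing by ID. The variable names are hypothetical.

```python
# Hypothetical labels in their original order: (ID, class).
toy_labels = [(18, 0.0), (764, 1.0), (504, 0.0)]
# The same records after a retrieval that returned them in a different order.
toy_retrieved = [(764, 1.0), (504, 0.0), (18, 0.0)]

# Pairing by position compares labels that belong to different IDs.
positional_matches = sum(
    label_cls == got_cls
    for (_, label_cls), (_, got_cls) in zip(toy_labels, toy_retrieved)
)

# Pairing by ID looks each label up by its key, so row order no longer matters.
label_by_id = dict(toy_labels)
keyed_matches = sum(label_by_id[row_id] == cls for row_id, cls in toy_retrieved)

print(positional_matches, keyed_matches)  # 1 3
```

In this notebook, one way to keep things aligned would be to take the labels from the retrieved frame itself, for example `y_test = test_features["class"]`, mirroring how `y` is built for training. This assumes the `class` column is carried through for the test entity dataframe as it is for the training one. Doing so may change the reported test metrics.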



The recall on the test set for bad loans (0 class) is 0.61, meaning that the model correctly identified roughly 61% of the bad loans. However, the precision of 0.29 tells us that the model is also classifying many loans that were actually good as bad. Precision and recall are technical metrics. To truly assess the model's value, we would need feedback from the business on the impact of misclassifications (for both good and bad loans).
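As a worked example, the bad-loan metrics can be reproduced from confusion-matrix counts. The counts below are back-calculated from the rounded test report, so they are an approximation and an assumption; the notebook does not print the confusion matrix. The per-error costs are purely hypothetical placeholders for the kind of business input described above.

```python
# Approximate test-set counts, inferred from the rounded classification report.
# They are an assumption, not notebook output.
bad_pred_bad = 37    # bad loans correctly flagged as bad
bad_pred_good = 24   # bad loans the model missed
good_pred_bad = 91   # good loans wrongly flagged as bad
good_pred_good = 48  # good loans correctly predicted good

# Recall: share of actual bad loans that were flagged.
recall_bad = bad_pred_bad / (bad_pred_bad + bad_pred_good)
# Precision: share of flagged loans that were actually bad.
precision_bad = bad_pred_bad / (bad_pred_bad + good_pred_bad)

print(f"recall (bad):    {recall_bad:.2f}")     # 0.61
print(f"precision (bad): {precision_bad:.2f}")  # 0.29

# Hypothetical business costs per error (illustrative only).
cost_missed_bad = 5     # cost of approving a loan that turns out bad
cost_rejected_good = 1  # cost of rejecting a loan that would have been good
total_cost = bad_pred_good * cost_missed_bad + good_pred_bad * cost_rejected_good
print(f"total cost: {total_cost}")  # 211
```

With real cost figures from the business, a total like this would let two candidate models be compared on impact rather than on precision and recall alone.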

The difference in performance between the training and test data tells us that the model is overfitting and may have trouble generalizing. Remember that this is just a quick baseline model. To improve further, we could do things like:
- gather more data
- engineer features
- experiment with hyperparameter settings
- experiment with other model types

In fact, this is just a start. Creating AI models that meet business needs often requires a lot of guided experimentation.

### Save the Model

The last thing we do is save our trained model, so that we can pick it up later in the serving environment.

In [15]:
# Save the model to a pickle file
joblib.dump(model, "rf_model.pkl")

['rf_model.pkl']

In the next notebook, [04_Credit_Risk_Model_Serving.ipynb](04_Credit_Risk_Model_Serving.ipynb), we will load the trained model and request predictions, with input features provided by the Feast online feature server.