# Feature Store Training Demo

This short demo connects to a Feature Store, imports a Parquet dataset, trains a model, and makes a prediction for a new data point. Throughout, the Feature Store supplies the features that shape the model and makes working with it easier.

## Sanity check

We start by installing the libraries we need for basic interaction with the feature store from this minimal notebook.

In [None]:
%pip install feast grpcio

### Confirm we can interact with the feature store using the information wired up for us

If the following doesn't work, make sure you have configured your Feature Store for use with the workbench as described at the end of the deployment notebook!

In [None]:
import feast
fs_banking = feast.FeatureStore(fs_yaml_file='/opt/app-root/src/feast-config/credit_scoring_local')
fs_banking.list_feature_views()

## Training a model with offline features

Up next, we're going to use the offline feature repository and some local copies of the data to train a very simple model.
We're using a [DecisionTreeClassifier from scikit-learn](https://scikit-learn.org/stable/modules/tree.html) for this simple example.
In your own environment, this kind of training should run inside an AI Pipeline, Ray Job, or Kubeflow Training job.
Our notebook example is more interactive for educational reasons.

### Required Libraries

This is not necessary in most of OpenShift AI's workbench images, but for this toy example we'll ensure we have what we need.

In [None]:
%pip install -r requirements.txt

### Define a Class for managing our model and interactions with the feature store

This class simplifies our interactions with the feature store and the model by giving us a single abstraction that ties them together.
Building something similar in your pipelines may make sense, so that the lifecycle of features in model training and inference is described explicitly.
Here, we train and infer with this toy model on the CPU directly in the class.
More complex models might benefit from GPU usage, distributed training capabilities, and orchestration at larger scale for batch and realtime inference.
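
The class in the next cell names its Feast features as `view:feature` strings (for example `zipcode_features:city`), while its `categorical_features` list uses bare column names such as `city`. The sketch below shows how the two relate, assuming Feast's default behaviour of naming returned columns after the feature alone (that is, `full_feature_names` left at its default). The helper `group_by_view` is hypothetical and exists only for illustration; it is not part of Feast.

```python
from collections import defaultdict

# A subset of the feature references used by the class in the next cell.
feature_refs = [
    "zipcode_features:city",
    "zipcode_features:state",
    "credit_history:hard_pulls",
    "total_debt_calc:total_debt_due",
]


def group_by_view(refs):
    """Hypothetical helper: split 'view:feature' strings and group them by view."""
    grouped = defaultdict(list)
    for ref in refs:
        view, feature = ref.split(":", 1)
        grouped[view].append(feature)
    return dict(grouped)


views = group_by_view(feature_refs)

# Column names we would expect in the returned data, assuming default naming.
columns = [ref.split(":", 1)[1] for ref in feature_refs]

print(views)
print(columns)
```

Under that assumption, the encoder and the classifier only ever see the bare feature names, which is why the class can mix them freely with the request's own fields.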

In [None]:
import joblib
import pandas as pd
from pathlib import Path
from sklearn import tree
from sklearn.exceptions import NotFittedError
from sklearn.preprocessing import OrdinalEncoder
from sklearn.utils.validation import check_is_fitted


class CreditScoringModel:
    categorical_features = [
        "person_home_ownership",
        "loan_intent",
        "city",
        "state",
        "location_type",
    ]

    feast_features = [
        "zipcode_features:city",
        "zipcode_features:state",
        "zipcode_features:location_type",
        "zipcode_features:tax_returns_filed",
        "zipcode_features:population",
        "zipcode_features:total_wages",
        "credit_history:credit_card_due",
        "credit_history:mortgage_due",
        "credit_history:student_loan_due",
        "credit_history:vehicle_loan_due",
        "credit_history:hard_pulls",
        "credit_history:missed_payments_2y",
        "credit_history:missed_payments_1y",
        "credit_history:missed_payments_6m",
        "credit_history:bankruptcies",
        "total_debt_calc:total_debt_due",
    ]

    target = "loan_status"
    model_filename = "model.bin"
    encoder_filename = "encoder.bin"

    def __init__(self, feature_store: feast.FeatureStore):
        # Load model
        if Path(self.model_filename).exists():
            self.classifier = joblib.load(self.model_filename)
        else:
            self.classifier = tree.DecisionTreeClassifier()

        # Load ordinal encoder
        if Path(self.encoder_filename).exists():
            self.encoder = joblib.load(self.encoder_filename)
        else:
            self.encoder = OrdinalEncoder()

        # Set up feature store
        self.fs = feature_store

    def train(self, loans):
        train_X, train_Y = self._get_training_features(loans)

        self.classifier.fit(train_X[sorted(train_X)], train_Y)
        joblib.dump(self.classifier, self.model_filename)

    def _get_training_features(self, loans):
        training_df = self.fs.get_historical_features(
            entity_df=loans, features=self.feast_features
        ).to_df()

        self._fit_ordinal_encoder(training_df)
        self._apply_ordinal_encoding(training_df)

        train_X = training_df[
            training_df.columns.drop(self.target)
            .drop("event_timestamp")
            .drop("created_timestamp")
            .drop("loan_id")
            .drop("zipcode")
            .drop("dob_ssn")
        ]
        train_X = train_X.reindex(sorted(train_X.columns), axis=1)
        train_Y = training_df.loc[:, self.target]

        return train_X, train_Y

    def _fit_ordinal_encoder(self, requests):
        self.encoder.fit(requests[self.categorical_features])
        joblib.dump(self.encoder, self.encoder_filename)

    def _apply_ordinal_encoding(self, requests):
        requests[self.categorical_features] = self.encoder.transform(
            requests[self.categorical_features]
        )

    def predict(self, request):
        # Get online features from Feast
        feature_vector = self._get_online_features_from_feast(request)

        # Join features to request features
        features = request.copy()
        features.update(feature_vector)
        features_df = pd.DataFrame.from_dict(features)

        # Apply ordinal encoding to categorical features
        self._apply_ordinal_encoding(features_df)

        # Sort columns
        features_df = features_df.reindex(sorted(features_df.columns), axis=1)

        # Drop unnecessary columns
        features_df = features_df[features_df.columns.drop("zipcode").drop("dob_ssn")]

        # Make prediction
        features_df["prediction"] = self.classifier.predict(features_df)

        # return result of credit scoring
        return features_df["prediction"].iloc[0]

    def _get_online_features_from_feast(self, request):
        zipcode = request["zipcode"][0]
        dob_ssn = request["dob_ssn"][0]
        loan_amnt = request["loan_amnt"][0]

        return self.fs.get_online_features(
            entity_rows=[{"zipcode": zipcode, "dob_ssn": dob_ssn, "loan_amnt": loan_amnt}],
            features=self.feast_features,
        ).to_dict()

    def is_model_trained(self):
        try:
            check_is_fitted(self.classifier, "tree_")
        except NotFittedError:
            return False
        return True

### Instantiate the class, linking it to the Feature Store managed by the platform

`fs_banking` is the `FeatureStore` instance we defined near the top, with our connection to the operator-managed Feast deployment.
Initializing our class with it means training and inference will use that deployment's offline and online stores.

In [None]:
model = CreditScoringModel(feature_store=fs_banking)

### Train the model using the dataset and feature store together

If we haven't trained and saved the model locally yet, we want to use the offline store and some local data to train the simple decision tree classifier.
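
The "train only if nothing is saved yet" pattern can be sketched in isolation. This is a minimal illustration, not the class's actual code: it uses the standard-library `pickle` in place of `joblib` so it runs without extra dependencies, a plain dict stands in for the classifier, and `load_or_create` is a hypothetical helper.

```python
import pickle
import tempfile
from pathlib import Path


def load_or_create(path):
    """Hypothetical helper: reuse a saved artifact if present, else start untrained."""
    if path.exists():
        with path.open("rb") as f:
            return pickle.load(f)
    return {"fitted": False}  # stand-in for an unfitted classifier


with tempfile.TemporaryDirectory() as tmp:
    artifact = Path(tmp) / "model.bin"

    # First session: nothing on disk yet, so we "train" and save.
    first = load_or_create(artifact)
    trained_on_first_run = not first["fitted"]
    if trained_on_first_run:
        first["fitted"] = True
        with artifact.open("wb") as f:
            pickle.dump(first, f)

    # Second session: the artifact exists, so training is skipped.
    second = load_or_create(artifact)
    trained_on_second_run = not second["fitted"]

print(trained_on_first_run, trained_on_second_run)
```

In the notebook itself, the same decision is made by `is_model_trained()`, which checks whether the loaded classifier has already been fitted.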

In [None]:
if not model.is_model_trained():
    loans = pd.read_parquet("feature_repo/data/loan_table.parquet")
    model.train(loans)

### Test the model

The incoming request below contains identifying form fields. Using the online feature store, `predict` looks up the stored features for those identifiers, filters the combined data down to just the features that matter for our model, and submits the result directly to the model for inference.

Running this example should show **Loan rejected!** at the end.
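
To make the filtering step concrete, here is a plain-Python sketch of how `predict` combines the request with the values returned by the online store, sorts the columns, and drops the entity keys. The `online_features` values are made up for illustration and only a few fields are shown; in the notebook they come from `get_online_features(...).to_dict()`. Ordinal encoding of the categorical columns is omitted here.

```python
request = {
    "zipcode": [76104],
    "dob_ssn": ["19630621_4278"],
    "person_income": [59000],
    "loan_amnt": [35000],
}

# Made-up stand-in for the dictionary returned by the online store.
online_features = {
    "zipcode": [76104],
    "dob_ssn": ["19630621_4278"],
    "city": ["EXAMPLE CITY"],
    "hard_pulls": [1],
}

# Join the looked-up features onto a copy of the request.
features = request.copy()
features.update(online_features)

# Sort the columns, then drop the entity keys: they identify the applicant
# but are not inputs to the model.
entity_keys = {"zipcode", "dob_ssn"}
model_columns = [name for name in sorted(features) if name not in entity_keys]
model_row = [features[name][0] for name in model_columns]

print(model_columns)
print(model_row)
```

Sorting matters because the classifier is also trained on alphabetically sorted columns, so the column order at inference has to match.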

In [None]:
loan_request = {
    "zipcode": [76104],
    "dob_ssn": ["19630621_4278"],
    "person_age": [133],
    "person_income": [59000],
    "person_home_ownership": ["RENT"],
    "person_emp_length": [123.0],
    "loan_intent": ["PERSONAL"],
    "loan_amnt": [35000],
    "loan_int_rate": [16.02],
}

result = model.predict(loan_request)

if result == 0:
    print("Loan approved!")
elif result == 1:
    print("Loan rejected!")

## Wrap up

If you've taken the time to look over the code here and understand how the Feature Store is helping organize your data for this model - and others! - it's time to clean up the demo.
Proceed to `20-cleanup.ipynb` when you're ready.