# <span style="font-width:bold; font-size: 3rem; color:#1EB182;"><img src="../images/icon102.png" width="38px"></img> **Hopsworks Feature Store** </span><span style="font-width:bold; font-size: 3rem; color:#333;">- Part 03: Model training & UI Exploration</span>

<span style="font-width:bold; font-size: 1.4rem;">In this last notebook, we will train a model on the dataset we created in the previous tutorial. We will train our model using standard Python and Scikit-learn, although it could just as well be trained with other machine learning frameworks such as PySpark, TensorFlow, and PyTorch. We will also show some of the exploration that can be done in Hopsworks, notably the search functions and the lineage. </span>

## **🗒️ This notebook is divided into 5 main sections:** 
1. **Loading the training data**
2. **Train the model**
3. **Register model to Hopsworks model registry**
4. **Deploy model on Hopsworks model deployment platform**
5. **Test model deployment and use model serving REST APIs**

![tutorial-flow](../images/03_model.png)

In [None]:
import hopsworks

project = hopsworks.login()

fs = project.get_feature_store()

In [None]:
feature_view = fs.get_feature_view("fraud_model_view", 1)

## <span style="color:#ff5f27;"> ✨ Load Training Data </span>

First, we'll need to fetch the training datasets that we created in the previous notebook. We will use the January - February data for training and the March data for testing.

In [None]:
import pandas as pd
from sklearn.linear_model import LogisticRegression

train_jan_feb_x, train_jan_feb_y = feature_view.get_training_data(1)
train_mar_x, train_mar_y = feature_view.get_training_data(2)

## <span style="color:#ff5f27;"> 🏃 Train Model</span>

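Next we'll train a model. Here, we fit an Isolation Forest, an unsupervised anomaly detection model, on the January - February features. The labels are not used for fitting, only for evaluating the predictions afterwards.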

In [None]:
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(42)

# fit the model
clf = IsolationForest(max_samples=100, random_state=rng)
clf.fit(train_jan_feb_x)

In [None]:
# Train Predictions
y_pred_train = clf.predict(train_jan_feb_x)

In [None]:
# Test Predictions
y_pred_test = clf.predict(train_mar_x)
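
`IsolationForest.predict` returns `1` for inliers and `-1` for outliers, which is a different encoding from the fraud labels. The sketch below shows one way to align the two before computing metrics. It is plain Python on made-up values, the helper names are hypothetical, and it assumes the labels use `0` for normal and `1` for fraud.

```python
# Hypothetical helpers; assumes labels are 0 (normal) and 1 (fraud).

def to_fraud_labels(if_predictions):
    """Map IsolationForest output (1 = inlier, -1 = outlier) to 0/1 labels."""
    return [1 if p == -1 else 0 for p in if_predictions]


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives, with fraud (1) as positive."""
    counts = {"tn": 0, "fp": 0, "fn": 0, "tp": 0}
    for t, p in zip(y_true, y_pred):
        if t == 1:
            counts["tp" if p == 1 else "fn"] += 1
        else:
            counts["fp" if p == 1 else "tn"] += 1
    return counts


def f1_from_counts(counts):
    """F1 score of the positive (fraud) class."""
    denom = 2 * counts["tp"] + counts["fp"] + counts["fn"]
    return 2 * counts["tp"] / denom if denom else 0.0


# Made-up values for illustration only.
example_true = [0, 0, 1, 1, 0]
example_raw = [1, -1, -1, 1, 1]          # raw IsolationForest-style output
example_pred = to_fraud_labels(example_raw)
example_counts = confusion_counts(example_true, example_pred)
example_f1 = f1_from_counts(example_counts)
```

With the predictions mapped this way, a confusion matrix has only the two expected classes, and the score describes how well outliers line up with fraud.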

In [None]:
from sklearn.metrics import confusion_matrix, f1_score
from matplotlib import pyplot
import seaborn as sn

%matplotlib inline

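# IsolationForest predicts -1 (outlier) or 1 (inlier). Assuming the true labels are 0/1,
# the confusion matrix spans the sorted labels [-1, 0, 1].
if_cm = confusion_matrix(train_mar_y, y_pred_test)
# Rows are true labels (-1 never occurs), columns are predictions (0 never occurs).
# The unused row and column are named 'step' and dropped. Outliers (-1) count as fraud.
df_cm = pd.DataFrame(if_cm, ['step', 'True Normal', 'True Fraud'], ['Pred Fraud', 'step', 'Pred Normal'])
df_cm.drop(index="step", inplace=True)
df_cm.drop("step", axis=1, inplace=True)

pyplot.figure(figsize=(8, 4))
sn.set(font_scale=1.4)  # label size
sn.heatmap(df_cm, annot=True, annot_kws={"size": 16}, fmt='g')  # annotation font size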

In [None]:
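# Same plot for the training data. Predictions are -1 (outlier) or 1 (inlier);
# assuming the true labels are 0/1, the matrix spans the sorted labels [-1, 0, 1].
if_cm = confusion_matrix(train_jan_feb_y, y_pred_train)
# Drop the unused 'step' row and column. Outliers (-1) count as fraud.
df_cm = pd.DataFrame(if_cm, ['step', 'True Normal', 'True Fraud'], ['Pred Fraud', 'step', 'Pred Normal'])
df_cm.drop(index="step", inplace=True)
df_cm.drop("step", axis=1, inplace=True)

pyplot.figure(figsize=(8, 4))
sn.set(font_scale=1.4)  # label size
sn.heatmap(df_cm, annot=True, annot_kws={"size": 16}, fmt='g')  # annotation font size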


In [None]:
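from sklearn.metrics import f1_score
# Compute the F1 score. IsolationForest returns -1 for outliers and 1 for inliers,
# so map the predictions to the label encoding first (assumes labels are 0/1 with 1 = fraud).
y_pred_test_labels = np.where(y_pred_test == -1, 1, 0)
metrics = {"fscore": f1_score(train_mar_y, y_pred_test_labels, average='micro')}
metrics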


## <span style="color:#ff5f27;"> Register model</span>

One of the features in Hopsworks is the model registry. This is where we can store different versions of models and compare their performance. Models from the registry can then be served as API endpoints.

Let's connect to the model registry using the HSML library from Hopsworks.

In [None]:
mr = project.get_model_registry()

In [None]:
import joblib

joblib.dump(clf, 'model.pkl')

The model needs to be set up with a Model Schema, which describes the inputs and outputs for a model.

A Model Schema can be automatically generated from training examples, as shown below.

In [None]:
from hsml.schema import Schema
from hsml.model_schema import ModelSchema

input_schema = Schema(train_jan_feb_x)
output_schema = Schema(train_jan_feb_y)
model_schema = ModelSchema(input_schema=input_schema, output_schema=output_schema)

model_schema.to_dict()

In [None]:
model = mr.sklearn.create_model(
    name="fraud_tutorial_model",
    metrics=metrics,
    description="Isolation forest anomaly detection model",
    input_example=train_jan_feb_x.sample(),
    model_schema=model_schema
)

model.save('model.pkl')

## <span style="color:#ff5f27;"> Deploy model</span>
### About Model Serving
Models can be served via KServe (formerly KFServing) or "default" serving, which means a Docker container exposing a Flask server. For KServe models, or models written in TensorFlow, you do not need to write a prediction file (see the section below). However, for sklearn models using default serving, you do need to write a prediction file.

In order to use KServe, you must have Kubernetes installed and enabled on your cluster.

### Create the Prediction File
In order to deploy a model, you need to write a Python file containing the logic to return a prediction from the model. Don't worry, this is usually a matter of just modifying some paths in a template script. An example can be seen in the code block below, which is adapted from a Scikit-learn template script.

In [None]:
%%writefile predict_example.py
import os
import joblib

class Predict(object):

    def __init__(self):
        """ Initializes the serving state, reads a trained model from the artifact directory"""
        # load the trained model
        self.model = joblib.load(os.environ["ARTIFACT_FILES_PATH"] + "/model.pkl")
        print("Initialization Complete")


    def predict(self, inputs):
        """ Serves a prediction request using a trained model"""
        return self.model.predict(inputs).tolist() # Numpy Arrays are not JSON serializable


The script loads model.pkl from the directory that the ARTIFACT_FILES_PATH environment variable points to. To see where the registered model lives, it is useful to know that the Data Sets tab in the Hopsworks UI lets you browse the files in the project. Registered models are found under the Models directory. Since we saved our model with the name fraud_tutorial_model, it is stored under Models/fraud_tutorial_model/1, where 1 is the version of the model we want to deploy.

This script needs to be put into a known location in the Hopsworks file system. Let's call the file predict_example.py and put it in the Models directory.

In [None]:
import os
dataset_api = project.get_dataset_api()

uploaded_file_path = dataset_api.upload("predict_example.py", "Models", overwrite=True)
predictor_script_path = os.path.join("/Projects", project.name, uploaded_file_path)
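
One detail worth knowing about the path construction above: a path join discards everything before a component that starts with `/`. The sketch below illustrates this with `posixpath` and guards against it. The helper name and the project name are hypothetical, and whether `dataset_api.upload` returns a relative or an absolute path is an assumption to check against your Hopsworks version.

```python
import posixpath


def build_script_path(project_name, uploaded_file_path):
    """Build /Projects/<project>/<path> whether or not the path has a leading slash."""
    # Without lstrip, an absolute uploaded_file_path would replace the whole prefix.
    return posixpath.join("/Projects", project_name, uploaded_file_path.lstrip("/"))


# Hypothetical project name, for illustration only.
relative_case = build_script_path("my_project", "Models/predict_example.py")
absolute_case = build_script_path("my_project", "/Models/predict_example.py")

# What a plain join does with an absolute last component.
naive_join = posixpath.join("/Projects", "my_project", "/Models/predict_example.py")
```

Here `relative_case` and `absolute_case` both resolve to the same location under the project, while `naive_join` loses the `/Projects/my_project` prefix.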

## Create the deployment
Here, we fetch the model we want from the model registry and define a configuration for the deployment. For the configuration, we need to specify the serving tool (default or KServe) and the model server. Since we serve an sklearn model with the Python model server, we also need to give the location of the prediction script.

In [None]:
# Use the name of the model registered above.
model = mr.get_model("fraud_tutorial_model", version=1)

# Give it any name you want
deployment = model.deploy(
    name="frauddeployment3", 
    model_server="PYTHON",
    serving_tool="KSERVE",
    script_file=predictor_script_path
)

In [None]:
print("Deployment: " + deployment.name)
deployment.describe()


#### The deployment has now been registered. However, to start it you need to run:

In [None]:
deployment.start()

In [None]:
deployment.get_logs()

## Using the deployment
Let's use the input example that we registered together with the model to query the deployment.

In [None]:
test_inputs = [model.input_example]

data = {
    "inputs": test_inputs
}

deployment.predict(data)

In [None]:
deployment.get_logs()

### Use REST endpoint

You can also use a REST endpoint for your model. To do this you need to create an API key with 'serving' enabled, and retrieve the endpoint URL from the Model Serving UI.

Go to the Model Serving UI and click on the eye icon next to a model to retrieve the endpoint URL. The shorter URL is an internal endpoint that you can only reach from within Hopsworks. If you want to call it from outside, you need one of the longer URLs. 


![serving-endpoints](../images/serving_endpoints.gif)

In [None]:
import os
import requests

mr = project.get_model_registry()

# Use the name of the model registered above.
model = mr.get_model("fraud_tutorial_model", version=1)

test_inputs = [model.input_example]

API_KEY = "..."  # Put your API key here.
MODEL_SERVING_URL = "..." # Put the model serving endpoint here.
HOST_NAME = "..." # Put your Hopsworks model serving hostname here.

data = {"inputs": test_inputs}
headers = {
    "Content-Type": "application/json", "Accept": "application/json",
    "Authorization": f"ApiKey {API_KEY}",
    "Host": HOST_NAME}

response = requests.post(MODEL_SERVING_URL, verify=False, headers=headers, json=data)
response.json()
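
The headers and payload used above can be kept in a small helper so that later requests build them the same way. This is a minimal sketch using only the standard library; the function name and the example values are hypothetical placeholders, not part of the Hopsworks API.

```python
import json


def build_serving_request(api_key, host_name, rows):
    """Return (headers, body) for a model serving POST request."""
    headers = {
        "Content-Type": "application/json",
        "Accept": "application/json",
        "Authorization": f"ApiKey {api_key}",
        "Host": host_name,
    }
    # "inputs" holds a list of rows, one list of feature values per prediction.
    body = json.dumps({"inputs": rows})
    return headers, body


# Placeholder values for illustration only.
example_headers, example_body = build_serving_request(
    "my-api-key", "serving.example.com", [[0.5, 1.0, 2.0]]
)
```

The result can be sent with `requests.post(MODEL_SERVING_URL, headers=example_headers, data=example_body)`. Note that `verify=False`, as used in the cell above, turns off TLS certificate verification, so it is best limited to test setups.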

In [None]:
# Now let's test a feature vector from the online store
feature_vector = feature_view.get_feature_vector({"cc_num": 4467360740682089})
# Wrap the vector in a list, since the model expects a list of rows.
data = {"inputs": [feature_vector]}
response = requests.post(MODEL_SERVING_URL, verify=False, headers=headers, json=data)
response.json()

## Stop Deployment
To stop the deployment we simply run:

In [None]:
deployment.stop()

## <span style="color:#ff5f27;"> 🎁  Wrapping things up </span>

In this module we introduced stream feature groups, trained a model on the training data that we created from a feature view, and deployed the model in production.