## Model serving with HSML

In this example, we are going to serve the model that we created in the model training notebook.

For the example to work, you need to have serving enabled in your project. In the settings tab for your project, select Serving to enable it. Now your UI should show a new tab called Model Serving.

A model deployment (also called "model serving") can be created directly in the Hopsworks UI, by clicking on Model Serving and then on Create New Serving. In this example, however, we will create it through code with the HSML library.

## About model serving

Models can be served via KFServing or "default" serving, which means a Docker container exposing a Flask server. For KFServing models, or models written in TensorFlow, you do not need to write a prediction file (see the section below). However, for sklearn models using default serving, you do need to write one.

In order to use KFServing, you must have Kubernetes installed and enabled on your cluster.

### Create the prediction file (only necessary for sklearn models with default serving)

In order to deploy a model, you need to write a Python file containing the logic to return a prediction from the model. Don't worry, this is usually a matter of just modifying some paths in a template script.

There is an example of such a template script [here](https://hopsworks.readthedocs.io/en/latest/hopsml/python_model_serving.html). For our case, we can modify it so that the `__init__` method reads as follows (the rest can be left as is):

```
    def __init__(self):
        """ Initializes the serving state, reads a trained model from HDFS"""
        self.model_path = "Models/fraud_tutorial_model/1/model.pkl"
        print("Copying Scikit-Learn model from HDFS to local directory")
        hdfs.copy_to_local(self.model_path)
        print("Reading local Scikit-Learn model for serving")
        self.model = joblib.load("./model.pkl")
        print("Initialization Complete")
```

If you wonder why we use the path `Models/fraud_tutorial_model/1/model.pkl`, it is useful to know that the Data Sets tab in the Hopsworks UI lets you browse the files in the project. Registered models are found underneath the `Models` directory. Since we saved our model with the name `fraud_tutorial_model`, that's the directory we should look in. `1` is the version of the model we want to deploy.
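
The path layout described above can be sketched as a small helper. This is only an illustration of the `Models/<name>/<version>/<file>` convention; `model_artifact_path` is a hypothetical name and the function does not call any Hopsworks API.

```python
import posixpath


def model_artifact_path(model_name, version, filename="model.pkl"):
    """Build the project-relative path of a registered model artifact.

    Hypothetical helper: it only mirrors the Models/<name>/<version>/<file>
    layout described in the text.
    """
    # posixpath keeps forward slashes regardless of the local operating system
    return posixpath.join("Models", model_name, str(version), filename)


print(model_artifact_path("fraud_tutorial_model", 1))
```

Building the path from the model name and version in one place makes it easier to keep the prediction script in sync with the version you deploy.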

This script needs to be put into a known location in the Hopsworks file system. Let's call the file `predict_example.py` and put it in the `Models` directory.
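
To give a feel for the rest of the script, here is a rough sketch of the class around `__init__`. It assumes the template defines a `Predict` class with a `predict(self, inputs)` method; check this against the linked template. `ThresholdModel` and the optional `model` argument are hypothetical stand-ins, added so that the sketch runs without HDFS or a pickled model.

```python
class ThresholdModel:
    """Stand-in for the unpickled Scikit-Learn model (hypothetical)."""

    def predict(self, rows):
        # Flag a row when its first feature exceeds an arbitrary threshold
        return [1 if row[0] > 0.5 else 0 for row in rows]


class Predict(object):

    def __init__(self, model=None):
        # The real script copies model.pkl from HDFS and loads it with joblib;
        # here a stand-in is injected so the sketch runs anywhere.
        self.model = model if model is not None else ThresholdModel()

    def predict(self, inputs):
        """Return predictions as a plain list so they can be serialized to JSON."""
        predictions = self.model.predict(inputs)
        # Scikit-Learn returns NumPy arrays, which expose tolist();
        # plain lists are passed through unchanged.
        if hasattr(predictions, "tolist"):
            return predictions.tolist()
        return list(predictions)


print(Predict().predict([[0.9, 0.1], [0.2, 0.3]]))
```

The point of the conversion in `predict` is that the serving layer has to return JSON, and NumPy arrays are not JSON-serializable.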

## Create the deployment

Here, we fetch the model we want from the model registry and define a configuration for the deployment. The configuration specifies the serving type (default or KFServing). Since we use default serving with an sklearn model, we also need to give the location of the prediction script.

In [3]:
import hsml
from hsml.predictor_config import PredictorConfig

conn = hsml.connection()
mr = conn.get_model_registry()

# Use the location where you saved the prediction file
PREDICTOR_SCRIPT = "/Projects/fraud_tutorial/Models/predict_example.py"

# Use the model name from notebook 4, where we registered the model.
# The version should match the one in the model path of the prediction script.
model = mr.get_model('fraud_tutorial_model', version=2)

predictor_config = PredictorConfig(model_server="PYTHON",
                                   serving_tool="DEFAULT",
                                   script_file=PREDICTOR_SCRIPT)

# Give it any name you want
model.deploy(name='frauddeployment3', predictor_config=predictor_config)

Connected. Call `.close()` to terminate connection gracefully.


KeyError: 'id'

Your new deployment should now be visible in the UI under Model Serving. Press run to start the deployment. 

### Using the deployment

Let's create a fake data point with which we can query the deployment.

In [33]:
test_inputs = [[4.00000000e+00, 2.87415201e-06, 2.87415201e-06,
        0.00000000e+00, 4.79025335e-05, 1.26263643e-05, 1.26263643e-05,
        1.97967580e-03, 2.79541810e-02],
       [4.00000000e+00, 3.02887719e-03, 3.01897734e-03,
        3.78048219e-04, 1.43707600e-04, 3.89754901e-06, 5.64310547e-06,
        4.37811982e-03, 5.83522294e-02]]


data = {"inputs": test_inputs}
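
Since the model expects a fixed number of features per row, it can help to check the rows before sending them. Below is a minimal sketch; `build_payload` is a hypothetical helper, and the default of 9 features is an assumption read off the example rows above.

```python
def build_payload(rows, n_features=9):
    """Wrap feature rows in the {"inputs": ...} structure used above.

    n_features=9 is an assumption taken from the length of the example rows.
    """
    for i, row in enumerate(rows):
        if len(row) != n_features:
            raise ValueError(
                f"row {i} has {len(row)} features, expected {n_features}"
            )
    # Convert to plain floats so NumPy scalars do not break JSON encoding
    return {"inputs": [[float(x) for x in row] for row in rows]}


payload = build_payload([[4.0] + [0.0] * 8])
print(len(payload["inputs"][0]))
```

A malformed row then fails locally with a readable message instead of producing an error inside the deployment.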

### Fetch the serving via HSML

... and make the prediction!

In [26]:
ms = conn.get_model_serving()
# Use the name you gave the deployment when creating it
deployment = ms.get_deployment('frauddeployment3')

In [35]:
deployment.predict(data)
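
To read the result, the sketch below pairs each input row with its prediction. It assumes the response is a dict with a `"predictions"` list holding one entry per input row, which is how the Gradio section of this notebook reads it. `pair_predictions` and `fake_response` are hypothetical and only stand in for a real call to `deployment.predict(data)`.

```python
def pair_predictions(inputs, response):
    """Pair each input row with its prediction.

    Assumes a response of the form {"predictions": [...]} with one
    prediction per input row.
    """
    predictions = response["predictions"]
    if len(predictions) != len(inputs):
        raise ValueError("number of predictions does not match number of inputs")
    return list(zip(inputs, predictions))


# Hypothetical response; a real one would come from deployment.predict(data)
fake_response = {"predictions": [0, 1]}
for row, label in pair_predictions([[4.0, 0.1], [4.0, 0.3]], fake_response):
    print(row, "->", label)
```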

### Use REST endpoint

We can also use the REST endpoint that you can find in the Model Serving UI by clicking on the eye icon next to a model. The shorter URL is an internal endpoint that you can only reach from within Hopsworks. If you want to call it from outside, you need one of the longer URLs.

In [7]:
import requests

data = {"inputs": test_inputs}
url = 'https://791bb4a0-bb1c-11ec-8721-7bd8cdac0b54.cloud.hopsworks.ai:443/hopsworks-api/api/project/120/inference/models/frauddeployment3:predict' # Found in the UI
headers = {'Content-Type': 'application/json', 'Accept':'application/json'}

response = requests.post(url, headers=headers, json=data)

In [8]:
response.json()

{'type': 'restApiJsonResponse',
 'errorCode': 200003,
 'errorMsg': 'Authorization header not set.'}
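
The `Authorization header not set.` error is returned because the request above carries no credentials. Below is a minimal sketch of an authenticated call. It assumes that you have created an API key for your project and that the key is sent as `Authorization: ApiKey <key>`; this scheme is not stated in this notebook, so verify it for your Hopsworks version. `build_headers` and `predict_via_rest` are hypothetical helper names.

```python
def build_headers(api_key):
    """Request headers including an API key.

    Assumption: the "ApiKey <key>" authorization scheme.
    """
    return {
        "Content-Type": "application/json",
        "Accept": "application/json",
        "Authorization": f"ApiKey {api_key}",
    }


def predict_via_rest(url, api_key, inputs, timeout=30):
    """POST the inputs to the deployment's REST endpoint and return the JSON body."""
    # Imported lazily so build_headers can be used without requests installed
    import requests

    response = requests.post(
        url,
        headers=build_headers(api_key),
        json={"inputs": inputs},
        timeout=timeout,
    )
    # Raise an exception on HTTP error statuses instead of returning the error body
    response.raise_for_status()
    return response.json()
```

With the `url` from the UI and your own key, the call would be `predict_via_rest(url, api_key, test_inputs)`. Avoid hard-coding the key in the notebook; read it from an environment variable or a secrets store instead.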

## (Work in progress) Simple GUI with Gradio

In order to have users input natural features (age in years, transaction amounts in euros etc.), we would need to have the normalization constants available here (i.e. the min/max values that were used for min-max scaling). Then the normalized feature values could be computed. 

For now, we will have to make do with an interface where users input min-max scaled values.
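
Should the constants become available, the scaling step could look roughly like the sketch below. It applies standard min-max scaling, `(x - min) / (max - min)`, and clamps the result to the `[0, 1]` range of the sliders. The values in `SCALING_CONSTANTS` are made up for illustration; they are not the constants used during training.

```python
def min_max_scale(value, min_value, max_value):
    """Scale a raw value to [0, 1] given the min and max seen during training."""
    if max_value == min_value:
        raise ValueError("min and max must differ")
    scaled = (value - min_value) / (max_value - min_value)
    # Clamp values outside the training range to the slider range
    return min(1.0, max(0.0, scaled))


# Hypothetical (min, max) pairs; the real ones come from the training pipeline
SCALING_CONSTANTS = {"age": (18.0, 98.0), "tx_amt": (0.0, 1000.0)}


def normalize_features(raw, constants):
    """Apply min-max scaling to every feature in a dict of raw values."""
    return {name: min_max_scale(raw[name], *constants[name]) for name in raw}


print(normalize_features({"age": 58.0, "tx_amt": 250.0}, SCALING_CONSTANTS))
```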

In [39]:
import gradio as gr

def get_prediction(cat, tx_vol_mean, tx_vol_std, tx_amt, tx_frq, age, days_exp, loc_delta, loc_delta_avg):
    '''
    The Gradio interface does an implicit mapping between category values and ints, so we don't need to
    specify that mapping explicitly here.
    '''
    feat_vec = [[cat, tx_vol_mean, tx_vol_std, tx_amt, tx_frq, age, days_exp, loc_delta, loc_delta_avg]]
    # Send the feature vector built from the UI inputs, not the earlier test data
    res = deployment.predict({"inputs": feat_vec})
    prediction = res['predictions'][0]
    return prediction


input_ui = [
        gr.inputs.Dropdown(list(category_mapping.keys()), type="index", label="category"),
        gr.inputs.Slider(0, 1, label="4h avg transaction volume (normalized)"),
        gr.inputs.Slider(0, 1, label="4h std transaction volume (normalized)"),
        gr.inputs.Slider(0, 1, label="Transaction amount (normalized)"),
        gr.inputs.Slider(0, 1, label="Transaction frequency (normalized)"),
        gr.inputs.Slider(0, 1, label="Age at transaction (normalized)"),
        gr.inputs.Slider(0, 1, label="Days until card expires (normalized)"),
        gr.inputs.Slider(0, 1, label="Location delta (normalized)"),
        gr.inputs.Slider(0, 1, label="4h avg location delta (normalized)"),
    ]

iface = gr.Interface(fn=get_prediction, inputs=input_ui, outputs="label")
iface.launch(share=True)

Running on local URL:  http://127.0.0.1:7871/
2022-04-25 12:18:26,912 INFO: Connected (version 2.0, client OpenSSH_7.6p1)
2022-04-25 12:18:27,737 INFO: Authentication (publickey) successful!
Running on public URL: https://58788.gradio.app

This share link expires in 72 hours. For free permanent hosting, check out Spaces (https://huggingface.co/spaces)


Now you can change the values with the sliders and press Submit at the bottom. A prediction will then appear in the box labeled "Output" in the upper right column.