## Model serving with HSML

In this example, we are going to serve the model that we created in the model training notebook.

For the example to work, you need to have serving enabled in your project. In the settings tab for your project, select Serving to enable it. Now your UI should show a new tab called Model Serving.

A model deployment (also called "model serving") can be created directly in the Hopsworks UI, by clicking on Model Serving and then on Create New Serving. In this example, however, we will create it through code with the HSML library.

![tutorial-flow](images/end_to_end.png)

### About Model Serving

Models can be served either with KFServing or with "default" serving, which runs a Docker container exposing a Flask server. For KFServing models, or models written in TensorFlow, you do not need to write a prediction file (see the section below). For sklearn models using default serving, however, you do need to write one.

In order to use KFServing, you must have Kubernetes installed and enabled on your cluster.

### Create the Prediction File

In order to deploy a model, you need to write a Python file containing the logic to return a prediction from the model. Don't worry, this is usually a matter of just modifying some paths in a template script. An example can be seen in the code block below, where we have taken [this Scikit-learn template script](https://hopsworks.readthedocs.io/en/latest/hopsml/python_model_serving.html#serving-python-based-models-on-hopsworks) and changed two paths (see comments).

In [1]:
%%writefile predict_example.py
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23
from hops import hdfs  # Used to copy the model from HDFS to the local directory.

class Predict(object):

    def __init__(self):
        """ Initializes the serving state, reads a trained model from HDFS"""
        self.model_path = "Models/fraud_tutorial_model/1/model.pkl" # Changed to our path.
        print("Copying Scikit-Learn model from HDFS to local directory")
        hdfs.copy_to_local(self.model_path)
        print("Reading local Scikit-Learn model for serving")
        self.model = joblib.load("./model.pkl") # Changed to our path.
        print("Initialization Complete")


    def predict(self, inputs):
        """ Serves a prediction request using a trained model"""
        return self.model.predict(inputs).tolist() # NumPy arrays are not JSON serializable

Writing predict_example.py
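Before uploading the script, it can help to exercise the predictor interface locally. The following is a minimal sketch, not the deployed code: `StubModel` and `LocalPredict` are hypothetical stand-ins that mirror the `predict` method above without touching HDFS or a real trained model.

```python
import json


class StubModel:
    """Hypothetical stand-in for a trained scikit-learn model (local testing only)."""

    def predict(self, inputs):
        # Return 1 when the first feature of a row exceeds 0.5, otherwise 0.
        return [1 if row[0] > 0.5 else 0 for row in inputs]


class LocalPredict:
    """Mirrors the Predict interface, but receives the model instead of reading HDFS."""

    def __init__(self, model):
        self.model = model

    def predict(self, inputs):
        predictions = self.model.predict(inputs)
        # A real scikit-learn model returns a NumPy array, which needs .tolist();
        # the stub already returns a plain list, so list() is enough here.
        return list(predictions)


predictor = LocalPredict(StubModel())
result = predictor.predict([[0.9, 0.1], [0.2, 0.7]])

# The serving layer returns JSON, so the result must be JSON serializable.
serialized = json.dumps({"predictions": result})
print(serialized)
```

Swapping `StubModel()` for a model loaded with `joblib.load` would turn this into a check against the real artifact, assuming the file is available locally.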


If you wonder why we use the path `Models/fraud_tutorial_model/1/model.pkl`, it is useful to know that the Data Sets tab in the Hopsworks UI lets you browse among the different files in the project. Registered models will be found underneath the `Models` directory. Since we saved our model with the name `fraud_tutorial_model`, that's the directory we should look in. `1` is just the version of the model we want to deploy.

This script needs to be put into a known location in the Hopsworks file system. Let's call the file `predict_example.py` and put it in the `Models` directory.

In [2]:
import os
import hopsworks

hopsworks_conn = hopsworks.connection()
project = hopsworks_conn.get_project()
dataset_api = project.get_dataset_api()

uploaded_file_path = dataset_api.upload("predict_example.py", "Models")
predictor_script_path = os.path.join("/Projects", project.name, uploaded_file_path)

Connected. Call `.close()` to terminate connection gracefully.
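The path passed to the deployment is built by joining `/Projects`, the project name, and the uploaded file path. As a hedged illustration of that step, the sketch below assumes the upload call returns a path relative to the project, such as `Models/predict_example.py`; the helper `project_path` and the project name `fv_test` are illustrative only. It also shows why a leading slash matters: a path join discards everything before an absolute component.

```python
import posixpath  # Hopsworks file system paths are POSIX-style on every client OS.


def project_path(project_name, relative_path):
    """Build /Projects/<project_name>/<relative_path>, tolerating a leading slash."""
    return posixpath.join("/Projects", project_name, relative_path.lstrip("/"))


# Assumed shape of the value returned by the upload call.
uploaded = "Models/predict_example.py"
print(project_path("fv_test", uploaded))

# Pitfall: an absolute last component makes join() drop the earlier parts.
naive = posixpath.join("/Projects", "fv_test", "/" + uploaded)
print(naive)
```

If the value returned by the upload call is already a full project path, the join is unnecessary, so it is worth printing `predictor_script_path` once before deploying.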


### Create the Deployment

Here, we fetch the model we want from the model registry and define a configuration for the deployment. For the configuration, we need to specify the model server (here `PYTHON`), the serving tool (`DEFAULT` or `KSERVE`), and the location of the prediction script. The code below uses `KSERVE`; to use default serving instead, set `serving_tool="DEFAULT"`.

In [3]:
import hsml

conn = hsml.connection()
mr = conn.get_model_registry()

# Use the model name from the previous notebook.
model = mr.get_model("fraud_tutorial_model", version=1)

# Give it any name you want
deployment = model.deploy(
    name="frauddeployment",
    model_server="PYTHON",
    serving_tool="KSERVE",  # or "DEFAULT" for default serving
    script_file=predictor_script_path
)

Connected. Call `.close()` to terminate connection gracefully.


In [4]:
print("Deployment: " + deployment.name)
deployment.describe()

Deployment: frauddeployment
{
    "artifact_version": 1,
    "batching_enabled": false,
    "created": "2022-05-24T21:46:23.079Z",
    "creator": "Admin Admin",
    "id": 1,
    "inference_logging": "ALL",
    "kafka_topic_dto": {
        "name": "CREATE",
        "num_of_partitions": 1,
        "num_of_replicas": 1
    },
    "model_name": "fraud_tutorial_model",
    "model_path": "/Projects/fv_test/Models/fraud_tutorial_model",
    "model_server": "PYTHON",
    "model_version": 1,
    "name": "frauddeployment",
    "predictor": "predict_example.py",
    "predictor_resource_config": {
        "cores": 1,
        "gpus": 0,
        "memory": 1024
    },
    "requested_instances": 1,
    "serving_tool": "KSERVE"
}


The deployment has now been registered. However, to start it you need to run:

In [5]:
deployment.start()

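Starting a deployment can take a little while, so client code often waits until it reports a running state before sending requests. The sketch below is a generic polling helper, not part of HSML: `wait_for_status` and the `get_status` callable are hypothetical, and the status strings `"Starting"` and `"Running"` are assumptions for illustration.

```python
import time


def wait_for_status(get_status, target="Running", timeout_s=120.0, interval_s=2.0):
    """Poll get_status() until it returns `target`, or raise TimeoutError."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status == target:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"status is still {status!r} after {timeout_s} seconds")
        time.sleep(interval_s)


# Simulated status sequence standing in for a real deployment.
fake_statuses = iter(["Starting", "Starting", "Running"])
final_status = wait_for_status(lambda: next(fake_statuses), interval_s=0.0)
print(final_status)
```

In a real notebook, `get_status` would wrap whatever call your HSML version offers for reading the deployment state.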

### Using the deployment

Let's use the input example that we registered together with the model to query the deployment.

In [6]:
test_inputs = [model.input_example]
print(test_inputs)

data = {
    "inputs": test_inputs
}

deployment.predict(data)

[[0.0, 0.0022940899903931, 0.9125449068314664, 0.6990882077784117, 0.249450252674718, 0.0022940899903931, 0.0022940899903931, 0.0022940899903931, 0.2687748826079058]]


{'predictions': [0]}
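The request and response shapes above are plain dictionaries: a list of feature rows under `"inputs"`, and a list of class indices under `"predictions"`. The following sketch wraps both directions in small helpers. The helper names are hypothetical, and the mapping of `0` to "not fraud" and `1` to "fraud" is an assumption about how the model was trained.

```python
def build_payload(rows):
    """Wrap feature rows in the {"inputs": [...]} structure used above."""
    return {"inputs": [list(row) for row in rows]}


def label_predictions(response, labels=("not fraud", "fraud")):
    """Translate class indices in a response into readable labels (assumed mapping)."""
    return [labels[int(prediction)] for prediction in response["predictions"]]


payload = build_payload([(0.0, 0.0023, 0.9125)])
print(payload)

# Response in the shape returned by the deployment in this tutorial.
response = {"predictions": [0]}
print(label_predictions(response))
```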

### Use REST endpoint 

You can also use a REST endpoint for your model. To do this you need to create an API key with 'serving' enabled, and retrieve the endpoint URL from the Model Serving UI.

Go to the Model Serving UI and click on the eye icon next to a model to retrieve the endpoint URL. The shorter URL is an internal endpoint that you can only reach from within Hopsworks. If you want to call it from outside, you need one of the longer URLs. Make sure to use https instead of http.



In [10]:
import os
import requests

import hsml

conn = hsml.connection()
mr = conn.get_model_registry()

# Use the model name from the previous notebook.
model = mr.get_model("fraud_tutorial_model", version=1)

API_KEY = ""  # Put your API key here.
MODEL_SERVING_ENDPOINT = ""  # Put the model serving endpoint path here.
HOST_NAME = ""  # Put your Hopsworks model serving host name here.

data = {"inputs": test_inputs}
url = os.environ["REST_ENDPOINT"] + MODEL_SERVING_ENDPOINT
headers = {
    "Content-Type": "application/json", "Accept": "application/json",
    "Authorization": f"ApiKey {API_KEY}",
    "Host": HOST_NAME}

# verify=False disables TLS certificate verification; only use it in test setups.
response = requests.post(url, verify=False, headers=headers, json=data)
response.json()

Connected. Call `.close()` to terminate connection gracefully.


{'predictions': [0]}
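Assembling the URL and headers by hand is easy to get subtly wrong, for example with a doubled slash or an `http` scheme. The sketch below collects that logic in one hypothetical helper, `build_request`, which mirrors the headers used above and rejects non-https URLs. All values in the usage lines are placeholders, not real endpoints or keys.

```python
def build_request(base_url, endpoint, api_key, host_name):
    """Return (url, headers) for a model serving request, enforcing https."""
    if not base_url.startswith("https://"):
        raise ValueError("use an https base URL for model serving requests")
    # Join with exactly one slash between the base URL and the endpoint path.
    url = base_url.rstrip("/") + "/" + endpoint.lstrip("/")
    headers = {
        "Content-Type": "application/json",
        "Accept": "application/json",
        "Authorization": f"ApiKey {api_key}",
        "Host": host_name,
    }
    return url, headers


# Placeholder values for illustration only.
url, headers = build_request(
    "https://hopsworks.example.com/",
    "/path/to/predict",
    "my-key",
    "serving.example.com",
)
print(url)
print(headers["Authorization"])
```

The returned pair can be passed straight to `requests.post(url, headers=headers, json=data)`.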

### Stop Deployment

To stop the deployment we simply run:

In [None]:
deployment.stop()

### Next Steps

In the next notebook we'll take a look at how to automate jobs in Hopsworks using Airflow.