# 5.1: Basic Model Serving

This part of the real-time pipeline tutorial covers the basics of model serving.

By the end of this tutorial, you'll know how to:

- Create model-serving functions.
- Deploy models at scale.
- Test your deployed models.

<a id="gs-tutorial-3-prerequisites"></a>

## Prerequisites

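The following steps assume that you have already trained a classifier model on the Iris data set and logged it to your project as the `my_model` artifact (for example, with the training function from the previous notebook).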

<a id="gs-tutorial-3-step-setup"></a>

## Step 1: Setup and Configuration

<a id="gs-tutorial-3-import-libaries"></a>

### Importing Libraries

Run the following code to import required libraries:

In [1]:
import mlrun

<a id="gs-tutorial-4-set-mlrun-envr"></a>

### Initializing Your MLRun Environment

Use the `get_or_create_project` MLRun method to create a new project or fetch it from the DB/repository if it already exists.
Set the project name and the `user_project` parameter to the same values that you used in the call to this method in the [Part 1: MLRun Basics](./01-mlrun-basics.ipynb#gs-tutorial-1-mlrun-envr-init) tutorial notebook.

In [2]:
# Set the base project name
project_name_base = 'realtime-pipeline'

# Initialize the MLRun project object
project = mlrun.get_or_create_project(project_name_base, context="./", user_project=True)

> 2022-05-25 20:27:23,875 [info] loaded project realtime-pipeline from MLRun DB
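With `user_project=True`, MLRun appends the current user name to the base project name, which is why later outputs in this notebook refer to `realtime-pipeline-xingsheng`. The helper below is a hypothetical sketch of that naming rule, inferred from those outputs; it is not an MLRun function, and MLRun's own logic (for example, how it determines the user name) may differ.

```python
def user_project_name(base_name: str, user: str) -> str:
    """Hypothetical helper: mimic the '<base>-<user>' naming observed with user_project=True."""
    # Assumption: the user name is lowercased, since project names are lowercase.
    return f"{base_name}-{user.lower()}"


print(user_project_name("realtime-pipeline", "xingsheng"))  # realtime-pipeline-xingsheng
```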


<a id="gs-tutorial-3-step-writing-a-serving-class"></a>

## Step 2: Writing a Simple Serving Class

The serving class is initialized automatically by the model server.
All you need to do is implement two mandatory methods:

- `load` &mdash; downloads the model files and loads the model into memory.
    This can be done either synchronously or asynchronously.
- `predict` &mdash; accepts a request payload and returns prediction (inference) results.
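To make this contract concrete, here is a plain-Python sketch of the lifecycle the two methods fit into: the server calls `load` once before serving, then calls `predict` for each request body. `ToyModelServer`, `ThresholdClassifier`, and `handle` are hypothetical names invented for this illustration. This is not MLRun's `V2ModelServer`, which does considerably more (for example, request validation and optional asynchronous loading), and the threshold "model" is a stand-in, not a trained classifier.

```python
from typing import List


class ToyModelServer:
    """Hypothetical stand-in for a model-server base class."""

    def __init__(self, name: str):
        self.name = name
        self.model = None
        self.load()  # the server, not the user, triggers loading once at startup

    def load(self):
        raise NotImplementedError

    def predict(self, body: dict) -> List:
        raise NotImplementedError

    def handle(self, body: dict) -> dict:
        """Wrap predict() output in a response dict shaped like the outputs shown later."""
        return {"model_name": self.name, "outputs": self.predict(body)}


class ThresholdClassifier(ToyModelServer):
    def load(self):
        # Stand-in "model": class 1 if petal length (3rd feature) exceeds 2.5 cm, else 0.
        self.model = lambda row: int(row[2] > 2.5)

    def predict(self, body: dict) -> List:
        return [self.model(row) for row in body["inputs"]]


server = ThresholdClassifier("toy")
print(server.handle({"inputs": [[5.1, 3.5, 1.4, 0.2], [7.7, 3.8, 6.7, 2.2]]}))
# {'model_name': 'toy', 'outputs': [0, 1]}
```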

For more detailed information on serving classes, see the [MLRun documentation](https://github.com/mlrun/mlrun/blob/release/v0.6.x-latest/mlrun/serving/README.md).

The following code demonstrates a minimal scikit-learn (a.k.a. sklearn) serving-class implementation:

In [3]:
# mlrun: start-code

In [4]:
from cloudpickle import load
import numpy as np
from typing import List
import mlrun

class ClassifierModel(mlrun.serving.V2ModelServer):
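    def load(self):
        """Load and initialize the model and/or other elements."""
        # get_model() fetches the model artifact and returns the local model file path
        # plus any extra data items logged with the model.
        model_file, extra_data = self.get_model('.pkl')
        # Use a context manager so the file handle is closed after unpickling.
        with open(model_file, 'rb') as f:
            self.model = load(f)

    def predict(self, body: dict) -> List:
        """Generate model predictions from sample."""
        # body['inputs'] is a list of feature rows, one row per sample.
        feats = np.asarray(body['inputs'])
        result: np.ndarray = self.model.predict(feats)
        # Convert to a plain list so the result is JSON-serializable.
        return result.tolist()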

In [5]:
# mlrun: end-code
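The `# mlrun: start-code` and `# mlrun: end-code` comments mark which notebook cells `code_to_function` packages into the serving function, so setup and test cells outside the markers are left out. The sketch below illustrates that selection rule on a list of cell sources. `select_marked_cells` is a hypothetical helper, not MLRun's implementation, and it handles only these two markers.

```python
from typing import List

START = "# mlrun: start-code"
END = "# mlrun: end-code"


def select_marked_cells(cells: List[str]) -> List[str]:
    """Hypothetical helper: keep only the cells strictly between the start and end markers."""
    selected, inside = [], False
    for source in cells:
        stripped = source.strip()
        if stripped == START:
            inside = True
        elif stripped == END:
            inside = False
        elif inside:
            selected.append(source)
    return selected


cells = [
    "import mlrun",
    "# mlrun: start-code",
    "class ClassifierModel: ...",
    "# mlrun: end-code",
    "serving_fn = ...",
]
print(select_marked_cells(cells))  # ['class ClassifierModel: ...']
```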

<a id="gs-tutorial-3-step-deploy-the-serving-function"></a>

## Step 3: Deploying the Model-Serving Function (Service)

To provision (deploy) a function for serving the model (a "serving function"), you need to create an MLRun function of type `serving`.
You can do this by using the `code_to_function` MLRun method from a notebook, or by importing an existing serving function or template from the MLRun functions marketplace.

<a id="gs-tutorial-3-convert-serving-class-to-function"></a>

### Converting a Serving Class to a Serving Function

The following code converts the `ClassifierModel` class that you defined in the previous step to a serving function.
The name of the class to be used by the serving function is set in `spec.default_class`.

In [6]:
serving_fn = mlrun.code_to_function('serving', kind='serving', image='mlrun/mlrun')
serving_fn.spec.default_class = 'ClassifierModel'

Add the model that the training function created in the previous notebook. First, get the store URI of the `my_model` artifact:


In [18]:
model_file = project.get_artifact_uri('my_model') 

In [19]:
print(model_file)

store://artifacts/realtime-pipeline-xingsheng/my_model


In [21]:
serving_fn.add_model('my_model', model_path=model_file)

<mlrun.serving.states.TaskStep at 0x7fbb867fba90>

### Testing Your Function Locally

To test your function locally, create a test server (mock server) and test it with sample data.

In [22]:
my_data = '''{"inputs":[[5.1, 3.5, 1.4, 0.2],[7.7, 3.8, 6.7, 2.2]]}'''
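Writing the JSON payload by hand is error-prone. As an alternative sketch, you can build the same `{"inputs": [...]}` body from Python lists with the standard `json` module. `rows` is an illustrative name, the two samples are the ones used above, and the feature order assumes the model was trained on the standard Iris column order.

```python
import json

# Each inner list is one Iris sample (assumed order):
# sepal length, sepal width, petal length, petal width (cm).
rows = [[5.1, 3.5, 1.4, 0.2], [7.7, 3.8, 6.7, 2.2]]
my_data = json.dumps({"inputs": rows})
print(my_data)  # {"inputs": [[5.1, 3.5, 1.4, 0.2], [7.7, 3.8, 6.7, 2.2]]}
```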

In [23]:
server = serving_fn.to_mock_server()
server.test("/v2/models/my_model/infer", body=my_data)

> 2022-05-25 22:39:22,479 [info] model my_model was loaded
> 2022-05-25 22:39:22,480 [info] Loaded ['my_model']


X does not have valid feature names, but RandomForestClassifier was fitted with feature names


{'id': 'c31d5783a5c44d54b671d9162a141d1f',
 'model_name': 'my_model',
 'outputs': [0, 2]}
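The `outputs` list holds one predicted class index per input row. Assuming the model was trained on scikit-learn's Iris labels, where 0, 1, and 2 stand for setosa, versicolor, and virginica, a small helper can map the response to species names. `IRIS_CLASSES` and `decode_outputs` are illustrative names; if your training step encoded the labels differently, adjust the mapping.

```python
from typing import Dict, List

# Assumed label encoding (the order of sklearn.datasets.load_iris().target_names).
IRIS_CLASSES = ["setosa", "versicolor", "virginica"]


def decode_outputs(response: Dict) -> List[str]:
    """Map the integer class indices in an inference response to species names."""
    return [IRIS_CLASSES[int(idx)] for idx in response["outputs"]]


response = {"id": "c31d5783a5c44d54b671d9162a141d1f", "model_name": "my_model", "outputs": [0, 2]}
print(decode_outputs(response))  # ['setosa', 'virginica']
```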

<a id="gs-tutorial-3-building-and-deploying-the-serving-function"></a>

### Building and Deploying the Serving Function

Use the `deploy` method of the MLRun serving function to build and deploy a Nuclio serving function from your serving-function code.

In [25]:
function_address = serving_fn.deploy()

> 2022-05-25 22:42:00,386 [info] Starting remote function deploy
2022-05-25 22:42:00  (info) Deploying function
2022-05-25 22:42:00  (info) Building
2022-05-25 22:42:00  (info) Staging files and preparing base images
2022-05-25 22:42:00  (info) Building processor image
2022-05-25 22:42:45  (info) Build complete
2022-05-25 22:42:53  (info) Function deploy complete
> 2022-05-25 22:42:53,872 [info] successfully deployed function: {'internal_invocation_urls': ['nuclio-realtime-pipeline-xingsheng-serving.default-tenant.svc.cluster.local:8080'], 'external_invocation_urls': ['realtime-pipeline-xingsheng-serving-realtime-pipeline-xingsheng.default-tenant.app.us-sales-341.iguazio-cd1.com/']}


<a id="gs-tutorial-3-step-using-the-live-serving-function"></a>

## Step 4: Using the Live Model-Serving Function

After the function is deployed successfully, the serving function has a new HTTP endpoint for handling serving requests.
The tutorial serving function receives HTTP prediction (inference) requests on this endpoint,
passes each request sent to a model's `infer` path to the `predict` method of your serving class, and returns the results in the HTTP response.
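Because the endpoint speaks plain HTTP, any HTTP client can call it, not only MLRun. The sketch below uses Python's standard `urllib.request` to prepare a POST request to a model's `/v2/models/<model_name>/infer` path with a JSON body. `build_infer_request` is a hypothetical helper, `http://example.com/` is a placeholder for the address returned by `deploy()`, and the JSON content-type header is an assumption. The helper only builds the request; pass it to `urllib.request.urlopen` to send it.

```python
import json
import urllib.request


def build_infer_request(base_url: str, model_name: str, inputs: list) -> urllib.request.Request:
    """Hypothetical helper: prepare (but do not send) a POST to /v2/models/<model_name>/infer."""
    url = f"{base_url.rstrip('/')}/v2/models/{model_name}/infer"
    body = json.dumps({"inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_infer_request("http://example.com/", "my_model", [[5.1, 3.5, 1.4, 0.2]])
print(req.get_method(), req.full_url)  # POST http://example.com/v2/models/my_model/infer

# To send it against a live deployment:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
```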

In [26]:
print(f'The address for the function is {function_address} \n')

!curl $function_address

The address for the function is http://realtime-pipeline-xingsheng-serving-realtime-pipeline-xingsheng.default-tenant.app.us-sales-341.iguazio-cd1.com/ 

{"name": "ModelRouter", "version": "v2", "extensions": []}

<a id="gs-tutorial-3-testing-the-model-server"></a>

### Testing the Model Server

Test your model server by sending data for inference.
The `invoke` serving-function method enables programmatic testing of the serving function.
For model inference (predictions), specify the model name followed by `infer`:
```
/v2/models/{model_name}/infer
```
For the complete model-server API, including listing models (`models`), checking model health (`ready`), and requesting a model explanation (`explain`), see the [MLRun documentation](https://github.com/mlrun/mlrun/blob/release/v0.6.x-latest/mlrun/serving/README.md#model-server-api).
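If you call the server from code often, a small helper that fills in the path template avoids typos in model names and operations. The sketch below is hypothetical (`model_path` and `OPERATIONS` are illustrative names), and the optional `/versions/<version>` segment is an assumption about the MLRun model-server API that you should check against the documentation linked above for your MLRun version.

```python
from typing import Optional

# Operations mentioned in the text above; the set is illustrative, not exhaustive.
OPERATIONS = {"infer", "explain", "ready"}


def model_path(model_name: str, operation: str = "infer", version: Optional[str] = None) -> str:
    """Hypothetical helper: build /v2/models/<name>[/versions/<version>]/<operation>."""
    if operation not in OPERATIONS:
        raise ValueError(f"unsupported operation: {operation!r}")
    path = f"/v2/models/{model_name}"
    if version:
        path += f"/versions/{version}"  # assumed optional version segment
    return f"{path}/{operation}"


print(model_path("my_model"))  # /v2/models/my_model/infer
print(model_path("my_model", "ready", version="v1"))  # /v2/models/my_model/versions/v1/ready
```

For example, `serving_fn.invoke(model_path('my_model'), my_data)` targets the same path as the `invoke` call in the following cell.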

In [27]:
serving_fn.invoke('/v2/models/my_model/infer', my_data)

> 2022-05-25 22:43:34,935 [info] invoking function: {'method': 'POST', 'path': 'http://nuclio-realtime-pipeline-xingsheng-serving.default-tenant.svc.cluster.local:8080/v2/models/my_model/infer'}


{'id': '1ca676fd-28d9-4310-a77e-dd4864096598',
 'model_name': 'my_model',
 'outputs': [0, 2]}

<a id="gs-tutorial-3-step-view-serving-func-in-ui"></a>

## Step 5: Viewing the Nuclio Serving Function on the Dashboard

On the **Projects** dashboard page, select the project and then select "Real-time functions (Nuclio)".

![Nuclio](./images/nuclio-deploy.png)