# Tutorial 3 - Model Serving


MLRun serving can take MLRun models or standard model files and produce managed real-time serverless functions, based on the Nuclio real-time serverless engine, that can be deployed anywhere.

Nuclio is a high-performance "serverless" framework focused on data, I/O, and compute-intensive workloads. More details can be found at https://github.com/nuclio/nuclio

Simple model serving classes can be written in Python or taken from a set of pre-developed ML/DL classes. The code can handle complex data, feature preparation, and binary data (images/video). The serving engine supports the full lifecycle, including auto-generation of microservices, APIs, load balancing, logging, model monitoring, and configuration management.



#### Load Project

In [1]:
from os import path, getenv
from mlrun import load_project
from mlrun import mlconf

project_path = path.abspath('conf')
project = load_project(project_path)

print(f'Project path: {project_path}\nProject name: {project.name}')
print(f'Artifacts path: {mlconf.artifact_path}')

Project path: /User/new-tutorials/conf
Project name: getting-started-tutorial-admin
Artifacts path: /v3io/projects/{{run.project}}/artifacts


## Writing A Simple Serving Class


The class is initialized automatically by the model server. All you need to do is implement two mandatory methods:

- load() - download the model file(s) and load the model into memory. This can be done synchronously or asynchronously.
- predict() - accept the request payload and return the prediction/inference results.

More detailed information on serving classes can be found at https://github.com/mlrun/mlrun/blob/master/mlrun/serving/README.md


In [2]:
# nuclio: start-code

Below is a minimal example of an sklearn serving function:

In [3]:
from cloudpickle import load
import numpy as np
from typing import List
import mlrun

class ClassifierModel(mlrun.serving.V2ModelServer):
    def load(self):
        """load and initialize the model and/or other elements"""
        model_file, extra_data = self.get_model('.pkl')
        # use a context manager so the model file is closed after loading
        with open(model_file, 'rb') as f:
            self.model = load(f)

    def predict(self, body: dict) -> List:
        """Generate model predictions from sample."""
        # body['inputs'] holds one row of feature values per sample
        feats = np.asarray(body['inputs'])
        result: np.ndarray = self.model.predict(feats)
        return result.tolist()

In [4]:
# nuclio: end-code
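
The predict() method above assumes that the request body carries an inputs key holding a list of equal-length numeric rows. A minimal sketch of a check that could run at the top of predict() follows. Note that validate_inputs is a hypothetical helper name, not part of MLRun, and the default row width of 4 is an assumption based on the four feature values per sample in the example request used later in this tutorial.

```python
from numbers import Real


def validate_inputs(body: dict, n_features: int = 4) -> list:
    """Return body['inputs'] if it is a non-empty list of numeric rows.

    Hypothetical helper (not part of MLRun). n_features=4 is an assumption
    matching the four feature values per sample in the example request.
    """
    if not isinstance(body, dict) or "inputs" not in body:
        raise ValueError("request body must be a dict with an 'inputs' key")
    rows = body["inputs"]
    if not isinstance(rows, list) or not rows:
        raise ValueError("'inputs' must be a non-empty list of rows")
    for i, row in enumerate(rows):
        if not isinstance(row, list) or len(row) != n_features:
            raise ValueError(f"row {i} must be a list of {n_features} values")
        # bool is a subclass of int, so reject it explicitly
        if any(isinstance(v, bool) or not isinstance(v, Real) for v in row):
            raise ValueError(f"row {i} contains a non-numeric value")
    return rows
```

Rejecting malformed payloads early gives the caller a readable error instead of an exception raised from inside the model.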

## Deploying the Model Serving Function (Service)

To provision a serving function, we need to create an MLRun function of type serving. This can be done by calling code_to_function() from a notebook. We can also import an existing serving function/template from the marketplace.

The code below converts the ClassifierModel class above to a serving function. The name of the class to use is set in spec.default_class.


In [5]:
from mlrun import code_to_function
serving_fn = code_to_function('serving', kind='serving', image='mlrun/mlrun')
serving_fn.spec.default_class = 'ClassifierModel'

Add the model that was created by the training function in the previous notebook:


In [6]:
model_file = f'store://{project.name}/train-iris-train_iris_model'
serving_fn.add_model('my_model', model_path=model_file)

<mlrun.serving.states.TaskState at 0x7fa4dad7ac90>

In [7]:
serving_fn = serving_fn.apply(mlrun.mount_v3io(remote='projects',mount_path='/v3io/projects'))
serving_fn = project.set_function(serving_fn)

### Deploy the function 
This builds and deploys a Nuclio serving function. Once the function is deployed successfully, an HTTP endpoint becomes available that accepts inference requests, calls the predict method, and returns the result.

In [8]:
serving_fn.deploy()


> 2021-01-04 20:26:07,255 [info] Starting remote function deploy
2021-01-04 20:26:07  (info) Deploying function
2021-01-04 20:26:07  (info) Building
2021-01-04 20:26:07  (info) Staging files and preparing base images
2021-01-04 20:26:07  (info) Building processor image
2021-01-04 20:26:10  (info) Build complete
2021-01-04 20:26:16  (info) Function deploy complete
> 2021-01-04 20:26:17,150 [info] function deployed, address=default-tenant.app.product-a.iguazio-cd0.com:30117


'http://default-tenant.app.product-a.iguazio-cd0.com:30117'

In [9]:
serving_fn.spec.base_spec.get('metadata').get('name')

'serving'

## Using the new Model Serving Function


A new endpoint was created for the function and can be accessed via HTTP.


In [11]:
function_address = serving_fn.status.address
print(f'The address for the function is {function_address} \n')

!curl $function_address

The address for the function is default-tenant.app.product-a.iguazio-cd0.com:30117 

{"name": "ModelRouter", "version": "v2", "extensions": []}
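
The same check can be made from Python instead of curl. The sketch below uses only the standard library. The names endpoint_url and get_server_info are hypothetical helpers, and prepending http:// when the address has no scheme is an assumption based on the address printed above.

```python
import json
from urllib.request import urlopen


def endpoint_url(address: str, path: str = "/") -> str:
    """Build a full URL from a function address such as 'host:port'."""
    if not address.startswith(("http://", "https://")):
        address = "http://" + address  # assumption: plain HTTP endpoint
    return address.rstrip("/") + "/" + path.lstrip("/")


def get_server_info(address: str, timeout: float = 10.0) -> dict:
    """GET the root of the serving endpoint and decode the JSON reply."""
    with urlopen(endpoint_url(address), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

If the function is reachable from the notebook, get_server_info(function_address) should return a dictionary equivalent to the JSON shown above.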

Test the server by sending data for inference.
The invoke method lets you test the function programmatically.
The inference path is /v2/models/{model_name}/infer, where {model_name} is the name given in add_model.
For the complete model service API, including commands such as list models, get model health, and explain, see https://github.com/mlrun/mlrun/blob/master/mlrun/serving/README.md#model-server-api
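
The request path and body can also be assembled programmatically rather than written by hand. The following is a small sketch. Here build_infer_request is a hypothetical helper, not an MLRun API; it only formats the path and the JSON body that are then passed to invoke.

```python
import json


def build_infer_request(model_name: str, inputs: list) -> tuple:
    """Return (path, body) for a v2 infer call on the given model."""
    path = f"/v2/models/{model_name}/infer"
    body = json.dumps({"inputs": inputs})
    return path, body


path, body = build_infer_request("my_model", [[5.1, 3.5, 1.4, 0.2]])
print(path)  # /v2/models/my_model/infer
```

The resulting pair can be used as serving_fn.invoke(path, body).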

In [12]:
my_data = '''{"inputs":[[5.1, 3.5, 1.4, 0.2],[7.7, 3.8, 6.7, 2.2]]}'''
serving_fn.invoke('/v2/models/my_model/infer', my_data)


{'id': '216ae597-6514-48bd-8dc2-fa6f277cc81a',
 'model_name': 'my_model',
 'outputs': [0, 2]}
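
The outputs field holds one class index per input row. Assuming the model was trained on the standard Iris dataset with scikit-learn's label order (0 = setosa, 1 = versicolor, 2 = virginica), which this tutorial does not state explicitly, the indices can be turned into readable names as sketched below. IRIS_CLASSES and label_outputs are illustrative names.

```python
# Assumed label order (scikit-learn's Iris target names); verify against
# the labels used by the training function before relying on it.
IRIS_CLASSES = ["setosa", "versicolor", "virginica"]


def label_outputs(response: dict, class_names=IRIS_CLASSES) -> list:
    """Map each class index in response['outputs'] to its class name."""
    return [class_names[int(i)] for i in response["outputs"]]


response = {"model_name": "my_model", "outputs": [0, 2]}
print(label_outputs(response))  # ['setosa', 'virginica']
```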

## View Nuclio serving function in UI

##### In the projects screen, select the project and then select "Real-time functions (nuclio)"

<img src="./images/nuclio-deploy.png" alt="Jobs" width="800"/>

##### Save the project

In [13]:
project.save()

## Done!

Congratulations! You've completed tutorial 3 of the Iguazio Data Science Platform.
Go to [Tutorial 4](tutorial-4-Running-pipeline.ipynb) to learn how to create an automated pipeline for this project.