# Deploy a Serverless Model Server with Nuclio-KFServing
  --------------------------------------------------------------------

The following notebook demonstrates how to deploy an XGBoost model using nuclio + KFServing (a.k.a. <b>Nuclio-serving</b>).

#### **notebook how-to's**
* Write and test a KFServing-like model-serving class in a notebook.
* Deploy the model server as a Nuclio-serving function.
* Invoke and test the serving function.

<a id='top'></a>
#### **steps**
**[define a new function and its dependencies](#define-function)**<br>
**[test the model serving class locally](#test-locally)**<br>
**[deploy our serving class as a serverless function](#deploy)**<br>
**[test our model server using HTTP request](#test-model-server)**<br>

In [2]:
# nuclio: ignore
# if the nuclio-jupyter package is not installed, run: !pip install nuclio-jupyter
import nuclio 

<a id='define-function'></a>
### **define a new function and its dependencies**

In [2]:
%nuclio config kind="nuclio:serving"
%nuclio env MODEL_CLASS=XGBoostModel

%nuclio: setting kind to 'nuclio:serving'
%nuclio: setting 'MODEL_CLASS' environment variable
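`MODEL_CLASS` names the class in this notebook that the serving runtime should instantiate (it matches the `model_class='XGBoostModel'` argument used at deploy time). The lookup-by-name idea can be sketched as follows; `resolve_model_class` and `EchoModel` are hypothetical and this is not the MLRun/nuclio implementation. The cell starts with `# nuclio: ignore` so it would not be exported into the function.

```python
# nuclio: ignore
import os


class EchoModel:
    """Hypothetical stand-in for a serving class such as XGBoostModel."""

    def predict(self, body):
        # return the inputs unchanged, just to show the class was found
        return body["instances"]


def resolve_model_class(namespace, env_var="MODEL_CLASS"):
    """Return the class named by the environment variable `env_var`.

    Hypothetical helper that illustrates the idea; it is not MLRun/nuclio code.
    """
    class_name = os.environ.get(env_var)
    if not class_name:
        raise RuntimeError(f"environment variable {env_var} is not set")
    cls = namespace.get(class_name)
    if not isinstance(cls, type):
        raise LookupError(f"no class named {class_name!r} in the given namespace")
    return cls


os.environ["MODEL_CLASS"] = "EchoModel"  # mirrors: %nuclio env MODEL_CLASS=...
model_cls = resolve_model_class(globals())
print(model_cls.__name__)  # EchoModel
```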


In [None]:
%%nuclio cmd
pip install numpy
pip install xgboost

In [2]:
%nuclio config spec.build.baseImage = "mlrun/mlrun"

%nuclio: setting spec.build.baseImage to 'mlrun/mlrun'


In [3]:
import os
import numpy as np
import xgboost as xgb
from mlrun.runtimes import MLModelServer

  import pandas.util.testing as tm


In [4]:
class XGBoostModel(MLModelServer):
    def load(self):
        # locate the .bst model file in the model dir and load it into a booster
        model_file, _ = self.get_model('.bst')
        self._booster = xgb.Booster(model_file=model_file)

    def predict(self, body):
        try:
            # Use of list as input is deprecated, see https://github.com/dmlc/xgboost/pull/3970
            events = np.array(body['instances'])
            dmatrix = xgb.DMatrix(events)
            # Booster.predict returns a numpy array, not a DMatrix
            result: np.ndarray = self._booster.predict(dmatrix)
            return result.tolist()
        except Exception as e:
            # chain the original exception so its traceback is preserved
            raise Exception("Failed to predict: %s" % e) from e
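To exercise the same `load()` / `predict(body)` contract without MLRun or XGBoost installed, here is a toy sketch. `MeanModel` is hypothetical: it does not subclass `MLModelServer` and simply returns the mean of each input row. It is marked `# nuclio: ignore` so it stays out of the exported function.

```python
# nuclio: ignore
class MeanModel:
    """Hypothetical toy model with the same load()/predict(body) shape as XGBoostModel."""

    def __init__(self, name):
        self.name = name
        self.ready = False

    def load(self):
        # a real server would read model weights here (e.g. a .bst file)
        self.ready = True

    def predict(self, body):
        if not self.ready:
            raise RuntimeError("model not loaded; call load() first")
        try:
            rows = body["instances"]
            # one prediction per instance: the row mean
            return [sum(row) / len(row) for row in rows]
        except (KeyError, TypeError, ZeroDivisionError) as e:
            # keep the original error attached, as the XGBoostModel class does
            raise ValueError(f"Failed to predict: {e}") from e


toy = MeanModel("toy")
toy.load()
print(toy.predict({"instances": [[5], [10]]}))  # [5.0, 10.0]
```

Catching only the expected error types and chaining with `from e` keeps the root cause visible in the traceback, which helps when the server only reports a wrapped message.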


The following end-code annotation tells `nuclio` to stop parsing the notebook from this cell. _**Please do not remove this cell**_:

In [5]:
# nuclio: end-code
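The two annotations used in this notebook, `# nuclio: ignore` and `# nuclio: end-code`, decide which cells become function code. As a simplified model (not the nuclio-jupyter exporter itself, and it ignores `%nuclio` magics), the filtering can be sketched as: skip a cell whose first line is the ignore marker and stop at the end-code cell. `exported_cells` and `first_line` are hypothetical helpers.

```python
IGNORE_MARK = "# nuclio: ignore"
END_MARK = "# nuclio: end-code"


def first_line(source):
    """Return the first non-empty line of a cell, stripped."""
    for line in source.splitlines():
        if line.strip():
            return line.strip()
    return ""


def exported_cells(cells):
    """Return the cells that would end up in the function code.

    Simplified model: skip a cell whose first line is IGNORE_MARK, and stop
    at the first cell whose first line is END_MARK (that cell is dropped too).
    """
    kept = []
    for source in cells:
        marker = first_line(source)
        if marker == END_MARK:
            break
        if marker == IGNORE_MARK:
            continue
        kept.append(source)
    return kept


notebook_cells = [
    "# nuclio: ignore\nimport nuclio",
    "import os\nimport numpy as np",
    "class XGBoostModel: ...",
    "# nuclio: end-code",
    "my_server = XGBoostModel('my-model', model_dir='.')",
]
print(exported_cells(notebook_cells))
# ['import os\nimport numpy as np', 'class XGBoostModel: ...']
```

This is why the local test cells below do not leak into the deployed function.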

______________________________________________

<a id='test-locally'></a>
### **test the model serving class locally**
The class above can be tested locally: instantiate it and call `.load()`, which loads the model from the local model dir.

> **Verify there is a `model.bst` file in the `model_dir` path (generated by the training notebook).**

In [6]:
# a valid model.bst file MUST EXIST in the model dir
model_dir = os.path.abspath('./')
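Before calling `.load()`, a quick pre-flight check can confirm the model file is present. This is a hypothetical helper that only inspects the local directory; it mirrors the `.bst` suffix passed to `self.get_model('.bst')` and assumes exactly one such file is expected.

```python
import os
import tempfile


def find_model_file(model_dir, suffix=".bst"):
    """Return the path of the single model file with `suffix` in model_dir.

    Hypothetical pre-flight check; it assumes exactly one matching file.
    """
    if not os.path.isdir(model_dir):
        raise FileNotFoundError(f"model dir does not exist: {model_dir}")
    matches = sorted(f for f in os.listdir(model_dir) if f.endswith(suffix))
    if not matches:
        raise FileNotFoundError(f"no *{suffix} file found in {model_dir}")
    if len(matches) > 1:
        raise ValueError(f"multiple *{suffix} files found: {matches}")
    return os.path.join(model_dir, matches[0])


# demo on a throwaway directory containing an empty placeholder model.bst
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "model.bst"), "wb").close()
    found = find_model_file(tmp)
    print(os.path.basename(found))  # model.bst
```

In the notebook itself you would call `find_model_file(model_dir)`.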

In [7]:
my_server = XGBoostModel('my-model', model_dir=model_dir)
my_server.load()


We can use the `.predict(body)` method to test the model.

In [8]:
my_server.predict({"instances": [[5], [10]]})



[[0.5269981026649475,
  0.054608359932899475,
  0.056971631944179535,
  0.05163170397281647,
  0.05163170397281647,
  0.05163170397281647,
  0.05163170397281647,
  0.05163170397281647,
  0.05163170397281647,
  0.05163170397281647],
 [0.5269981026649475,
  0.054608359932899475,
  0.056971631944179535,
  0.05163170397281647,
  0.05163170397281647,
  0.05163170397281647,
  0.05163170397281647,
  0.05163170397281647,
  0.05163170397281647,
  0.05163170397281647]]
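The request body follows the KFServing V1 convention, `{"instances": [...]}` with one row per instance. The V1 protocol's response shape is `{"predictions": [...]}`, whereas `XGBoostModel.predict` above returns the bare list, as the output shows. The hypothetical helpers below build a body and validate that rows are equally long, and show the V1 response wrapping for comparison.

```python
import json


def make_v1_request(rows):
    """Build a KFServing V1-style request body: {"instances": [...]}."""
    rows = [list(r) for r in rows]
    widths = {len(r) for r in rows}
    if len(widths) > 1:
        raise ValueError(f"all instances must have the same length, got {sorted(widths)}")
    return {"instances": rows}


def wrap_v1_response(predictions):
    """Wrap raw predictions as {"predictions": [...]}, the V1 response shape.

    Note: the XGBoostModel class in this notebook returns the bare list instead.
    """
    return {"predictions": predictions}


body = make_v1_request([[5], [10]])
print(json.dumps(body))  # {"instances": [[5], [10]]}
```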

<a id='deploy'></a>
### **deploy our serving class as a serverless function**
In the following section we create a new model serving function that wraps our class, and specify the model and other resources.

The `models` dict stores model names and the associated model **dir** URL (the URL can start with `s3://` or use other blob store options). The faster way is to use a shared file volume: we use `.apply(mount_v3io())` to attach a v3io (Iguazio data fabric) volume to our function. By default, v3io mounts the current user's home directory to the `/User` path in the function.

**Verify that the model dir contains a valid `model.bst` file.**
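Because each `models` entry is a URL, it can help to check where each one points before deploying: a plain path must exist inside the function container (which is what the v3io mount provides), while an `s3://` URL points at a blob store. This is a hypothetical helper using only `urllib.parse`; the example paths and bucket name are made up.

```python
from urllib.parse import urlparse


def describe_model_sources(models):
    """Map each model name to (scheme, location) for its model dir URL.

    Hypothetical helper: a URL without a scheme is treated as a local or
    mounted path (e.g. under the /User mount added by mount_v3io()).
    """
    result = {}
    for name, url in models.items():
        parsed = urlparse(url)
        if parsed.scheme:
            # urlparse lower-cases the scheme, so "S3://" and "s3://" match
            result[name] = (parsed.scheme, parsed.netloc + parsed.path)
        else:
            result[name] = ("file", parsed.path)
    return result


models = {
    "iris_v1": "/User/demos/models",          # hypothetical mounted path
    "iris_v2": "s3://my-bucket/models/iris",  # hypothetical S3 bucket
}
for name, (scheme, location) in describe_model_sources(models).items():
    print(name, scheme, location)
```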

In [9]:
from mlrun import new_model_server, mount_v3io
import requests

In [10]:
fn = new_model_server('iris-srv', 
                      models={'iris_v1': model_dir}, 
                      model_class='XGBoostModel')

fn.apply(mount_v3io()) 

<mlrun.runtimes.function.RemoteRuntime at 0x7feae58a2748>

In [11]:
addr = fn.deploy()

[mlrun] 2020-04-27 11:03:19,753 deploy started
[nuclio] 2020-04-27 11:04:12,561 (info) Build complete
[nuclio] 2020-04-27 11:04:24,756 (info) Function deploy complete
[nuclio] 2020-04-27 11:04:24,763 done updating default-iris-srv, function address: 13.58.191.176:32112
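The deploy log prints a bare `host:port` address, while the request cell appends `/<model name>/predict` to `addr`. A small hypothetical helper makes that URL construction explicit and adds an `http://` scheme if the address lacks one.

```python
def predict_url(addr, model_name):
    """Build the /<model_name>/predict URL for a deployed serving function.

    Adds an http:// scheme when the address has none (the deploy log prints a
    bare host:port). Hypothetical helper for illustration.
    """
    if "://" not in addr:
        addr = "http://" + addr
    return f"{addr.rstrip('/')}/{model_name}/predict"


print(predict_url("13.58.191.176:32112", "iris_v1"))
# http://13.58.191.176:32112/iris_v1/predict
```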


<a id="test-model-server"></a>
### **test our model server using HTTP request**


We invoke our model serving function using test data; the data vector is specified in the `instances` attribute.

In [12]:
# KFServing protocol event
event_data = {"instances":[[5], [10]]}

In [13]:
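# pass the dict directly: requests serializes it to JSON
# (json=json.dumps(event_data) would double-encode the body as a JSON string)
resp = requests.put(addr + '/iris_v1/predict', json=event_data)
print(resp.text)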

[[0.5269981026649475, 0.054608359932899475, 0.056971631944179535, 0.05163170397281647, 0.05163170397281647, 0.05163170397281647, 0.05163170397281647, 0.05163170397281647, 0.05163170397281647, 0.05163170397281647], [0.5269981026649475, 0.054608359932899475, 0.056971631944179535, 0.05163170397281647, 0.05163170397281647, 0.05163170397281647, 0.05163170397281647, 0.05163170397281647, 0.05163170397281647, 0.05163170397281647]]
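If `requests` is not available, the same PUT call can be sketched with the standard library. The code below builds (but does not send) the request and parses response text, assuming the response is a JSON list with one row of scores per instance as printed above. The `localhost:8080` URL is a placeholder.

```python
import json
import urllib.request


def build_predict_request(url, event_data):
    """Build (but do not send) a PUT request with a JSON body, stdlib only."""
    data = json.dumps(event_data).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def parse_predictions(text):
    """Parse response text, assumed to be a JSON list of per-instance rows."""
    predictions = json.loads(text)
    if not isinstance(predictions, list):
        raise ValueError("expected a JSON list of predictions")
    return predictions


req = build_predict_request("http://localhost:8080/iris_v1/predict",
                            {"instances": [[5], [10]]})
print(req.get_method())  # PUT
# to send it: urllib.request.urlopen(req).read().decode() returns the response text
```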


**[back to top](#top)**