# Deploy a Serverless XGBoost Model Server
  --------------------------------------------------------------------

The following notebook demonstrates how to deploy an XGBoost model server (a.k.a. <b>Nuclio-serving</b>).

#### **notebook how-to's**
* Write and test a model serving class in a notebook.
* Deploy the model server function.
* Invoke and test the serving function.

<a id='top'></a>
#### **steps**
**[define a new function and its dependencies](#define-function)**<br>
**[test the model serving class locally](#test-locally)**<br>
**[deploy our serving class as a serverless function](#deploy)**<br>
**[test our model server using HTTP request](#test-model-server)**<br>

In [1]:
# nuclio: ignore
import nuclio 

<a id='define-function'></a>
### **define a new function and its dependencies**

In [2]:
%nuclio config spec.build.baseImage = "mlrun/ml-serving:0.4.7"

%nuclio: setting spec.build.baseImage to 'mlrun/ml-serving:0.4.7'


In [3]:
%%nuclio cmd -c
pip install xgboost

## Function Code

In [4]:
import kfserving
import os
import json
import numpy as np
import xgboost as xgb

[I 200426 13:21:44 utils:129] Note: NumExpr detected 16 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
[I 200426 13:21:44 utils:141] NumExpr defaulting to 8 threads.


### Model Serving Class

In [5]:
class XGBoostModel(kfserving.KFModel):
    def __init__(self, name: str, model_dir: str):
        super().__init__(name)
        self.name = name
        # Despite the parameter name, this is the path to the model file itself.
        self.model_filepath = model_dir
        self._booster = None
        self.ready = False

    def load(self):
        # Load the saved booster from the model file and mark the server as ready.
        self._booster = xgb.Booster(model_file=self.model_filepath)
        self.ready = True

    def predict(self, body):
        try:
            # Use of list as input for XGBoost.DMatrix is deprecated 
            # see https://github.com/dmlc/xgboost/pull/3970
            # dtype should be ....
            np_array_2d = np.asarray(body['instances'], dtype=np.float32).reshape(-1, 1)
            dmatrix = xgb.DMatrix(np_array_2d)
            # Booster.predict returns a NumPy array, not a DMatrix.
            result: np.ndarray = self._booster.predict(dmatrix)
            return result.tolist()
        except Exception as e:
            # Chain the original exception so the traceback is preserved.
            raise Exception("Failed to predict %s" % e) from e

The following end-code annotation tells ```nuclio``` to stop parsing the notebook from this cell. _**Please do not remove this cell**_:

In [6]:
# nuclio: end-code
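
To see what `predict` does with the request body before it reaches XGBoost, the conversion step can be reproduced with NumPy alone. This is a minimal sketch: `to_feature_matrix` is a hypothetical helper name that mirrors the `np.asarray(...).reshape(-1, 1)` line in the class above, and it is not part of the serving class.

```python
import numpy as np

def to_feature_matrix(body):
    """Mirror the input conversion in XGBoostModel.predict (hypothetical helper)."""
    # reshape(-1, 1) flattens the input into a single column, so every
    # value becomes its own single-feature row.
    return np.asarray(body["instances"], dtype=np.float32).reshape(-1, 1)

matrix = to_feature_matrix({"instances": [[5], [10]]})
print(matrix.shape)  # (2, 1)
print(matrix.dtype)  # float32
```

Because of `reshape(-1, 1)`, a row with several features such as `[[1, 2]]` would also be split into two single-feature rows, so the class as written suits a model with one input feature.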

<a id='test-locally'></a>
## Test the function locally

The class above can be tested locally: instantiate it and call `.load()`, which loads the model from the local file path.

> **Verify there is a `model.bst` file in the model_dir path (generated by the training notebook)**

In [7]:
model_dir = '/User/artifacts/xgb_serving/model.bst'
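
The check requested above can be done in code before loading the model. The following is a minimal sketch, where `check_model_file` is a hypothetical helper and the path is the one used in this notebook.

```python
import os

def check_model_file(path):
    """Return True when path points to an existing regular file (hypothetical helper)."""
    return os.path.isfile(path)

model_dir = '/User/artifacts/xgb_serving/model.bst'
if not check_model_file(model_dir):
    # Failing early gives a clearer message than an error raised inside xgboost.
    print(f"model file not found: {model_dir}")
```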

In [8]:
my_server = XGBoostModel('my-model', model_dir=model_dir)
my_server.load()


We can use the `.predict(body)` method to test the model.

In [9]:
import json
my_server.predict({"instances":[[5], [10]]})



[[0.7837269902229309,
  0.025536756962537766,
  0.03395244851708412,
  0.02239767648279667,
  0.02239767648279667,
  0.02239767648279667,
  0.02239767648279667,
  0.02239767648279667,
  0.02239767648279667,
  0.02239767648279667],
 [0.7837269902229309,
  0.025536756962537766,
  0.03395244851708412,
  0.02239767648279667,
  0.02239767648279667,
  0.02239767648279667,
  0.02239767648279667,
  0.02239767648279667,
  0.02239767648279667,
  0.02239767648279667]]
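
Each output row holds one score per class. The rows above sum to roughly 1, which suggests class probabilities, although the notebook does not state how the model was trained. Under that assumption, a minimal sketch for picking the most likely class per row could look like this (`top_class` is a hypothetical helper):

```python
import numpy as np

def top_class(predictions):
    """Return a (class_index, score) pair per row (hypothetical helper)."""
    scores = np.asarray(predictions)
    indices = scores.argmax(axis=1)
    return [(int(i), float(row[i])) for i, row in zip(indices, scores)]

# One row copied from the output above.
row = [0.7837269902229309, 0.025536756962537766, 0.03395244851708412] + [0.02239767648279667] * 7
best = top_class([row, row])
print(best)
```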

### mlconfig

In [10]:
from mlrun import mlconf

In [11]:
mlconf.dbpath = mlconf.dbpath or './'
mlconf.dbpath

'http://mlrun-api:8080'

In [12]:
vcs_branch = 'development'
base_vcs = f'https://raw.githubusercontent.com/mlrun/functions/{vcs_branch}/'

# '{name}' is a literal placeholder in the hub URL template, so it must not be an f-string
mlconf.hub_url = mlconf.hub_url or base_vcs + '{name}/function.yaml'
mlconf.hub_url

'/User/repos/functions/{name}/function.yaml'
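
The value printed above keeps `{name}` as a literal placeholder, which suggests that the hub URL is a template filled in with a function name at lookup time. A minimal sketch of that substitution, assuming plain `str.format` semantics (`hub_function_url` is a hypothetical helper, not an MLRun API, and `xgb_serving` is only an example name):

```python
hub_template = 'https://raw.githubusercontent.com/mlrun/functions/development/{name}/function.yaml'

def hub_function_url(template, name):
    """Fill the {name} placeholder in a hub URL template (hypothetical helper)."""
    return template.format(name=name)

url = hub_function_url(hub_template, 'xgb_serving')
print(url)
```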

In [13]:
import os
mlconf.artifact_path = mlconf.artifact_path or f'{os.environ["V3IO_HOME"]}/artifacts'
mlconf.artifact_path

'/User/artifacts'
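
The cell above reads `os.environ["V3IO_HOME"]`, which raises `KeyError` when the variable is unset and no artifact path is configured. A more defensive version of the same fallback could be sketched as follows; `resolve_artifact_path` and the `./artifacts` default are assumptions made for illustration.

```python
import os

def resolve_artifact_path(current, environ=os.environ, default='./artifacts'):
    """Pick an artifact path without raising KeyError (hypothetical helper)."""
    if current:
        return current  # an explicitly configured path always wins
    home = environ.get('V3IO_HOME')
    # Fall back to an assumed local default when V3IO_HOME is not set.
    return f'{home}/artifacts' if home else default

print(resolve_artifact_path('', {'V3IO_HOME': '/User'}))  # /User/artifacts
print(resolve_artifact_path('', {}))                      # ./artifacts
```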

<a id='deploy'></a>
### **deploy our serving class as a serverless function**
In the following section we create a new model serving function that wraps our class, and specify the model and other resources.

The `models` dict stores model names and the associated model **dir** URLs (a URL can start with `S3://` or another blob store prefix). A faster option is a shared file volume: we use `.apply(mount_v3io())` to attach a v3io (Iguazio data fabric) volume to our function. By default, v3io mounts the current user's home into the `/User` function path.

**verify the model dir does contain a valid `model.bst` file**
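
Since a `models` entry can be either a path on a mounted volume or a blob store URL, it can help to tell the two apart before deploying. This is a minimal sketch using only the standard library; `is_local_path` is a hypothetical helper, the bucket name is made up, and which URL schemes the server accepts is not covered here.

```python
from urllib.parse import urlparse

def is_local_path(model_url):
    """True when the URL has no scheme, i.e. it is a plain file path (hypothetical helper)."""
    return urlparse(model_url).scheme == ''

models = {
    'iris_v1': '/User/artifacts/xgb_serving/model.bst',
    'iris_remote': 's3://my-bucket/models/model.bst',  # hypothetical bucket
}
for name, url in models.items():
    print(name, 'local' if is_local_path(url) else 'remote')
```

Local entries are the ones that need a mounted volume inside the function.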

In [14]:
from mlrun import new_model_server, mount_v3io
import requests

In [15]:
fn = new_model_server('iris-srv',
                      model_class='XGBoostModel',
                      models={'iris_v1': '/User/artifacts/xgb_serving/model.bst'})
fn.spec.description = "xgboost iris classification server"
fn.metadata.categories = ['serving', 'models']
fn.metadata.labels = {'author': 'yaronh'}
fn.spec.imagePullPolicy = "Always"

fn.save()
fn.export("function.yaml")

[mlrun] 2020-04-26 13:22:02,033 saving function: iris-srv, tag: latest
[mlrun] 2020-04-26 13:22:02,055 function spec saved to path: function.yaml


<mlrun.runtimes.function.RemoteRuntime at 0x7f32da19fbb0>

## tests

In [16]:
if "V3IO_HOME" in os.environ:
    from mlrun import mount_v3io
    fn.apply(mount_v3io())
else:
    # if you set up mlrun using the instructions at
    # https://github.com/mlrun/mlrun/blob/master/hack/local/README.md
    from mlrun.platforms import mount_pvc
    fn.apply(mount_pvc('nfsvol', 'nfsvol', '/home/joyan/data'))

In [17]:
addr = fn.deploy()

[mlrun] 2020-04-26 13:22:02,102 deploy started
[nuclio] 2020-04-26 13:22:05,190 (info) Build complete
[nuclio] 2020-04-26 13:22:11,261 (info) Function deploy complete
[nuclio] 2020-04-26 13:22:11,266 done updating iris-srv, function address: 18.221.173.138:32234
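
The log line above prints the function address without a URL scheme. Whether the value returned by `fn.deploy()` already includes `http://` is not shown in this notebook, so a small helper that tolerates both forms can make the request URL explicit. `predict_url` is a hypothetical helper, and the `/<model>/predict` path is taken from the request used in the test section of this notebook.

```python
def predict_url(addr, model_name):
    """Build the predict endpoint URL for a model (hypothetical helper)."""
    if not addr.startswith(('http://', 'https://')):
        addr = 'http://' + addr  # assumption: plain HTTP when no scheme is given
    return f"{addr.rstrip('/')}/{model_name}/predict"

print(predict_url('18.221.173.138:32234', 'iris_v1'))
```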


<a id="test-model-server"></a>
### **test our model server using HTTP request**


We invoke our model serving function with test data. The data vectors are specified in the `instances` attribute.

In [18]:
# KFServing protocol event
event_data = {"instances": [[5], [10]]}
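
A malformed event only fails once it reaches the server, so a quick client-side check of the payload shape can save a round trip. The following is a minimal sketch; `validate_event` is a hypothetical helper and checks only the structure used in this notebook, not the full KFServing protocol.

```python
def validate_event(event):
    """Raise ValueError unless event looks like {"instances": [[...], ...]} (hypothetical helper)."""
    instances = event.get('instances')
    if not isinstance(instances, list) or not instances:
        raise ValueError('"instances" must be a non-empty list')
    if not all(isinstance(row, list) for row in instances):
        raise ValueError('every entry in "instances" must be a list')
    # All rows should carry the same number of values.
    if len({len(row) for row in instances}) != 1:
        raise ValueError('all rows in "instances" must have the same length')
    return event

checked = validate_event({"instances": [[5], [10]]})
```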

In [19]:
import json
resp = requests.put(addr + '/iris_v1/predict', json=json.dumps(event_data))
print(resp.text)

[[0.7837269902229309, 0.025536756962537766, 0.03395244851708412, 0.02239767648279667, 0.02239767648279667, 0.02239767648279667, 0.02239767648279667, 0.02239767648279667, 0.02239767648279667, 0.02239767648279667], [0.7837269902229309, 0.025536756962537766, 0.03395244851708412, 0.02239767648279667, 0.02239767648279667, 0.02239767648279667, 0.02239767648279667, 0.02239767648279667, 0.02239767648279667, 0.02239767648279667]]


**[back to top](#top)**