# Model Deployment using Numpy Linear Classifier

## 1. Introduction
In this workbook, we look at the basics of deploying a model. For simplicity, we consider a numpy linear classifier $$ \mathbf{Y} = \mathbf{W} \mathbf{X} + \mathbf{b}$$

We take $\mathbf{X}$ to be 6-dimensional ($\mathbb{R}^6$), i.e. one data point $x \in \mathbf{X}$ is a numpy array of shape $(1,6)$. The output $\mathbf{Y}$ is 3-dimensional ($\mathbb{R}^3$). The weights $\mathbf{W}$ are then a numpy array of shape $(3,6)$ and the bias $\mathbf{b}$ is a numpy array of shape $(3,)$. Because data points are stored as rows, the code evaluates the model as $x \mathbf{W}^T + \mathbf{b}$.
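
As a quick check of the shapes described above, the following minimal sketch builds random arrays and evaluates the classifier with data points stored as rows. The names `x`, `W`, `b` and the seed are illustrative only and are not part of the deployed model.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, chosen only for reproducibility
x = rng.random((1, 6))           # one data point, shape (1, 6)
W = rng.random((3, 6))           # weights, shape (3, 6)
b = rng.random(3)                # bias, shape (3,)

# Row-vector form of Y = W X + b: (1, 6) @ (6, 3) -> (1, 3), bias broadcast over rows
y = x @ W.T + b
print(y.shape)  # (1, 3)
```

The same expression also handles a batch of shape $(n,6)$, since the bias broadcasts across rows.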

In this workbook, we demonstrate how to deploy this numpy linear classifier as a server and how to query it.

## 2. Imports and Dependencies
The required packages are loaded next. `numpy` and `mlflow` are used throughout this tutorial. The `requests` package is used to send queries, and `json` is used to encode the request payload and decode the server's response.

In [1]:
import os
import sys
import mlflow
import numpy as np
from mlflow import pyfunc

# Set the tracking URI (read from the TRACKING_URL environment variable) so that mlflow logs are written to that location
from mlflow.tracking import MlflowClient
tracking_uri = os.environ.get("TRACKING_URL")
client = MlflowClient(tracking_uri=tracking_uri)
mlflow.set_tracking_uri(tracking_uri)

## 3. Some utility functions

In [2]:
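## Utility function to add libraries to conda environment
def add_libraries_to_conda_env(_conda_env, libraries=None, conda_dependencies=None):
    # Use None defaults so that mutable lists are not shared between calls
    libraries = libraries or []
    conda_dependencies = conda_dependencies or []
    dependencies = _conda_env["dependencies"] + conda_dependencies
    # Locate the {"pip": [...]} entry among the conda dependencies
    pip_index = None
    for _index, _element in enumerate(dependencies):
        if isinstance(_element, dict) and "pip" in _element:
            pip_index = _index
            break
    if pip_index is None:
        # No pip section yet: create one instead of failing on dependencies[None]
        dependencies.append({"pip": list(libraries)})
    else:
        dependencies[pip_index]["pip"] = dependencies[pip_index]["pip"] + libraries
    _conda_env["dependencies"] = dependencies
    return _conda_env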
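
To illustrate what the helper does, here is a minimal self-contained sketch. The `sample_env` dictionary is a hypothetical stand-in for the environment returned by mlflow, and `add_pip_libraries` is a simplified illustration of the same idea, not the function above.

```python
import copy

# Hypothetical conda environment in the dict layout the helper expects
sample_env = {
    "name": "example-env",
    "channels": ["conda-forge"],
    "dependencies": ["python=3.8", "pip", {"pip": ["mlflow"]}],
}

def add_pip_libraries(conda_env, libraries):
    """Return a copy of conda_env with libraries appended to its pip section."""
    env = copy.deepcopy(conda_env)
    for dep in env["dependencies"]:
        if isinstance(dep, dict) and "pip" in dep:
            dep["pip"] = dep["pip"] + list(libraries)
            return env
    # No pip section found: add one
    env["dependencies"].append({"pip": list(libraries)})
    return env

updated = add_pip_libraries(sample_env, ["numpy"])
print(updated["dependencies"][-1])  # {'pip': ['mlflow', 'numpy']}
```

Working on a copy keeps the original dictionary unchanged, which can make repeated notebook runs easier to reason about.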

In [3]:
## Model Wrapper that takes X as input using json and predicts an output Y
class ModelWrapper(mlflow.pyfunc.PythonModel):
    def load_context(self,context):
        import numpy as np
        self.model = np.load(context.artifacts['model_path'], allow_pickle=True).tolist()
        print("Model initialized")
    
    def predict(self, context, model_input):
        import numpy as np
        import json
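        # The JSON payload is posted as text/csv, so it arrives split across the
        # column names of model_input; join them to recover the JSON text
        json_txt = ", ".join(model_input.columns)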
        data_list = json.loads(json_txt)
        inputs = np.array(data_list)
        if len(inputs.shape) == 2:
            print('batch inference')
            predictions = []
            for idx in range(inputs.shape[0]):
                prediction = np.matmul(inputs[idx,:],self.model['weights'].T) + self.model['bias']
                predictions.append(prediction.tolist())
        elif len(inputs.shape) == 1:
            print('single inference')
            # matrix product rather than element-wise *: (6,) x (6,3) -> (3,)
            predictions = np.matmul(inputs, self.model['weights'].T) + self.model['bias']
            predictions = predictions.tolist()
        else:
            raise ValueError('invalid input shape')
        return json.dumps(predictions)
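
The `predict` method relies on the JSON payload arriving in the column names of `model_input`. The sketch below shows why that round trip works, assuming the server parses a `text/csv` body as CSV so that the single line of JSON becomes the header row. Python's `csv` module stands in here for the server's parser, and the payload is made up.

```python
import csv
import io
import json

payload = json.dumps([[0.1, 0.2, 0.3, 0.4, 0.5, 0.6]])  # made-up data point

# A CSV parser splits the single line at commas, producing "column names"
columns = next(csv.reader(io.StringIO(payload)))
print(len(columns))  # 6

# Joining the pieces with commas restores valid JSON
recovered = json.loads(", ".join(columns))
print(recovered == [[0.1, 0.2, 0.3, 0.4, 0.5, 0.6]])  # True
```

If the server's parser renames duplicate header fields (as pandas does by default), repeated values in a payload would be altered, so this approach is fragile for inputs that contain identical numbers.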

In [4]:
# instantiate the python inference model wrapper for the server
model_wrapper = ModelWrapper()
# use the default pyfunc environment (the model needs numpy, not pytorch)
env = mlflow.pyfunc.get_default_conda_env()
env = add_libraries_to_conda_env(env,libraries=['numpy'])

# define the model weights randomly
np_weights = np.random.rand(3,6)
np_bias = np.random.rand(3)

# checkpointing and logging the model in mlflow
artifact_path = './np_model'
np.save(artifact_path, {'weights':np_weights, 'bias':np_bias})
model_artifacts = {"model_path" : artifact_path+'.npy'}
mlflow.pyfunc.log_model("np_model", python_model=model_wrapper, artifacts=model_artifacts, conda_env=env)
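
The checkpoint is a Python dict stored through `np.save`, which pickles it inside a 0-dimensional object array. The following minimal sketch shows the save/load round trip that `load_context` depends on, using a temporary directory (the file name is illustrative).

```python
import os
import tempfile
import numpy as np

params = {"weights": np.random.rand(3, 6), "bias": np.random.rand(3)}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "np_model")
    np.save(path, params)                      # np.save appends ".npy"
    loaded = np.load(path + ".npy", allow_pickle=True)

print(loaded.shape, loaded.dtype)              # () object
restored = loaded.tolist()                     # unwraps the 0-d array to the dict
print(sorted(restored))                        # ['bias', 'weights']
```

Loading requires `allow_pickle=True` because the dict is stored as a pickled object, so only checkpoints from a trusted source should be loaded this way.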

## 4. Deploying the model
The code above logs the model, which then appears in the experiments tab. For more information, see the [RocketML user guide](https://rocketml.gitbook.io/rocketml-user-guide/experiments). After deploying the model, we can obtain the model URL and use it for queries, as shown below.

## 5. Query from the server

There are two ways to query the server: the first uses the `requests` library and the second uses the `curl` shell command.

In [8]:
import requests
import json

url = "http://127.0.0.1:5011/invocations"
headers = {"Content-Type":"text/csv"}

# First case, run inference on single data point
np_array = np.random.rand(1,6).tolist()
json_data = json.dumps(np_array)
response = requests.post(url,data=json_data,headers=headers)
if response.status_code == 200:
    output = np.array(json.loads(response.json())).astype(np.float32)
    print(output)
else:
    print(response.status_code)
    print("REST API deployment is in progress -- please try again in a few minutes!")

# Second case, run inference on multiple data points
np_array = np.random.rand(20,6).tolist()
json_data = json.dumps(np_array)
response = requests.post(url,data=json_data,headers=headers)
if response.status_code == 200:
    output = np.array(json.loads(response.json())).astype(np.float32)
    print(output)
else:
    print(response.status_code)
    print("REST API deployment is in progress -- please try again in a few minutes!")

[[1.81811   2.361802  2.3531516]]
[[1.6456957 1.9675521 1.7067064]
 [2.5838118 2.8157287 2.2976384]
 [2.1801186 2.6144202 2.1953769]
 [1.488653  2.0579617 1.9867096]
 [1.809318  2.2533681 1.9978099]
 [1.6314347 2.6019633 2.2711916]
 [1.8024088 2.3680732 2.1608143]
 [1.5847387 2.1460922 1.6070927]
 [1.6557037 1.9244839 1.9006283]
 [1.6835335 2.0878038 1.7025021]
 [1.1496542 1.4508257 1.5056852]
 [1.8416998 2.6405902 2.111057 ]
 [2.2202673 2.6577232 2.1318862]
 [2.4101307 2.4593313 2.14497  ]
 [1.4908085 2.0428987 2.0579786]
 [1.6514273 2.2453768 2.2934952]
 [1.8059101 1.9135844 1.9444287]
 [1.8330411 2.6794806 2.3693244]
 [1.997308  2.4394622 2.3306658]
 [1.9428451 2.5416408 2.459957 ]]
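
The client calls `json.loads(response.json())` because the wrapper returns a JSON string, which the server then appears to encode as JSON once more. The sketch below shows that double decoding on a hypothetical response body built from the first output above.

```python
import json
import numpy as np

# Hypothetical raw body: a JSON string whose content is itself JSON
raw_body = json.dumps(json.dumps([[1.81811, 2.361802, 2.3531516]]))

inner_text = json.loads(raw_body)      # first decode: str, like response.json()
values = json.loads(inner_text)        # second decode: nested Python list
output = np.array(values).astype(np.float32)
print(type(inner_text).__name__, output.shape)  # str (1, 3)
```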


In [7]:
!curl http://127.0.0.1:5011/invocations -H 'Content-Type:text/csv' -d '[[0.6499166977064089, 0.17579454262114602, 0.2688911143313131, 0.7146591854799202, 0.6497433572112488, 0.7723469203958951]]'

"[[1.9308286100142427, 2.58603809605743, 2.3403451985121264]]"