# Boston Housing: model serving with MLFlow

This notebook is intended to demonstrate the use of MLFlow for deploying models as a prediction service. 

It follows on from the notebook in the Experimentation folder (BostonHousing_mlflow.ipynb), picking up where that notebook finished. First, we use the MLFlow UI to select the best performing model, and then we deploy it.

Multiple deployment methods are outlined:
1. Deploying the model on a local REST server
2. Deploying the model as a containerised service locally on Docker Desktop
3. Deploying the model to a remote Kubernetes instance using Seldon Core

In each case, the model is deployed as a RESTful web service. Curl has been used in each case to test the exposed API. 

**Goal:** *deploy a model as a web service for predicting house prices.*

### Prepare environment

**Import libraries**

In [2]:
import numpy as np
import pandas as pd
import joblib
import os
from yaml import load, Loader

from seldon_core.seldon_client import SeldonClient

In [3]:
import warnings
warnings.filterwarnings("ignore")

**Load config for environment variables**

In [4]:
with open(os.path.join(os.getcwd(),'config.yaml'),'r') as config_file:
    config = load(config_file, Loader=Loader)

docker_registry = config['DOCKER_REGISTRY']
service_name = config['SERVICE_NAME']
api_version = config['API_VERSION']
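
The values loaded above can be combined into the image reference used in the later Docker and Helm steps. The sketch below assumes that `DOCKER_REGISTRY`, `SERVICE_NAME` and `API_VERSION` map onto the registry path, image name and tag respectively; that mapping and the helper name `image_reference` are assumptions for illustration, not something the config file guarantees.

```python
def image_reference(registry: str, name: str, version: str) -> str:
    """Join a registry path, image name and tag into `registry/name:tag`."""
    # strip stray slashes so "registry/" and "/name" do not produce "//"
    return f"{registry.strip('/')}/{name.strip('/')}:{version}"


# example values chosen to match the image name used later in this notebook
example = image_reference("edlongbottom/mlwebservice", "bostonhousing", "0.0.3")
print(example)
```

In the notebook this would be called as `image_reference(docker_registry, service_name, api_version)`, so that the build, run and deploy commands all refer to the same image.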

### Review models in the MLFlow UI 

Start up a local tracking server and point it towards the experiment SQLite db (for entities) and local file storage (for artefacts) using the mlflow CLI. Then, navigate to http://localhost:5000/ in a browser to see the MLFlow UI and compare models. 

In [5]:
!mlflow server `
    --backend-store-uri 'sqlite:///../experimentation/mlruns.db' `
    --default-artifact-root ../experimentation/mlruns `
    --host 0.0.0.0



Look through the runs in MLFlow and select the best performing model from the relevant experiment. Assign the associated experiment and run ids to their respective variables.

In [5]:
experiment_id = '1'
run_id = '58a37f2f98854eb59d1ba17aa97c310c'
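
The best run can also be looked up programmatically rather than read off the UI. The following is a minimal sketch using `mlflow.search_runs`; the metric name `rmse` is an assumption (substitute whichever metric was logged in the Experimentation notebook), and the helper names are illustrative.

```python
def order_clause(metric: str, higher_is_better: bool = False) -> str:
    """Build an MLflow `order_by` clause that puts the best run first."""
    direction = "DESC" if higher_is_better else "ASC"
    return f"metrics.{metric} {direction}"


def find_best_run_id(tracking_uri: str, experiment_id: str,
                     metric: str = "rmse", higher_is_better: bool = False) -> str:
    """Return the id of the best run in an experiment (metric name is an assumption)."""
    import mlflow  # imported lazily so order_clause can be used without mlflow

    mlflow.set_tracking_uri(tracking_uri)
    runs = mlflow.search_runs(
        experiment_ids=[experiment_id],
        order_by=[order_clause(metric, higher_is_better)],
        max_results=1,
    )
    if runs.empty:
        raise ValueError(f"no runs found in experiment {experiment_id}")
    return runs.iloc[0]["run_id"]
```

For example, `find_best_run_id('sqlite:///../experimentation/mlruns.db', '1')` would query the same backend store that the tracking server above points at.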

### Sense check the model

Sense check the pipeline before deploying it: load the chosen pipeline manually and pass it records from the raw data to see whether it generates sensible predictions.

In [11]:
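# load data for testing (the raw dataset used in Experimentation)
# the column transformer expects its input as a pandas DataFrame
# note: if housing.csv also holds the target as a 14th column (MEDV), add 'MEDV' to cols and drop it before
# predicting; with only 13 names pandas uses the first column as the index, which shifts every feature by one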
cols=['CRIM','ZN','INDUS','CHAS','NOX','RM','AGE','DIS','RAD','TAX','PTRATIO','B','LSTAT']
df = pd.read_csv('../experimentation/datasets/housing.csv',sep=' ',skipinitialspace=True,header=None,names=cols)

In [16]:
# load the chosen pipeline and run it on the data
poly_pipeline = joblib.load(f'../experimentation/mlruns/{experiment_id}/{run_id}/artifacts/model/model.pkl')
poly_pipeline.predict(df)

array([-692807.61674176, -766423.05892107, -667810.80847113,
       -639701.97375985, -668342.41224398, -660224.81584541,
       -691817.61151689, -927892.32778756, -920715.73311918,
       -774004.59202979, -878104.44762302, -794151.65275894,
       -602895.49202216, -675499.83480587, -732619.57215977,
       -652173.63866581, -585283.06618044, -737935.43930928,
       -243165.41408338, -679566.84204045, -853352.92375651,
       -829425.18329416, -869703.90097452, -952473.25220569,
       -883771.81240339, -486895.04227474, -771528.87639355,
       -524138.76009542, -861584.89663875, -760221.29575106,
       -744350.22955016, -878046.66449642, -315546.56172076,
       -749575.27999979, -491999.62966795, -701096.34251934,
       -588016.77170139, -633648.17672167, -617811.82528888,
       -639030.3558957 , -648513.91302273, -602187.78591626,
       -587794.0994141 , -640354.86117511, -597611.55696229,
       -628750.84546731, -629452.52015124, -794630.43586567,
       -907679.04425899,
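
A sense check is easier to repeat if the expectation is written down. In the Boston Housing data the target (median home value) is recorded in thousands of dollars and lies between 5 and 50, so predictions far outside that range suggest a problem with the inputs or the pipeline. The sketch below flags such values; the function name and the default bounds are illustrative assumptions.

```python
def implausible_predictions(predictions, low=0.0, high=100.0):
    """Return (index, value) pairs for predictions outside [low, high]."""
    return [(i, float(p)) for i, p in enumerate(predictions)
            if not low <= float(p) <= high]


# illustrative values: one plausible price and one clearly implausible one
flagged = implausible_predictions([21.14, -692807.62])
print(flagged)
```

Calling `implausible_predictions(poly_pipeline.predict(df))` applies the same check to the pipeline output; an empty list means every prediction fell inside the bounds.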

### Serve the model on a local REST server

The model can be deployed with a local REST server to create a prediction web service. Use the mlflow CLI to serve your chosen model (include path to model) and expose it at port 1234.

Note: this step requires Conda to be installed. 

In [None]:
!mlflow models serve -m ../experimentation/mlruns/1/6dca30cc19b44d359dbaf994cee1084a/artifacts/model -p 1234

Test the prediction web service using Curl or the python requests module.

In [None]:
# use Curl to test the web service (shell)
!curl -X POST -H "Content-Type:application/json; format=pandas-split" \
    --data '{"columns":["INDUS","RM","TAX","PTRATIO","LSTAT"],"data":[[5.86, 6.108, 330.0, 19.1, 9.16]]}' http://127.0.0.1:1234/invocations

In [29]:
# use Curl to test the web service (windows dos)
!curl -X POST -H "Content-Type:application/json; format=pandas-split" --data "{\"columns\":[\"INDUS\",\"RM\",\"TAX\",\"PTRATIO\",\"LSTAT\"],\"data\":[[5.86, 6.108, 330.0, 19.1, 9.16]]}" http://127.0.0.1:1234/invocations

[21.139822721718414]
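
The same request can be sent from Python. The sketch below builds the `pandas-split` payload and posts it with the standard library's `urllib` (used here in place of `requests` to avoid an extra dependency); the helper names are illustrative, and the header mirrors the one passed to Curl above.

```python
import json
import urllib.request


def pandas_split_payload(columns, rows):
    """Serialise feature names and rows in the pandas 'split' orientation."""
    for row in rows:
        if len(row) != len(columns):
            raise ValueError("each row needs one value per column")
    return json.dumps({"columns": list(columns), "data": [list(r) for r in rows]})


def invoke(url, columns, rows, timeout=10):
    """POST the payload to an /invocations endpoint and decode the JSON reply."""
    request = urllib.request.Request(
        url,
        data=pandas_split_payload(columns, rows).encode("utf-8"),
        headers={"Content-Type": "application/json; format=pandas-split"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# build (but do not send) the request body used in the Curl example
payload = pandas_split_payload(
    ["INDUS", "RM", "TAX", "PTRATIO", "LSTAT"], [[5.86, 6.108, 330.0, 19.1, 9.16]]
)
print(payload)
```

With the server from the previous cell running, `invoke('http://127.0.0.1:1234/invocations', columns, rows)` should return the list of predictions.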




### Deploy the model as a containerised service using Docker

Build a docker image of the containerised model using the mlflow CLI. 

In [None]:
# build the docker image (powershell)
!mlflow models build-docker `
  -m ../experimentation/mlruns/1/58a37f2f98854eb59d1ba17aa97c310c/artifacts/model `
  -n edlongbottom/mlwebservice/bostonhousing:0.0.3 `
  --enable-mlserver

Serve the built image on Docker and map the web service port (8080) to localhost port (1234). The container is run detached (`-d`) because a notebook cell has no TTY for the interactive `-it` flags.

In [23]:
!docker run -d -p 1234:8080 --name test-ml-model edlongbottom/mlwebservice/bostonhousing:0.0.3


Test the prediction service API using the same request format as with the local REST server

In [27]:
#cols=['CRIM','ZN','INDUS','CHAS','NOX','RM','AGE','DIS','RAD','TAX','PTRATIO','B','LSTAT']
#vals = [18.0,2.31,0.0,0.538,6.575,65.2,4.09,1.0,296.0,15.3,396.9,4.98,24.0]
!curl -X POST -H "Content-Type:application/json; format=pandas-split" --data "{\"columns\":[\"CRIM\",\"ZN\",\"INDUS\",\"CHAS\",\"NOX\",\"RM\",\"AGE\",\"DIS\",\"RAD\",\"TAX\",\"PTRATIO\",\"B\",\"LSTAT\"],\"data\":[[18.0,2.31,0.0,0.538,6.575,65.2,4.09,1.0,296.0,15.3,396.9,4.98,24.0]]}" http://127.0.0.1:1234/invocations

[-692807.6167417627]




Tear down the service once testing is complete

In [28]:
!docker stop test-ml-model
!docker rm test-ml-model

test-ml-model
test-ml-model


### Deploy the model to Kubernetes using Seldon Core

Using Seldon Core introduces a number of benefits:
 - Pre-built reusable model servers available as docker images with associated CRDs / helm charts
 - Model servers available for models packaged using sklearn or mlflow
 - Automated ingress configuration
 - Integration with monitoring and analytics solutions
 
Ingress exposes a web server externally that functions as a reverse proxy, routing traffic to the relevant model API and providing load balancing and TLS termination. This supports fault tolerance and scalability.

**Deploy the containerised model**

Set the current context in your kubectl CLI to the chosen Kubernetes cluster. If Seldon Core is not already installed on the cluster, install it using the Helm chart below.

In [None]:
# create a dedicated namespace for seldon core
!kubectl create namespace seldon-system

# use helm to install seldon-core from the template helm chart (powershell)
!helm install seldon-core seldon-core-operator `
    --repo https://storage.googleapis.com/seldon-charts `
    --set usageMetrics.enabled=true `
    --set ambassador.enabled=true `
    --namespace seldon-system

Next, deploy the containerised model to the Kubernetes cluster using Helm. Seldon Core provides Helm chart templates that can be used for the deployment; alternatively, the MLFlow website has an example YAML manifest that can be turned into a custom Helm chart (the approach taken here).

In [None]:
# deploy using helm (the model will be deployed to the namespace model-serving which is defined in the chart)
!helm install mlflow-seldon-model ./helm-mlflow-deployment `
    --set image.tag=0.0.3 

**Test the web service**

Port-forward the pod containerPort to localhost port 1234 for testing.

In [None]:
!kubectl port-forward pod/mlflow-seldon-model-default-0-mlflow-seldon-model-587f9f95gxfzw 1234:8080 -n model-serving

Use Curl to test the prediction service API using the same request format as with the local REST server

In [None]:
!curl -X POST -H "Content-Type:application/json; format=pandas-split" --data "{\"columns\":[\"CRIM\",\"ZN\",\"INDUS\",\"CHAS\",\"NOX\",\"RM\",\"AGE\",\"DIS\",\"RAD\",\"TAX\",\"PTRATIO\",\"B\",\"LSTAT\"],\"data\":[[18.0,2.31,0.0,0.538,6.575,65.2,4.09,1.0,296.0,15.3,396.9,4.98,24.0]]}" http://127.0.0.1:1234/invocations

**Tear down**

Tear down the service once complete (leave the seldon-core-operator in place if continuing to the next section)

In [None]:
# uninstall the model deployment
!helm uninstall mlflow-seldon-model -n model-serving

# uninstall the seldon-core operator (uncomment to perform the uninstall)
#!helm uninstall seldon-core -n seldon-system
#!kubectl delete namespace seldon-system

### Further steps for a robust, scalable deployment with ingress

Seldon Core's guidelines must be followed to package the chosen model in the required format. The model must be wrapped in a Python class that runs it, accompanied by a requirements.txt file that includes the seldon-core package, and containerised so that the container runs the seldon-core-microservice.
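
As an illustration of the wrapper described above, the sketch below follows Seldon Core's Python wrapper convention of a class exposing `predict(X, features_names)`. The class name `BostonHousingModel`, the `model.pkl` path and the `_to_frame` helper are assumptions for illustration, not the contents of the `seldon-deployment` folder.

```python
class BostonHousingModel:
    """Minimal model wrapper in the style expected by seldon-core-microservice."""

    def __init__(self, model_path="model.pkl"):
        self.model_path = model_path  # hypothetical location inside the image
        self.model = None  # loaded lazily on the first prediction

    def load(self):
        import joblib  # imported here so the class can be defined without joblib

        self.model = joblib.load(self.model_path)

    def _to_frame(self, X, features_names):
        # the pipeline's column transformer expects a DataFrame,
        # so rebuild one from the raw array and the feature names
        import pandas as pd

        return pd.DataFrame(X, columns=features_names)

    def predict(self, X, features_names=None):
        if self.model is None:
            self.load()
        return self.model.predict(self._to_frame(X, features_names))
```

Assuming the class is saved in a file named `BostonHousingModel.py`, the container would typically start it with `seldon-core-microservice BostonHousingModel`; the exact arguments depend on the installed seldon-core version.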

First, copy the chosen model into the deployment directory.

In [6]:
model_pipeline = joblib.load(f'../experimentation/mlruns/{experiment_id}/{run_id}/artifacts/model/model.pkl')
joblib.dump(model_pipeline, './seldon-deployment/model.pkl')

['./seldon-deployment/model.pkl']

Build the docker image and run locally for testing

In [None]:
# build the docker image
!docker build -t edlongbottom/mlwebservice/bostonhousing:0.0.4 . 

# run the service locally using docker (detached, since a notebook cell has no TTY for -it)
!docker run -d -p 5000:5000 --name test-ml-model edlongbottom/mlwebservice/bostonhousing:0.0.4

Test the service using Curl

In [36]:
# pass feature column names and values in curl request (linux)
# note: the port in the URL must be one that `docker run` publishes with -p
# cols: ["CRIM","ZN","INDUS","CHAS","NOX","RM","AGE","DIS","RAD","TAX","PTRATIO","B","LSTAT"]
# vals: [18.0,2.31,0.0,0.538,6.575,65.2,4.09,1.0,296.0,15.3,396.9,4.98,24.0]
!curl -X POST -H 'Content-Type: application/json' -d '{"data": { "names":["CRIM","ZN","INDUS","CHAS","NOX","RM","AGE","DIS","RAD","TAX","PTRATIO","B","LSTAT"],"ndarray": [[18.0,2.31,0.0,0.538,6.575,65.2,4.09,1.0,296.0,15.3,396.9,4.98,24.0]]}}' http://localhost:9000/api/v1.0/predictions

**Install ingress on Kubernetes**

Next, install the Ambassador API gateway on Kubernetes to route requests to our model(s).

In [33]:
# add the repo for ambassador (datawire) to your helm repos config
!helm repo add datawire https://www.getambassador.io
!helm repo update
    
# install the ambassador helm chart
# (kept on one line: "!" shell commands in a notebook cell cannot be continued with PowerShell backticks)
!helm install ambassador datawire/ambassador --set image.repository=docker.io/datawire/ambassador --set crds.keep=false --set enableAES=false --namespace seldon-system

**Deploy model to Kubernetes**

Deploy the model to Kubernetes using Seldon's template Helm chart for single model serving.

In [None]:
# create a dedicated namespace for model serving
!kubectl create namespace model-serving

# install a helm chart to serve the model
!helm install test-ml-seldon-app seldon-single-model `
  --repo https://storage.googleapis.com/seldon-charts `
  --set model.image=edlongbottom/mlwebservice/bostonhousing:0.0.4 `
  --set model.imagePullPolicy=Never `
  --namespace model-serving

In [None]:
# alternatively, install the chart from a local copy
!helm install test-ml-seldon-app ./seldon-single-model `
  --set model.image=edlongbottom/mlwebservice/bostonhousing:0.0.4 `
  --set model.imagePullPolicy=Never `
  --namespace model-serving

Test the prediction web service using Curl or the python requests module.

Note: this step is currently NOT working. Possible causes:
 - Incorrectly formatted curl request
 - Problems with traffic routing from API gateway to model
 - Issues mapping localhost port to ambassador

In [None]:
# map localhost port 8003 to port 8080 on the API gateway on k8s
!kubectl port-forward $(kubectl get pods -n seldon-system -l app.kubernetes.io/name=ambassador -o jsonpath='{.items[0].metadata.name}') -n seldon-system 8003:8080

Use Curl to test requests to the prediction service end-point exposed at localhost port 8003.

In [None]:
# first retrieve the IP for the API gateway (will just be localhost for Docker Desktop)
!kubectl -n seldon-system get service ambassador

In [None]:
# now pass a request using curl (in Linux)
# the URL follows template - http://<ambassadorEndpoint>/seldon/<namespace>/<deploymentName>/api/v0.1/predictions
#cols=['CRIM','ZN','INDUS','CHAS','NOX','RM','AGE','DIS','RAD','TAX','PTRATIO','B','LSTAT']
#vals = [18.0,2.31,0.0,0.538,6.575,65.2,4.09,1.0,296.0,15.3,396.9,4.98,24.0]
!curl http://localhost:8003/seldon/model-serving/mlflow-model/api/v0.1/predictions \
    --request POST \
    --header "Content-Type: application/json" \
    --data '{"data":{"names":["CRIM","ZN","INDUS","CHAS","NOX","RM","AGE","DIS","RAD","TAX","PTRATIO","B","LSTAT"],"tensor":{"shape":[1,13],"values":[18.0,2.31,0.0,0.538,6.575,65.2,4.09,1.0,296.0,15.3,396.9,4.98,24.0]}}}'

In [None]:
# use Curl to test the web service (in windows)
# the tensor shape is [rows, columns]: one record with five features
!curl -X POST -H "Content-Type:application/json" --data "{\"data\":{\"names\":[\"INDUS\",\"RM\",\"TAX\",\"PTRATIO\",\"LSTAT\"],\"tensor\":{\"shape\":[1,5],\"values\":[-0.77089554,-0.2106905,-0.46459208,0.27510008,-0.53194571]}}}" http://localhost:80/seldon/model-serving/mlflow-model/api/v0.1/predictions
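
Since a malformed request is one of the suspected causes, it may help to build the URL and payload in code so that the tensor shape always agrees with the data. The sketch below follows the URL template quoted in the cells above; the helper names are illustrative, and the deployment name passed in must match the SeldonDeployment actually created on the cluster.

```python
import json


def seldon_url(endpoint, namespace, deployment, api_version="v0.1"):
    """Fill in http://<endpoint>/seldon/<namespace>/<deployment>/api/<version>/predictions."""
    return f"http://{endpoint}/seldon/{namespace}/{deployment}/api/{api_version}/predictions"


def tensor_payload(names, rows):
    """Build a Seldon 'tensor' payload whose shape is derived from the rows."""
    if any(len(row) != len(names) for row in rows):
        raise ValueError("each row needs one value per feature name")
    values = [float(v) for row in rows for v in row]  # row-major flattening
    return {"data": {"names": list(names),
                     "tensor": {"shape": [len(rows), len(names)], "values": values}}}


names = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
         "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT"]
row = [18.0, 2.31, 0.0, 0.538, 6.575, 65.2, 4.09, 1.0, 296.0, 15.3, 396.9, 4.98, 24.0]

url = seldon_url("localhost:8003", "model-serving", "mlflow-model")
body = json.dumps(tensor_payload(names, [row]))
print(url)
print(body)
```

Deriving the shape from the rows rules out one class of formatting error, which leaves routing and port mapping as the remaining things to check.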

### Tear down resources in Kubernetes

Once you are finished testing/using the web service, remove all resources from the Kubernetes instance to tidy up.

In [None]:
# delete the model deployment resources (if used YAML manifest)
!kubectl delete SeldonDeployment mlflow-model -n model-serving
!kubectl delete namespace model-serving

# delete the model deployment resources (if used Helm chart)
!helm uninstall test-ml-seldon-app --namespace model-serving
!kubectl delete namespace model-serving

# uninstall the API gateway (it was installed into the seldon-system namespace above)
!helm uninstall ambassador --namespace seldon-system

# uninstall the seldon core operator
!helm uninstall seldon-core --namespace seldon-system
!kubectl delete namespace seldon-system