# Boston Housing: model serving with MLFlow

This notebook is intended to demonstrate the use of MLFlow for deploying models as a prediction service. 

It follows on from the notebook in the Experimentation folder (BostonHousing_mlflow.ipynb), picking up where that notebook finished. First, we use the MLFlow UI to select the best-performing model; then we deploy it.

Multiple deployment methods are outlined:
1. Deploying the model on a local REST server
2. Deploying the model as a containerised service locally on Docker Desktop
3. Deploying the model to a remote Kubernetes instance using Seldon Core

In each case, the model is deployed as a RESTful web service, and the exposed API is tested with curl.

**Goal:** *deploy a model as a web service for predicting house prices.*

### Import libraries

In [2]:
import numpy as np
import joblib

from seldon_core.seldon_client import SeldonClient

In [7]:
import warnings
warnings.filterwarnings("ignore")

### Review models in the MLFlow UI 

Start up a local tracking server and point it towards the experiment SQLite db (for entities) and local file storage (for artefacts) using the mlflow CLI. Then, navigate to http://localhost:5000/ in a browser to see the MLFlow UI and compare models. 

In [5]:
!mlflow server \
    --backend-store-uri 'sqlite:///../experimentation/mlruns.db' \
    --default-artifact-root ../experimentation/mlruns \
    --host 0.0.0.0

'mlflow' is not recognized as an internal or external command,
operable program or batch file.


Look through the runs in MLFlow and select the best performing model from the relevant experiment. Assign the associated experiment and run ids to their respective variables.

In [4]:
experiment_id = '2'
run_id = '2ce79d559d0546dd91bd26a252b0ac78'
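
The choice made by eye in the UI can also be cross-checked programmatically. The sketch below is one possible approach and not part of the original workflow. It assumes the runs logged a metric called `rmse` (a hypothetical name; substitute whatever the Experimentation notebook actually logged) and that lower values are better. `pick_best_run` and `fetch_runs` are illustrative helpers.

```python
def pick_best_run(runs, metric, lower_is_better=True):
    """Return the run_id whose `metric` is best among `runs`.

    `runs` is a list of dicts shaped like {"run_id": str, "metrics": {name: value}}.
    Runs that did not log `metric` are ignored.
    """
    scored = [(r["metrics"][metric], r["run_id"]) for r in runs if metric in r["metrics"]]
    if not scored:
        raise ValueError(f"no run logged the metric {metric!r}")
    best = min(scored) if lower_is_better else max(scored)
    return best[1]


def fetch_runs(experiment_id, tracking_uri):
    """Read all runs of one experiment from the MLflow tracking store."""
    # Imported here so pick_best_run can be used without MLflow installed.
    from mlflow.tracking import MlflowClient

    client = MlflowClient(tracking_uri=tracking_uri)
    return [
        {"run_id": run.info.run_id, "metrics": run.data.metrics}
        for run in client.search_runs([experiment_id])
    ]


# Example with made-up values (not results from this project)
example_runs = [
    {"run_id": "run-a", "metrics": {"rmse": 4.9}},
    {"run_id": "run-b", "metrics": {"rmse": 4.1}},
    {"run_id": "run-c", "metrics": {}},
]
best_example = pick_best_run(example_runs, "rmse")
```

Against the real store, something like `pick_best_run(fetch_runs(experiment_id, "sqlite:///../experimentation/mlruns.db"), "rmse")` would return a candidate `run_id` to compare with the one chosen in the UI.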

### Sense check the model

As a sense check, load the chosen pipeline manually and pass it a record from the raw test set to confirm that it generates a sensible prediction.

In [5]:
# create some data for testing (this is one record from the test set used in Experimentation)
X_test_0 = np.array([5.86, 6.108, 330.0, 19.1, 9.16])

In [6]:
# load the pipeline and run it on the data
poly_pipeline = joblib.load(f'../experimentation/mlruns/{experiment_id}/{run_id}/artifacts/model/model.pkl')
poly_pipeline.predict(X_test_0.reshape(1, -1))



array([21.13982272])
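
What counts as "sensible" can be made explicit. In the Boston Housing data the target (MEDV, median home value in $1000s) lies between 5 and 50, so a prediction far outside that band suggests a broken pipeline or wrongly ordered features. A minimal sketch, with `is_plausible_price` as a hypothetical helper:

```python
import math

# MEDV in the Boston Housing data ranges from 5.0 to 50.0 (in $1000s).
MEDV_MIN, MEDV_MAX = 5.0, 50.0


def is_plausible_price(prediction, low=MEDV_MIN, high=MEDV_MAX):
    """True if the prediction is a finite number inside the target's observed range."""
    return math.isfinite(prediction) and low <= prediction <= high


# The prediction returned above for the test record
check = is_plausible_price(21.13982272)
```

This is only a coarse guard: it catches gross failures such as NaN or negative prices, not a model that is merely inaccurate.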

### Serve the model on a local REST server

The model can be deployed with a local REST server to create a prediction web service. Use the mlflow CLI to serve your chosen model (include path to model) and expose it at port 1234.

Note: this step requires Conda to be installed. 

In [None]:
!mlflow models serve -m ../experimentation/mlruns/1/6dca30cc19b44d359dbaf994cee1084a/artifacts/model -p 1234

Test the prediction web service using curl or the Python requests module.

In [None]:
# use Curl to test the web service (shell)
!curl -X POST -H "Content-Type:application/json; format=pandas-split" \
    --data '{"columns":["INDUS","RM","TAX","PTRATIO","LSTAT"],"data":[[5.86, 6.108, 330.0, 19.1, 9.16]]}' http://127.0.0.1:1234/invocations

In [29]:
# use curl to test the web service (Windows Command Prompt)
!curl -X POST -H "Content-Type:application/json; format=pandas-split" --data "{\"columns\":[\"INDUS\",\"RM\",\"TAX\",\"PTRATIO\",\"LSTAT\"],\"data\":[[5.86, 6.108, 330.0, 19.1, 9.16]]}" http://127.0.0.1:1234/invocations

[21.139822721718414]


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100   112  100    20  100    92     20     92  0:00:01 --:--:--  0:00:01  3612
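
The same request can be sent from Python. The sketch below mirrors the curl call using the `requests` module. It assumes the server is still listening on port 1234 and accepts the `pandas-split` content type used above (newer MLflow releases expect a different payload layout). `build_split_payload` and `predict` are illustrative names.

```python
import json

FEATURES = ["INDUS", "RM", "TAX", "PTRATIO", "LSTAT"]


def build_split_payload(rows, columns=FEATURES):
    """Serialise rows into the pandas 'split' orientation used by the curl call."""
    for row in rows:
        if len(row) != len(columns):
            raise ValueError(f"expected {len(columns)} values per row, got {len(row)}")
    return json.dumps({"columns": list(columns), "data": [list(r) for r in rows]})


def predict(rows, url="http://127.0.0.1:1234/invocations"):
    """POST the rows to the scoring endpoint and return the decoded JSON response."""
    import requests  # third-party; imported here so the payload helper works without it

    response = requests.post(
        url,
        data=build_split_payload(rows),
        headers={"Content-Type": "application/json; format=pandas-split"},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()


payload = build_split_payload([[5.86, 6.108, 330.0, 19.1, 9.16]])
```

With the server running, `predict([[5.86, 6.108, 330.0, 19.1, 9.16]])` sends the same record as the curl example.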


### Deploy the model as a containerised service using Docker

Build a docker image of the containerised model using the mlflow CLI. 

In [None]:
# build the docker image (shell)
!mlflow models build-docker \
  -m ../experimentation/mlruns/2/2ce79d559d0546dd91bd26a252b0ac78/artifacts/model \
  -n edlongbottom/mlwebservice/bostonhousing:0.0.2 \
  --enable-mlserver

In [None]:
# build the docker image (powershell)
!mlflow models build-docker `
  -m ../experimentation/mlruns/2/2ce79d559d0546dd91bd26a252b0ac78/artifacts/model `
  -n edlongbottom/mlwebservice/bostonhousing:0.0.2 `
  --enable-mlserver

Run the built image with Docker, mapping the container's web service port (8080) to localhost port 1234.

In [None]:
# run detached (-d): an interactive -it session needs a TTY, which a notebook cell does not provide
!docker run -d -p 1234:8080 --name test-ml-model edlongbottom/mlwebservice/bostonhousing:0.0.2
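
The container may take a few seconds to start, so a request sent straight after `docker run` can be refused. One way to avoid that, sketched here with only the standard library, is to poll the mapped port until it accepts TCP connections. `wait_for_port` is a hypothetical helper, and an open port only shows that the server process is listening, not that the model has finished loading.

```python
import socket
import time


def wait_for_port(host="127.0.0.1", port=1234, timeout=60.0, interval=1.0):
    """Return True once host:port accepts a TCP connection, False if `timeout` elapses first."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)  # nothing listening yet; retry shortly
    return False
```

Calling `wait_for_port()` before sending the test request gives the service up to a minute to come up.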

Test the prediction service API using the same request format as with the local REST server.

In [None]:
!curl -X POST -H "Content-Type:application/json; format=pandas-split" --data "{\"columns\":[\"INDUS\",\"RM\",\"TAX\",\"PTRATIO\",\"LSTAT\"],\"data\":[[5.86, 6.108, 330.0, 19.1, 9.16]]}" http://127.0.0.1:1234/invocations

Tear down the service once testing is complete.

In [30]:
!docker stop test-ml-model
!docker rm test-ml-model

test-ml-model
test-ml-model


### Deploy the model to Kubernetes using Seldon Core

Set the current context in your kubectl CLI to the chosen Kubernetes cluster. If Seldon Core is not already installed on that cluster, install it.

In [None]:
# create a dedicated namespace for seldon core
!kubectl create namespace seldon-system

# use helm to install seldon-core from the template helm chart
!helm install seldon-core seldon-core-operator \
    --repo https://storage.googleapis.com/seldon-charts \
    --set usageMetrics.enabled=true \
    --set ambassador.enabled=true \
    --namespace seldon-system

Next, install the Ambassador API gateway on Kubernetes to route requests to our model(s).

Note: currently not working (Ambassador helm chart not compatible with version of k8s running on Docker Desktop)

In [33]:
# add the repo for ambassador (datawire) to your helm repos config
!helm repo add datawire https://www.getambassador.io
!helm repo update
    
# install the ambassador helm chart
# (a `!` command continues across lines with a trailing backslash; PowerShell backticks
# are not recognised by IPython and cause an IndentationError on the continuation lines)
!helm install ambassador datawire/ambassador \
    --set image.repository=docker.io/datawire/ambassador \
    --set crds.keep=false \
    --set enableAES=false \
    --namespace seldon-system

# map localhost port 8003 to port 8080 on the API gateway on k8s
!kubectl port-forward $(kubectl get pods -n seldon-system -l app.kubernetes.io/name=ambassador -o jsonpath='{.items[0].metadata.name}') -n seldon-system 8003:8080

Next, deploy the containerised model to the Kubernetes cluster. Seldon Core provides Helm chart templates that can be used for the deployment; alternatively, the MLFlow website has an example YAML manifest.

In [None]:
# create a dedicated namespace for model serving
!kubectl create namespace model-serving

Deploy the model using a YAML manifest:

In [None]:
# deploy the containerised model using spec defined in a deployment.yaml manifest
!kubectl apply -f ./mlflow-housing/deployment/deployment.yaml
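
The contents of `deployment.yaml` are not shown in this notebook. For orientation, the sketch below builds a minimal `SeldonDeployment` resource as a Python dictionary and serialises it to JSON, which `kubectl apply -f` accepts alongside YAML. The field values are assumptions chosen to match names used elsewhere in this notebook (`mlflow-model`, `model-serving`, the image tag). The predictor and container names (`default`, `classifier`) are placeholders, and the real manifest may differ, for example in protocol or port settings.

```python
import json


def seldon_deployment(name, namespace, image):
    """Build a minimal single-model SeldonDeployment resource (illustrative only)."""
    return {
        "apiVersion": "machinelearning.seldon.io/v1",
        "kind": "SeldonDeployment",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "name": name,
            "predictors": [
                {
                    "name": "default",
                    "replicas": 1,
                    # the graph node name must match the container name below
                    "graph": {"name": "classifier", "type": "MODEL"},
                    "componentSpecs": [
                        {"spec": {"containers": [{"name": "classifier", "image": image}]}}
                    ],
                }
            ],
        },
    }


manifest = seldon_deployment(
    name="mlflow-model",
    namespace="model-serving",
    image="edlongbottom/mlwebservice/bostonhousing:0.0.2",
)
manifest_json = json.dumps(manifest, indent=2)
```

Saving `manifest_json` to a file and passing that file to `kubectl apply -f` would create a resource with these fields; whether the resulting pod serves predictions still depends on the image speaking a protocol that Seldon Core is configured for.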

Alternatively, deploy using a Helm chart. Note: this currently fails with an unresolved "failed calling webhook" error.

In [None]:
# install a helm chart to serve the model
!helm install test-ml-seldon-app seldon-single-model \
  --repo https://storage.googleapis.com/seldon-charts \
  --set model.image=edlongbottom/mlwebservice/bostonhousing:0.0.2 \
  --namespace model-serving

Test the prediction web service using curl or Python.

Note: this is currently NOT working, possibly because of one of the following issues:
 - An incorrectly formatted curl request
 - Problems with hosting a service at localhost port 80

In [None]:
# set up seldon client
sc = SeldonClient(
    deployment_name="mlflow-model",
    namespace="model-serving",
    gateway_endpoint="localhost:8003",
    gateway="ambassador",
)

In [None]:
r = sc.predict(transport="rest")
assert r.success
print(r)

Use Curl to test requests to the prediction service end-point exposed at localhost port 8003.

In [None]:
# first retrieve the IP for the API gateway (will just be localhost for Docker Desktop)
# the gateway was installed into the seldon-system namespace above
!kubectl -n seldon-system get service ambassador

In [None]:
# now pass a request using curl (in Linux)
# the URL follows template - http://<ambassadorEndpoint>/seldon/<namespace>/<deploymentName>/api/v0.1/predictions
!curl http://localhost:8003/seldon/model-serving/mlflow-model/api/v0.1/predictions \
    --request POST \
    --header "Content-Type: application/json" \
    --data '{"data":{"names":["INDUS","RM","TAX","PTRATIO","LSTAT"],"tensor":{"shape":[1,5],"values":[-0.77089554,-0.2106905,-0.46459208,0.27510008,-0.53194571]}}}'

In [None]:
# use curl to test the web service (in Windows)
!curl -X POST -H "Content-Type:application/json" --data "{\"data\":{\"names\":[\"INDUS\",\"RM\",\"TAX\",\"PTRATIO\",\"LSTAT\"],\"tensor\":{\"shape\":[1,5],\"values\":[-0.77089554,-0.2106905,-0.46459208,0.27510008,-0.53194571]}}}" http://localhost:8003/seldon/model-serving/mlflow-model/api/v0.1/predictions
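
The URL template and request body can also be assembled in code, which keeps the tensor shape in step with the data. In Seldon's tensor format the shape lists rows and then columns, so one record of five named features has shape `[1, 5]`. A minimal sketch follows; `seldon_url` and `tensor_payload` are hypothetical helpers, and the endpoint, namespace and deployment name are the ones assumed in this notebook.

```python
import json


def seldon_url(endpoint, namespace, deployment_name):
    """Fill in http://<ambassadorEndpoint>/seldon/<namespace>/<deploymentName>/api/v0.1/predictions."""
    return f"http://{endpoint}/seldon/{namespace}/{deployment_name}/api/v0.1/predictions"


def tensor_payload(names, rows):
    """Build a Seldon 'tensor' request body; shape is [n_rows, n_columns]."""
    for row in rows:
        if len(row) != len(names):
            raise ValueError(f"expected {len(names)} values per row, got {len(row)}")
    values = [value for row in rows for value in row]  # flatten row by row
    return {
        "data": {
            "names": list(names),
            "tensor": {"shape": [len(rows), len(names)], "values": values},
        }
    }


url = seldon_url("localhost:8003", "model-serving", "mlflow-model")
body = json.dumps(
    tensor_payload(
        ["INDUS", "RM", "TAX", "PTRATIO", "LSTAT"],
        [[-0.77089554, -0.2106905, -0.46459208, 0.27510008, -0.53194571]],
    )
)
```

`url` and `body` can then be passed to any HTTP client with a `Content-Type: application/json` header.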

### Tear down resources in Kubernetes

Once you are finished testing/using the web service, remove all resources from the Kubernetes instance to tidy up.

In [None]:
# delete the model deployment resources (if used YAML manifest)
!kubectl delete SeldonDeployment mlflow-model -n model-serving
!kubectl delete namespace model-serving

# delete the model deployment resources (if used Helm chart)
!helm uninstall test-ml-seldon-app --namespace model-serving
!kubectl delete namespace model-serving

# uninstall the API gateway (it was installed into the seldon-system namespace,
# which is deleted together with the seldon core operator below)
!helm uninstall ambassador --namespace seldon-system

# uninstall the seldon core operator
!helm uninstall seldon-core --namespace seldon-system
!kubectl delete namespace seldon-system