## Walkthrough of model deployment as ML web service on Kubernetes

This notebook outlines steps for deploying a machine learning model as a simple custom-built REST API prediction service to a Kubernetes instance.

It is composed of the following sections:
 1. Prepare environment
 2. Test the model
 3. Run the prediction service locally using Flask
 4. Containerise the prediction service using Docker
 5. Deploy the prediction service to Kubernetes

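Note: this notebook assumes the user is running on a Windows machine and has the Docker, kubectl and Helm CLIs installed. Linux and macOS users will need to adapt the Windows-specific shell commands (for example, `cp` in place of `copy`) and may need to adjust the quoting in the curl commands.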


### Prepare environment

**Import libraries**

In [5]:
from yaml import load, Loader
import pandas as pd
import os, glob
import requests
import json
import joblib

**Load config and chosen models**

Load configuration

In [2]:
with open('config.yaml','r') as config_file:
    config = load(config_file, Loader=Loader)

docker_registry = config['DOCKER_REGISTRY']
service_name = config['SERVICE_NAME']
api_version = config['API_VERSION']
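# build the path with os.path.join: '\e' and '\m' in a plain string literal are invalid escape sequences
model_repo = os.path.join('..', 'experimentation', 'models')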

Copy latest model to deployment directory

In [3]:
# the last entry in sorted name order is treated as the latest model
latest_model = sorted(os.listdir(model_repo))[-1]
latest_model_path = os.path.join(model_repo, latest_model)

!copy "{latest_model_path}" .

        1 file(s) copied.
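
The `copy` shell command above only exists on Windows. As a minimal, platform-independent sketch of the same step, the standard library's `shutil` can do the copy from Python. The helper name `copy_latest_model` is hypothetical, and, like the cell above, it assumes the model files are named so that the newest one sorts last. Unlike `os.listdir`, it skips subdirectories.

```python
import shutil
from pathlib import Path


def copy_latest_model(model_repo, dest="."):
    """Copy the file whose name sorts last in model_repo into dest.

    Hypothetical helper: assumes file names sort in chronological order.
    """
    candidates = sorted(p for p in Path(model_repo).iterdir() if p.is_file())
    if not candidates:
        raise FileNotFoundError(f"no model files found in {model_repo}")
    latest = candidates[-1]
    # shutil.copy returns the path of the newly created file
    return Path(shutil.copy(latest, dest))
```

Calling `copy_latest_model(model_repo)` would then replace the `!copy` line on any operating system.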


### Test the model

Import data for testing

In [6]:
test_df = pd.read_csv("../experimentation/datasets/test.csv")
test_entry = test_df[test_df.Fare.notna()].copy()

Load in the ML model and call the predict method on the data.

In [8]:
# load the first .pkl file found in the current directory as the model
ml_model = joblib.load(glob.glob('*.pkl')[0])
predictions = ml_model.predict(test_entry)
predictions

array([0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0,
       1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,
       1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1,
       1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1,
       1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0,
       0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,
       0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1,
       0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0,
       0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1,
       0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0,
       0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0,
       0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0,
       0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,

### Run the prediction service locally using Flask

Run the Flask app, which serves the prediction endpoint at http://127.0.0.1:5000/titanic/v0.0.1/predict (the cell blocks until the kernel is interrupted).

In [9]:
# run the service locally
!python api.py

^C


Test the web service using curl.

In [None]:
# this will need to be run from a separate kernel / terminal
!curl -X POST -H "Content-Type:application/json" --data "{\"PassengerId\":[892],\"Pclass\":[3],\"Name\":[\"Kelly, Mr. James\"],\"Sex\":[\"male\"],\"Age\":[34.5],\"SibSp\":[0],\"Parch\":[0],\"Fare\":[7.8292],\"Embarked\":[\"S\"]}" http://127.0.0.1:5000/titanic/v0.0.1/predict

### Containerise the prediction service using Docker

**Build the Docker image**

Create a tag made up of the image registry, the service name and the API version, then build the image with that tag.

In [13]:
tag = f'{docker_registry}/{service_name}:{api_version}'
!docker build -t {tag} .

#1 [internal] load build definition from Dockerfile
#1 sha256:3653b7f4eb55c89c4ca666c0fefffb0333f8f8ac5ee2edfbcbb32b34f45053ee
#1 transferring dockerfile: 32B done
#1 DONE 0.0s

#2 [internal] load .dockerignore
#2 sha256:d45d1a578aaa317b91817517a57aedad5a13ca5c8f968a3ebcf9160a01480f57
#2 transferring context: 2B done
#2 DONE 0.0s

#3 [internal] load metadata for docker.io/library/python:3.9-slim
#3 sha256:3425157df499c84dd49181e5611a11caeed16adf15a5ddbcfa4c3002c56d3d27
#3 DONE 1.5s

#4 [1/5] FROM docker.io/library/python:3.9-slim@sha256:f4efbe5d1eb52c221fded79ddf18e4baa0606e7766afe2f07b0b330a9e79564a
#4 sha256:9ce0d84a404c9ac604ef98baa1f1065d5a70e321684b314c01df3d72c5a89693
#4 DONE 0.0s

#6 [internal] load build context
#6 sha256:98489ef9bda79a123558bad715b9a37b5e733c2152a44f8c85f7d73a65b6d3a9
#6 transferring context: 210B 0.0s done
#6 DONE 0.0s

#5 [2/5] RUN mkdir /app
#5 sha256:f8977e52fc2da4995e347b7fb878eedc812cb1c54dc0d42a662aa3db7b518aba
#5 CACHED

#7 [3/5] COPY config.yaml api.p

**Run the service on Docker**

Run the image as a container locally and map container port 5000 to localhost port 5000 for testing.

In [17]:
# reuse the tag built above rather than hard-coding the image name
!docker run --rm -p 5000:5000 --name test-ml-model {tag}

^C


**Test the service**

Use curl or the Python requests module to test the prediction web service.

In [15]:
# again, this must be executed from a separate kernel/terminal as the kernel is occupied running the previous cell
!curl -X POST -H "Content-Type:application/json" --data "{\"PassengerId\":[892],\"Pclass\":[3],\"Name\":[\"Kelly, Mr. James\"],\"Sex\":[\"male\"],\"Age\":[34.5],\"SibSp\":[0],\"Parch\":[0],\"Fare\":[7.8292],\"Embarked\":[\"S\"]}" http://127.0.0.1:5000/titanic/v0.0.1/predict

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100   167  100    20  100   147     20    147  0:00:01 --:--:--  0:00:01  163k


{"predictions":[0]}
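
The same request can also be sent from Python, which avoids the shell-specific quoting of the curl command. The sketch below uses the standard library's `urllib.request` rather than `requests`, purely so that it runs without extra packages; with `requests`, the equivalent call would be `requests.post(url, json=payload)`. The URL and payload are copied from the curl command above, while the function names `build_request` and `predict` are hypothetical.

```python
import json
import urllib.request

# URL taken from the curl command above
PREDICT_URL = "http://127.0.0.1:5000/titanic/v0.0.1/predict"

# same single-passenger payload as the curl command: each field maps to a list of values
PAYLOAD = {
    "PassengerId": [892],
    "Pclass": [3],
    "Name": ["Kelly, Mr. James"],
    "Sex": ["male"],
    "Age": [34.5],
    "SibSp": [0],
    "Parch": [0],
    "Fare": [7.8292],
    "Embarked": ["S"],
}


def build_request(url, payload):
    """Build a JSON POST request equivalent to the curl call."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(url=PREDICT_URL, payload=PAYLOAD, timeout=10):
    """Send the payload to the running service and return the decoded JSON body."""
    with urllib.request.urlopen(build_request(url, payload), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the container running, `predict()` should return the same JSON body that curl printed, parsed into a Python dictionary.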


**Tear down**

Once testing is complete, stop and remove the Docker container. If the container was started with the `--rm` flag (as above), Docker removes it automatically as soon as it stops, so the `docker rm` command is only needed when that flag was omitted.

In [None]:
!docker stop test-ml-model
!docker rm test-ml-model

### Deploy the prediction service to Kubernetes

Push the built image to Docker Hub so that it is available remotely (you may need to log in to Docker first and create the repository if you haven't already).

In [None]:
!docker push {tag}

**Configure a Kubernetes cluster**

At this point, a Kubernetes cluster is required and your kubectl CLI must be configured with the chosen cluster as its current context. Docker Desktop or Minikube can be used to spin up a cluster locally; alternatively, you could provision a cluster through a cloud provider (for example, AKS from Azure).
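
Before deploying, it can help to confirm which cluster kubectl is pointing at. The following is a minimal sketch that wraps the `kubectl config current-context` command; the function name `current_context` and its `run` parameter (which only exists so the call can be stubbed out) are hypothetical.

```python
import subprocess


def current_context(run=subprocess.run):
    """Return the name of the context kubectl is currently using."""
    result = run(
        ["kubectl", "config", "current-context"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if kubectl exits with an error
    )
    return result.stdout.strip()
```

If the returned name is not the cluster you intend to deploy to, switch context before running the Helm command.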

**Deploy the prediction service using Helm**

Once you have a cluster set up and are connected to it, deploy the Docker image to Kubernetes using Helm. The Helm chart (`helm-ml-serving`) is included in the deployment folder.

In [22]:
!helm upgrade --install mlwebservice-titanic helm-ml-serving

Release "mlwebservice-titanic" does not exist. Installing it now.
NAME: mlwebservice-titanic
LAST DEPLOYED: Thu Dec 30 15:54:52 2021
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None


Confirm the deployment was successful by checking the pods in the model-serving namespace (you may need to wait a minute).

In [26]:
!kubectl get pods -n model-serving

NAME                                    READY   STATUS    RESTARTS   AGE
mlwebservice-titanic-6b8c7c5dcc-z4cmc   1/1     Running   0          58s
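
Rather than re-running the cell by hand until the pod shows `Running`, the check can be polled from Python. This is only a sketch: it assumes the `model-serving` namespace used above, reads the JSON output of `kubectl get pods -o json`, and uses hypothetical helper names (`pods_ready`, `wait_for_pods`) and arbitrary timeout values.

```python
import json
import subprocess
import time


def pods_ready(pod_list):
    """Return True if there is at least one pod and every pod is Running with all containers ready."""
    items = pod_list.get("items", [])
    if not items:
        return False
    for pod in items:
        status = pod.get("status", {})
        containers = status.get("containerStatuses", [])
        if status.get("phase") != "Running" or not containers:
            return False
        if not all(c.get("ready") for c in containers):
            return False
    return True


def wait_for_pods(namespace="model-serving", timeout=120, interval=5):
    """Poll kubectl until all pods in the namespace are ready or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = subprocess.run(
            ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
            capture_output=True,
            text=True,
            check=True,
        )
        if pods_ready(json.loads(result.stdout)):
            return True
        time.sleep(interval)
    return False
```

`wait_for_pods()` returns `True` once every pod reports ready and `False` if the timeout is reached first. The `kubectl wait` command offers a similar check directly from the CLI.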


Test the web service using curl.

In [27]:
!curl -X POST -H "Content-Type:application/json" --data "{\"PassengerId\":[892],\"Pclass\":[3],\"Name\":[\"Kelly, Mr. James\"],\"Sex\":[\"male\"],\"Age\":[34.5],\"SibSp\":[0],\"Parch\":[0],\"Fare\":[7.8292],\"Embarked\":[\"S\"]}" http://127.0.0.1:5000/titanic/v0.0.1/predict

{"predictions":[0]}


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100   167  100    20  100   147     20    147  0:00:01 --:--:--  0:00:01  1532


**Tear down**

Remove the service when it is no longer in use.

In [28]:
!helm uninstall mlwebservice-titanic

release "mlwebservice-titanic" uninstalled
