# Training and serving TensorFlow models with Kubernetes Jobs and Seldon

In this notebook we will guide you through one of the most important tasks for the cloud-native data scientist: deploying ML models as a service. This is a critical step in the intelligent application pipeline because it moves machine learning algorithms out of one-off proofs of concept and into a production setting where they can be easily used as part of a larger microservice architecture.

To do this we will use the OpenShift client tool (`oc`) to build, train and deploy a TensorFlow model on OpenShift. Most of the steps in this notebook are intended to be entered directly on the command line. For the sake of explainability and reproducibility, however, we will use the `%%bash` cell magic and the `!` prefix, which let us run shell commands from within a Jupyter notebook cell.


In this notebook we will walk through:

   1) **Setting up OpenShift for Model Serving and Training**

   2) **Model Training**

   3) **Model Serving**

   4) **Interacting with a Model Service**


## 1. Setting up OpenShift for Model Serving and Training

#### Install the OC client
First, we need to install the `oc` command-line client.

In [None]:
%%bash

curl -O https://mirror.openshift.com/pub/openshift-v4/clients/oc/4.1/linux/oc.tar.gz
tar xzf oc.tar.gz
cp oc /opt/app-root/bin/
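
After the install cell has run, it can be useful to confirm that the client is actually reachable. The following is a minimal sketch, assuming `/opt/app-root/bin` is on the notebook's `PATH`; the helper name `find_client` is made up for this example.

```python
import shutil


def find_client(name="oc"):
    """Return the full path of an executable found on PATH, or None if it is missing."""
    return shutil.which(name)


path = find_client("oc")
if path:
    print("oc found at:", path)
else:
    print("oc is not on PATH; re-run the install cell above")
```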

### Exercise 1. Log in to your cluster and switch to your project

The next step is to log in to the OpenShift server and switch to the project where this Jupyter server is running. We rely on two pre-configured environment variables here, `$TOKEN` and `$NAMESPACE`. There are two reasons for this: (1) to make the notebook reproducible without users having to change anything manually, and (2) to avoid displaying the secret (`$TOKEN`) in the Jupyter UI.

_If this step fails, you might need to go to `Control Panel > Stop My Server` and provide those environment variables in the Spawner UI._

**Objective:**

1) Use the `oc login` command to access your OpenShift cluster. Be sure to also pass the parameters `--server https://openshift.default.svc.cluster.local`, `--insecure-skip-tls-verify` and `--token=$TOKEN`.

2) Switch to your project namespace using the `oc project [project-name]` command. Your project name should be stored in the variable `${NAMESPACE}`.

**Note:** if you want to input a terminal command using Jupyter notebook cells, it can be done by adding the `%%bash` line to the top of the cell or the `!` character at the beginning of a line.   
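
Because the login relies on `TOKEN` and `NAMESPACE` being present, a quick check before running the cell can save a confusing error. This is a minimal sketch; `missing_variables` is a hypothetical helper, and it only reports names, never the secret values.

```python
import os


def missing_variables(names, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_variables(["TOKEN", "NAMESPACE"])
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("TOKEN and NAMESPACE are set")
```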

In [None]:
%%bash

# 1. Login
oc login --server https://openshift.default.svc.cluster.local --insecure-skip-tls-verify --token=$TOKEN

# 2. Switch to your project 
oc project ${NAMESPACE}

## 2. Model Training

#### Apply training job resources to our cluster namespace

Before we deploy our training job, we need to apply the correct resources available in the https://gitlab.com/opendatahub/data-engineering-and-machine-learning-workshop repository. These contain the necessary `BuildConfigs` and `Templates` to build and deploy the training `Job` and serving `SeldonDeployment`.

In [None]:
%%bash

oc apply -f ../tf-random-forest/openshift

The above command will kick off a number of container image builds. We need these images to be built and available in our namespace to run the training job successfully.

While we wait, let's use the `oc logs` command to follow the build logs. 


In [None]:
!oc logs -f buildconfig.build.openshift.io/forest-mnist-train

### Exercise 2. Examine the training job parameters

Let's take a look at the parameters we can configure for the training job. Some of them come with default values, but others need to be set by the user. We can list these parameters by passing the `--parameters` flag to the `oc process` command for the `forest-mnist-train` template we applied above.

**Objective:**

1) Output the parameters for this job by running `oc process forest-mnist-train --parameters` in the cell below.

In [None]:
%%bash

# 1. Display the job parameters

oc process forest-mnist-train --parameters

#### Deploy training job

We use the predefined environment variables here again. The `MODEL_VERSION` parameter lets you version your models: its value is used to generate the exported model's file name, so you can switch between trained models in the serving part.

Do not forget to change `MODEL_VERSION` for each training run; otherwise the following command will fail.

In [None]:
%%bash

oc process forest-mnist-train \
-p S3_ENDPOINT_URL=${S3_ENDPOINT_URL} \
-p AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID} \
-p AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY} \
-p BUCKET_NAME=${NAMESPACE} \
-p MODEL_VERSION="1" | oc apply -f -
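
The command above repeats one `-p KEY=VALUE` pair per template parameter. As a rough sketch of that structure, the helper below assembles the same argument list from a dictionary; `build_process_command` and the example values are made up for illustration and are not part of the workshop repository.

```python
import shlex


def build_process_command(template, params):
    """Build the `oc process` argument list with one `-p KEY=VALUE` pair per parameter."""
    command = ["oc", "process", template]
    for key, value in params.items():
        command += ["-p", f"{key}={value}"]
    return command


# Illustrative values only; real credentials should come from the environment.
example = build_process_command(
    "forest-mnist-train",
    {"BUCKET_NAME": "my-project", "MODEL_VERSION": "2"},
)
print(" ".join(shlex.quote(part) for part in example))
```

Keeping the parameters in one dictionary makes it easier to bump `MODEL_VERSION` between runs without retyping the whole command.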

Again, you can watch the training output by using the `oc logs` command shown in the cell below.

Do not forget to change the name of the job based on the output of the command above!

If you would like to see how well your model performs, you can find the `Test Accuracy` value near the end of the logs.

In [None]:
!oc logs -f job.batch/forest-mnist-train-1
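
Since the job name changes with every `MODEL_VERSION`, it can be read from the output of `oc apply` instead of being typed by hand. The sketch below assumes that output contains lines of the form `job.batch/<name> created`; the sample string is hypothetical and `created_jobs` is a helper invented for this example.

```python
def created_jobs(apply_output):
    """Extract job resource names from lines such as 'job.batch/<name> created'."""
    jobs = []
    for line in apply_output.splitlines():
        parts = line.split()
        if parts and parts[0].startswith("job.batch/"):
            jobs.append(parts[0])
    return jobs


# Hypothetical output for MODEL_VERSION="2"; the real text comes from `oc apply`.
sample = "job.batch/forest-mnist-train-2 created"
print(created_jobs(sample))
```

In a notebook, the returned name could then be passed to `oc logs -f`.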

The training job writes a compressed model to S3 object storage (using the endpoint and credentials from the environment variables). It also creates the bucket if one does not already exist.



#### Examine your trained model stored in your remote bucket

Let's take a look at which buckets exist in the object storage and find the trained model stored in your bucket.

If you changed the bucket name for the training job, make sure you use the same value here in the `Bucket=` parameter.

In [None]:
import boto3
import os
from pprint import pprint

conn = boto3.client(service_name='s3', 
                    endpoint_url=os.environ['S3_ENDPOINT_URL'])

bucket = os.environ['NAMESPACE']
pprint(conn.list_buckets()['Buckets'])
objects = conn.list_objects(Bucket=bucket)

pprint(objects)
print("Stored models: ", ", ".join([x['Key'] for x in objects['Contents']]))
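
One caveat with the last line of the cell above: when a bucket is empty, the `list_objects` response has no `Contents` key, so indexing it raises a `KeyError`. A minimal, defensive sketch follows; `stored_model_keys` is a hypothetical helper, and the dictionaries are hand-written stand-ins for real responses.

```python
def stored_model_keys(list_objects_response):
    """Return object keys from a `list_objects` response; an empty bucket has no 'Contents'."""
    return [item["Key"] for item in list_objects_response.get("Contents", [])]


# Hand-written stand-ins for real responses; the key name below is made up.
print(stored_model_keys({"Contents": [{"Key": "example-model.zip"}]}))
print(stored_model_keys({"Name": "my-bucket"}))
```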

## 3. Model Serving

Now that our model is trained, exported and stored in object storage, we can serve it using Seldon. Let's take a look at the parameters for our deployment.

In [None]:
%%bash

oc process forest-mnist-serve --parameters

You can see they are very similar to the training job parameters, which means we need to provide the S3 storage credentials again and make sure `MODEL_VERSION` matches so that we deploy the correct model.

#### Deploy Trained Model as a Service

Use the `oc process` and `oc apply` commands to deploy our model service with the appropriate parameters.

In [None]:
%%bash

oc process forest-mnist-serve \
-p S3_ENDPOINT_URL=${S3_ENDPOINT_URL} \
-p AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID} \
-p AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY} \
-p BUCKET_NAME=${NAMESPACE} \
-p MODEL_VERSION="1" | oc apply -f -

### Exercise 3. Display pods and logs 

Now that our model is being deployed, use the `oc get pods` command to display the pods whose names contain "forest-mnist-predictor". After that, use the `oc logs` command again to look inside the pod and see what's going on.
 
**Objective**

1) Use the `oc get pods` command with the `-o name` flag and filter to the predictor pod with `| grep forest-mnist-predictor`  

2) Display the pod logs using `oc logs -c forest-experiment pod/[POD_NAME]`

In [None]:
# 1. Display pods

!oc get pods -o name | grep forest-mnist-predictor

In [None]:
# 2. Inspect logs (replace the pod name below with the one printed by the previous cell)

!oc logs -c forest-experiment pod/forest-mnist-predictor-28e5946-79c4996dd8-fp9z8
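
The pod name includes generated suffixes, so it will differ in every deployment. As a small sketch, the same filtering that `grep` does can be done in Python on the lines returned by `oc get pods -o name`; the pod names in the listing below are invented, and `predictor_pods` is a hypothetical helper.

```python
def predictor_pods(pod_lines, marker="forest-mnist-predictor"):
    """Keep the `pod/<name>` entries whose name contains the marker."""
    return [line.strip() for line in pod_lines if marker in line]


# Hypothetical listing; in the notebook this could come from `pods = !oc get pods -o name`.
listing = [
    "pod/jupyterhub-1-abcde",
    "pod/forest-mnist-predictor-0000000-1111111111-aaaaa",
]
print(predictor_pods(listing))
```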

## 4. Interact with Model

Now that the serving container has started successfully, we can load some data into our notebook (using the TF examples library) and test out our newly deployed model inference service!

In [None]:
!pip install tensorflow==1.13.*
import os, sys
import tensorflow as tf


# Import MNIST data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=False)

#### Identify our served model's route

We will use the `oc get route` command to get the host name of the model prediction endpoint and store it as both a Python variable and a shell environment variable.

In [None]:
route=!oc get route forest-mnist -o "jsonpath={.spec.host}"
route=route[0]
%env SELDON_ROUTE=$route

#### Select test sample
Next we can select our test sample. Go ahead and change the value of the variable `y` in the cell below to get a different image from the test dataset.

You will see the actual label, which should later match the model's prediction.

In [None]:
y=111
x=[mnist.test.images[y].tolist()]
print("Label: ", mnist.test.labels[y])

#### Query Model 

There are multiple ways to query the model for predictions. Let's take a look at two of them: the command-line tool `curl` and the Python package `requests`.



**Curl**: In the cell below, we pass the variable `x` from the cell above to the shell script as its first positional argument (`$1`) and use it as part of the payload sent to the `/api/v0.1/predictions` endpoint.

You will get back a JSON response that contains probabilities for all the classes. The highest probability represents the predicted label.

In [None]:
%%bash -s "$x"

# JSON requires double-quoted keys, so the quotes are escaped inside the shell string
curl -k -X POST -H 'Content-Type: application/json' \
    -d "{\"data\": {\"ndarray\": $1}}" \
https://${SELDON_ROUTE}/api/v0.1/predictions 2>/dev/null
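
The request body is a JSON document with the input batch under `data.ndarray`. As a minimal sketch, `json.dumps` can build that body with strictly valid, double-quoted JSON; `prediction_payload` is a hypothetical helper, and the three-value row stands in for the much longer MNIST row stored in `x`.

```python
import json


def prediction_payload(rows):
    """Serialize a batch of feature rows into the `{"data": {"ndarray": ...}}` request body."""
    return json.dumps({"data": {"ndarray": rows}})


# A tiny 1x3 batch stands in for the real MNIST row held in `x`.
body = prediction_payload([[0.0, 0.5, 1.0]])
print(body)

# Round-trip check: the serialized body parses back to the same batch.
assert json.loads(body)["data"]["ndarray"] == [[0.0, 0.5, 1.0]]
```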

**requests**: It is a bit easier to work with JSON objects in Python, so we can print the predicted label together with its probability.

Does it match the `Label` printed above?

In [None]:
import requests
import json

def get_label(predictions, names):
    result = max(predictions)
    return names[predictions.index(result)].split(":")[1], result
    

response = requests.post("https://%s/api/v0.1/predictions" % route, json={'data': {'ndarray': x}}, verify=False).json()
print("Predicted number is %s (%f) " % (get_label(response['data']['ndarray'][0], response['data']['names'])))
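
To see how `get_label` picks the answer, it helps to run it on a small hand-written response. The sketch below assumes class names of the form `t:<index>`, which is inferred from the `split(":")[1]` call above; the probabilities are invented, and a real response has one entry per digit.

```python
def get_label(predictions, names):
    """Same logic as the cell above: pick the most probable class and its probability."""
    result = max(predictions)
    return names[predictions.index(result)].split(":")[1], result


# Made-up response with three classes instead of the real ten.
fake_response = {
    "data": {
        "names": ["t:0", "t:1", "t:2"],
        "ndarray": [[0.1, 0.7, 0.2]],
    }
}

label, probability = get_label(
    fake_response["data"]["ndarray"][0], fake_response["data"]["names"]
)
print("Predicted number is %s (%f)" % (label, probability))
```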

## Congratulations 

You have successfully trained and served a machine learning model as a deployed application on OpenShift.