# Model Serving with Docker/Kubernetes and TensorFlow - MNIST Classification
---
*INPUT --> MODEL --> PREDICTION*

> **NOTE:** This notebook assumes that a model called *mnist* is already available in Hopsworks. An example of training a model for the *MNIST handwritten digit classification problem* is available in `Jupyter/end_to_end_pipelines/tensorflow/end_to_end_tensorflow.ipynb`.

## Model Serving on [Hopsworks](https://github.com/logicalclocks/hopsworks)

![hops.png](../../../images/hops.png)

### The `hops` python library

`hops` is a helper library for Hops that facilitates development by hiding the complexity of running applications and interacting with services.

Have a feature request or encountered an issue? Please let us know on [GitHub](https://github.com/logicalclocks/hops-util-py).

## Serve the MNIST classifier

### Check Model Repository for best model based on accuracy

![models.gif](../../../images/models.gif)

### Query Model Repository for best mnist Model

In [1]:
import hsml
MODEL_NAME="mnist"
EVALUATION_METRIC="accuracy"

In [2]:
conn = hsml.connection()
mr = conn.get_model_registry()

best_model = mr.get_best_model(MODEL_NAME, EVALUATION_METRIC, "max")

Connected. Call `.close()` to terminate connection gracefully.


In [3]:
print('Model name: ' + best_model.name)
print('Model version: ' + str(best_model.version))
print(best_model.training_metrics)

Model name: mnist
Model version: 1
{'accuracy': '0.71875'}


### Create Model Serving of Exported Model

In [4]:
from hops import serving

In [5]:
# Create serving instance
SERVING_NAME = MODEL_NAME

response = serving.create_or_update(SERVING_NAME, # define a name for the serving instance
                                    best_model.model_path, model_version=best_model.version, # set the path and version of the model to be deployed
                                    kfserving=False, # the model will be served either with Docker or Kubernetes depending on the Hopsworks version
                                    topic_name="CREATE", # (optional) set the topic name or CREATE to create a new topic for inference logging
                                    instances=1, # with KFServing, set 0 instances to leverage scale-to-zero capabilities
                                    )

Inferring model server from artifact files: TENSORFLOW_SERVING
Creating serving mnist for artifact /Projects/demo_ml_meb10000//Models/mnist ...
Serving mnist successfully created


In [6]:
# List all available servings in the project
for s in serving.get_all():
    print(s.name)

mnist


In [7]:
# Get serving status
serving.get_status(SERVING_NAME)

'Stopped'

## Classify digits with the MNIST classifier

### Start Model Serving Server

In [8]:
if serving.get_status(SERVING_NAME) == 'Stopped':
    serving.start(SERVING_NAME)

Starting serving with name: mnist...
Serving with name: mnist successfully started


In [9]:
import time
while serving.get_status(SERVING_NAME) != "Running":
    time.sleep(5) # Poll every 5 seconds until the serving reports "Running"
time.sleep(10) # Wait a little longer before sending the first requests
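
The polling loop above waits indefinitely if the serving never reaches `Running`. A minimal sketch of a bounded wait follows. The helper `wait_for_status` and its parameters are hypothetical and not part of the `hops` library; it takes any zero-argument callable that returns the current status, so it can be tried here with a stand-in instead of a live serving.

```python
import time


def wait_for_status(get_status, target="Running", timeout_s=120.0, poll_s=5.0):
    """Poll get_status() until it returns target; raise TimeoutError otherwise.

    Hypothetical helper: get_status is any callable returning a status string.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status == target:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("status is still %r after %.0f s" % (status, timeout_s))
        time.sleep(poll_s)


# Stand-in for serving.get_status(SERVING_NAME), so the sketch runs anywhere
_fake_statuses = iter(["Stopped", "Stopped", "Running"])
final_status = wait_for_status(lambda: next(_fake_statuses), poll_s=0.01)
print(final_status)
```

In this notebook the call would presumably look like `wait_for_status(lambda: serving.get_status(SERVING_NAME))`, which raises instead of hanging when the serving fails to start.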

### Check Model Serving for active servings

![servings.gif](../../../images/servings.gif)

### Send Prediction Requests to the Served Model using Hopsworks REST API
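
Each request in this section is a JSON object with a `signature_name` and a list of `instances`, where every instance is a vector of 784 values. The sketch below shows one way to build such a body from 28x28 images. The helper `build_predict_payload` is hypothetical, and the row-by-row flattening order is an assumption about how the model was trained, not something this notebook confirms.

```python
import json

IMAGE_SIDE = 28                          # MNIST images are 28x28 pixels
NUM_FEATURES = IMAGE_SIDE * IMAGE_SIDE   # 784, the vector length used in this notebook


def build_predict_payload(images, signature_name="serving_default"):
    """Flatten 2-D images row by row and wrap them in a predict request body."""
    instances = []
    for image in images:
        flat = [float(pixel) for row in image for pixel in row]
        if len(flat) != NUM_FEATURES:
            raise ValueError("expected %d features, got %d" % (NUM_FEATURES, len(flat)))
        instances.append(flat)
    return {"signature_name": signature_name, "instances": instances}


# Hypothetical all-zero images, used only to show the shape of the payload
blank_image = [[0.0] * IMAGE_SIDE for _ in range(IMAGE_SIDE)]
payload = build_predict_payload([blank_image, blank_image])
body = json.dumps(payload)  # the JSON text that would be sent over HTTP
print(len(payload["instances"]), len(payload["instances"][0]))
```

Checking the vector length before sending keeps shape mistakes on the client side, where they are easier to diagnose than a model server error.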

In [10]:
import json
import numpy as np

NUM_FEATURES=784

for i in range(20):
    data = {
                "signature_name": "serving_default", "instances": [np.random.rand(NUM_FEATURES).tolist()]
            }
    response = serving.make_inference_request(SERVING_NAME, data)
    print(response)

{'predictions': [[0.0171465632, 0.048792474, 0.0487240963, 0.125329539, 0.154285505, 0.0121596335, 0.405637234, 0.116209693, 0.0209982134, 0.0507171303]]}
{'predictions': [[0.00984500162, 0.0813344792, 0.0794480368, 0.229178905, 0.146066204, 0.0101964734, 0.356174409, 0.037145935, 0.0207445342, 0.0298659969]]}
{'predictions': [[0.0314147845, 0.0672961771, 0.0496617593, 0.137096703, 0.171351343, 0.00814313535, 0.397425473, 0.0644465834, 0.0328251459, 0.0403388739]]}
{'predictions': [[0.0181549136, 0.038787093, 0.0792503878, 0.264465064, 0.136571646, 0.00646544108, 0.297726512, 0.0910884291, 0.0375934504, 0.0298969876]]}
{'predictions': [[0.0168738756, 0.0996855795, 0.0380544215, 0.117498912, 0.1118147, 0.0165658947, 0.379053235, 0.102008328, 0.0545530841, 0.0638920367]]}
{'predictions': [[0.0176697783, 0.0455756187, 0.0470717624, 0.0988537148, 0.0915572941, 0.0109392768, 0.496714264, 0.101835445, 0.0549756736, 0.0348071]]}
{'predictions': [[0.0205914453, 0.0654138848, 0.0507608578, 0.10
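
Each response holds one 10-element score vector per instance. A minimal sketch of turning such a vector into a predicted digit follows, using the first response printed above. It assumes that position `i` in the vector corresponds to digit `i` and that the values are class probabilities; the helper `top_class` is hypothetical.

```python
# First response printed above, copied verbatim
response = {"predictions": [[0.0171465632, 0.048792474, 0.0487240963,
                             0.125329539, 0.154285505, 0.0121596335,
                             0.405637234, 0.116209693, 0.0209982134,
                             0.0507171303]]}


def top_class(scores):
    """Return (index, score) of the highest-scoring class."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


for scores in response["predictions"]:
    digit, score = top_class(scores)
    print("predicted digit: %d (score %.3f)" % (digit, score))
    # prints: predicted digit: 6 (score 0.406)
```

Because the inputs in this notebook are random noise rather than real digits, the predicted class carries no meaning here; the sketch only shows how to read the response.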

## Monitor Prediction Requests and Responses using Kafka

In [11]:
from hops import kafka
from confluent_kafka import Producer, Consumer, KafkaError

Set up a Kafka consumer and subscribe to the topic containing the prediction logs.

In [12]:
TOPIC_NAME = serving.get_kafka_topic(SERVING_NAME)

config = kafka.get_kafka_default_config()
config['default.topic.config'] = {'auto.offset.reset': 'earliest'}
consumer = Consumer(config)
topics = [TOPIC_NAME]
consumer.subscribe(topics)

Read the Kafka Avro schema from Hopsworks and set up an Avro reader.

In [13]:
json_schema = kafka.get_schema(TOPIC_NAME)
avro_schema = kafka.convert_json_schema_to_avro(json_schema)

Read messages from the Kafka topic, parse them with the Avro schema, and print the results.

In [14]:
PRINT_INSTANCES=False
PRINT_PREDICTIONS=True

for i in range(0, 5):
    msg = consumer.poll(timeout=5.0)
    if msg is not None:
        value = msg.value()
        try:
            event_dict = kafka.parse_avro_msg(value, avro_schema)
            
            print("serving: {}, version: {}, timestamp: {},\n"\
                  "        http_response_code: {}, model_server: {}, serving_tool: {}".format(
                       event_dict["modelName"],
                       event_dict["modelVersion"],
                       event_dict["requestTimestamp"],
                       event_dict["responseHttpCode"],
                       event_dict["modelServer"],
                       event_dict["servingTool"]))
            
            if PRINT_INSTANCES:
                print("instances: {}\n".format(event_dict["inferenceRequest"]))
            if PRINT_PREDICTIONS:
                print("predictions: {}\n".format(json.loads(event_dict["inferenceResponse"])["predictions"][0]))
                      
        except Exception as e:
            print("A message was read but there was an error parsing it")
            print(e)
    else:
        print("timeout.. no more messages to read from topic")

timeout.. no more messages to read from topic
serving: mnist, version: 1, timestamp: 1634307452661,
        http_response_code: 200, model_server: TENSORFLOW_SERVING, serving_tool: DEFAULT
predictions: [0.0171465632, 0.048792474, 0.0487240963, 0.125329539, 0.154285505, 0.0121596335, 0.405637234, 0.116209693, 0.0209982134, 0.0507171303]

serving: mnist, version: 1, timestamp: 1634307452775,
        http_response_code: 200, model_server: TENSORFLOW_SERVING, serving_tool: DEFAULT
predictions: [0.00984500162, 0.0813344792, 0.0794480368, 0.229178905, 0.146066204, 0.0101964734, 0.356174409, 0.037145935, 0.0207445342, 0.0298659969]

serving: mnist, version: 1, timestamp: 1634307452864,
        http_response_code: 200, model_server: TENSORFLOW_SERVING, serving_tool: DEFAULT
predictions: [0.0314147845, 0.0672961771, 0.0496617593, 0.137096703, 0.171351343, 0.00814313535, 0.397425473, 0.0644465834, 0.0328251459, 0.0403388739]

serving: mnist, version: 1, timestamp: 1634307452968,
        http_res