# Model Serving with Docker/Kubernetes and TensorFlow - MNIST Classification
---
*INPUT --> MODEL --> PREDICTION*

> **NOTE:** This notebook assumes that a model named *mnist* is already available in Hopsworks. An example of training a model for the *MNIST handwritten digit classification problem* is available in `Jupyter/end_to_end_pipelines/tensorflow/end_to_end_tensorflow.ipynb`.

## Model Serving on [Hopsworks](https://github.com/logicalclocks/hopsworks)

![hops.png](../../../images/hops.png)

### The `hops` python library

`hops` is a helper library for Hops that facilitates development by hiding the complexity of running applications and interacting with services.

Have a feature request or encountered an issue? Please let us know on [GitHub](https://github.com/logicalclocks/hops-util-py).

## Serve the MNIST classifier

### Check Model Repository for best model based on accuracy

![Image7-Monitor.png](../../../images/models.gif)

### Query Model Repository for best mnist Model

```python
from hops import model
from hops.model import Metric

MODEL_NAME = "mnist"
EVALUATION_METRIC = "accuracy"
```

```python
# Metric.MAX selects the version with the highest accuracy
best_model = model.get_best_model(MODEL_NAME, EVALUATION_METRIC, Metric.MAX)
```

```python
print('Model name: ' + best_model['name'])
print('Model version: ' + str(best_model['version']))
print(best_model['metrics'])
```

```
Model name: mnist
Model version: 1
{'accuracy': '0.75'}
```
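`model.get_best_model` picks the version whose metric is highest (`Metric.MAX`) or lowest. The metric is reported as a string (`'0.75'`), so any comparison of your own should convert it to a number first. The sketch below illustrates that selection logic on plain dictionaries shaped like `best_model`; the helper `pick_best_model` and the second model version in the example are hypothetical and not part of `hops`.

```python
from typing import Dict, List


def pick_best_model(models: List[Dict], metric: str, maximize: bool = True) -> Dict:
    """Return the model entry with the best value of `metric`.

    Metric values can be strings such as '0.75', so they are converted to
    float first: as strings, '9' sorts after '10', which is wrong numerically.
    """
    candidates = [m for m in models if metric in m.get("metrics", {})]
    if not candidates:
        raise ValueError(f"no model reports the metric {metric!r}")

    def value(m: Dict) -> float:
        return float(m["metrics"][metric])

    return max(candidates, key=value) if maximize else min(candidates, key=value)


# Hypothetical versions of the "mnist" model, shaped like `best_model` above
versions = [
    {"name": "mnist", "version": 1, "metrics": {"accuracy": "0.75"}},
    {"name": "mnist", "version": 2, "metrics": {"accuracy": "0.71"}},
]
print(pick_best_model(versions, "accuracy")["version"])  # 1
```

For a metric where lower is better, such as a loss, pass `maximize=False`, which mirrors choosing `Metric.MIN` instead of `Metric.MAX`.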


### Create a Model Serving for the Exported Model

```python
from hops import serving
```

```python
# Create a serving instance for the best model version
SERVING_NAME = MODEL_NAME
MODEL_PATH = "/Models/" + best_model['name']

response = serving.create_or_update(
    SERVING_NAME,                         # name of the serving instance
    MODEL_PATH,                           # path of the model to deploy
    model_version=best_model['version'],  # version of the model to deploy
    kfserving=False,                      # served with Docker or Kubernetes, depending on the Hopsworks version
    topic_name="CREATE",                  # (optional) Kafka topic for inference logging; "CREATE" creates a new one
    instances=1,                          # with KFServing, set 0 instances to leverage scale-to-zero
)
```

```
Inferring model server from artifact files: TENSORFLOW_SERVING
Creating serving mnist for artifact /Projects/demo_ml_meb10000//Models/mnist ...
Serving mnist successfully created
```


```python
# List all available servings in the project
for s in serving.get_all():
    print(s.name)
```

```
mnist
```


```python
# Get serving status
serving.get_status(SERVING_NAME)
```

```
'Stopped'
```

## Classify digits with the MNIST classifier

### Start Model Serving Server

```python
if serving.get_status(SERVING_NAME) == 'Stopped':
    serving.start(SERVING_NAME)
```

```
Starting serving with name: mnist...
Serving with name: mnist successfully started
```


```python
import time

# Poll the serving status until it reports "Running"
while serving.get_status(SERVING_NAME) != "Running":
    time.sleep(5)
# Give the server a few more seconds before sending requests
time.sleep(5)
```
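The polling loop above waits indefinitely if the serving never reaches `Running` (for example, if the model server fails to start). A minimal sketch of a bounded wait, assuming only a zero-argument function that returns the current status string; the helper name `wait_for_status` and its default timeout and interval are hypothetical choices, not part of `hops`.

```python
import time
from typing import Callable


def wait_for_status(get_status: Callable[[], str], target: str = "Running",
                    timeout_s: float = 300.0, poll_s: float = 5.0) -> str:
    """Poll `get_status()` until it returns `target`; raise TimeoutError after `timeout_s` seconds."""
    deadline = time.monotonic() + timeout_s  # monotonic clock is unaffected by wall-clock changes
    while True:
        status = get_status()
        if status == target:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"status is {status!r}, expected {target!r} within {timeout_s} s")
        time.sleep(poll_s)
```

In this notebook it could be called as `wait_for_status(lambda: serving.get_status(SERVING_NAME))`. Passing the status function in, rather than calling `serving` directly, keeps the helper independent of `hops` and easy to test with a fake status sequence.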

### Check Model Serving for active servings

![Image7-Monitor.png](../../../images/servings.gif)

### Send Prediction Requests to the Served Model using the Hopsworks REST API

```python
import json  # used later to parse the logged responses
import numpy as np

NUM_FEATURES = 784  # each MNIST image has 28 x 28 = 784 pixels, flattened

# Send 20 requests, each with one random input vector. Random noise is not a
# real digit, so these predictions only demonstrate the request/response flow.
for i in range(20):
    data = {
        "signature_name": "serving_default",
        "instances": [np.random.rand(NUM_FEATURES).tolist()],
    }
    response = serving.make_inference_request(SERVING_NAME, data)
    print(response)
```

```
{'predictions': [[0.0570081174, 0.0655516908, 0.0384815671, 0.0236714445, 0.326048434, 0.208096072, 0.0829753354, 0.081687, 0.0640862733, 0.0523940176]]}
{'predictions': [[0.0410101265, 0.0716211, 0.0464931, 0.0277518351, 0.239403486, 0.129260719, 0.0967375934, 0.106829077, 0.165632, 0.075260967]]}
{'predictions': [[0.0452777408, 0.0637608, 0.0491008312, 0.0400233492, 0.346460611, 0.0908036157, 0.0619194657, 0.0982931, 0.114815064, 0.0895455331]]}
{'predictions': [[0.0314507075, 0.0585816801, 0.0445142835, 0.0438556336, 0.273136318, 0.215605736, 0.0794836283, 0.130447, 0.0517816357, 0.0711433962]]}
{'predictions': [[0.0698710307, 0.0869943276, 0.0301038921, 0.0247928668, 0.286450505, 0.167296484, 0.094396539, 0.0698259771, 0.11602433, 0.0542440601]]}
{'predictions': [[0.0860077, 0.0403077304, 0.0244597942, 0.0264858678, 0.27012223, 0.163445517, 0.0495519452, 0.056668926, 0.170305803, 0.112644464]]}
{'predictions': [[0.0458639339, 0.0367544517, 0.0322426595, 0.0237141084, 0.434011459, 0
```
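Each response contains one list of 10 scores per input instance, one score per digit 0 to 9. To turn a response into class labels, take the index of the highest score in each row. The sketch below does that, and also builds a single request body that carries several instances in the same `signature_name`/`instances` format used above, instead of one request per instance. The helper names `build_request` and `predicted_digits` are hypothetical, and the code assumes the `{'predictions': [[...], ...]}` response shape printed above.

```python
from typing import Dict, List

import numpy as np

NUM_FEATURES = 784  # 28 x 28 pixels, flattened


def build_request(batch: np.ndarray, signature_name: str = "serving_default") -> Dict:
    """Build a request body with one entry in "instances" per row of `batch`."""
    if batch.ndim != 2 or batch.shape[1] != NUM_FEATURES:
        raise ValueError(f"expected shape (n, {NUM_FEATURES}), got {batch.shape}")
    return {"signature_name": signature_name, "instances": batch.tolist()}


def predicted_digits(response: Dict) -> List[int]:
    """Return the index of the highest score (the predicted digit) for each instance."""
    scores = np.asarray(response["predictions"])  # shape (n_instances, 10)
    return scores.argmax(axis=1).tolist()


# The first response printed above
sample = {"predictions": [[0.0570081174, 0.0655516908, 0.0384815671, 0.0236714445, 0.326048434,
                           0.208096072, 0.0829753354, 0.081687, 0.0640862733, 0.0523940176]]}
print(predicted_digits(sample))  # [4]

# One request body carrying 20 random instances
request = build_request(np.random.rand(20, NUM_FEATURES))
print(len(request["instances"]))  # 20
```

With the serving running, `serving.make_inference_request(SERVING_NAME, request)` should then return 20 rows of predictions in a single call, assuming the exported model accepts a batch dimension, which the TensorFlow Serving `instances` format is designed for.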

## Monitor Prediction Requests and Responses using Kafka

```python
from hops import kafka
from confluent_kafka import Consumer
```

Set up a Kafka consumer and subscribe to the topic that receives the prediction logs.

```python
TOPIC_NAME = serving.get_kafka_topic(SERVING_NAME)

config = kafka.get_kafka_default_config()
# Start reading from the earliest offset so the requests sent above are included
config['default.topic.config'] = {'auto.offset.reset': 'earliest'}
consumer = Consumer(config)
topics = [TOPIC_NAME]
consumer.subscribe(topics)
```

Read the topic's schema from Hopsworks and convert it into the Avro schema used to parse the messages.

```python
json_schema = kafka.get_schema(TOPIC_NAME)
avro_schema = kafka.convert_json_schema_to_avro(json_schema)
```
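`kafka.parse_avro_msg` decodes each message's bytes using this schema. To illustrate what Avro binary decoding involves, here is a minimal, self-contained sketch for flat records whose fields are `int`, `long`, or `string`: per the Avro specification, integers are zig-zag encoded variable-length values, and strings are a length followed by UTF-8 bytes. This is not the `hops` implementation; it assumes the message value is the bare encoded record, and the three-field schema in the example is hypothetical rather than the actual Hopsworks inference-log schema. In practice, use an Avro library such as `avro` or `fastavro`.

```python
from typing import Dict, List, Tuple

# A flat schema as (field name, Avro primitive type) pairs, in schema order
Schema = List[Tuple[str, str]]


def _write_long(n: int, out: bytearray) -> None:
    """Append n as a zig-zag encoded variable-length integer (Avro int/long)."""
    n = (n << 1) ^ (n >> 63)  # zig-zag: 0, -1, 1, -2 ... map to 0, 1, 2, 3 ...
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)  # low 7 bits with the continuation bit set
        n >>= 7
    out.append(n)


def _read_long(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a zig-zag varint starting at pos; return (value, next position)."""
    shift = result = 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:  # no continuation bit: last byte of this number
            break
        shift += 7
    return (result >> 1) ^ -(result & 1), pos


def encode_record(record: Dict, schema: Schema) -> bytes:
    out = bytearray()
    for name, typ in schema:
        if typ in ("int", "long"):
            _write_long(record[name], out)
        elif typ == "string":
            data = record[name].encode("utf-8")
            _write_long(len(data), out)
            out += data
        else:
            raise NotImplementedError(typ)
    return bytes(out)


def decode_record(buf: bytes, schema: Schema) -> Dict:
    record, pos = {}, 0
    for name, typ in schema:
        if typ in ("int", "long"):
            record[name], pos = _read_long(buf, pos)
        elif typ == "string":
            length, pos = _read_long(buf, pos)
            record[name] = buf[pos:pos + length].decode("utf-8")
            pos += length
        else:
            raise NotImplementedError(typ)
    return record


schema = [("modelName", "string"), ("modelVersion", "int"), ("requestTimestamp", "long")]
msg = encode_record({"modelName": "mnist", "modelVersion": 1, "requestTimestamp": 1623765611867}, schema)
print(decode_record(msg, schema))
# {'modelName': 'mnist', 'modelVersion': 1, 'requestTimestamp': 1623765611867}
```

Because Avro records carry no field names on the wire, the reader can only decode them with the writer's schema, which is why the schema is fetched from Hopsworks before any message is parsed.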

Read messages from the Kafka topic, parse them with the Avro schema, and print the results.

```python
PRINT_INSTANCES = False
PRINT_PREDICTIONS = True

for i in range(5):
    msg = consumer.poll(timeout=1.0)
    if msg is None:
        print("Timeout: no more messages to read from the topic")
        continue
    if msg.error():
        # poll() can also return error events (e.g. end of partition); report and skip them
        print("Kafka error: {}".format(msg.error()))
        continue
    try:
        event_dict = kafka.parse_avro_msg(msg.value(), avro_schema)

        print("serving: {}, version: {}, timestamp: {},\n"
              "        http_response_code: {}, model_server: {}, serving_tool: {}".format(
                  event_dict["modelName"],
                  event_dict["modelVersion"],
                  event_dict["requestTimestamp"],
                  event_dict["responseHttpCode"],
                  event_dict["modelServer"],
                  event_dict["servingTool"]))

        if PRINT_INSTANCES:
            print("instances: {}\n".format(event_dict["inferenceRequest"]))
        if PRINT_PREDICTIONS:
            # inferenceResponse holds the model server's JSON response as a string
            predictions = json.loads(event_dict["inferenceResponse"])["predictions"][0]
            print("predictions: {}\n".format(predictions))
    except Exception as e:
        print("A message was read but there was an error parsing it")
        print(e)
```

```
serving: mnist, version: 1, timestamp: 1623765611867,
        http_response_code: 200, model_server: TENSORFLOW_SERVING, serving_tool: DEFAULT
predictions: [0.0570081174, 0.0655516908, 0.0384815671, 0.0236714445, 0.326048434, 0.208096072, 0.0829753354, 0.081687, 0.0640862733, 0.0523940176]

serving: mnist, version: 1, timestamp: 1623765612038,
        http_response_code: 200, model_server: TENSORFLOW_SERVING, serving_tool: DEFAULT
predictions: [0.0410101265, 0.0716211, 0.0464931, 0.0277518351, 0.239403486, 0.129260719, 0.0967375934, 0.106829077, 0.165632, 0.075260967]

serving: mnist, version: 1, timestamp: 1623765612232,
        http_response_code: 200, model_server: TENSORFLOW_SERVING, serving_tool: DEFAULT
predictions: [0.0452777408, 0.0637608, 0.0491008312, 0.0400233492, 0.346460611, 0.0908036157, 0.0619194657, 0.0982931, 0.114815064, 0.0895455331]

serving: mnist, version: 1, timestamp: 1623765612459,
        http_response_code: 200, model_server: TENSORFLOW_SERVING, serving_tool:
```