# Machine Learning Model Deployment with TensorFlow Serving

## 1. Introduction

#### 1.1 What is TensorFlow Serving?

TensorFlow Serving is a flexible, high-performance serving system for machine learning models, designed for production environments. TensorFlow Serving makes it easy to deploy new algorithms and experiments, while keeping the same server architecture and APIs. TensorFlow Serving provides out-of-the-box integration with TensorFlow models, but can be easily extended to serve other types of models and data.


<img src="Images/Tensorflow_serving.png">

#### 1.2 Why use TensorFlow Serving?

- Highly scalable model serving solution
- Works well for large models up to 2GB
- Production ready Model Serving
- Model Version Control
- Consistent export format
- REST and gRPC endpoints
- Docker images are available for CPU and GPU hardware

#### 1.3 When to use TensorFlow Serving?

This diagram compares current frameworks for productionizing machine learning models.

Each framework has its benefits and drawbacks.


## Task 2: Load Data

#### Dataset
https://www.kaggle.com/snap/amazon-fine-food-reviews

- We will only use "Score" and "Text" columns

#### 2.1 Importing Libraries

In [1]:
import os
import time
import pandas as pd
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub

#### 2.2 Loading dataset

In [None]:
df=pd.read_csv('Dataset/Reviews.csv')
print(df.shape)
df[['Score','Text']].head()

#### 2.3 Train Test Split

In [3]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(df.loc[:, df.columns != 'Score'], df['Score'], train_size=0.67, random_state=42)
X_train['Rating']=y_train
X_test['Rating']=y_test
X_train=X_train[['Rating','Text']]
X_test=X_test[['Rating','Text']]

# Already created
# X_train.to_csv('Dataset/train.csv')
# X_test.to_csv('Dataset/test.csv')

## Task 3. Preprocessing Data

Let's create a function that loads and preprocesses the training and validation datasets.

In [4]:
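def load_dataset(file_path, num_samples):
    """Load the first num_samples rows and return (one-hot labels, texts)."""
    df = pd.read_csv(file_path, nrows=num_samples)

    # Replace non-ASCII characters and keep each review as a bytes object
    text = [str(t).encode('ascii', 'replace') for t in df['Text'].tolist()]
    text = np.array(text, dtype=object)

    # Ratings >= 4 are positive (1), 3 is neutral (0), everything else is negative (-1)
    labels = [1 if i >= 4 else 0 if i == 3 else -1 for i in df['Rating'].tolist()]
    # One-hot encode; columns are ordered -1, 0, 1 (negative, neutral, positive)
    labels = np.array(pd.get_dummies(labels), dtype=int)

    return labels, text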

In [None]:
# Only for testing if the function is working 
tmp_labels,tmp_text=load_dataset('Dataset/train.csv',100)
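
The label mapping inside `load_dataset` can be sketched without pandas. This is a minimal illustration, assuming a fixed class order of negative, neutral, positive; the helper names `score_to_class` and `score_to_one_hot` are hypothetical and not part of the notebook. `pd.get_dummies` sorts its columns by value (-1, 0, 1), so it yields the same order when all three classes occur in the sample. If a class is missing from a small sample, it produces fewer columns, which the fixed order below avoids.

```python
# Hypothetical helpers mirroring the label logic of load_dataset.
CLASSES = (-1, 0, 1)  # negative, neutral, positive

def score_to_class(score):
    """Map a 1-5 star score to -1 (negative), 0 (neutral) or 1 (positive)."""
    if score >= 4:
        return 1
    if score == 3:
        return 0
    return -1

def score_to_one_hot(score):
    """One-hot encode the class in the fixed order given by CLASSES."""
    label = score_to_class(score)
    return [1 if label == c else 0 for c in CLASSES]

for s in (1, 3, 5):
    print(s, score_to_one_hot(s))
```

The position of the 1 in each row is also the index to look at when reading the softmax output of the model later on.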

## Task 4: Building the Classification Model using TF Hub

https://tfhub.dev/google/tf2-preview/nnlm-en-dim128/1

- We will use Google's pre-trained model, a token-based text embedding trained on the English Google News 200B corpus
- We will use **transfer learning** to adapt it to our use case by adding a dense layer and a softmax layer for the classification task

On top of this pre-trained model we will:
- Add a dense layer with 64 units
- Add a softmax layer with 3 outputs (positive sentiment, neutral sentiment, negative sentiment)
- Use categorical_crossentropy as the loss function
- Use the Adam optimizer
- Track accuracy as the metric

**Let's write a function to build the model**

In [5]:
def get_model():
    hub_layer = hub.KerasLayer("https://tfhub.dev/google/tf2-preview/nnlm-en-dim128/1", output_shape=[128], 
                           input_shape=[], dtype=tf.string, name='input', trainable=False)

    model = tf.keras.Sequential()
    model.add(hub_layer)
    model.add(tf.keras.layers.Dense(64, activation='relu'))
    model.add(tf.keras.layers.Dense(3, activation='softmax', name='output'))
    model.compile(loss='categorical_crossentropy',
                  optimizer='Adam', metrics=['accuracy'])
    model.summary()
    return model

In [6]:
# Only to show how pretrained model generates output
# embed = hub.load("https://tfhub.dev/google/tf2-preview/nnlm-en-dim128/1")
# embeddings = embed(["whats is your name", "let's generate embeddings"])
# print(embeddings.shape)
# del embed, embeddings

**Note:** For 2 input sentences the model generates two 128-dimensional embeddings, so the printed shape is `(2, 128)`.

In [7]:
# Only for testing if the function is working 
#get_model()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input (KerasLayer)           (None, 128)               124642688 
_________________________________________________________________
dense (Dense)                (None, 64)                8256      
_________________________________________________________________
output (Dense)               (None, 3)                 195       
Total params: 124,651,139
Trainable params: 8,451
Non-trainable params: 124,642,688
_________________________________________________________________
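
The parameter counts in the summary can be checked by hand: a dense layer has `inputs * units` weights plus `units` biases. The sketch below redoes that arithmetic; `dense_params` is a hypothetical helper for illustration, not a Keras function.

```python
def dense_params(inputs, units):
    """Weights plus biases of a fully connected layer."""
    return inputs * units + units

hidden = dense_params(128, 64)  # embedding (128) -> Dense(64)
output = dense_params(64, 3)    # Dense(64) -> softmax over 3 classes
frozen = 124_642_688            # embedding parameters, taken from the summary

# Trainable total and overall total
print(hidden, output, hidden + output, frozen + hidden + output)
```

Only the 8,451 parameters of the two dense layers are updated during training, because the hub layer is created with `trainable=False`.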


<tensorflow.python.keras.engine.sequential.Sequential at 0x7fa26eab5f90>

## Task 5: Training Process


Let's create a function for training:

- `EPOCHS` defaults to 5 and can be changed via the parameter (e.g. for cross-validation or grid search)
- `BATCH_SIZE` defaults to 32
- `TRAIN_FILE` / `VAL_FILE` point to the training and validation CSV files for sentiment analysis


In [6]:
def train(EPOCHS=5, BATCH_SIZE=32, TRAIN_FILE='Dataset/train.csv', VAL_FILE='Dataset/test.csv'):
    WORKING_DIR = os.getcwd() #use to specify model checkpoint path
    print("Loading training/validation data ...")
    y_train, x_train = load_dataset(TRAIN_FILE, num_samples=100000)
    y_val, x_val = load_dataset(VAL_FILE, num_samples=10000)

    print("Training the model ...")
    model = get_model()
    model.fit(x_train, y_train, batch_size=BATCH_SIZE, epochs=EPOCHS, verbose=1,
              validation_data=(x_val, y_val),
              callbacks=[tf.keras.callbacks.ModelCheckpoint(os.path.join(WORKING_DIR,
                                                                         'model_checkpoint'),
                                                            monitor='val_loss', verbose=1,
                                                            save_best_only=True,
                                                            save_weights_only=False,
                                                            mode='auto')])
    return model

## Task 6: Train and Export Model as Protobuf

In [7]:
def export_model(model, base_path="Sentiment_Model/"):
    path = os.path.join(base_path, str(int(time.time())))
    tf.saved_model.save(model, path)

if __name__== '__main__':
    model = train()
    export_model(model)

Loading training/validation data ...
Training the model ...
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input (KerasLayer)           (None, 128)               124642688 
_________________________________________________________________
dense (Dense)                (None, 64)                8256      
_________________________________________________________________
output (Dense)               (None, 3)                 195       
Total params: 124,651,139
Trainable params: 8,451
Non-trainable params: 124,642,688
_________________________________________________________________
Epoch 1/5
Epoch 00001: val_loss improved from inf to 0.51023, saving model to /Users/maitreytalware/Documents/GitHub/ML-Model-Deployment-with-TensorFlow-Serving/model_checkpoint
Instructions for updating:
This property should not be used in TensorFlow 2.0, as updates are applied automatically.




INFO:tensorflow:Assets written to: /Users/maitreytalware/Documents/GitHub/ML-Model-Deployment-with-TensorFlow-Serving/model_checkpoint/assets




Epoch 2/5
Epoch 00002: val_loss improved from 0.51023 to 0.49768, saving model to /Users/maitreytalware/Documents/GitHub/ML-Model-Deployment-with-TensorFlow-Serving/model_checkpoint
INFO:tensorflow:Assets written to: /Users/maitreytalware/Documents/GitHub/ML-Model-Deployment-with-TensorFlow-Serving/model_checkpoint/assets




Epoch 3/5
Epoch 00003: val_loss improved from 0.49768 to 0.48834, saving model to /Users/maitreytalware/Documents/GitHub/ML-Model-Deployment-with-TensorFlow-Serving/model_checkpoint
INFO:tensorflow:Assets written to: /Users/maitreytalware/Documents/GitHub/ML-Model-Deployment-with-TensorFlow-Serving/model_checkpoint/assets




Epoch 4/5
Epoch 00004: val_loss did not improve from 0.48834
Epoch 5/5
Epoch 00005: val_loss improved from 0.48834 to 0.48816, saving model to /Users/maitreytalware/Documents/GitHub/ML-Model-Deployment-with-TensorFlow-Serving/model_checkpoint
INFO:tensorflow:Assets written to: /Users/maitreytalware/Documents/GitHub/ML-Model-Deployment-with-TensorFlow-Serving/model_checkpoint/assets




INFO:tensorflow:Assets written to: Sentiment_Model/1601531800/assets
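
`export_model` writes each export into a subdirectory named after the Unix timestamp, and TensorFlow Serving treats that number as the model version. By default it serves the highest-numbered version found under the model's base path. The sketch below assumes that layout and uses a hypothetical helper, `latest_version`, to show which export would be picked up. The directory names in the demo are made up for illustration.

```python
import os
import tempfile

def latest_version(base_path):
    """Return the highest numeric subdirectory name under base_path, or None."""
    versions = [
        int(name) for name in os.listdir(base_path)
        if name.isdigit() and os.path.isdir(os.path.join(base_path, name))
    ]
    return max(versions) if versions else None

# Demo on a throwaway directory instead of the real Sentiment_Model/ folder.
with tempfile.TemporaryDirectory() as base:
    for version in ("1601531800", "1601540000", "notes"):
        os.makedirs(os.path.join(base, version))
    newest = latest_version(base)
    print(newest)
```

Non-numeric directories such as `notes` are skipped, which mirrors the expectation that version folders are plain integers.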




## Task 7: Testing Model

#### Negative Review

In [9]:
test_sentence = "horrible book, waste of time"
model.predict([test_sentence])

array([[0.72817457, 0.01399327, 0.2578322 ]], dtype=float32)

#### Positive Review

In [10]:
test_sentence = "Awesome book."
model.predict([test_sentence])

array([[0.00663852, 0.00293764, 0.99042386]], dtype=float32)
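
Each row returned by `model.predict` holds three probabilities. Assuming the column order produced by `pd.get_dummies` in `load_dataset` (labels -1, 0, 1, i.e. negative, neutral, positive), a small hypothetical helper can turn a row into a readable label:

```python
LABELS = ("negative", "neutral", "positive")  # assumed order: -1, 0, 1

def decode_prediction(probabilities):
    """Return the label with the highest probability and that probability."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return LABELS[best], probabilities[best]

# Rows copied from the two prediction outputs above.
negative_row = [0.72817457, 0.01399327, 0.2578322]
positive_row = [0.00663852, 0.00293764, 0.99042386]
print(decode_prediction(negative_row))
print(decode_prediction(positive_row))
```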

#### Neutral Review

## Task 8: Tensorflow Serving with Docker

`docker run -p 8500:8500 \
            -p 8501:8501 \
            --mount type=bind,source=/path/Sentiment_Model/,target=/models/Sentiment_Model \
            -e MODEL_NAME=Sentiment_Model \
            -t tensorflow/serving`

##### Support for gRPC and REST

- TensorFlow Serving supports
    - gRPC Remote Procedure Calls (gRPC)
    - Representational State Transfer (REST)
- Consistent API structures
- Server supports both standards simultaneously
- Default ports:
    - gRPC: 8500
    - REST: 8501
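
Once the container is running, the REST port also answers model status requests: a GET on `/v1/models/{MODEL_NAME}` reports which versions are loaded. Here is a minimal sketch using only the standard library, assuming the server runs on localhost with the default REST port. `model_status_url` and `fetch_model_status` are hypothetical helper names.

```python
import json
import urllib.request

def model_status_url(model_name, host="127.0.0.1", port=8501):
    """Build the model status URL of the TensorFlow Serving REST API."""
    return "http://{}:{}/v1/models/{}".format(host, port, model_name)

def fetch_model_status(model_name, timeout=5):
    """GET the status document; raises URLError if the server is unreachable."""
    url = model_status_url(model_name)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))

status_url = model_status_url("Sentiment_Model")
print(status_url)
# With the container running: print(fetch_model_status("Sentiment_Model"))
```

The network call is left commented out so the snippet runs without a server.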

## Task 9: Setup a REST Client to Perform Model Predictions

#### Predictions via REST

- Standard HTTP POST requests
- Response is a JSON body with the prediction
- Requests can target the default (latest) version or a specific model version

Default URI scheme (latest version):

`http://{HOST}:{PORT}/v1/models/{MODEL_NAME}:predict`

Specific model version:

`http://{HOST}:{PORT}/v1/models/{MODEL_NAME}/versions/{MODEL_VERSION}:predict`
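
In the row format of the predict API, the request body is a JSON object whose `instances` key holds one entry per input, and the response carries a `predictions` list of the same length. The sketch below builds such a body and parses a response. The sample response is made up for illustration, not captured from a server, and `build_predict_body` is a hypothetical helper.

```python
import json

def build_predict_body(sentences, signature_name="serving_default"):
    """Serialize a row-format predict request for a list of input strings."""
    return json.dumps({"signature_name": signature_name,
                       "instances": list(sentences)})

body = build_predict_body(["horrible book, waste of time", "Awesome book."])
print(body)

# Illustrative response only; real values depend on the trained model.
sample_response = '{"predictions": [[0.7, 0.1, 0.2], [0.05, 0.05, 0.9]]}'
predictions = json.loads(sample_response)["predictions"]
print(len(predictions), predictions[0])
```

Sending several sentences in one request yields one probability row per sentence, in the same order.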

In [27]:
%%writefile tf_serving_rest_client.py
import json
import requests
import sys

def get_rest_url(model_name, host='127.0.0.1', port='8501', verb='predict', version=None):
    """Generate the REST URL for the given model, optional version, and verb."""
    url = "http://{host}:{port}/v1/models/{model_name}".format(host=host, port=port, model_name=model_name)
    if version:
        # The leading slash separates the model name from the version segment
        url += '/versions/{version}'.format(version=version)
    url += ':{verb}'.format(verb=verb)
    return url


def get_model_prediction(model_input, model_name='Sentiment_Model', signature_name='serving_default'):
    """Minimal proof of concept: HTTP errors are raised, nothing else is handled."""

    url = get_rest_url(model_name)
    # In the row format, inputs are keyed to the "instances" key in the JSON request.
    # When there is only one named input, "instances" is simply the list of input values.
    data = {"signature_name": signature_name, "instances": [model_input]}

    rv = requests.post(url, data=json.dumps(data))
    # Raises requests.HTTPError for 4xx/5xx responses
    rv.raise_for_status()

    return rv.json()['predictions']

if __name__ == '__main__':

    print("\nGenerate REST url ...")
    url = get_rest_url(model_name='Sentiment_Model')
    print(url)
    
    while True:
        print("\nEnter a review [:q to quit]")
        sentence = input()
        if sentence == ':q':
            break
        model_input = sentence
        model_prediction = get_model_prediction(model_input)
        print("The model predicted ...")
        print(model_prediction)

Writing tf_serving_rest_client.py


## Task 10: Setup a gRPC Client to Perform Model Predictions

Modified from [https://github.com/tensorflow/serving/blob/master/tensorflow_serving/example/mnist_client.py](https://github.com/tensorflow/serving/blob/master/tensorflow_serving/example/mnist_client.py#L152)

#### Predictions via gRPC

More sophisticated client-server connections

- Prediction data has to be converted to the Protobuf format
- Request fields have designated types, e.g. float, int, bytes
- Binary payloads are sent as raw bytes (base64 encoding is only needed for binary values in REST/JSON requests)
- Connect to the server via gRPC stubs

#### gRPC vs REST: When to use which API standard

- REST is easy to implement and debug
- gRPC is more network efficient, with smaller payloads
- gRPC can provide faster inference, especially for large payloads
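
The payload-size argument can be made concrete with a rough comparison. The sketch below encodes the same 128 floats once as JSON text and once as packed 32-bit binary values. This is only an analogy for text versus binary encoding, not the actual Protobuf wire format used by gRPC, and the values are arbitrary.

```python
import json
import struct

values = [i / 1000.0 + 0.123456 for i in range(128)]  # arbitrary sample data

# Text encoding, as a REST/JSON request would carry the numbers
json_payload = json.dumps({"instances": [values]}).encode("utf-8")
# Binary encoding: 4 bytes per float32, little-endian
binary_payload = struct.pack("<128f", *values)

print(len(json_payload), len(binary_payload))
```

The binary form is always 512 bytes for 128 float32 values, while the JSON size depends on how many digits each number needs.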

In [22]:
import sys
import grpc
from grpc.beta import implementations
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2, get_model_metadata_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc