<h2 align="center"> Deploy Models with TensorFlow Serving and Docker</h2>

## Load and Preprocess Data

In [1]:
#%%writefile -a train.py
import os
import time
import pandas as pd
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub

In [2]:
#Source: https://www.kaggle.com/snap/amazon-fine-food-reviews/data
!head -n 2 train.csv

Id,ProductId,UserId,ProfileName,HelpfulnessNumerator,HelpfulnessDenominator,Score,Time,Summary,Text
184502,B001BCVY4W,A1JMR1N9NBYJ1X,Mad Ethyl Flint,0,0,4,1228176000,Doesn't look like catfood!,"When you first open the can, it looks like something you would eat.  And no catfood smell! Nice sized chunks of chicken and vegetables in a lot of gravy.<br /><br />That being said, Ms Casiopia lapped up all the gravy and left the rest.  This however is not the product's fault as she has done this before with other catfoods<br /><br />I would have given it 5 stars, but since I won't be purchasing it, I gave it 4.  If your cat will eat chunks and vegetables, this product is for you.<br /><br />I have donated the remainder of the package to a less fortunate friend.<br /><br />Thank you."


In [3]:
#%%writefile -a train.py

def load_dataset(file_path, num_samples):
    # Column 6 is Score (star rating), column 9 is Text (review body)
    df = pd.read_csv(file_path, usecols=[6, 9], nrows=num_samples)
    df.columns = ['rating', 'text']

    # 1D array of ASCII-encoded byte strings
    text = [str(t).encode('ascii', 'replace') for t in df['text'].tolist()]
    text = np.array(text, dtype=object)

    # Collapse the 5 star ratings into 3 classes: 1 = positive, 0 = neutral, -1 = negative
    labels = [1 if i >= 4 else 0 if i == 3 else -1 for i in df['rating'].tolist()]
    labels = np.array(pd.get_dummies(labels), dtype=int)  # one-hot encoding, columns ordered -1, 0, 1

    return labels, text

In [4]:
tmp_labels, tmp_text = load_dataset('train.csv', 100) #loading only small no. of samples
tmp_text.shape
#tmp_labels

(100,)
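
The label scheme in `load_dataset` depends on `pd.get_dummies` emitting its columns in sorted order (-1, 0, 1), so the three one-hot columns stand for negative, neutral and positive. The following is a minimal pure-Python sketch of the same mapping. The helper names `rating_to_class` and `one_hot` are hypothetical and not part of the notebook.

```python
# Hypothetical helpers illustrating the label scheme used in load_dataset.
CLASSES = [-1, 0, 1]  # negative, neutral, positive (sorted, as pd.get_dummies orders columns)


def rating_to_class(rating):
    """Map a 1-5 star rating to -1 (negative), 0 (neutral) or 1 (positive)."""
    if rating >= 4:
        return 1
    if rating == 3:
        return 0
    return -1


def one_hot(label):
    """Return a fixed-width one-hot row, even if a class is absent from the sample."""
    return [1 if label == c else 0 for c in CLASSES]


ratings = [5, 3, 1, 4]
encoded = [one_hot(rating_to_class(r)) for r in ratings]
print(encoded)
```

Unlike `pd.get_dummies`, this sketch always produces three columns. With a very small sample that happens to lack one class, `pd.get_dummies` would return fewer columns and the label shape would no longer match the 3-unit output layer.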

## Build the Classification Model using Keras and TF Hub

NLP pre-processing steps:

- Must do:
    - Noise removal
    - Lowercasing (can be task dependent in some cases)
- Should do:
    - Simple normalization (e.g. standardize near-identical words)
- Task dependent:
    - Advanced normalization (e.g. addressing out-of-vocabulary words)
    - Stop-word removal
    - Stemming / lemmatization
    - Text enrichment / augmentation

Search for: tokenization vs lemmatization vs stemming
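
As an illustration of the first two steps (noise removal and lowercasing), here is a minimal sketch using only the standard library. The function name `clean_review` is hypothetical, and the notebook itself does not apply this step. It is shown only because the raw reviews contain HTML remnants such as `<br />`.

```python
import html
import re


def clean_review(text):
    """Strip HTML tags, unescape entities, lowercase and collapse whitespace."""
    text = re.sub(r"<[^>]+>", " ", text)  # replace tags such as <br /> with a space
    text = html.unescape(text)            # turn entities such as &amp; into &
    text = text.lower()
    return re.sub(r"\s+", " ", text).strip()


sample = "No catfood smell!<br /><br />That being said, Ms Casiopia lapped up all the gravy."
print(clean_review(sample))
```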

We build a classification model that uses a pre-trained text embedding layer. The layer accepts raw text as input and does its own pre-processing by splitting the text on spaces, so we don't need much additional pre-processing such as tokenization, stemming or stop-word removal. All it needs as input is a batch of sentences in a 1D tensor of strings.

TF Hub is a library for the publication, discovery and consumption of reusable parts of ML models.

In [5]:
#%%writefile -a train.py
##https://tfhub.dev/google/tf2-preview/nnlm-en-dim50/1
##https://tfhub.dev/google/tf2-preview/nnlm-en-dim128/1
def get_model():
    # trainable=False: the pre-trained word embeddings are not updated during training.
    # We reuse them as-is and only add a small fully connected layer and a classification head on top.
    hub_layer = hub.KerasLayer("https://tfhub.dev/google/tf2-preview/nnlm-en-dim50/1", output_shape=[50],
                           input_shape=[], dtype=tf.string, name='input', trainable=False)

    model = tf.keras.Sequential()
    model.add(hub_layer)
    model.add(tf.keras.layers.Dense(16, activation='relu'))
    model.add(tf.keras.layers.Dense(3, activation='softmax', name='output'))
    model.compile(loss='categorical_crossentropy',
                  optimizer='Adam', metrics=['accuracy'])
    model.summary()
    return model

In [6]:
embed = hub.load("https://tfhub.dev/google/tf2-preview/nnlm-en-dim50/1")
embeddings = embed(["this is a test", "look at the embeddings"])
embeddings #embedding vectors of length 50

<tf.Tensor: shape=(2, 50), dtype=float32, numpy=
array([[ 0.05650096,  0.2567145 ,  0.24404189,  0.14395264, -0.05569138,
        -0.10513686,  0.09544804,  0.3080969 , -0.218672  , -0.03048538,
        -0.19036277,  0.01005417,  0.11541115, -0.14860378,  0.03914931,
        -0.2561884 , -0.15442336,  0.12836292,  0.0469152 , -0.1500514 ,
        -0.13068351, -0.01958708,  0.09192695,  0.1208052 , -0.12291992,
        -0.04548305, -0.3679261 ,  0.05125156,  0.09797382, -0.10217863,
        -0.1965521 ,  0.15523128, -0.05881735, -0.16426983,  0.06646369,
         0.05789638,  0.15421619, -0.24014738,  0.11075415, -0.10756174,
        -0.01679449, -0.01877424,  0.18602087,  0.2623015 , -0.3829217 ,
        -0.34895867, -0.0868978 ,  0.02295742,  0.03787762, -0.02646483],
       [-0.01533648,  0.2517981 ,  0.15771465,  0.10011643, -0.03027005,
        -0.09655963,  0.10035348, -0.13405894, -0.13515756,  0.15999079,
        -0.0257801 ,  0.01482286,  0.17336626,  0.02416893, -0.02589497,
 

## Define Training Procedure

In [7]:
#%%writefile -a train.py

def train(EPOCHS=5, BATCH_SIZE=32, TRAIN_FILE='train.csv', VAL_FILE='test.csv'):
    WORKING_DIR = os.getcwd() #use to specify model checkpoint path
    print("Loading training/validation data ...")
    y_train, x_train = load_dataset(TRAIN_FILE, num_samples=100000)
    y_val, x_val = load_dataset(VAL_FILE, num_samples=10000)

    print("Training the model ...")
    model = get_model()
    model.fit(x_train, y_train, batch_size=BATCH_SIZE, epochs=EPOCHS, verbose=1,
              validation_data=(x_val, y_val),
              callbacks=[tf.keras.callbacks.ModelCheckpoint(os.path.join(WORKING_DIR,
                                                                         'model_checkpoint'),
                                                            monitor='val_loss', verbose=1,
                                                            save_best_only=True,
                                                            save_weights_only=False,
                                                            mode='auto')]) #set the mode (val loss) to min or set as auto
    return model

## Train and Export Model as Protobuf

In [8]:
#%%writefile -a train.py

def export_model(model, base_path="amazon_review/"):
    path = os.path.join(base_path, str(int(time.time())))
    tf.saved_model.save(model, path)


if __name__== '__main__':
    model = train()
    export_model(model)

Loading training/validation data ...
Training the model ...
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input (KerasLayer)           (None, 50)                48190600  
_________________________________________________________________
dense (Dense)                (None, 16)                816       
_________________________________________________________________
output (Dense)               (None, 3)                 51        
Total params: 48,191,467
Trainable params: 867
Non-trainable params: 48,190,600
_________________________________________________________________
Epoch 1/5

Epoch 00001: val_loss improved from inf to 0.57044, saving model to C:\Users\KIIT\Everything\Deploy-Deep-Learning-Models-TF-Serving-Docker\model_checkpoint
INFO:tensorflow:Assets written to: C:\Users\KIIT\Everything\Deploy-Deep-Learning-Models-TF-Serving-Docker\model_checkpoint\assets


Epoch 2/5

Epoch 00002: val_loss improved from 0.57044 to 0.56548, saving model to C:\Users\KIIT\Everything\Deploy-Deep-Learning-Models-TF-Serving-Docker\model_checkpoint
INFO:tensorflow:Assets written to: C:\Users\KIIT\Everything\Deploy-Deep-Learning-Models-TF-Serving-Docker\model_checkpoint\assets


Epoch 3/5

Epoch 00003: val_loss improved from 0.56548 to 0.56219, saving model to C:\Users\KIIT\Everything\Deploy-Deep-Learning-Models-TF-Serving-Docker\model_checkpoint
INFO:tensorflow:Assets written to: C:\Users\KIIT\Everything\Deploy-Deep-Learning-Models-TF-Serving-Docker\model_checkpoint\assets


Epoch 4/5

Epoch 00004: val_loss improved from 0.56219 to 0.56055, saving model to C:\Users\KIIT\Everything\Deploy-Deep-Learning-Models-TF-Serving-Docker\model_checkpoint
INFO:tensorflow:Assets written to: C:\Users\KIIT\Everything\Deploy-Deep-Learning-Models-TF-Serving-Docker\model_checkpoint\assets


Epoch 5/5

Epoch 00005: val_loss did not improve from 0.56055

FOR DEVS: If you are overwriting _tracking_metadata in your class, this property has been used to save metadata in the SavedModel. The metadta field will be deprecated soon, so please move the metadata to a different file.



INFO:tensorflow:Assets written to: amazon_review/1624968631\assets



## Test Model

#### Negative Review:

In [9]:
test_sentence = "horrible book, waste of time"
model.predict([test_sentence])

array([[0.47671187, 0.07938445, 0.44390374]], dtype=float32)

#### Positive Review:

In [10]:
test_sentence = "Awesome product."
model.predict([test_sentence])

array([[0.02953439, 0.03649703, 0.9339686 ]], dtype=float32)
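
To read these probability vectors, it helps to map the highest-scoring column back to a class name. The sketch below assumes the column order is negative, neutral, positive (the sorted order of the -1, 0, 1 labels produced in `load_dataset`). `LABELS` and `interpret` are hypothetical names introduced here for illustration.

```python
LABELS = ["negative", "neutral", "positive"]  # assumed order: one-hot columns for -1, 0, 1


def interpret(probabilities):
    """Return the label with the highest probability together with that probability."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return LABELS[best], probabilities[best]


# Probability vectors copied from the two predictions above.
print(interpret([0.47671187, 0.07938445, 0.44390374]))
print(interpret([0.02953439, 0.03649703, 0.9339686]))
```

Under that assumption, the first review is classified as negative only by a narrow margin (about 0.48 versus 0.44 for positive), while the second is clearly positive.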

The model is saved and exported to a subfolder of the amazon_review folder, where the subfolder's name is the timestamp at which the model was exported. Any future models we train will also be exported into the amazon_review parent folder, each in a new timestamp folder of its own.
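
TensorFlow Serving treats each numeric subfolder as a model version and, by default, serves the one with the largest number. The sketch below shows how the newest export could be located on disk. `latest_version` is a hypothetical helper, and the second timestamp in the demonstration is made up.

```python
import os
import tempfile


def latest_version(base_path):
    """Return the highest numeric subfolder name under base_path, or None if there is none."""
    versions = [int(name) for name in os.listdir(base_path)
                if name.isdigit() and os.path.isdir(os.path.join(base_path, name))]
    return max(versions) if versions else None


# Demonstration with a temporary directory standing in for amazon_review/.
with tempfile.TemporaryDirectory() as base:
    for stamp in ("1624968631", "1624970000"):  # the second timestamp is invented
        os.makedirs(os.path.join(base, stamp))
    print(latest_version(base))
```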

## TensorFlow Serving with Docker

`docker pull tensorflow/serving`

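Expose two ports: 8500 for gRPC clients and 8501 for REST endpoints. An absolute path is the safe choice for the bind mount source, so the command below assumes it is run from the directory that contains `amazon_review/`.

`docker run -p 8500:8500 -p 8501:8501 \
            --mount type=bind,source="$(pwd)/amazon_review/",target=/models/amazon_review \
            -e MODEL_NAME=amazon_review \
            -t tensorflow/serving`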
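
Once the container is running, a quick way to confirm that the model was loaded is to query the model status endpoint of the REST API. This is a minimal sketch using only the standard library. The helper names `status_url` and `get_model_status` are hypothetical, and the host and port are assumed to match the `docker run` command above.

```python
import json
import urllib.request


def status_url(model_name, host="127.0.0.1", port=8501):
    """Build the model status URL of the TensorFlow Serving REST API."""
    return "http://{}:{}/v1/models/{}".format(host, port, model_name)


def get_model_status(model_name, timeout=5.0):
    """Fetch and decode the status document; requires the container to be running."""
    with urllib.request.urlopen(status_url(model_name), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


print(status_url("amazon_review"))
# get_model_status("amazon_review")  # uncomment once the server is up
```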

## Setup a REST Client to Perform Model Predictions

#### Perform Model Prediction

##### Support for gRPC and REST

- TensorFlow Serving supports
    - gRPC Remote Procedure Calls (gRPC)
    - Representational State Transfer (REST)
- Consistent API structures
- Server supports both standards simultaneously
- Default ports:
    - gRPC: 8500
    - REST: 8501

#### Predictions via REST

- Standard HTTP POST requests
- Response is a JSON body with the prediction
- Request from the default or specific model

Default (latest) model version:

`http://{HOST}:{PORT}/v1/models/{MODEL_NAME}:predict`

Specific model version:

`http://{HOST}:{PORT}/v1/models/{MODEL_NAME}/versions/{MODEL_VERSION}:predict`
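
The request and response bodies exchanged with these endpoints are plain JSON. As a rough sketch, the row ("instances") format for this model can be built and parsed as follows. `build_predict_body` and `parse_predict_response` are hypothetical helper names, and the example response is hand-written in the shape the server returns, using the values from the positive test above.

```python
import json


def build_predict_body(sentences, signature_name="serving_default"):
    """Serialize a batch of sentences in the row ("instances") format."""
    return json.dumps({"signature_name": signature_name, "instances": list(sentences)})


def parse_predict_response(body):
    """Extract the list of per-instance predictions from a response body."""
    return json.loads(body)["predictions"]


print(build_predict_body(["Awesome product."]))
# Hand-written response body, shaped like the server's reply.
print(parse_predict_response('{"predictions": [[0.02953439, 0.03649703, 0.9339686]]}'))
```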

In [12]:
%%writefile tf_serving_rest_client.py
import json
import requests
import sys

def get_rest_url(model_name, host='127.0.0.1', port='8501', verb='predict', version=None):
    """ generate the URL path"""
    url = "http://{host}:{port}/v1/models/{model_name}".format(host=host, port=port, model_name=model_name)
    if version:
        # the leading slash separates the model name from the version segment
        url += '/versions/{version}'.format(version=version)
    url += ':{verb}'.format(verb=verb)
    return url


def get_model_prediction(model_input, model_name='amazon_review', signature_name='serving_default'):
    """ no error handling at all, just poc"""

    url = get_rest_url(model_name)
    # In the row format, inputs are keyed to the "instances" key in the JSON request.
    # When there is only one named input, each instance is simply the value of that input.
    data = {"signature_name": signature_name, "instances": [model_input]}

    rv = requests.post(url, data=json.dumps(data))
    rv.raise_for_status()  # raises requests.HTTPError on 4xx/5xx responses

    return rv.json()['predictions']

if __name__ == '__main__':

    print("\nGenerate REST url ...")
    url = get_rest_url(model_name='amazon_review')
    print(url)
    
    while True:
        print("\nEnter an Amazon review [:q for Quit]")
        sentence = input()
        if sentence == ':q':
            break
        model_input = sentence
        model_prediction = get_model_prediction(model_input)
        print("The model predicted ...")
        print(model_prediction)

Overwriting tf_serving_rest_client.py


## Setup a gRPC Client to Perform Model Predictions

Modified from [https://github.com/tensorflow/serving/blob/master/tensorflow_serving/example/mnist_client.py](https://github.com/tensorflow/serving/blob/master/tensorflow_serving/example/mnist_client.py#L152)

#### Predictions via gRPC

More sophisticated client-server connections

- Prediction data has to be converted to the Protobuf format
- Request fields have designated types, e.g. float, int, bytes
- Binary payloads are sent as raw bytes in the Protobuf message (base64 encoding is only needed for JSON over REST)
- Connect to the server via gRPC stubs

#### gRPC vs REST: When to use which API standard

- REST is easy to implement and debug
- gRPC is more network efficient, with smaller payloads
- gRPC can provide faster inference, especially for large inputs
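
The payload-size point can be illustrated roughly without any serving infrastructure by comparing a JSON encoding of a float vector with a packed 4-byte-per-value binary encoding. This is only an approximation: the binary form below uses `struct` rather than real Protobuf, which adds a small amount of framing on top, and the 1000-element vector is an arbitrary example.

```python
import json
import struct

values = [i / 1000 for i in range(1000)]  # arbitrary example vector

# JSON text, as it would be sent in a REST request body.
json_payload = json.dumps({"instances": [values]}).encode("utf-8")
# Packed little-endian float32 values: exactly 4 bytes per number.
binary_payload = struct.pack("<{}f".format(len(values)), *values)

print(len(json_payload), len(binary_payload))
```

For short text inputs such as single reviews the difference is small. It matters more for large numeric tensors such as images or embeddings.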

In [13]:
%%writefile tf_serving_grpc_client.py
import grpc
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import get_model_metadata_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc


def get_stub(host='127.0.0.1', port='8500'):
    # build the target from the arguments instead of hard-coding it
    channel = grpc.insecure_channel('{host}:{port}'.format(host=host, port=port))
    stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
    return stub


def get_model_prediction(model_input, stub, model_name='amazon_review', signature_name='serving_default'):
    """ no error handling at all, just poc"""
    request = predict_pb2.PredictRequest()
    request.model_spec.name = model_name
    request.model_spec.signature_name = signature_name
    # 'input_input' and 'output' are the tensor names in the exported serving signature;
    # they depend on the layer names chosen in get_model()
    request.inputs['input_input'].CopyFrom(tf.make_tensor_proto(model_input))
    response = stub.Predict.future(request, 5.0)  # 5 seconds
    return response.result().outputs["output"].float_val


def get_model_version(model_name, stub):
    request = get_model_metadata_pb2.GetModelMetadataRequest()
    request.model_spec.name = model_name
    request.metadata_field.append("signature_def")
    response = stub.GetModelMetadata(request, 10)  # 10 seconds
    # signature of loaded model is available here: response.metadata['signature_def']
    return response.model_spec.version.value

if __name__ == '__main__':
    print("\nCreate RPC connection ...")
    stub = get_stub()
    while True:
        print("\nEnter an Amazon review [:q for Quit]")
        sentence = input()
        if sentence == ':q':
            break
        model_input = [sentence]
        model_prediction = get_model_prediction(model_input, stub)
        print("The model predicted ...")
        print(model_prediction)

Overwriting tf_serving_grpc_client.py
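
Note that the gRPC client returns `float_val`, which is a flat list of numbers. For a batch of several sentences, the probabilities of all rows arrive concatenated. The sketch below regroups them into one row per sentence, assuming three classes per row. `reshape_predictions` is a hypothetical helper, and the input values are rounded from the test predictions earlier in the notebook.

```python
def reshape_predictions(flat_values, num_classes=3):
    """Split a flat list of probabilities into one row per input sentence."""
    flat = list(flat_values)
    if len(flat) % num_classes != 0:
        raise ValueError("length is not a multiple of num_classes")
    return [flat[i:i + num_classes] for i in range(0, len(flat), num_classes)]


flat = [0.48, 0.08, 0.44, 0.03, 0.04, 0.93]  # two sentences, three classes each
print(reshape_predictions(flat))
```

When TensorFlow is available on the client, `tf.make_ndarray(response.result().outputs["output"])` is an alternative that restores the shape recorded in the tensor itself.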
