# Serving a TensorFlow Model as a REST Endpoint with TensorFlow Serving and SageMaker

We need to understand the application and business context to choose between real-time and batch predictions. Are we trying to optimize for latency or throughput? Does the application require our models to scale automatically throughout the day to handle cyclic traffic requirements? Do we plan to compare models in production through A/B tests?

If our application requires low latency, we should deploy the model as a real-time API that returns a prediction for each individual request over HTTPS. SageMaker Endpoints let us deploy, scale, and compare our model prediction servers.

<img src="img/sagemaker-architecture.png" width="80%" align="left">

In [1]:
import boto3
import sagemaker
import pandas as pd

sess   = sagemaker.Session()
bucket = sess.default_bucket()
role = sagemaker.get_execution_role()
region = boto3.Session().region_name

sm = boto3.Session().client(service_name='sagemaker', region_name=region)

In [2]:
%store -r training_job_name

In [3]:
print(training_job_name)

tensorflow-training-2020-08-22-19-35-37-636


# Copy the Model to the Notebook

In [4]:
!aws s3 cp s3://$bucket/$training_job_name/output/model.tar.gz ./model.tar.gz

download: s3://sagemaker-us-west-2-250107111215/tensorflow-training-2020-08-22-19-35-37-636/output/model.tar.gz to ./model.tar.gz
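The same copy can be done from Python instead of the AWS CLI. The following is a minimal sketch, assuming the artifact lives under `<training_job_name>/output/model.tar.gz` as in the command above. `model_artifact_location` and `download_model` are hypothetical helper names, not part of this notebook.

```python
def model_artifact_location(bucket, training_job_name):
    """Return the (bucket, key) pair for the training job's model artifact."""
    key = '{}/output/model.tar.gz'.format(training_job_name)
    return bucket, key


def download_model(bucket, training_job_name, local_path='./model.tar.gz'):
    """Download the model artifact with boto3 (requires AWS credentials)."""
    import boto3  # imported lazily so the helper above works without boto3

    s3_bucket, key = model_artifact_location(bucket, training_job_name)
    boto3.client('s3').download_file(s3_bucket, key, local_path)
    return local_path


bucket_name, key = model_artifact_location(
    'sagemaker-us-west-2-250107111215',
    'tensorflow-training-2020-08-22-19-35-37-636')
print('s3://{}/{}'.format(bucket_name, key))
```

In the notebook, `download_model(bucket, training_job_name)` would replace the `aws s3 cp` cell.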


In [5]:
!mkdir -p ./model/
!tar -xvzf ./model.tar.gz -C ./model/

tensorboard/
tensorflow/
tensorflow/saved_model/
tensorflow/saved_model/0/
tensorflow/saved_model/0/variables/
tensorflow/saved_model/0/variables/variables.data-00000-of-00001
tensorflow/saved_model/0/variables/variables.index
tensorflow/saved_model/0/assets/
tensorflow/saved_model/0/saved_model.pb
code/
code/inference.py
metrics/
metrics/confusion_matrix.png
transformers/
transformers/fine-tuned/
transformers/fine-tuned/tf_model.h5
transformers/fine-tuned/config.json
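If we prefer to stay in Python, the standard library's `tarfile` module can do the same extraction and also locate the SavedModel directory inside the archive. This is a sketch only; `find_saved_model_dirs` and `extract_model` are hypothetical helper names.

```python
import tarfile


def find_saved_model_dirs(archive_path):
    """Return the directories inside the archive that contain a saved_model.pb."""
    with tarfile.open(archive_path, 'r:gz') as archive:
        return sorted(
            name[:-len('/saved_model.pb')]
            for name in archive.getnames()
            if name.endswith('/saved_model.pb')
        )


def extract_model(archive_path, target_dir='./model/'):
    """Extract the whole archive, like `tar -xzf archive -C target_dir`."""
    with tarfile.open(archive_path, 'r:gz') as archive:
        archive.extractall(path=target_dir)
```

For the archive listed above, `find_saved_model_dirs('./model.tar.gz')` should point at `tensorflow/saved_model/0`, the directory inspected in the next cell.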


In [6]:
!saved_model_cli show --all --dir ./model/tensorflow/saved_model/0/

2020-08-22 21:39:57.931824: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-10.0/lib64:/usr/local/cuda-10.0/extras/CUPTI/lib64:/usr/local/cuda-10.0/lib:/usr/local/cuda-10.0/efa/lib:/opt/amazon/efa/lib:/opt/amazon/efa/lib64:/usr/lib64/openmpi/lib/:/usr/local/lib:/usr/lib:/usr/local/mpi/lib:/lib/:/usr/lib64/openmpi/lib/:/usr/local/lib:/usr/lib:/usr/local/mpi/lib:/lib/:/usr/lib64/openmpi/lib/:/usr/local/lib:/usr/lib:/usr/local/mpi/lib:/lib/:
2020-08-22 21:39:57.931908: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-10.0/lib64:/usr/local/cuda-10.0/extras/CUPTI/lib64:/usr/local/cuda-10.0/lib:/usr/local/cuda-10.0/ef

# Show `inference.py`

In [7]:
!pygmentize ./model/code/inference.py

import json
import subprocess
import sys
subprocess.check_call([sys.executable, '-m', 'pip', 'install', 'tensorflow==2.1.0'])
subprocess.check_call([sys.executable, '-m', 'pip', 'install', 'transformers==2.8.0'])
import tensorflow as tf
from transformers import DistilBertTokenizer

classes=[1, 2, 3, 4,
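The listing above is truncated, so the script's handler logic is not visible here. As a rough illustration of the kind of post-processing such a script needs, the sketch below takes a TensorFlow Serving REST response (`{"predictions": [...]}`, one row of scores per input) and maps each row to a star rating. It is not the author's code: `scores_to_star_ratings` is a hypothetical name, the five-class list is an assumption (the listing is cut off after `4`), and the scores are made up.

```python
import json

# Assumption: five star-rating classes. The listing above is cut off after 4.
CLASSES = [1, 2, 3, 4, 5]


def scores_to_star_ratings(response_body):
    """Map each row of per-class scores to the star rating with the highest score."""
    predictions = json.loads(response_body)['predictions']
    ratings = []
    for scores in predictions:
        best_index = max(range(len(scores)), key=scores.__getitem__)
        ratings.append(CLASSES[best_index])
    return ratings


# Made-up scores for two inputs, in TF Serving's row-oriented response format.
example_body = json.dumps({'predictions': [[0.1, 0.1, 0.1, 0.2, 0.5],
                                           [0.6, 0.1, 0.1, 0.1, 0.1]]})
print(scores_to_star_ratings(example_body))
```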

# Deploy the Model
This will create a default `EndpointConfig` with a single model.  

The next notebook demonstrates more advanced `EndpointConfig` strategies that support canary rollouts and A/B testing.

_Note: If we are not using a US-based region, we may need to adapt the container image to our current region using the following table:_

https://docs.aws.amazon.com/deep-learning-containers/latest/devguide/deep-learning-containers-images.html

In [8]:
import time
timestamp = int(time.time())

tensorflow_model_name = '{}-{}-{}'.format(training_job_name, 'tf', timestamp)

print(tensorflow_model_name)

tensorflow-training-2020-08-22-19-35-37-636-tf-1598132403
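Because the name is built by appending a suffix and a timestamp to an already long training job name, it can be worth validating it before calling SageMaker. The sketch below assumes the limits given in the SageMaker API reference for model and endpoint names (at most 63 characters, alphanumerics and hyphens, no leading or trailing hyphen); check the current documentation before relying on them. `build_resource_name` is a hypothetical helper.

```python
import re
import time

MAX_NAME_LENGTH = 63  # assumed limit for SageMaker model and endpoint names
NAME_PATTERN = re.compile(r'^[a-zA-Z0-9](-*[a-zA-Z0-9])*$')


def build_resource_name(training_job_name, suffix='tf', timestamp=None):
    """Build '<job>-<suffix>-<timestamp>' and fail early if it looks invalid."""
    if timestamp is None:
        timestamp = int(time.time())
    name = '{}-{}-{}'.format(training_job_name, suffix, timestamp)
    if len(name) > MAX_NAME_LENGTH or not NAME_PATTERN.match(name):
        raise ValueError('Invalid SageMaker resource name: {!r}'.format(name))
    return name


print(build_resource_name('tensorflow-training-2020-08-22-19-35-37-636',
                          timestamp=1598132403))
```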


In [9]:
from sagemaker.tensorflow.serving import Model

tensorflow_model = Model(name=tensorflow_model_name,
                         model_data='s3://{}/{}/output/model.tar.gz'.format(bucket, training_job_name),
                         role=role,                
                         framework_version='2.1.0')

Parameter image will be renamed to image_uri in SageMaker Python SDK v2.


In [10]:
tensorflow_endpoint_name = '{}-{}-{}'.format(training_job_name, 'tf', timestamp)

print(tensorflow_endpoint_name)

tensorflow-training-2020-08-22-19-35-37-636-tf-1598132403


In [11]:
tensorflow_model = tensorflow_model.deploy(endpoint_name=tensorflow_endpoint_name,
                                           initial_instance_count=1, # Should use >=2 for high(er) availability 
                                           instance_type='ml.m5.4xlarge', # requires enough disk space for tensorflow, transformers, and bert downloads
                                           wait=False)

'create_image_uri' will be deprecated in favor of 'ImageURIProvider' class in SageMaker Python SDK v2.
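The `deploy()` call above creates the default `EndpointConfig` for us. To make that concrete, here is a sketch of what a single-model configuration looks like as a request to boto3's `create_endpoint_config`. The helper name and the example resource names are hypothetical, and the values the SDK actually sends may differ in detail.

```python
def build_endpoint_config(endpoint_config_name, model_name,
                          instance_type='ml.m5.4xlarge', instance_count=1):
    """Build keyword arguments for boto3's create_endpoint_config call."""
    return {
        'EndpointConfigName': endpoint_config_name,
        'ProductionVariants': [{
            'VariantName': 'AllTraffic',   # single variant receives every request
            'ModelName': model_name,
            'InstanceType': instance_type,
            'InitialInstanceCount': instance_count,
            'InitialVariantWeight': 1.0,
        }],
    }


config = build_endpoint_config('my-endpoint-config', 'my-model')
print(config['ProductionVariants'][0]['VariantName'])

# With AWS credentials and an existing SageMaker model, the request could be sent as:
# import boto3
# boto3.client('sagemaker').create_endpoint_config(**config)
```

A/B testing and canary rollouts build on the same structure by listing more than one entry in `ProductionVariants` with different weights.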


In [12]:
from IPython.display import display, HTML

display(HTML('<b>Review <a target="_blank" href="https://console.aws.amazon.com/sagemaker/home?region={}#/endpoints/{}">SageMaker REST Endpoint</a></b>'.format(region, tensorflow_endpoint_name)))


In [13]:
waiter = sm.get_waiter('endpoint_in_service')
waiter.wait(EndpointName=tensorflow_endpoint_name)
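The built-in waiter blocks silently until the endpoint is ready. If we want to see or react to intermediate states, we can poll `describe_endpoint` ourselves. This is a sketch with a hypothetical `wait_for_endpoint` helper; the polling interval and attempt count are arbitrary choices, not the waiter's own settings.

```python
import time


def wait_for_endpoint(describe_endpoint, endpoint_name,
                      poll_seconds=30, max_attempts=60):
    """Poll until the endpoint is InService; raise if it fails or times out."""
    for _ in range(max_attempts):
        status = describe_endpoint(EndpointName=endpoint_name)['EndpointStatus']
        if status == 'InService':
            return status
        if status == 'Failed':
            raise RuntimeError('Endpoint {} failed to deploy'.format(endpoint_name))
        time.sleep(poll_seconds)
    raise TimeoutError('Endpoint {} not in service after {} attempts'.format(
        endpoint_name, max_attempts))
```

In this notebook the call would be `wait_for_endpoint(sm.describe_endpoint, tensorflow_endpoint_name)`. Passing the describe function in as an argument keeps the helper easy to test without AWS access.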

# _Wait Until the ^^ Endpoint ^^ is Deployed_

# Simulate a Prediction from an Application

In [14]:
import json
from sagemaker.tensorflow.serving import Predictor

predictor = Predictor(endpoint_name=tensorflow_endpoint_name,
                      sagemaker_session=sess,
                      content_type='application/json',
                      model_name='saved_model',
                      model_version=0)

# Predict the `star_rating` with Ad Hoc `review_body` Samples

In [15]:
reviews = ["This is great!"]

predicted_classes = predictor.predict(reviews)

for predicted_class, review in zip(predicted_classes, reviews):
    print('[Predicted Star Rating: {}]'.format(predicted_class), review)

[Predicted Star Rating: 5] This is great!
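An application that does not use the SageMaker Python SDK can reach the same endpoint through the `sagemaker-runtime` client in boto3. The sketch below builds the JSON request that corresponds to `predictor.predict(reviews)`. The helper names are hypothetical, and the sketch omits the `model_name` and `model_version` that the `Predictor` above is configured with, so it assumes the endpoint can route the request without them.

```python
import json


def build_invoke_request(endpoint_name, reviews):
    """Build keyword arguments for the SageMaker runtime invoke_endpoint call."""
    return {
        'EndpointName': endpoint_name,
        'ContentType': 'application/json',
        'Accept': 'application/json',
        'Body': json.dumps(reviews),
    }


def invoke(endpoint_name, reviews):
    """Send the request with boto3 (requires AWS credentials and a live endpoint)."""
    import boto3  # imported lazily so build_invoke_request works without boto3

    runtime = boto3.client('sagemaker-runtime')
    response = runtime.invoke_endpoint(**build_invoke_request(endpoint_name, reviews))
    return json.loads(response['Body'].read())


request = build_invoke_request('my-endpoint', ['This is great!'])
print(request['Body'])
```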


# Predict the `star_rating` with `review_body` Samples from Our TSVs

In [16]:
import csv

df_reviews = pd.read_csv('./data/amazon_reviews_us_Digital_Software_v1_00.tsv.gz', 
                         delimiter='\t', 
                         quoting=csv.QUOTE_NONE,
                         compression='gzip')
df_sample_reviews = df_reviews[['review_body', 'star_rating']].sample(n=100)
df_sample_reviews = df_sample_reviews.reset_index()
df_sample_reviews.shape

(100, 3)

In [17]:
import pandas as pd

def predict(review_body):
    return predictor.predict([review_body])[0]

df_sample_reviews['predicted_class'] = df_sample_reviews['review_body'].map(predict)
df_sample_reviews

| | index | review_body | star_rating | predicted_class |
|---|---|---|---|---|
| 0 | 95326 | I have tried a few other lesser known pieces o... | 4 | 2 |
| 1 | 93775 | It will be great once I figure it out. I have... | 4 | 2 |
| 2 | 86698 | Very thorough! Save me money, and that is a go... | 4 | 2 |
| 3 | 38964 | Eh, not user friendly. | 2 | 5 |
| 4 | 68516 | I especially liked the way it allowed me<br />... | 5 | 2 |
| ... | ... | ... | ... | ... |
| 95 | 29051 | Was done in 20 min. Great App! | 5 | 5 |
| 96 | 84351 | Saved me a lot of money doing it this way. No ... | 5 | 2 |
| 97 | 68491 | They have my account screwed up and after and ... | 3 | 2 |
| 98 | 41472 | I downloaded this product praying that the iss... | 3 | 2 |
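A quick way to judge these predictions is to compare `predicted_class` with `star_rating`. The sketch below does this for the nine rows visible in the output above, where only one row matches; it says nothing about the other 91 sampled rows. `accuracy` is a hypothetical helper.

```python
def accuracy(pairs):
    """Fraction of (actual, predicted) pairs where the two values match."""
    if not pairs:
        raise ValueError('Need at least one (actual, predicted) pair')
    return sum(1 for actual, predicted in pairs if actual == predicted) / len(pairs)


# (star_rating, predicted_class) for the nine rows visible in the output above.
visible_rows = [(4, 2), (4, 2), (4, 2), (2, 5), (5, 2),
                (5, 5), (5, 2), (3, 2), (3, 2)]
print('{:.2f}'.format(accuracy(visible_rows)))

# On the full DataFrame, the equivalent would be:
# (df_sample_reviews['star_rating'] == df_sample_reviews['predicted_class']).mean()
```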


# Save for Next Notebook(s)

In [18]:
%store tensorflow_model_name

Stored 'tensorflow_model_name' (str)


In [19]:
%store tensorflow_endpoint_name 

Stored 'tensorflow_endpoint_name' (str)


In [20]:
%store

Stored variables and their in-db values:
autopilot_endpoint_name                          -> 'automl-dm-ep-22-16-47-12'
balance_dataset                                  -> True
best_candidate_tuning_job_name                   -> 0    tensorflow-training-200822-2113-002-05e3c8f5

comprehend_endpoint_arn                          -> 'arn:aws:comprehend:us-west-2:250107111215:documen
df_dataset_metrics                               ->         entity                   instance         
experiment_name                                  -> 'Amazon-Customer-Reviews-BERT-Experiment-159812492
header_train_s3_uri                              -> 's3://sagemaker-us-west-2-250107111215/data/amazon
max_seq_length                                   -> 64
noheader_train_s3_uri                            -> 's3://sagemaker-us-west-2-250107111215/data/amazon
prepare_trial_component_name                     -> 'TrialComponent-2020-08-22-193529-qhjz'
processed_test_data_s3_uri                       -> 's3://

# Delete Endpoint
To avoid ongoing charges, we should delete the endpoint when we no longer need it.

In [21]:
# sm.delete_endpoint(
#      EndpointName=tensorflow_endpoint_name
# )
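Deleting the endpoint stops the instance charges, but the endpoint configuration and the model remain registered in SageMaker. A fuller cleanup might look like the sketch below. `delete_endpoint_resources` is a hypothetical helper; it looks up the configuration name with `describe_endpoint` rather than assuming what the SDK called it.

```python
def delete_endpoint_resources(sm_client, endpoint_name, model_name):
    """Delete the endpoint, its endpoint configuration, and the model."""
    # Look up the configuration name before the endpoint disappears.
    description = sm_client.describe_endpoint(EndpointName=endpoint_name)
    endpoint_config_name = description['EndpointConfigName']

    sm_client.delete_endpoint(EndpointName=endpoint_name)
    sm_client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)
    sm_client.delete_model(ModelName=model_name)
    return endpoint_config_name
```

Here the call would be `delete_endpoint_resources(sm, tensorflow_endpoint_name, tensorflow_model_name)`. Since the names are stored for the next notebook(s), run it only after those notebooks are finished with the endpoint.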

In [None]:
%%javascript
Jupyter.notebook.save_checkpoint();
Jupyter.notebook.session.delete();