# Serving a TensorFlow Model as a REST Endpoint with TensorFlow Serving and SageMaker

We need to understand the application and business context to choose between real-time and batch predictions. Are we optimizing for latency or for throughput? Does the application require our models to scale automatically throughout the day to handle cyclic traffic patterns? Do we plan to compare models in production through A/B tests?

If our application requires low latency, we should deploy the model as a real-time API that returns a prediction for each individual request over HTTPS. We can deploy, scale, and compare our model prediction servers with SageMaker Endpoints.

<img src="img/sagemaker-architecture.png" width="80%" align="left">

In [None]:
import boto3
import sagemaker
import pandas as pd

sess   = sagemaker.Session()
bucket = sess.default_bucket()
role = sagemaker.get_execution_role()
region = boto3.Session().region_name

sm = boto3.Session().client(service_name='sagemaker', region_name=region)

In [None]:
%store -r training_job_name

In [None]:
print(training_job_name)

# Copy the Model to the Notebook

In [None]:
!aws s3 cp s3://$bucket/$training_job_name/output/model.tar.gz ./model.tar.gz

In [None]:
!mkdir -p ./model/
!tar -xvzf ./model.tar.gz -C ./model/

In [None]:
!saved_model_cli show --all --dir ./model/tensorflow/saved_model/0/

# Show `inference.py`

In [None]:
!pygmentize ./model/code/inference.py

# Deploy the Model
Deploying the model creates a default `EndpointConfig` with a single model.

The next notebook will demonstrate more advanced `EndpointConfig` strategies that support canary rollouts and A/B testing.
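To make "a default `EndpointConfig` with a single model" more concrete, here is a minimal sketch of the request that could be passed to boto3's `create_endpoint_config`. The helper name `single_variant_endpoint_config`, the choice to reuse the model name as the config name, and the example model name are all assumptions for illustration; the SageMaker SDK builds its own configuration when `deploy()` is called.

```python
def single_variant_endpoint_config(model_name,
                                   instance_type="ml.m5.4xlarge",
                                   instance_count=1):
    """Build keyword arguments for create_endpoint_config with one variant.

    Hypothetical helper: the config name simply reuses the model name.
    """
    return {
        "EndpointConfigName": model_name,
        "ProductionVariants": [
            {
                # A single variant receives all of the traffic.
                "VariantName": "AllTraffic",
                "ModelName": model_name,
                "InstanceType": instance_type,
                "InitialInstanceCount": instance_count,
                "InitialVariantWeight": 1.0,
            }
        ],
    }


# Hypothetical model name, used only to show the resulting structure.
config = single_variant_endpoint_config("example-training-job-tf-1600000000")
print(config["ProductionVariants"][0])
```

With a real model registered in SageMaker, the dictionary could be passed as `sm.create_endpoint_config(**config)`. A/B tests and canary rollouts add more entries to `ProductionVariants` with different weights.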

_Note:  If not using a US-based region, you may need to adapt the container image to your current region using the following table:_

https://docs.aws.amazon.com/deep-learning-containers/latest/devguide/deep-learning-containers-images.html

In [None]:
import time
timestamp = int(time.time())

tensorflow_model_name = '{}-{}-{}'.format(training_job_name, 'tf', timestamp)

print(tensorflow_model_name)

In [None]:
from sagemaker.tensorflow.serving import Model

tensorflow_model = Model(name=tensorflow_model_name,
                         model_data='s3://{}/{}/output/model.tar.gz'.format(bucket, training_job_name),
                         role=role,                
                         framework_version='2.1.0')

In [None]:
tensorflow_endpoint_name = '{}-{}-{}'.format(training_job_name, 'tf', timestamp)

print(tensorflow_endpoint_name)

In [None]:
# deploy() returns a predictor rather than a model, so keep the result in its own variable
deployed_predictor = tensorflow_model.deploy(endpoint_name=tensorflow_endpoint_name,
                                             initial_instance_count=1, # Should use >=2 for high(er) availability
                                             instance_type='ml.m5.4xlarge', # requires enough disk space for tensorflow, transformers, and bert downloads
                                             wait=False)

In [None]:
from IPython.display import display, HTML

display(HTML('<b>Review <a target="_blank" href="https://console.aws.amazon.com/sagemaker/home?region={}#/endpoints/{}">SageMaker REST Endpoint</a></b>'.format(region, tensorflow_endpoint_name)))


In [None]:
waiter = sm.get_waiter('endpoint_in_service')
waiter.wait(EndpointName=tensorflow_endpoint_name)

# _Wait Until the ^^ Endpoint ^^ is Deployed_
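The `endpoint_in_service` waiter hides the polling it performs. A rough equivalent, written against `describe_endpoint`, might look like the sketch below. The function name `wait_for_endpoint` and the `poll_seconds`, `max_attempts`, and `sleep` parameters are hypothetical; the sketch also treats `Failed` and `OutOfService` as terminal errors, which is a design choice rather than documented waiter behavior.

```python
import time


def wait_for_endpoint(client, endpoint_name,
                      poll_seconds=30, max_attempts=120, sleep=time.sleep):
    """Poll describe_endpoint until the endpoint reports InService.

    `client` is expected to behave like a boto3 SageMaker client.
    """
    for _ in range(max_attempts):
        response = client.describe_endpoint(EndpointName=endpoint_name)
        status = response["EndpointStatus"]
        if status == "InService":
            return status
        if status in ("Failed", "OutOfService"):
            # Stop early instead of polling an endpoint that cannot serve traffic.
            raise RuntimeError(
                "Endpoint {} ended in status {}".format(endpoint_name, status))
        sleep(poll_seconds)
    raise TimeoutError(
        "Endpoint {} was not InService after {} checks".format(
            endpoint_name, max_attempts))
```

In this notebook it could be called as `wait_for_endpoint(sm, tensorflow_endpoint_name)`.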

# Simulate a Prediction from an Application

In [None]:
import json
from sagemaker.tensorflow.serving import Predictor

predictor = Predictor(endpoint_name=tensorflow_endpoint_name,
                      sagemaker_session=sess,
                      content_type='application/json',
                      model_name='saved_model',
                      model_version=0)

# Predict the `star_rating` with Ad Hoc `review_body` Samples

In [None]:
reviews = ["This is great!"]

predicted_classes = predictor.predict(reviews)

for predicted_class, review in zip(predicted_classes, reviews):
    print('[Predicted Star Rating: {}]'.format(predicted_class), review)
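An application outside the notebook would probably not install the SageMaker SDK. It could instead call the endpoint through the boto3 `sagemaker-runtime` client. The sketch below assumes that the endpoint accepts a JSON list of review strings and returns a JSON list of predicted classes, as the `Predictor` call above suggests, and that the model name and version are passed through `CustomAttributes`. The helper names `build_invoke_request` and `predict_star_ratings` are hypothetical.

```python
import json


def build_invoke_request(endpoint_name, reviews,
                         model_name="saved_model", model_version=0):
    """Build keyword arguments for sagemaker-runtime's invoke_endpoint."""
    return {
        "EndpointName": endpoint_name,
        "ContentType": "application/json",
        "Accept": "application/json",
        # Assumption: the TensorFlow Serving container reads these attributes
        # to select the model name and version.
        "CustomAttributes": "tfs-model-name={},tfs-model-version={}".format(
            model_name, model_version),
        "Body": json.dumps(reviews),
    }


def predict_star_ratings(runtime_client, endpoint_name, reviews):
    """Send reviews to the endpoint and decode the JSON response body."""
    request = build_invoke_request(endpoint_name, reviews)
    response = runtime_client.invoke_endpoint(**request)
    # The response body is a stream, so read it before parsing.
    return json.loads(response["Body"].read())
```

With AWS credentials configured, the client could be created with `boto3.client('sagemaker-runtime')` and used as `predict_star_ratings(runtime, tensorflow_endpoint_name, ["This is great!"])`.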

# Predict the `star_rating` with `review_body` Samples from our TSVs

In [None]:
import csv

df_reviews = pd.read_csv('./data/amazon_reviews_us_Digital_Software_v1_00.tsv.gz', 
                         delimiter='\t', 
                         quoting=csv.QUOTE_NONE,
                         compression='gzip')
df_sample_reviews = df_reviews[['review_body', 'star_rating']].sample(n=100)
df_sample_reviews = df_sample_reviews.reset_index()
df_sample_reviews.shape

In [None]:
import pandas as pd

def predict(review_body):
    return predictor.predict([review_body])[0]

df_sample_reviews['predicted_class'] = df_sample_reviews['review_body'].map(predict)
df_sample_reviews
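Once the predictions sit next to the labels, a quick sanity check is to count how often they agree. The sketch below is a plain-Python helper, not part of the original notebook. The name `exact_match_accuracy` is hypothetical, and it assumes that both the predicted classes and the `star_rating` values can be converted with `int()`.

```python
def exact_match_accuracy(predicted, actual):
    """Return the fraction of predictions that equal the actual rating."""
    pairs = list(zip(predicted, actual))
    if not pairs:
        raise ValueError("no predictions to score")
    # Convert both sides to int so that "5" and 5 count as a match.
    matches = sum(1 for p, a in pairs if int(p) == int(a))
    return matches / len(pairs)


# Made-up values, used only to show the call.
print(exact_match_accuracy([5, 4, 1, 5], [5, 3, 1, "5"]))
```

In this notebook it could be called as `exact_match_accuracy(df_sample_reviews['predicted_class'], df_sample_reviews['star_rating'])`. A sample of 100 reviews gives only a rough estimate.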

# Save for Next Notebook(s)

In [None]:
%store tensorflow_model_name

In [None]:
%store tensorflow_endpoint_name 

In [None]:
%store

# Delete Endpoint
To save cost, we should delete the endpoint.

In [None]:
# sm.delete_endpoint(
#      EndpointName=tensorflow_endpoint_name
# )
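Deleting the endpoint stops the instance charges but leaves the endpoint configuration and the model registered in SageMaker. A fuller cleanup might look like the sketch below. The function name `delete_endpoint_resources` and the `delete_models` flag are hypothetical. Models are kept by default here because later notebooks reuse `tensorflow_model_name`.

```python
def delete_endpoint_resources(client, endpoint_name, delete_models=False):
    """Delete an endpoint and its endpoint config, and optionally its models.

    `client` is expected to behave like a boto3 SageMaker client.
    """
    # Look up the related resources before anything is deleted.
    endpoint = client.describe_endpoint(EndpointName=endpoint_name)
    config_name = endpoint["EndpointConfigName"]
    config = client.describe_endpoint_config(EndpointConfigName=config_name)
    model_names = [v["ModelName"] for v in config["ProductionVariants"]]

    client.delete_endpoint(EndpointName=endpoint_name)
    client.delete_endpoint_config(EndpointConfigName=config_name)
    if delete_models:
        for model_name in model_names:
            client.delete_model(ModelName=model_name)

    return {
        "endpoint": endpoint_name,
        "endpoint_config": config_name,
        "models": model_names if delete_models else [],
    }
```

In this notebook it could be called as `delete_endpoint_resources(sm, tensorflow_endpoint_name)` after the endpoint is no longer needed.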

In [None]:
%%javascript
Jupyter.notebook.save_checkpoint();
Jupyter.notebook.session.delete();