# Making batch predictions using a TensorFlow model with Amazon SageMaker
This notebook shows how to make **batch predictions with TensorFlow on SageMaker**. Many customers have machine learning workloads that require a large number of predictions to be made reliably on a repeatable schedule. Unlike SageMaker's managed hosting service, batch transform spins up compute capacity on demand and takes it down when the batch completes. For large batch workloads, this can cost significantly less than an always-on endpoint.

Another benefit of SageMaker batch is that it lets data scientists stay focused on creating the best models.
[SageMaker batch](https://docs.aws.amazon.com/sagemaker/latest/dg/batch-transform.html) lets you use the same trained model for both hosted endpoints and batch predictions, with no need for expensive rewrites or extra infrastructure. The overview picture below shows how clusters of batch transformer instances make predictions at scale. You provide the input via S3, and SageMaker returns the predictions via S3 as well. Note that the same input and output handlers you used for your SageMaker endpoint in the [previous lab](./2_tf_sm_image_classification_birds.ipynb) are used for batch predictions. TensorFlow Serving is used in both cases.

![](./batch_overview.png)

Here is a [blog post](https://aws.amazon.com/blogs/machine-learning/performing-batch-inference-with-tensorflow-serving-in-amazon-sagemaker/) with additional detail on performing batch TensorFlow predictions with SageMaker.

## Setup
This notebook assumes you have already trained your TensorFlow model in the prior lab, which results in model artifacts being available in S3. Update the `training_job_name` variable below to refer to your specific training job, so that the notebook has a full S3 URI to the model artifacts.

These same model artifacts were used for deployment in a SageMaker hosted endpoint in the previous lab. In this lab, we demonstrate batch predictions with the same trained model.
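
As an alternative to assembling the artifact URI by hand, the location recorded for a training job can be looked up through the SageMaker API. The following is a minimal sketch, not part of the original lab: the helper name `model_artifacts_uri` is hypothetical, and it assumes the caller passes in a boto3 SageMaker client and the name of a completed training job.

```python
def model_artifacts_uri(sm_client, job_name):
    """Return the S3 URI of the model artifacts recorded for a training job.

    sm_client is assumed to be a boto3 SageMaker client, for example
    boto3.client('sagemaker'). The helper name is hypothetical.
    """
    description = sm_client.describe_training_job(TrainingJobName=job_name)
    return description['ModelArtifacts']['S3ModelArtifacts']

# Example usage (requires AWS credentials and an existing training job):
#   import boto3
#   print(model_artifacts_uri(boto3.client('sagemaker'), training_job_name))
```

This avoids depending on the default output location, which may differ if the training job was configured with a custom output path.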

In [None]:
import boto3
import sagemaker
import tensorflow
from sagemaker.tensorflow.serving import Model
from sagemaker.tensorflow import TensorFlow
from time import gmtime, strftime

training_job_name = 'mpr-tf-ic-2019-09-10-12-00-28-178'  ### Replace this with your job name from the previous lab

USE_GPU_INSTANCE     = True
TF_FRAMEWORK_VERSION = '1.14'

if TF_FRAMEWORK_VERSION == '1.12':
    TF_ACCOUNT_NUM = '520713654638'
    TF_CONTAINER_NAME = 'sagemaker-tensorflow-serving'
else:
    TF_ACCOUNT_NUM = '763104351884'
    TF_CONTAINER_NAME = 'tensorflow-inference'
## Here is the documentation for identifying TensorFlow SageMaker container images
##   https://docs.aws.amazon.com/sagemaker/latest/dg/pre-built-containers-frameworks-deep-learning.html

To help with evaluating the batch prediction results, enter the list of class labels that your classifier was trained on.
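
If the training images are organized in one subfolder per class, the label list could also be derived from the S3 object keys instead of being typed in. This is only a sketch under assumptions: the helper `class_names_from_keys` is hypothetical, the keys are assumed to look like `<prefix>/train/<class folder>/<file>`, and alphabetical order is assumed to match the class index order used during training. Verify that assumption before relying on it.

```python
def class_names_from_keys(keys, train_prefix):
    """Collect sorted, de-duplicated class folder names found under train_prefix.

    keys is any iterable of S3 object keys (strings). Alphabetical ordering is
    an assumption and must match the index order used at training time.
    """
    if not train_prefix.endswith('/'):
        train_prefix += '/'
    names = set()
    for key in keys:
        if not key.startswith(train_prefix):
            continue  # ignore objects outside the training prefix
        parts = key[len(train_prefix):].split('/')
        # A class folder must contain at least one object beneath it
        if len(parts) >= 2 and parts[0]:
            names.add(parts[0])
    return sorted(names)

example_keys = [
    'DEMO/train/017.Cardinal/a.jpg',
    'DEMO/train/013.Bobolink/b.jpg',
    'DEMO/train/013.Bobolink/c.jpg',
]
print(class_names_from_keys(example_keys, 'DEMO/train'))
```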

In [None]:
class_name_list = ['013.Bobolink', '017.Cardinal']

## Create a Model for performing batch predictions

When we deployed the model in the previous lab to an Amazon SageMaker real-time endpoint, we deployed to a CPU-based instance type. Under the hood, a CPU-based Amazon SageMaker Model object was created to wrap a CPU-based TensorFlow Serving (TFS) container. However, for Batch Transform on a large dataset, we would prefer to use GPU instances. To do this, we need to create another Model object that uses a GPU-based TFS container.

Here we instantiate a Model object pointing to the trained model artifacts and referring to the TensorFlow Serving image that will be used to drive inference on that model.
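
The image URI used in the cells below follows a fixed pattern of account number, region, repository name, and a version tag with a `-cpu` or `-gpu` suffix. As a sketch, the same logic can be collected in one function. The function name `tfs_image_uri` is hypothetical; the account numbers and repository names are the ones used in this notebook's setup cell.

```python
def tfs_image_uri(region, framework_version, use_gpu):
    """Build the ECR URI of a TensorFlow Serving image using this notebook's convention."""
    if framework_version == '1.12':
        account, repository = '520713654638', 'sagemaker-tensorflow-serving'
    else:
        account, repository = '763104351884', 'tensorflow-inference'
    device = 'gpu' if use_gpu else 'cpu'
    return '{}.dkr.ecr.{}.amazonaws.com/{}:{}-{}'.format(
        account, region, repository, framework_version, device)

print(tfs_image_uri('us-east-1', '1.14', use_gpu=True))
# 763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference:1.14-gpu
```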

In [None]:
sess = sagemaker.Session()
bucket = sess.default_bucket()
prefix = 'DEMO-TF-image-classification-birds'
model_prefix = 'mpr-tf-ic-gpu'

model_artifacts = 's3://{}/{}/output/model.tar.gz'.format(bucket, training_job_name)
print(model_artifacts)

region_name = boto3.Session().region_name

In [None]:
client = boto3.client('sagemaker')

model_name = '{}-{}'.format(model_prefix, strftime("%d-%H-%M-%S", gmtime()))

if USE_GPU_INSTANCE:
    batch_instance_type = 'ml.p3.2xlarge'
    framework_image = '{}.dkr.ecr.{}.amazonaws.com/{}:{}-gpu'.format(TF_ACCOUNT_NUM,
                                                                     region_name, TF_CONTAINER_NAME,
                                                                     TF_FRAMEWORK_VERSION)
    print('Ensuring tensorflow-gpu Python package is installed for batch inference')
    # publish a model_gpu.tar.gz with an appropriate requirements.txt for running on a gpu instance
    model_artifacts_base = model_artifacts[0:model_artifacts.index('model.tar.gz') - 1] # no slash at end
    !bash ./replace-requirements-gpu.sh $model_artifacts_base
    model_artifacts = model_artifacts_base + '/model_gpu.tar.gz'
else:
    batch_instance_type = 'ml.c5.4xlarge'
    framework_image = '{}.dkr.ecr.{}.amazonaws.com/{}:{}-cpu'.format(TF_ACCOUNT_NUM,
                                                                     region_name, TF_CONTAINER_NAME,
                                                                     TF_FRAMEWORK_VERSION)
print('Using TensorFlow Serving image: {}'.format(framework_image))

In [None]:
tf_serving_model = Model(model_data=model_artifacts,
                         role=sagemaker.get_execution_role(),
                         image=framework_image,
                         framework_version=TF_FRAMEWORK_VERSION,
                         sagemaker_session=sess)

tf_serving_container = tf_serving_model.prepare_container_def(batch_instance_type)
model_params = {
    'ModelName': model_name,
    'Containers': [
        tf_serving_container
    ],
    'ExecutionRoleArn': sagemaker.get_execution_role()
}

client.create_model(**model_params)

SageMaker batch transformations require input to be specified in S3, and you need to provide an S3 output path where SageMaker will save the resulting predictions.

In [None]:
input_data_path  = 's3://{}/{}/train/'.format(bucket, prefix)
output_data_path = 's3://{}/{}/batch_predictions'.format(bucket, prefix)
print(output_data_path)

Before we run the batch transformation, we first remove prior batch prediction results. In production, you would likely instead tag the folder with a timestamp and retain the results from each run of the batch.
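
The timestamped alternative mentioned above could look like the following sketch. The helper name `timestamped_output_path` and the timestamp format are assumptions chosen for illustration; the path layout mirrors the `batch_predictions` folder used in this notebook.

```python
from time import gmtime, strftime

def timestamped_output_path(bucket, prefix, when=None):
    """Return an S3 output path that is unique per batch run (UTC timestamp)."""
    when = gmtime() if when is None else when
    stamp = strftime('%Y-%m-%d-%H-%M-%S', when)
    return 's3://{}/{}/batch_predictions/{}'.format(bucket, prefix, stamp)

# Example: each run writes to its own folder, so earlier results are retained
print(timestamped_output_path('my-bucket', 'DEMO-TF-image-classification-birds'))
```

With a path like this, no cleanup step is needed before each run, at the cost of accumulating results in S3 over time.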

In [None]:
if input('Are you sure you want to remove the old batch predictions in {}? (yes/no) '.format(output_data_path)) == 'yes':
    !aws s3 rm --quiet --recursive $output_data_path

Likewise, to interpret the results, we copy them down to our local folder. If we have done this before, we first remove the old results.

In [None]:
if input('Are you sure you want to remove the prior local batch predictions from ./batch_predictions? (yes/no) ') == 'yes':
    !rm -rf ./batch_predictions/*

## Launch the batch transformation job
Here we kick off the batch prediction job using the SageMaker Transformer object.

In [None]:
batch_instance_count = 2
concurrency = 100

transformer = sagemaker.transformer.Transformer(
    model_name = model_name,
    instance_count = batch_instance_count,
    instance_type = batch_instance_type,
    max_concurrent_transforms = concurrency,
    output_path = output_data_path,
    base_transform_job_name='tf-birds-image-transform')

transformer.transform(data = input_data_path, content_type = 'application/x-image')
transformer.wait()

To facilitate evaluation of the output, we download the results to our local folder.
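
Before downloading, it can be worth confirming that the transform job actually completed. The sketch below checks the job status through the SageMaker API. The helper name `transform_job_succeeded` is hypothetical, and it assumes you pass a boto3 SageMaker client and the name of the transform job.

```python
def transform_job_succeeded(sm_client, job_name):
    """Return True if the named batch transform job finished successfully.

    sm_client is assumed to be a boto3 SageMaker client.
    """
    description = sm_client.describe_transform_job(TransformJobName=job_name)
    status = description['TransformJobStatus']
    if status == 'Failed':
        # FailureReason is only present when the job has failed
        reason = description.get('FailureReason', 'no reason reported')
        print('Transform job failed: {}'.format(reason))
    return status == 'Completed'

# Example usage (requires AWS credentials and an existing transform job):
#   import boto3
#   ok = transform_job_succeeded(boto3.client('sagemaker'), 'my-transform-job-name')
```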

In [None]:
!aws s3 cp  --quiet --recursive $output_data_path ./batch_predictions

Here we take a look at a sample output file. For each JPEG file we passed to the batch transformation job, we get a corresponding `.jpg.out` file containing the JSON-formatted output from the prediction.
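
To make the naming convention concrete, the sketch below maps a downloaded output path back to its class folder and original image file name. The helper `describe_output_file` and the example file name are hypothetical; the folder layout (`batch_predictions/<class folder>/<image>.out`) is the one this notebook relies on.

```python
from pathlib import PurePosixPath

def describe_output_file(path):
    """Split a batch output path into its class folder and input file name."""
    p = PurePosixPath(path)
    if p.suffix != '.out':
        raise ValueError('not a batch transform output file: {}'.format(path))
    # Dropping the trailing '.out' recovers the original input file name
    return {'class_folder': p.parent.name, 'input_file': p.stem}

print(describe_output_file('batch_predictions/013.Bobolink/example_0001.jpg.out'))
# {'class_folder': '013.Bobolink', 'input_file': 'example_0001.jpg'}
```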

In [None]:
import glob
filepaths = glob.glob('./batch_predictions/*/*')
sample_file = filepaths[0]
!cat $sample_file

Here we take the highest-probability class prediction for each image and compare that to the actual class of the image (represented by its class subfolder).
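
As a small worked example of that step, the sketch below picks the top class from one parsed output file. The function `top_class` is hypothetical, and the payload shape (`{"predictions": [[...]]}` with one row of per-class scores) is an assumption based on how the evaluation cell reads the files. The sample scores are made up for illustration.

```python
import json

def top_class(payload, class_names):
    """Return (label, score) for the highest-scoring class in one prediction."""
    scores = payload['predictions'][0]
    if len(scores) != len(class_names):
        raise ValueError('expected {} scores, got {}'.format(len(class_names), len(scores)))
    best = max(range(len(scores)), key=scores.__getitem__)
    return class_names[best], scores[best]

# Illustrative payload, not real model output
sample = json.loads('{"predictions": [[0.08, 0.92]]}')
print(top_class(sample, ['013.Bobolink', '017.Cardinal']))
# ('017.Cardinal', 0.92)
```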

In [None]:
import json
import re
import os
import glob
import numpy as np

total = 0
correct = 0

predicted = []
actual = []

for entry in glob.glob('batch_predictions/*/*'):
    try:
        # The class subfolder name is the ground-truth label for this image
        actual_label = entry.split('/')[1]
        actual_index = class_name_list.index(actual_label)
        with open(entry, 'r') as f:
            response = json.load(f)
        # Flatten the nested 'predictions' list into one score per class
        scores = [item for sublist in response['predictions'] for item in sublist]
        class_index = int(np.argmax(scores))
        predicted_label = class_name_list[class_index]
        predicted.append(class_index)
        actual.append(actual_index)
        if predicted_label == actual_label:
            correct += 1
        total += 1
    except Exception as e:
        # Report and skip files that cannot be parsed or matched to a known class
        print('Skipping {}: {}'.format(entry, e))
        continue

In [None]:
print('Out of {} total images, accurate predictions were returned for {}'.format(total, correct))
# Guard against division by zero when no prediction files were found
accuracy = correct / total if total else 0.0
print('Accuracy is {:.1%}'.format(accuracy))

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import itertools

def plot_confusion_matrix(cm, classes,
                          normalize=False,
                          title='Confusion matrix',
                          cmap=plt.cm.GnBu):
    plt.figure(figsize=(7,7))
    plt.grid(False)

    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)
    fmt = '.2f' if normalize else 'd'
    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), 
                                  range(cm.shape[1])):
        plt.text(j, i, format(cm[i, j], fmt),
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")
    plt.tight_layout()
    plt.gca().set_xticklabels(class_name_list)
    plt.gca().set_yticklabels(class_name_list)
    plt.ylabel('True label')
    plt.xlabel('Predicted label')

In [None]:
from sklearn.metrics import confusion_matrix
def create_and_plot_confusion_matrix(actual, predicted):
    cnf_matrix = confusion_matrix(actual, np.asarray(predicted), labels=range(len(class_name_list)))
    plot_confusion_matrix(cnf_matrix, classes=class_name_list)

In [None]:
create_and_plot_confusion_matrix(actual, predicted)
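
Beyond the plot, the confusion matrix can also be summarized numerically. The sketch below computes per-class recall, assuming the scikit-learn convention that rows are true labels and columns are predicted labels. The function name `per_class_recall` and the example counts are hypothetical.

```python
import numpy as np

def per_class_recall(cm):
    """Return the fraction of each true class that was predicted correctly."""
    cm = np.asarray(cm, dtype=float)
    row_totals = cm.sum(axis=1)
    # Classes with no samples get a recall of 0 instead of a division error
    return np.divide(np.diag(cm), row_totals,
                     out=np.zeros_like(row_totals), where=row_totals > 0)

# Made-up counts for two classes: rows are true labels, columns are predictions
print(per_class_recall([[8, 2], [1, 9]]))
```

Per-class recall can reveal a weak class that the overall accuracy figure hides, especially when the classes are imbalanced.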