# Train once and deploy everywhere on cloud and edge devices with SageMaker Neo

In this workshop, you will learn how to use the Amazon SageMaker Neo Deep Learning Compiler (DLC) to compile your trained TensorFlow models and deploy them in the cloud or on edge devices. You will learn how Neo DLC optimizes trained models by improving efficiency and reducing the memory footprint of the compiled model. You will also learn how the Neo runtime abstracts the underlying hardware and allows a compiled model to run on multiple hardware devices such as Intel Xeon/Atom, NVIDIA Jetson, ARM, and many more. You will gain experience in improving runtime performance by 2x and reducing memory footprint by 10x using SageMaker Neo.

With Amazon SageMaker, you can package your own algorithms that can then be trained and deployed in the SageMaker environment. This notebook guides you through an example using TensorFlow that shows you how to build a Docker container for SageMaker and use it for training and inference. By packaging an algorithm in a container, you can bring almost any code to the Amazon SageMaker environment, regardless of programming language, environment, framework, or dependencies. 

1. [Building your own TensorFlow container](#Building-your-own-tensorflow-container)
1. [Part 1 - Packaging and Uploading your Algorithm for use with Amazon SageMaker](#Part-1:-Packaging-and-Uploading-your-Algorithm-for-use-with-Amazon-SageMaker)
    1. [An overview of Docker](#An-overview-of-Docker)
    1. [How Amazon SageMaker runs your Docker container](#How-Amazon-SageMaker-runs-your-Docker-container)
      1. [Running your container during training](#Running-your-container-during-training)
        1. [The input](#The-input)
        1. [The output](#The-output)
      1. [Running your container during hosting](#Running-your-container-during-hosting)
    1. [The parts of the sample container](#The-parts-of-the-sample-container)
    1. [The Dockerfile](#The-Dockerfile)
    1. [Building and registering the container](#Building-and-registering-the-container)
  1. [Testing your algorithm on your local machine](#Testing-your-algorithm-on-your-local-machine)
1. [Part 2 - Training and Hosting your Algorithm in Amazon SageMaker](#Part-2:-Training-and-Hosting-your-Algorithm-in-Amazon-SageMaker)
  1. [Set up the environment](#Set-up-the-environment)
  1. [Create the session](#Create-the-session)
  1. [Upload the data for training](#Upload-the-data-for-training)
  1. [Training On SageMaker](#Training-on-SageMaker)
  1. [Optional cleanup](#Optional-cleanup)  
1. [Part 3 - Compiling models for various targets using Sagemaker Neo](#Part-3:-Compiling-models-for-various-targets-using-Sagemaker-Neo)
1. [Reference](#Reference)

## _or_ 
### Just [let me see the code](#The-Dockerfile)!


In this example we show how to package a custom TensorFlow container with a Python example which works with the CIFAR-10 dataset and uses TensorFlow Serving for inference. However, different inference solutions other than TensorFlow Serving can be used by modifying the docker container.

In this example, we use a single image to support training and hosting. This simplifies the procedure because we only need to manage one image for both tasks. Sometimes you may want separate images for training and hosting because they have different requirements. In this case, separate the parts discussed below into separate Dockerfiles and build two images. Choosing whether to use a single image or two images is a matter of what is most convenient for you to develop and manage.

If you're only using Amazon SageMaker for training or hosting, but not both, only the functionality used needs to be built into your container.

[CIFAR-10]: http://www.cs.toronto.edu/~kriz/cifar.html


# Part 1: Packaging and Uploading your Algorithm for use with Amazon SageMaker

### An overview of Docker

If you're familiar with Docker already, you can skip ahead to the next section.

For many data scientists, Docker containers are a new technology. But they are not difficult and can significantly simplify the deployment of your software packages. 

Docker provides a simple way to package arbitrary code into an _image_ that is totally self-contained. Once you have an image, you can use Docker to run a _container_ based on that image. Running a container is just like running a program on the machine except that the container creates a fully self-contained environment for the program to run. Containers are isolated from each other and from the host environment, so the way your program is set up is the way it runs, no matter where you run it.

Docker is more powerful than environment managers like conda or virtualenv because (a) it is completely language independent and (b) it comprises your whole operating environment, including startup commands and environment variables.

A Docker container is like a virtual machine, but it is much lighter weight. For example, a program running in a container can start in less than a second and many containers can run simultaneously on the same physical or virtual machine instance.

Docker uses a simple file called a `Dockerfile` to specify how the image is assembled. An example is provided below. You can build your Docker images based on Docker images built by yourself or by others, which can simplify things quite a bit.

Docker has become very popular in programming and devops communities due to its flexibility and its well-defined specification of how code can be run in its containers. It is the underpinning of many services built in the past few years, such as [Amazon ECS].

Amazon SageMaker uses Docker to allow users to train and deploy arbitrary algorithms.

In Amazon SageMaker, Docker containers are invoked in one way for training and in another, slightly different, way for hosting. The following sections outline how to build containers for the SageMaker environment.

Some helpful links:

* [Docker home page](http://www.docker.com)
* [Getting started with Docker](https://docs.docker.com/get-started/)
* [Dockerfile reference](https://docs.docker.com/engine/reference/builder/)
* [`docker run` reference](https://docs.docker.com/engine/reference/run/)

[Amazon ECS]: https://aws.amazon.com/ecs/

### How Amazon SageMaker runs your Docker container

Because you can run the same image in training or hosting, Amazon SageMaker runs your container with the argument `train` or `serve`. How your container processes this argument depends on the container.

* In this example, we don't define an `ENTRYPOINT` in the Dockerfile so Docker runs the command [`train` at training time](https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-training-algo.html) and [`serve` at serving time](https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-inference-code.html). In this example, we define these as executable Python scripts, but they could be any program that we want to start in that environment.
* If you specify a program as an `ENTRYPOINT` in the Dockerfile, that program will be run at startup and its first argument will be `train` or `serve`. The program can then look at that argument and decide what to do.
* If you are building separate containers for training and hosting (or building only for one or the other), you can define a program as an `ENTRYPOINT` in the Dockerfile and ignore (or verify) the first argument passed in. 
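
The `ENTRYPOINT` variant described above can be sketched as a small dispatcher. This is only an illustration, not the code used in this example: the file name `entrypoint.py` and the functions `run_training` and `run_serving` are hypothetical placeholders.

```python
def run_training():
    # Placeholder: a real container would start its training code here.
    return "training started"


def run_serving():
    # Placeholder: a real container would start its inference server here.
    return "serving started"


def dispatch(argv):
    """Pick an action from the first argument SageMaker passes to the container."""
    handlers = {"train": run_training, "serve": run_serving}
    if len(argv) < 2 or argv[1] not in handlers:
        raise SystemExit("usage: entrypoint.py [train|serve]")
    return handlers[argv[1]]()


# In a real entrypoint script this would be dispatch(sys.argv).
print(dispatch(["entrypoint.py", "train"]))
```

A container built only for training could use the same check simply to verify that the argument is `train` before starting.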

#### Running your container during training

When Amazon SageMaker runs training, your `train` script is run, as in a regular Python program. A number of files are laid out for your use, under the `/opt/ml` directory:

    /opt/ml
    ├── input
    │   ├── config
    │   │   ├── hyperparameters.json
    │   │   └── resourceConfig.json
    │   └── data
    │       └── <channel_name>
    │           └── <input data>
    ├── model
    │   └── <model files>
    └── output
        └── failure

##### The input

* `/opt/ml/input/config` contains information to control how your program runs. `hyperparameters.json` is a JSON-formatted dictionary of hyperparameter names to values. These values are always strings, so you may need to convert them. `resourceConfig.json` is a JSON-formatted file that describes the network layout used for distributed training.
* `/opt/ml/input/data/<channel_name>/` (for File mode) contains the input data for that channel. The channels are created based on the call to CreateTrainingJob but it's generally important that channels match algorithm expectations. The files for each channel are copied from S3 to this directory, preserving the tree structure indicated by the S3 key structure. 
* `/opt/ml/input/data/<channel_name>_<epoch_number>` (for Pipe mode) is the pipe for a given epoch. Epochs start at zero and go up by one each time you read them. There is no limit to the number of epochs that you can run, but you must close each pipe before reading the next epoch.
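
Because the hyperparameter values always arrive as strings, a training script typically converts them after loading. The following is a minimal sketch of that step; the helper names `load_hyperparameters` and `as_int` are hypothetical, and a temporary directory stands in for `/opt/ml/input/config` so the snippet can run anywhere.

```python
import json
import os
import tempfile


def load_hyperparameters(config_dir="/opt/ml/input/config"):
    """Read hyperparameters.json and return its dict of string values."""
    with open(os.path.join(config_dir, "hyperparameters.json")) as f:
        return json.load(f)


def as_int(params, name, default):
    """Convert one string-valued hyperparameter to int, falling back to a default."""
    return int(params.get(name, default))


# Demo: write a file shaped like the one SageMaker provides, then read it back.
with tempfile.TemporaryDirectory() as demo_dir:
    with open(os.path.join(demo_dir, "hyperparameters.json"), "w") as f:
        json.dump({"train-steps": "100"}, f)
    params = load_hyperparameters(demo_dir)
    train_steps = as_int(params, "train-steps", default=1000)

print(params, train_steps)
```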

##### The output

* `/opt/ml/model/` is the directory where you write the model that your algorithm generates. Your model can be in any format that you want. It can be a single file or a whole directory tree. SageMaker packages any files in this directory into a compressed tar archive file. This file is made available at the S3 location returned in the `DescribeTrainingJob` result.
* `/opt/ml/output` is a directory where the algorithm can write a file `failure` that describes why the job failed. The contents of this file are returned in the `FailureReason` field of the `DescribeTrainingJob` result. For jobs that succeed, there is no reason to write this file as it is ignored.
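
One way to produce the `failure` file is to wrap the training call in an exception handler. This is a sketch under assumptions: `run_with_failure_report` and `train_fn` are hypothetical names, and the returned code is meant to be passed to `sys.exit` by the caller.

```python
import os
import traceback


def run_with_failure_report(train_fn, output_dir="/opt/ml/output"):
    """Run train_fn; on error, write the traceback to <output_dir>/failure."""
    try:
        train_fn()
        return 0
    except Exception:
        os.makedirs(output_dir, exist_ok=True)
        with open(os.path.join(output_dir, "failure"), "w") as f:
            f.write("Exception during training:\n" + traceback.format_exc())
        # A non-zero return value lets the caller exit with a failing status.
        return 1
```

On success nothing is written, which matches the note above that the file is ignored for jobs that succeed.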

#### Running your container during hosting

Hosting has a very different model than training because hosting responds to inference requests that come in via HTTP. In this example, we use [TensorFlow Serving](https://www.tensorflow.org/serving/), but the hosting solution can be customized. One example is the [Python serving stack within the scikit learn example](https://github.com/awslabs/amazon-sagemaker-examples/blob/master/advanced_functionality/scikit_bring_your_own/scikit_bring_your_own.ipynb).

Amazon SageMaker uses two URLs in the container:

* `/ping` receives `GET` requests from the infrastructure. Your program returns 200 if the container is up and accepting requests.
* `/invocations` is the endpoint that receives client inference `POST` requests. The format of the request and the response is up to the algorithm. If the client supplied `ContentType` and `Accept` headers, these are passed in as well. 

The container has the model files in the same place that they were written to during training:

    /opt/ml
    └── model
        └── <model files>
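
To make the two URLs concrete, here is a minimal sketch of a server that answers `/ping` and `/invocations` using only the Python standard library. It is not the nginx and TensorFlow Serving stack used in this example: the handler name, the echo-style response, and the default port 8080 are assumptions for illustration, and no model is loaded.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class InferenceHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Health check: 200 means the container is up and accepting requests.
        self._reply(200 if self.path == "/ping" else 404, b"")

    def do_POST(self):
        if self.path != "/invocations":
            self._reply(404, b"")
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        # Placeholder "model": report how many instances were received.
        body = json.dumps({"received": len(payload.get("instances", []))}).encode()
        self._reply(200, body, "application/json")

    def _reply(self, status, body, content_type="text/plain"):
        self.send_response(status)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # Keep the sketch quiet; a real server would log requests.


def make_server(host="0.0.0.0", port=8080):
    """Build the server; call serve_forever() on the result to start it."""
    return HTTPServer((host, port), InferenceHandler)
```

A real handler would load the files under `/opt/ml/model` once at startup and run the model inside `do_POST`.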



### The parts of the sample container

The `container` directory has all the components you need to package the sample algorithm for Amazon SageMaker:

    .
    ├── Dockerfile
    ├── build_and_push.sh
    └── cifar10
        ├── cifar10.py
        ├── resnet_model.py
        ├── nginx.conf
        ├── serve
        └── train

Let's discuss each of these in turn:

* __`Dockerfile`__ describes how to build your Docker container image. More details are provided below.
* __`build_and_push.sh`__ is a script that uses the Dockerfile to build your container image and then pushes it to ECR. We invoke the commands directly later in this notebook, but you can just copy and run the script for your own algorithms.
* __`cifar10`__ is the directory which contains the files that are installed in the container.

In this simple application, we install only five files in the container. You may only need that many, but if you have many supporting routines, you may wish to install more. These five files show the standard structure of our Python containers, although you are free to choose a different toolset and therefore could have a different layout. If you're writing in a different programming language, you will have a different layout depending on the frameworks and tools you choose.

The files that we put in the container are:

* __`cifar10.py`__ is the program that implements our training algorithm.
* __`resnet_model.py`__ is the program that contains our Resnet model. 
* __`nginx.conf`__ is the configuration file for the nginx front-end. Generally, you should be able to take this file as-is.
* __`serve`__ is the program started when the container is started for hosting. It simply launches nginx and loads your exported model with TensorFlow Serving.
* __`train`__ is the program that is invoked when the container is run for training. Our implementation of this script invokes `cifar10.py` with our hyperparameter values retrieved from `/opt/ml/input/config/hyperparameters.json`. The goal is to avoid having to modify our training algorithm program.

In summary, the two files you probably want to change for your application are `train` and `serve`.
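
The idea behind `train`, passing hyperparameters to an unmodified training program, can be sketched as follows. This is not the actual `train` script: the helper `build_command` is hypothetical, and it assumes the training program accepts each hyperparameter as a `--name value` command-line flag.

```python
def build_command(hyperparameters, script="cifar10.py"):
    """Turn a hyperparameter dict into an argument list for the training script."""
    cmd = ["python", script]
    for name, value in sorted(hyperparameters.items()):
        cmd += ["--" + name, str(value)]
    return cmd


print(build_command({"train-steps": "100"}))
# ['python', 'cifar10.py', '--train-steps', '100']
```

The resulting list could then be handed to `subprocess.check_call`, which raises an error if the training program exits with a non-zero status.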

### The Dockerfile

The Dockerfile describes the image that we want to build. You can think of it as describing the complete operating system installation of the system that you want to run. A Docker container running is quite a bit lighter than a full operating system, however, because it takes advantage of Linux on the host machine for the basic operations. 

For the Python science stack, we start from an official TensorFlow docker image and run the normal tools to install TensorFlow Serving. Then we add the code that implements our specific algorithm to the container and set up the right environment for it to run under.

Let's look at the Dockerfile for this example.

In [1]:
!cat container/Dockerfile

# Copyright 2017-2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"). You
# may not use this file except in compliance with the License. A copy of
# the License is located at
#
#     http://aws.amazon.com/apache2.0/
#
# or in the "license" file accompanying this file. This file is
# distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF
# ANY KIND, either express or implied. See the License for the specific
# language governing permissions and limitations under the License.

# For more information on creating a Dockerfile
# https://docs.docker.com/compose/gettingstarted/#step-2-create-a-dockerfile
FROM tensorflow/tensorflow:1.8.0-py3

RUN apt-get update && apt-get install -y --no-install-recommends nginx curl

# Download TensorFlow Serving
# https://www.tensorflow.org/serving/setup#installing_the_modelserver
RUN echo "deb [arch=amd64] http://storage.googleapis.com/tensorflow-

### Building and registering the container

The following shell code shows how to build the container image using `docker build` and push the container image to ECR using `docker push`. This code is also available as the shell script `container/build_and_push.sh`, which you can run as `build_and_push.sh sagemaker-tf-cifar10-example` to build the image `sagemaker-tf-cifar10-example`. 

This code looks for an ECR repository in the account you're using and the current default region (if you're using a SageMaker notebook instance, this is the region where the notebook instance was created). If the repository doesn't exist, the script will create it.

In [2]:
%%sh

# The name of our algorithm
algorithm_name=sagemaker-tf-cifar10-example

cd container

chmod +x cifar10/train
chmod +x cifar10/serve

account=$(aws sts get-caller-identity --query Account --output text)

# Get the region defined in the current configuration (default to us-west-2 if none defined)
region=$(aws configure get region)
region=${region:-us-west-2}

fullname="${account}.dkr.ecr.${region}.amazonaws.com/${algorithm_name}:latest"

# If the repository doesn't exist in ECR, create it.

aws ecr describe-repositories --repository-names "${algorithm_name}" > /dev/null 2>&1

if [ $? -ne 0 ]
then
    aws ecr create-repository --repository-name "${algorithm_name}" > /dev/null
fi

# Get the login command from ECR and execute it directly
$(aws ecr get-login --region ${region} --no-include-email)

# Build the docker image locally with the image name and then push it to ECR
# with the full name.

docker build  -t ${algorithm_name} .
docker tag ${algorithm_name} ${fullname}

docker push ${fullname}

Login Succeeded
Sending build context to Docker daemon  36.86kB
Step 1/8 : FROM tensorflow/tensorflow:1.8.0-py3
 ---> a83a3dd79ff9
Step 2/8 : RUN apt-get update && apt-get install -y --no-install-recommends nginx curl
 ---> Using cache
 ---> fcdd2d29cac1
Step 3/8 : RUN echo "deb [arch=amd64] http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal" | tee /etc/apt/sources.list.d/tensorflow-serving.list
 ---> Using cache
 ---> f859ad33ea29
Step 4/8 : RUN curl https://storage.googleapis.com/tensorflow-serving-apt/tensorflow-serving.release.pub.gpg | apt-key add -
 ---> Using cache
 ---> a26a30aa259f
Step 5/8 : RUN apt-get update && apt-get install tensorflow-model-server
 ---> Using cache
 ---> 306841e0cf88
Step 6/8 : ENV PATH="/opt/ml/code:${PATH}"
 ---> Using cache
 ---> 975565089244
Step 7/8 : COPY /cifar10 /opt/ml/code
 ---> Using cache
 ---> 05f51f0abf7e
Step 8/8 : WORKDIR /opt/ml/code
 ---> Using cache
 ---> f988e4ede1a9

https://docs.docker.com/engine/reference/commandline/login/#credentials-store



## Testing your algorithm on your local machine

When you're packaging your first algorithm to use with Amazon SageMaker, you probably want to test it yourself to make sure it's working correctly. We use the [SageMaker Python SDK](https://github.com/aws/sagemaker-python-sdk) to test both locally and on SageMaker. For more examples with the SageMaker Python SDK, see [Amazon SageMaker Examples](https://github.com/awslabs/amazon-sagemaker-examples/tree/master/sagemaker-python-sdk). In order to test our algorithm, we need our dataset.

## Download the CIFAR-10 dataset
Our training algorithm is expecting our training data to be in the file format of [TFRecords](https://www.tensorflow.org/guide/datasets), which is a simple record-oriented binary format that many TensorFlow applications use for training data.
Below is a Python script adapted from the [official TensorFlow CIFAR-10 example](https://github.com/tensorflow/models/tree/master/tutorials/image/cifar10_estimator), which downloads the CIFAR-10 dataset and converts it into TFRecords.

In [8]:
!pip install tensorflow==1.8.0

Collecting tensorflow==1.8.0
  Downloading https://files.pythonhosted.org/packages/22/c6/d08f7c549330c2acc1b18b5c1f0f8d9d2af92f54d56861f331f372731671/tensorflow-1.8.0-cp36-cp36m-manylinux1_x86_64.whl (49.1MB)
Collecting tensorboard<1.9.0,>=1.8.0
  Downloading https://files.pythonhosted.org/packages/59/a6/0ae6092b7542cfedba6b2a1c9b8dceaf278238c39484f3ba03b03f07803c/tensorboard-1.8.0-py3-none-any.whl (3.1MB)
Collecting bleach==1.5.0
  Downloading https://files.pythonhosted.org/packages/33/70/86c5fec937ea4964184d4d6c4f0b9551564f821e1c3575907639036d9b90/bleach-1.5.0-py2.py3-none-any.whl
Collecting html5lib==0.9999999
  Downloading https://files.pythonhosted.org/packages/ae/ae/bcb60402c60932b32dfaf19bb53870b29eda2cd17551ba5639219fb5ebf9/html5lib-0.9999999.tar.gz (889kB)

In [9]:
%%time
! python utils/generate_cifar10_tfrecords.py --data-dir=/tmp/cifar-10-data

Download from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz and extract.
Generating /tmp/cifar-10-data/train.tfrecords
Generating /tmp/cifar-10-data/validation.tfrecords
Generating /tmp/cifar-10-data/eval.tfrecords
Removing original files.
Done!
CPU times: user 131 ms, sys: 34.8 ms, total: 165 ms
Wall time: 8.45 s


### There should be three tfrecords. (eval, train, validation)

In [10]:
! ls /tmp/cifar-10-data

eval.tfrecords	train.tfrecords  validation.tfrecords


## SageMaker Python SDK Local Training
To represent our training, we use the Estimator class, which needs to be configured in five steps. 
1. IAM role - our AWS execution role
2. train_instance_count - number of instances to use for training.
3. train_instance_type - type of instance to use for training. For training locally, we specify `local`.
4. image_name - our custom TensorFlow Docker image we created.
5. hyperparameters - hyperparameters we want to pass.

Let's start with setting up our IAM role. We make use of a helper function within the Python SDK. This function throws an exception if run outside of a SageMaker notebook instance, as it gets metadata from the notebook instance. If running outside, you must provide an IAM role with the proper access described in [Permissions](#Permissions).

In [11]:
from sagemaker import get_execution_role

role = get_execution_role()

## Fit, Deploy, Predict

Now that the rest of our estimator is configured, we can call `fit()` with the path to our local CIFAR10 dataset prefixed with `file://`. This invokes our TensorFlow container with 'train' and passes in our hyperparameters and other metadata as json files in /opt/ml/input/config within the container.

After our training has succeeded, our training algorithm outputs our trained model within the /opt/ml/model directory, which is used to handle predictions.

We can then call `deploy()` with an instance_count and instance_type, which are 1 and `local`. This invokes our TensorFlow container with 'serve', which sets up our container to handle prediction requests through TensorFlow Serving. What is returned is a predictor, which is used to make inferences against our trained model.

After our prediction, we can delete our endpoint.

We recommend testing your training algorithm locally first, as it provides quicker iterations and better debuggability.

In [12]:
# Let's set up our SageMaker notebook instance for local mode.
!/bin/bash ./utils/setup.sh

The user has root access.
nvidia-docker2 already installed. We are good to go!
SageMaker instance route table setup is ok. We are good to go.
SageMaker instance routing for Docker is ok. We are good to go!


In [13]:
%%time
from sagemaker.estimator import Estimator

hyperparameters = {'train-steps': 100}

instance_type = 'local'

estimator = Estimator(role=role,
                      train_instance_count=1,
                      train_instance_type=instance_type,
                      image_name='sagemaker-tf-cifar10-example:latest',
                      hyperparameters=hyperparameters)

estimator.fit('file:///tmp/cifar-10-data')

predictor = estimator.deploy(1, instance_type)

Creating tmprdyla9cq_algo-1-z5wyd_1 ... done
Attaching to tmprdyla9cq_algo-1-z5wyd_1
algo-1-z5wyd_1  | Training complete.
tmprdyla9cq_algo-1-z5wyd_1 exited with code 0
Aborting on container exit...
===== Job Complete =====


W1028 09:20:38.802621 140147847612224 connectionpool.py:662] Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f76619d9b70>: Failed to establish a new connection: [Errno 111] Connection refused',)': /ping
W1028 09:20:38.805431 140147847612224 connectionpool.py:662] Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f76610b4320>: Failed to establish a new connection: [Errno 111] Connection refused',)': /ping
W1028 09:20:38.806475 140147847612224 connectionpool.py:662] Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f7661a90e10>: Failed to establish a new connection: [Errno 111] Connection refused',)': /ping


Attaching to tmpx2cvi3ir_algo-1-chxop_1
algo-1-chxop_1  | Starting TensorFlow Serving.
algo-1-chxop_1  | 2019-10-28 09:20:40.140345: I tensorflow_serving/model_servers/server.cc:85] Building single TensorFlow model file config:  model_name: cifar10_model model_base_path: /opt/ml/model/export/Servo
algo-1-chxop_1  | 2019-10-28 09:20:40.140502: I tensorflow_serving/model_servers/server_core.cc:462] Adding/updating models.
algo-1-chxop_1  | 2019-10-28 09:20:40.140528: I tensorflow_serving/model_servers/server_core.cc:573]  (Re-)adding model: cifar10_model
algo-1-chxop_1  | 2019-10-28 09:20:40.240899: I tensorflow_serving/core/basic_manager.cc:739] Successfully reserved resources to load servable {name: cifar10_model version: 1572254435}
algo-1-chxop_1  | 2019-10-28 09:20:40.240946: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: cifar10_model version: 1572254435}
algo-1-chxop_1  | 201

## Making predictions using Python SDK

To make predictions, we use an image that is converted using OpenCV into a json format to send as an inference request. We need to install OpenCV to deserialize the image that is used to make predictions.

The JSON response will be the probabilities of the image belonging to one of the 10 classes along with the most likely class the picture belongs to. The classes can be referenced from the [CIFAR-10 website](https://www.cs.toronto.edu/~kriz/cifar.html). Since we didn't train the model for that long, we aren't expecting very accurate results.
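
As a sketch of how such a response could be interpreted, the snippet below maps the predicted class index to a label name. It assumes the response has the shape `{'predictions': [{'probabilities': [...], 'classes': <index>}]}` and that class indices follow the label order listed on the CIFAR-10 website. The `example` response is made up for illustration, and `describe_prediction` is a hypothetical helper.

```python
# Label order as listed on the CIFAR-10 website.
CIFAR10_LABELS = ["airplane", "automobile", "bird", "cat", "deer",
                  "dog", "frog", "horse", "ship", "truck"]


def describe_prediction(response):
    """Return (label, probability) for the first prediction in the response."""
    prediction = response["predictions"][0]
    index = prediction["classes"]
    return CIFAR10_LABELS[index], prediction["probabilities"][index]


# Hypothetical response, not real model output.
example = {"predictions": [{
    "probabilities": [0.05, 0.05, 0.05, 0.4, 0.05, 0.1, 0.1, 0.1, 0.05, 0.05],
    "classes": 3,
}]}
label, probability = describe_prediction(example)
print(label, probability)
```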

In [14]:
! pip install opencv-python



In [15]:
import cv2
import numpy

In [16]:
import cv2
import numpy

from sagemaker.predictor import json_serializer, json_deserializer

image = cv2.imread("data/cat.png", 1)

# resize, as our model is expecting images in 32x32.
image = cv2.resize(image, (32, 32))

data = {'instances': numpy.asarray(image).astype(float).tolist()}

# The request and response format is JSON for TensorFlow Serving.
# For more information: https://www.tensorflow.org/serving/api_rest#predict_api
predictor.accept = 'application/json'
predictor.content_type = 'application/json'

predictor.serializer = json_serializer
predictor.deserializer = json_deserializer

# For more information on the predictor class.
# https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/predictor.py
predictor.predict(data)

{'predictions': [{'probabilities': [0.0121649159,
    0.0816401541,
    0.0807319283,
    0.0960978195,
    0.152616918,
    0.0360619,
    0.148431093,
    0.206910297,
    0.154628932,
    0.0307160225],
   'classes': 7}]}

algo-1-chxop_1  | 172.19.0.1 - - [28/Oct/2019:09:20:53 +0000] "POST /invocations HTTP/1.1" 200 250 "-" "-"


### Deleting local endpoint

In [17]:
predictor.delete_endpoint()

Gracefully stopping... (press Ctrl+C again to force)


# Part 2: Training and Hosting your Algorithm in Amazon SageMaker
Once you have your container packaged, you can use it to train and serve models. Let's do that with the algorithm we made above.

## Set up the environment
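Here we specify the S3 prefix under which the training data is stored. The data goes into the session's default bucket, and the role is the one we obtained earlier with `get_execution_role()`.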

In [18]:
# S3 prefix
prefix = 'tensorflow-cifar10-neo'

## Create the session

The session remembers our connection parameters to SageMaker. We use it to perform all of our SageMaker operations.

In [19]:
import sagemaker as sage

sess = sage.Session()

## Upload the data for training

We will use the tools provided by the SageMaker Python SDK to upload the data to a default bucket.

In [20]:
WORK_DIRECTORY = '/tmp/cifar-10-data'

data_location = sess.upload_data(WORK_DIRECTORY, key_prefix=prefix)
data_location

's3://sagemaker-us-east-1-497456752804/tensorflow-cifar10-neo'

## Training on SageMaker
Training a model on SageMaker with the Python SDK is done in a way that is similar to the way we trained it locally. This is done by changing our train_instance_type from `local` to one of our [supported EC2 instance types](https://aws.amazon.com/sagemaker/pricing/instance-types/).

In addition, we must now specify the ECR image URL, which we just pushed above.

Finally, our local training dataset has to be in Amazon S3 and the S3 URL to our dataset is passed into the `fit()` call.

Let's first fetch our ECR image url that corresponds to the image we just built and pushed.

In [21]:
import boto3

client = boto3.client('sts')
account = client.get_caller_identity()['Account']

my_session = boto3.session.Session()
region = my_session.region_name

algorithm_name = 'sagemaker-tf-cifar10-example'

ecr_image = '{}.dkr.ecr.{}.amazonaws.com/{}:latest'.format(account, region, algorithm_name)

print(ecr_image)

497456752804.dkr.ecr.us-east-1.amazonaws.com/sagemaker-tf-cifar10-example:latest


---
## Train on EC2

*Note* - This takes about 4 mins on an ml.m4.xlarge for 100 steps. Study the logs to see actual training time vs. time to provision infrastructure.

In [22]:
%%time
from sagemaker.estimator import Estimator

hyperparameters = {'train-steps': 100}

instance_type = 'ml.m4.xlarge'

tfcifar_estimator = Estimator(role=role,
                      train_instance_count=1,
                      train_instance_type=instance_type,
                      image_name=ecr_image,
                      hyperparameters=hyperparameters)

tfcifar_estimator.fit(data_location)

2019-10-28 09:22:09 Starting - Starting the training job...
2019-10-28 09:22:11 Starting - Launching requested ML instances......
2019-10-28 09:23:13 Starting - Preparing the instances for training...
2019-10-28 09:24:09 Downloading - Downloading input data
2019-10-28 09:24:09 Training - Downloading the training image......
2019-10-28 09:25:08 Training - Training image download completed. Training in progress....
2019-10-28 09:25:40 Uploading - Uploading generated training model.
Training complete.

2019-10-28 09:25:45 Completed - Training job completed
CPU times: user 558 ms, sys: 0 ns, total: 558 ms
Wall time: 4min 12s
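
To compare provisioning time with training time, the status lines above can be turned into per-phase durations. This is a small sketch that parses the timestamps copied from the log; `phase_durations` is a hypothetical helper and is not part of the SageMaker SDK.

```python
from datetime import datetime

LOG = """\
2019-10-28 09:22:09 Starting - Starting the training job...
2019-10-28 09:22:11 Starting - Launching requested ML instances......
2019-10-28 09:23:13 Starting - Preparing the instances for training...
2019-10-28 09:24:09 Downloading - Downloading input data
2019-10-28 09:24:09 Training - Downloading the training image......
2019-10-28 09:25:08 Training - Training image download completed. Training in progress....
2019-10-28 09:25:40 Uploading - Uploading generated training model.
2019-10-28 09:25:45 Completed - Training job completed
"""


def phase_durations(log_text):
    """Return (phase, seconds) pairs measured from each phase's first line to the next phase."""
    starts = []  # (phase, first timestamp), in order of appearance
    for line in log_text.splitlines():
        head, sep, _ = line.partition(" - ")
        if not sep:
            continue  # Skip lines that are not status lines.
        stamp, phase = head.rsplit(" ", 1)
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if not starts or starts[-1][0] != phase:
            starts.append((phase, when))
    return [(phase, int((nxt - when).total_seconds()))
            for (phase, when), (_, nxt) in zip(starts, starts[1:])]


durations = phase_durations(LOG)
print(durations)
# [('Starting', 120), ('Downloading', 0), ('Training', 91), ('Uploading', 5)]
```

Note that the `Training` phase in this log includes about a minute spent downloading the training image, so the 100 training steps themselves account for only a small part of the 4 minute wall time.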


## Deploy, but don't wait for completion (view progress in console and move on with the lab)

In [24]:
%%time
import time
endpoint_version = time.strftime("%m%d%Y%H%M%S")
m4_endpoint_name = 'original-m4-' + endpoint_version
predictor = tfcifar_estimator.deploy(1, instance_type, wait=False, endpoint_name=m4_endpoint_name)

CPU times: user 40.3 ms, sys: 0 ns, total: 40.3 ms
Wall time: 695 ms
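
Because `wait=False` returns immediately, the endpoint is still being created when the cell finishes. Besides watching the console, the status can be polled from code. The sketch below assumes a boto3 SageMaker client, whose `describe_endpoint` call returns an `EndpointStatus` field; the helper name and the polling defaults are hypothetical.

```python
import time


def wait_for_endpoint(client, endpoint_name, poll_seconds=30, max_polls=40):
    """Poll describe_endpoint until the endpoint is InService or Failed."""
    status = None
    for _ in range(max_polls):
        response = client.describe_endpoint(EndpointName=endpoint_name)
        status = response["EndpointStatus"]
        if status in ("InService", "Failed"):
            break
        time.sleep(poll_seconds)
    # Returns the last status seen, which may still be "Creating" on timeout.
    return status
```

In this notebook it could be called as `wait_for_endpoint(boto3.client('sagemaker'), m4_endpoint_name)` before sending requests to the endpoint.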



# Part 3: Compiling models for various targets using Sagemaker Neo




## Compile for an EC2 target


In [25]:
target = 'ml_m4'
output_path = tfcifar_estimator.output_path + target
optimized_estimator_ml_m4 = tfcifar_estimator.compile_model(target_instance_family=target, 
                              #input_shape={'data':[1, 784]},  # Batch size 1, flattened 784-element input.
                              #input_shape={'inputs':[-1, 32, 32, 3]},  # Height, width, depth from cifar10.py
                              input_shape={'data': [128, 3, 224, 224]},  # Batch size 128, 3 channels, 224x224 images.
                              output_path=output_path,
                              framework='tensorflow', framework_version='1.8.0')

?.....!

---
## Compile for Jetson TX2

https://developer.nvidia.com/embedded/jetson-tx2

In [26]:
target = 'jetson_tx2'
output_path = tfcifar_estimator.output_path + target
optimized_estimator_tx2 = tfcifar_estimator.compile_model(target_instance_family=target, 
                              #input_shape={'data':[1, 784]},  # Batch size 1, flattened 784-element input.
                              input_shape={'data':[5, 32, 32, 3]},  # Batch size 5, 32x32 images, 3 channels.
                              output_path=output_path,
                              framework='tensorflow', framework_version='1.12.0')

?....!

W1028 09:28:18.024215 140147847612224 model.py:373] The instance type jetson_tx2 is not supported to deploy via SageMaker,please deploy the model manually.


---
## Compile for NVIDIA Jetson nano

https://developer.nvidia.com/embedded/jetson-nano-developer-kit

In [27]:
target = 'jetson_nano'
output_path = tfcifar_estimator.output_path + target
optimized_estimator_nano = tfcifar_estimator.compile_model(target_instance_family=target, 
                              #input_shape={'data':[1, 784]},  # Batch size 1, 784 flattened values.
                              input_shape={'data':[5, 32, 32, 3]},  # Batch size 5, 32x32 images, 3 channels
                              output_path=output_path,
                              framework='tensorflow', framework_version='1.12.0')

?....!

W1028 09:28:44.287264 140147847612224 model.py:373] The instance type jetson_nano is not supported to deploy via SageMaker,please deploy the model manually.


---

## Compile for Raspberry Pi

https://www.raspberrypi.org/products/raspberry-pi-4-model-b/

In [28]:
target = 'rasp3b'
output_path = tfcifar_estimator.output_path + target
optimized_estimator_rpi = tfcifar_estimator.compile_model(target_instance_family=target, 
                              #input_shape={'data':[1, 784]},  # Batch size 1, 784 flattened values.
                              input_shape={'data':[5, 32, 32, 3]},  # Batch size 5, 32x32 Images, 3 channels
                              output_path=output_path,
                              framework='tensorflow', framework_version='1.12.0')

?.....!

W1028 09:29:15.552706 140147847612224 model.py:373] The instance type rasp3b is not supported to deploy via SageMaker,please deploy the model manually.
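
The warnings above mean that models compiled for these edge targets have to be deployed by hand. As a minimal sketch of the first step on the device, assuming the compiled archive has already been copied there, the archive can be unpacked with the standard library so that a runtime can load the directory. The function name `extract_compiled_model` and the paths in the usage comment are illustrative and not part of the workshop code.

```python
import os
import tarfile


def extract_compiled_model(archive_path, dest_dir):
    """Unpack a compiled model archive and return the sorted member names."""
    os.makedirs(dest_dir, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        names = tar.getnames()
        tar.extractall(dest_dir)
    return sorted(names)


# Hypothetical usage on the device (assumes `pip install dlr` worked there):
# files = extract_compiled_model("model-rasp3b.tar.gz", "compiled")
# from dlr import DLRModel
# model = DLRModel(model_path="compiled")
```

Keeping the extraction in Python avoids depending on a shell on the device; the DLR loading step mirrors the local inference cell later in this notebook.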


----

## Compiled Model Summary

In [29]:
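def get_model_size(estimator):
    # List the model artifact in S3; the size is the third-from-last field of the listing line.
    out = !aws s3 ls {estimator.model_data} --human-readable
    # The unit is hard-coded here; the CLI prints the actual unit in the field after the size.
    return out[0].split(' ')[-3] + ' MB'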

In [30]:
import pandas as pd

estimators = [tfcifar_estimator, optimized_estimator_ml_m4, optimized_estimator_rpi, optimized_estimator_tx2, optimized_estimator_nano] 
targets = ['Original','EC2 M4','Raspberry Pi','Jetson tx2','Jetson Nano']
locations = [e.model_data for e in estimators]
sizes = [get_model_size(e) for e in estimators]
pd.set_option('display.max_colwidth', 0)
pd.DataFrame(list(zip(targets,locations,sizes)), columns =['Targets', 'Locations','Sizes']) 

|   | Targets | Locations | Sizes |
|---|---------|-----------|-------|
| 0 | Original | s3://sagemaker-us-east-1-497456752804/sagemaker-tf-cifar10-example-2019-10-28-09-22-09-098/output/model.tar.gz | 9.0 MB |
| 1 | EC2 M4 | s3://sagemaker-us-east-1-497456752804/ml_m4/model-ml_m4.tar.gz | 1.7 MB |
| 2 | Raspberry Pi | s3://sagemaker-us-east-1-497456752804/rasp3b/model-rasp3b.tar.gz | 1.7 MB |
| 3 | Jetson tx2 | s3://sagemaker-us-east-1-497456752804/jetson_tx2/model-jetson_tx2.tar.gz | 1.7 MB |
| 4 | Jetson Nano | s3://sagemaker-us-east-1-497456752804/jetson_nano/model-jetson_nano.tar.gz | 1.7 MB |


You can expect to see resulting sizes vary based on architecture, number of variables and hardware target used for Neo compilation.
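
To make the size comparison explicit, here is a small sketch that keeps the unit reported by the CLI and computes the reduction factor from the sizes in the table. It assumes each `aws s3 ls --human-readable` line has the layout `<date> <time> <size> <unit> <key>`; the sample line is made up for illustration, and both helper names are hypothetical.

```python
def parse_s3_ls_line(line):
    """Split one listing line into (size, unit, key)."""
    # Assumed layout: "<date> <time> <size> <unit> <key>"; the key may contain spaces.
    _date, _time, size, unit, key = line.split(None, 4)
    return float(size), unit, key


def size_reduction(original_size, compiled_size):
    """How many times smaller the compiled artifact is (same unit for both)."""
    return original_size / compiled_size


# Made-up sample line in the assumed layout.
print(parse_s3_ls_line("2019-10-28 09:25:01    9.0 MiB model.tar.gz"))
# Sizes taken from the summary table above: 9.0 MB original, 1.7 MB compiled.
print("{:.2f}x smaller".format(size_reduction(9.0, 1.7)))
```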

## Before moving on, see if your model is deployed (Optional)

https://console.aws.amazon.com/sagemaker/home?region=us-east-1#/endpoints

It should say "In Service", and not "Creating" or "Failed". If your endpoint is not yet "In Service", the next code cell you execute will give you an error.
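
Instead of refreshing the console, the status can also be polled from code. The following is a sketch only: it assumes a boto3 SageMaker client, whose `describe_endpoint` call returns an `EndpointStatus` field, and the helper name `wait_for_endpoint` and its default intervals are hypothetical.

```python
import time


def wait_for_endpoint(client, endpoint_name, poll_seconds=30, timeout_seconds=1800):
    """Poll until the endpoint leaves the 'Creating' state and return its status."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        response = client.describe_endpoint(EndpointName=endpoint_name)
        status = response["EndpointStatus"]
        if status != "Creating":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                "{} is still Creating after {}s".format(endpoint_name, timeout_seconds)
            )
        time.sleep(poll_seconds)


# Usage in the notebook (assumes boto3 is installed and AWS credentials are configured):
# import boto3
# status = wait_for_endpoint(boto3.client("sagemaker"), m4_endpoint_name)
# print(status)
```

Passing the client in as an argument keeps the helper independent of how credentials and regions are configured.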

## Invoking the original model (Optional)

In [31]:
predictor.endpoint

'original-m4-10282019092714'

In [None]:
%%time
import cv2
import numpy
from sagemaker.predictor import json_serializer, json_deserializer

image = cv2.imread("data/cat.png", 1)

# resize, as our model is expecting images in 32x32.
image = cv2.resize(image, (32, 32))

data = {'instances': numpy.asarray(image).astype(float).tolist()}

# The request and response format is JSON for TensorFlow Serving.
# For more information: https://www.tensorflow.org/serving/api_rest#predict_api
predictor.accept = 'application/json'
predictor.content_type = 'application/json'

predictor.serializer = json_serializer
predictor.deserializer = json_deserializer

# For more information on the predictor class.
# https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/predictor.py
predictor.predict(data)

The timing above includes network and framework-level latency, not just model latency. Let's predict locally to get a fairer comparison.

## Cleanup
When you're done with the endpoint, you should clean it up. All of the training jobs, models and endpoints we created can be viewed through the SageMaker console of your AWS account.

In [None]:
predictor.delete_endpoint()

# Compare original and compiled models

## First start by downloading them ...

In [33]:
!aws s3 cp {optimized_estimator_ml_m4.model_data} ./

download: s3://sagemaker-us-east-1-497456752804/ml_m4/model-ml_m4.tar.gz to ./model-ml_m4.tar.gz


In [34]:
!aws s3 cp {tfcifar_estimator.model_data} ./

download: s3://sagemaker-us-east-1-497456752804/sagemaker-tf-cifar10-example-2019-10-28-09-22-09-098/output/model.tar.gz to ./model.tar.gz


In [35]:
!mkdir -p original && tar -xzvf model.tar.gz -C original

graph.pbtxt
model.ckpt-100.data-00000-of-00001
model.ckpt-100.index
model.ckpt-1.index
model.ckpt-1.meta
model.ckpt-100.meta
checkpoint
events.out.tfevents.1572254714.ip-10-0-76-21.ec2.internal
eval/
eval/events.out.tfevents.1572254736.ip-10-0-76-21.ec2.internal
export/
export/Servo/
export/Servo/1572254737/
export/Servo/1572254737/variables/
export/Servo/1572254737/variables/variables.data-00000-of-00001
export/Servo/1572254737/variables/variables.index
export/Servo/1572254737/saved_model.pb
model.ckpt-1.data-00000-of-00001


In [36]:
!mkdir -p compiled && tar -xzvf model-ml_m4.tar.gz -C compiled

compiled.params
compiled_model.json
compiled.so


## Local inference - original model

We will upgrade to TF 2.0 to demonstrate how you can use SavedModels exported by older TensorFlow 1.x versions.

In [39]:
!pip install --upgrade tensorflow

Requirement already up-to-date: tensorflow in /home/ec2-user/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages (2.0.0)


In [40]:
import tensorflow as tf
print(tf.__version__)
tf.get_logger().setLevel('ERROR')
tf.executing_eagerly()

2.0.0


True

### Load model and serving signature

In [41]:
path = !find ./original/ -type f -name "*.pb"
path = path[0][:-14]  # strip the trailing 'saved_model.pb' (14 characters) to keep the directory
print(path)

./original/export/Servo/1572254737/


In [42]:
loaded = tf.saved_model.load(path)

In [43]:
!saved_model_cli show --dir {path} --tag_set serve --signature_def serving_default

The given SavedModel SignatureDef contains the following input(s):
  inputs['inputs'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 32, 32, 3)
      name: Placeholder:0
The given SavedModel SignatureDef contains the following output(s):
  outputs['classes'] tensor_info:
      dtype: DT_INT64
      shape: (-1)
      name: ArgMax:0
  outputs['probabilities'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 10)
      name: softmax_tensor:0
Method name is: tensorflow/serving/predict
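
The signature above accepts any batch size (`-1`) of 32x32 images with 3 channels. As a small illustration, a shape can be checked against such a signature before calling the model; the helper `matches_signature` is hypothetical and treats `-1` as "any size".

```python
def matches_signature(shape, signature_shape):
    """Return True if a concrete shape fits a signature shape where -1 is a wildcard."""
    if len(shape) != len(signature_shape):
        return False
    return all(want == -1 or got == want for got, want in zip(shape, signature_shape))


SIGNATURE = (-1, 32, 32, 3)  # input shape reported by saved_model_cli above

print(matches_signature((1, 32, 32, 3), SIGNATURE))  # a batch of one image
print(matches_signature((32, 32, 3), SIGNATURE))     # batch dimension missing
```

This is why the example image is reshaped to `(-1, 32, 32, 3)` before inference: a bare `(32, 32, 3)` array lacks the batch dimension.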


In [44]:
print(list(loaded.signatures.keys())) 

['serving_default']


In [45]:
infer = loaded.signatures["serving_default"]

Load example image ...

In [46]:
image = cv2.imread("data/cat.png", 1)
print(image.shape)
# resize, as our model is expecting images in 32x32.
image = cv2.resize(image, (32, 32))
i = tf.image.convert_image_dtype(image.reshape(-1,32,32,3),tf.float32)

(32, 32, 3)


Check single inference ...

In [48]:
%%time
infer(i)['probabilities']

CPU times: user 24.5 ms, sys: 0 ns, total: 24.5 ms
Wall time: 6.95 ms


<tf.Tensor: id=2905, shape=(1, 10), dtype=float32, numpy=
array([[0.07790422, 0.10809208, 0.11672521, 0.07859202, 0.09794388,
        0.06962992, 0.13660109, 0.14994791, 0.08573934, 0.07882429]],
      dtype=float32)>

Get mean value

In [49]:
time_original = %timeit -n25 -r25 -o infer(i)['probabilities']

5.96 ms ± 172 µs per loop (mean ± std. dev. of 25 runs, 25 loops each)


## Local inference - compiled model

DLR (Deep Learning Runtime), a part of Neo (https://github.com/neo-ai/neo-ai-dlr), is a compact, common runtime for deep learning models and decision tree models compiled by AWS SageMaker Neo, TVM, or Treelite. DLR uses the TVM runtime, Treelite runtime, and NVIDIA TensorRT™, and can include other hardware-specific runtimes. DLR provides unified Python/C++ APIs for loading and running compiled models on various devices. DLR currently supports platforms from Intel, NVIDIA, and ARM, with support for Xilinx, Cadence, and Qualcomm coming soon.

In [52]:
!pip install dlr



In [53]:
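from dlr import DLRModel
# The next three values are not passed to DLRModel or used by the cells below.
input_shape = {'data': [1, 3, 224, 224]} # A single RGB 224x224 image
output_shape = [1, 1000]                 # The probability for each one of the 1,000 classes
device = 'cpu'                           # This notebook runs inference on the local CPU

# Load the compiled artifacts extracted into ./compiled
model = DLRModel(model_path='compiled')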

In [54]:
image = cv2.imread("data/cat.png", 1)
print(image.shape)
# resize, as our model is expecting images in 32x32.
image = cv2.resize(image, (32, 32))

input_data = {'Placeholder': numpy.asarray(image).astype(float).tolist()}

(32, 32, 3)


Check single inference ...

In [55]:
%%time
model.run(input_data)

CPU times: user 8.47 ms, sys: 0 ns, total: 8.47 ms
Wall time: 5.06 ms


[array([[0., 0., 1., 0., 0., 0., 0., 0., 0., 0.]], dtype=float32)]

Get mean value ...

In [56]:
time_compiled = %timeit -n25 -r25 -o model.run(input_data)

1.81 ms ± 28.8 µs per loop (mean ± std. dev. of 25 runs, 25 loops each)


In [57]:
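o1 = float(str(time_compiled)[:4])  # first four characters of the timeit summary, here '1.81' (ms)

In [58]:
o2 = float(str(time_original)[:4])  # first four characters of the timeit summary, here '5.96' (ms)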

In [59]:
'{} vs {}ms ... {}x speedup!'.format(o2,o1,o2/o1)

'5.96 vs 1.81ms ... 3.292817679558011x speedup!'
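
Parsing the printed summary works only while both timings are reported in the same unit. As a hedged alternative, the ratio can be computed from numeric values in seconds. In IPython, the object returned by `%timeit -o` exposes an `average` attribute in seconds; outside IPython, a plain loop gives a comparable figure. The helpers `mean_latency` and `speedup` below are hypothetical names for this sketch.

```python
import time


def mean_latency(fn, loops=25, runs=25):
    """Average seconds per call of fn(), measured over `runs` batches of `loops` calls."""
    per_run = []
    for _ in range(runs):
        start = time.perf_counter()
        for _ in range(loops):
            fn()
        per_run.append((time.perf_counter() - start) / loops)
    return sum(per_run) / len(per_run)


def speedup(original_seconds, compiled_seconds):
    """Ratio of the two mean latencies; both must be in the same unit."""
    return original_seconds / compiled_seconds


# Mean latencies reported above: 5.96 ms (original) and 1.81 ms (compiled).
print("{:.2f}x speedup".format(speedup(5.96e-3, 1.81e-3)))

# In the notebook, the same ratio could be taken from the timeit results:
# speedup(time_original.average, time_compiled.average)
```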

# Thank you!