# Building your own TensorFlow container

With Amazon SageMaker, you can package your own algorithms that can then be trained and deployed in the SageMaker environment. This notebook guides you through an example using TensorFlow that shows you how to build a Docker container for SageMaker and use it for training and inference.

By packaging an algorithm in a container, you can bring almost any code to the Amazon SageMaker environment, regardless of programming language, environment, framework, or dependencies. 

1. [Building your own TensorFlow container](#Building-your-own-tensorflow-container)
  1. [When should I build my own algorithm container?](#When-should-I-build-my-own-algorithm-container?)
  1. [Permissions](#Permissions)
  1. [The example](#The-example)
  1. [The presentation](#The-presentation)
1. [Part 1: Packaging and Uploading your Algorithm for use with Amazon SageMaker](#Part-1:-Packaging-and-Uploading-your-Algorithm-for-use-with-Amazon-SageMaker)
    1. [An overview of Docker](#An-overview-of-Docker)
    1. [How Amazon SageMaker runs your Docker container](#How-Amazon-SageMaker-runs-your-Docker-container)
      1. [Running your container during training](#Running-your-container-during-training)
        1. [The input](#The-input)
        1. [The output](#The-output)
      1. [Running your container during hosting](#Running-your-container-during-hosting)
    1. [The parts of the sample container](#The-parts-of-the-sample-container)
    1. [The Dockerfile](#The-Dockerfile)
    1. [Building and registering the container](#Building-and-registering-the-container)
  1. [Testing your algorithm on your local machine](#Testing-your-algorithm-on-your-local-machine)
1. [Part 2: Training and Hosting your Algorithm in Amazon SageMaker](#Part-2:-Training-and-Hosting-your-Algorithm-in-Amazon-SageMaker)
  1. [Set up the environment](#Set-up-the-environment)
  1. [Create the session](#Create-the-session)
  1. [Upload the data for training](#Upload-the-data-for-training)
  1. [Training On SageMaker](#Training-on-SageMaker)
  1. [Optional cleanup](#Optional-cleanup)  
1. [Reference](#Reference)

_or_ I'm impatient, just [let me see the code](#The-Dockerfile)!

## When should I build my own algorithm container?

You may not need to create a container to bring your own code to Amazon SageMaker. When you use a framework such as Apache MXNet or TensorFlow that has direct support in SageMaker, you can simply supply the Python code that implements your algorithm using the SDK entry points for that framework. The set of supported frameworks grows regularly, so check the current list to determine whether your algorithm is written in one of these common machine learning environments.

Even if there is direct SDK support for your environment or framework, you may find it more effective to build your own container. If the code that implements your algorithm is quite complex or you need special additions to the framework, building your own container may be the right choice.

Some reasons to build your own container for an already supported framework are:
1. The specific framework version you need isn't supported.
2. You need to configure and install your own dependencies and environment.
3. You want to use a different training or hosting solution than the one provided.

This walkthrough shows that it is quite straightforward to build your own container, so you can still use SageMaker even if your use case is not covered by the deep learning containers that we've built for you.

## Permissions

Running this notebook requires permissions in addition to the normal `SageMakerFullAccess` permissions, because it creates new repositories in Amazon ECR. The easiest way to add these permissions is to attach the managed policy `AmazonEC2ContainerRegistryFullAccess` to the role that you used to start your notebook instance. There's no need to restart your notebook instance when you do this; the new permissions are available immediately.
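If you would rather attach the policy programmatically than through the console, a minimal sketch with boto3 might look like the following. The helper name `attach_ecr_policy` and the role name in the usage comment are hypothetical, and the sketch assumes the caller is allowed to modify the role.

```python
# ARN of the AWS managed policy named in the text above.
ECR_POLICY_ARN = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryFullAccess"


def attach_ecr_policy(iam_client, role_name):
    """Attach the ECR full-access managed policy to an IAM role.

    `iam_client` is expected to be a boto3 IAM client. It is passed in
    so the helper can be exercised without AWS credentials.
    (Hypothetical helper, not part of the original notebook.)
    """
    iam_client.attach_role_policy(RoleName=role_name, PolicyArn=ECR_POLICY_ARN)
    return ECR_POLICY_ARN


# Usage (assumes boto3 is installed and credentials are configured):
#   import boto3
#   attach_ecr_policy(boto3.client("iam"), "MyNotebookRole")  # hypothetical role name
```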

## The example

In this example we show how to package a custom TensorFlow container with a Python example that works with the [CIFAR-10] dataset and uses TensorFlow Serving for inference. However, inference solutions other than TensorFlow Serving can be used by modifying the Docker container.

In this example, we use a single image to support training and hosting. This simplifies the procedure because we only need to manage one image for both tasks. Sometimes you may want separate images for training and hosting because they have different requirements. In this case, separate the parts discussed below into separate Dockerfiles and build two images. Choosing whether to use a single image or two images is a matter of what is most convenient for you to develop and manage.

If you're only using Amazon SageMaker for training or hosting, but not both, only the functionality used needs to be built into your container.

[CIFAR-10]: http://www.cs.toronto.edu/~kriz/cifar.html

## The presentation

This presentation is divided into two parts: _building_ the container and _using_ the container.

## SageMaker Python SDK Local Training
To represent our training, we use the Estimator class, which needs to be configured with five parameters:
1. role - our AWS IAM execution role.
2. train_instance_count - number of instances to use for training.
3. train_instance_type - type of instance to use for training. For training locally, we specify `local`.
4. image_name - our custom TensorFlow Docker image we created.
5. hyperparameters - hyperparameters we want to pass.
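As a rough illustration, the five settings above can be collected into keyword arguments before they are handed to `Estimator`. The helper `estimator_kwargs`, the role ARN, and the image tag below are hypothetical. The parameter names follow the ones used in this notebook, and newer SDK releases may name them differently.

```python
def estimator_kwargs(role, image_name, hyperparameters,
                     instance_type="local", instance_count=1):
    """Collect the five Estimator settings listed above into a dict.

    Hypothetical helper: it only builds keyword arguments and does not
    import or call the SageMaker SDK itself.
    """
    return {
        "role": role,
        "train_instance_count": instance_count,
        "train_instance_type": instance_type,  # 'local' trains on this machine
        "image_name": image_name,
        "hyperparameters": hyperparameters,
    }


# Placeholder values for a local run (role ARN and image tag are made up).
local_kwargs = estimator_kwargs(
    role="arn:aws:iam::123456789012:role/ExampleRole",
    image_name="my-tf-image",
    hyperparameters={"train-steps": 100},
)

# Usage (assumes the sagemaker SDK is installed and accepts these names):
#   from sagemaker.estimator import Estimator
#   estimator = Estimator(**local_kwargs)
```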

Let's start by setting up our IAM role. We make use of a helper function within the Python SDK. This function throws an exception if it is run outside of a SageMaker notebook instance, because it gets metadata from the notebook instance. If you are running outside a notebook instance, you must provide an IAM role with the access described above in [Permissions](#Permissions).

In [25]:
from sagemaker import get_execution_role

role = get_execution_role()
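When the notebook may also run outside a notebook instance, one option is to fall back to an explicitly supplied role. The sketch below is an assumption rather than part of the SDK: `resolve_role` is a hypothetical helper, the fallback ARN is a placeholder, and it treats any exception from the lookup as "no notebook role available".

```python
def resolve_role(lookup, fallback_arn):
    """Return the role from `lookup()`, or `fallback_arn` if the lookup fails.

    `lookup` is any zero-argument callable, for example
    `sagemaker.get_execution_role`. (Hypothetical helper.)
    """
    try:
        return lookup()
    except Exception:
        # Assumption: a failure here means we are not on a notebook instance.
        return fallback_arn


# Usage (the ARN is a placeholder, replace it with a role you own):
#   from sagemaker import get_execution_role
#   role = resolve_role(get_execution_role,
#                       "arn:aws:iam::123456789012:role/ExampleRole")
```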

## Fit, Deploy, Predict

Now that the rest of our estimator is configured, we can call `fit()` with the path to our local CIFAR-10 dataset prefixed with `file://`. This invokes our TensorFlow container with 'train' and passes in our hyperparameters and other metadata as JSON files in `/opt/ml/input/config` within the container.

After training succeeds, the training algorithm writes the trained model to the `/opt/ml/model` directory, and that model is then used to handle predictions.
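To make the container contract concrete, here is a minimal sketch of how training code inside the container might read its hyperparameters and locate the model directory. It assumes the hyperparameters arrive in a file named `hyperparameters.json` with string values and that a `train-steps` entry is present. The function names are hypothetical, and `base_dir` is a parameter only so the code can be tried outside a container.

```python
import json
import os


def load_hyperparameters(base_dir="/opt/ml"):
    """Read hyperparameters.json from the input/config directory."""
    path = os.path.join(base_dir, "input", "config", "hyperparameters.json")
    with open(path) as f:
        params = json.load(f)
    # Assumption: values are passed as strings, so cast what we need.
    params["train-steps"] = int(params["train-steps"])
    return params


def model_dir(base_dir="/opt/ml"):
    """Return the directory for the trained model, creating it if needed."""
    path = os.path.join(base_dir, "model")
    os.makedirs(path, exist_ok=True)
    return path
```

A training script would call `load_hyperparameters()` at startup and save its exported model under `model_dir()` when it finishes.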

We can then call `deploy()` with an instance_count and instance_type, which are 1 and `local`. This invokes our TensorFlow container with 'serve', which sets up our container to handle prediction requests through TensorFlow Serving. It returns a predictor, which is used to make inferences against our trained model.

After our prediction, we can delete our endpoint.

We recommend training and testing your algorithm locally first, as it provides quicker iterations and better debuggability.

In [35]:
from sagemaker.estimator import Estimator

# Training and Hosting your Algorithm in Amazon SageMaker
Once you have your container packaged, you can use it to train and serve models. Let's do that with the algorithm we made above.

## Set up the environment
Here we specify the bucket to use and the role that is used for working with SageMaker.

In [29]:
# S3 prefix
prefix = 'DEMO-tensorflow-fraud2'

## Create the session

The session remembers our connection parameters to SageMaker. We use it to perform all of our SageMaker operations.

In [30]:
import sagemaker as sage

sess = sage.Session()

In [32]:
data_location = "s3://sagemaker-us-east-1-079329190341/DEMO-tensorflow-fraud2"
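The hard-coded location above follows a `sagemaker-<region>-<account>` bucket pattern. As a sketch, the same string can be assembled from its parts so the notebook is easier to move between accounts. The helper name `default_data_location` is hypothetical, and it assumes your bucket follows the same naming pattern.

```python
def default_data_location(region, account_id, prefix):
    """Build an S3 URI in the bucket naming pattern seen above."""
    bucket = "sagemaker-{}-{}".format(region, account_id)
    return "s3://{}/{}".format(bucket, prefix)


# Reproduces the value assigned to data_location in the cell above.
print(default_data_location("us-east-1", "079329190341", "DEMO-tensorflow-fraud2"))
# s3://sagemaker-us-east-1-079329190341/DEMO-tensorflow-fraud2
```

If you are using the session created earlier, `sess.default_bucket()` may be a simpler way to obtain the bucket name.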

## Training on SageMaker
Training a model on SageMaker with the Python SDK is done in a way that is similar to the way we trained it locally. This is done by changing our train_instance_type from `local` to one of our [supported EC2 instance types](https://aws.amazon.com/sagemaker/pricing/instance-types/).

In addition, we must now specify the URL of the ECR image that we pushed above.

Finally, our local training dataset has to be in Amazon S3 and the S3 URL to our dataset is passed into the `fit()` call.

Let's first fetch the ECR image URL that corresponds to the image we just built and pushed.

In [33]:
import boto3

client = boto3.client('sts')
account = client.get_caller_identity()['Account']

my_session = boto3.session.Session()
region = my_session.region_name

algorithm_name = 'sagemaker-tf-frauddet-example'

ecr_image = '{}.dkr.ecr.{}.amazonaws.com/{}:latest'.format(account, region, algorithm_name)

print(ecr_image)

079329190341.dkr.ecr.us-east-1.amazonaws.com/sagemaker-tf-frauddet-example:latest


In [34]:
from sagemaker.estimator import Estimator

hyperparameters = {'train-steps': 100}

instance_type = 'ml.m4.xlarge'

estimator = Estimator(role=role,
                      train_instance_count=1,
                      train_instance_type=instance_type,
                      image_name=ecr_image,
                      hyperparameters=hyperparameters)

estimator.fit(data_location)



2019-06-04 15:26:49 Starting - Starting the training job...
2019-06-04 15:26:51 Starting - Launching requested ML instances.........
2019-06-04 15:28:29 Starting - Preparing the instances for training......
2019-06-04 15:29:42 Downloading - Downloading input data
2019-06-04 15:29:42 Training - Downloading the training image......
2019-06-04 15:30:24 Training - Training image download completed. Training in progress...
Training complete.

2019-06-04 15:31:00 Uploading - Uploading generated training model
2019-06-04 15:33:42 Completed - Training job completed
Billable seconds: 258


In [None]:
predictor = estimator.deploy(1, instance_type, endpoint_name='fd-tf-customized-algo', update_endpoint=True)

------------

In [40]:
predictor.endpoint

'fd-tf-customized-algo'

In [39]:
data_location

's3://sagemaker-us-east-1-079329190341/DEMO-tensorflow-fraud2'

In [41]:
len(test_x.keys())

30

In [42]:
import json
import os

import pandas as pd
from sagemaker.predictor import json_serializer, json_deserializer

# Load the dataset from S3 and split off the 'Class' label column.
test_path = os.path.join(data_location, 'creditcard.csv')
test = pd.read_csv(test_path)
test_x, test_y = test, test.pop('Class')

# Build a single instance (the first row) as a feature-name -> value mapping.
x = {}
for key in test_x.keys():
    x[key] = test_x.iloc[0][key]
print(x)

# Send and receive JSON; the request body is serialized by hand below.
predictor.accept = 'application/json'
predictor.content_type = 'application/json'
# predictor.serializer = json_serializer
# predictor.deserializer = json_deserializer
predictor.predict(json.dumps({"signature_name": "prediction", "instances": [x]}))

{'Time': 0.0, 'V1': -1.3598071336738, 'V2': -0.0727811733098497, 'V3': 2.53634673796914, 'V4': 1.37815522427443, 'V5': -0.33832076994251803, 'V6': 0.462387777762292, 'V7': 0.239598554061257, 'V8': 0.0986979012610507, 'V9': 0.363786969611213, 'V10': 0.0907941719789316, 'V11': -0.551599533260813, 'V12': -0.617800855762348, 'V13': -0.991389847235408, 'V14': -0.31116935369987897, 'V15': 1.46817697209427, 'V16': -0.47040052525947795, 'V17': 0.20797124192924202, 'V18': 0.0257905801985591, 'V19': 0.403992960255733, 'V20': 0.251412098239705, 'V21': -0.018306777944153, 'V22': 0.277837575558899, 'V23': -0.110473910188767, 'V24': 0.0669280749146731, 'V25': 0.12853935827352803, 'V26': -0.189114843888824, 'V27': 0.13355837674038698, 'V28': -0.0210530534538215, 'Amount': 149.62}


b'{\n    "predictions": [\n        {\n            "class_ids": [0],\n            "probabilities": [0.946759, 0.0532413],\n            "logits": [0.828421, -2.04979]\n        }\n    ]\n}'
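The predictor returns raw bytes, so the response still has to be decoded before it can be used. A minimal sketch, assuming the response has the shape shown above and that each class id indexes into the `probabilities` list (the helper name `parse_prediction` is hypothetical):

```python
import json


def parse_prediction(raw_response):
    """Decode a JSON prediction response into (class_id, probability)."""
    body = json.loads(raw_response)
    first = body["predictions"][0]
    class_id = first["class_ids"][0]
    # Assumption: class ids index into the probabilities list.
    return class_id, first["probabilities"][class_id]


# Same content as the response above, with whitespace removed.
raw = (b'{"predictions": [{"class_ids": [0], '
       b'"probabilities": [0.946759, 0.0532413], '
       b'"logits": [0.828421, -2.04979]}]}')
print(parse_prediction(raw))  # (0, 0.946759)
```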

In [60]:
# import requests
# resp = requests.post('https://runtime.sagemaker.us-east-1.amazonaws.com/endpoints/sagemaker-tf-frauddet-example-2019-04-29-17-54-55-016/invocations', json=
#         {"signature_name":"prediction","instances": [x]}
#         )
# print(resp.json())
# resp.raise_for_status()