# Building your own algorithm container

With Amazon SageMaker, you can package your own algorithms so that they can then be deployed in the SageMaker hosting environment. This notebook guides you through an example that shows how to build a fast.ai Docker container for SageMaker and use it for inference.

By packaging an algorithm in a container, you can bring almost any code to the Amazon SageMaker environment, regardless of programming language, environment, framework, or dependencies. 

## Download the fast.ai trained model

The first thing we need to do is download the trained fast.ai model from a publicly accessible S3 bucket. We will copy this file to our own S3 bucket so that we can create a SageMaker model from it.

In [1]:
import boto3

region = boto3.session.Session().region_name
account_id = boto3.client('sts').get_caller_identity().get('Account')

bucket = 'sagemaker-{}-{}'.format(account_id, region)
print(f'Bucket is: {bucket}')

Bucket is: sagemaker-934676248949-eu-west-1


In [None]:
import os
import urllib.request

def download(url):
    filename = url.split("/")[-1]
    if not os.path.exists(filename):
        urllib.request.urlretrieve(url, filename)

        
def upload_to_s3(channel, file):
    s3 = boto3.resource('s3')
    key = channel + '/' + file
    # Open the file in a context manager so the handle is closed after the upload
    with open(file, "rb") as data:
        s3.Bucket(bucket).put_object(Key=key, Body=data)


# Download the trained fast.ai model
download('https://s3-eu-west-1.amazonaws.com/mmcclean-public-files/fastai_caltech256_model.tar.gz')
upload_to_s3('models', 'fastai_caltech256_model.tar.gz')
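
Before relying on the uploaded archive, it can help to check what it contains, since hosting unpacks the archive into the model directory inside the container. The following is a minimal sketch using only the standard library. The helper names `list_model_archive` and `make_demo_archive` and the file name `demo_model.h5` are hypothetical and are not part of the sample; the real archive's contents are not shown in this notebook.

```python
import io
import tarfile


def list_model_archive(path):
    """Return the sorted names of the regular files inside a .tar.gz archive."""
    with tarfile.open(path, mode="r:gz") as archive:
        return sorted(m.name for m in archive.getmembers() if m.isfile())


def make_demo_archive(path):
    """Write a tiny stand-in archive so the sketch runs without the real model."""
    payload = b"placeholder weights"
    info = tarfile.TarInfo(name="demo_model.h5")  # hypothetical file name
    info.size = len(payload)
    with tarfile.open(path, mode="w:gz") as archive:
        archive.addfile(info, io.BytesIO(payload))
```

To inspect the real download, you could call `list_model_archive('fastai_caltech256_model.tar.gz')` after the `download` call above.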

## Running your container during hosting

Hosting has a very different model than training because hosting is responding to inference requests that come in via HTTP. In this example, we use our recommended Python serving stack to provide robust and scalable serving of inference requests:

![Request serving stack](stack.png)

This stack is implemented in the sample code here and you can mostly just leave it alone. 

Amazon SageMaker uses two URLs in the container:

* `/ping` will receive `GET` requests from the infrastructure. Your program returns 200 if the container is up and accepting requests.
* `/invocations` is the endpoint that receives client inference `POST` requests. The format of the request and the response is up to the algorithm. If the client supplied `ContentType` and `Accept` headers, these will be passed in as well. 
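
The two routes described above can be sketched as a bare WSGI application. This is not the sample's `predictor.py`, which uses Flask; it is a standard-library illustration of the same contract. The names `model_is_ready`, `predict`, and `app`, the health-check rule, and the echo-style response are all assumptions made for the sketch.

```python
import json
import os

MODEL_DIR = "/opt/ml/model"  # directory where the model files are placed


def model_is_ready(model_dir=MODEL_DIR):
    """Hypothetical health check: report healthy once the model directory exists."""
    return os.path.isdir(model_dir)


def predict(body, content_type):
    """Placeholder for real inference; it only reports what was received."""
    return json.dumps({"content_type": content_type, "bytes_received": len(body)})


def app(environ, start_response, health_check=model_is_ready):
    """WSGI callable that answers GET /ping and POST /invocations."""
    method = environ.get("REQUEST_METHOD", "GET")
    path = environ.get("PATH_INFO", "/")

    if path == "/ping" and method == "GET":
        # 200 tells the infrastructure the container is accepting requests
        status = "200 OK" if health_check() else "404 Not Found"
        start_response(status, [("Content-Type", "application/json")])
        return [b"\n"]

    if path == "/invocations" and method == "POST":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        body = environ["wsgi.input"].read(length)
        result = predict(body, environ.get("CONTENT_TYPE", ""))
        start_response("200 OK", [("Content-Type", "application/json")])
        return [result.encode("utf-8")]

    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]
```

For a quick local check, this callable could be served with `wsgiref.simple_server.make_server("", 8080, app).serve_forever()`; the sample container instead fronts its Flask app with nginx and gunicorn.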

The container will have the model files in the same place they were written during training:

    /opt/ml
    └── model
        └── <model files>


### The parts of the sample container

In the `container` directory are all the components you need to package the sample algorithm for Amazon SageMaker:

    .
    ├── Dockerfile
    ├── build_and_push.sh
    └── fastai_caltech256
        ├── model.py
        ├── nginx.conf
        ├── predict.py        
        ├── predictor.py
        ├── serve
        ├── utils.py
        └── wsgi.py

Let's discuss each of these in turn:

* __`Dockerfile`__ describes how to build your Docker container image. More details below.
* __`build_and_push.sh`__ is a script that uses the Dockerfile to build your container image and then pushes it to ECR. We'll invoke the commands directly later in this notebook, but you can just copy and run the script for your own algorithms.
* __`fastai_caltech256`__ is the directory which contains the files that will be installed in the container.
* __`local_test`__ is a directory that shows how to test your new container on any computer that can run Docker, including an Amazon SageMaker notebook instance. Using this method, you can quickly iterate using small datasets to eliminate any structural bugs before you use the container with Amazon SageMaker. We'll walk through local testing later in this notebook.

In this simple application, we only install seven files in the container. You may only need that many or, if you have many supporting routines, you may wish to install more. These seven show the standard structure of our Python containers, although you are free to choose a different toolset and therefore could have a different layout. If you're writing in a different programming language, you'll certainly have a different layout depending on the frameworks and tools you choose.

The files that we'll put in the container are:

* __`model.py`__ is the singleton class that loads the fast.ai model and class objects from the model directory.
* __`nginx.conf`__ is the configuration file for the nginx front-end. Generally, you should be able to take this file as-is.
* __`predict.py`__ is the main class with the logic to do the fast.ai predictions. You'll want to customize the actual prediction parts to your application.
* __`predictor.py`__ is the program that actually implements the Flask web server. You'll want to customize the actual prediction parts to your application.
* __`serve`__ is the program started when the container is started for hosting. It simply launches the gunicorn server which runs multiple instances of the Flask app defined in `predictor.py`. You should be able to take this file as-is.
* __`utils.py`__ is a utility file with functions to do things such as transform the image before sending to the model for inference. It implements many of the fast.ai image transformation functions.
* __`wsgi.py`__ is a small wrapper used to invoke the Flask app. You should be able to take this file as-is.
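
The singleton role described for `model.py` can be illustrated with a small lazy-loading holder. This is a sketch of the general pattern, not the contents of the sample's `model.py`; the class name `ModelHolder` and the `loader` callback are hypothetical.

```python
import threading


class ModelHolder:
    """Loads a model once per process and reuses it for every request."""

    _model = None
    _lock = threading.Lock()

    @classmethod
    def get_model(cls, loader, model_dir="/opt/ml/model"):
        # Double-checked locking: take the lock only while the model is unloaded
        if cls._model is None:
            with cls._lock:
                if cls._model is None:
                    cls._model = loader(model_dir)
        return cls._model
```

Loading lazily on the first request keeps each gunicorn worker from paying the load cost more than once, at the price of a slower first prediction per worker.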

In summary, the files you will probably want to change for your application are `predict.py` and `predictor.py`.

## The Dockerfile

The Dockerfile describes the image that we want to build. You can think of it as describing the complete operating system installation of the system that you want to run. A running Docker container is quite a bit lighter than a full operating system, however, because it takes advantage of Linux on the host machine for the basic operations.

For the Python science stack, we will start from a standard Python installation and run the normal tools to install the things needed by the fast.ai library. Finally, we add the code that implements our specific algorithm to the container and set up the right environment to run under.

Along the way, we clean up extra space. This makes the container smaller and faster to start.

Let's look at the Dockerfile for the example:

In [2]:
!cat container/Dockerfile

# Build an image that can do training and inference in SageMaker
# This is an image that uses the nginx, gunicorn, flask stack
# for serving inferences in a stable way.

FROM python:3.6.5-slim-stretch

MAINTAINER Amazon AI <mmcclean@amazon.com>


RUN apt-get -y update && apt-get install -y --no-install-recommends \
         nginx \
         ca-certificates \
         libglib2.0-dev \
    && rm -rf /var/lib/apt/lists/*


# Here we get all python packages.
RUN pip install flask gevent gunicorn future
RUN pip install boto3 pyyaml dill numpy opencv-python-headless \
    http://download.pytorch.org/whl/cpu/torch-0.3.1-cp36-cp36m-linux_x86_64.whl \
    https://s3-eu-west-1.amazonaws.com/mmcclean-public-files/fastai-lib.zip && \ 
    rm -rf /root/.cache

# Set some environment variables. PYTHONUNBUFFERED keeps Python from buffering our standard
# output stream, which means that logs can be delivered to the user quickly. PYTHONDONTWRITEBYTECODE
# keeps Python from writ

## Building and registering the container

The following shell code shows how to build the container image using `docker build` and push the container image to ECR using `docker push`. This code is also available as the shell script `container/build_and_push.sh`, which you can run as `build_and_push.sh fastai_predict` to build the image `fastai_predict`.

This code looks for an ECR repository in the account you're using and the current default region (if you're using a SageMaker notebook instance, this will be the region where the notebook instance was created). If the repository doesn't exist, the script will create it.
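
The repository check described above can be sketched in Python with a boto3 ECR client. The script itself is shell, so this is a rendering of the same logic rather than its contents; the function names `ecr_image_uri` and `ensure_repository` are hypothetical.

```python
def ecr_image_uri(account_id, region, repository, tag="latest"):
    """Build the registry address that docker push and create_model expect."""
    return "{}.dkr.ecr.{}.amazonaws.com/{}:{}".format(account_id, region, repository, tag)


def ensure_repository(ecr_client, repository):
    """Create the ECR repository if it is missing; return True when it was created."""
    try:
        ecr_client.describe_repositories(repositoryNames=[repository])
        return False
    except ecr_client.exceptions.RepositoryNotFoundException:
        ecr_client.create_repository(repositoryName=repository)
        return True
```

With real credentials, `ecr_client` would be `boto3.client('ecr')`, and `ecr_image_uri(account_id, region, 'fastai_predict')` produces the same address that appears in the push output below.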

In [5]:
! cd container && ./build_and_push.sh fastai_predict

Login Succeeded
Sending build context to Docker daemon  40.45kB
Step 1/10 : FROM python:3.6.5-slim-stretch
3.6.5-slim-stretch: Pulling from library/python

67a397c4: Pulling fs layer
085bc22b: Pulling fs layer
7790bc68: Pulling fs layer
29adba1b: Pulling fs layer
fd6eb5d0: Pull complete
Digest: sha256:56100f5b5e299f4488f51ea81cc1a67b5ff13ee2f926280eaf8e527a881afa61
Status: Downloaded newer image for python:3.6.5-slim-stretch
 ---> 29ea9c0b39c6
Step 2/10 : MAINTAINER Amazon AI <mmcclean@amazon.com>
 ---> Running in e3868bf987f5
Removing in

Get:41 http://deb.debian.org/debian stretch/main amd64 libglib2.0-data all 2.50.3-2 [2517 kB]
Get:42 http://deb.debian.org/debian stretch/main amd64 libglib2.0-bin amd64 2.50.3-2 [1615 kB]
Get:43 http://deb.debian.org/debian stretch/main amd64 libpcre16-3 amd64 2:8.39-3 [258 kB]
Get:44 http://deb.debian.org/debian stretch/main amd64 libpcre32-3 amd64 2:8.39-3 [248 kB]
Get:45 http://deb.debian.org/debian stretch/main amd64 libpcre3-dev amd64 2:8.39-3 [647 kB]
Get:46 http://deb.debian.org/debian stretch/main amd64 pkg-config amd64 0.29-4+b1 [63.3 kB]
Get:47 http://deb.debian.org/debian stretch/main amd64 zlib1g-dev amd64 1:1.2.8.dfsg-5 [205 kB]
Get:48 http://deb.debian.org/debian stretch/main amd64 libglib2.0-dev amd64 2.50.3-2 [2984 kB]
Get:49 http://security.debian.org/debian-security stretch/updates/main amd64 linux-libc-dev amd64 4.9.88-1+deb9u1 [1327 kB]
Get:50 http://deb.debian.org/debian stretch/main amd64 nginx-common all 1.10.3-1+deb9u1 [104 kB]
Get:51 http://deb.debian.org/debi

Selecting previously unselected package libxdmcp6:amd64.
Preparing to unpack .../19-libxdmcp6_1%3a1.1.2-3_amd64.deb ...
Unpacking libxdmcp6:amd64 (1:1.1.2-3) ...
Selecting previously unselected package libxcb1:amd64.
Preparing to unpack .../20-libxcb1_1.12-1_amd64.deb ...
Unpacking libxcb1:amd64 (1.12-1) ...
Selecting previously unselected package libx11-data.
Preparing to unpack .../21-libx11-data_2%3a1.6.4-3_all.deb ...
Unpacking libx11-data (2:1.6.4-3) ...
Selecting previously unselected package libx11-6:amd64.
Preparing to unpack .../22-libx11-6_2%3a1.6.4-3_amd64.deb ...
Unpacking libx11-6:amd64 (2:1.6.4-3) ...
Selecting previously unselected package libxpm4:amd64.
Preparing to unpack .../23-libxpm4_1%3a3.5.12-1_amd64.deb ...
Unpacking libxpm4:amd64 (1:3.5.12-1) ...
Selecting previously unselected package libgd3:amd64.
Preparing to unpack .../24-libgd3_2.2.4-2+deb9u2_amd64.deb ...
Unpacking libgd3:amd64 (2.2.4-2+deb9u2) ...
Selecting previously unselected package libgeoip1:amd64.
P

Setting up pkg-config (0.29-4+b1) ...
Setting up libxcb1:amd64 (1.12-1) ...
Setting up python3.5 (3.5.3-1) ...
Setting up libpython3-stdlib:amd64 (3.5.3-1) ...
Setting up libfontconfig1:amd64 (2.11.0-6.7+b1) ...
Setting up libx11-6:amd64 (2:1.6.4-3) ...
Setting up libxpm4:amd64 (1:3.5.12-1) ...
Setting up libgd3:amd64 (2.2.4-2+deb9u2) ...
Setting up libnginx-mod-http-image-filter (1.10.3-1+deb9u1) ...
Setting up nginx-full (1.10.3-1+deb9u1) ...
invoke-rc.d: could not determine current runlevel
invoke-rc.d: policy-rc.d denied execution of start.
Setting up nginx (1.10.3-1+deb9u1) ...
Setting up python3 (3.5.3-1) ...
running python rtupdate hooks for python3.5...
running python post-rtupdate hooks for python3.5...
Setting up libglib2.0-dev (2.50.3-2) ...
Setting up dh-python (2.20170125) ...
Processing triggers for libc-bin (2.24-11+deb9u3) ...
Removing intermediate container 82b821e62512
 ---> a57235fe4be9
Step 4/10 : RUN pip install flask gevent gunicorn future
 ---> Running in dca24f9

Removing intermediate container 8c39d712186a
 ---> 756bc58e63ae
Step 6/10 : ENV PYTHONUNBUFFERED=TRUE
 ---> Running in b5484ffdeaf2
Removing intermediate container b5484ffdeaf2
 ---> e69902e2009f
Step 7/10 : ENV PYTHONDONTWRITEBYTECODE=TRUE
 ---> Running in 94afe5ca94b4
Removing intermediate container 94afe5ca94b4
 ---> 21d94175959a
Step 8/10 : ENV PATH="/opt/program:${PATH}"
 ---> Running in bf29bb005d1d
Removing intermediate container bf29bb005d1d
 ---> 9a91419bace2
Step 9/10 : COPY fastai_predict /opt/program
 ---> 4f6d842fd0d6
Step 10/10 : WORKDIR /opt/program
Removing intermediate container b17fd715f15e
 ---> 27a6c264919a
Successfully built 27a6c264919a
Successfully tagged fastai_predict:latest
The push refers to repository [934676248949.dkr.ecr.eu-west-1.amazonaws.com/fastai_predict]

1270c292: Preparing
ea65047c: Preparing
b1e715ec: Preparing
ec2dc408: Preparing
fd624534: Preparing
600998a1: Preparing
fc4b2ede: Preparing
5658ddbc: Preparing

## Host

Start by defining our model for hosting. Container images in ECR are addressed by account and region, so the image URI below is built from both.

In [None]:
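import time
from sagemaker import get_execution_role

# Assumption: this notebook runs with a SageMaker execution role;
# otherwise set role to the ARN of a role that SageMaker can assume
role = get_execution_role()

fastai_model = 'DEMO-fastai-byom-' + time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime())

sm = boto3.client('sagemaker')

# account_id and region were defined in the first cell of this notebook
image = '{}.dkr.ecr.{}.amazonaws.com/fastai_predict:latest'.format(account_id, region)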

create_model_response = sm.create_model(
    ModelName=fastai_model,
    ExecutionRoleArn=role,
    PrimaryContainer={
        'Image': image,
        'ModelDataUrl': 's3://{}/models/fastai_caltech256_model.tar.gz'.format(bucket)})

print(create_model_response['ModelArn'])

Then set up our endpoint configuration.

In [None]:
fastai_endpoint_config = 'DEMO-fastai-byom-endpoint-config-' + time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime())
print(fastai_endpoint_config)
create_endpoint_config_response = sm.create_endpoint_config(
    EndpointConfigName=fastai_endpoint_config,
    ProductionVariants=[{
        'InstanceType': 'ml.m4.xlarge',
        'InitialInstanceCount': 1,
        'ModelName': fastai_model,
        'VariantName': 'AllTraffic'}])

print("Endpoint Config Arn: " + create_endpoint_config_response['EndpointConfigArn'])

Finally, create our endpoint.

In [None]:
%%time

fastai_endpoint = 'DEMO-fastai-byom-endpoint-' + time.strftime("%Y%m%d%H%M", time.gmtime())
print(fastai_endpoint)
create_endpoint_response = sm.create_endpoint(
    EndpointName=fastai_endpoint,
    EndpointConfigName=fastai_endpoint_config)
print(create_endpoint_response['EndpointArn'])

resp = sm.describe_endpoint(EndpointName=fastai_endpoint)
status = resp['EndpointStatus']
print("Status: " + status)

sm.get_waiter('endpoint_in_service').wait(EndpointName=fastai_endpoint)

resp = sm.describe_endpoint(EndpointName=fastai_endpoint)
status = resp['EndpointStatus']
print("Arn: " + resp['EndpointArn'])
print("Status: " + status)

if status != 'InService':
    raise Exception('Endpoint creation did not succeed')
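
The waiter used above hides the polling loop. If you want to log progress or apply your own timeout, the same wait can be sketched by hand as follows. The function name `wait_for_endpoint`, its default intervals, and the injectable `sleep` argument are assumptions made for illustration; `sm_client` is expected to behave like the `sm` client created earlier.

```python
import time


def wait_for_endpoint(sm_client, endpoint_name, poll_seconds=30,
                      timeout_seconds=1800, sleep=time.sleep):
    """Poll describe_endpoint until the endpoint is InService or has failed."""
    waited = 0
    while True:
        response = sm_client.describe_endpoint(EndpointName=endpoint_name)
        status = response["EndpointStatus"]
        if status == "InService":
            return status
        if status == "Failed":
            raise RuntimeError("Endpoint {} failed to deploy".format(endpoint_name))
        if waited >= timeout_seconds:
            raise TimeoutError("Endpoint {} still {} after {}s".format(
                endpoint_name, status, waited))
        print("Status: " + status)
        sleep(poll_seconds)
        waited += poll_seconds
```

Passing `sleep` as a parameter is only there so the loop can be exercised without real delays; in a notebook you would call `wait_for_endpoint(sm, fastai_endpoint)`.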

## Perform Inference
Finally, you can now validate the model for use. Obtain a runtime client from the client library, then use the endpoint created in the previous step to generate classifications from the trained model.

In [6]:
import boto3
# 'sagemaker-runtime' is the current boto3 name for the SageMaker runtime service
runtime = boto3.Session().client(service_name='sagemaker-runtime')

### Download test image

In [None]:
!wget -O /tmp/test.jpg http://www.vision.caltech.edu/Image_Datasets/Caltech256/images/008.bathtub/008_0007.jpg
file_name = '/tmp/test.jpg'
# test image
from IPython.display import Image
Image(file_name)  

In [None]:
# Read the test image as raw bytes; they are sent unchanged as the request body
with open(file_name, 'rb') as f:
    payload = f.read()

# fastai_endpoint is the name of the endpoint created in the Host section
response = runtime.invoke_endpoint(EndpointName=fastai_endpoint,
                                   ContentType='application/x-image',
                                   Body=payload)
result = response['Body'].read()
result