# Building your own algorithm container

With Amazon SageMaker, you can package your own algorithms that can then be deployed in the SageMaker hosting environment. This notebook guides you through an example that shows how to build a fast.ai Docker container for SageMaker and use it for inference.

By packaging an algorithm in a container, you can bring almost any code to the Amazon SageMaker environment, regardless of programming language, environment, framework, or dependencies. 

In [None]:
import boto3
from sagemaker import get_execution_role

role = get_execution_role()
print(f'Role is: {role}')

region = boto3.session.Session().region_name
account_id = boto3.client('sts').get_caller_identity().get('Account')

bucket = f'sagemaker-{account_id}-{region}'
print(f'Bucket is: {bucket}')

## Download the fast.ai trained model

The first thing we need to do is check whether there is already a model in S3. If not, we download a pre-trained fast.ai model from a publicly accessible S3 bucket and upload it to your own S3 bucket. We will use this model to create a SageMaker model later on.

In [None]:
import os
import urllib.request

def download(url):
    filename = url.split("/")[-1]
    if not os.path.exists(filename):
        urllib.request.urlretrieve(url, filename)

def upload_to_s3(key, file):
    s3 = boto3.resource('s3')
    # Use a context manager so the file handle is closed after the upload
    with open(file, "rb") as data:
        s3.Bucket(bucket).put_object(Key=key, Body=data)

def object_exists(bucket_name, key):
    s3 = boto3.resource('s3')
    bucket = s3.Bucket(bucket_name)
    objs = list(bucket.objects.filter(Prefix=key))
    if len(objs) > 0 and objs[0].key == key:
        return True
    return False

In [None]:
key = 'models/caltech256_fastai/model.tar.gz'
if not object_exists(bucket, key):
    # Download the trained fast.ai model
    print("Downloading pre-trained model file")
    download(f'https://s3-eu-west-1.amazonaws.com/mmcclean-public-files/{key}')
    upload_to_s3(key, 'model.tar.gz')
else:
    print("Object already exists")

## Running your container for model hosting

We need to provide a container for hosting that can respond to inference requests that come in via HTTP. In this example, we use our recommended Python serving stack to provide robust and scalable serving of inference requests:

![Request serving stack](stack.png)

This stack is implemented in the sample code here and you can mostly just leave it alone. 

Amazon SageMaker uses two URLs in the container:

* `/ping` will receive `GET` requests from the infrastructure. Your program returns 200 if the container is up and accepting requests.
* `/invocations` is the endpoint that receives client inference `POST` requests. The format of the request and the response is up to the algorithm. If the client supplied `ContentType` and `Accept` headers, these will be passed in as well. 
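
To make the two routes concrete, here is a minimal sketch of a handler that answers both. It is not the code in `predictor.py`: it uses a plain WSGI callable from the standard library instead of Flask so that it runs without extra packages, and the names `app` and `call`, the JSON response shape, and the `415` reply for other content types are assumptions made for illustration.

```python
import io
import json
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """Tiny WSGI app exposing the two routes SageMaker calls."""
    method = environ["REQUEST_METHOD"]
    path = environ["PATH_INFO"]

    if path == "/ping" and method == "GET":
        # Health check: 200 tells SageMaker the container accepts requests.
        start_response("200 OK", [("Content-Type", "application/json")])
        return [b"{}"]

    if path == "/invocations" and method == "POST":
        # Assumed policy: only JPEG payloads are accepted, as in this example.
        if environ.get("CONTENT_TYPE") != "image/jpeg":
            start_response("415 Unsupported Media Type",
                           [("Content-Type", "text/plain")])
            return [b"This predictor only supports image/jpeg"]
        length = int(environ.get("CONTENT_LENGTH") or 0)
        image_bytes = environ["wsgi.input"].read(length)
        # Placeholder result: a real handler would run the model here.
        body = json.dumps({"bytes_received": len(image_bytes)}).encode("utf-8")
        start_response("200 OK", [("Content-Type", "application/json")])
        return [body]

    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]


def call(method, path, body=b"", content_type=""):
    """Invoke the app in-process and return (status, body) for quick checks."""
    environ = {}
    setup_testing_defaults(environ)
    environ.update({
        "REQUEST_METHOD": method,
        "PATH_INFO": path,
        "CONTENT_TYPE": content_type,
        "CONTENT_LENGTH": str(len(body)),
        "wsgi.input": io.BytesIO(body),
    })
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    chunks = app(environ, start_response)
    return captured["status"], b"".join(chunks)
```

Calling `call("GET", "/ping")` exercises the health check without starting a server, and `call("POST", "/invocations", payload, "image/jpeg")` exercises the inference path in the same way.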

The container will have the model files in the same place they were written during training:

    /opt/ml
    └── model
        └── <model files>
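
SageMaker unpacks the `model.tar.gz` archive into that directory before the container starts serving, so serving code usually begins by looking there. The following is a small sketch of that lookup; `list_model_files` is a hypothetical helper, not part of the sample container, and the directory is a parameter so the function can be tried outside a container.

```python
from pathlib import Path


def list_model_files(model_dir="/opt/ml/model"):
    """Return the model files under model_dir as sorted relative paths."""
    root = Path(model_dir)
    if not root.is_dir():
        # Nothing has been unpacked (for example when run outside SageMaker).
        return []
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )
```

Printing the result at container start-up is a cheap way to confirm that the archive was unpacked with the layout the prediction code expects.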


### The parts of the sample container

In the `container` directory are all the components you need to package the sample algorithm for Amazon SageMaker:

    .
    ├── Dockerfile
    ├── build_and_push.sh
    └── fastai_predict
        ├── nginx.conf
        ├── predict.py        
        ├── predictor.py
        ├── serve
        ├── utils.py
        └── wsgi.py

Let's discuss each of these in turn:

* __`Dockerfile`__ describes how to build your Docker container image. More details below.
* __`build_and_push.sh`__ is a script that uses the Dockerfile to build your container image and then pushes it to ECR. We'll run the script later in this notebook, and you can copy and reuse it for your own algorithms.
* __`fastai_predict`__ is the directory which contains the files that will be installed in the container.
* __`local_test`__ is a directory that shows how to test your new container on any computer that can run Docker, including an Amazon SageMaker notebook instance. Using this method, you can quickly iterate using small datasets to eliminate any structural bugs before you use the container with Amazon SageMaker. We'll walk through local testing later in this notebook.

In this simple application, we install only six files in the container. You may only need that many or, if you have many supporting routines, you may wish to install more. These six show the standard structure of our Python containers, although you are free to choose a different toolset and therefore could have a different layout. If you're writing in a different programming language, you'll certainly have a different layout depending on the frameworks and tools you choose.

The files that we'll put in the container are:

* __`nginx.conf`__ is the configuration file for the nginx front-end. Generally, you should be able to take this file as-is.
* __`predict.py`__ is the main class with the logic to do the fast.ai predictions. You'll want to customize the actual prediction parts to your application.
* __`predictor.py`__ is the program that actually implements the Flask web server. You'll want to customize the actual prediction parts to your application, especially the content type check. In our example we accept the content type `image/jpeg`.
* __`serve`__ is the program started when the container is started for hosting. It simply launches the gunicorn server which runs multiple instances of the Flask app defined in `predictor.py`. You should be able to take this file as-is.
* __`utils.py`__ is a utility file with functions to do things such as transform the image before sending to the model for inference. It implements many of the fast.ai image transformation functions.
* __`wsgi.py`__ is a small wrapper used to invoke the Flask app. You should be able to take this file as-is.

In summary, the files you will probably want to change for your application are `predictor.py` and `predict.py`.

## The Dockerfile

The Dockerfile describes the image that we want to build. You can think of it as describing the complete operating system installation of the system that you want to run. A running Docker container is quite a bit lighter than a full operating system, however, because it takes advantage of Linux on the host machine for the basic operations.

For the Python science stack, we start from a standard Python installation and run the normal tools to install the things needed by the fast.ai library. Finally, we add the code that implements our specific algorithm to the container and set up the right environment to run under.

Along the way, we clean up extra space. This makes the container smaller and faster to start.

Let's look at the Dockerfile for the example:

In [None]:
!cat container/Dockerfile

## Building and registering the container

The following cell builds the container image using `docker build` and pushes it to ECR using `docker push`. The commands live in the shell script `container/build_and_push.sh`, which you can run as `build_and_push.sh fastai_predict` to build the image `fastai_predict`.

This code looks for an ECR repository in the account you're using and the current default region (if you're using a SageMaker notebook instance, this will be the region where the notebook instance was created). If the repository doesn't exist, the script will create it.
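
The look-up-then-create step described above can also be sketched in Python with boto3's ECR client. This is not the contents of `build_and_push.sh`; `ensure_repository` is a hypothetical helper, and the client is passed in as an argument so the logic can be read and tried without AWS credentials.

```python
def ensure_repository(ecr_client, name):
    """Return the repository URI, creating the repository if it is missing."""
    try:
        resp = ecr_client.describe_repositories(repositoryNames=[name])
        return resp["repositories"][0]["repositoryUri"]
    except ecr_client.exceptions.RepositoryNotFoundException:
        # The repository does not exist yet in this account and region.
        resp = ecr_client.create_repository(repositoryName=name)
        return resp["repository"]["repositoryUri"]


# Example usage in this notebook (requires AWS credentials):
# uri = ensure_repository(boto3.client("ecr"), "fastai_predict")
```

The returned URI has the same `<account>.dkr.ecr.<region>.amazonaws.com/<name>` form that is used for the image when the model is defined.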

In [None]:
! cd container && ./build_and_push.sh fastai_predict

## Host

Start by defining the model to host. The container image lives in the ECR registry for your own account and region, so the image URI below is built from `account_id` and `region`.

In [None]:
import time


fastai_model = 'DEMO-fastai-byom-' + time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime())

sm = boto3.client('sagemaker')

image = '{}.dkr.ecr.{}.amazonaws.com/fastai_predict:latest'.format(account_id, region)

model_key = 'models/caltech256_fastai/model.tar.gz'

create_model_response = sm.create_model(
    ModelName=fastai_model,
    ExecutionRoleArn=role,
    PrimaryContainer={
        'Image': image,
        'ModelDataUrl': f's3://{bucket}/{model_key}'})

model_arn=create_model_response['ModelArn']

print(f'Model Arn: {model_arn}')

Then set up our endpoint configuration.

In [None]:
fastai_endpoint_config = 'DEMO-fastai-byom-endpoint-config-' + time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime())
print(fastai_endpoint_config)
create_endpoint_config_response = sm.create_endpoint_config(
    EndpointConfigName=fastai_endpoint_config,
    ProductionVariants=[{
        'InstanceType': 'ml.m4.xlarge',
        'InitialInstanceCount': 1,
        'ModelName': fastai_model,
        'VariantName': 'AllTraffic'}])

print("Endpoint Config Arn: " + create_endpoint_config_response['EndpointConfigArn'])

Finally, create our endpoint and wait for it to come into service.

In [None]:
%%time

fastai_endpoint = 'DEMO-fastai-byom-endpoint-' + time.strftime("%Y%m%d%H%M", time.gmtime())
print(fastai_endpoint)
create_endpoint_response = sm.create_endpoint(
    EndpointName=fastai_endpoint,
    EndpointConfigName=fastai_endpoint_config)
print(create_endpoint_response['EndpointArn'])

resp = sm.describe_endpoint(EndpointName=fastai_endpoint)
status = resp['EndpointStatus']
print("Status: " + status)

sm.get_waiter('endpoint_in_service').wait(EndpointName=fastai_endpoint)

resp = sm.describe_endpoint(EndpointName=fastai_endpoint)
status = resp['EndpointStatus']
print("Arn: " + resp['EndpointArn'])
print("Status: " + status)

if status != 'InService':
    raise Exception('Endpoint creation did not succeed')

## Perform Inference
Finally, you can validate the model. Using the endpoint name from the previous steps, call the endpoint through the SageMaker runtime client to generate classifications from the trained model.

In [None]:
import boto3
runtime = boto3.Session().client(service_name='sagemaker-runtime')

### Download test image

In [None]:
!wget -O /tmp/test.jpg http://www.vision.caltech.edu/Image_Datasets/Caltech256/images/008.bathtub/008_0007.jpg
file_name = '/tmp/test.jpg'
# test image
from IPython.display import Image
Image(file_name)  

In [None]:
import json
import numpy as np
with open(file_name, 'rb') as f:
    payload = f.read()
    payload = bytearray(payload)
response = runtime.invoke_endpoint(EndpointName=fastai_endpoint, 
                                   ContentType='image/jpeg', 
                                   Body=payload)
result = response['Body'].read()
res = json.loads(result)
res

### Clean up

When we're done with the endpoint, we can just delete it and the backing instances will be released.  Run the following cell to delete the endpoint.

In [None]:
sm.delete_endpoint(EndpointName=fastai_endpoint)
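
Deleting the endpoint does not delete the endpoint configuration or the model that were created earlier. If you want to remove those as well, the following sketch shows one way to do it; `delete_config_and_model` is a hypothetical helper that takes the SageMaker client as an argument.

```python
def delete_config_and_model(sm_client, endpoint_config_name, model_name):
    """Delete the endpoint configuration and then the model it referenced."""
    sm_client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)
    sm_client.delete_model(ModelName=model_name)


# Example usage in this notebook, after the endpoint has been deleted:
# delete_config_and_model(sm, fastai_endpoint_config, fastai_model)
```

The model artifact in S3 and the image in ECR are separate resources and are left untouched by these calls.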