---

# Contents

7. [How to Build the Custom Sagemaker Container for Model Deployment](#7.-How-to-Build-the-Custom-Sagemaker-Container-for-Model-Deployment)
8. [How to Deploy Models as Sagemaker Multi Model Endpoint and Invoke the Endpoint](#8.-How-to-Deploy-Models-as-Sagemaker-Multi-Model-Endpoint-and-Invoke-the-Endpoint)
9. [Clean up the resources](#9.-Clean-up-the-resources)
10. [Conclusion](#10.-Conclusion)

---

In [1]:
# Import libraries
import boto3
import jsonlines
import json
import time

from sagemaker import get_execution_role
from time import gmtime, strftime

# 7. How to Build the Custom Sagemaker Container for Model Deployment

Inspired by [an example of bringing your own container for deployment to a multi-model endpoint](https://github.com/awslabs/amazon-sagemaker-examples/tree/master/advanced_functionality/multi_model_bring_your_own), we use the [Multi Model Server](https://github.com/awslabs/multi-model-server) framework and the [SageMaker Inference Toolkit](https://github.com/aws/sagemaker-inference-toolkit) to host multiple forecasting models at the same time behind a single endpoint:

- Multi Model Server (MMS) is an open source framework for serving machine learning models. MMS supports a pluggable custom backend handler where you can implement your own algorithm. It provides the HTTP frontend and model management capabilities required by multi-model endpoints to host multiple models within a single container, load models into and unload models out of the container dynamically, and perform inference on a specified loaded model. MMS supports [various settings](https://github.com/awslabs/multi-model-server/blob/master/docker/advanced_settings.md#description-of-config-file-settings) for the frontend server it starts.
- [SageMaker Inference Toolkit](https://github.com/aws/sagemaker-inference-toolkit) is a library that bootstraps MMS in a way that is compatible with SageMaker multi-model endpoints, while still allowing you to tweak important performance parameters, such as the number of workers per model.

In this way, we can compare the forecasts of all models in real time from one endpoint, and avoid the cost of creating and running a separate endpoint for each model.

## Define model handler
The code snippet __`container/model_handler.py`__ below shows how we define a custom handler that supports loading and inference for the GluonTS models.
- The `initialize` method will be called when a model is loaded into memory. In this example, it loads the model artifacts at `model_dir` into the GluonTS Predictor class.

- The `handle` method will be called when invoking the model. In this example, it validates the input payload and then forwards the input to the GluonTS Predictor class, returning the output. This handler class is instantiated for every model loaded into the container, so state in the handler is not shared across models.

In [2]:
!cat container/model_handler.py

"""
ModelHandler defines an example model handler for load and inference requests for MXNet CPU models
"""
from collections import namedtuple
import glob
import json
import logging
import io
import os
import re

import mxnet as mx
import numpy as np
import sys

from pathlib import Path
from gluonts.model.predictor import Predictor
from gluonts.dataset.common import ListDataset

class ModelHandler(object):
    """
    A sample Model handler implementation.
    """

    def __init__(self):
        self.initialized = False
        self.mx_model = None
        self.shapes = None
    
    def load_model(self, model_path):
        try:
            predictor = Predictor.deserialize(Path(model_path))
            print('Model loaded from %s'%model_path)
        except:
            print('Unable to load the model %s'%model_path)
            sys.exit(1)
        return predictor

    def initialize(self, context):
        """
        Initialize model. This will be called during model loading time


### Unit testing for the model handler
Before we build the custom Docker container, it is good practice to unit test the handler (__`container/test_model_handler.py`__), as shown below.

In [3]:
%%bash

cd container
pytest -v test_model_handler.py

platform linux -- Python 3.6.10, pytest-5.0.1, py-1.9.0, pluggy-0.13.1 -- /home/ec2-user/SageMaker/time-series-blog-draft/.myenv/miniconda/envs/gluonts-multimodel/bin/python
cachedir: .pytest_cache
rootdir: /home/ec2-user/SageMaker/time-series-blog-draft/container
collecting ... collected 5 items

test_model_handler.py::test_load_model PASSED                            [ 20%]
test_model_handler.py::test_initialize PASSED                            [ 40%]
test_model_handler.py::test_preprocess PASSED                            [ 60%]
test_model_handler.py::test_handle[5-quantiles0] PASSED                  [ 80%]
test_model_handler.py::test_handle[25-quantiles1] PASSED                 [100%]



## Define Docker Entrypoint
The inference container in this example uses the SageMaker Inference Toolkit to start MMS, as shown in the __`container/dockerd-entrypoint.py`__ file below.

In [4]:
!cat container/dockerd-entrypoint.py

import subprocess
import sys
import shlex
import os
from retrying import retry
from subprocess import CalledProcessError
from sagemaker_inference import model_server

def _retry_if_error(exception):
    # Pass a tuple: `CalledProcessError or OSError` evaluates to CalledProcessError only.
    return isinstance(exception, (CalledProcessError, OSError))

@retry(stop_max_delay=1000 * 50,
       retry_on_exception=_retry_if_error)
def _start_mms():
    # by default the number of workers per model is 1, but we can configure it through the
    # environment variable below if desired.
    # os.environ['SAGEMAKER_MODEL_SERVER_WORKERS'] = '2'
    model_server.start_model_server(handler_service='/home/model-server/model_handler.py:handle')

def main():
    if sys.argv[1] == 'serve':
        _start_mms()
    else:
        subprocess.check_call(shlex.split(' '.join(sys.argv[1:])))

    # prevent docker exit
    subprocess.call(['tail', '-f', '/dev/null'])
    
main()


## Building and registering a container

The shell script below first builds a custom Docker image that uses MMS as the frontend (configured through the SageMaker Inference Toolkit in `container/dockerd-entrypoint.py`) and the `container/model_handler.py` shown above as the backend handler. It then pushes the image to an ECR repository in your account, creating the repository if it does not exist.
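The same image URI is rebuilt in Python in later cells. As a sketch, a small hypothetical helper `ecr_image_uri` can mirror the script's `fullname` logic, including its `us-west-2` fallback, assuming the standard AWS partition where ECR hosts live under `amazonaws.com`:

```python
def ecr_image_uri(account_id, region, repository, tag="latest"):
    """Mirror the shell script's `fullname` for the pushed image.

    Falls back to us-west-2 when no region is configured, like the script.
    Assumes the standard AWS partition (hosts under amazonaws.com).
    """
    region = region or "us-west-2"
    return "{}.dkr.ecr.{}.amazonaws.com/{}:{}".format(account_id, region, repository, tag)
```

For example, `ecr_image_uri(account_id, region, 'demo-sagemaker-multimodel-gluonts')` should match the `container` image string built when creating the model.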

In [5]:
%%sh

# The name of our algorithm
algorithm_name=demo-sagemaker-multimodel-gluonts

cd container

account=$(aws sts get-caller-identity --query Account --output text)

# Get the region defined in the current configuration (default to us-west-2 if none defined)
region=$(aws configure get region)
region=${region:-us-west-2}

fullname="${account}.dkr.ecr.${region}.amazonaws.com/${algorithm_name}:latest"

# If the repository doesn't exist in ECR, create it.
aws ecr describe-repositories --repository-names "${algorithm_name}" > /dev/null 2>&1

if [ $? -ne 0 ]
then
    aws ecr create-repository --repository-name "${algorithm_name}" > /dev/null
fi

# Log Docker in to ECR (`aws ecr get-login` is not available in AWS CLI v2)
aws ecr get-login-password --region ${region} | docker login --username AWS --password-stdin "${account}.dkr.ecr.${region}.amazonaws.com"

# Build the docker image locally with the image name and then push it to ECR
# with the full name.

docker build -q -t ${algorithm_name} .
docker tag ${algorithm_name} ${fullname}

docker push ${fullname}

Login Succeeded
sha256:4ec2e8bc51a6677e7bd6e3cedae6bcdd7fbc0819025fd63af46a40243552fa35
The push refers to repository [783128296767.dkr.ecr.ap-southeast-2.amazonaws.com/demo-sagemaker-multimodel-gluonts]
c3a8961ea0dd: Preparing
a4e5c2f29143: Preparing
2ad0f72db02b: Preparing
9c5493c63a91: Preparing
277c49a9d3ff: Preparing
cd31335e8e41: Preparing
c1b3849a086a: Preparing
bf3f26e338ae: Preparing
2ca7984a1779: Preparing
dbc5ddf63966: Preparing
b255b26a82a5: Preparing
8f312108c760: Preparing
22144637480e: Preparing
24cd7a0a3078: Preparing
8980490753a8: Preparing
270e75e92418: Preparing
2ca7984a1779: Waiting
dbc5ddf63966: Waiting
b255b26a82a5: Waiting
8f312108c760: Waiting
22144637480e: Waiting
24cd7a0a3078: Waiting
8980490753a8: Waiting
270e75e92418: Waiting
cd31335e8e41: Waiting
c1b3849a086a: Waiting
bf3f26e338ae: Waiting
9c5493c63a91: Layer already exists
c3a8961ea0dd: Layer already exists
277c49a9d3ff: Layer already exists
a4e5c2f29143: Layer already exists
2ad0f72db02b: Layer already ex




# 8. How to Deploy Models as Sagemaker Multi Model Endpoint and Invoke the Endpoint

## Set up the environment

First, we need to define the S3 bucket and prefix of the model artifacts that will be invoked by the multi-model endpoint. We also need to define the IAM role that gives SageMaker access to the model artifacts and to the ECR image created above.

In [6]:
sm_client = boto3.client(service_name='sagemaker')
runtime_sm_client = boto3.client(service_name='sagemaker-runtime')

account_id = boto3.client('sts').get_caller_identity()['Account']
region = boto3.Session().region_name

bucket = 'sagemaker-{}-{}'.format(region, account_id)
prefix = 'demo-multimodel-gluonts-endpoint'

role = get_execution_role()

models_dir = "models"

## Create a multi-model endpoint
### Import models into hosting
When creating the Model entity for multi-model endpoints, the container's `ModelDataUrl` is the S3 prefix where the model artifacts that are invokable by the endpoint are located. The rest of the S3 path will be specified when invoking the model.

The container's `Mode` is set to `MultiModel` to signify that the container will host multiple models.

In [7]:
model_name = 'DEMO-MultiModelGluonTSModel' + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
model_url = 'https://s3-{}.amazonaws.com/{}/{}/{}/'.format(region, bucket, prefix, models_dir)
container = '{}.dkr.ecr.{}.amazonaws.com/{}:latest'.format(account_id, region, 'demo-sagemaker-multimodel-gluonts')

print('Model name: ' + model_name)
print('Model data Url: ' + model_url)
print('Container image: ' + container)

container = {
    'Image': container,
    'ModelDataUrl': model_url,
    'Mode': 'MultiModel'
}

create_model_response = sm_client.create_model(
    ModelName = model_name,
    ExecutionRoleArn = role,
    Containers = [container])

print("Model Arn: " + create_model_response['ModelArn'])

Model name: DEMO-MultiModelGluonTSModel2020-08-19-05-31-12
Model data Url: https://s3-ap-southeast-2.amazonaws.com/sagemaker-ap-southeast-2-783128296767/demo-multimodel-gluonts-endpoint/models/
Container image: 783128296767.dkr.ecr.ap-southeast-2.amazonaws.com/demo-sagemaker-multimodel-gluonts:latest
Model Arn: arn:aws:sagemaker:ap-southeast-2:783128296767:model/demo-multimodelgluontsmodel2020-08-19-05-31-12


### Create endpoint configuration
Endpoint config creation works the same way as for single-model endpoints.

In [8]:
endpoint_config_name = 'DEMO-MultiModelGluonTSEndpointConfig-' + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
print('Endpoint config name: ' + endpoint_config_name)

create_endpoint_config_response = sm_client.create_endpoint_config(
    EndpointConfigName = endpoint_config_name,
    ProductionVariants=[{
        'InstanceType': 'ml.m5.xlarge',
        'InitialInstanceCount': 2,
        'InitialVariantWeight': 1,
        'ModelName': model_name,
        'VariantName': 'AllTraffic'}])

print("Endpoint config Arn: " + create_endpoint_config_response['EndpointConfigArn'])

Endpoint config name: DEMO-MultiModelGluonTSEndpointConfig-2020-08-19-05-31-12
Endpoint config Arn: arn:aws:sagemaker:ap-southeast-2:783128296767:endpoint-config/demo-multimodelgluontsendpointconfig-2020-08-19-05-31-12


### Create the multi model endpoint
Similarly, endpoint creation works the same way as for single model endpoints.

In [9]:
endpoint_name = 'DEMO-MultiModelGluonTSEndpoint-' + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
print('Endpoint name: ' + endpoint_name)

create_endpoint_response = sm_client.create_endpoint(
    EndpointName=endpoint_name,
    EndpointConfigName=endpoint_config_name)
print('Endpoint Arn: ' + create_endpoint_response['EndpointArn'])

resp = sm_client.describe_endpoint(EndpointName=endpoint_name)
status = resp['EndpointStatus']
print("Endpoint Status: " + status)

print('Waiting for {} endpoint to be in service...'.format(endpoint_name))
waiter = sm_client.get_waiter('endpoint_in_service')
waiter.wait(EndpointName=endpoint_name)

Endpoint name: DEMO-MultiModelGluonTSEndpoint-2020-08-19-05-31-12
Endpoint Arn: arn:aws:sagemaker:ap-southeast-2:783128296767:endpoint/demo-multimodelgluontsendpoint-2020-08-19-05-31-12
Endpoint Status: Creating
Waiting for DEMO-MultiModelGluonTSEndpoint-2020-08-19-05-31-12 endpoint to be in service...


## Invoke models
Now we invoke the models that we uploaded to S3 previously. The first invocation of a model may be slow, since behind the scenes SageMaker downloads the model artifacts from S3 to the instance and loads them into the container.

### Invoke the Mean Model

First we prepare two time series as the payload, then call `InvokeEndpoint` to get a forecast from the Mean model. The `TargetModel` field is concatenated with the S3 prefix specified in `ModelDataUrl` when creating the model, to generate the location of the model in S3.
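To make the concatenation concrete, here is a sketch with a hypothetical helper `resolve_target_model` that joins `ModelDataUrl` and `TargetModel` and splits the result into bucket and key. It assumes either an `s3://` URI or the path-style `https://s3-<region>.amazonaws.com/<bucket>/...` URL used in this notebook. SageMaker performs this resolution itself; the helper is only useful for checking, for example with `s3_client.head_object(Bucket=bucket, Key=key)`, that an artifact exists before invoking it.

```python
from urllib.parse import urlparse


def resolve_target_model(model_data_url, target_model):
    """Return (bucket, key) of the artifact a multi-model endpoint would load.

    Uses plain concatenation, so `model_data_url` should end with "/"
    (as `model_url` does in this notebook).
    """
    location = urlparse(model_data_url + target_model)
    if location.scheme == "s3":
        # s3://<bucket>/<key>
        return location.netloc, location.path.lstrip("/")
    # Path-style HTTPS URL: the first path segment is the bucket name.
    bucket, _, key = location.path.lstrip("/").partition("/")
    return bucket, key
```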

In [10]:
def read_data(file_path):
    data = []
    with jsonlines.open(file_path) as reader:
        for obj in reader:
            data.append(obj)
    return data

payload_jsonline = read_data('data/test.json')

In [11]:
n_time_series = 2 # select 2 time series for quick response
payload_list = []
for p in payload_jsonline[:n_time_series]:
    payload_list.append(json.dumps(p))
payload = '\n'.join(payload_list)

In [12]:
%%time

response = runtime_sm_client.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType='application/json',
    TargetModel='MeanPredictor.tar.gz', # this is the rest of the S3 path where the model artifacts are located
    Body=payload)

print(response['Body'].read().decode("utf-8"), sep = '\n')

{"item_id": "0", "quantiles": {"0.1": [1.5405118683397176, 1.4554741560240922, 1.3917306274191918, 1.5535203902714856, 1.6868682440218221, 1.481299762167165, 1.6741983168122005, 1.6342995780939114, 1.70775578458321, 1.361224787680375, 1.543907365165602, 1.6231372876795858], "0.2": [1.7843796149207596, 1.8948115589364658, 1.7463301588067242, 2.064156981660703, 1.868239141851494, 1.7077491648392367, 1.9947164546725473, 2.0669598954758994, 2.0160320823484636, 1.7658390815196041, 1.9994900053363778, 1.9714402947370742], "0.3": [2.0142952356632726, 2.120184976345479, 2.0082169203526097, 2.3649007525673227, 2.2718384382275962, 2.080861105867572, 2.1973984203088777, 2.2773634266268363, 2.2461926145843347, 2.0758759234090185, 2.1394578174669703, 2.283098209008817], "0.4": [2.186745166682812, 2.3021799973559554, 2.2074804393102294, 2.5392996678280877, 2.4855614014338636, 2.2890245245649856, 2.3638192419521955, 2.4887149137270512, 2.4618889026138344, 2.2774178452563816, 2.346897183679217, 2.5156
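The response body printed above is truncated, but it appears to contain one JSON object per time series, each with an `item_id` and a `quantiles` map from quantile level to forecast values. Assuming that JSON Lines layout, a hypothetical `parse_forecasts` helper can turn it into Python dictionaries for plotting or comparing models:

```python
import json


def parse_forecasts(body):
    """Parse a JSON Lines forecast response into {item_id: {quantile: [values]}}.

    Assumes one JSON object per line with `item_id` and `quantiles` keys.
    Records whose `item_id` is null are keyed by their line index instead.
    """
    forecasts = {}
    for index, line in enumerate(body.splitlines()):
        if not line.strip():
            continue
        record = json.loads(line)
        item_id = record.get("item_id")
        key = item_id if item_id is not None else index
        # Order quantile levels numerically ("0.1" before "0.5" before "0.9").
        forecasts[key] = {q: record["quantiles"][q]
                          for q in sorted(record["quantiles"], key=float)}
    return forecasts
```

Because `response['Body']` is a stream that can be read only once, store the decoded text first (`body = response['Body'].read().decode('utf-8')`) and then call `parse_forecasts(body)`.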

When we invoke the same model a __second__ time, it is already downloaded to the instance and loaded in the container, so __inference is faster__.

In [13]:
%%time

response = runtime_sm_client.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType='application/json',
    TargetModel='MeanPredictor.tar.gz',
    Body=payload)

print(response['Body'].read().decode("utf-8"), sep = '\n')

{"item_id": "0", "quantiles": {"0.1": [1.253290150649065, 1.5168044071218103, 1.6058552647964146, 1.416644839456339, 1.4760739247069175, 1.5949178583152142, 1.5046364888294557, 1.6182047801102133, 1.6813422376595226, 1.5831720115001264, 1.4905019439579128, 1.485792456486161], "0.2": [1.6883829741568546, 1.9559592496099052, 1.9406866875460265, 1.8858503955335346, 1.9056384521332805, 1.9906169079670368, 1.7944211778316737, 1.951096027600538, 1.8786202759916928, 1.9189115919840627, 1.8142189415399481, 1.9217967401626725], "0.3": [2.0337826948358066, 2.1890946619488885, 2.1138589880510317, 2.057806274822087, 2.191305388395044, 2.1336631568457687, 2.1655618649014063, 2.156779627699678, 2.137045801229956, 2.0013479094397364, 2.0042509785857057, 2.1127901772399023], "0.4": [2.4027328475107335, 2.538071579410444, 2.3633799633586343, 2.267904465537704, 2.3578441572073134, 2.476561251246665, 2.3770608015331303, 2.3552567273079217, 2.2626005383177255, 2.296212779806407, 2.4248797478730797, 2.2860

### Invoke other models
To take advantage of the multi-model endpoint, we can specify a different model (e.g., `DeepAREstimator.tar.gz`) as `TargetModel` and perform inference on it using the same endpoint.
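The invocation cells in this section differ only in `TargetModel`. As a sketch, assuming the `runtime_sm_client`, `endpoint_name`, and `payload` defined earlier, a hypothetical `invoke_all_models` helper loops over the models and records a rough client-side latency per call, which makes the cold (first call) versus warm (repeat call) difference described above easy to observe:

```python
import time


def invoke_all_models(runtime_client, endpoint_name, target_models, payload,
                      content_type="application/json"):
    """Invoke each model on one multi-model endpoint with the same payload.

    `runtime_client` is a boto3 `sagemaker-runtime` client (or any object
    with the same `invoke_endpoint` method). Returns
    {target_model: {"body": decoded_text, "seconds": wall_clock_latency}}.
    """
    results = {}
    for target_model in target_models:
        start = time.perf_counter()
        response = runtime_client.invoke_endpoint(
            EndpointName=endpoint_name,
            ContentType=content_type,
            TargetModel=target_model,
            Body=payload,
        )
        # The body is a stream: read it once and keep the decoded text.
        body = response["Body"].read().decode("utf-8")
        results[target_model] = {"body": body,
                                 "seconds": time.perf_counter() - start}
    return results
```

For example: `invoke_all_models(runtime_sm_client, endpoint_name, ['MeanPredictor.tar.gz', 'DeepAREstimator.tar.gz', 'RForecastPredictor.tar.gz', 'ProphetPredictor.tar.gz', 'SeasonalNaivePredictor.tar.gz'], payload)`. The measured time includes network overhead, so treat it as a relative comparison only.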

In [14]:
%%time

response = runtime_sm_client.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType='application/json',
    TargetModel='DeepAREstimator.tar.gz',
    Body=payload)

print(response['Body'].read().decode("utf-8"), sep = '\n')

{"item_id": "0", "quantiles": {"0.1": [2.0058059692382812, 2.0848548412323, 2.1319191455841064, 1.7626827955245972, 2.1028003692626953, 2.1341395378112793, 2.153954029083252, 1.4223262071609497, 1.8531743288040161, 2.1007485389709473, 2.023390293121338, 1.9111007452011108], "0.2": [2.250101089477539, 2.2457194328308105, 2.2871243953704834, 1.9063371419906616, 2.3267104625701904, 2.3346588611602783, 2.345787286758423, 1.5654994249343872, 2.0083627700805664, 2.2628791332244873, 2.159290075302124, 2.080110549926758], "0.3": [2.3288064002990723, 2.4233856201171875, 2.3600692749023438, 1.9544258117675781, 2.4445018768310547, 2.5232415199279785, 2.528930187225342, 1.7129929065704346, 2.116767168045044, 2.4026944637298584, 2.377838134765625, 2.2108047008514404], "0.4": [2.41412615776062, 2.476562738418579, 2.456324577331543, 2.087014675140381, 2.514258623123169, 2.5901424884796143, 2.676973342895508, 1.8229917287826538, 2.2375881671905518, 2.5167603492736816, 2.456094264984131, 2.299031496047

In [15]:
%%time

response = runtime_sm_client.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType='application/json',
    TargetModel='RForecastPredictor.tar.gz',
    Body=payload)

print(response['Body'].read().decode("utf-8"), sep = '\n')

{"item_id": null, "quantiles": {"0.1": [0.8090870702874826, 0.8428123078492686, 0.4624473413299772, 0.5177015624404087, 0.43006920648953173, 0.19331751739085123, 0.17339511072744607, 0.15734350423240206, 0.3918786635030125, 0.5235862509618792, 0.31944327647862414, -0.01251747265512182], "0.2": [1.5225365628796608, 1.4819382501119645, 1.1533033470218867, 1.0144062634158844, 1.1683039107980289, 0.762261028826825, 0.8815687774245442, 0.9600288129509906, 1.0746018416332428, 1.115368170630708, 1.1169586250187686, 1.183204264275807], "0.3": [1.9913532929619617, 1.7195532084583576, 1.7384239422684757, 1.4819066900965199, 1.7071997854235548, 1.2307796247473433, 1.5698870494702337, 1.7928686549204662, 1.4200078960506284, 1.5720736133402875, 1.5373370171181526, 1.788114767147786], "0.4": [2.256737710810241, 1.9244638423168219, 2.115409957951876, 1.855350419412394, 1.9605222797037896, 1.9739920868919456, 2.1379014667629477, 2.287974580365686, 2.1558390349033236, 2.299462214313219, 2.0782121026859

In [16]:
%%time

response = runtime_sm_client.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType='application/json',
    TargetModel='ProphetPredictor.tar.gz',
    Body=payload)

print(response['Body'].read().decode("utf-8"), sep = '\n')

{"item_id": null, "quantiles": {"0.1": [0.6400716631554431, 0.2843922656464699, 0.35509664238253213, 1.0394838466459895, 0.6226457211437011, 0.6743560186156927, 0.7125763570702937, 0.03229604267139741, 0.246863504267121, 0.29444244357734917, 0.06849625469506981, 0.2807402284661189], "0.2": [1.3472023714194665, 0.8573461445764494, 0.9799641306312864, 1.3348348389014935, 1.1686285261993905, 1.2323588895931548, 1.4712221515973218, 0.667259860611519, 0.7356308837085632, 0.9453179442253173, 0.7127467437224793, 0.8365200380226373], "0.3": [1.8360639870871602, 1.1487364970339642, 1.2092435466198395, 1.8352871667751556, 1.5464536161065041, 1.5188189134606236, 1.7475541664839103, 0.9833682991092416, 1.1240863302356185, 1.2981067559363946, 1.1173646553754746, 1.0925820292775927], "0.4": [1.9810608094294335, 1.515343451015149, 1.6965688186410537, 2.2989088624958236, 1.9562644114182821, 1.8890014878890877, 1.9905861959816025, 1.3602674903233212, 1.4057228307049074, 1.6968461982271728, 1.7013306239

In [17]:
%%time

response = runtime_sm_client.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType='application/json',
    TargetModel='SeasonalNaivePredictor.tar.gz',
    Body=payload)

print(response['Body'].read().decode("utf-8"), sep = '\n')

{"item_id": null, "quantiles": {"0.1": [2.855329990386963, 2.855329990386963, 2.855329990386963, 2.062182664871216, 3.4898476600646973, 2.5380711555480957, 1.7449238300323486, 1.4276649951934814, 2.2208120822906494, 2.5380711555480957, 2.062182664871216, 2.379441738128662], "0.2": [2.855329990386963, 2.855329990386963, 2.855329990386963, 2.062182664871216, 3.4898476600646973, 2.5380711555480957, 1.7449238300323486, 1.4276649951934814, 2.2208120822906494, 2.5380711555480957, 2.062182664871216, 2.379441738128662], "0.3": [2.855329990386963, 2.855329990386963, 2.855329990386963, 2.062182664871216, 3.4898476600646973, 2.5380711555480957, 1.7449238300323486, 1.4276649951934814, 2.2208120822906494, 2.5380711555480957, 2.062182664871216, 2.379441738128662], "0.4": [2.855329990386963, 2.855329990386963, 2.855329990386963, 2.062182664871216, 3.4898476600646973, 2.5380711555480957, 1.7449238300323486, 1.4276649951934814, 2.2208120822906494, 2.5380711555480957, 2.062182664871216, 2.37944173812866

## A Batch Transform Example
The multi-model setup above does not support batch transform directly. To perform batch transform, we need to create a separate model in SageMaker for each forecasting model and run a batch transform job for each model one by one. The example below shows how to run batch transform for one model.
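Running every model through batch transform means repeating the `create_model` and `create_transform_job` calls in the cells below once per model. As a sketch, assuming the same naming scheme, S3 layout, and transform settings as those cells, the hypothetical `batch_transform_requests` helper builds the keyword arguments for both calls. Unlike the cells below, it uses an `s3://` URI for `ModelDataUrl`, which `CreateModel` also accepts.

```python
from time import gmtime, strftime


def batch_transform_requests(model, image, role, bucket, prefix, models_dir,
                             test_data_s3_path, instance_type="ml.m5.xlarge"):
    """Return (create_model kwargs, create_transform_job kwargs) for one model."""
    stamp = strftime("%Y-%m-%d-%H-%M-%S", gmtime())
    model_name = "DEMO-GluonTSModel-{}-{}".format(model, stamp)
    model_req = {
        "ModelName": model_name,
        "ExecutionRoleArn": role,
        "Containers": [{
            "Image": image,
            "ModelDataUrl": "s3://{}/{}/{}/{}.tar.gz".format(bucket, prefix, models_dir, model),
            # Batch transform needs a single-model container definition.
            "Mode": "SingleModel",
        }],
    }
    job_req = {
        "TransformJobName": "DEMO-GluonTS-{}-BT-{}".format(model, stamp),
        "ModelName": model_name,
        "BatchStrategy": "SingleRecord",
        "TransformInput": {
            "DataSource": {"S3DataSource": {"S3DataType": "S3Prefix",
                                            "S3Uri": test_data_s3_path}},
            "ContentType": "application/json",
            "CompressionType": "None",
            "SplitType": "Line",  # one JSON time series per line
        },
        "TransformOutput": {
            "S3OutputPath": "s3://{}/{}/inference-results/{}".format(bucket, prefix, model),
        },
        "TransformResources": {"InstanceType": instance_type, "InstanceCount": 1},
    }
    return model_req, job_req
```

In a loop over the model names used in this notebook (`MeanPredictor`, `DeepAREstimator`, `RForecastPredictor`, `ProphetPredictor`, `SeasonalNaivePredictor`), pass the results to `sm_client.create_model(**model_req)` and `sm_client.create_transform_job(**job_req)`. Pass the ECR image URI string as `image`, since the notebook rebinds `container` to a dict.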

In [18]:
from time import gmtime, strftime

model = 'RForecastPredictor'
model_name_bt = 'DEMO-GluonTSModel-{}-'.format(model) + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
model_url = 'https://s3-{}.amazonaws.com/{}/{}/{}/{}.tar.gz'.format(region, bucket, prefix, models_dir, model)
container = '{}.dkr.ecr.{}.amazonaws.com/{}:latest'.format(account_id, region, 'demo-sagemaker-multimodel-gluonts')

print('Model name: ' + model_name_bt)
print('Model data Url: ' + model_url)
print('Container image: ' + container)

container = {
    'Image': container,
    'ModelDataUrl': model_url,
    'Mode': 'SingleModel'
}

create_model_response = sm_client.create_model(
    ModelName = model_name_bt,
    ExecutionRoleArn = role,
    Containers = [container])

print("Model Arn: " + create_model_response['ModelArn'])

Model name: DEMO-GluonTSModel-RForecastPredictor-2020-08-19-05-40-05
Model data Url: https://s3-ap-southeast-2.amazonaws.com/sagemaker-ap-southeast-2-783128296767/demo-multimodel-gluonts-endpoint/models/RForecastPredictor.tar.gz
Container image: 783128296767.dkr.ecr.ap-southeast-2.amazonaws.com/demo-sagemaker-multimodel-gluonts:latest
Model Arn: arn:aws:sagemaker:ap-southeast-2:783128296767:model/demo-gluontsmodel-rforecastpredictor-2020-08-19-05-40-05


In [19]:
test_data_s3_path = "s3://{}/{}/data/test.json".format(bucket, prefix)

In [20]:
transform_job_name = 'DEMO-GluonTS-{}-BT-'.format(model) + strftime("%Y-%m-%d-%H-%M-%S", gmtime())

transform_input = {
    'DataSource': {
        'S3DataSource': {
            'S3DataType': 'S3Prefix',
            'S3Uri': test_data_s3_path
        }
    },
    'ContentType': 'application/json',
    'CompressionType': 'None',
    'SplitType': 'Line'
}

transform_output = {
    'S3OutputPath': 's3://{}/{}/inference-results/{}'.format(bucket, prefix, model),
}

transform_resources = {
    'InstanceType': 'ml.m5.xlarge',
    'InstanceCount': 1
}

sm_client.create_transform_job(TransformJobName=transform_job_name,
                               ModelName=model_name_bt,
                               BatchStrategy='SingleRecord',
                               TransformInput=transform_input,
                               TransformOutput=transform_output,
                               TransformResources=transform_resources)

{'TransformJobArn': 'arn:aws:sagemaker:ap-southeast-2:783128296767:transform-job/demo-gluonts-rforecastpredictor-bt-2020-08-19-05-40-05',
 'ResponseMetadata': {'RequestId': '26765de0-b2de-44e2-b099-8b69170deb33',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '26765de0-b2de-44e2-b099-8b69170deb33',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '136',
   'date': 'Wed, 19 Aug 2020 05:40:05 GMT'},
  'RetryAttempts': 0}}

In [21]:
print ('JobStatus')
print('----------')
from time import sleep

describe_response = sm_client.describe_transform_job(TransformJobName = transform_job_name)
job_run_status = describe_response['TransformJobStatus']
print (job_run_status)

while job_run_status not in ('Failed', 'Completed', 'Stopped'):
    describe_response = sm_client.describe_transform_job(TransformJobName = transform_job_name)
    job_run_status = describe_response['TransformJobStatus']
    print (job_run_status)
    sleep(30)

JobStatus
----------
InProgress
InProgress
InProgress
InProgress
InProgress
InProgress
InProgress
InProgress
InProgress
InProgress
InProgress
InProgress
InProgress
InProgress
InProgress
Completed
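The polling loop above has no timeout and does not show why a job failed. As an alternative sketch, the hypothetical `wait_for_transform_job` below adds a timeout and surfaces the `FailureReason` returned by `describe_transform_job` when the status is `Failed`. boto3 also provides a built-in waiter, `sm_client.get_waiter('transform_job_completed_or_stopped')`, which is a reasonable choice if you do not need custom reporting.

```python
import time


def wait_for_transform_job(sm_client, job_name, poll_seconds=30,
                           timeout_seconds=3600, sleep=time.sleep):
    """Poll a batch transform job until it reaches a terminal state.

    Returns the final DescribeTransformJob response. Raises RuntimeError if
    the job fails and TimeoutError if it is still running after
    `timeout_seconds`. `sleep` is injectable so the loop is easy to test.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        description = sm_client.describe_transform_job(TransformJobName=job_name)
        status = description["TransformJobStatus"]
        print(status)
        if status == "Failed":
            raise RuntimeError("Transform job {} failed: {}".format(
                job_name, description.get("FailureReason", "unknown reason")))
        if status in ("Completed", "Stopped"):
            return description
        if time.monotonic() >= deadline:
            raise TimeoutError("Transform job {} still {} after {} s".format(
                job_name, status, timeout_seconds))
        sleep(poll_seconds)
```

Usage: `wait_for_transform_job(sm_client, transform_job_name)`.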


### Inspect Batch Transform Results

In [28]:
s3_client = boto3.client('s3')
s3_client.download_file(Filename='data/test.json.out',
                        Bucket=bucket,
                        Key='{}/inference-results/{}/test.json.out'.format(prefix, model))
test_out_jsonline = read_data('data/test.json.out')
print(test_out_jsonline[:2])

[{'item_id': None, 'quantiles': {'0.1': [0.8739945142644148, 0.48590547682888907, 0.3495625716857571, 0.1881405425184468, 0.4120799826333963, 0.0887970798999555, 0.29328429101392084, -0.08477926123025958, -0.4279553426660536, -0.13218306572573635, -0.5106247937961955, -0.3484832050003658], '0.2': [1.1991201764912163, 1.1606712157313253, 0.9725920532317711, 0.8718231301022143, 0.823875705284148, 0.6846382892416263, 0.8540394642411657, 0.38673848116402554, 0.3677199915723022, 0.4581388006244055, 0.5384592225328939, 0.43358834066061624], '0.3': [1.479926211688551, 1.4538570885019322, 1.331539427810686, 1.2532369983806533, 1.4616627239928386, 1.276972950323639, 1.1884392369790824, 0.9636320417985633, 0.836005123010022, 0.8733827141425876, 1.2203907694739538, 0.7285377823797798], '0.4': [2.016449010383011, 1.9753166138911564, 1.6359777629053787, 1.9290972137991216, 1.8496370305016425, 1.7190062168178777, 1.9021125357740383, 1.5930178439096165, 1.6886204575683743, 1.5072896969056522, 1.87032

# 9. Clean up the resources

## (Optional) Delete the hosting resources

In [29]:
sm_client.delete_endpoint(EndpointName=endpoint_name)
sm_client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)
sm_client.delete_model(ModelName=model_name)
sm_client.delete_model(ModelName=model_name_bt)

{'ResponseMetadata': {'RequestId': 'c1a2e521-143e-420d-b37b-cfd83eaf6ce7',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': 'c1a2e521-143e-420d-b37b-cfd83eaf6ce7',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '0',
   'date': 'Wed, 19 Aug 2020 05:50:00 GMT'},
  'RetryAttempts': 2}}
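The cell above removes the endpoint, endpoint config, and models, but the ECR repository and the artifacts under the S3 prefix remain and continue to incur storage charges. If you no longer need them, a sketch like the following could remove them, assuming boto3 `s3` and `ecr` clients. Both helpers are hypothetical and destructive: `force=True` deletes the repository even if it still contains images, and the S3 helper deletes every key under the prefix, so pass the prefix with a trailing `/` to avoid matching sibling prefixes.

```python
def delete_s3_prefix(s3_client, bucket, prefix):
    """Delete every object under `prefix`; return the number of keys deleted."""
    deleted = 0
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if keys:
            # A list_objects_v2 page holds at most 1000 keys, which is also
            # the per-request limit of delete_objects.
            s3_client.delete_objects(Bucket=bucket, Delete={"Objects": keys})
            deleted += len(keys)
    return deleted


def delete_ecr_repository(ecr_client, repository_name):
    """Delete the ECR repository, including any images still stored in it."""
    ecr_client.delete_repository(repositoryName=repository_name, force=True)
```

For example: `delete_s3_prefix(boto3.client('s3'), bucket, prefix + '/')` and `delete_ecr_repository(boto3.client('ecr'), 'demo-sagemaker-multimodel-gluonts')`.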

# 10. Conclusion

Time series data is a highly valuable data source for many businesses, and the ability to forecast such data is critical to making optimal and accurate business decisions. Instead of using AWS built-in services or algorithms, this tutorial has demonstrated how to use Amazon SageMaker to build your own custom forecasting algorithms and deploy multiple forecast models to one SageMaker endpoint. This helps businesses compare state-of-the-art algorithms more efficiently and effectively, and makes it possible to take smarter decisions based on the forecasts.

We have covered other use cases related to time series data as well; you can find them below:

- Forecast air pollution with SageMaker processing and the AWS Open Data Registry by Eric Greene
- Automate sales projections with Amazon Forecast, QuickSight and AWS Lambda by Yoshiyuki Ito
- Detect DDoS Attacks with Kinesis Data Streams and SageMaker Isolation Forest by Seongmoon Kang