<h1>Multi Model Server Container</h1>

This notebook demonstrates how to build and use a custom Docker container for Amazon SageMaker that relies on the <strong>Multi Model Server (MMS)</strong> and <strong>sagemaker-inference-toolkit</strong> libraries to serve models through Amazon SageMaker endpoints.
We will also see how MMS allows deploying multiple models on a single endpoint, using the multi-model endpoints functionality of Amazon SageMaker Hosting (https://docs.aws.amazon.com/sagemaker/latest/dg/multi-model-endpoints.html).

Useful links:
- https://github.com/awslabs/multi-model-server/
- https://github.com/aws/sagemaker-inference-toolkit

We start by defining some variables: the current execution role, the ECR repository to which we will push the custom Docker container, and a default Amazon S3 bucket to be used by Amazon SageMaker.

In [None]:
import boto3
import sagemaker
from sagemaker import get_execution_role

ecr_namespace = 'sagemaker-serving-containers/'
prefix = 'multi-model-server-container'

ecr_repository_name = ecr_namespace + prefix
role = get_execution_role()
account_id = role.split(':')[4]
region = boto3.Session().region_name
sagemaker_session = sagemaker.session.Session()
bucket = sagemaker_session.default_bucket()

print(account_id)
print(region)
print(role)
print(bucket)
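The cell above reads the account ID from the fifth colon-separated field of the role ARN. As a minimal sketch of that step with some validation added, the helper below does the same split and rejects strings that do not look like an ARN. The function name and the example ARN are hypothetical and only serve as an illustration.

```python
def account_id_from_arn(arn: str) -> str:
    """Return the 12-digit AWS account ID embedded in an ARN."""
    # An ARN has the shape arn:partition:service:region:account-id:resource
    parts = arn.split(":")
    if len(parts) < 6 or parts[0] != "arn":
        raise ValueError(f"not a valid ARN: {arn!r}")
    account_id = parts[4]
    if not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError(f"ARN does not contain a 12-digit account ID: {arn!r}")
    return account_id


# Hypothetical role ARN, used only to exercise the helper
example_arn = "arn:aws:iam::123456789012:role/service-role/ExampleSageMakerRole"
print(account_id_from_arn(example_arn))
```

An alternative that does not depend on the ARN layout is `boto3.client('sts').get_caller_identity()['Account']`, which returns the account of the credentials in use.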

<h2>Create the custom serving container</h2>

Let's take a look at the Dockerfile which defines the statements for building our serving container:

In [None]:
! pygmentize ../docker/Dockerfile

At a high level, the Dockerfile specifies the following operations for building this container:

- Set two Docker labels: one to advertise multi-model support, and one to declare that the container binds to the port given in the SAGEMAKER_BIND_TO_PORT environment variable, if present
- Install libraries (including OpenJDK, since the MMS frontend is Java-based) and Python 3.6 through Miniconda
- Set a few environment variables, including PYTHONUNBUFFERED, which disables buffering of Python standard output (useful for logging)
- Install XGBoost (the ML framework of choice for this example)
- Install Multi Model Server (MMS) and SageMaker Inference Toolkit
- Copy a .tar.gz package named <strong>multi_model_serving-1.0.0.tar.gz</strong> into the WORKDIR
- Install this package
- Copy the serve.py file into the WORKDIR and use it as the Docker ENTRYPOINT
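To illustrate what the SAGEMAKER_BIND_TO_PORT label implies, here is a minimal sketch of how serving code might choose its listening port: use the variable when SageMaker sets it, otherwise fall back to 8080. The helper name `resolve_port` is hypothetical; in this container the SageMaker Inference Toolkit is expected to handle the port, so this is only an illustration of the behavior.

```python
import os

DEFAULT_PORT = 8080  # port SageMaker expects when no override is provided


def resolve_port(environ=None) -> int:
    """Return the port the model server should bind to."""
    # Accept an explicit mapping so the logic can be exercised without
    # touching the real process environment.
    environ = os.environ if environ is None else environ
    return int(environ.get("SAGEMAKER_BIND_TO_PORT", DEFAULT_PORT))


print(resolve_port({}))
print(resolve_port({"SAGEMAKER_BIND_TO_PORT": "9000"}))
```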

<h3>Build and push the container</h3>
We are now ready to build this container and push it to Amazon ECR. This task is executed using a shell script stored in the ../scripts/ folder. Let's take a look at this script and then execute it.

In [None]:
! pygmentize ../scripts/build_and_push.sh


The script builds the Docker container, creates the ECR repository if it does not exist, and finally pushes the container to that repository. The first build takes a few minutes; Docker then caches the build outputs and reuses them in subsequent builds.

In [None]:
%%capture
! ../scripts/build_and_push.sh $account_id $region $ecr_repository_name
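The build, create-if-missing, and push steps can be sketched as an ordered list of shell commands. This is not the content of build_and_push.sh; it is an assumption about what such a workflow typically looks like, written as a Python helper with a hypothetical name. The `../docker` build context and the local tag are also assumptions. The helper only returns the commands and does not run them.

```python
def build_and_push_commands(account_id, region, repository, tag="latest"):
    """Return, in order, shell commands for a typical build-and-push workflow."""
    registry = f"{account_id}.dkr.ecr.{region}.amazonaws.com"
    image = f"{registry}/{repository}:{tag}"
    return [
        # 1. Build the image (the ../docker build context is an assumption)
        f"docker build -t {repository} ../docker",
        # 2. Create the ECR repository only if it does not exist yet
        f"aws ecr describe-repositories --repository-names {repository} --region {region}"
        f" || aws ecr create-repository --repository-name {repository} --region {region}",
        # 3. Authenticate the Docker client against the ECR registry
        f"aws ecr get-login-password --region {region}"
        f" | docker login --username AWS --password-stdin {registry}",
        # 4. Tag the local image with the full ECR URI and push it
        f"docker tag {repository} {image}",
        f"docker push {image}",
    ]


# Placeholder account ID and region, used only for illustration
for command in build_and_push_commands(
    "123456789012", "eu-west-1",
    "sagemaker-serving-containers/multi-model-server-container",
):
    print(command)
```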

<h2>Deploy with Amazon SageMaker</h2>


<h3>Get the container URI</h3>
Once the container has been pushed to Amazon ECR, we are ready to deploy with Amazon SageMaker, which requires the ECR path of the serving container as a deployment parameter.

In [None]:
container_image_uri = '{0}.dkr.ecr.{1}.amazonaws.com/{2}:latest'.format(account_id, region, ecr_repository_name)
print(container_image_uri)

<h3>Prepare two models</h3>

We are going to deploy two different XGBoost models to our model server. We will need the serialized models and the inference scripts that we want to use.
We will store them in the current notebook folder, under <strong>model_and_code_1/</strong> and <strong>model_and_code_2/</strong>.

We use two different models to show that a single endpoint can host models that expect different features and different pre/post-processing code.

The first model is a regression model trained on the [Abalone data](https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/regression.html) originally from the [UCI data repository](https://archive.ics.uci.edu/ml/datasets/abalone).
For further information, please refer to this [example](https://github.com/awslabs/amazon-sagemaker-examples/blob/master/introduction_to_amazon_algorithms/xgboost_abalone/xgboost_abalone.ipynb).

The second model is a binary classification model built by following this workshop: https://github.com/aws-samples/amazon-sagemaker-build-train-deploy

In [None]:
! rm -rf ./model_and_code_1/.ipynb_checkpoints
! rm -rf ./model_and_code_1/code/.ipynb_checkpoints
! rm -rf ./model_and_code_2/.ipynb_checkpoints
! rm -rf ./model_and_code_2/code/.ipynb_checkpoints

! tar -C ./model_and_code_1/ -cvzf model1.tar.gz ./
! tar -C ./model_and_code_2/ -cvzf model2.tar.gz ./
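The same packaging step can be sketched in pure Python with the standard `tarfile` module, which also makes it easy to skip `.ipynb_checkpoints` folders and to inspect the archive layout afterwards. The helper name and the file names in the demo (`xgboost-model`, `code/predictor.py`) are hypothetical stand-ins for the real contents of model_and_code_1/.

```python
import os
import posixpath
import tarfile
import tempfile

EXCLUDED_DIR = ".ipynb_checkpoints"


def _skip_checkpoints(tarinfo):
    # Returning None tells tarfile to leave this member (and, for a
    # directory, everything below it) out of the archive.
    if EXCLUDED_DIR in tarinfo.name.split("/"):
        return None
    return tarinfo


def make_model_archive(source_dir, archive_path):
    """Pack source_dir into a .tar.gz rooted at './' and return its member names."""
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(source_dir, arcname=".", filter=_skip_checkpoints)
    with tarfile.open(archive_path, "r:gz") as tar:
        return sorted(posixpath.normpath(name) for name in tar.getnames())


# Demo on a throwaway directory with hypothetical file names
with tempfile.TemporaryDirectory() as workdir:
    source = os.path.join(workdir, "model_and_code")
    os.makedirs(os.path.join(source, "code", EXCLUDED_DIR))
    for relative_path in ("xgboost-model", "code/predictor.py",
                          "code/.ipynb_checkpoints/predictor-checkpoint.py"):
        with open(os.path.join(source, *relative_path.split("/")), "w") as handle:
            handle.write("placeholder")
    names = make_model_archive(source, os.path.join(workdir, "model.tar.gz"))
    print(names)
```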

<h3>Deploy a single model</h3>

In [None]:
s3_model_path = 's3://{0}/{1}/model/model1.tar.gz'.format(bucket, prefix)
!aws s3 cp model1.tar.gz {s3_model_path}

In [None]:
from sagemaker.model import Model
from time import gmtime, strftime

model_name = 'multi-model-server-model-' + strftime("%Y-%m-%d-%H-%M-%S", gmtime())

model = Model(name = model_name,
              model_data = s3_model_path,
              image_uri = container_image_uri,
              role=role,
              env = {
                  'SAGEMAKER_PROGRAM': 'predictor'
              },
              predictor_cls = sagemaker.predictor.Predictor,
              # sagemaker_session=sagemaker_session  # left commented out for local mode; uncomment to deploy on SageMaker
              )

<strong>Note:</strong> the environment variable SAGEMAKER_PROGRAM is used to specify the name of the custom inference script.
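The value `predictor` set in the `env` dictionary above refers to a Python module packaged with the model. The actual script is not shown in this notebook, so the following is only a sketch of what such a module could look like, assuming it follows the handler function names conventionally used with the SageMaker Inference Toolkit (`model_fn`, `input_fn`, `predict_fn`, `output_fn`). The placeholder model that sums the features is purely hypothetical; a real script would load the serialized XGBoost model from `model_dir`.

```python
import csv
import io


def model_fn(model_dir):
    """Load the model; here a hypothetical stand-in that sums the features."""
    return lambda row: sum(row)


def input_fn(request_body, content_type="text/csv"):
    """Parse a CSV payload into a list of rows of floats."""
    if content_type != "text/csv":
        raise ValueError(f"unsupported content type: {content_type}")
    if isinstance(request_body, (bytes, bytearray)):
        request_body = request_body.decode("utf-8")
    reader = csv.reader(io.StringIO(request_body))
    return [[float(value) for value in row] for row in reader if row]


def predict_fn(data, model):
    """Apply the model to every parsed row."""
    return [model(row) for row in data]


def output_fn(predictions, accept="text/csv"):
    """Serialize the predictions as a single CSV line."""
    return ",".join(str(prediction) for prediction in predictions)


model = model_fn("/opt/ml/model")
print(output_fn(predict_fn(input_fn("1,2\n3,4"), model)))
```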

In [None]:
endpoint_name = 'multi-model-server-single-ep-' + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
print(endpoint_name)
pred = model.deploy(initial_instance_count=1,
                    instance_type='local',
                    endpoint_name=endpoint_name)

In [None]:
from sagemaker.serializers import CSVSerializer

# Send the payload as text/csv
pred.serializer = CSVSerializer()
item = '77,33,143.0,101,212.2,102,104.9,120,15.3,4,5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1'
pred.predict(item)

In [None]:
pred.delete_endpoint()
pred.delete_model()

<h3>Deploy multiple models</h3>

In [None]:
model_data_prefix = 's3://{0}/{1}/modeldata'.format(bucket, prefix)

s3_model_1_path = model_data_prefix + '/model1.tar.gz'
!aws s3 cp model1.tar.gz {s3_model_1_path}
s3_model_2_path = model_data_prefix + '/model2.tar.gz'
!aws s3 cp model2.tar.gz {s3_model_2_path}

In [None]:
from time import gmtime, strftime
from sagemaker.multidatamodel import MultiDataModel
from sagemaker.model import Model

model_name = 'multi-model-server-multidatamodel-' + strftime("%Y-%m-%d-%H-%M-%S", gmtime())

model = Model(name = model_name,
              model_data = '',
              image_uri = container_image_uri,
              role=role,
              env = {
                  'SAGEMAKER_PROGRAM': 'predictor'
              },
              predictor_cls = sagemaker.predictor.Predictor,
              sagemaker_session=sagemaker_session)

multi_model = MultiDataModel(name = model_name,
                             model_data_prefix = model_data_prefix,
                             model = model,
                             sagemaker_session=sagemaker_session)

<strong>Note:</strong> the environment variable SAGEMAKER_PROGRAM is used to specify the name of the custom inference script.

In [None]:
endpoint_name = 'multi-model-server-ep-' + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
print(endpoint_name)

pred = multi_model.deploy(initial_instance_count=1,
                          instance_type='ml.m5.xlarge',
                          endpoint_name=endpoint_name)

<h3>Executing inferences</h3>
Once the multi-model endpoint is ready, we can invoke either model1 or model2 by changing the target_model argument of the predict() call.

In [None]:
from sagemaker.serializers import CSVSerializer

# Send the payload as text/csv
pred.serializer = CSVSerializer()

In [None]:
item = '77,33,143.0,101,212.2,102,104.9,120,15.3,4,5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1'
model_archive = '/model1.tar.gz'
pred.predict(item, target_model=model_archive)

In [None]:
item = '0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,73.0,79.0,32.0,27.0,45.0,48.0,13.0,62.0'
model_archive = '/model2.tar.gz'
pred.predict(item, target_model=model_archive)
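The `target_model` argument used above corresponds to the `TargetModel` parameter of the InvokeEndpoint API, so a client without the SageMaker Python SDK can select a model in the same way through boto3. The sketch below separates building the request arguments from sending them; both helper names are hypothetical, and the call itself is left commented out because it needs a live endpoint.

```python
def build_invoke_args(endpoint_name, target_model, payload):
    """Assemble the keyword arguments for a multi-model invoke_endpoint call."""
    return {
        "EndpointName": endpoint_name,
        "ContentType": "text/csv",
        "TargetModel": target_model,  # archive to route the request to
        "Body": payload,
    }


def invoke(endpoint_name, target_model, payload):
    """Send the request and return the response body as text."""
    import boto3  # imported lazily so build_invoke_args has no AWS dependency

    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        **build_invoke_args(endpoint_name, target_model, payload)
    )
    return response["Body"].read().decode("utf-8")


# Requires a deployed endpoint, so it is not executed here:
# print(invoke(endpoint_name, "/model2.tar.gz", item))
print(build_invoke_args("example-endpoint", "/model1.tar.gz", "1.0,2.0"))
```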

In [None]:
pred.delete_endpoint()
pred.delete_model()