<h1>Multi Model Server Container for Differential Deep Learning</h1>

To install and start Jupyter:
sudo /opt/conda/bin/conda install ipython jupyter 
/opt/conda/bin/jupyter notebook --ip=0.0.0.0 --port=8080 --no-browser

This notebook demonstrates how to build and use a custom Docker container for serving with Amazon SageMaker. The container relies on the <strong>Multi Model Server (MMS)</strong> and <strong>sagemaker-inference-toolkit</strong> libraries to serve models through Amazon SageMaker endpoints.
We will also see how MMS makes it possible to deploy multiple models on a single endpoint, using the multi-model endpoints functionality of Amazon SageMaker Hosting (https://docs.aws.amazon.com/sagemaker/latest/dg/multi-model-endpoints.html).

Useful links:
- https://github.com/awslabs/multi-model-server/
- https://github.com/aws/sagemaker-inference-toolkit

We start by defining some variables: the current execution role, the ECR repository to which we will push the custom Docker container, and the default Amazon S3 bucket used by Amazon SageMaker.

The whole deployment takes about 10-12 minutes.

In [3]:
import boto3
import sagemaker
from sagemaker import get_execution_role

isGPU = True
ecr_namespace = 'sagemaker-serving-containers/'
prefix = 'diffdl-container'

ecr_repository_name = ecr_namespace + prefix + ('-gpu' if isGPU else '')
role = get_execution_role()
account_id = role.split(':')[4]
region = boto3.Session().region_name
sagemaker_session = sagemaker.session.Session()
bucket = sagemaker_session.default_bucket()

print(account_id)
print(region)
print(role)
print(bucket)
print(ecr_repository_name)

785577973223
us-east-1
arn:aws:iam::785577973223:role/service-role/AmazonSageMaker-ExecutionRole-20210715T110490
sagemaker-us-east-1-785577973223
sagemaker-serving-containers/diffdl-container-gpu


<h2>Build and push the container</h2>
We are now ready to build the container and push it to Amazon ECR. This task is performed by a shell script stored in the ../scripts/ folder.

The script builds the Docker container, creates the ECR repository if it does not exist, and finally pushes the container to that repository. The first build takes a few minutes; subsequent builds are faster because Docker caches the build outputs and reuses them.
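
The "create the repository if it does not exist" step can be illustrated in Python with boto3. This is only a sketch of the idea, not the contents of `build_and_push.sh` (which is a shell script and is not reproduced here); the helper name `ensure_ecr_repository` is hypothetical.

```python
def ensure_ecr_repository(ecr_client, repository_name):
    """Return the repository URI, creating the repository when it is missing.

    `ecr_client` is expected to behave like boto3.client("ecr").
    The function name is hypothetical and not part of the original script.
    """
    try:
        response = ecr_client.describe_repositories(
            repositoryNames=[repository_name]
        )
        return response["repositories"][0]["repositoryUri"]
    except ecr_client.exceptions.RepositoryNotFoundException:
        # The repository does not exist yet, so create it.
        response = ecr_client.create_repository(repositoryName=repository_name)
        return response["repository"]["repositoryUri"]


# Example usage (requires AWS credentials, so it is left commented out):
# import boto3
# ecr = boto3.client("ecr", region_name=region)
# print(ensure_ecr_repository(ecr, ecr_repository_name))
```

Passing the client in as an argument keeps the helper easy to exercise with a stub instead of a real AWS account.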

In [58]:
! ../scripts/build_and_push.sh $account_id $region $ecr_repository_name $isGPU

running sdist
running egg_info
writing src/multi_model_serving.egg-info/PKG-INFO
writing dependency_links to src/multi_model_serving.egg-info/dependency_links.txt
writing top-level names to src/multi_model_serving.egg-info/top_level.txt
reading manifest file 'src/multi_model_serving.egg-info/SOURCES.txt'
writing manifest file 'src/multi_model_serving.egg-info/SOURCES.txt'

running check

creating multi_model_serving-1.0.0
creating multi_model_serving-1.0.0/src
creating multi_model_serving-1.0.0/src/multi_model_serving
creating multi_model_serving-1.0.0/src/multi_model_serving.egg-info
copying files to multi_model_serving-1.0.0...
copying setup.py -> multi_model_serving-1.0.0
copying src/multi_model_serving/__init__.py -> multi_model_serving-1.0.0/src/multi_model_serving
copying src/multi_model_serving/blackscholes.py -> multi_model_serving-1.0.0/src/multi_model_serving
copying src/multi_model_serving/default_inference_handler.py -> multi_model_serving-1.0.0/src/multi_model_serving
copy

<h2>Deploy with Amazon SageMaker</h2>


<h3>Get the container URI</h3>
Once the container has been pushed to Amazon ECR, we are ready to deploy with Amazon SageMaker. Deployment requires the ECR URI of the serving container as a parameter.

In [4]:
container_image_uri = '{0}.dkr.ecr.{1}.amazonaws.com/{2}:latest'.format(account_id, region, ecr_repository_name)
print(container_image_uri)

785577973223.dkr.ecr.us-east-1.amazonaws.com/sagemaker-serving-containers/diffdl-container-gpu:latest


To test the image locally, run `docker run -it -p 3030:8080 --rm 785577973223.dkr.ecr.us-east-1.amazonaws.com/sagemaker-serving-containers/diffdl-container:latest`. This maps port 8080 of the container to port 3030 on the host.
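
With the container running locally, a quick health check can be sketched as below. SageMaker serving containers are expected to answer `GET /ping` on port 8080, which the `docker run` command maps to host port 3030; the helper names `local_url` and `container_is_healthy` are hypothetical, and the check assumes the container has finished starting.

```python
import urllib.error
import urllib.request


def local_url(path, host="localhost", port=3030):
    """Build a URL for the locally running container (port 3030 on the host)."""
    return f"http://{host}:{port}/{path.lstrip('/')}"


def container_is_healthy(host="localhost", port=3030, timeout=5.0):
    """Return True if GET /ping answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(
            local_url("ping", host, port), timeout=timeout
        ) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx answer.
        return False


# Example usage once `docker run` is up:
# print(container_is_healthy())
```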

<h3>Prepare two models</h3>

We are going to deploy two different XGBoost models to our model server. We will need the serialized models and the inference scripts that we want to use.
We will store them in the current notebook folder, under <strong>model_and_code_1/</strong> and <strong>model_and_code_2/</strong>.

Using two different models shows that a single endpoint can serve models that expect different features and use different pre- and post-processing code.

The first model is a regression model trained on the [Abalone data](https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/regression.html), originally from the [UCI data repository](https://archive.ics.uci.edu/ml/datasets/abalone).
For further information, please refer to this [example](https://github.com/awslabs/amazon-sagemaker-examples/blob/master/introduction_to_amazon_algorithms/xgboost_abalone/xgboost_abalone.ipynb).

The second model is a binary classification model built by following this workshop: https://github.com/aws-samples/amazon-sagemaker-build-train-deploy

In [5]:
! rm -rf ./model_and_code_1/.ipynb_checkpoints
! rm -rf ./model_and_code_1/code/.ipynb_checkpoints
! rm -rf ./model_and_code_2/.ipynb_checkpoints
! rm -rf ./model_and_code_2/code/.ipynb_checkpoints

! tar -C ./model_and_code_1/ -cvzf model1.tar.gz ./
! tar -C ./model_and_code_2/ -cvzf model2.tar.gz ./

./
./xgboost-model
./code/
./code/predictor.py
./
./model.bin
./code/
./code/predictor.py
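
The archive layout shown above (the serialized model at the root and the inference script under `code/`) can also be produced from Python. The following is a minimal sketch using the standard `tarfile` module; the function name `pack_model_dir` is hypothetical. Unlike the `tar` command above, it stores members without the leading `./`, which extracts to the same layout.

```python
import tarfile
from pathlib import Path


def pack_model_dir(source_dir, archive_path):
    """Pack source_dir into a .tar.gz and return the archived member names.

    Notebook checkpoint folders are skipped, mirroring the `rm -rf` cleanup
    performed before the `tar` commands.
    """
    source = Path(source_dir)
    with tarfile.open(archive_path, "w:gz") as archive:
        for path in sorted(source.rglob("*")):
            relative = path.relative_to(source)
            if ".ipynb_checkpoints" in relative.parts:
                continue
            # recursive=False because rglob already walks the whole tree.
            archive.add(path, arcname=relative.as_posix(), recursive=False)
    with tarfile.open(archive_path, "r:gz") as archive:
        return archive.getnames()


# Example usage:
# print(pack_model_dir("./model_and_code_1", "model1.tar.gz"))
```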


<h3>Deploy multiple models</h3>

In [6]:
model_data_prefix = 's3://{0}/{1}/modeldata'.format(bucket, prefix)

s3_model_1_path = model_data_prefix + '/model1.tar.gz'
!aws s3 cp model1.tar.gz {s3_model_1_path}
s3_model_2_path = model_data_prefix + '/model2.tar.gz'
!aws s3 cp model2.tar.gz {s3_model_2_path}

upload: ./model1.tar.gz to s3://sagemaker-us-east-1-785577973223/diffdl-container/modeldata/model1.tar.gz
upload: ./model2.tar.gz to s3://sagemaker-us-east-1-785577973223/diffdl-container/modeldata/model2.tar.gz


In [9]:
from time import gmtime, strftime
from sagemaker.multidatamodel import MultiDataModel
from sagemaker.model import Model

model_name = 'multi-model-server-multidatamodel-' + strftime("%Y-%m-%d-%H-%M-%S", gmtime())

model = Model(name = model_name,
              model_data = '',
              image_uri = container_image_uri,
              role=role,
              env = {
                  'SAGEMAKER_PROGRAM': 'predictor'
              },
              #predictor_cls = sagemaker.predictor.Predictor,
              sagemaker_session=sagemaker_session)

multi_model = MultiDataModel(name = model_name,
                             model_data_prefix = model_data_prefix,
                             model = model,
                             sagemaker_session=sagemaker_session)

<strong>Note:</strong> the environment variable SAGEMAKER_PROGRAM is used to specify the name of the custom inference script (here, predictor).

In [None]:
multi_endpoint_name = 'multi-model-server-ep-' + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
multi_endpoint_name = multi_endpoint_name + ('-gpu' if isGPU else '')
print(multi_endpoint_name)

# ml.m5.xlarge = 4vCPU/16GB; ml.g4dn.xlarge = 4vCPU/16GB.
instance = 'ml.g4dn.xlarge' if isGPU else 'ml.m5.xlarge'
print(instance)
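# GPU instances could not be used for multi-model endpoints when this notebook was written,
# so the plain Model is deployed on GPU and the MultiDataModel on CPU.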
model_to_deploy = model if isGPU else multi_model
pred = model_to_deploy.deploy(initial_instance_count=1,
                          instance_type=instance,
                          endpoint_name=multi_endpoint_name)

multi-model-server-ep-2021-07-20-12-02-45-gpu
ml.g4dn.xlarge
-------
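
The dashes above are printed while the deployment is in progress. If the endpoint status needs to be checked from another process, a polling loop along the following lines could be used. It is a sketch only: `wait_for_endpoint` is a hypothetical helper, the client is assumed to behave like `boto3.client("sagemaker")`, and only the `InService` and `Failed` statuses are treated as final.

```python
import time

FINAL_STATUSES = ("InService", "Failed")


def wait_for_endpoint(sm_client, endpoint_name, poll_seconds=30,
                      timeout_seconds=1800, sleep=time.sleep):
    """Poll describe_endpoint until the endpoint reaches a final status.

    Returns the final status, or raises TimeoutError once timeout_seconds
    of waiting have elapsed.
    """
    waited = 0
    while True:
        description = sm_client.describe_endpoint(EndpointName=endpoint_name)
        status = description["EndpointStatus"]
        if status in FINAL_STATUSES:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(
                f"{endpoint_name} still {status} after {waited} seconds"
            )
        sleep(poll_seconds)
        waited += poll_seconds


# Example usage (requires AWS credentials):
# import boto3
# print(wait_for_endpoint(boto3.client("sagemaker"), multi_endpoint_name))
```

The default 30-minute timeout leaves headroom over the 10-12 minutes that the deployment is expected to take.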

<h3>Executing inferences</h3>

In [44]:
from IPython.display import Markdown as md
md("Go to `scripts` and execute `python3 test_endpoint.py -e {}`".format(multi_endpoint_name))

Go to `scripts` and execute `python3 test_endpoint.py -e multi-model-server-ep-2021-07-16-17-28-36`

--------------------------------------

Once the multi-model endpoint is ready, we can invoke either model1 or model2 by changing the target_model argument of the predict() call.

In [9]:
from sagemaker.predictor import Predictor
pred = Predictor(multi_endpoint_name)
pred.serializer = sagemaker.serializers.CSVSerializer()

In [10]:
item = '77,33,143.0,101,212.2,102,104.9,120,15.3,4,5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1'
model_archive = '/model1.tar.gz'
pred.predict(item, target_model=model_archive)
#pred.predict(item)

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (500) from model with message "No module named 'xgboost'
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/sagemaker_inference/transformer.py", line 110, in transform
    self.validate_and_initialize(model_dir=model_dir)
  File "/usr/local/lib/python3.6/dist-packages/sagemaker_inference/transformer.py", line 158, in validate_and_initialize
    self._model = self._model_fn(model_dir)
  File "/opt/ml/models/e4c005b7fa0c40203fa9b3bbcf0b28cf/model/code/predictor.py", line 6, in model_fn
    model = pkl.load(open(model_file, 'rb'))
ModuleNotFoundError: No module named 'xgboost'
". See https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#logEventViewer:group=/aws/sagemaker/Endpoints/multi-model-server-ep-2021-07-16-14-25-28 in account 785577973223 for more information.
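
The traceback above indicates that `pkl.load` failed inside `model_fn` because the `xgboost` package is not importable in the container: unpickling an object requires the module that defines its class. Presumably the fix is to install `xgboost` in the container image. Independently of that, the inference script could check its dependencies up front and report them clearly. A minimal sketch, where `missing_modules` is a hypothetical helper that accepts top-level module names only:

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be imported here."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


# Possible use at the top of model_fn in predictor.py:
# missing = missing_modules(["xgboost"])
# if missing:
#     raise RuntimeError(f"Container image lacks required packages: {missing}")
```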

In [None]:
item = '0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,73.0,79.0,32.0,27.0,45.0,48.0,13.0,62.0'
model_archive = '/model2.tar.gz'
pred.predict(item, target_model=model_archive)
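
The same request can be sent without the SageMaker Python SDK, using the `invoke_endpoint` call of the boto3 `sagemaker-runtime` client, which accepts a `TargetModel` parameter for multi-model endpoints. The sketch below assumes CSV input as in the cells above; `invoke_target_model` is a hypothetical helper name.

```python
def invoke_target_model(runtime_client, endpoint_name, target_model, csv_row):
    """Send one CSV row to a specific model archive behind the endpoint.

    `runtime_client` is expected to behave like
    boto3.client("sagemaker-runtime"). The decoded response body is returned.
    """
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        TargetModel=target_model,  # e.g. '/model1.tar.gz' or '/model2.tar.gz'
        Body=csv_row.encode("utf-8"),
    )
    return response["Body"].read().decode("utf-8")


# Example usage (requires AWS credentials and a running endpoint):
# import boto3
# runtime = boto3.client("sagemaker-runtime")
# print(invoke_target_model(runtime, multi_endpoint_name, "/model2.tar.gz", item))
```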

In [None]:
pred.delete_endpoint()
pred.delete_model()