<h1>Nginx Gunicorn Flask Serving Container</h1>

This notebook demonstrates how to build and use a custom Docker container for serving with Amazon SageMaker. The container relies on the serving stack provided by the sagemaker-containers library (Nginx, Gunicorn and Flask). Reference documentation is available at https://github.com/aws/sagemaker-containers.

We will use this serving stack to serve an XGBoost model created from the Customer Churn SageMaker example available <a href="https://github.com/awslabs/amazon-sagemaker-examples/blob/master/introduction_to_applying_machine_learning/xgboost_customer_churn/xgboost_customer_churn.ipynb">here</a>.

We start by defining a few variables: the current execution role, the ECR repository to which we will push the custom Docker container, and the default Amazon S3 bucket to be used by Amazon SageMaker.

In [None]:
import boto3
import sagemaker
from sagemaker import get_execution_role

ecr_namespace = 'sagemaker-serving-containers/'
prefix = 'nginx-gunicorn-flask-container'

ecr_repository_name = ecr_namespace + prefix
role = get_execution_role()
account_id = role.split(':')[4]
region = boto3.Session().region_name
sagemaker_session = sagemaker.session.Session()
bucket = sagemaker_session.default_bucket()

print(account_id)
print(region)
print(role)
print(bucket)

Let's take a look at the Dockerfile which defines the statements for building our serving container:

In [None]:
! pygmentize ../docker/Dockerfile

At a high level, the Dockerfile specifies the following operations for building this container:

<ul>
    <li>Start from Python 3.7.3</li>
    <li>Install nginx and other dependencies.</li>
    <li>Install XGBoost.</li>
    <li>Set a few environment variables, including PYTHONUNBUFFERED, which disables buffering of Python standard output (useful for logging).</li>
    <li>Copy a .tar.gz package named <strong>nginx_gunicorn_flask_serving-1.0.0.tar.gz</strong> into the WORKDIR.</li>
    <li>Install this package.</li>
    <li>Finally, set the value of the environment variable <strong>SAGEMAKER_SERVING_MODULE</strong> to a Python module defined in the package we installed.</li>
</ul>

<h3>Build and push the container</h3>
We are now ready to build this container and push it to Amazon ECR. This task is executed using a shell script stored in the ../scripts/ folder. Let's take a look at this script and then execute it.

In [None]:
! pygmentize ../scripts/build_and_push.sh

<h3>--------------------------------------------------------------------------------------------------------------------</h3>

The script builds the Docker container, creates the ECR repository if it does not exist, and finally pushes the container to that repository. The first build takes a few minutes; after that, Docker caches the build output and reuses it for subsequent builds.

In [None]:
! ../scripts/build_and_push.sh $account_id $region $ecr_repository_name

<h3>Hosting with Amazon SageMaker</h3>

Once the container has been pushed to Amazon ECR, we are ready to deploy with Amazon SageMaker. Deployment requires the ECR path of the Docker container used for serving as a parameter.

In [None]:
container_image_uri = '{0}.dkr.ecr.{1}.amazonaws.com/{2}:latest'.format(account_id, region, ecr_repository_name)
print(container_image_uri)

In addition, we need to provide XGBoost model artifacts. For the purpose of this example, we will deploy a regression model trained on the [Abalone data](https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/regression.html) originally from the [UCI data repository](https://archive.ics.uci.edu/ml/datasets/abalone).
For further information, please refer to this [example](https://github.com/awslabs/amazon-sagemaker-examples/blob/master/introduction_to_amazon_algorithms/xgboost_abalone/xgboost_abalone.ipynb).

In [None]:
s3_model_path = 's3://{0}/{1}/model/model.tar.gz'.format(bucket, prefix)
!aws s3 cp model.tar.gz {s3_model_path}
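
The upload cell above assumes that a `model.tar.gz` archive already exists in the working directory. SageMaker expects model artifacts as a gzip-compressed tar archive, so a minimal sketch for packaging a local model file could look like the following. The file name `xgboost-model` and the helper `package_model` are hypothetical, and the file content here is only a placeholder standing in for a trained model.

```python
import os
import tarfile
import tempfile


def package_model(model_file: str, archive_path: str) -> None:
    """Write model_file into a gzip-compressed tar archive, at the archive root."""
    with tarfile.open(archive_path, 'w:gz') as tar:
        # arcname drops the local directory so the file sits at the top level.
        tar.add(model_file, arcname=os.path.basename(model_file))


# Demonstration in a temporary directory with a placeholder file.
work_dir = tempfile.mkdtemp()
model_file = os.path.join(work_dir, 'xgboost-model')  # hypothetical file name
with open(model_file, 'wb') as f:
    f.write(b'placeholder model bytes')

archive_path = os.path.join(work_dir, 'model.tar.gz')
package_model(model_file, archive_path)

# List the archive members to confirm the layout.
with tarfile.open(archive_path, 'r:gz') as tar:
    print(tar.getnames())
```

Keeping the model file at the root of the archive means the loading code can find it directly under the model directory once the archive is extracted in the container.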

To deploy our model, in addition to the container we need to provide the model location in Amazon S3 and the code that will be used to load the model and run inference.
With sagemaker-containers, the container can dynamically load the inference code from a package saved to Amazon S3. The Amazon SageMaker Python SDK is responsible for creating this package (a .tar.gz archive) and uploading it to S3 (**code_location**), given the entry point (**entry_point**) and a source directory (**source_dir**).

In addition, we are passing a dictionary of variables that will be set as environment variables at run-time by the sagemaker-containers library.

In [None]:
from sagemaker.model import FrameworkModel
from time import gmtime, strftime

model_name = 'nginx-gunicorn-flask-serving-' + strftime("%Y-%m-%d-%H-%M-%S", gmtime())

s3_model_path = 's3://{0}/{1}/model/model.tar.gz'.format(bucket, prefix)
s3_code_location = 's3://{0}/{1}/code'.format(bucket, prefix)

model = FrameworkModel(name = model_name,
                       model_data = s3_model_path,
                       image_uri = container_image_uri,
                       role=role,
                       entry_point='predictor.py',
                       source_dir='source_dir/',
                       env = {
                           'SAGEMAKER_USE_NGINX': 'true',
                           'SAGEMAKER_MODEL_SERVER_WORKERS': '1',
                           'SAGEMAKER_MODEL_SERVER_TIMEOUT': '300'
                       },
                       predictor_cls = sagemaker.predictor.Predictor,
                       code_location=s3_code_location,
                       #sagemaker_session=sagemaker_session #comment this line for local mode.
                      )
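
The content of `predictor.py` is not shown in this notebook. As a rough sketch only, serving stacks built on sagemaker-containers commonly look up hook functions named `model_fn`, `input_fn`, `predict_fn` and `output_fn` in the entry point; whether the package installed in this container follows the same convention is an assumption. The `MeanModel` class below is a made-up stand-in so that the example runs without XGBoost; a real `model_fn` would load the model file from the directory it receives.

```python
import csv
import io


class MeanModel:
    """Stand-in for a trained model (hypothetical); returns the mean of each row."""

    def predict(self, rows):
        return [sum(row) / len(row) for row in rows]


def model_fn(model_dir):
    # Assumption: a real implementation would load the artifact found in model_dir.
    return MeanModel()


def input_fn(input_data, content_type):
    """Decode a CSV request body into a list of rows of floats."""
    if content_type != 'text/csv':
        raise ValueError('unsupported content type: {0}'.format(content_type))
    if isinstance(input_data, bytes):
        input_data = input_data.decode('utf-8')
    reader = csv.reader(io.StringIO(input_data))
    return [[float(value) for value in row] for row in reader if row]


def predict_fn(data, model):
    return model.predict(data)


def output_fn(prediction, accept):
    """Encode predictions as one value per line and report the content type."""
    body = '\n'.join(str(p) for p in prediction)
    return body, 'text/csv'


# Local round trip through the four hooks, without any container.
demo_model = model_fn('/opt/ml/model')
demo_rows = input_fn(b'1,2,3\n4,5,6\n', 'text/csv')
print(output_fn(predict_fn(demo_rows, demo_model), 'text/csv'))
```

Splitting the work into four small functions keeps model loading (done once per worker) separate from per-request decoding, prediction and encoding.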

Note that the **SAGEMAKER_USE_NGINX** variable enables or disables nginx as needed: if nginx is disabled, the inference endpoints are exposed directly by gunicorn.
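
Environment variables are always strings, which is why the worker count and timeout are passed as `'1'` and `'300'` in the `env` dictionary. The sketch below illustrates how code inside a container could turn these three variables into typed settings. It is not the sagemaker-containers implementation; the helper name `read_serving_config` and the fallback defaults are assumptions chosen for illustration.

```python
import os


def read_serving_config(environ=os.environ):
    """Parse the serving variables into typed values (hypothetical helper).

    The fallback defaults are illustrative, not the library's own defaults.
    """
    return {
        'use_nginx': environ.get('SAGEMAKER_USE_NGINX', 'true').lower() == 'true',
        'workers': int(environ.get('SAGEMAKER_MODEL_SERVER_WORKERS', '1')),
        'timeout_seconds': int(environ.get('SAGEMAKER_MODEL_SERVER_TIMEOUT', '60')),
    }


# Same values as in the env dictionary passed to FrameworkModel.
serving_config = read_serving_config({
    'SAGEMAKER_USE_NGINX': 'true',
    'SAGEMAKER_MODEL_SERVER_WORKERS': '1',
    'SAGEMAKER_MODEL_SERVER_TIMEOUT': '300',
})
print(serving_config)
```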

In [None]:
endpoint_name = 'nginx-gunicorn-flask-serving-ep-' + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
print(endpoint_name)

pred = model.deploy(initial_instance_count=1,
                    instance_type='local',
                    endpoint_name=endpoint_name)

In [None]:
from sagemaker.serializers import CSVSerializer

# Send request payloads as CSV (Content-Type: text/csv).
pred.serializer = CSVSerializer()

item = '77,33,143.0,101,212.2,102,104.9,120,15.3,4,5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1'
pred.predict(item)
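
With the default deserializer, `Predictor.predict` returns the raw response body as bytes. The format of that body depends on what the inference code emits. Assuming a CSV response (comma- or newline-separated numbers), a small sketch for turning it into Python floats could look like this. The helper `parse_csv_scores` and the sample payloads are made up and do not represent real predictions from this endpoint.

```python
def parse_csv_scores(payload: bytes):
    """Convert a CSV response body into a list of floats (hypothetical helper)."""
    text = payload.decode('utf-8').strip()
    # Treat newlines like commas so both one-per-line and single-line CSV work.
    return [float(value) for value in text.replace('\n', ',').split(',') if value.strip()]


# Made-up payloads for illustration only.
print(parse_csv_scores(b'0.25\n'))
print(parse_csv_scores(b'0.1,0.9'))
```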

In [None]:
pred.delete_endpoint()