## Build a Custom Inference Container

This notebook demonstrates how to build and use a custom Docker container for serving models through Amazon SageMaker endpoints. The container relies on the <strong>sagemaker-inference-toolkit</strong> library.


Useful links:
- https://github.com/awslabs/multi-model-server/
- https://github.com/aws/sagemaker-inference-toolkit

- https://github.com/aws-samples/amazon-sagemaker-mask-r-cnn-pytorch/blob/master/MaskRCNN_bring_your_own.ipynb

We start by defining some variables: the current execution role, the ECR repository to which we will push the custom Docker container, and a default Amazon S3 bucket for Amazon SageMaker to use.

In [19]:
import boto3
import sagemaker
from sagemaker import get_execution_role

ecr_namespace = 'sagemaker-serving-containers/'
prefix = 'medical-image-server-container'

ecr_repository_name = ecr_namespace + prefix
role = get_execution_role()
account_id = role.split(':')[4]
region = boto3.Session().region_name
sagemaker_session = sagemaker.session.Session()
bucket = sagemaker_session.default_bucket()

print(account_id)
print(region)
print(role)
print(bucket)

707754867495
us-east-1
arn:aws:iam::707754867495:role/SageMakerAPIExecutionRoleName-707754867495
sagemaker-us-east-1-707754867495
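
The `role.split(':')[4]` expression above works because an ARN has a fixed layout (`arn:partition:service:region:account-id:resource`). A minimal sketch of the same idea with basic validation is shown below. The helper name `account_id_from_arn` is hypothetical and is not part of boto3 or the SageMaker SDK.

```python
def account_id_from_arn(arn: str) -> str:
    """Return the 12-digit account ID embedded in an ARN.

    Hypothetical helper for illustration only.
    """
    # An ARN has the form arn:partition:service:region:account-id:resource.
    # Split at most 5 times so colons inside the resource part are preserved.
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"Not a valid ARN: {arn!r}")
    account_id = parts[4]
    if not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError(f"ARN does not contain a 12-digit account ID: {arn!r}")
    return account_id


print(account_id_from_arn(
    "arn:aws:iam::707754867495:role/SageMakerAPIExecutionRoleName-707754867495"
))
```

Failing loudly on a malformed ARN is a design choice here; the one-line split would instead return whatever happens to sit in the fifth field.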


In [2]:
! pygmentize Dockerfile

# Build an image that can do training and inference in SageMaker
# This is a Python 3 image that uses the nginx, gunicorn, flask stack
# for serving inferences in a stable way.

FROM ubuntu:16.04

RUN apt-get -y update && apt-get install -y --no-install-recommends \
         wget \
         gcc \
         g++ \
         python3 \
         python3-dev \
         nginx \
         ca-certificates \
    && rm -rf /var/lib/apt/lists/*

# Here we get all python packages.
# There's substantial overlap between scipy and numpy that we eliminate by
# linking them together. Likewise, pip leaves the install caches populated which uses
# a significant amount of space. These optimizations save a fair amount of s

In [3]:
! pygmentize build_and_push.sh

#!/usr/bin/env bash

# This script shows how to build the Docker image and push it to ECR to be ready for use
# by SageMaker.

# The argument to this script is the image name. This will be used as the image on the local
# machine and combined with the account and region to form the repository name for ECR.
image=$1
echo ${image}

if [ "$image" == "" ]
then
    echo "Usage: $0 <image-name>"
    exit 1
fi

# provide access to the local folder
chmod +x Image_Inference

# Get the account number associated with the current IAM credentials

In [5]:
prefix = 'medical-image-server-container'

In [3]:
%%capture
!bash ./build_and_push.sh $prefix

In [6]:
!aws ecr list-images \
    --repository-name $prefix

{
    "imageIds": [
        {
            "imageDigest": "sha256:94a17bc8ed66f0440f043f1006411530f9c35744a66b91a4184b26d36e65c619",
            "imageTag": "latest"
        }
    ]
}
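
Before creating a model, it can be useful to confirm programmatically that the expected tag is present in the repository. The sketch below parses a response shaped like the JSON above; the function name `image_tags` is hypothetical, and the sample payload simply reuses the output shown in this notebook.

```python
import json
from typing import List


def image_tags(list_images_response: dict) -> List[str]:
    """Collect the tags from an `aws ecr list-images` style response.

    Hypothetical helper; untagged images have no "imageTag" key and are skipped.
    """
    return [
        image["imageTag"]
        for image in list_images_response.get("imageIds", [])
        if "imageTag" in image
    ]


# Sample payload copied from the command output in this notebook.
sample = json.loads("""
{
    "imageIds": [
        {
            "imageDigest": "sha256:94a17bc8ed66f0440f043f1006411530f9c35744a66b91a4184b26d36e65c619",
            "imageTag": "latest"
        }
    ]
}
""")

print("latest" in image_tags(sample))
```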


## Use the image for prediction 

Find the model artifact in S3.

In [11]:
s3_model_path = 's3://sagemaker-us-east-1-707754867495/pytorch-training-2022-01-27-07-48-30-122/output/model.tar.gz'

Find the image URI in ECR.

In [9]:
container_image_uri = '{0}.dkr.ecr.{1}.amazonaws.com/{2}:latest'.format(account_id, region, prefix)
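
The format string above follows the usual layout of a private ECR image URI. A small sketch that makes the parts explicit is shown below. The helper `ecr_image_uri` is hypothetical, and it assumes the standard `aws` partition (registries in other partitions use a different domain suffix).

```python
def ecr_image_uri(account_id: str, region: str, repository: str, tag: str = "latest") -> str:
    """Build a private ECR image URI.

    Hypothetical helper; assumes the standard `aws` partition.
    """
    # Layout: <account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


print(ecr_image_uri("707754867495", "us-east-1", "medical-image-server-container"))
```

The repository argument has to match the name that `build_and_push.sh` actually pushed to. This notebook passes `prefix` to the script, so `prefix` (rather than `ecr_repository_name`) is used when building the URI.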

In [10]:
role

'arn:aws:iam::707754867495:role/SageMakerAPIExecutionRoleName-707754867495'

In [12]:
from time import gmtime, strftime
from sagemaker.model import Model

model_name = 'medical-image-model-server-model-' + strftime("%Y-%m-%d-%H-%M-%S", gmtime())

model = Model(
    model_data=s3_model_path,
    image_uri=container_image_uri,
    env={'SAGEMAKER_PROGRAM': 'predictor'},
    role=role,
    name=model_name,
    predictor_cls=sagemaker.predictor.Predictor,
    # sagemaker_session=sagemaker_session,  # comment this line for local mode.
)

In [17]:
sagemaker_client = boto3.client('sagemaker', region_name='us-east-1')
                                
model_name = 'medical-image-model-server-model-' + strftime("%Y-%m-%d-%H-%M-%S", gmtime())

create_model_response = sagemaker_client.create_model(
    ModelName = model_name,
    ExecutionRoleArn = role,
    PrimaryContainer = {
        'Image': container_image_uri,
        'ModelDataUrl': s3_model_path,
    })

In [18]:
create_model_response

{'ModelArn': 'arn:aws:sagemaker:us-east-1:707754867495:model/medical-image-model-server-model-2022-02-23-06-29-26',
 'ResponseMetadata': {'RequestId': '6093f743-7d7e-4be5-b916-59e2a4f430e8',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '6093f743-7d7e-4be5-b916-59e2a4f430e8',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '114',
   'date': 'Wed, 23 Feb 2022 06:29:25 GMT'},
  'RetryAttempts': 0}}
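
The `Model(...)` cell earlier passes `SAGEMAKER_PROGRAM` through `env`, whereas the `create_model` call does not set any environment variables. If the container depends on that variable (an assumption, since the container code is not shown here), the boto3 request can carry it in the `Environment` field of `PrimaryContainer`. The sketch below only builds the request dictionary; the function name `build_create_model_request` is hypothetical.

```python
from typing import Dict, Optional


def build_create_model_request(
    model_name: str,
    role_arn: str,
    image_uri: str,
    model_data_url: str,
    environment: Optional[Dict[str, str]] = None,
) -> dict:
    """Assemble keyword arguments for the SageMaker CreateModel API.

    Hypothetical helper; it makes no AWS calls.
    """
    container = {"Image": image_uri, "ModelDataUrl": model_data_url}
    if environment:
        # Environment variables are passed to the serving container at startup.
        container["Environment"] = dict(environment)
    return {
        "ModelName": model_name,
        "ExecutionRoleArn": role_arn,
        "PrimaryContainer": container,
    }


request = build_create_model_request(
    model_name="medical-image-model-server-model-example",
    role_arn="arn:aws:iam::707754867495:role/SageMakerAPIExecutionRoleName-707754867495",
    image_uri="707754867495.dkr.ecr.us-east-1.amazonaws.com/medical-image-server-container:latest",
    model_data_url="s3://sagemaker-us-east-1-707754867495/pytorch-training-2022-01-27-07-48-30-122/output/model.tar.gz",
    environment={"SAGEMAKER_PROGRAM": "predictor"},
)
print(request["PrimaryContainer"]["Environment"])
```

With valid credentials, the dictionary could then be passed as `sagemaker_client.create_model(**request)`.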

In [20]:
bucket_prefix = 'Inference_output'
bucket = sagemaker_session.default_bucket()

## Create an endpoint configuration

In [42]:
import datetime
from time import gmtime, strftime

# Create an endpoint config name based on the current time
# so that we can search endpoint configs by creation time.
endpoint_config_name = f"MedicalImageEndpointConfig-{strftime('%Y-%m-%d-%H-%M-%S', gmtime())}"

# The name of the model that you want to host. This is the name that you specified when creating the model.
model_name='pytorch-inference-2022-01-27-08-48-43-106'

create_endpoint_config_response = sagemaker_client.create_endpoint_config(
    EndpointConfigName=endpoint_config_name, # You will specify this name in a CreateEndpoint request.
    # List of ProductionVariant objects, one for each model that you want to host at this endpoint.
    ProductionVariants=[
        {
            "VariantName": "variant1", # The name of the production variant.
            "ModelName": model_name, 
            "InstanceType": "ml.m5.xlarge", # Specify the compute instance type.
            "InitialInstanceCount": 1 # Number of instances to launch initially.
        }
    ],
    AsyncInferenceConfig={
        "OutputConfig": {
            # Location to upload response outputs when no location is provided in the request.
            "S3OutputPath": f"s3://{bucket}/{bucket_prefix}/output",
            # (Optional) specify Amazon SNS topics
            
        },
        "ClientConfig": {
            # (Optional) Specify the max number of inflight invocations per instance
            # If no value is provided, Amazon SageMaker will choose an optimal value for you
            "MaxConcurrentInvocationsPerInstance": 4
        }
    }
)

print(f"Created EndpointConfig: {create_endpoint_config_response['EndpointConfigArn']}")

Created EndpointConfig: arn:aws:sagemaker:us-east-1:707754867495:endpoint-config/medicalimageendpointconfig-2022-02-23-08-19-43
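
The configuration above leaves the optional Amazon SNS topics empty. As a rough sketch, the `AsyncInferenceConfig` dictionary could be assembled by a small function that adds a `NotificationConfig` only when topic ARNs are supplied. The function name `build_async_inference_config` and the topic ARNs in the example are hypothetical placeholders.

```python
from typing import Optional


def build_async_inference_config(
    s3_output_path: str,
    max_concurrent_invocations: Optional[int] = None,
    success_topic_arn: Optional[str] = None,
    error_topic_arn: Optional[str] = None,
) -> dict:
    """Assemble the AsyncInferenceConfig part of a CreateEndpointConfig request.

    Hypothetical helper; it makes no AWS calls.
    """
    output_config = {"S3OutputPath": s3_output_path}

    # Add SNS notifications only for the topics that were provided.
    notification = {}
    if success_topic_arn:
        notification["SuccessTopic"] = success_topic_arn
    if error_topic_arn:
        notification["ErrorTopic"] = error_topic_arn
    if notification:
        output_config["NotificationConfig"] = notification

    config = {"OutputConfig": output_config}
    if max_concurrent_invocations is not None:
        # Omitting ClientConfig lets SageMaker choose the concurrency itself.
        config["ClientConfig"] = {
            "MaxConcurrentInvocationsPerInstance": max_concurrent_invocations
        }
    return config


example_config = build_async_inference_config(
    "s3://sagemaker-us-east-1-707754867495/Inference_output/output",
    max_concurrent_invocations=4,
    success_topic_arn="arn:aws:sns:us-east-1:707754867495:example-success-topic",  # placeholder
)
print(example_config)
```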


In [43]:
endpoint_name = 'AsynchronousMedicalInference3'

# Create the endpoint from the endpoint configuration defined above.
create_endpoint_response = sagemaker_client.create_endpoint(
    EndpointName=endpoint_name,
    EndpointConfigName=endpoint_config_name,  # The endpoint configuration associated with this endpoint.
)

In [44]:
endpoint_name

'AsynchronousMedicalInference3'
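
`create_endpoint` returns immediately, while the endpoint itself takes some time to reach the `InService` status, so invocations sent too early will fail. boto3 offers a built-in waiter for this (`sagemaker_client.get_waiter('endpoint_in_service')`). The sketch below shows the same polling idea as a plain function so that it can be exercised offline; `wait_for_endpoint` and its `describe` parameter are hypothetical names.

```python
import time
from typing import Callable


def wait_for_endpoint(
    describe: Callable[[str], dict],
    endpoint_name: str,
    poll_seconds: float = 30.0,
    max_polls: int = 60,
) -> str:
    """Poll until the endpoint reports InService.

    `describe` is any callable that returns a dict with an "EndpointStatus"
    key, such as a thin wrapper around `describe_endpoint`. Hypothetical helper.
    """
    for _ in range(max_polls):
        status = describe(endpoint_name)["EndpointStatus"]
        if status == "InService":
            return status
        if status == "Failed":
            raise RuntimeError(f"Endpoint {endpoint_name} failed to deploy")
        time.sleep(poll_seconds)
    raise TimeoutError(f"Endpoint {endpoint_name} not InService after {max_polls} polls")


# Offline demonstration with a stand-in for describe_endpoint.
_statuses = iter(["Creating", "Creating", "InService"])
print(wait_for_endpoint(
    lambda name: {"EndpointStatus": next(_statuses)},
    "AsynchronousMedicalInference3",
    poll_seconds=0,
))
```

Against the real service, the callable could be `lambda name: sagemaker_client.describe_endpoint(EndpointName=name)`.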

In [59]:
# Invoke the endpoint asynchronously; the request payload is read from S3.
sagemaker_runtime = boto3.client("sagemaker-runtime", region_name='us-east-1')
input_location = 's3://sagemaker-us-east-1-707754867495/inference_input/test-2.json'
response = sagemaker_runtime.invoke_endpoint_async(
    EndpointName=endpoint_name,
    InputLocation=input_location
)

In [60]:
response

{'ResponseMetadata': {'RequestId': '0b1e8cca-c5a0-4920-a093-2cefa7a50929',
  'HTTPStatusCode': 202,
  'HTTPHeaders': {'x-amzn-requestid': '0b1e8cca-c5a0-4920-a093-2cefa7a50929',
   'x-amzn-sagemaker-outputlocation': 's3://sagemaker-us-east-1-707754867495/Inference_output/output/f99f3083-c41a-4deb-8dce-d8c9b759138a.out',
   'date': 'Wed, 23 Feb 2022 09:34:25 GMT',
   'content-type': 'application/json',
   'content-length': '54'},
  'RetryAttempts': 0},
 'OutputLocation': 's3://sagemaker-us-east-1-707754867495/Inference_output/output/f99f3083-c41a-4deb-8dce-d8c9b759138a.out',
 'InferenceId': 'd2c2d7d9-1179-4d9c-b90f-26c6c8b063b4'}
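
The `202` status only means that the request was accepted; the prediction itself is written to `OutputLocation` once processing finishes. To download it, the S3 URI first has to be split into a bucket and a key. A minimal sketch follows, with `split_s3_uri` as a hypothetical helper name.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split an s3://bucket/key URI into (bucket, key).

    Hypothetical helper for illustration only.
    """
    parsed = urlparse(uri)
    key = parsed.path.lstrip("/")
    if parsed.scheme != "s3" or not parsed.netloc or not key:
        raise ValueError(f"Not a valid S3 object URI: {uri!r}")
    return parsed.netloc, key


output_bucket, output_key = split_s3_uri(
    "s3://sagemaker-us-east-1-707754867495/Inference_output/output/"
    "f99f3083-c41a-4deb-8dce-d8c9b759138a.out"
)
print(output_bucket)
print(output_key)
```

With valid credentials, the result could then be fetched with `boto3.client("s3").get_object(Bucket=output_bucket, Key=output_key)`. Until the inference has completed, the object does not exist yet, so the caller should expect a not-found error and retry.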