<h1>Deployment of a GPU-based SageMaker inference endpoint</h1>

First, build the container image and push it to Amazon ECR:

In [1]:
ecr_namespace = 'sagemaker-serving-containers/'
prefix = 'diffdl-container-gpu'
container_name = ecr_namespace + prefix
print(container_name)

sagemaker-serving-containers/diffdl-container-gpu


In [6]:
!pushd container && chmod +x ./build_and_push.sh && ./build_and_push.sh $container_name && popd

~/environment/notebooks/sagemaker-container/gpu/container ~/environment/notebooks/sagemaker-container/gpu
https://docs.docker.com/engine/reference/commandline/login/#credentials-store

Login Succeeded
Sending build context to Docker daemon  74.24kB
Step 1/10 : FROM tensorflow/tensorflow:latest-gpu
 ---> 8b9d78381e5d
Step 2/10 : RUN apt-get update &&     apt-get -y install         apt-utils         build-essential         libatlas-base-dev          git         wget         curl         nginx         ca-certificates
 ---> Using cache
 ---> f2800fe1746d
Step 3/10 : RUN apt-get clean
 ---> Using cache
 ---> 39039f27d655
Step 4/10 : RUN pip --no-cache-dir install -U pip
 ---> Using cache
 ---> 7350cdf5fa68
Step 5/10 : RUN pip --no-cache-dir install flask gevent gunicorn tensorflow numpy scipy
 ---> Using cache
 ---> 84045bb03397
Step 6/10 : ENV PYTHONDONTWRITEBYTECODE=1 PYTHONUNBUFFERED=1 PYTHONIOENCODING=UTF-8 LANG=C.UTF-8 LC_ALL=C.UTF-8
 ---> Using cache
 ---> aee5ea76e0d3
Step 7/10 : ENV

Smoke-test the container locally by starting its `serve` entry point. The cell below only renders the command, which is meant to be run in a terminal:

In [2]:
from IPython.display import Markdown as md
md(f"`chmod +x ./container/local_test/serve_local.sh && ./container/local_test/serve_local.sh {container_name}`")

`chmod +x ./container/local_test/serve_local.sh && ./container/local_test/serve_local.sh sagemaker-serving-containers/diffdl-container-gpu`
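
Once the container is running locally, the two routes that SageMaker hosting calls, `GET /ping` and `POST /invocations` on port 8080, can be probed directly. The following is a minimal sketch, assuming that `serve_local.sh` publishes the container's port 8080 on `localhost` (the script is not shown here, so host and port are assumptions). `local_url` and `ping_local` are hypothetical helper names, not part of the container.

```python
import urllib.error
import urllib.request


def local_url(route, host="localhost", port=8080):
    """Build the URL of a serving route on a locally running container.

    host and port are assumptions: adjust them to whatever
    serve_local.sh actually publishes.
    """
    return f"http://{host}:{port}/{route.lstrip('/')}"


def ping_local(host="localhost", port=8080, timeout=5.0):
    """Return True if GET /ping answers with HTTP 200, False otherwise."""
    try:
        url = local_url("ping", host, port)
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, a timeout, or an HTTP error status
        # all count as "not healthy".
        return False
```

Calling `ping_local()` while the container is up should return `True` if the health check is wired correctly; a `False` result before deployment is much cheaper to debug than a failed endpoint creation.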

Deploy the endpoint:

In [3]:
import boto3

session = boto3.session.Session()
account_id = boto3.client('sts').get_caller_identity().get('Account')
region = session.region_name
container_image_uri = f"{account_id}.dkr.ecr.{region}.amazonaws.com/{container_name}:latest"

print(account_id)
print(region)
print(container_image_uri)

785577973223
us-east-1
785577973223.dkr.ecr.us-east-1.amazonaws.com/sagemaker-serving-containers/diffdl-container-gpu:latest
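
The URI printed above follows the ECR pattern `<account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>`. As a small sketch, the hypothetical helper `ecr_image_uri` below wraps that formatting and fails early on two inputs that commonly go wrong: `session.region_name` is `None` when no default region is configured, and an account ID must be 12 digits. It assumes the standard `aws` partition; other partitions use a different domain suffix.

```python
import re


def ecr_image_uri(account_id, region, repository, tag="latest"):
    """Compose an ECR image URI (standard `aws` partition assumed)."""
    if not re.fullmatch(r"\d{12}", str(account_id)):
        raise ValueError(f"AWS account IDs are 12 digits, got {account_id!r}")
    if not region:
        # boto3's session.region_name is None when no region is configured.
        raise ValueError("no AWS region configured")
    repository = repository.strip("/")
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


uri = ecr_image_uri(
    "785577973223",
    "us-east-1",
    "sagemaker-serving-containers/diffdl-container-gpu",
)
print(uri)
```

With the values from this notebook, the helper reproduces the URI shown in the cell output.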


In [4]:
import sagemaker
from sagemaker import get_execution_role

role = get_execution_role()
sagemaker_session = sagemaker.session.Session()

In [5]:
from time import gmtime, strftime
from sagemaker.model import Model

# A timestamp suffix keeps the model name unique across runs.
model_name = 'diffdl-container-gpu-' + strftime("%Y-%m-%d-%H-%M-%S", gmtime())

# model_data is left empty: no separate S3 model artifact is passed here.
model = Model(name=model_name,
              model_data='',
              image_uri=container_image_uri,
              role=role,
              env={
                  'SAGEMAKER_PROGRAM': 'predictor'
              },
              sagemaker_session=sagemaker_session)

In [8]:
endpoint_name = 'diffdl-container-gpu-' + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
print(endpoint_name)

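# ml.m5.xlarge = 4 vCPU / 16 GiB (CPU only); ml.g4dn.xlarge = 4 vCPU / 16 GiB plus one NVIDIA T4 GPU.
instance = 'ml.g4dn.xlarge'
# Note: Model.deploy returns None unless the Model was created with a predictor_cls.
pred = model.deploy(initial_instance_count=1,
                    instance_type=instance,
                    endpoint_name=endpoint_name)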

diffdl-container-gpu-2021-07-21-15-36-48
-----------------!
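
The dashes followed by `!` appear to be the SDK's progress indicator: by default `deploy` blocks until the endpoint is ready. To check on an endpoint later, for example from another session, its status can be polled through the SageMaker `DescribeEndpoint` API. The sketch below is one way to do that; `wait_for_endpoint` is a hypothetical helper, and `sm_client` is assumed to behave like `boto3.client("sagemaker")`.

```python
import time


def wait_for_endpoint(sm_client, endpoint_name, poll_seconds=30, max_polls=60):
    """Poll DescribeEndpoint until the endpoint is InService or has Failed.

    Returns "InService" on success, raises RuntimeError if creation failed
    and TimeoutError if max_polls is exhausted first.
    """
    for _ in range(max_polls):
        desc = sm_client.describe_endpoint(EndpointName=endpoint_name)
        status = desc["EndpointStatus"]
        if status == "InService":
            return status
        if status == "Failed":
            raise RuntimeError(desc.get("FailureReason", "endpoint creation failed"))
        # Any other status (e.g. "Creating", "Updating") means: keep waiting.
        time.sleep(poll_seconds)
    raise TimeoutError(f"{endpoint_name} not InService after {max_polls} polls")


# Usage (requires AWS credentials, so it is left commented out):
# import boto3
# wait_for_endpoint(boto3.client("sagemaker"), endpoint_name)
```

boto3 also ships a built-in `endpoint_in_service` waiter on the SageMaker client; the explicit loop is shown here only because it makes the status transitions visible.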

Test the inference endpoint:

In [16]:
!python3 ../scripts/test_endpoint_gpu.py -e $endpoint_name

using seed 9771
simulating training, valid and test sets
done
Sent payload size: 48576
Received payload size: 4278
[0.004021836546851468, 0.0036974069904979617, 0.003369250334126092, 0.003037742510014374, 0.00270332389597476, 0.0023665637608861617, 0.0020280958191608855, 0.0016886826762782248, 0.0013492158287843614, 0.0010107156642923665, 0.0006744066479379696, 0.0003416206540793054, 1.3979561975116517e-05, -0.0003067551171267585, -0.0006185963126113248, -0.0009191917625073348, -0.0012059743863987032, -0.001476087098968737, -0.0017262431780110526, -0.0019528551554964385, -0.0021519166674282275, -0.0023189809719977444, -0.0024491394677399053, -0.0025369787298441526, -0.0025765912510767874, -0.002561489514402965, -0.0024846382157534264, -0.0023383898184910767, -0.002114463071566458, -0.0018040181959735058, -0.0013975709573715162, -0.0008850034070074786, -0.0002556175863272503, 0.0005018107684132272, 0.001399032226147956, 0.0024482484745450978, 0.00366199416986239, 0.005053061750491389, 0
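
The test script itself is not shown here. For orientation, a deployed endpoint is normally called through the SageMaker runtime `InvokeEndpoint` API. The following is a minimal sketch of such a call, not a reconstruction of `test_endpoint_gpu.py`: `invoke_json` is a hypothetical helper, `runtime_client` is assumed to behave like `boto3.client("sagemaker-runtime")`, and the JSON request and response format is an assumption about what the container's `/invocations` handler accepts.

```python
import json


def invoke_json(runtime_client, endpoint_name, payload):
    """Send `payload` as JSON to the endpoint and decode a JSON response."""
    body = json.dumps(payload).encode("utf-8")
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",  # assumed request format
        Accept="application/json",       # assumed response format
        Body=body,
    )
    raw = response["Body"].read()  # the response body is a byte stream
    print(f"Sent payload size: {len(body)}")
    print(f"Received payload size: {len(raw)}")
    return json.loads(raw)


# Usage (requires AWS credentials and a live endpoint, so it is commented out):
# import boto3
# predictions = invoke_json(boto3.client("sagemaker-runtime"), endpoint_name, {"x": [[0.1, 0.2]]})
```

The `{"x": ...}` payload in the usage comment is purely illustrative; the real input schema depends on the predictor inside the container.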