##### NOTE: This example has been updated to work with SageMaker SDK 2.x, which introduces breaking changes. Upgrade the SageMaker SDK using the commands below.

In [1]:
!pip install --upgrade pip
!pip -q install sagemaker awscli boto3 pandas --upgrade 

Collecting pip
  Using cached pip-20.2.2-py2.py3-none-any.whl (1.5 MB)
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 20.0.2
    Uninstalling pip-20.0.2:
      Successfully uninstalled pip-20.0.2
Successfully installed pip-20.2.2


## Example: PyTorch deployments using TorchServe and Amazon SageMaker

In this example, we’ll show you how to build a TorchServe container and host it using Amazon SageMaker. Amazon SageMaker hosting gives you a fully managed hosting experience: specify the instance type and the minimum and maximum number of instances you want, and SageMaker takes care of the rest.

With a few lines of code, you can ask Amazon SageMaker to launch the instances, download your model from Amazon S3 to your TorchServe container, and set up a secure HTTPS endpoint for your application. On the client side, you get predictions with a simple API call to this secure endpoint backed by TorchServe.

Code, configuration files, Jupyter notebooks and Dockerfiles used in this example are available here:
https://github.com/shashankprasanna/torchserve-examples.git


### Clone the TorchServe repository and install torch-model-archiver

You'll use `torch-model-archiver` to create a model archive file (.mar). The .mar file packages the serialized model checkpoint, that is, its `state_dict` (a dictionary object that maps each layer to its parameter tensor), together with the model definition, the handler, and any extra files.

In [2]:
!git clone https://github.com/pytorch/serve.git
!pip install serve/model-archiver/

Cloning into 'serve'...
remote: Enumerating objects: 73, done.
remote: Counting objects: 100% (73/73), done.
remote: Compressing objects: 100% (36/36), done.
remote: Total 10094 (delta 43), reused 60 (delta 35), pack-reused 10021
Receiving objects: 100% (10094/10094), 39.11 MiB | 50.76 MiB/s, done.
Resolving deltas: 100% (5500/5500), done.
Processing ./serve/model-archiver
Collecting enum-compat
  Downloading enum_compat-0.0.3-py3-none-any.whl (1.3 kB)
Building wheels for collected packages: torch-model-archiver
  Building wheel for torch-model-archiver (setup.py) ... done
  Created wheel for torch-model-archiver: filename=torch_model_archiver-0.2.0b20200825-py3-none-any.whl size=14264 sha256=a6b915a5034c25e8eb2570039da80590b47ccf1501610525b1521dcaf7fa329b
  Stored in directory: /home/ec2-user/.cache/pip/wheels/fb/52/12/18080666b71dc6c8581dd830a07a93ddef8f47bd60a707998d
Successfully built torch-model-archiver
Installing collected packages: enum-compat, torch-model-

### Download a PyTorch model and create a TorchServe archive

In [3]:
!wget -q https://download.pytorch.org/models/densenet161-8d451a50.pth
    
model_file_name = 'densenet161'

!torch-model-archiver --model-name {model_file_name} \
--version 1.0 --model-file serve/examples/image_classifier/densenet_161/model.py \
--serialized-file densenet161-8d451a50.pth \
--extra-files serve/examples/image_classifier/index_to_name.json \
--handler image_classifier

!ls *.mar

densenet161.mar
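
To check what went into the archive, you can open it from Python. This is a minimal sketch that assumes the default archive format, which is a ZIP file carrying its metadata in `MAR-INF/MANIFEST.json`. The helper name `read_mar_manifest` is hypothetical and not part of `torch-model-archiver`.

```python
import json
import zipfile


def read_mar_manifest(mar_path):
    """Return (file_names, manifest_dict) for a .mar archive.

    Assumption: the archive uses the default .mar format, i.e. a ZIP file
    whose metadata lives under MAR-INF/MANIFEST.json.
    """
    with zipfile.ZipFile(mar_path) as archive:
        names = archive.namelist()
        with archive.open("MAR-INF/MANIFEST.json") as f:
            manifest = json.load(f)
    return names, manifest


# Usage, after torch-model-archiver has produced the file:
# names, manifest = read_mar_manifest("densenet161.mar")
# print(*names, sep="\n")
# print(json.dumps(manifest, indent=2))
```

Listing the file names is a quick way to confirm that the model file, the serialized weights, and `index_to_name.json` were all picked up.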


### Upload the generated densenet161.mar archive file to Amazon S3
Amazon SageMaker expects model artifacts to be packaged as a tar.gz file, so create a compressed tar.gz file from densenet161.mar, then upload it to your default Amazon SageMaker S3 bucket under the `torchserve/models/` prefix.

### Create a boto3 session and specify a role with SageMaker access

In [4]:
import boto3, time, json
sess    = boto3.Session()
sm      = sess.client('sagemaker')
region  = sess.region_name
account = boto3.client('sts').get_caller_identity().get('Account')

In [5]:
import sagemaker
role = sagemaker.get_execution_role()
sagemaker_session = sagemaker.Session(boto_session=sess)

In [6]:
bucket_name = sagemaker_session.default_bucket()
prefix = 'torchserve'

!tar cvfz {model_file_name}.tar.gz densenet161.mar
!aws s3 cp {model_file_name}.tar.gz s3://{bucket_name}/{prefix}/models/

densenet161.mar
upload: ./densenet161.tar.gz to s3://sagemaker-us-east-1-165793827590/torchserve/models/densenet161.tar.gz
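
If you prefer to stay in Python rather than shell out to `tar`, the packaging step can be sketched with the standard library. `make_model_tarball` is a hypothetical helper name, and the commented upload call assumes the `sagemaker_session`, `bucket_name`, and `prefix` variables defined above.

```python
import os
import tarfile


def make_model_tarball(mar_path, tarball_path):
    """Pack a single .mar file into a gzip-compressed tarball."""
    with tarfile.open(tarball_path, "w:gz") as tar:
        # arcname keeps the .mar at the top level of the tarball,
        # regardless of the directory it was read from.
        tar.add(mar_path, arcname=os.path.basename(mar_path))
    return tarball_path


# Usage with the variables from the cells above:
# tarball = make_model_tarball("densenet161.mar", f"{model_file_name}.tar.gz")
# sagemaker_session.upload_data(tarball, bucket=bucket_name,
#                               key_prefix=f"{prefix}/models")
```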


### Create an Amazon ECR repository
Create a new Amazon ECR repository to hold your TorchServe container images.

In [7]:
registry_name = 'torchserve-2'
!aws ecr create-repository --repository-name {registry_name}

{
    "repository": {
        "repositoryArn": "arn:aws:ecr:us-east-1:165793827590:repository/torchserve-2",
        "registryId": "165793827590",
        "repositoryName": "torchserve-2",
        "repositoryUri": "165793827590.dkr.ecr.us-east-1.amazonaws.com/torchserve-2",
        "createdAt": 1598389884.0,
        "imageTagMutability": "MUTABLE",
        "imageScanningConfiguration": {
            "scanOnPush": false
        },
        "encryptionConfiguration": {
            "encryptionType": "AES256"
        }
    }
}


### Build a TorchServe Docker container and push it to Amazon ECR

In [11]:
image_label = 'v1'
image = f'{account}.dkr.ecr.{region}.amazonaws.com/{registry_name}:{image_label}'

!docker build -t {registry_name}:{image_label} .
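# `aws ecr get-login` needs AWS CLI v1; it was removed in AWS CLI v2. With v2, use instead:
# !aws ecr get-login-password --region {region} | docker login --username AWS --password-stdin {account}.dkr.ecr.{region}.amazonaws.com
!$(aws ecr get-login --no-include-email --region {region})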
!docker tag {registry_name}:{image_label} {image}
!docker push {image}

Sending build context to Docker daemon  405.1MB
Step 1/16 : FROM ubuntu:18.04
 ---> 6526a1858e5d
Step 2/16 : ENV PYTHONUNBUFFERED TRUE
 ---> Using cache
 ---> bc8d24e62c74
Step 3/16 : RUN apt-get update &&     DEBIAN_FRONTEND=noninteractive apt-get install --no-install-recommends -y     fakeroot     ca-certificates     dpkg-dev     g++     python3-dev     openjdk-11-jdk     curl     vim     && rm -rf /var/lib/apt/lists/*     && cd /tmp     && curl -O https://bootstrap.pypa.io/get-pip.py     && python3 get-pip.py
 ---> Running in 99e14bd4db11
Get:1 http://security.ubuntu.com/ubuntu bionic-security InRelease [88.7 kB]
Get:2 http://archive.ubuntu.com/ubuntu bionic InRelease [242 kB]
Get:3 http://security.ubuntu.com/ubuntu bionic-security/main amd64 Packages [1044 kB]
Get:4 http://archive.ubuntu.com/ubuntu bionic-updates InRelease [88.7 kB]
Get:5 http://archive.ubuntu.com/ubuntu bionic-backports InRelease [74.6 kB]
Get:6 http://archive.ubuntu.com/ubuntu bionic/multiverse amd64 Packages [18
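
The ECR login step in the cell above can also be done through Boto3, which helps when the AWS CLI version on the notebook instance is unknown. The sketch below assumes the standard `get_authorization_token` response, where the token is the base64 encoding of `username:password`. Both function names are hypothetical.

```python
import base64


def decode_ecr_token(authorization_token):
    """Split a base64-encoded ECR authorization token into (username, password)."""
    decoded = base64.b64decode(authorization_token).decode("utf-8")
    # Split on the first colon only, in case the password contains colons.
    username, password = decoded.split(":", 1)
    return username, password


def get_ecr_login(region_name):
    """Return (username, password, registry_endpoint) for the default ECR registry."""
    import boto3  # imported lazily so decode_ecr_token works without boto3

    ecr = boto3.client("ecr", region_name=region_name)
    auth = ecr.get_authorization_token()["authorizationData"][0]
    username, password = decode_ecr_token(auth["authorizationToken"])
    return username, password, auth["proxyEndpoint"]
```

The returned values can then be passed to `docker login --username ... --password-stdin` against the registry endpoint.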

### Deploy endpoint and make prediction using Amazon SageMaker SDK

In [12]:
from sagemaker.model import Model
from sagemaker.predictor import Predictor

model_data = f's3://{bucket_name}/{prefix}/models/{model_file_name}.tar.gz'
sm_model_name = 'torchserve-densenet161-2'

torchserve_model = Model(model_data = model_data, 
                         image_uri = image,
                         role  = role,
                         predictor_cls=Predictor,
                         name  = sm_model_name)

In [13]:
endpoint_name = 'torchserve-endpoint-' + time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime())

predictor = torchserve_model.deploy(instance_type='ml.m4.xlarge',
                                    initial_instance_count=1,
                                    endpoint_name = endpoint_name)

-------------------!

#### Test the TorchServe hosted model

In [14]:
!wget -q https://s3.amazonaws.com/model-server/inputs/kitten.jpg    
file_name = 'kitten.jpg'
with open(file_name, 'rb') as f:
    payload = f.read()
    
response = predictor.predict(data=payload)
print(*json.loads(response), sep = '\n')

tiger_cat
tabby
Egyptian_cat
lynx
plastic_bag


### Deploy endpoint and make prediction using the AWS SDK for Python (Boto3)

In [15]:
model_data = f's3://{bucket_name}/{prefix}/models/{model_file_name}.tar.gz'
sm_model_name = 'torchserve-densenet161-boto-2'

container = {
    'Image': image,
    'ModelDataUrl': model_data
}

create_model_response = sm.create_model(
    ModelName         = sm_model_name,
    ExecutionRoleArn  = role,
    PrimaryContainer  = container)

print(create_model_response['ModelArn'])

arn:aws:sagemaker:us-east-1:165793827590:model/torchserve-densenet161-boto-2


In [16]:
import time
endpoint_config_name = 'torchserve-endpoint-config-' + time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime())
print(endpoint_config_name)

create_endpoint_config_response = sm.create_endpoint_config(
    EndpointConfigName = endpoint_config_name,
    ProductionVariants = [{
        'InstanceType'        : 'ml.m4.xlarge',
        'InitialVariantWeight': 1,
        'InitialInstanceCount': 1,
        'ModelName'           : sm_model_name,
        'VariantName'         : 'AllTraffic'}])

print("Endpoint Config Arn: " + create_endpoint_config_response['EndpointConfigArn'])

torchserve-endpoint-config-2020-08-25-21-45-01
Endpoint Config Arn: arn:aws:sagemaker:us-east-1:165793827590:endpoint-config/torchserve-endpoint-config-2020-08-25-21-45-01


In [17]:
endpoint_name = 'torchserve-endpoint-' + time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime())
print(endpoint_name)

create_endpoint_response = sm.create_endpoint(
    EndpointName         = endpoint_name,
    EndpointConfigName   = endpoint_config_name)
print(create_endpoint_response['EndpointArn'])

torchserve-endpoint-2020-08-25-21-45-02
arn:aws:sagemaker:us-east-1:165793827590:endpoint/torchserve-endpoint-2020-08-25-21-45-02


In [18]:
resp = sm.describe_endpoint(EndpointName=endpoint_name)
status = resp['EndpointStatus']
print("Status: " + status)

while status=='Creating':
    time.sleep(60)
    resp = sm.describe_endpoint(EndpointName=endpoint_name)
    status = resp['EndpointStatus']
    print("Status: " + status)

print("Arn: " + resp['EndpointArn'])
print("Status: " + status)

Status: Creating
Status: Creating
Status: Creating
Status: Creating
Status: Creating
Status: Creating
Status: Creating
Status: Creating
Status: InService
Arn: arn:aws:sagemaker:us-east-1:165793827590:endpoint/torchserve-endpoint-2020-08-25-21-45-02
Status: InService
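
The polling loop above runs until the status changes, however long that takes. A minimal sketch of the same idea with a timeout follows. `wait_for_status` is a hypothetical helper, and the 30-minute default is an arbitrary assumption rather than a SageMaker limit.

```python
import time


def wait_for_status(get_status, pending=("Creating",), interval=60,
                    timeout=1800, sleep=time.sleep):
    """Poll get_status() until it leaves the pending states.

    Raises TimeoutError if the status is still pending after `timeout` seconds.
    """
    waited = 0
    status = get_status()
    while status in pending:
        if waited >= timeout:
            raise TimeoutError(f"Still {status} after {waited} seconds")
        sleep(interval)
        waited += interval
        status = get_status()
    return status


# Usage with the client and endpoint name from the cells above:
# status = wait_for_status(
#     lambda: sm.describe_endpoint(EndpointName=endpoint_name)["EndpointStatus"])
# if status != "InService":
#     raise RuntimeError("Endpoint ended in status " + status)
```

Boto3 also provides a built-in waiter for this case, `sm.get_waiter('endpoint_in_service')`, if you do not need to print intermediate statuses.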


In [19]:
!wget https://s3.amazonaws.com/model-server/inputs/kitten.jpg    
file_name = 'kitten.jpg'
with open(file_name, 'rb') as f:
    payload = f.read()

--2020-08-25 21:53:07--  https://s3.amazonaws.com/model-server/inputs/kitten.jpg
Resolving s3.amazonaws.com (s3.amazonaws.com)... 52.216.99.173
Connecting to s3.amazonaws.com (s3.amazonaws.com)|52.216.99.173|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 110969 (108K) [text/plain]
Saving to: ‘kitten.jpg.1’


2020-08-25 21:53:07 (24.9 MB/s) - ‘kitten.jpg.1’ saved [110969/110969]



In [20]:
import json
client = boto3.client('sagemaker-runtime')

response = client.invoke_endpoint(EndpointName=endpoint_name, 
                                   ContentType='application/x-image', 
                                   Body=payload)

print(*json.loads(response['Body'].read()), sep = '\n')

tiger_cat
tabby
Egyptian_cat
lynx
plastic_bag


### Listing on Marketplace

In [40]:
sm_model_name = 'torchserve-densenet161-2'

batch_inference_input_prefix = "batch-inference-input-data"
TRANSFORM_WORKDIR = "transform"

In [29]:
%%sh

mkdir transform
cd transform
wget https://s3.amazonaws.com/model-server/inputs/kitten.jpg   

--2020-08-25 22:36:32--  https://s3.amazonaws.com/model-server/inputs/kitten.jpg
Resolving s3.amazonaws.com (s3.amazonaws.com)... 52.216.169.117
Connecting to s3.amazonaws.com (s3.amazonaws.com)|52.216.169.117|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 110969 (108K) [text/plain]
Saving to: ‘kitten.jpg’

     0K .......... .......... .......... .......... .......... 46% 21.2M 0s
    50K .......... .......... .......... .......... .......... 92% 16.5M 0s
   100K ........                                              100% 2.67M=0.008s

2020-08-25 22:36:32 (12.7 MB/s) - ‘kitten.jpg’ saved [110969/110969]



In [41]:
transform_input = sagemaker_session.upload_data(TRANSFORM_WORKDIR, key_prefix=batch_inference_input_prefix) + "/kitten.jpg"
print("Transform input uploaded to " + transform_input)

Transform input uploaded to s3://sagemaker-us-east-1-165793827590/batch-inference-input-data/kitten.jpg


### Test the batch transform

In [43]:
transformer = sagemaker.transformer.Transformer(model_name=sm_model_name, instance_count=1, instance_type='ml.m4.xlarge',
                            strategy=None, assemble_with=None, output_path=None, sagemaker_session=sagemaker_session)

In [44]:
transformer.transform(transform_input, content_type='image/jpeg')
transformer.wait()

print("Batch Transform output saved to " + transformer.output_path)

.................................
PYTHONUNBUFFERED=TRUE
SAGEMAKER_SAFE_PORT_RANGE=10000-10999
HOSTNAME=cab3f11b80d3
PWD=/home/model-server
HOME=/root
SAGEMAKER_BATCH=true
AWS_REGION=us-east-1
SAGEMAKER_BIND_TO_PORT=8080
SHLVL=1
AWS_CONTAINER_CREDENTIALS_RELATIVE_URI=/v2/credentials/dpMDNsMVD_kCvwX0V4UOqFH5nxRfplJGGYKVzv65Ses
TEMP=/home/model-server/tmp
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
_=/usr/bin/printenv
ml
2020-08-25 23:30:42,491 [INFO ] main org.pytorch.serve.ModelServer -
Torchserve version: 0.2.0
TS Home: /usr/local/lib/python3.6/dist-packages
Current directory: /home/model-server
Temp directory: /home/model-server/tmp
Number of GPUs: 0
Number of CPUs: 4
Max heap size: 3112 M
Python executable: /usr/bin/python3
Config file: /home/model-serv
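
Batch transform writes one result object per input object, named after the input with `.out` appended. The sketch below derives where the result for a single input should land. `transform_output_uri` is a hypothetical helper, and it assumes the input sits directly under the prefix that was passed to the transform job.

```python
import posixpath
from urllib.parse import urlparse


def transform_output_uri(input_uri, output_path):
    """Build the S3 URI of the batch transform result for one input object."""
    # S3 keys always use forward slashes, so posixpath is the right tool.
    input_name = posixpath.basename(urlparse(input_uri).path)
    return output_path.rstrip("/") + "/" + input_name + ".out"


# Usage with the variables from the cells above:
# print(transform_output_uri(transform_input, transformer.output_path))
```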

### Create the model package

In [61]:
transform_input_prefix = 's3://sagemaker-us-east-1-165793827590/batch-inference-input-data/'

In [87]:
from src.inference_specification import InferenceSpecification
import json

modelpackage_inference_specification = InferenceSpecification().get_inference_specification_dict(
    ecr_image=image,
    supports_gpu=True,
    supported_content_types=["image/jpeg", "image/png"],
    supported_mime_types=["application/json"])

# Point the container at the model archive uploaded to S3 earlier (this example uses a pre-trained model, not a training job)
modelpackage_inference_specification["InferenceSpecification"]["Containers"][0]["ModelDataUrl"]= model_data
print(json.dumps(modelpackage_inference_specification, indent=4, sort_keys=True))

{
    "InferenceSpecification": {
        "Containers": [
            {
                "Image": "165793827590.dkr.ecr.us-east-1.amazonaws.com/torchserve-2:v1",
                "ModelDataUrl": "s3://sagemaker-us-east-1-165793827590/torchserve/models/densenet161.tar.gz"
            }
        ],
        "SupportedContentTypes": [
            "image/jpeg",
            "image/png"
        ],
        "SupportedRealtimeInferenceInstanceTypes": [
            "ml.m4.xlarge",
            "ml.m4.2xlarge",
            "ml.m4.4xlarge",
            "ml.m4.10xlarge",
            "ml.m4.16xlarge",
            "ml.m5.large",
            "ml.m5.xlarge",
            "ml.m5.2xlarge",
            "ml.m5.4xlarge",
            "ml.m5.12xlarge",
            "ml.m5.24xlarge",
            "ml.c4.xlarge",
            "ml.c4.2xlarge",
            "ml.c4.4xlarge",
            "ml.c4.8xlarge",
            "ml.c5.xlarge",
            "ml.c5.2xlarge",
            "ml.c5.4xlarge",
            "ml.c5.9xlarge",
       

In [108]:
from src.modelpackage_validation_specification import ModelPackageValidationSpecification
import time

modelpackage_validation_specification = ModelPackageValidationSpecification().get_validation_specification_dict(
    validation_role = role,
    batch_transform_input = transform_input,
    input_content_type = "image/jpeg",
    output_content_type = "application/json",
    instance_type = "ml.c4.xlarge",
    output_s3_location = 's3://{}/{}'.format(sagemaker_session.default_bucket(), "/batch-inference-output-data"))

print(json.dumps(modelpackage_validation_specification, indent=4, sort_keys=True))

{
    "ValidationSpecification": {
        "ValidationProfiles": [
            {
                "ProfileName": "ValidationProfile1",
                "TransformJobDefinition": {
                    "MaxConcurrentTransforms": 1,
                    "MaxPayloadInMB": 6,
                    "TransformInput": {
                        "CompressionType": "None",
                        "ContentType": "image/jpeg",
                        "DataSource": {
                            "S3DataSource": {
                                "S3DataType": "S3Prefix",
                                "S3Uri": "s3://sagemaker-us-east-1-165793827590/batch-inference-input-data/kitten.jpg"
                            }
                        }
                    },
                    "TransformOutput": {
                        "Accept": "application/json",
                        "KmsKeyId": "",
                        "S3OutputPath": "s3://sagemaker-us-east-1-165793827590//batch-inference-output-data/ba

In [109]:
model_package_name = sm_model_name + str(round(time.time()))
create_model_package_input_dict = {
    "ModelPackageName" : model_package_name,
    "ModelPackageDescription" : "Model of pre-trained DenseNet161",
    "CertifyForMarketplace" : True
}
create_model_package_input_dict.update(modelpackage_inference_specification)
create_model_package_input_dict.update(modelpackage_validation_specification)
print(json.dumps(create_model_package_input_dict, indent=4, sort_keys=True))

sm.create_model_package(**create_model_package_input_dict)

{
    "CertifyForMarketplace": true,
    "InferenceSpecification": {
        "Containers": [
            {
                "Image": "165793827590.dkr.ecr.us-east-1.amazonaws.com/torchserve-2:v1",
                "ModelDataUrl": "s3://sagemaker-us-east-1-165793827590/torchserve/models/densenet161.tar.gz"
            }
        ],
        "SupportedContentTypes": [
            "image/jpeg",
            "image/png"
        ],
        "SupportedRealtimeInferenceInstanceTypes": [
            "ml.m4.xlarge",
            "ml.m4.2xlarge",
            "ml.m4.4xlarge",
            "ml.m4.10xlarge",
            "ml.m4.16xlarge",
            "ml.m5.large",
            "ml.m5.xlarge",
            "ml.m5.2xlarge",
            "ml.m5.4xlarge",
            "ml.m5.12xlarge",
            "ml.m5.24xlarge",
            "ml.c4.xlarge",
            "ml.c4.2xlarge",
            "ml.c4.4xlarge",
            "ml.c4.8xlarge",
            "ml.c5.xlarge",
            "ml.c5.2xlarge",
            "ml.c5.4xlarge",
 

{'ModelPackageArn': 'arn:aws:sagemaker:us-east-1:165793827590:model-package/torchserve-densenet161-21598403659',
 'ResponseMetadata': {'RequestId': '7ef468bf-9c71-44e2-a1c2-e4789c08acc4',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '7ef468bf-9c71-44e2-a1c2-e4789c08acc4',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '111',
   'date': 'Wed, 26 Aug 2020 01:01:00 GMT'},
  'RetryAttempts': 0}}

In [110]:
while True:
    response = sm.describe_model_package(ModelPackageName=model_package_name)
    status = response["ModelPackageStatus"]
    print(status)
    if status in ("Completed", "Failed"):
        print(response["ModelPackageStatusDetails"])
        break
    time.sleep(100)

InProgress
InProgress
InProgress
InProgress
Completed
{'ValidationStatuses': [{'Name': 'ValidationProfile1', 'Status': 'Completed'}], 'ImageScanStatuses': [{'Name': '165793827590.dkr.ecr.us-east-1.amazonaws.com/torchserve-2@sha256:cc62eb6c373651832dfbaa4cdeaacab80ecf09d973428374cececeb355e3e4bb', 'Status': 'Completed'}]}


In [53]:
image

'165793827590.dkr.ecr.us-east-1.amazonaws.com/torchserve-2:v1'

In [54]:
model_data

's3://sagemaker-us-east-1-165793827590/torchserve/models/densenet161.tar.gz'