# Deploying a PyTorch model with a managed framework

Amazon SageMaker Neo is an API for compiling machine learning models to optimize them for your choice of hardware targets. Currently, Neo supports pre-trained PyTorch models from [TorchVision](https://pytorch.org/docs/stable/torchvision/models.html). General support for other PyTorch models is forthcoming.
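At the API level, a Neo compilation is a SageMaker compilation job. As a minimal sketch, the helper below only assembles the keyword arguments for boto3's `create_compilation_job` and makes no AWS call. The helper name, job name, role ARN, S3 URIs, the `input0` input name (chosen to match `data_shape` later in this notebook) and the `ml_c5` target are illustrative assumptions.

```python
import json

def build_neo_compilation_request(job_name, role_arn, model_s3_uri, output_s3_uri,
                                  input_shape, target_device="ml_c5"):
    """Keyword arguments for boto3's SageMaker create_compilation_job (sketch only)."""
    return {
        "CompilationJobName": job_name,
        "RoleArn": role_arn,
        "InputConfig": {
            "S3Uri": model_s3_uri,                                   # model.tar.gz in S3
            "DataInputConfig": json.dumps({"input0": input_shape}),  # input name -> shape
            "Framework": "PYTORCH",
        },
        "OutputConfig": {
            "S3OutputLocation": output_s3_uri,                       # compiled model goes here
            "TargetDevice": target_device,                           # hardware target
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 900},
    }

request = build_neo_compilation_request(
    "vgg19-bn-neo-example",                                  # hypothetical job name
    "arn:aws:iam::123456789012:role/ExampleSageMakerRole",   # hypothetical role
    "s3://example-bucket/model.tar.gz",                      # hypothetical input URI
    "s3://example-bucket/compiled/",                         # hypothetical output URI
    [1, 3, 224, 224],
)
# With AWS credentials configured, the request could then be submitted with:
# boto3.client("sagemaker").create_compilation_job(**request)
```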

In this example notebook, we compare the performance of a pretrained PyTorch model before and after compiling it with Neo.
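A before/after comparison needs a consistent way to time inference. This is a minimal, framework-agnostic sketch (the function name `measure_latency` is hypothetical): it discards a few warm-up calls, then reports the median and an approximate 90th-percentile latency of any callable, such as a local forward pass or `predictor.predict`.

```python
import statistics
import time

def measure_latency(fn, *args, warmup=3, iterations=20):
    """Return (median_ms, p90_ms) for repeated calls of fn(*args)."""
    for _ in range(warmup):
        fn(*args)  # warm-up calls (lazy initialization, caches) are not measured
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn(*args)
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    # Nearest-rank style approximation of the 90th percentile
    p90 = samples[min(len(samples) - 1, int(0.9 * len(samples)))]
    return statistics.median(samples), p90

# Trivial callable, only to show the call pattern
calls = []
median_ms, p90_ms = measure_latency(lambda: calls.append(1), warmup=2, iterations=10)
```

Calling it as `measure_latency(predictor.predict, payload)` against each endpoint, with the same payload and instance type, keeps the two measurements comparable.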

In [None]:
!conda install -y pytorch=1.3.1 torchvision=0.4.2 pillow=6 matplotlib=3.1.3

In [8]:
!conda install -y boto3

In [None]:
!pip install sagemaker

## Import VGG19 from TorchVision

We'll import the [VGG19_bn](https://arxiv.org/pdf/1409.1556.pdf) model (VGG19 with batch normalization) from TorchVision and create a model artifact, `model.tar.gz`:

In [1]:
model_name = 'vgg19_bn'  # alternatively: 'densenet161'

model_prefix = 'elastic_inference'
model_folder = f'./tmp/{model_prefix}_{model_name}'

!mkdir -p {model_folder}

model_pth = 'model.pth'
model_onnx = 'model.onnx'
model_mar = 'model.mar'
model_tar_gz = 'model.tar.gz'



In [2]:
import torch
from torchvision import datasets, models, transforms

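model = getattr(models, model_name)(pretrained=True)
model.eval()  # put BatchNorm/Dropout layers into inference mode before serializing

input_shape = [1, 3, 224, 224]
# Alternative: trace the model instead of scripting it
# traced_model = torch.jit.trace(model.float().eval(), torch.zeros(input_shape).float())
scripted_model = torch.jit.script(model)
scripted_model.save(f"{model_folder}/{model_pth}")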

In [3]:
import tarfile

with tarfile.open(f"{model_folder}/{model_tar_gz}", 'w:gz') as f:
    f.add(f"{model_folder}/{model_pth}", arcname='model.pt')
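Note that the TorchScript file is saved locally as `model.pth` but stored in the archive as `model.pt`, at the archive root. A quick stdlib-only sketch to confirm what such an archive contains, using a placeholder file instead of the real model (the helper name `package_model` is hypothetical):

```python
import os
import tarfile
import tempfile

def package_model(model_file, tar_path, arcname="model.pt"):
    """Pack one serialized model file at the archive root and list the members."""
    with tarfile.open(tar_path, "w:gz") as tar:
        tar.add(model_file, arcname=arcname)
    with tarfile.open(tar_path, "r:gz") as tar:
        return tar.getnames()

with tempfile.TemporaryDirectory() as tmp:
    dummy = os.path.join(tmp, "model.pth")
    with open(dummy, "wb") as f:
        f.write(b"placeholder")  # stand-in for the TorchScript file
    members = package_model(dummy, os.path.join(tmp, "model.tar.gz"))

print(members)  # ['model.pt']
```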

### Set up the environment

In [None]:
import sagemaker

role = sagemaker.get_execution_role()

In [11]:
%%time
import boto3
iam_resource = boto3.resource('iam')
account_id = iam_resource.CurrentUser().arn.split(':')[4]

iam_client = boto3.client('iam')
response = iam_client.list_roles(
    PathPrefix='/service-role/',
    MaxItems=123
)

role_name = None
for i in response['Roles']:
    if i['RoleName'].startswith('AmazonSageMaker-ExecutionRole-'):
        role_name = i['RoleName']

role = f"arn:aws:iam::{account_id}:role/service-role/{role_name}"

CPU times: user 40.4 ms, sys: 0 ns, total: 40.4 ms
Wall time: 783 ms
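The role lookup above rebuilds the ARN by hand and, if no role matches, silently produces `.../None`. A hedged alternative sketch as pure functions: it reads the `Arn` field that each `list_roles` entry already carries and returns `None` explicitly when nothing matches. The helper names and example roles are hypothetical; for accounts with many roles, `iam_client.get_paginator('list_roles')` avoids the `MaxItems` cutoff.

```python
def account_id_from_arn(arn):
    """IAM ARNs look like arn:partition:service:region:account-id:resource."""
    return arn.split(":")[4]

def find_sagemaker_role(roles, prefix="AmazonSageMaker-ExecutionRole-"):
    """Return the ARN of the first role whose name starts with prefix, else None."""
    for role in roles:
        if role["RoleName"].startswith(prefix):
            return role["Arn"]
    return None

# Same shape as iam_client.list_roles()["Roles"] entries (values are made up)
example_roles = [
    {"RoleName": "SomeOtherRole",
     "Arn": "arn:aws:iam::123456789012:role/SomeOtherRole"},
    {"RoleName": "AmazonSageMaker-ExecutionRole-20200101T000000",
     "Arn": "arn:aws:iam::123456789012:role/service-role/AmazonSageMaker-ExecutionRole-20200101T000000"},
]
role_arn = find_sagemaker_role(example_roles)
```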


In [15]:
import boto3
import sagemaker
from sagemaker.utils import name_from_base

boto_session = boto3.session.Session()

sess = sagemaker.Session(boto_session)
region = sess.boto_region_name
bucket = sess.default_bucket()

job_name = name_from_base('sagemaker-containers')
prefix = job_name+'/elastic-inference'

model_path = sess.upload_data(path=f"{model_folder}/{model_tar_gz}", key_prefix=prefix)

data_shape = '{"input0":[1,3,224,224]}'

ClientError: An error occurred (SignatureDoesNotMatch) when calling the GetCallerIdentity operation: Credential should be scoped to a valid region, not 'us-west-2'. 

### Create an Amazon ECR registry
Create a new Docker container registry for your TorchServe container images.

In [16]:
ecr_repository_name = f"{model_prefix}-{model_name}"

print(account_id)
print(region)
print(role)
print(bucket)
print(ecr_repository_name)

533025023261
us-west-2
arn:aws:iam::533025023261:role/service-role/AmazonSageMaker-ExecutionRole-20200407T020134
sagemaker-us-west-2-533025023261
elastic_inference-vgg19_bn


In [17]:
docker_file = "docker/Dockerfile.eia"
#!pygmentize -l docker {docker_file}

In [None]:
#!pygmentize -l bash docker/build_and_push.sh

### Build a TorchServe Docker container and push it to Amazon ECR

In [18]:
# %%capture
!sh docker/build_and_push.sh {ecr_repository_name} {docker_file} #$account_id $region $ecr_repository_name

1e320240: Pushed   2.719GB/2.684GB

In [19]:
container_image_uri = f'{account_id}.dkr.ecr.{region}.amazonaws.com/{ecr_repository_name}:latest'
print(container_image_uri)

533025023261.dkr.ecr.us-west-2.amazonaws.com/elastic_inference-vgg19_bn:latest


### Deploy endpoint and make prediction using Amazon SageMaker SDK

In [None]:
from sagemaker.model import Model
from sagemaker.predictor import RealTimePredictor

sm_model_name = f'{model_prefix}-{model_name}'.replace('_', '-')

model = Model(model_data=model_path,
              image=container_image_uri,
              role=role,
              predictor_cls=RealTimePredictor,
              name=sm_model_name)

In [20]:
from sagemaker.pytorch import PyTorchModel
from sagemaker.predictor import RealTimePredictor

sm_model_name = f'{model_prefix}-{model_name}'.replace('_', '-')

model = PyTorchModel(model_data=model_path, 
                     image=container_image_uri,
                     role=role,
                     predictor_cls=RealTimePredictor,
                     name=sm_model_name,
                     entry_point='script.py',
                     framework_version="1.5",
                     sagemaker_session=sess)

In [21]:
import time

endpoint_name = f'{model_prefix}-{model_name}-endpoint-'.replace('_', '-') \
                    + time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime())

predictor = model.deploy(instance_type='ml.c5.large',
                            initial_instance_count=1,
                            accelerator_type='ml.eia2.medium',
                            endpoint_name=endpoint_name)




UnexpectedStatusException: Error hosting endpoint elastic-inference-vgg19-bn-endpoint-2020-05-01-08-02-42: Failed. Reason:  The repository 'sagemaker-containers-2020-05-01-07-53-20-664' does not exist in the registry with id '533025023261'..
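The `.replace('_', '-')` calls exist because SageMaker model and endpoint names accept only letters, digits and hyphens. A minimal sketch that builds the timestamped endpoint name and validates it up front; the 63-character limit and pattern encoded here are our reading of the SageMaker naming constraints, and `make_endpoint_name` is a hypothetical helper.

```python
import re
import time

# Assumed constraint: at most 63 characters of letters, digits and hyphens,
# starting and ending with a letter or digit.
_NAME_RE = re.compile(r"[A-Za-z0-9](-*[A-Za-z0-9])*")

def make_endpoint_name(base, timestamp=None):
    """Build '<base>-endpoint-<UTC timestamp>' with underscores turned into hyphens."""
    ts = timestamp or time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime())
    name = f"{base}-endpoint-{ts}".replace("_", "-")
    if len(name) > 63 or not _NAME_RE.fullmatch(name):
        raise ValueError(f"invalid SageMaker endpoint name: {name!r}")
    return name

endpoint_example = make_endpoint_name("elastic_inference-vgg19_bn", "2020-05-01-08-02-42")
```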

#### Test the TorchServe hosted model

### Invoke the endpoint

Let's test with a cat image.

In [None]:
from PIL import Image
import matplotlib.pyplot as plt
import numpy as np

cat_img = 'image/cat1.jpg'
input_image = Image.open(cat_img)

plt.imshow(np.asarray(input_image))

In [None]:
import torch
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
input_tensor = preprocess(input_image)
input_batch = input_tensor.unsqueeze(0) # create a mini-batch as expected by the model
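`ToTensor()` scales 8-bit pixel values to [0, 1], and `Normalize(mean, std)` then computes `(value - mean[c]) / std[c]` for each channel `c`. A pure-Python sketch of that arithmetic for a single RGB pixel, using the same ImageNet statistics as the cell above (the helper name `normalize_pixel` is hypothetical):

```python
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

def normalize_pixel(rgb):
    """Apply ToTensor()-style scaling and per-channel Normalize() to one (R, G, B) pixel."""
    return tuple((value / 255.0 - mean) / std
                 for value, mean, std in zip(rgb, IMAGENET_MEAN, IMAGENET_STD))

black = normalize_pixel((0, 0, 0))        # every channel becomes negative
white = normalize_pixel((255, 255, 255))  # every channel becomes positive
```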

In [None]:
%%time

import json

with open(cat_img, 'rb') as f:
    payload = f.read()

response = predictor.predict(data=payload)
print(*json.loads(response), sep = '\n')

In [None]:
import tarfile

import torch
from torchvision import models

input_shape = [1, 3, 224, 224]

# `model` now refers to the SageMaker Model object, so reload the TorchVision network for export
torch_model = getattr(models, model_name)(pretrained=True).eval()

# Input to the model
x = torch.randn(*input_shape, requires_grad=True)
torch_out = torch_model(x)  # PyTorch output, kept to check the ONNX export against

# Export the model
torch.onnx.export(torch_model,               # model being run
                  x,                         # model input (or a tuple for multiple inputs)
                  "model.onnx",              # where to save the model (a file path or file-like object)
                  export_params=True,        # store the trained parameter weights inside the model file
                  opset_version=10,          # the ONNX opset version to export the model to
                  do_constant_folding=True,  # whether to execute constant folding for optimization
                  input_names=['input'],     # the model's input names
                  output_names=['output'],   # the model's output names
                  dynamic_axes={'input': {0: 'batch_size'},    # variable-length axes
                                'output': {0: 'batch_size'}})

with tarfile.open('model.tar.gz', 'w:gz') as f:
    f.add('model.onnx')

In [None]:
# Elastic Inference-enabled PyTorch build:
# https://s3.console.aws.amazon.com/s3/buckets/amazonei-pytorch
# https://amazonei-pytorch.s3.amazonaws.com/torch_eia-1.3.1-cp36-cp36m-manylinux1_x86_64.whl

In [10]:
!git clone https://github.com/aws/sagemaker-pytorch-serving-container.git

In [20]:
import onnx

onnx_model = onnx.load("super_resolution.onnx")
onnx.checker.check_model(onnx_model)

In [21]:
import onnxruntime

ort_session = onnxruntime.InferenceSession("super_resolution.onnx")

def to_numpy(tensor):
    return tensor.detach().cpu().numpy() if tensor.requires_grad else tensor.cpu().numpy()

# compute ONNX Runtime output prediction
ort_inputs = {ort_session.get_inputs()[0].name: to_numpy(x)}
ort_outs = ort_session.run(None, ort_inputs)

# compare ONNX Runtime and PyTorch results
np.testing.assert_allclose(to_numpy(torch_out), ort_outs[0], rtol=1e-03, atol=1e-05)

print("Exported model has been tested with ONNXRuntime, and the result looks good!")

Exported model has been tested with ONNXRuntime, and the result looks good!
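`np.testing.assert_allclose` passes when every element satisfies `|actual - desired| <= atol + rtol * |desired|`. A stdlib sketch of that same elementwise rule (the helper `allclose` is hypothetical), useful for seeing how `rtol=1e-03, atol=1e-05` behaves at different magnitudes. The check is asymmetric: the relative tolerance scales with `desired`, which is the PyTorch output in the cell above.

```python
def allclose(actual, desired, rtol=1e-3, atol=1e-5):
    """Elementwise tolerance rule used by numpy.testing.assert_allclose."""
    if len(actual) != len(desired):
        return False
    return all(abs(a - d) <= atol + rtol * abs(d) for a, d in zip(actual, desired))

near_one = allclose([1.0005], [1.0])  # 0.0005 <= 1e-5 + 1e-3 * 1.0
too_far = allclose([1.002], [1.0])    # 0.002 exceeds the tolerance
near_zero = allclose([2e-5], [0.0])   # only atol applies when desired == 0
```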


In [23]:
from PIL import Image
import torchvision.transforms as transforms

img = Image.open("cat_224x224.jpg")

resize = transforms.Resize([224, 224])
img = resize(img)

img_ycbcr = img.convert('YCbCr')
img_y, img_cb, img_cr = img_ycbcr.split()

to_tensor = transforms.ToTensor()
img_y = to_tensor(img_y)
img_y.unsqueeze_(0)

tensor([[[[0.2157, 0.1961, 0.1922,  ..., 0.5294, 0.5569, 0.5725],
          [0.2039, 0.1922, 0.1922,  ..., 0.5333, 0.5529, 0.5686],
          [0.2000, 0.1843, 0.1843,  ..., 0.5216, 0.5373, 0.5490],
          ...,
          [0.6667, 0.6745, 0.6392,  ..., 0.6902, 0.6667, 0.6078],
          [0.6392, 0.6431, 0.6235,  ..., 0.8000, 0.7608, 0.6745],
          [0.6392, 0.6353, 0.6510,  ..., 0.8118, 0.7686, 0.6667]]]])

In [24]:
ort_inputs = {ort_session.get_inputs()[0].name: to_numpy(img_y)}
ort_outs = ort_session.run(None, ort_inputs)
img_out_y = ort_outs[0]

In [25]:
img_out_y = Image.fromarray(np.uint8((img_out_y[0] * 255.0).clip(0, 255)[0]), mode='L')

# Reconstruct the output image following the post-processing steps of the PyTorch implementation
final_img = Image.merge(
    "YCbCr", [
        img_out_y,
        img_cb.resize(img_out_y.size, Image.BICUBIC),
        img_cr.resize(img_out_y.size, Image.BICUBIC),
    ]).convert("RGB")

# Save the image; we will compare it with the output image from the mobile device
final_img.save("cat_superres_with_ort.jpg")
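The post-processing multiplies the model's Y-channel output by 255 and clips it to [0, 255] before casting to `np.uint8`; casting out-of-range floats directly to an unsigned 8-bit type is not guaranteed to saturate, so clipping first keeps overshoots and undershoots as pure white or black. A pure-Python sketch of the same conversion (the helper `to_uint8` is hypothetical):

```python
def to_uint8(values):
    """Scale [0, 1] floats to 0-255, clip, then truncate toward zero like np.uint8."""
    return [int(min(max(v * 255.0, 0.0), 255.0)) for v in values]

pixels = to_uint8([-0.1, 0.0, 0.5, 1.0, 1.2])
print(pixels)  # [0, 0, 127, 255, 255]
```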

In [None]:
from torchvision import datasets

dataroot = './data'

trainset = datasets.QMNIST(root=dataroot, train=True, download=True)
testset = datasets.QMNIST(root=dataroot, train=False, download=True)

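Amazon SageMaker creates a default Amazon S3 bucket for you to store the files and data your machine learning workflow may need. We can get the name of this bucket with the `default_bucket` method of the `sagemaker.session.Session` class in the SageMaker SDK.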

In [None]:
from sagemaker.session import Session

sess = Session()

# S3 bucket for saving code and model artifacts.
# Feel free to specify a different bucket here if you wish.
bucket = sess.default_bucket()

# Location to save your custom code in tar.gz format.
s3_custom_code_upload_location = f's3://{bucket}/customcode/byos-pytorch-gan'

# Location where results of model training are saved.
s3_model_artifacts_location = f's3://{bucket}/artifacts/'
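As the earlier `print(bucket)` output shows, the default bucket follows the pattern `sagemaker-{region}-{account_id}`. A small sketch (hypothetical helpers, placeholder account ID) that reproduces that name and splits an S3 URI like the ones built above back into bucket and key, the form boto3's S3 calls expect:

```python
from urllib.parse import urlparse

def default_bucket_name(region, account_id):
    """Naming pattern of Session.default_bucket(), e.g. sagemaker-us-west-2-<account>."""
    return f"sagemaker-{region}-{account_id}"

def split_s3_uri(uri):
    """Split s3://bucket/key into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3":
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")

bucket_example = default_bucket_name("us-west-2", "123456789012")
parts = split_s3_uri(f"s3://{bucket_example}/artifacts/")
```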

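The SageMaker SDK provides classes for working with Amazon S3: `S3Downloader` accesses or downloads objects in S3, and `S3Uploader` uploads local files to S3. Here you upload the data you have already downloaded to Amazon S3 for model training. The training job then does not download data from the internet, which avoids the network latency of fetching training data over the internet and sidesteps the security risks that direct internet access could pose to model training.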


In [None]:
import os
from sagemaker.s3 import S3Uploader as s3up

s3_data_location = s3up.upload(os.path.join(dataroot, "QMNIST"), f"s3://{bucket}/data/qmnist")
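`S3Uploader.upload` is given a local directory here, so every file under `./data/QMNIST` is uploaded beneath `s3://{bucket}/data/qmnist`. A stdlib sketch of the resulting key layout, assuming relative paths are preserved under the prefix; the helper `planned_s3_keys` and the sample files are hypothetical.

```python
import os
import tempfile

def planned_s3_keys(local_dir, key_prefix):
    """List the keys a recursive upload of local_dir under key_prefix would produce."""
    keys = []
    for root, _dirs, files in os.walk(local_dir):
        for name in files:
            rel = os.path.relpath(os.path.join(root, name), local_dir)
            keys.append(f"{key_prefix.rstrip('/')}/{rel.replace(os.sep, '/')}")
    return sorted(keys)

with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "raw"))
    for rel_path in (os.path.join("raw", "train.gz"), "processed.txt"):
        with open(os.path.join(tmp, rel_path), "wb") as f:
            f.write(b"")  # empty placeholder files
    example_keys = planned_s3_keys(tmp, "data/qmnist/")
```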

### Running the training job




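With the `sagemaker.get_execution_role()` method, the notebook obtains the role pre-assigned to the notebook instance. This role is used to acquire the resources needed for training, such as pulling the training framework image and allocating Amazon EC2 compute resources.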

In [None]:
from sagemaker import get_execution_role

# IAM execution role that gives SageMaker access to resources in your AWS account.
# We can use the SageMaker Python SDK to get the role from our notebook environment. 
role = get_execution_role()

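The hyperparameters for training can be defined in the notebook, separately from the algorithm code, and passed in when the training job is created so that they are combined with the job dynamically.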

In [None]:
import json

hps = {
    'seed': 0,
    'learning-rate': 0.0002,
    'epochs': 15,
    'dataset': 'qmnist',
    'pin-memory': 1,
    'beta1': 0.5,
    'nc': 1,
    'nz': 100,
    'ngf': 64,
    'ndf': 64,
    'batch-size': 64,
    'sample-interval': 100,
    'log-interval': 20,
}

print(json.dumps(hps, indent=4))
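In SageMaker script mode, the `hps` dictionary reaches the entry point as command-line arguments such as `--learning-rate 0.0002`. The notebook does not show `dcgan/train.py`, so the following is only a hypothetical sketch of how such a script could parse a subset of these values; `parse_known_args` lets it ignore arguments it does not declare, and `hps_to_argv` merely simulates the argument list.

```python
import argparse

def parse_hyperparameters(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--seed", type=int, default=0)
    parser.add_argument("--learning-rate", type=float, default=0.0002)
    parser.add_argument("--epochs", type=int, default=15)
    parser.add_argument("--batch-size", type=int, default=64)
    parser.add_argument("--beta1", type=float, default=0.5)
    parser.add_argument("--nz", type=int, default=100)
    # Ignore hyperparameters this sketch does not declare (e.g. --ngf, --ndf)
    args, _unknown = parser.parse_known_args(argv)
    return args

def hps_to_argv(hps):
    """Turn a hyperparameter dict into a flat list of --key value pairs."""
    argv = []
    for key, value in hps.items():
        argv += [f"--{key}", str(value)]
    return argv

args = parse_hyperparameters(hps_to_argv({"learning-rate": 0.001, "epochs": 2, "ngf": 64}))
```

argparse turns `--learning-rate` into the attribute `args.learning_rate`, so hyphenated keys in `hps` map cleanly onto Python names.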

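The ```PyTorch``` class in the `sagemaker.pytorch` package is an estimator for the PyTorch framework: it creates and runs training jobs and can also deploy the trained model. Among its parameters, ``train_instance_type`` specifies the CPU or GPU instance type, ``source_dir`` specifies the directory containing the training script and model code, and ``entry_point`` must explicitly name the training script file. These parameters are passed to the training job along with the others, and together they determine the job's runtime environment and the model's training parameters.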

In [None]:
from sagemaker.pytorch import PyTorch

estimator = PyTorch(role=role,
                    entry_point='train.py',
                    source_dir='dcgan',
                    output_path=s3_model_artifacts_location,
                    code_location=s3_custom_code_upload_location,
                    train_instance_count=1,
                    train_instance_type='ml.c5.large',
                    train_use_spot_instances=True,
                    train_max_wait=86400,
                    framework_version='1.4.0',
                    py_version='py3',
                    hyperparameters=hps)

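Pay particular attention to the ``train_use_spot_instances`` parameter: ``True`` means you prefer to use Spot Instances. Because machine learning training usually runs on substantial compute resources for a long time, making good use of Spot Instances helps you control costs effectively. Spot prices can be 20% to 60% of On-Demand prices; the actual price depends on the instance type, Region, and time.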
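A back-of-the-envelope sketch of what that price range means for one job, plus a guard for the related timeout settings: with Spot training, `train_max_wait` must be at least `train_max_run`. The hourly price below is a made-up placeholder, the helper names are hypothetical, and `train_max_run` is assumed to be left at the SDK v1 default of 86400 seconds.

```python
def spot_savings(on_demand_hourly, hours, spot_fraction):
    """Return (on_demand_cost, spot_cost); spot_fraction = spot price / on-demand price."""
    on_demand = on_demand_hourly * hours
    return on_demand, on_demand * spot_fraction

def check_spot_timeouts(max_run, max_wait):
    """Spot training requires the wait limit to be at least the run limit."""
    if max_wait < max_run:
        raise ValueError("train_max_wait must be >= train_max_run")

# Hypothetical: 1.00 per hour, a 6-hour job, Spot at 30% of the On-Demand price
on_demand_cost, spot_cost = spot_savings(1.00, 6, 0.30)
check_spot_timeouts(max_run=86400, max_wait=86400)  # passes under the assumed default
```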

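Now that you have created the PyTorch estimator, you can use it to fit the data already stored in Amazon S3. The following command starts the training job, and the training data is passed into the training environment through an input channel named **QMNIST**. When training starts, the training data in Amazon S3 is downloaded to the local file system of the training environment, and the training script ```train.py``` loads the data from local disk for training.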

In [None]:
# Start training
estimator.fit({"QMNIST": s3_data_location}, wait=False)
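Inside the training container, each input channel is made available as a local directory, conventionally `/opt/ml/input/data/<channel>`, and SageMaker framework containers also expose it through an `SM_CHANNEL_<CHANNEL>` environment variable. A hypothetical sketch of how `train.py` might locate the **QMNIST** channel, falling back to the conventional path when the variable is absent (for example, when running locally):

```python
import os

def channel_dir(name, default_root="/opt/ml/input/data"):
    """Resolve a training channel directory from SM_CHANNEL_<NAME> or the default path."""
    return os.environ.get(f"SM_CHANNEL_{name.upper()}", f"{default_root}/{name}")

os.environ.pop("SM_CHANNEL_QMNIST", None)
fallback = channel_dir("QMNIST")
os.environ["SM_CHANNEL_QMNIST"] = "/tmp/qmnist"  # simulate the container environment
from_env = channel_dir("QMNIST")
```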

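Depending on the training instance you choose, training can take from tens of minutes to several hours. We recommend setting the ``wait`` parameter to ``False``, which detaches the notebook from the training job. When training runs long and produces many logs, this prevents the notebook context from being lost to a network interruption or session timeout. While the job is detached, its output is not visible in the notebook; run the following code to fetch and reattach to the previous training session: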

In [None]:
%%time
from sagemaker.estimator import Estimator

# Attaching previous training session
training_job_name = estimator.latest_training_job.name
attached_estimator = Estimator.attach(training_job_name)
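If you prefer not to reattach, you can poll the job status instead. A minimal sketch that takes the describe function as a parameter, so it can be pointed at `boto3.client('sagemaker').describe_training_job` (whose response includes `TrainingJobStatus`) or exercised with a stub; the helper name and poll interval are assumptions.

```python
import time

TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}

def wait_for_training_job(describe, job_name, poll_seconds=60, sleep=time.sleep):
    """Poll until the training job reaches a terminal status and return that status."""
    while True:
        status = describe(TrainingJobName=job_name)["TrainingJobStatus"]
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_seconds)

# Stub that reports InProgress twice, then Completed
_statuses = iter(["InProgress", "InProgress", "Completed"])
final_status = wait_for_training_job(
    lambda TrainingJobName: {"TrainingJobStatus": next(_statuses)},
    "example-job",
    sleep=lambda seconds: None,  # skip real waiting in the demo
)
```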

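Because the model is designed to take advantage of GPU acceleration, training is faster on GPU instances than on CPU instances: for example, a p3.2xlarge instance takes about 15 minutes, while a c5.xlarge instance may take more than 6 hours. The model currently does not support distributed or parallel training, so multiple instances or multiple CPUs/GPUs will not speed up training further.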

When training completes, the model is uploaded to Amazon S3 at the location given by the `output_path` parameter passed when creating the `PyTorch` estimator.
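By SageMaker convention, a training job's artifacts land under `<output_path>/<training-job-name>/output/`, with the model in `model.tar.gz` and any extra output data in `output.tar.gz`. Under that assumption, the two URIs looked up in the next cell can also be constructed directly; the helper and the example job name are hypothetical.

```python
def training_artifact_uris(output_path, job_name):
    """Return (model_uri, output_uri) for a job written under output_path."""
    base = f"{output_path.rstrip('/')}/{job_name}/output"
    return f"{base}/model.tar.gz", f"{base}/output.tar.gz"

model_uri, output_uri = training_artifact_uris(
    "s3://example-bucket/artifacts/", "pytorch-training-2020-05-01-00-00-00-000")
```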


### Validating the model

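You will download the trained model from Amazon S3 to the local file system of the notebook instance. The following code loads the model, feeds it random noise, and displays the inference results as images.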


In [None]:
from helper import *

last_artifact_location = s3_model_artifacts_location + training_job_name

last_model_url = get_object_path_by_filename(last_artifact_location, 'model.tar.gz')
last_output_url = get_object_path_by_filename(last_artifact_location, 'output.tar.gz')

print(last_model_url)
print(last_output_url)
!rm -rf ./tmp/* ./model/*

In [None]:
from sagemaker.s3 import S3Downloader as s3down

s3down.download(last_model_url, './tmp')
s3down.download(last_output_url, './tmp')

In [None]:
!tar -zxf tmp/model.tar.gz -C ./tmp
!tar -zxf tmp/output.tar.gz -C ./tmp
!cp ./tmp/generator_state.pth ./model
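The shell `tar` and `cp` commands above can also be done from Python with the standard `tarfile` module. A sketch that extracts a single named member, assuming (as the `cp` command does) that `generator_state.pth` sits at the root of `model.tar.gz`; it is demonstrated on a throwaway archive, and `extract_member` is a hypothetical helper.

```python
import os
import tarfile
import tempfile

def extract_member(tar_path, member_name, dest_dir):
    """Extract one file from a .tar.gz archive and return its local path."""
    with tarfile.open(tar_path, "r:gz") as tar:
        tar.extract(member_name, path=dest_dir)  # creates dest_dir if needed
    return os.path.join(dest_dir, member_name)

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "generator_state.pth")
    with open(src, "wb") as f:
        f.write(b"weights")  # placeholder bytes, not a real state dict
    archive = os.path.join(tmp, "model.tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(src, arcname="generator_state.pth")
    extracted = extract_member(archive, "generator_state.pth", os.path.join(tmp, "model"))
    with open(extracted, "rb") as f:
        extracted_bytes = f.read()
```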

Run the following commands to load the trained model and use it to generate a set of "handwritten" digits.