# PyTorch models and Multi-Model Endpoints with the SageMaker Python SDK

## Setup

For the MME section, we'll need some additional libraries not available in the notebook kernel by default:

In [None]:
!pip install "sagemaker-pytorch-inference>=2" torch-model-archiver

In [None]:
# Python Built-Ins:
from datetime import datetime
import json
import os
import shutil
import tarfile

# External Dependencies:
import boto3
import numpy as np
import pandas as pd
import sagemaker
from sagemaker.multidatamodel import MultiDataModel
from sagemaker.pytorch import PyTorch
from sagemaker.pytorch.model import PyTorchModel


sagemaker_session = sagemaker.Session()
role = sagemaker.get_execution_role()

# Configuration:
bucket = sagemaker_session.default_bucket()
prefix = "sagemaker/DEMO-pytorch-rea"

Run this cell only if you have already run [Custom_Container.ipynb](Custom_Container.ipynb) and want to use the custom container images built there:

In [None]:
%store -r custom_training_uri
%store -r custom_inference_uri

## Upload data to S3

In [None]:
inputs_list = sagemaker_session.upload_data("./data/list_seq.pickle", bucket=bucket, key_prefix=prefix+'/train')
print(inputs_list)
inputs_dict = sagemaker_session.upload_data("./data/dict_loc.pickle", bucket=bucket, key_prefix=prefix+'/train')
print(inputs_dict)

## SageMaker PyTorch Estimator - train your model

In [None]:
# place to save model artifact
output_path = f"s3://{bucket}/{prefix}/output/"

estimator = PyTorch(
    entry_point="train.py",
    source_dir="src",
    role=role,
    framework_version="1.7.1",
    py_version="py3",
    # If you built and are using customized containers:
    #image_uri=custom_training_uri,
    instance_count=1,
    instance_type="ml.c5.xlarge",
    output_path=output_path,
    hyperparameters={
        "embedding_dims": 128,
        "initial_lr": 0.025,
        "epochs": 3,
        "batch_size": 16,
        "n_workers": 16,
    }
)

estimator.fit({ "training": f"s3://{bucket}/{prefix}/train" })
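In script mode, the entries of `hyperparameters` reach the entry point as `--name value` command-line arguments, and the `training` channel is exposed through the `SM_CHANNEL_TRAINING` environment variable. The `src/train.py` used here is not shown in this notebook, so the following is only a minimal sketch of how such a script might read its configuration. The function name `parse_args` and the `--model-dir` / `--train` argument names are illustrative assumptions:

```python
import argparse
import os


def parse_args(argv=None):
    """Hypothetical argument parsing for a script-mode train.py."""
    parser = argparse.ArgumentParser()
    # Hyperparameters from the estimator arrive as "--name value" arguments:
    parser.add_argument("--embedding_dims", type=int, default=128)
    parser.add_argument("--initial_lr", type=float, default=0.025)
    parser.add_argument("--epochs", type=int, default=3)
    parser.add_argument("--batch_size", type=int, default=16)
    parser.add_argument("--n_workers", type=int, default=16)
    # Folder locations are provided by the container as environment variables:
    parser.add_argument(
        "--model-dir",
        default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"),
    )
    parser.add_argument(
        "--train",
        default=os.environ.get("SM_CHANNEL_TRAINING", "/opt/ml/input/data/training"),
    )
    # parse_known_args tolerates any extra arguments the platform may add:
    args, _unknown = parser.parse_known_args(argv)
    return args
```

Calling `parse_args(["--epochs", "3"])` locally is a quick way to check the parsing before launching a training job.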

## Deploying to an endpoint

The `PyTorchModel` class will **re-pack** the training job result tarball, adding in a `code/` folder of your inference code and re-configuring the container's entrypoint from the previously-specified `train.py` to `inference.py`.

**IF** your training job:
- Already copied the required code assets to `model_dir/code` (including `train.py`), and
- Used an entry point `train.py` which still exposes the required functions when imported as a module (e.g. by including `from inference import *` in it)

**THEN** this re-packaging is not necessary and a simple `estimator.deploy(...)` will work, because the container will import `code/train.py` and get the definitions it needs (`model_fn` and so on).
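For orientation, the functions the container looks for in the entry-point module are `model_fn`, `input_fn`, `predict_fn` and `output_fn`. The real `src/inference.py` is not reproduced in this notebook, so the sketch below only illustrates the two JSON-handling functions, assuming the request shape used later in this notebook (`locationIDInput` and `count`). `model_fn` and `predict_fn` are model-specific and left out:

```python
import json


def input_fn(request_body, request_content_type):
    """Deserialize the request. Only JSON is handled in this sketch."""
    if request_content_type != "application/json":
        raise ValueError(f"Unsupported content type: {request_content_type}")
    payload = json.loads(request_body)
    # Field names follow the example requests in this notebook:
    return {
        "locationIDInput": list(payload["locationIDInput"]),
        "count": int(payload["count"]),
    }


def output_fn(prediction, accept="application/json"):
    """Serialize the prediction. Only JSON is handled in this sketch."""
    if accept != "application/json":
        raise ValueError(f"Unsupported accept type: {accept}")
    return json.dumps(prediction)
```

If `train.py` contains `from inference import *`, these names become attributes of the `train` module too, which is why the container can keep using `train.py` as its entry point.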

In [None]:
model_path = estimator.latest_training_job.describe()["ModelArtifacts"]["S3ModelArtifacts"]
print(model_path)

In [None]:
pytorch_model = PyTorchModel(
    model_data=model_path,
    #name="pytorch-single-model",
    entry_point="inference.py",
    source_dir="src",
    role=role,
    framework_version="1.7.1",
    py_version="py3",
    # If you built and are using customized containers:
    #image_uri=custom_inference_uri,
)

In [None]:
predictor = pytorch_model.deploy(
    #endpoint_name="pytorch-single-model",
    instance_type="ml.c5.xlarge",
    initial_instance_count=1,
)

# This endpoint expects (and returns) JSON, rather than the default numpy format for the PyTorchPredictor
predictor.serializer = sagemaker.serializers.JSONSerializer()
predictor.deserializer = sagemaker.deserializers.JSONDeserializer()

## Invoke the endpoint

Depending on your usage context, you may want to invoke the endpoint via the SageMaker Python SDK:

In [None]:
predictor.predict({ "locationIDInput": ["mycty_51549"], "count": 5 })

...Or via plain Boto3:

In [None]:
runtime_client = boto3.client("sagemaker-runtime")

endpoint_name = predictor.endpoint_name
single_test = json.dumps({ "locationIDInput": ["mycty_51549"], "count": 5 })

print(f"Invoking endpoint {endpoint_name}...")
response = runtime_client.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="application/json",
    Body=single_test,
)
result = response["Body"].read().decode("utf-8")
print(f"Predicted label is {result}.")
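Since this endpoint returns JSON, the raw response body can also be parsed into Python objects rather than printed as text. The following is a minimal sketch; the helper name `parse_json_response` is illustrative, and an in-memory `io.BytesIO` stands in for the streaming body of a real response so the snippet runs without an endpoint:

```python
import io
import json


def parse_json_response(response):
    """Read and decode the JSON body of an invoke_endpoint-style response dict."""
    return json.loads(response["Body"].read().decode("utf-8"))


# Stand-in for a real response, whose "Body" is a file-like object.
# The payload content here is made up purely for the demonstration:
fake_response = {"Body": io.BytesIO(b'{"example": [1, 2, 3]}')}
print(parse_json_response(fake_response))
```

Note that the body is a stream, so it can only be read once per response.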

## Preparing a model archive for MME with TorchServe

PyTorch v1.6+ inference containers use [TorchServe](https://pytorch.org/serve/), for consistency with standard practices on the framework. TorchServe requires a particular [model archive](https://github.com/pytorch/serve/tree/master/model-archiver#torch-model-archiver-for-torchserve) format to load and serve models.

In SageMaker, the service itself handles downloading and untarring `model.tar.gz` archives to endpoint containers, including loading and unloading models for Multi-Model Endpoints.

While the PyTorch framework container for a single-model endpoint can build the TorchServe model archive from the input model folder at start-up (because the artifacts have already been downloaded), this is not currently supported on MME, where models are loaded dynamically after the container has started.

Therefore, to use a model with a TorchServe-based MME, we need to:

- Convert it to a TorchServe Model Archive ourselves beforehand, and
- Use `tar.gz` compression (which SageMaker expects) rather than the standard `.mar` (which is [just a zip](https://github.com/pytorch/serve/blob/40405f90e3c590638871d92fc6cda48f1dcfe570/model-archiver/model_archiver/model_packaging_utils.py#L198) under the covers anyway)

First, let's download and unpack the contents of our deployment-ready `model.tar.gz` to `data/model`:

In [None]:
input_model_s3uri = pytorch_model.repacked_model_data

shutil.rmtree("data/model", ignore_errors=True)
os.makedirs("data/model", exist_ok=True)

!aws s3 cp $input_model_s3uri data/model/model.tar.gz

print("Extracting in data/model...")
# Can extract with CLI or Python, whichever is preferred:
#!cd data/model && tar -xzvf model.tar.gz
with tarfile.open("data/model/model.tar.gz", "r:gz") as tar:
    tar.extractall("data/model")

print("Deleting model.tar.gz...")
os.remove("data/model/model.tar.gz")
print("Done")
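Before building the archive, it can be worth confirming that the unpacked folder really contains the re-packed `code/` folder with the inference entry point described earlier. The following is a minimal sketch; the helper name `find_entry_point` is illustrative, and the demo runs on a throwaway folder rather than on `data/model`:

```python
import os
import tempfile


def find_entry_point(model_dir, entry_point="inference.py"):
    """Return the path of code/<entry_point> under model_dir, or None if missing."""
    candidate = os.path.join(model_dir, "code", entry_point)
    return candidate if os.path.isfile(candidate) else None


# Demo on a temporary folder that mimics the expected layout:
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "code"))
    with open(os.path.join(tmp, "code", "inference.py"), "w") as f:
        f.write("# placeholder\n")
    print("Entry point found:", find_entry_point(tmp) is not None)
```

In this notebook the equivalent check would be `find_entry_point("data/model")`.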

Now, we can construct a TorchServe model archive.

This process will ask us for a **handler service**, but since we're deploying to the PyTorch framework container image we can, [like the container does for single-model endpoints](https://github.com/aws/sagemaker-pytorch-inference-toolkit/blob/6610a410c0cf40bcf15267abe722d20d50e77bcf/src/sagemaker_pytorch_serving_container/torchserve.py#L118), reference the default SageMaker PyTorch handler:

In [None]:
from sagemaker_pytorch_serving_container import handler_service as default_handler_service

default_handler_pyfile = default_handler_service.__file__
print(f"Using SageMaker PyTorch default 'handler_service' from:\n{default_handler_pyfile}")

In [None]:
torch_model_name = "MyModel"
os.makedirs(f"data/torchserve-models/{torch_model_name}", exist_ok=True)
# (-f just forces overwrite if the output archive already exists)
# "tgz" is a supported archive-format but it creates an archive with a nested folder, which won't work
!torch-model-archiver \
    -f \
    --model-name $torch_model_name \
    --handler $default_handler_pyfile \
    --export-path data/torchserve-models \
    --version 1.0 \
    --extra-files data/model/ \
    --archive-format "no-archive"

# ...So we'll tar it separately after creating un-compressed:
tmp_arch_loc = f"../{torch_model_name}.tar.gz"
!cd data/torchserve-models/$torch_model_name && tar -czvf $tmp_arch_loc .
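The same flat `tar.gz` (archive contents at the root, with no enclosing folder) can be produced from Python instead of the shell. The following is a minimal sketch using the standard `tarfile` module; the helper names `make_flat_targz` and `list_members` are illustrative, and the demo builds a tiny stand-in folder rather than using the real archiver output:

```python
import os
import tarfile
import tempfile


def make_flat_targz(src_dir, out_path):
    """Archive the *contents* of src_dir so they sit at the root of the tarball.

    out_path should be outside src_dir, so the archive does not include itself.
    """
    with tarfile.open(out_path, "w:gz") as tar:
        for name in sorted(os.listdir(src_dir)):
            # arcname drops the parent folder from the stored path:
            tar.add(os.path.join(src_dir, name), arcname=name)
    return out_path


def list_members(archive_path):
    """Return the member names stored in a tar.gz archive."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return tar.getnames()


# Demo with a stand-in for the un-compressed archiver output:
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "MyModel")
    os.makedirs(os.path.join(src, "MAR-INF"))
    with open(os.path.join(src, "MAR-INF", "MANIFEST.json"), "w") as f:
        f.write("{}")
    archive = make_flat_targz(src, os.path.join(tmp, "MyModel.tar.gz"))
    print(list_members(archive))
```

Listing the members is a quick way to confirm that no entry is nested under a `MyModel/` folder, which is the problem noted above for the archiver's `tgz` format.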

## Deploying a Multi-Model Endpoint

In [None]:
mme_name = "pytorch-mme-" + datetime.now().strftime("%Y-%m-%d-%H-%M-%S")
print(f"Creating MME: {mme_name}")
model_data_prefix = f"s3://{bucket}/{prefix}/mme-artifacts/{mme_name}/"
print(f"MME artifact store:\n{model_data_prefix}")

mme = MultiDataModel(
    name=mme_name,
    model_data_prefix=model_data_prefix,
    # The PyTorchModel is passed just to define the container/environment spec:
    model=pytorch_model,
    sagemaker_session=sagemaker_session,
)

In [None]:
mme_predictor = mme.deploy(
    #endpoint_name=mme_name,
    initial_instance_count=1,
    instance_type="ml.c5.xlarge",
)

mme_predictor.serializer = sagemaker.serializers.JSONSerializer()
mme_predictor.deserializer = sagemaker.deserializers.JSONDeserializer()

At first, there are no models in the MME's S3 prefix and no models available on the MME:

In [None]:
!aws s3 ls $model_data_prefix

In [None]:
# No models visible!
list(mme.list_models())

## Dynamically adding models to the endpoint

In [None]:
ts_model_s3uri = mme.add_model(
    model_data_source=f"./data/torchserve-models/{torch_model_name}.tar.gz",
    model_data_path=torch_model_name,
)
print(ts_model_s3uri)
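The `model_data_path` given to `add_model` is relative to the MME's `model_data_prefix`, and that same relative path is what identifies the model at invocation time. A minimal sketch of the mapping follows; the helper name `target_model_s3_uri` and the bucket and prefix in the demo are made up for illustration:

```python
def target_model_s3_uri(model_data_prefix, target_model):
    """Join an MME artifact prefix and a relative target model path."""
    return model_data_prefix.rstrip("/") + "/" + target_model.lstrip("/")


# Hypothetical prefix, for illustration only:
print(target_model_s3_uri("s3://example-bucket/mme-artifacts/", "MyModel"))
```

Here the artifact is stored under the plain name `MyModel` rather than `MyModel.tar.gz`, so `MyModel` is also the value to pass as the target model.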

In [None]:
list(mme.list_models())

In [None]:
print(f"Calling model {torch_model_name}...")
mme_predictor.predict(
    { "locationIDInput": ["mycty_51549"], "count": 5 },
    target_model=torch_model_name,
)

In [None]:
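response = runtime_client.invoke_endpoint(
    # Take the name from the predictor, so this works whether or not
    # endpoint_name was set explicitly at deploy time:
    EndpointName=mme_predictor.endpoint_name,
    ContentType="application/json",
    TargetModel=torch_model_name,
    Body=single_test,
)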

result = response["Body"].read().decode("utf-8")
print(f"Predicted label is {result}.")

## Clean-up

In [None]:
predictor.delete_endpoint(delete_endpoint_config=True)
mme_predictor.delete_endpoint(delete_endpoint_config=True)

In [None]:
mme.delete_model()
pytorch_model.delete_model()