# Moving from PoC to Production in SageMaker to deploy models

Moving from a proof of concept (PoC) to production while deploying models at high performance and low cost can be challenging. To help with this move, SageMaker offers several features for testing models and gradually shifting traffic between models in production. In this notebook, we set up the SageMaker resources (such as dependencies and models) that the following notebooks use. We'll look at three main offerings:

1. A/B Testing with SageMaker variants
2. Deployment guardrails 
    - Blue/green deployments
        - All At Once Traffic Shifting
        - Canary Traffic Shifting
        - Linear Traffic Shifting
3. Shadow Testing

We'll also look at load testing and the Inference Recommender tool offered by SageMaker.

***
This notebook is designed to run on the `Python 3 (Data Science 2.0)` kernel in Amazon SageMaker Studio.
***


## Setup
Let's set up some required imports and basic initial variables:

In [49]:
%pip install -U transformers ipywidgets sagemaker torch --quiet

Note: you may need to restart the kernel to use updated packages.


In [16]:
%matplotlib inline
import datetime
import time
import os, sys
import boto3
import re
import json
import pandas as pd
import numpy as np
import sagemaker
from sagemaker import get_execution_role, image_uris
import shutil
import tarfile
from pathlib import Path
from uuid import uuid4

import torch
from sagemaker.huggingface import HuggingFaceModel
from sagemaker.s3 import S3Uploader, s3_path_join
from transformers import AutoModel, AutoModelForSequenceClassification, AutoTokenizer, pipeline

import csv
import matplotlib.pyplot as plt
from sklearn import metrics

# p = os.path.abspath('..')
# if p not in sys.path:
#     sys.path.append(p)
import utils

sm_session = sagemaker.Session()
role = get_execution_role()
region = sm_session.boto_region_name
bucket = sm_session.default_bucket()
sm_client = sm_session.sagemaker_client
sm_runtime = sm_session.sagemaker_runtime_client
prefix = "sagemaker/huggingface-pytorch-sentiment-analysis"
time_now = f'{datetime.datetime.now():%Y-%m-%d-%H-%M-%S}'


### Useful objects and variables
Common objects for interacting with the SageMaker API:

In [4]:
sm_session = sagemaker.Session()
role = get_execution_role()
bucket = sm_session.default_bucket()
region = sm_session.boto_region_name
sm_client = sm_session.sagemaker_client
sm_runtime = boto3.client("sagemaker-runtime")
prefix = "sagemaker/huggingface-pytorch-sentiment-analysis"
deploy_instance_type = "ml.m5.xlarge"
%store deploy_instance_type

# The name of the Model Package Group in Amazon SageMaker Model Registry
model_package_group_name = "HuggingFaceModels"
%store model_package_group_name

print(region)
print(role)
print(bucket)

Stored 'deploy_instance_type' (str)
Stored 'model_package_group_name' (str)
us-west-2
arn:aws:iam::757967535041:role/service-role/AmazonSageMaker-ExecutionRole-20211027T114497
sagemaker-us-west-2-757967535041


## Step 0: Download HuggingFace Transformer models and Create SageMaker models

#### twitter-roberta-base-sentiment Pretrained Model

In this example, we download a pre-trained Hugging Face model, `cardiffnlp/twitter-roberta-base-sentiment`, from the Hugging Face Hub. We will use it to classify text with the labels `0 -> Negative; 1 -> Neutral; 2 -> Positive`.

In [5]:
HF_MODEL_ROBERTA = "cardiffnlp/twitter-roberta-base-sentiment"
model = AutoModelForSequenceClassification.from_pretrained(HF_MODEL_ROBERTA)
tokenizer = AutoTokenizer.from_pretrained(HF_MODEL_ROBERTA)
model.save_pretrained("model_token_roberta")
tokenizer.save_pretrained("model_token_roberta")

('model_token_roberta/tokenizer_config.json',
 'model_token_roberta/special_tokens_map.json',
 'model_token_roberta/vocab.json',
 'model_token_roberta/merges.txt',
 'model_token_roberta/added_tokens.json',
 'model_token_roberta/tokenizer.json')
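
The label mapping described above can be applied to raw classifier outputs along these lines. This is a minimal sketch, assuming the model returns three logits per input in index order 0, 1, 2; the helper name `logits_to_label` and the sample logits are hypothetical and are not taken from a real model run.

```python
import math

# Label mapping stated above for twitter-roberta-base-sentiment.
ID2LABEL = {0: "Negative", 1: "Neutral", 2: "Positive"}


def logits_to_label(logits):
    """Return (label, probability) for one example's three class logits."""
    # Numerically stable softmax: subtract the max before exponentiating.
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Index of the most probable class.
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


# Made-up logits for illustration only; not real model output.
label, prob = logits_to_label([-1.2, 0.3, 2.5])
print(label, round(prob, 3))
```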

#### Package the saved model to tar.gz format
Once the model is downloaded, we need to package it (tokenizer and model weights) in the `.tar.gz` format that Amazon SageMaker expects.

In [6]:
tar_file_roberta = "model_roberta.tar.gz"
tar_size = utils.create_tar(tar_file_roberta, Path("model_token_roberta"))
print(f"Created {tar_file_roberta}, size {tar_size:.2f} MB")

model_token_roberta/vocab.json: 100%|██████████| 7/7 [00:28<00:00,  4.13s/files]             


Created model_roberta.tar.gz, size 464.05 MB
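
The `utils.create_tar` helper comes from a module that is not shown in this notebook. As a rough idea of what such a helper might do, here is a sketch that assumes it places the directory's contents at the root of the archive and returns the archive size in MB. The function name `create_tar_sketch` and the placeholder files are hypothetical; the real helper may differ (for example, it also prints a progress bar).

```python
import tarfile
import tempfile
from pathlib import Path


def create_tar_sketch(tar_path, source_dir):
    """Archive the contents of source_dir at the archive root; return size in MB."""
    source_dir = Path(source_dir)
    with tarfile.open(tar_path, "w:gz") as tar:
        for item in sorted(source_dir.rglob("*")):
            if item.is_file():
                # arcname is relative to source_dir, so files sit at the archive root.
                tar.add(item, arcname=item.relative_to(source_dir).as_posix())
    return Path(tar_path).stat().st_size / (1024 * 1024)


# Demo on a throwaway directory with placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp) / "model"
    (model_dir / "code").mkdir(parents=True)
    (model_dir / "config.json").write_text("{}")
    (model_dir / "code" / "inference.py").write_text("# placeholder\n")
    archive = Path(tmp) / "model.tar.gz"
    size_mb = create_tar_sketch(archive, model_dir)
    with tarfile.open(archive) as tar:
        names = sorted(tar.getnames())

print(names)
```

Listing the archive members this way is a quick check that the model files are not nested under an extra top-level folder.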


#### Download distilbert-base-uncased-finetuned-sst-2-english by initiating a `Huggingface pipeline`

Pipelines are a simple way to use models for inference. They abstract most of the complex code in the library behind a simple API dedicated to several tasks, including Named Entity Recognition, Masked Language Modeling, Sentiment Analysis, Feature Extraction and Question Answering. See the [task summary](https://huggingface.co/transformers/task_summary.html) for usage examples.

In [7]:
HF_MODEL_DISTILBERT = "distilbert-base-uncased-finetuned-sst-2-english"
HF_TASK = "sentiment-analysis"
local_artifact_path = Path("model_token_distilbert")
local_artifact_path.mkdir(exist_ok=True, parents=True)
tar_file_distilbert = "model_distilbert.tar.gz"

In [8]:
sentiment_analysis = pipeline(HF_TASK, model=HF_MODEL_DISTILBERT)
sentiment_analysis.save_pretrained(local_artifact_path)

#### Write the Inference Script

To deploy a pretrained `PyTorch` model, you'll need to use the `PyTorch` estimator object to create a `PyTorchModel` object and set a different `entry_point`.

You'll use the `PyTorchModel` object to deploy a `PyTorchPredictor`. This creates a `SageMaker` Endpoint -- a hosted prediction service that we can use to perform inference.

The inference script must provide an implementation of `model_fn`. The script below implements `model_fn`, `input_fn`, and `predict_fn`, and relies on the default `output_fn` defined in [sagemaker-pytorch-containers](https://github.com/aws/sagemaker-pytorch-containers).

Here's an example of the inference script:

In [21]:
! pwd

/root/sagemaker-inference-poc-to-production


In [22]:
!cat code/inference.py  # show the contents of the inference script

from transformers import pipeline
import json

CSV_CONTENT_TYPE = 'text/csv'
JSON_CONTENT_TYPE = 'application/json'

def model_fn(model_dir):
    sentiment_analysis = pipeline(
        "sentiment-analysis",
        model=model_dir,
        tokenizer=model_dir,
        return_all_scores=True
    )
    return sentiment_analysis


def input_fn(serialized_input_data, content_type=CSV_CONTENT_TYPE):
    if content_type == CSV_CONTENT_TYPE:
        input_data = serialized_input_data.splitlines()
        return input_data
    elif content_type == JSON_CONTENT_TYPE:
        data = json.loads(serialized_input_data)
        input_data = data.pop("inputs", data)
        return input_data
    else:
        raise ValueError('Unsupported content type: ' + content_type)


def predict_fn(input_data, model):
    return model(input_data)
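
To make the two accepted payload formats concrete, the sketch below mirrors the CSV/JSON branching of the script's `input_fn` in a standalone form and feeds it one payload of each kind. The sample sentences are made up, and the payloads are plain strings here; the serving container may hand the function a different type.

```python
import json

CSV_CONTENT_TYPE = "text/csv"
JSON_CONTENT_TYPE = "application/json"


def input_fn(serialized_input_data, content_type=CSV_CONTENT_TYPE):
    # Mirrors the branching of the script above so this cell runs on its own.
    if content_type == CSV_CONTENT_TYPE:
        # One input text per line.
        return serialized_input_data.splitlines()
    if content_type == JSON_CONTENT_TYPE:
        data = json.loads(serialized_input_data)
        # Use the "inputs" field when present, otherwise the whole object.
        return data.pop("inputs", data)
    raise ValueError("Unsupported content type: " + content_type)


csv_payload = "I love this!\nThis is terrible."
json_payload = json.dumps({"inputs": ["I love this!", "This is terrible."]})

from_csv = input_fn(csv_payload, CSV_CONTENT_TYPE)
from_json = input_fn(json_payload, JSON_CONTENT_TYPE)
print(from_csv)
print(from_json)
```

Both calls yield the same list of two strings, which is what `predict_fn` then passes to the pipeline.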


In [23]:
!cat code/requirements.txt  # show the packages listed in requirements.txt

transformers

#### Create the directory structure for your model files

The directory structure where you saved your PyTorch model should look something like the following:

```
model/
|-- pytorch_model.bin
|-- config.json
|-- vocab.txt
|-- tokenizer.json
|-- tokenizer_config.json
|-- special_tokens_map.json
|-- code/
    |-- inference.py
    |-- requirements.txt
```

The `requirements.txt` file is optional and specifies dependencies on third-party libraries.
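
Before creating the archive, it can help to confirm that the directory matches this layout. The following is a small sketch of such a check; the helper name `missing_entries` and the default list of required paths are assumptions chosen for illustration, not a SageMaker requirement list.

```python
import tempfile
from pathlib import Path


def missing_entries(model_dir, required=("config.json", "code/inference.py")):
    """Return the required relative paths that are absent from model_dir."""
    root = Path(model_dir)
    return [rel for rel in required if not (root / rel).is_file()]


# Demo on a throwaway directory with placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "config.json").write_text("{}")
    before = missing_entries(root)  # code/inference.py not created yet
    (root / "code").mkdir()
    (root / "code" / "inference.py").write_text("# placeholder\n")
    after = missing_entries(root)

print(before)
print(after)
```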

#### Copy code to the model directory and tar the model and code

In [27]:
import sys; sys.version

'3.7.10 (default, Jun  4 2021, 14:48:32) \n[GCC 7.5.0]'

In [28]:
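if sys.version_info >= (3, 8):
    shutil.copytree("code", "model_token_distilbert/code", dirs_exist_ok=True)
else:
    # Before Python 3.8, copytree() fails if the destination exists, so clear it first
    shutil.rmtree("model_token_distilbert/code", ignore_errors=True)
    shutil.copytree("code", "model_token_distilbert/code")

tar_size = utils.create_tar(tar_file_distilbert, local_artifact_path)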
print(f"Created {tar_file_distilbert}, size {tar_size:.2f} MB")

model_token_distilbert/code/inference.py: 100%|██████████| 8/8 [00:14<00:00,  1.87s/files]      

Created model_distilbert.tar.gz, size 247.31 MB





#### Upload the model to S3

We now have the model archives ready. We need to upload them to S3 before we can use them for hosting.

In [29]:
model_data_path = s3_path_join("s3://", bucket, prefix + "/models")
print(f"Uploading Models to {model_data_path}")
model_roberta_uri = S3Uploader.upload("model_roberta.tar.gz", model_data_path)
print(f"Uploaded roberta model to {model_roberta_uri}")
model_distilbert_uri = S3Uploader.upload("model_distilbert.tar.gz", model_data_path)
print(f"Uploaded distilbert model to {model_distilbert_uri}")
%store model_data_path
%store model_roberta_uri
%store model_distilbert_uri

Uploading Models to s3://sagemaker-us-west-2-757967535041/sagemaker/huggingface-pytorch-sentiment-analysis/models
Uploaded roberta model to s3://sagemaker-us-west-2-757967535041/sagemaker/huggingface-pytorch-sentiment-analysis/models/model_roberta.tar.gz
Uploaded distilbert model to s3://sagemaker-us-west-2-757967535041/sagemaker/huggingface-pytorch-sentiment-analysis/models/model_distilbert.tar.gz
Stored 'model_data_path' (str)
Stored 'model_roberta_uri' (str)
Stored 'model_distilbert_uri' (str)
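
The stored URIs are plain `s3://bucket/key` strings. If you later want to inspect an uploaded archive, splitting a URI back into its bucket and key is a common first step. Below is a minimal sketch using only the standard library; the helper name `split_s3_uri` and the bucket name `example-bucket` are hypothetical.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split an s3://bucket/key URI into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {uri}")
    return parsed.netloc, parsed.path.lstrip("/")


# Hypothetical bucket name; the key follows the prefix used in this notebook.
bucket_name, key = split_s3_uri(
    "s3://example-bucket/sagemaker/huggingface-pytorch-sentiment-analysis/models/model_roberta.tar.gz"
)
print(bucket_name)
print(key)
```

The resulting pair could then be passed to an S3 client call such as `head_object(Bucket=..., Key=...)` to confirm that the object exists.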


#### Prebuilt HuggingFace DLC
You can use a prebuilt Hugging Face DLC as the inference image. It includes the [SageMaker Hugging Face Inference Toolkit](https://github.com/aws/sagemaker-huggingface-inference-toolkit) for serving 🤗 Transformers models on Amazon SageMaker. The toolkit uses the `transformers` pipeline to allow zero-code deployment of models, with no custom pre- or post-processing code required (for more information, see the default [handler service](https://github.com/aws/sagemaker-huggingface-inference-toolkit/blob/main/src/sagemaker_huggingface_inference_toolkit/handler_service.py) provided by the inference toolkit).

In addition to zero-code deployment, the Inference Toolkit supports "bring your own code" methods, where you can override the default methods. You can learn more about "bring your own code" in the documentation [here](https://github.com/aws/sagemaker-huggingface-inference-toolkit#-user-defined-codemodules). In the second lab section, we will use the bring your own code method to deploy models.

In [30]:
framework = "huggingface"
transformer_version = "4.17.0"
py_version = "py38"
instance_type = "ml.g"
image_scope = "inference"
ml_framework = "PYTORCH"
framework_version = "1.10.2"

inference_image_roberta = image_uris.retrieve(
    framework=framework,
    base_framework_version=ml_framework.lower() + framework_version,
    region=region,
    version=transformer_version,
    py_version=py_version,
    instance_type=instance_type,
    image_scope=image_scope,
)

print(inference_image_roberta)

763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-inference:1.10.2-transformers4.17.0-gpu-py38-cu113-ubuntu20.04
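
The printed URI shows whether a GPU or CPU image was selected. The sketch below splits such a URI into its registry, repository, and tag, and reads the device from the tag. The helper name `describe_image_uri` is hypothetical, and the tag layout is inferred from the URI printed above rather than from a documented contract, so treat the parsing as an assumption.

```python
def describe_image_uri(image_uri):
    """Split a container image URI into registry, repository, tag, and device."""
    registry, _, rest = image_uri.partition("/")
    repository, _, tag = rest.partition(":")
    # Assumption: the tag is hyphen-separated and contains "gpu" or "cpu".
    tag_parts = tag.split("-")
    if "gpu" in tag_parts:
        device = "gpu"
    elif "cpu" in tag_parts:
        device = "cpu"
    else:
        device = "unknown"
    return {"registry": registry, "repository": repository, "tag": tag, "device": device}


info = describe_image_uri(
    "763104351884.dkr.ecr.us-west-2.amazonaws.com/"
    "huggingface-pytorch-inference:1.10.2-transformers4.17.0-gpu-py38-cu113-ubuntu20.04"
)
print(info["repository"], info["device"])
```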


#### Prebuilt PyTorch DLC
You can also use a SageMaker prebuilt [PyTorch DLC](https://github.com/aws/deep-learning-containers/tree/master/pytorch) to deploy the Hugging Face model. Because the prebuilt PyTorch container doesn't include the `transformers` package, the model package contains a `requirements.txt` file listing the additional packages to install in the container. See the section [Create the directory structure for your model files](#Create-the-directory-structure-for-your-model-files). We also included the `inference.py` file, which defines the functions for model loading and model serving.

In [31]:
inference_image_distilbert = image_uris.retrieve(
    framework=ml_framework.lower(),
    region=region,
    version=framework_version,
    py_version=py_version,
    instance_type=instance_type,
    image_scope=image_scope,
)

print(inference_image_distilbert)

763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-inference:1.10.2-gpu-py38


In [32]:
# provide the consistent time stamp for model, endpoint config and endpoint
now_roberta = f"{datetime.datetime.now():%Y-%m-%d-%H-%M-%S}"
print(now_roberta)
roberta_model_name = f"hf-pytorch-model-roberta-{now_roberta}"
print("Model name : {}".format(roberta_model_name))
%store roberta_model_name

2023-05-08-18-21-08
Model name : hf-pytorch-model-roberta-2023-05-08-18-21-08
Stored 'roberta_model_name' (str)
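
Timestamped names like the one above keep repeated runs from colliding. The sketch below wraps the same pattern in a helper that also checks the result against naming constraints. The constraints used here (alphanumerics and hyphens, at most 63 characters) are stated as an assumption about the `CreateModel` API and should be confirmed against the API reference; the helper name `timestamped_name` is hypothetical.

```python
import datetime
import re

# Assumed SageMaker model-name constraints; verify against the API reference.
MODEL_NAME_PATTERN = re.compile(r"^[a-zA-Z0-9]([\-a-zA-Z0-9]*[a-zA-Z0-9])?$")
MAX_MODEL_NAME_LENGTH = 63


def timestamped_name(base, now=None):
    """Append a timestamp to base and validate the resulting name."""
    now = now or datetime.datetime.now()
    name = f"{base}-{now:%Y-%m-%d-%H-%M-%S}"
    if len(name) > MAX_MODEL_NAME_LENGTH or not MODEL_NAME_PATTERN.match(name):
        raise ValueError(f"Invalid model name: {name}")
    return name


# Fixed timestamp so the example is reproducible.
example_name = timestamped_name(
    "hf-pytorch-model-roberta", datetime.datetime(2023, 5, 8, 18, 21, 8)
)
print(example_name)
```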


In [33]:
primary_container = {
    'Image': inference_image_roberta,
    'ModelDataUrl': model_roberta_uri
}
create_roberta_model_response = sm_client.create_model(
    ModelName=roberta_model_name,
    ExecutionRoleArn=role,
    PrimaryContainer=primary_container
)

In [34]:
now_distilbert = f"{datetime.datetime.now():%Y-%m-%d-%H-%M-%S}"
print(now_distilbert)
distilbert_model_name = f"hf-pytorch-model-distilbert-{now_distilbert}"
print("Model name : {}".format(distilbert_model_name))
%store distilbert_model_name

2023-05-08-18-21-10
Model name : hf-pytorch-model-distilbert-2023-05-08-18-21-10
Stored 'distilbert_model_name' (str)


In [35]:
primary_container = {
    'Image': inference_image_distilbert,
    'ModelDataUrl': model_distilbert_uri
}
create_distilbert_model_response = sm_client.create_model(
    ModelName=distilbert_model_name,
    ExecutionRoleArn=role,
    PrimaryContainer=primary_container
)

### Create a new model for Hugging Face roberta model with entry point script helper function
To deploy the models in one container, we will use the Hugging Face prebuilt container, which has the packages required for transformer models. However, we will use a custom entry point script for each model and define our own data preprocessing function. The model package structure is similar to the one created for the distilbert model.

In [36]:
local_artifact_path = Path("model_artifacts")
local_artifact_path.mkdir(exist_ok=True, parents=True)
model_tar_name = 'model_roberta_script.tar.gz'

In [43]:
if sys.version_info >= (3, 8):
    shutil.copytree('./model_token_roberta', local_artifact_path, dirs_exist_ok=True)
    shutil.copytree('./code', local_artifact_path / 'code', dirs_exist_ok=True)
else:
    # Before Python 3.8, copytree() requires that the destination not exist
    shutil.rmtree(local_artifact_path, ignore_errors=True)
    shutil.copytree('./model_token_roberta', local_artifact_path)
    shutil.copytree('./code', local_artifact_path / 'code')

In [44]:
tar_size = utils.create_tar(model_tar_name, local_artifact_path)
print(f"Created {model_tar_name}, size {tar_size:.2f} MB")

model_artifacts/code/inference.py: 100%|██████████| 9/9 [00:27<00:00,  3.03s/files]      

Created model_roberta_script.tar.gz, size 464.05 MB





In [45]:
model_data_path = s3_path_join("s3://", bucket, prefix + "/models")
model_roberta_script_uri = S3Uploader.upload(model_tar_name, model_data_path)
print(f"Uploaded roberta script model to {model_roberta_script_uri}")
%store model_roberta_script_uri

Uploaded roberta script model to s3://sagemaker-us-west-2-757967535041/sagemaker/huggingface-pytorch-sentiment-analysis/models/model_roberta_script.tar.gz
Stored 'model_roberta_script_uri' (str)


#### Create the Roberta script model object


In [46]:
now_roberta_script = f'{datetime.datetime.now():%Y-%m-%d-%H-%M-%S}'
roberta_script_model_name = f"hf-pytorch-model-roberta-script-{now_roberta_script}"
print(f"Model name : {roberta_script_model_name}")
%store roberta_script_model_name

Model name : hf-pytorch-model-roberta-script-2023-05-08-18-37-59
Stored 'roberta_script_model_name' (str)


In [47]:
inference_image_roberta_script = image_uris.retrieve(
    framework=framework,
    base_framework_version=ml_framework.lower() + framework_version,
    region=region,
    version=transformer_version,
    py_version=py_version,
    instance_type="ml.c",
    image_scope=image_scope,
)

print(inference_image_roberta_script)

763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-inference:1.10.2-transformers4.17.0-cpu-py38-ubuntu20.04


In [48]:
primary_container_roberta_script = {
    'Image': inference_image_roberta_script,
    'ModelDataUrl': model_roberta_script_uri
}

create_model_roberta_script_response = sm_client.create_model(
    ModelName=roberta_script_model_name,
    ExecutionRoleArn=role,
    PrimaryContainer=primary_container_roberta_script
)

print(f"Model arn : {create_model_roberta_script_response['ModelArn']}")

Model arn : arn:aws:sagemaker:us-west-2:757967535041:model/hf-pytorch-model-roberta-script-2023-05-08-18-37-59
