## Author: Agustinus Nalwan
## Model Training & Model Registry Experiment

This experiment walks through the following MLOps steps using Amazon SageMaker:
- Train a model
- Register a model into a Model Registry Group
- Deploy the model from a Model Registry into an endpoint
- Update an existing endpoint with a new model from Model Registry

In [None]:
import sagemaker
sm_session = sagemaker.session.Session()
s3_bucket = "your bucket name here"
s3_folder = f"s3://{s3_bucket}"
data_path = "data"
sagemaker_role = "add your role here"
sagemaker_role_arn = "add your role arn here"

In [None]:
import numpy as np
import os
import tensorflow as tf

mnist = tf.keras.datasets.mnist

# Create the local data folder if it does not exist yet
os.makedirs(data_path, exist_ok=True)

# Get mnist dataset, split it into train/test and save them into local folder
(x_train, y_train), (x_test, y_test) = mnist.load_data()
np.save(f"{data_path}/x_train.dat", x_train)
np.save(f"{data_path}/y_train.dat", y_train)

np.save(f"{data_path}/x_test.dat", x_test)
np.save(f"{data_path}/y_test.dat", y_test)

# Upload them to S3 bucket for SageMaker training
import boto3
s3_client = boto3.client('s3')
s3_client.upload_file(f"{data_path}/x_train.dat.npy", s3_bucket, "dataset/train/x_train.dat.npy")
s3_client.upload_file(f"{data_path}/y_train.dat.npy", s3_bucket, "dataset/train/y_train.dat.npy")
s3_client.upload_file(f"{data_path}/x_test.dat.npy", s3_bucket, "dataset/eval/x_test.dat.npy")
s3_client.upload_file(f"{data_path}/y_test.dat.npy", s3_bucket, "dataset/eval/y_test.dat.npy")

Train a new TensorFlow model with the SageMaker Python SDK, using a custom training script.
The training script is an ordinary TensorFlow training script that has been minimally adapted to run on SageMaker.
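
The notebook does not show `tf-train.py`, so the following is only a sketch of what "minimally adapted" usually means in SageMaker script mode: the script reads its data and model locations from the environment variables SageMaker sets for each channel passed to `fit()`. The function name `parse_args`, the `--epochs` hyperparameter, and the local fallback paths are assumptions for illustration, not taken from the original script.

```python
import argparse
import os


def parse_args(argv=None):
    """Read data and model locations the way a SageMaker script-mode job provides them."""
    parser = argparse.ArgumentParser()
    # SageMaker sets SM_CHANNEL_<NAME> for every channel passed to fit();
    # the fallbacks let the same script run on a local machine.
    parser.add_argument("--train", default=os.environ.get("SM_CHANNEL_TRAIN", "data"))
    parser.add_argument("--eval", default=os.environ.get("SM_CHANNEL_EVAL", "data"))
    # Files written to SM_MODEL_DIR are packaged into model.tar.gz after training.
    parser.add_argument("--sm-model-dir", default=os.environ.get("SM_MODEL_DIR", "model"))
    # Hypothetical hyperparameter, shown only to illustrate the pattern.
    parser.add_argument("--epochs", type=int, default=10)
    # parse_known_args ignores extra arguments the container may pass along.
    args, _unknown = parser.parse_known_args(argv)
    return args


if __name__ == "__main__":
    args = parse_args()
    print(args.train, args.eval, args.sm_model_dir, args.epochs)
```

With this pattern the rest of the script stays a regular TensorFlow program: it loads the `.npy` files from `args.train` and `args.eval` and saves the trained model under `args.sm_model_dir`.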

In [None]:
from sagemaker.tensorflow import TensorFlow

tf_estimator = TensorFlow(
    entry_point="tf-train.py",
    role=sagemaker_role,
    instance_count=1,
    instance_type="ml.p2.xlarge",
    framework_version="2.2",
    output_path=f"{s3_folder}/output/",
    py_version="py37")

tf_estimator.fit({'train': f'{s3_folder}/dataset/train',
                  'eval': f'{s3_folder}/dataset/eval'}, logs="Training")

Deploy the model to a new SageMaker endpoint. This involves:
- Creating a SageMaker Model (making it available in the AWS SageMaker Models console)
- Creating an Endpoint Configuration
- Creating an Endpoint

In [None]:
# Ideally, you should always register the model first and deploy from the registry so that the metrics,
# training job, etc. are properly recorded. Here we only want a quick test, so we deploy directly.
predictor = tf_estimator.deploy(initial_instance_count=1, instance_type="ml.c5.xlarge")
# Record this endpoint name for use in a later section
first_end_point_name = predictor.endpoint_name

Test the endpoint with a sample from the test data

In [None]:
import numpy as np

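# Download the evaluation data from the bucket configured in the first cell
!aws s3 cp s3://{s3_bucket}/dataset/eval {data_path}/ --recursive

x_test = np.load(f"{data_path}/x_test.dat.npy")
y_test = np.load(f"{data_path}/y_test.dat.npy")

# Send the first test image to the endpoint (named payload to avoid shadowing the built-in input)
payload = {'instances': [x_test[0].tolist()]}
result = predictor.predict(payload)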
print(result)
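
The raw result is a dictionary of scores, so a small helper can turn it into a digit. This is a sketch that assumes the endpoint returns the TensorFlow Serving REST format (`{"predictions": [[...]]}`) and that the model outputs one score per digit class; the training script is not shown, so both are assumptions. The name `predicted_digit` is hypothetical.

```python
def predicted_digit(response):
    """Return the index of the highest score for the first instance in the response."""
    # Assumes a TensorFlow Serving style response: {"predictions": [[score_0, ..., score_9]]}
    scores = response["predictions"][0]
    return max(range(len(scores)), key=lambda i: scores[i])


if __name__ == "__main__":
    # Fabricated scores, used only to show the call shape.
    example = {"predictions": [[0.01, 0.02, 0.05, 0.02, 0.1, 0.1, 0.1, 0.5, 0.05, 0.05]]}
    print(predicted_digit(example))
```

In the notebook you could then compare `predicted_digit(result)` with `y_test[0]`.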

Next, we create a model group and register this model as one model version under that group.
You can create the group in either of two ways:
1. GUI - SageMaker Resources side tab - Model Registry
2. Code - use boto3 to create the SageMaker model group (the SageMaker Python SDK did not support this at the time of writing)

Tutorial: https://docs.aws.amazon.com/sagemaker/latest/dg/model-registry-model-group.html

Here we use boto3 to create the group.

In [None]:
import time
import boto3

sm_client = boto3.client('sagemaker')

model_group_name = "MNIST-group-" + str(round(time.time()))
model_package_group_input_dict = {
    "ModelPackageGroupName": model_group_name,
    "ModelPackageGroupDescription": "MNIST digit classification model group"
}

create_model_package_group_response = sm_client.create_model_package_group(**model_package_group_input_dict)
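
The cell above appends a timestamp to the group name, so every run creates a fresh group. If you prefer a fixed group name that is reused across runs, one option is to check for the group before creating it. The helper below is a sketch under that assumption; `ensure_model_package_group` is a hypothetical name, and the client is passed in so the function itself does not depend on AWS credentials.

```python
def ensure_model_package_group(sm_client, name, description):
    """Create the model package group unless one with this exact name already exists.

    Returns True if the group was created, False if it was already there.
    """
    # NameContains is a substring filter, so check for an exact match ourselves.
    # Only the first page of results is inspected, which is enough for a sketch.
    response = sm_client.list_model_package_groups(NameContains=name)
    existing = {g["ModelPackageGroupName"] for g in response["ModelPackageGroupSummaryList"]}
    if name in existing:
        return False
    sm_client.create_model_package_group(
        ModelPackageGroupName=name,
        ModelPackageGroupDescription=description,
    )
    return True
```

Usage in the notebook would look like `ensure_model_package_group(sm_client, "MNIST-group", "MNIST digit classification model group")`, where `sm_client` is the `boto3.client('sagemaker')` created earlier.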


Register this model in the Model Registry as a new version under the newly created group.
Once the model is registered, you can view it from SageMaker Resources side tab - Model Registry - model group name.

In [None]:
import time

model = tf_estimator.create_model(role=sagemaker_role)

# Optionally set create_sagemaker_model_object to True if you also want this model to be available
# in the AWS SageMaker Models console, so it can be deployed as an endpoint from there
create_sagemaker_model_object = False

if create_sagemaker_model_object:
    container_def = model.prepare_container_def(instance_type="ml.c5.xlarge")
    # No leading hyphen in the format, so the model name does not contain "--"
    timestamp = time.strftime("%Y-%m-%d-%H-%M", time.gmtime())
    model_name = f"DIGITS-model-{timestamp}"
    created_model_name = sm_session.create_model(model_name, role=sagemaker_role, container_defs=container_def)
    print(created_model_name)

model.register(
    model_package_group_name=model_group_name,
    content_types=["application/json"],
    response_types=["application/json"],
    inference_instances=["ml.t2.medium", "ml.c5.xlarge"],
    transform_instances=["ml.c5.xlarge"],
    approval_status="Approved",
    description="Trial 1 - Epoch 10, Learning rate 0.7 Accuracy 95%"
)
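
The registration above marks the version as `Approved` straight away. If you would rather gate deployment on a review, one option (a sketch, not something this notebook does) is to register with `approval_status="PendingManualApproval"` and change the status later through boto3's `update_model_package`. The helper name `approve_model_package` and the default note text are hypothetical.

```python
def approve_model_package(sm_client, model_package_arn, note="Approved after manual review"):
    """Mark a registered model version as Approved in the Model Registry."""
    return sm_client.update_model_package(
        ModelPackageArn=model_package_arn,
        ModelApprovalStatus="Approved",
        ApprovalDescription=note,
    )
```

Here `sm_client` is a `boto3.client('sagemaker')` and `model_package_arn` is the ARN of the version to approve.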

In [None]:
model_group_name

Run another training session to create the second model to test model versioning within our Model Registry Group

In [None]:
from sagemaker.tensorflow import TensorFlow

tf_estimator = TensorFlow(
    entry_point="tf-train.py",
    role=sagemaker_role,
    instance_count=1,
    instance_type="ml.p2.xlarge",
    framework_version="2.2",
    output_path=f"{s3_folder}/output/",
    py_version="py37")

tf_estimator.fit({'train': f'{s3_folder}/dataset/train',
                  'eval': f'{s3_folder}/dataset/eval'}, logs="Training")

model = tf_estimator.create_model(role=sagemaker_role)

model.register(
    model_package_group_name=model_group_name,
    content_types=["application/json"],
    response_types=["application/json"],
    inference_instances=["ml.t2.medium", "ml.c5.xlarge"],
    transform_instances=["ml.c5.xlarge"],
    approval_status="Approved",
    description="Trial 2 - Epoch 5, Learning rate 0.3 Accuracy 91%"
)

View the list of model versions within a group

In [None]:
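import boto3
sm_client = boto3.client('sagemaker')
# Sort newest first explicitly, because a later cell treats the first entry as the latest version
model_package_list = sm_client.list_model_packages(
    ModelPackageGroupName=model_group_name,
    SortBy="CreationTime",
    SortOrder="Descending")
print(model_package_list)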
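
Taking the first entry of the list works when every version is approved. As a slightly more defensive variant, the sketch below picks the highest approved version from the response dictionary. The function name `latest_approved_package` is hypothetical; the keys it reads (`ModelPackageSummaryList`, `ModelApprovalStatus`, `ModelPackageVersion`) are the ones returned by `list_model_packages`.

```python
def latest_approved_package(list_response):
    """Return the summary of the highest approved version, or None if there is none."""
    approved = [
        p for p in list_response["ModelPackageSummaryList"]
        if p.get("ModelApprovalStatus") == "Approved"
    ]
    if not approved:
        return None
    return max(approved, key=lambda p: p["ModelPackageVersion"])
```

In the notebook, `latest_approved_package(model_package_list)["ModelPackageArn"]` would give the ARN to describe and deploy.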

Let's fetch the details of the latest model version, which give us the container image URI and the model data URL. These two fields are required to deploy a specific model version.

In [None]:
# Describe the model version details (eg: getting the ImageUri and model_data_uri) so we can deploy specific model version

latest_model_package_name = model_package_list['ModelPackageSummaryList'][0]['ModelPackageArn']
latest_model_version_details = sm_client.describe_model_package(ModelPackageName=latest_model_package_name)

latest_model_image_url = latest_model_version_details['InferenceSpecification']['Containers'][0]['Image']
latest_model_data_url = latest_model_version_details['InferenceSpecification']['Containers'][0]['ModelDataUrl']
latest_model_version = latest_model_version_details['ModelPackageVersion']
latest_model_package_arn = latest_model_version_details['ModelPackageArn']
print(latest_model_version_details)
print(f"Model version {latest_model_version}\nImageUrl {latest_model_image_url}\nModelDataUrl {latest_model_data_url}\nModelPackageArn {latest_model_package_arn}")

We are now going to deploy the latest version of the model to a new endpoint. We simply create a ModelPackage whose model_package_arn points to the ARN of the specific model version. Note the /[version] suffix at the end of the ARN string, which indicates the version number of the model within this Model Group.

In [None]:
import time
from sagemaker import ModelPackage

# Point the ModelPackage at the ARN of the specific model version to deploy
model = ModelPackage(role=sagemaker_role, model_package_arn=latest_model_package_name,
                     sagemaker_session=sm_session)

# No leading hyphen in the format, so the endpoint name does not contain "--"
timestamp = time.strftime("%Y-%m-%d-%H-%M", time.gmtime())

endpoint_name = f"NEW-DIGITS-model-version-{latest_model_version}-{timestamp}"
print(endpoint_name)
model.deploy(1, "ml.c5.xlarge", endpoint_name=endpoint_name)
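
Because the version number is the last path segment of a model package ARN, it can be read back from the ARN string alone. The helper below is a small sketch of that; `split_model_package_arn` is a hypothetical name, and the ARN in the usage line uses a made-up account ID and group name.

```python
def split_model_package_arn(arn):
    """Split a versioned model package ARN into (group_name, version)."""
    # ARN layout: arn:partition:sagemaker:region:account:model-package/<group>/<version>
    resource = arn.split(":", 5)[5]
    kind, group, version = resource.split("/")
    if kind != "model-package":
        raise ValueError(f"Not a model package ARN: {arn}")
    return group, int(version)


if __name__ == "__main__":
    print(split_model_package_arn(
        "arn:aws:sagemaker:us-east-1:123456789012:model-package/mnist-group-1600000000/2"))
```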


Let's now learn how to update an existing endpoint with a new model. This is crucial for a continuous training process: deploying a new model to an existing endpoint means we do not need to notify the endpoint clients (e.g. a REST API) about an endpoint name change.
So, we are going to update the first endpoint we created earlier with the latest version of our model.

In [None]:
import time
import boto3
from sagemaker import Predictor
sm_client = boto3.client('sagemaker')
# No leading hyphen in the format, so the model name does not contain "--"
timestamp = time.strftime("%Y-%m-%d-%H-%M", time.gmtime())

model_name = f"DEMO-modelregistry-model-{timestamp}"
print(f"Model name : {model_name}")
container_list = [{'ModelPackageName': latest_model_package_name}]

create_model_response = sm_client.create_model(
    ModelName = model_name,
    ExecutionRoleArn = sagemaker_role_arn,
    Containers = container_list
)

print("Model arn : {}".format(create_model_response["ModelArn"]))
predictor = Predictor(first_end_point_name)
predictor.update_endpoint(model_name=model_name, initial_instance_count=1, instance_type="ml.c5.xlarge")
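
After the update it can be useful to confirm which model the endpoint is actually serving. The sketch below follows the endpoint to its current endpoint configuration and lists the model names of its production variants, using boto3's `describe_endpoint` and `describe_endpoint_config`. The helper name `models_behind_endpoint` is hypothetical, and the client is passed in as a parameter.

```python
def models_behind_endpoint(sm_client, endpoint_name):
    """Return the model names served by the endpoint's current configuration."""
    endpoint = sm_client.describe_endpoint(EndpointName=endpoint_name)
    config = sm_client.describe_endpoint_config(
        EndpointConfigName=endpoint["EndpointConfigName"]
    )
    return [variant["ModelName"] for variant in config["ProductionVariants"]]
```

In the notebook, `models_behind_endpoint(sm_client, first_end_point_name)` should include `model_name` once the update has finished.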