# SageMaker multi-model endpoints (MME) lab

#### In this notebook you will create a SageMaker multi-model endpoint (MME) from pre-existing model artifacts. You will learn how to invoke the models hosted on the endpoint and how to add a new model on the fly.

## Imports and initializations

In [1]:
import boto3
import sagemaker
import time

In [2]:
from sagemaker.image_uris import retrieve
from time import gmtime, strftime

sagemaker_session = sagemaker.Session()
default_bucket = sagemaker_session.default_bucket()


region = sagemaker_session.boto_region_name
s3_client = boto3.client("s3", region_name=region)
sm_client = boto3.client("sagemaker", region_name=region)
sm_runtime_client = boto3.client("sagemaker-runtime", region_name=region)
role = sagemaker.get_execution_role()

# S3 locations used for parameterizing the notebook run
model_prefix = "XGBOOST_BOSTON_HOUSING/multi_model_artifacts"

# S3 location of trained model artifact
model_artifacts = f"s3://{default_bucket}/{model_prefix}/"
#model_artifacts = 's3://amazon-lakeformation-forecast-blog-artifacts/mme-immersion-day/multi_model_artifacts'
print(model_artifacts)


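# Locations; each one has a model artifact named <location>.tar.gz
location = ['Chicago_IL', 'Houston_TX', 'LosAngeles_CA']

# One test record (feature values) used as the request payload
test_data = [1997, 2527, 6, 2.5, 0.57, 1]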

s3://sagemaker-us-west-1-650222655237/XGBOOST_BOSTON_HOUSING/multi_model_artifacts/


## Copy the pre-trained models into your default S3 bucket

In [3]:
s3 = boto3.resource('s3')
bucket = s3.Bucket(default_bucket)

# Copy each pre-trained artifact from the source bucket to the MME prefix
for loc in location:
    copy_source = {'Bucket': 'amazon-lakeformation-forecast-blog-artifacts', 'Key': f'mme-immersion-day/multi_model_artifacts/{loc}.tar.gz'}
    print(copy_source)
    bucket.copy(copy_source, f"{model_prefix}/{loc}.tar.gz")


{'Bucket': 'amazon-lakeformation-forecast-blog-artifacts', 'Key': 'mme-immersion-day/multi_model_artifacts/Chicago_IL.tar.gz'}
{'Bucket': 'amazon-lakeformation-forecast-blog-artifacts', 'Key': 'mme-immersion-day/multi_model_artifacts/Houston_TX.tar.gz'}
{'Bucket': 'amazon-lakeformation-forecast-blog-artifacts', 'Key': 'mme-immersion-day/multi_model_artifacts/LosAngeles_CA.tar.gz'}
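
To confirm which artifacts landed under the prefix, the objects can be listed and reduced to their file names, which are the values later passed as `TargetModel`. This is a minimal sketch; `list_target_models` is a hypothetical helper name, and it assumes `bucket` behaves like the boto3 `Bucket` resource created above.

```python
def list_target_models(bucket, prefix):
    """Return the file names of the .tar.gz objects stored under `prefix`.

    `bucket` is assumed to behave like a boto3 S3 Bucket resource, i.e.
    `bucket.objects.filter(Prefix=...)` yields objects with a `.key` attribute.
    """
    prefix = prefix.rstrip("/") + "/"
    names = []
    for obj in bucket.objects.filter(Prefix=prefix):
        if obj.key.endswith(".tar.gz"):
            # Strip the prefix so the name can be used as TargetModel
            names.append(obj.key[len(prefix):])
    return sorted(names)
```

In this notebook it could be called as `list_target_models(bucket, model_prefix)`.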


## Create a SageMaker model object

Here we create a SageMaker model object that specifies the container to use and the S3 location of the model.tar.gz files. The key items to note are:
* "Mode" = "MultiModel". This tells SageMaker to set up this model configuration object for a multi-model endpoint.
* "ModelDataUrl" = model_artifacts, a variable that holds the S3 prefix under which the model.tar.gz files are stored.

In [4]:
# Retrieve the SageMaker managed XGBoost image (used here as the inference container)
training_image = retrieve(framework="xgboost", region=region, version="1.3-1")

# Specify a unique model name that does not already exist
model_name = "housing-prices-prediction-xgb"
primary_container = {
    "Image": training_image,
    "ModelDataUrl": model_artifacts,
    "Mode": "MultiModel",
}

model = sm_client.create_model(
    ModelName=model_name,
    PrimaryContainer=primary_container,
    ExecutionRoleArn=role,
)
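
Because `create_model` fails when the name is already taken, one option is to append a timestamp to the base name. The sketch below is only an illustration: `unique_model_name` is a hypothetical helper, and the compact timestamp format is an assumption chosen so that the derived endpoint and endpoint-config names stay short.

```python
from time import gmtime, strftime


def unique_model_name(base_name, now=None):
    """Append a compact UTC timestamp so repeated runs do not reuse a name."""
    # `now` is a time.struct_time; it defaults to the current UTC time
    timestamp = strftime("%Y%m%d-%H%M%S", now if now is not None else gmtime())
    return f"{base_name}-{timestamp}"


print(unique_model_name("housing-prices-prediction-xgb"))
```

The result could replace the fixed `model_name` string before `create_model` is called.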


## Create SageMaker endpoint configuration

The endpoint configuration specifies the infrastructure (instance type and count) that will run behind your multi-model endpoint.

In [5]:
# Endpoint config name
endpoint_config_name = f"{model_name}-endpoint-config"

endpoint_config_response = sm_client.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[
        {
            "InstanceType": "ml.m5.xlarge",
            "InitialInstanceCount": 1,
            "InitialVariantWeight": 1,
            "ModelName": model_name,
            "VariantName": "AllTraffic",
        }
    ],
)

## Create SageMaker endpoint

In [6]:
# Endpoint name
endpoint_name = f"{model_name}-endpoint"

endpoint_response = sm_client.create_endpoint(
    EndpointName=endpoint_name,
    EndpointConfigName=endpoint_config_name,
)

# Poll once a minute until the endpoint leaves the "Creating" state
resp = sm_client.describe_endpoint(EndpointName=endpoint_name)
status = resp["EndpointStatus"]
while status == "Creating":
    print(f"Endpoint Status: {status}...")
    time.sleep(60)
    resp = sm_client.describe_endpoint(EndpointName=endpoint_name)
    status = resp["EndpointStatus"]
print(f"Endpoint Status: {status}")


Endpoint Status: Creating...
Endpoint Status: Creating...
Endpoint Status: Creating...
Endpoint Status: Creating...
Endpoint Status: Creating...
Endpoint Status: Creating...
Endpoint Status: Creating...
Endpoint Status: InService
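
The polling loop above stops as soon as the status is no longer `Creating`, so a failed creation would simply be printed. A slightly stricter variant could raise instead and give up after a timeout. This is a sketch under assumptions: `wait_until_in_service`, its default timeout, and the injectable `sleep` argument are illustrative choices, not part of the original lab.

```python
import time


def wait_until_in_service(sm_client, endpoint_name, poll_seconds=60,
                          timeout_seconds=1800, sleep=time.sleep):
    """Poll describe_endpoint until the endpoint reports InService.

    Raises RuntimeError if creation ends in any other state, and
    TimeoutError if the endpoint is still Creating after the timeout.
    """
    waited = 0
    while True:
        resp = sm_client.describe_endpoint(EndpointName=endpoint_name)
        status = resp["EndpointStatus"]
        if status == "InService":
            return resp
        if status != "Creating":
            # FailureReason is only present when SageMaker reports one
            reason = resp.get("FailureReason", "no reason reported")
            raise RuntimeError(f"Endpoint ended in status {status}: {reason}")
        if waited >= timeout_seconds:
            raise TimeoutError(f"Endpoint still Creating after {waited} s")
        sleep(poll_seconds)
        waited += poll_seconds
```

boto3 also provides a built-in waiter for the same purpose, `sm_client.get_waiter("endpoint_in_service")`, which may be preferable when custom status output is not needed.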


## CHECKPOINT: Go to the Endpoints section within SageMaker Studio to see the endpoint being created

## Invoke the multi-model endpoint

Here we invoke three city models (Chicago, Houston, Los Angeles). All three are hosted on the single multi-model endpoint because their model files are located under the S3 prefix that the SageMaker model (created earlier) points to.

In [7]:
# Serialize the test record as one line of space-separated values
payload = ' '.join(str(elem) for elem in test_data)
print('payload= ' + payload)
for loc in location:
    start_time = time.time()
    predicted_value = sm_runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        TargetModel=f"{loc}.tar.gz",  # selects which model artifact serves the request
        ContentType="text/csv",
        Body=payload,
    )
    duration = time.time() - start_time
    print(f"Predicted Value for {loc} target model:\n ${predicted_value['Body'].read().decode('utf-8')}")
    print("took {:,d} ms\n".format(int(duration * 1000)))

payload= 1997 2527 6 2.5 0.57 1
Predicted Value for Chicago_IL target model:
 $[392504.75]
took 1,459 ms

Predicted Value for Houston_TX target model:
 $[387296.5625]
took 1,157 ms

Predicted Value for LosAngeles_CA target model:
 $[379517.5]
took 1,099 ms
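
The response bodies printed above look like one-element lists, for example `[392504.75]`. If a plain number is needed downstream, the body can be parsed. The sketch below assumes that format (other container versions may respond differently); `parse_prediction` and `predict` are hypothetical helper names.

```python
import json


def parse_prediction(body_bytes):
    """Turn a response body such as b'[392504.75]' into a float."""
    values = json.loads(body_bytes.decode("utf-8"))
    if not isinstance(values, list) or len(values) != 1:
        raise ValueError(f"expected a one-element list, got {values!r}")
    return float(values[0])


def predict(runtime_client, endpoint_name, target_model, payload):
    """Invoke one model behind the endpoint and return its prediction."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        TargetModel=target_model,  # e.g. "Chicago_IL.tar.gz"
        ContentType="text/csv",
        Body=payload,
    )
    # response["Body"] is a stream; read() returns the raw bytes
    return parse_prediction(response["Body"].read())
```

With the notebook's variables this could be called as `predict(sm_runtime_client, endpoint_name, "Chicago_IL.tar.gz", payload)`.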



## Invoke the new NewYork model (with error)
At first you will get an error because the model artifact for the NewYork model does not yet exist under the model_artifacts S3 prefix. This is intentional: it shows that a model's artifact must be present in S3 before the endpoint can serve it.

In [8]:
predicted_value = sm_runtime_client.invoke_endpoint(EndpointName=endpoint_name, TargetModel="NewYork_NY.tar.gz", ContentType="text/csv", Body=payload)
print(f"Predicted Value for NewYork_NY target model:\n ${predicted_value['Body'].read().decode('utf-8')}")

ValidationError: An error occurred (ValidationError) when calling the InvokeEndpoint operation: Failed to download model data(bucket: sagemaker-us-west-1-650222655237, key: XGBOOST_BOSTON_HOUSING/multi_model_artifacts/NewYork_NY.tar.gz). Please ensure that there is an object located at the URL and that the role passed to CreateModel has permissions to download the model.
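
The bucket and key in the error message are the `ModelDataUrl` prefix joined with the `TargetModel` value. The sketch below reproduces that join and checks whether the object exists before invoking. It mirrors what the error message shows rather than SageMaker's internal logic, and `resolve_artifact` and `artifact_exists` are hypothetical helper names; the check assumes the caller has permission to list the bucket.

```python
def resolve_artifact(model_data_url, target_model):
    """Join an s3:// prefix with a TargetModel name into (bucket, key)."""
    if not model_data_url.startswith("s3://"):
        raise ValueError("model_data_url must start with s3://")
    bucket, _, prefix = model_data_url[len("s3://"):].partition("/")
    prefix = prefix.rstrip("/")
    key = f"{prefix}/{target_model}" if prefix else target_model
    return bucket, key


def artifact_exists(s3_client, bucket, key):
    """Return True if an object with exactly this key exists in the bucket."""
    # An exact key sorts before any longer key sharing the same prefix,
    # so asking for a single result is enough.
    resp = s3_client.list_objects_v2(Bucket=bucket, Prefix=key, MaxKeys=1)
    return any(obj["Key"] == key for obj in resp.get("Contents", []))
```

For example, `artifact_exists(s3_client, *resolve_artifact(model_artifacts, "NewYork_NY.tar.gz"))` would be expected to return `False` at this point in the lab.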


## Copy the NewYork model into the MME S3 bucket

Here we copy the new model (NewYork_NY.tar.gz) to the S3 prefix registered with the multi-model endpoint. No change to the endpoint itself is required.

In [9]:
s3 = boto3.resource('s3')
bucket = s3.Bucket(default_bucket)

copy_source = {'Bucket': 'amazon-lakeformation-forecast-blog-artifacts', 'Key': 'mme-immersion-day/multi_model_artifacts/newyork/NewYork_NY.tar.gz'}
print (copy_source)
bucket.copy(copy_source, f'{model_prefix}/NewYork_NY.tar.gz')


{'Bucket': 'amazon-lakeformation-forecast-blog-artifacts', 'Key': 'mme-immersion-day/multi_model_artifacts/newyork/NewYork_NY.tar.gz'}


## Invoke the new model (with success)

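Note: Amazon S3 provides strong read-after-write consistency, so the copied object is visible as soon as the copy call returns. If the invocation still fails, wait a few minutes and retry.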

In [16]:
start_time = time.time()
predicted_value = sm_runtime_client.invoke_endpoint(EndpointName=endpoint_name, TargetModel="NewYork_NY.tar.gz", ContentType="text/csv", Body=payload)
duration = time.time() - start_time
print(f"Predicted Value for NewYork_NY target model:\n ${predicted_value['Body'].read().decode('utf-8')}")
print("took {:,d} ms\n".format(int(duration * 1000)))


Predicted Value for NewYork_NY target model:
 $[390451.53125]
took 1,090 ms



## Invoke NewYork model again to see latency difference

Here we invoke the NewYork model again. The latency is significantly lower than on the first invocation: the first time, the model had to be downloaded from S3 and loaded, whereas on the second invocation it was already in memory.

In [17]:
start_time = time.time()
predicted_value = sm_runtime_client.invoke_endpoint(EndpointName=endpoint_name, TargetModel="NewYork_NY.tar.gz", ContentType="text/csv", Body=payload)
duration = time.time() - start_time
print(f"Predicted Value for NewYork_NY target model:\n ${predicted_value['Body'].read().decode('utf-8')}")
print("took {:,d} ms\n".format(int(duration * 1000)))


Predicted Value for NewYork_NY target model:
 $[390451.53125]
took 14 ms
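
To compare the first (cold) and later (warm) invocations side by side, the same call can be timed several times in a row. This is a minimal sketch; `time_calls_ms` is a hypothetical helper, and the first measurement only reflects a cold load if the model has not already been loaded on the endpoint.

```python
import time


def time_calls_ms(call, repeats=3):
    """Run `call` repeatedly and return each wall-clock duration in ms."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        call()
        durations.append((time.perf_counter() - start) * 1000)
    return durations
```

It could be used as `time_calls_ms(lambda: sm_runtime_client.invoke_endpoint(EndpointName=endpoint_name, TargetModel="NewYork_NY.tar.gz", ContentType="text/csv", Body=payload))`.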



## Cleanup

In [18]:
# Delete endpoint first, since it is the resource that keeps instances running
sm_client.delete_endpoint(EndpointName=endpoint_name)

# Delete endpoint configuration
sm_client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)

# Delete model
sm_client.delete_model(ModelName=model_name)


{'ResponseMetadata': {'RequestId': '372a2fc7-cb62-40fe-b392-7c4e4dd26698',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '372a2fc7-cb62-40fe-b392-7c4e4dd26698',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '0',
   'date': 'Sun, 14 Aug 2022 06:42:53 GMT'},
  'RetryAttempts': 0}}
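
The cleanup above leaves the copied model artifacts in S3. If they are no longer needed, they could be removed as well. The sketch below assumes `bucket` behaves like the boto3 `Bucket` resource used earlier; `delete_model_artifacts` is a hypothetical helper name, and the deletion is irreversible on a bucket without versioning.

```python
def delete_model_artifacts(bucket, prefix):
    """Delete every object under `prefix` and return the keys targeted."""
    prefix = prefix.rstrip("/") + "/"
    keys = [obj.key for obj in bucket.objects.filter(Prefix=prefix)]
    if keys:
        # The collection's delete() issues batched delete requests
        bucket.objects.filter(Prefix=prefix).delete()
    return keys
```

With the notebook's variables this would be `delete_model_artifacts(bucket, model_prefix)`.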