## Set up the environment
Define the S3 bucket and prefix where the model artifacts served by your multi-model endpoint will be stored.

Also define the IAM role that gives SageMaker access to the model artifacts and to the ECR image created above.

!pip install -qU awscli boto3 sagemaker

In [1]:
import boto3
from sagemaker import get_execution_role

sm_client = boto3.client(service_name="sagemaker")
runtime_sm_client = boto3.client(service_name="sagemaker-runtime")

account_id = boto3.client("sts").get_caller_identity()["Account"]
region = boto3.Session().region_name

bucket = "sagemaker-{}-{}".format(region, account_id)
prefix = "demo-multimodel-endpoint"

role = "arn:aws:iam::171774164293:role/service-role/AmazonSageMaker-ExecutionRole-20200608T073821" # get_execution_role()
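As a minimal sketch of how the `bucket` and `prefix` values above could be combined into a single S3 location for the model artifacts: the helper name `s3_model_prefix` and the account ID in the example are hypothetical, not part of the original notebook.

```python
def s3_model_prefix(bucket: str, prefix: str) -> str:
    """Return the S3 URI under which all model artifacts are stored.

    Hypothetical helper: it only joins the two strings and normalizes
    slashes; it does not check that the bucket exists.
    """
    return "s3://{}/{}/".format(bucket, prefix.strip("/"))


# Placeholder bucket name following the "sagemaker-<region>-<account_id>" pattern.
print(s3_model_prefix("sagemaker-us-east-1-123456789012", "demo-multimodel-endpoint"))
# prints: s3://sagemaker-us-east-1-123456789012/demo-multimodel-endpoint/
```

In the notebook itself the call would be `s3_model_prefix(bucket, prefix)`, using the variables defined in the cell above.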

## Create a multi-model endpoint
### Import models into hosting
When creating the Model entity for a multi-model endpoint, the container's `ModelDataUrl` is the S3 prefix under which the model artifacts invokable by the endpoint are located. The rest of the S3 path is specified when invoking the model.

The container's `Mode` is set to `MultiModel` to signify that the container will host multiple models.

In [73]:
from time import gmtime, strftime

model_name = "DEMO-MultiModelModel" + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
model_url = "https://tigermle-explorations.s3.amazonaws.com/lenin/flask_on_sagemaker/multi_model/"
container = "171774164293.dkr.ecr.us-east-1.amazonaws.com/sagemaker:demo-sagemaker-multimodel"

print("Model name: " + model_name)
print("Model data Url: " + model_url)
print("Container image: " + container)

container = {"Image": container, "ModelDataUrl": model_url, "Mode": "MultiModel"}

create_model_response = sm_client.create_model(
    ModelName=model_name, ExecutionRoleArn=role, Containers=[container]
)

print("Model Arn: " + create_model_response["ModelArn"])

Model name: DEMO-MultiModelModel2021-08-30-09-46-49
Model data Url: https://tigermle-explorations.s3.amazonaws.com/lenin/flask_on_sagemaker/multi_model/
Container image: 171774164293.dkr.ecr.us-east-1.amazonaws.com/sagemaker:demo-sagemaker-multimodel
Model Arn: arn:aws:sagemaker:us-east-1:171774164293:model/demo-multimodelmodel2021-08-30-09-46-49
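To make the split between `ModelDataUrl` and the per-request remainder of the path concrete, the sketch below joins a prefix with a target model name. The helper `resolve_artifact_url` is hypothetical and only illustrates the naming convention; SageMaker performs the actual lookup on its side.

```python
def resolve_artifact_url(model_data_url: str, target_model: str) -> str:
    """Join the ModelDataUrl prefix with the relative path given at invocation.

    Illustrative only: this mirrors how the two pieces fit together,
    it does not call any AWS API.
    """
    return model_data_url.rstrip("/") + "/" + target_model.lstrip("/")


prefix_url = "https://tigermle-explorations.s3.amazonaws.com/lenin/flask_on_sagemaker/multi_model/"
print(resolve_artifact_url(prefix_url, "model_4.tar.gz"))
# prints: https://tigermle-explorations.s3.amazonaws.com/lenin/flask_on_sagemaker/multi_model/model_4.tar.gz
```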


### Create endpoint configuration
Endpoint config creation works the same way as it does for single-model endpoints.

In [74]:
endpoint_config_name = "DEMO-MultiModelEndpointConfig-" + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
print("Endpoint config name: " + endpoint_config_name)

create_endpoint_config_response = sm_client.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[
        {
            "InstanceType": "ml.m5.xlarge",
            "InitialInstanceCount": 1,
            "InitialVariantWeight": 1,
            "ModelName": model_name,
            "VariantName": "AllTraffic",
        }
    ],
)

print("Endpoint config Arn: " + create_endpoint_config_response["EndpointConfigArn"])

Endpoint config name: DEMO-MultiModelEndpointConfig-2021-08-30-09-46-50
Endpoint config Arn: arn:aws:sagemaker:us-east-1:171774164293:endpoint-config/demo-multimodelendpointconfig-2021-08-30-09-46-50


### Create endpoint
Similarly, endpoint creation works the same way as it does for single-model endpoints.

In [75]:
import time

endpoint_name = "DEMO-MultiModelEndpoint-" + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
print("Endpoint name: " + endpoint_name)

create_endpoint_response = sm_client.create_endpoint(
    EndpointName=endpoint_name, EndpointConfigName=endpoint_config_name
)
print("Endpoint Arn: " + create_endpoint_response["EndpointArn"])

resp = sm_client.describe_endpoint(EndpointName=endpoint_name)
status = resp["EndpointStatus"]
print("Endpoint Status: " + status)

print("Waiting for {} endpoint to be in service...".format(endpoint_name))
waiter = sm_client.get_waiter("endpoint_in_service")
waiter.wait(EndpointName=endpoint_name)

Endpoint name: DEMO-MultiModelEndpoint-2021-08-30-09-46-51
Endpoint Arn: arn:aws:sagemaker:us-east-1:171774164293:endpoint/demo-multimodelendpoint-2021-08-30-09-46-51
Endpoint Status: Creating
Waiting for DEMO-MultiModelEndpoint-2021-08-30-09-46-51 endpoint to be in service...
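The waiter used above blocks until the endpoint is in service. If you would rather see the status while waiting, a manual polling loop over `describe_endpoint` is one option. The following is a sketch, not the notebook's own method: the function name, the polling interval, and the attempt limit are all assumptions.

```python
import time


def wait_until_in_service(client, endpoint_name, poll_seconds=30, max_attempts=120):
    """Poll describe_endpoint until the endpoint reports InService.

    `client` is expected to behave like boto3.client("sagemaker").
    Raises RuntimeError if creation fails and TimeoutError if the
    endpoint is still not in service after max_attempts polls.
    """
    for _ in range(max_attempts):
        resp = client.describe_endpoint(EndpointName=endpoint_name)
        status = resp["EndpointStatus"]
        if status == "InService":
            return status
        if status == "Failed":
            reason = resp.get("FailureReason", "unknown")
            raise RuntimeError("Endpoint creation failed: {}".format(reason))
        print("Endpoint status: " + status)
        time.sleep(poll_seconds)
    raise TimeoutError("Endpoint {} not in service after polling".format(endpoint_name))
```

With the variables from this notebook, the call would be `wait_until_in_service(sm_client, endpoint_name)`.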


## Model 1

In [134]:
%%time

import json

import pandas as pd

response = runtime_sm_client.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="text/csv",
    TargetModel="model.tar.gz",  # this is the rest of the S3 path where the model artifacts are located
    Body=js,  # request payload; assumed to have been prepared in an earlier cell
)

response_df = pd.DataFrame(json.loads(response["Body"].read()))
response_df

CPU times: user 19.6 ms, sys: 0 ns, total: 19.6 ms
Wall time: 82.2 ms


| Unnamed: 0 | col 1 | col 2 |
|---|---|---|
| row 1 | a | b |
| row 2 | x | v |


## Model 2

In [None]:
%%time

import json

response = runtime_sm_client.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="text/csv",
    TargetModel="model_2.tar.gz",  # this is the rest of the S3 path where the model artifacts are located
    Body=js,
)

response_df = pd.DataFrame(json.loads(response["Body"].read()))
response_df

## Model 3

In [None]:
%%time

import json

response = runtime_sm_client.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="text/csv",
    TargetModel="model_3.tar.gz",  # this is the rest of the S3 path where the model artifacts are located
    Body=js,
)

response_df = pd.DataFrame(json.loads(response["Body"].read()))
response_df

## Model 4

In [139]:
%%time

import json

response = runtime_sm_client.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="text/csv",
    TargetModel="model_4.tar.gz",  # this is the rest of the S3 path where the model artifacts are located
    Body=js,
)

response_df = pd.DataFrame(json.loads(response["Body"].read()))
response_df

CPU times: user 18.7 ms, sys: 68 µs, total: 18.8 ms
Wall time: 448 ms


| Unnamed: 0 | col 1 | col 2 | Source |
|---|---|---|---|
| row 1 | a | b | From Model 4 |
| row 2 | c | d | From Model 4 |
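The invocation cells above differ only in the `TargetModel` value, so the call can be wrapped in a small helper. This is a sketch under the assumption that every model returns a JSON body, as the cells above imply; the function name `invoke_target_model` is hypothetical.

```python
import json


def invoke_target_model(runtime_client, endpoint_name, target_model, payload,
                        content_type="text/csv"):
    """Invoke one model behind a multi-model endpoint and parse its JSON reply.

    `runtime_client` is expected to behave like
    boto3.client("sagemaker-runtime"); `target_model` is the artifact path
    relative to the ModelDataUrl prefix, e.g. "model_4.tar.gz".
    """
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType=content_type,
        TargetModel=target_model,
        Body=payload,
    )
    # The response body is a stream; read it fully before decoding.
    return json.loads(response["Body"].read())
```

With the notebook's variables, `pd.DataFrame(invoke_target_model(runtime_sm_client, endpoint_name, "model_4.tar.gz", js))` would reproduce the Model 4 cell. The first request to a given model can be noticeably slower than later ones, because the endpoint has to load that model's artifact before serving it.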
