# Bring your own container

This notebook shows how to bring your own container to a SageMaker Multi-Model Endpoint (MME). It uses Multi Model Server to host the models, and you can
modify and adapt it to fit your needs.


# Multi-Model Endpoint - CatBoost

This example notebook showcases how to use a custom container to host multiple CatBoost models on a SageMaker Multi-Model Endpoint. The model this notebook deploys is taken from this [CatBoost tutorial](https://github.com/catboost/tutorials/blob/master/python_tutorial_with_tasks.ipynb).

We use CatBoost as an example to demonstrate deployment and serving with a Multi-Model Endpoint. The same approach can be extended to any framework.

CatBoost is gaining in popularity and is not yet supported as a framework for SageMaker Multi-Model Endpoints, so this example also demonstrates how to bring your own container to a Multi-Model Endpoint.

In this notebook we use copies of an identical model to simulate multiple models for loading and inference.

## Prerequisites
### Packages and Permissions
The SageMaker SDK uses the SageMaker default S3 bucket when needed. If `get_execution_role()` does not return a role with the appropriate permissions, you'll need to specify the ARN of an IAM role that does. Make sure the `AmazonSageMakerFullAccess` policy is attached to the execution role you are using.

## Load model and test local inference
Here, we install `catboost` to check that we can load the model locally and run inference.

We load up the model locally using `CatBoostClassifier()`. `test_data.csv` contains a single row of test inference data.

In [1]:
# Cell 01

!pip install catboost

Looking in indexes: https://pypi.org/simple, https://pip.repos.neuron.amazonaws.com, https://pypi.ngc.nvidia.com

In [2]:
# Cell 02
from catboost import CatBoostClassifier, Pool as CatboostPool, cv
import os
import pandas

model_file = CatBoostClassifier()
model_file = model_file.load_model("./models/mme_catboost/catboost_model.bin")
df = pandas.read_csv("./data/mme_catboost/test_data.csv")
df.head(2)

Unnamed: 0,RESOURCE,MGR_ID,ROLE_ROLLUP_1,ROLE_ROLLUP_2,ROLE_DEPTNAME,ROLE_TITLE,ROLE_FAMILY_DESC,ROLE_FAMILY,ROLE_CODE
0,37793,81744,117902,117903,118783,118451,130134,118453,118454


In [None]:
# Cell 03
import pandas as pd
import io
import json

out = model_file.predict_proba(df)
print(out)

## Upload tarball to S3


### Create a model tarball

SageMaker requires our model to be packaged in a tar.gz file.

In [None]:
# Cell 04
!cd models/mme_catboost && tar -czvf catboost-model.tar.gz catboost_model.bin
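
The same archive can also be produced from Python with the standard-library `tarfile` module. The sketch below is an illustration rather than part of the original notebook, and the helper names `make_model_tarball` and `list_tarball` are hypothetical. It stores the model file at the root of the archive, mirroring the `tar` command above.

```python
import os
import tarfile


def make_model_tarball(model_path, tar_path):
    """Package a single model file into a gzip-compressed tar archive.

    The file is stored at the archive root (no parent directories),
    matching what `cd <dir> && tar -czvf <archive> <file>` produces.
    """
    with tarfile.open(tar_path, "w:gz") as tar:
        tar.add(model_path, arcname=os.path.basename(model_path))
    return tar_path


def list_tarball(tar_path):
    """Return the member names stored in the archive."""
    with tarfile.open(tar_path, "r:gz") as tar:
        return tar.getnames()
```

For this notebook the call would look like `make_model_tarball("models/mme_catboost/catboost_model.bin", "models/mme_catboost/catboost-model.tar.gz")`, and `list_tarball` is a quick way to confirm the archive layout before uploading.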

### Upload 100 copies of the model to S3

Multi-Model Endpoints require all models to be stored under a single S3 prefix. Here we upload 100 copies to our default bucket.

This simulates having 100 different models to use for prediction. In practice, each of these models would probably be trained separately.

In [3]:
# Cell 05
import sagemaker

sess = sagemaker.Session()
s3_bucket = sess.default_bucket()  # Replace with your own bucket name if needed
print(s3_bucket)

sagemaker-us-east-1-622343165275


### Upload the model tarballs using boto3, each with a unique name

In [None]:
# Cell 06
import boto3

s3 = boto3.client("s3")
for i in range(0, 100):
    with open("models/mme_catboost/catboost-model.tar.gz", "rb") as f:
        s3.upload_fileobj(f, s3_bucket, "catboost/catboost-model-{}.tar.gz".format(i))

print("Models:uploaded and ready for use")

In [4]:
# Cell 06b: upload 140 copies of the "big" model archive used by the endpoint below
import boto3

s3 = boto3.client("s3")
for i in range(0, 140):
    with open("models/mme_catboost/catboost-model-big.tar.gz", "rb") as f:
        s3.upload_fileobj(f, s3_bucket, "catboost/catboost_large/catboost-model-big-{}.tar.gz".format(i))

print("Models:uploaded and ready for use")

Models:uploaded and ready for use


### List all models in the S3 prefix we will use for our Multi-Model Endpoint

In [5]:
# Cell 07
!aws s3 ls s3://$s3_bucket/catboost/catboost_large/

2022-10-17 20:12:12          0 
2022-10-17 20:13:49    9623121 catboost-model-big-0.tar.gz
2022-10-17 20:13:49    9623121 catboost-model-big-1.tar.gz
2022-10-17 20:13:52    9623121 catboost-model-big-10.tar.gz
2022-10-17 20:14:24    9623121 catboost-model-big-100.tar.gz
2022-10-17 20:14:24    9623121 catboost-model-big-101.tar.gz
2022-10-17 20:14:24    9623121 catboost-model-big-102.tar.gz
2022-10-17 20:14:25    9623121 catboost-model-big-103.tar.gz
2022-10-17 20:14:25    9623121 catboost-model-big-104.tar.gz
2022-10-17 20:14:25    9623121 catboost-model-big-105.tar.gz
2022-10-17 20:14:26    9623121 catboost-model-big-106.tar.gz
2022-10-17 20:14:26    9623121 catboost-model-big-107.tar.gz
2022-10-17 20:14:26    9623121 catboost-model-big-108.tar.gz
2022-10-17 20:14:27    9623121 catboost-model-big-109.tar.gz
2022-10-17 20:13:53    9623121 catboost-model-big-11.tar.gz
2022-10-17 20:14:27    9623121 catboost-model-big-110.tar.gz
2022-10-17 20:14:27    9623121 catboost-model-big-111.tar.g
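
When the endpoint is invoked, each model is addressed by a `TargetModel` value, which is the object's path relative to the S3 prefix that the model's `ModelDataUrl` points to. The sketch below shows that mapping for the keys listed above. The helper name `target_model_for_key` is hypothetical and not part of the notebook.

```python
def target_model_for_key(key, prefix):
    """Return the TargetModel value for an S3 object key.

    `prefix` is the key prefix that ModelDataUrl points to,
    for example "catboost/catboost_large/".
    """
    if not prefix.endswith("/"):
        prefix += "/"
    if not key.startswith(prefix):
        raise ValueError(f"{key!r} is not under prefix {prefix!r}")
    relative = key[len(prefix):]
    if not relative:
        # The zero-byte "folder" object in the listing is not a model artifact.
        raise ValueError("key refers to the prefix itself, not a model artifact")
    return relative


keys = [
    "catboost/catboost_large/catboost-model-big-0.tar.gz",
    "catboost/catboost_large/catboost-model-big-1.tar.gz",
]
targets = [target_model_for_key(k, "catboost/catboost_large/") for k in keys]
print(targets)
```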

## Building the custom container

The container folder in this example contains 3 files:
```
├── container
│   ├── dockerd-entrypoint.py
│   ├── Dockerfile
│   └── model_handler.py
```

- `dockerd-entrypoint.py` is the entry point script that will start the multi model server.
- `Dockerfile` contains the container definition that will be used to assemble the image. This includes the packages that need to be installed.
- `model_handler.py` is the script that will contain the logic to load up the model and make inference.

Take a look through the files to see if there is any customization that you would like to make.
The cells below highlight the main parts of the files.


### Install catboost in the `Dockerfile`

In [None]:
# Cell 08
! sed -n '26,30p' container/Dockerfile

### Update `initialize` function in `model_handler.py` with logic to load up the model
In this case we are using `CatBoostClassifier()`. Feel free to adapt the loading logic in this function to suit your needs.

In [None]:
# Cell 09
! sed -n '22,40p' container/model_handler.py

### Update `handle` function in `model_handler.py` with logic to run inference

In [None]:
# Cell 10
! sed -n '70,85p' container/model_handler.py
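
The contents of `model_handler.py` are not reproduced in this export, so the following is only a rough sketch of the general shape of a Multi Model Server handler, not the notebook's actual file. It assumes the server passes a context whose `system_properties` include `model_dir`, and that each request item carries the payload under `"body"`. The CSV-with-header input, the JSON output layout, and the injected `load_fn` are assumptions made so the sketch runs without CatBoost installed; in the container, `load_fn` could wrap `CatBoostClassifier().load_model(path)`.

```python
import csv
import io
import json
import os


class ModelHandler:
    """Sketch of a Multi Model Server handler (illustrative only)."""

    def __init__(self, load_fn, model_file="catboost_model.bin"):
        # load_fn(path) must return an object exposing predict_proba(rows).
        self.load_fn = load_fn
        self.model_file = model_file
        self.model = None
        self.initialized = False

    def initialize(self, context):
        # model_dir is where the server extracted this model's archive.
        model_dir = context.system_properties.get("model_dir")
        self.model = self.load_fn(os.path.join(model_dir, self.model_file))
        self.initialized = True

    def preprocess(self, request):
        # Assumed payload: CSV text with a header row.
        body = request[0].get("body")
        if isinstance(body, (bytes, bytearray)):
            body = body.decode("utf-8")
        reader = csv.reader(io.StringIO(body))
        next(reader)  # skip the header row
        return [row for row in reader if row]

    def handle(self, data, context):
        # Load the model lazily on the first request for it.
        if not self.initialized:
            self.initialize(context)
        rows = self.preprocess(data)
        probabilities = self.model.predict_proba(rows)
        # Return one JSON string for the single request in the batch.
        return [json.dumps([[float(p) for p in row] for row in probabilities])]


class _FakeContext:
    # Stand-in for the server's context object, for a local smoke test only.
    system_properties = {"model_dir": "/opt/ml/model"}


class _FakeModel:
    # Stand-in for a loaded CatBoost model.
    def predict_proba(self, rows):
        return [[0.25, 0.75] for _ in rows]


handler = ModelHandler(load_fn=lambda path: _FakeModel())
response = handler.handle([{"body": b"f1,f2\n1,2\n3,4\n"}], _FakeContext())
print(response)
```

Injecting the loader keeps the request-handling logic testable on a machine without the serving stack; the real handler may structure this differently.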

### Build and Push the custom image to ECR

**This step takes at least 5-6 minutes, so please be patient and ignore any warnings.**

In [None]:
%%sh
# Cell 11

echo "Starting Docker Build"

# The name of our algorithm
algorithm_name=catboost-sagemaker-multimodel

cd container

account=$(aws sts get-caller-identity --query Account --output text)

# Get the region defined in the current configuration (default to us-east-1 if none defined)
region=$(aws configure get region)
region=${region:-us-east-1}

fullname="${account}.dkr.ecr.${region}.amazonaws.com/${algorithm_name}:latest"

echo "fullname:image=${fullname}"

# If the repository doesn't exist in ECR, create it.
aws ecr describe-repositories --repository-names "${algorithm_name}" > /dev/null 2>&1

if [ $? -ne 0 ]
then
    aws ecr create-repository --repository-name "${algorithm_name}" > /dev/null
fi

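# Log Docker in to ECR. `aws ecr get-login` was removed in AWS CLI v2;
# `get-login-password` is the replacement and pipes a token to `docker login`.
aws ecr get-login-password --region "${region}" | docker login --username AWS --password-stdin "${account}.dkr.ecr.${region}.amazonaws.com"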

# Build the docker image locally with the image name and then push it to ECR
# with the full name.

echo "Starting the Docker Build with ${algorithm_name}"
docker build -q -t ${algorithm_name} .
docker tag ${algorithm_name} ${fullname}

echo "Pushing Docker image ${fullname} to ECR "
docker push ${fullname}

### Deploy Multi Model Endpoint

In [7]:
# Cell 12
from sagemaker import get_execution_role

sm_client = boto3.client(service_name="sagemaker")
runtime_sm_client = boto3.client(service_name="sagemaker-runtime")

account_id = boto3.client("sts").get_caller_identity()["Account"]
region = boto3.Session().region_name

role = get_execution_role()
print(role)

arn:aws:iam::622343165275:role/service-role/AmazonSageMaker-ExecutionRole-20220208T115633


### Create the SageMaker Multi-Model

In [46]:
# Cell 13
from time import gmtime, strftime

model_name = "catboost-multimodel-" + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
model_url = "s3://{}/catboost/catboost_large/".format(s3_bucket)  ## MODEL S3 URL
container = "{}.dkr.ecr.{}.amazonaws.com/catboost-sagemaker-multimodel:latest".format(
    account_id, region
)
instance_type = "ml.m5d.4xlarge" #"ml.m5.xlarge"

print("Model name: " + model_name)
print("Model data Url: " + model_url)
print("Container image: " + container)

container = {"Image": container, "ModelDataUrl": model_url, "Mode": "MultiModel"}

create_model_response = sm_client.create_model(
    ModelName=model_name, ExecutionRoleArn=role, Containers=[container]
)

print("Model ARN: " + create_model_response["ModelArn"])

Model name: catboost-multimodel-2022-10-17-20-55-16
Model data Url: s3://sagemaker-us-east-1-622343165275/catboost/catboost_large/
Container image: 622343165275.dkr.ecr.us-east-1.amazonaws.com/catboost-sagemaker-multimodel:latest
Model ARN: arn:aws:sagemaker:us-east-1:622343165275:model/catboost-multimodel-2022-10-17-20-55-16


### Create the SageMaker Endpoint Configuration


In [47]:
# Cell 14
endpoint_config_name = "catboost-multimodel-config" + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
print("Endpoint config name: " + endpoint_config_name)

create_endpoint_config_response = sm_client.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[
        {
            "InstanceType": instance_type,
            "InitialInstanceCount": 1,
            "InitialVariantWeight": 1,
            "ModelName": model_name,
            "VariantName": "AllTraffic",
        }
    ],
)

print("Endpoint config ARN: " + create_endpoint_config_response["EndpointConfigArn"])

Endpoint config name: catboost-multimodel-config2022-10-17-20-55-20
Endpoint config ARN: arn:aws:sagemaker:us-east-1:622343165275:endpoint-config/catboost-multimodel-config2022-10-17-20-55-20


### Create the SageMaker Multi-Model Endpoint

**This step will take a couple of minutes**

In [48]:
%%time
# Cell 15

import time

endpoint_name = "catboost-multimodel-endpoint-" + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
print("Endpoint name: " + endpoint_name)

create_endpoint_response = sm_client.create_endpoint(
    EndpointName=endpoint_name, EndpointConfigName=endpoint_config_name
)
print("Endpoint Arn: " + create_endpoint_response["EndpointArn"])

resp = sm_client.describe_endpoint(EndpointName=endpoint_name)
status = resp["EndpointStatus"]
print("Endpoint Status: " + status)

print("Waiting for {} endpoint to be in service...".format(endpoint_name))
waiter = sm_client.get_waiter("endpoint_in_service")
waiter.wait(EndpointName=endpoint_name)

print("Created {} endpoint is in service and ready to invoke ...".format(endpoint_name))

Endpoint name: catboost-multimodel-endpoint-2022-10-17-20-55-22
Endpoint Arn: arn:aws:sagemaker:us-east-1:622343165275:endpoint/catboost-multimodel-endpoint-2022-10-17-20-55-22
Endpoint Status: Creating
Waiting for catboost-multimodel-endpoint-2022-10-17-20-55-22 endpoint to be in service...
Created catboost-multimodel-endpoint-2022-10-17-20-55-22 endpoint is in service and ready to invoke ...
CPU times: user 51 ms, sys: 2.54 ms, total: 53.6 ms
Wall time: 2min
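
If you want to see intermediate statuses or the failure reason while the endpoint is being created, you can poll `describe_endpoint` yourself instead of using the waiter. The sketch below is one possible approach; the helper name `wait_for_endpoint` and its default intervals are assumptions, and the client is passed in so the function can be exercised with a stub.

```python
import time


def wait_for_endpoint(sm_client, endpoint_name, poll_seconds=30, timeout_seconds=1800):
    """Poll describe_endpoint until the endpoint is InService, Failed, or timed out."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        resp = sm_client.describe_endpoint(EndpointName=endpoint_name)
        status = resp["EndpointStatus"]
        if status == "InService":
            return resp
        if status == "Failed":
            # FailureReason is reported by the service when creation fails.
            reason = resp.get("FailureReason", "no reason reported")
            raise RuntimeError(f"Endpoint {endpoint_name} failed: {reason}")
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Endpoint {endpoint_name} still {status} after {timeout_seconds}s"
            )
        time.sleep(poll_seconds)
```

In this notebook it would be called as `wait_for_endpoint(sm_client, endpoint_name)` after `create_endpoint`.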


In [11]:
!pwd

/home/ec2-user/SageMaker/sagemaker-im/deploy/mme


In [14]:
df

Unnamed: 0,RESOURCE,MGR_ID,ROLE_ROLLUP_1,ROLE_ROLLUP_2,ROLE_DEPTNAME,ROLE_TITLE,ROLE_FAMILY_DESC,ROLE_FAMILY,ROLE_CODE
0,37793,81744,117902,117903,118783,118451,130134,118453,118454


### Invoke each of the 140 models
We use identical models here to simulate multiple models belonging to the same framework.

In [15]:
import pandas as pd
continue_var = ['I' + str(i) for i in range(1, 14)]
cat_features = ['C' + str(i) for i in range(1,27)]
col_names = ['Label'] + continue_var + cat_features

# "./data/mme_catboost/test_data.csv")
test_data_set_end_point = pd.read_csv('./data/mme_catboost/dac_sample_small.txt', sep='\t', names=col_names).fillna(0)
test_data_set_end_point = test_data_set_end_point.iloc[:, 1:] # remove the LABEL for predictions 

payload=test_data_set_end_point.to_csv(index=False)
len(payload)

2638

In [16]:
test_data_set_end_point

Unnamed: 0,I1,I2,I3,I4,I5,I6,I7,I8,I9,I10,...,C17,C18,C19,C20,C21,C22,C23,C24,C25,C26
0,1.0,1,5.0,0.0,1382,4.0,15,2,181,1.0,...,e5ba7672,f54016b9,21ddcdc9,b1252a9d,07b5194c,0,3a171ecb,c5c50484,e8b83407,9727dd16
1,2.0,0,44.0,1.0,102,8.0,2,2,4,1.0,...,07c540c4,b04e4670,21ddcdc9,5840adea,60f6221e,0,3a171ecb,43f13e8b,e8b83407,731c3655
2,2.0,0,1.0,14.0,767,89.0,4,2,245,1.0,...,8efede7f,3412118d,0,0,e587c466,ad3062eb,3a171ecb,3b183c5c,0,0
3,0.0,893,0.0,0.0,4392,0.0,0,0,0,0.0,...,1e88c74f,74ef3502,0,0,6b3a5ca6,0,3a171ecb,9117a34a,0,0
4,3.0,-1,0.0,0.0,2,0.0,3,0,0,1.0,...,1e88c74f,26b3c7a7,0,0,21c9516a,0,32c7478e,b34f3128,0,0
5,0.0,-1,0.0,0.0,12824,0.0,0,0,6,0.0,...,776ce399,92555263,0,0,242bb710,8ec974f4,be7c41b4,72c78f11,0,0
6,0.0,1,2.0,0.0,3168,0.0,0,1,2,0.0,...,776ce399,cdfa8259,0,0,20062612,0,93bad2c0,1b256e61,0,0
7,1.0,4,2.0,0.0,0,0.0,1,0,0,1.0,...,e5ba7672,74ef3502,0,0,5316a17f,0,32c7478e,9117a34a,0,0
8,0.0,44,4.0,8.0,19010,249.0,28,31,141,0.0,...,e5ba7672,42a2edb9,0,0,0014c32a,0,32c7478e,3b183c5c,0,0
9,0.0,35,0.0,1.0,33737,21.0,1,2,3,0.0,...,d4bb7bd8,70d0f5f9,0,0,0e63fca0,0,32c7478e,0e8fe315,0,0


In [18]:
%%time
import json
response = runtime_sm_client.invoke_endpoint(
        EndpointName=endpoint_name,
        TargetModel="catboost-model-big-1.tar.gz",
        Body=payload, #df.to_csv(index=False),
)
json.loads(response["Body"].read().decode("utf-8"))

CPU times: user 11.3 ms, sys: 0 ns, total: 11.3 ms
Wall time: 66 ms


{'0': 0.0009840355032754156,
 '1': 0.9990159644967246,
 '2': 0.015149877666872924,
 '3': 0.9848501223331271,
 '4': 0.890075999848484,
 '5': 0.10992400015151606,
 '6': 0.9999999998034538,
 '7': 1.9654622555312898e-10,
 '8': 0.9838270558728023,
 '9': 0.016172944127197726,
 '10': 0.000477799986809635,
 '11': 0.9995222000131904,
 '12': 0.9999999999637554,
 '13': 3.6244516847519683e-11,
 '14': 0.999989377804934,
 '15': 1.0622195066015204e-05,
 '16': 0.9999999993004461,
 '17': 6.995538781619641e-10,
 '18': 0.9999189651246768,
 '19': 8.103487532315886e-05}
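
The response above has 20 entries for 10 input rows, and each consecutive pair of values sums to 1, which is consistent with `predict_proba` output (two class probabilities per row) flattened row by row. That layout is inferred from the output, not confirmed from the handler source. Under that assumption, the hypothetical helper below regroups the response into one probability pair per row.

```python
def group_probabilities(flat, n_classes=2):
    """Regroup a {"0": p, "1": p, ...} response into one list per input row.

    Assumes the values are predict_proba output flattened row by row.
    """
    values = [flat[str(i)] for i in range(len(flat))]
    if len(values) % n_classes != 0:
        raise ValueError("response length is not a multiple of n_classes")
    return [values[i:i + n_classes] for i in range(0, len(values), n_classes)]


# First two rows of the response shown above.
sample = {
    "0": 0.0009840355032754156,
    "1": 0.9990159644967246,
    "2": 0.015149877666872924,
    "3": 0.9848501223331271,
}
rows = group_probabilities(sample)
print(rows)
```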

In [19]:
%%time
# Cell 16
from datetime import datetime
import time
for i in range(0, 140):
    start_time = datetime.now()
    response = runtime_sm_client.invoke_endpoint(
        EndpointName=endpoint_name,
        TargetModel="catboost-model-big-{}.tar.gz".format(i),
        Body=payload, #df.to_csv(index=False),
    )
    time_delta = (datetime.now()-start_time).total_seconds() * 1000 
    time_delta = "{:.2f}".format(time_delta)
    print(f'Time={time_delta}')
    #print(f'Time={time_delta} --- > ::{json.loads(response["Body"].read().decode("utf-8"))}')

Time=22.45 --- > ::{'0': 0.0009840355032754156, '1': 0.9990159644967246, '2': 0.015149877666872924, '3': 0.9848501223331271, '4': 0.890075999848484, '5': 0.10992400015151606, '6': 0.9999999998034538, '7': 1.9654622555312898e-10, '8': 0.9838270558728023, '9': 0.016172944127197726, '10': 0.000477799986809635, '11': 0.9995222000131904, '12': 0.9999999999637554, '13': 3.6244516847519683e-11, '14': 0.999989377804934, '15': 1.0622195066015204e-05, '16': 0.9999999993004461, '17': 6.995538781619641e-10, '18': 0.9999189651246768, '19': 8.103487532315886e-05}
Time=15.83 --- > ::{'0': 0.0009840355032754156, '1': 0.9990159644967246, '2': 0.015149877666872924, '3': 0.9848501223331271, '4': 0.890075999848484, '5': 0.10992400015151606, '6': 0.9999999998034538, '7': 1.9654622555312898e-10, '8': 0.9838270558728023, '9': 0.016172944127197726, '10': 0.000477799986809635, '11': 0.9995222000131904, '12': 0.9999999999637554, '13': 3.6244516847519683e-11, '14': 0.999989377804934, '15': 1.0622195066015204e-05

In [49]:
def _invoke_endpoint(i):
    target_model = "catboost-model-big-{}.tar.gz".format(i)
    response = runtime_sm_client.invoke_endpoint(
            EndpointName=endpoint_name,
            TargetModel=target_model,
            Body=payload)
    print(target_model)
    result = json.loads(response["Body"].read().decode("utf-8"))
    return result


In [50]:
args = list(range(0,140))

In [51]:
_invoke_endpoint(1)

catboost-model-big-1.tar.gz


{'0': 0.0009840355032754156,
 '1': 0.9990159644967246,
 '2': 0.015149877666872924,
 '3': 0.9848501223331271,
 '4': 0.890075999848484,
 '5': 0.10992400015151606,
 '6': 0.9999999998034538,
 '7': 1.9654622555312898e-10,
 '8': 0.9838270558728023,
 '9': 0.016172944127197726,
 '10': 0.000477799986809635,
 '11': 0.9995222000131904,
 '12': 0.9999999999637554,
 '13': 3.6244516847519683e-11,
 '14': 0.999989377804934,
 '15': 1.0622195066015204e-05,
 '16': 0.9999999993004461,
 '17': 6.995538781619641e-10,
 '18': 0.9999189651246768,
 '19': 8.103487532315886e-05}

In [None]:
# Capacity notes (instance with 64 GB RAM, 16 vCPU):
#   140 models x 179 MB each = 25,060 MB, about 25 GB.
#   25 GB < 64 GB, hence no queueing and no thrashing.
#
# Largest is 600 MB --
# Sum of all cells is 9.5 GB without the largest 2 models --
#
# -- sequentially, no errors if pre-warmed --
# ModuleNotFoundError("No module named 'faiss.swigfaiss_avx2'")
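
The capacity estimate in the notes above can be checked with a few lines of arithmetic. This is a rough sketch: the figures (179 MB per model, 140 models, 64 GB of instance memory) come from the notes, the helper name `fits_in_memory` is hypothetical, and the calculation ignores container and operating-system overhead.

```python
def fits_in_memory(model_mb, n_models, instance_gb):
    """Return (total_gb, fits) for n_models of model_mb each on instance_gb of RAM.

    Uses decimal units (1 GB = 1000 MB) and ignores container and OS overhead.
    """
    total_gb = model_mb * n_models / 1000
    return total_gb, total_gb < instance_gb


total_gb, fits = fits_in_memory(model_mb=179, n_models=140, instance_gb=64)
print(f"{total_gb:.2f} GB needed, fits: {fits}")
```

When the total exceeds the available memory, the endpoint has to unload models to make room for newly requested ones, so invocations of evicted models pay the loading cost again.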

In [52]:
%%time
import os
import concurrent.futures
with concurrent.futures.ProcessPoolExecutor(max_workers=os.cpu_count()) as executor:
    _ = executor.map(_invoke_endpoint, args)

catboost-model-big-1.tar.gz
catboost-model-big-0.tar.gz
catboost-model-big-4.tar.gz
catboost-model-big-3.tar.gz
catboost-model-big-2.tar.gz
catboost-model-big-5.tar.gz
catboost-model-big-6.tar.gz
catboost-model-big-7.tar.gz
catboost-model-big-8.tar.gz
catboost-model-big-9.tar.gz
catboost-model-big-10.tar.gz
catboost-model-big-11.tar.gz
catboost-model-big-12.tar.gz
catboost-model-big-13.tar.gz
catboost-model-big-14.tar.gz
catboost-model-big-15.tar.gz
catboost-model-big-16.tar.gz
catboost-model-big-17.tar.gz
catboost-model-big-18.tar.gz
catboost-model-big-19.tar.gz
catboost-model-big-20.tar.gz
catboost-model-big-21.tar.gz
catboost-model-big-22.tar.gz
catboost-model-big-23.tar.gz
catboost-model-big-24.tar.gz
catboost-model-big-25.tar.gz
catboost-model-big-26.tar.gz
catboost-model-big-27.tar.gz
catboost-model-big-28.tar.gz
catboost-model-big-29.tar.gz
catboost-model-big-30.tar.gz
catboost-model-big-31.tar.gz
catboost-model-big-32.tar.gz
catboost-model-big-33.tar.gz
catboost-model-big-34.ta

In [44]:
%%time
import os
import concurrent.futures
with concurrent.futures.ThreadPoolExecutor(max_workers=os.cpu_count()) as executor:
    _ = executor.map(_invoke_endpoint, args)

catboost-model-big-1.tar.gz
catboost-model-big-0.tar.gz
catboost-model-big-3.tar.gz
catboost-model-big-2.tar.gz
catboost-model-big-4.tar.gz
catboost-model-big-5.tar.gz
catboost-model-big-6.tar.gz
catboost-model-big-7.tar.gz
catboost-model-big-8.tar.gz
catboost-model-big-9.tar.gz
catboost-model-big-10.tar.gz
catboost-model-big-11.tar.gz
catboost-model-big-12.tar.gz
catboost-model-big-13.tar.gz
catboost-model-big-14.tar.gz
catboost-model-big-15.tar.gz
catboost-model-big-16.tar.gz
catboost-model-big-17.tar.gz
catboost-model-big-18.tar.gz
catboost-model-big-19.tar.gz
catboost-model-big-20.tar.gz
catboost-model-big-21.tar.gz
catboost-model-big-22.tar.gz
catboost-model-big-23.tar.gz
catboost-model-big-24.tar.gz
catboost-model-big-25.tar.gz
catboost-model-big-26.tar.gz
catboost-model-big-27.tar.gz
catboost-model-big-28.tar.gz
catboost-model-big-29.tar.gz
catboost-model-big-30.tar.gz
catboost-model-big-31.tar.gz
catboost-model-big-32.tar.gz
catboost-model-big-33.tar.gz
catboost-model-big-34.ta
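
Note that `executor.map` returns a lazy iterator: assigning it to `_` without iterating discards the results, and any exception raised in a worker is never surfaced. A sketch of an alternative that collects results and errors per model is shown below. The helper name `invoke_all` is hypothetical, and `_fake_invoke` stands in for `_invoke_endpoint` so the example runs without an endpoint.

```python
import concurrent.futures


def invoke_all(invoke_fn, model_ids, max_workers=8):
    """Call invoke_fn(i) concurrently; return (results, errors) keyed by model id."""
    results, errors = {}, {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(invoke_fn, i): i for i in model_ids}
        for future in concurrent.futures.as_completed(futures):
            model_id = futures[future]
            try:
                results[model_id] = future.result()
            except Exception as exc:
                # Record the failure and keep going so one error does not hide the rest.
                errors[model_id] = exc
    return results, errors


def _fake_invoke(i):
    # Stand-in for _invoke_endpoint; fails for one id to show error capture.
    if i == 3:
        raise RuntimeError("simulated failure")
    return {"model": i}


results, errors = invoke_all(_fake_invoke, range(5))
print(sorted(results), sorted(errors))
```

With the real endpoint, the call would be `invoke_all(_invoke_endpoint, args)`; threads are usually sufficient here because each call spends most of its time waiting on the network.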

### Invoke just one of the models 1000 times
Since the model is already loaded in memory, these invocations should not incur any model-loading (cold start) latency.


In [None]:
# Cell 17
import time

import numpy as np

print("Starting invocation for model::catboost-model-big-1.tar.gz, please wait ...")
results = []
for i in range(0, 1000):
    start = time.time()
    response = runtime_sm_client.invoke_endpoint(
        EndpointName=endpoint_name,
        TargetModel="catboost-model-big-1.tar.gz",
        Body=payload,
    )
    results.append((time.time() - start) * 1000)  # latency in milliseconds
print("\nPrediction latency for the model: \n")
print("\nP95: " + str(np.percentile(results, 95)) + " ms\n")
print("P90: " + str(np.percentile(results, 90)) + " ms\n")
print("Average: " + str(np.average(results)) + " ms\n")
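
If you want a fuller latency summary without depending on NumPy, the standard-library `statistics` module can compute the same kind of percentiles. This is a sketch under that assumption; the helper name `summarize_latencies` is hypothetical, and the sample data below is synthetic rather than measured.

```python
import statistics


def summarize_latencies(samples_ms):
    """Return mean, max, and selected percentiles for latency samples in ms."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples")
    # quantiles(n=100) returns the 99 cut points P1..P99.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {
        "mean": statistics.fmean(samples_ms),
        "p50": cuts[49],
        "p90": cuts[89],
        "p95": cuts[94],
        "p99": cuts[98],
        "max": max(samples_ms),
    }


# Synthetic samples: 1.0, 2.0, ..., 101.0 ms.
summary = summarize_latencies([float(x) for x in range(1, 102)])
print(summary)
```

Applied to this notebook, the call would be `summarize_latencies(results)` after the loop in Cell 17.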

## Optional Clean up
Delete the endpoint when you are done with it.

In [45]:
# Delete the endpoint
# Cell 18

sm_client.delete_endpoint(EndpointName=endpoint_name)

{'ResponseMetadata': {'RequestId': 'b61c15e2-ab99-4284-bc89-93db2fd37054',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': 'b61c15e2-ab99-4284-bc89-93db2fd37054',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '0',
   'date': 'Mon, 17 Oct 2022 20:55:01 GMT'},
  'RetryAttempts': 0}}
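
Deleting the endpoint leaves the endpoint configuration and the model definition in place. If you also want to remove those, a sketch along the following lines could be used. The helper name `delete_endpoint_resources` is hypothetical; the client is passed in as an argument, and the S3 model artifacts and the ECR image are not touched.

```python
def delete_endpoint_resources(sm_client, endpoint_name, endpoint_config_name, model_name):
    """Delete the endpoint, then its endpoint configuration, then the model."""
    sm_client.delete_endpoint(EndpointName=endpoint_name)
    sm_client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)
    sm_client.delete_model(ModelName=model_name)
```

With the variables defined earlier, the call would be `delete_endpoint_resources(sm_client, endpoint_name, endpoint_config_name, model_name)`. Use it in place of the `delete_endpoint` call above, not after it, because deleting an endpoint that no longer exists raises an error.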