# Multi-Model Endpoint - CatBoost

This example notebook shows how to use a custom container to host multiple CatBoost models behind a single SageMaker Multi-Model Endpoint.

## Load model and test local inference

The example model is taken from this [CatBoost tutorial](https://github.com/catboost/tutorials/blob/master/python_tutorial_with_tasks.ipynb). We load the model locally using `CatBoostClassifier()`. `test_data.csv` contains a single row of test inference data.

In [None]:
!pip install catboost

In [16]:
from catboost import CatBoostClassifier
import pandas

# Load the pre-trained model from disk
model_file = CatBoostClassifier()
model_file = model_file.load_model("catboost_model.bin")
# Single-row test input
df = pandas.read_csv("test_data.csv")

In [17]:
import pandas as pd
import io
import json

out = model_file.predict_proba(df)
print(out)

[[0.0203764 0.9796236]]


## Create a model tar ball

SageMaker requires our model to be packaged in a tar.gz file.

In [18]:
! tar -czvf catboost-model.tar.gz catboost_model.bin

catboost_model.bin
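
The same archive can be produced from Python with the standard-library `tarfile` module, which may be convenient when no shell is available. This is a minimal sketch: the helper name `make_model_tarball` is hypothetical, and the demonstration packs a placeholder file rather than a real CatBoost model. Like the `tar` command above, it places the file at the root of the archive.

```python
import os
import tarfile
import tempfile


def make_model_tarball(model_path, tarball_path):
    """Package one model file at the root of a gzip-compressed tar archive."""
    with tarfile.open(tarball_path, "w:gz") as tar:
        # arcname drops any directory prefix so the file sits at the archive root.
        tar.add(model_path, arcname=os.path.basename(model_path))
    # Re-open the archive to report what it contains.
    with tarfile.open(tarball_path, "r:gz") as tar:
        return tar.getnames()


# Demonstration on a throwaway placeholder file (not a real CatBoost model).
with tempfile.TemporaryDirectory() as tmp:
    placeholder = os.path.join(tmp, "catboost_model.bin")
    with open(placeholder, "wb") as f:
        f.write(b"placeholder bytes")
    members = make_model_tarball(placeholder, os.path.join(tmp, "catboost-model.tar.gz"))
    print(members)
```
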


## Upload 100 copies of the model to S3

Multi-Model Endpoints require all our models to be in a specific S3 prefix. Here we upload 100 of them to our default bucket. 

In [19]:
import sagemaker

sess = sagemaker.Session()
s3_bucket=sess.default_bucket()  # Replace with your own bucket name if needed
print(s3_bucket)

sagemaker-us-east-1-171503325295


### Upload the model tar balls using boto3, each with a unique name

In [20]:
import boto3

s3 = boto3.client('s3')
for i in range(0,100):
    with open("catboost-model.tar.gz", "rb") as f:
        s3.upload_fileobj(f, s3_bucket, "catboost/catboost-model-{}.tar.gz".format(i))


#### Upload 100 copies of the bigger model

In [21]:
import boto3

s3 = boto3.client('s3')
for i in range(0,100):
    with open("./models/catboost-model-big.tar.gz", "rb") as f:
        s3.upload_fileobj(f, s3_bucket, "catboost/catboost-model-big-{}.tar.gz".format(i))



### List all models in the S3 prefix we will use for our Multi-Model Endpoint

In [23]:
!aws s3 ls s3://$s3_bucket/catboost/

2022-06-26 23:41:53     184489 catboost-model-0.tar.gz
2022-06-26 23:41:53     184489 catboost-model-1.tar.gz
2022-06-26 23:41:54     184489 catboost-model-10.tar.gz
2022-06-26 23:41:54     184489 catboost-model-11.tar.gz
2022-06-26 23:41:54     184489 catboost-model-12.tar.gz
2022-06-26 23:41:54     184489 catboost-model-13.tar.gz
2022-06-26 23:41:54     184489 catboost-model-14.tar.gz
2022-06-26 23:41:54     184489 catboost-model-15.tar.gz
2022-06-26 23:41:54     184489 catboost-model-16.tar.gz
2022-06-26 23:41:54     184489 catboost-model-17.tar.gz
2022-06-26 23:41:55     184489 catboost-model-18.tar.gz
2022-06-26 23:41:55     184489 catboost-model-19.tar.gz
2022-06-26 23:41:53     184489 catboost-model-2.tar.gz
2022-06-26 23:41:55     184489 catboost-model-20.tar.gz
2022-06-26 23:41:55     184489 catboost-model-21.tar.gz
2022-06-26 23:41:55     184489 catboost-model-22.tar.gz
2022-06-26 23:41:55     184489 catboost-model-23.tar.gz
2022-06-26 23:41:55     184489 catboost-model-24.ta

## Building the custom container

The container folder in this example contains 3 files:
```
├── container
│   ├── dockerd-entrypoint.py
│   ├── Dockerfile
│   └── model_handler.py
```

- `dockerd-entrypoint.py` is the entry point script that will start the multi model server.
- `Dockerfile` contains the container definition that will be used to assemble the image. This includes the packages that need to be installed.
- `model_handler.py` is the script that contains the logic to load the model and run inference.

Take a look through the files to see whether there is any customization you would like to make.
The cells below highlight the main parts of these files.
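
To make the division of labor concrete, here is a minimal sketch of the general shape such a handler takes: load once in `initialize`, then serve each request in `handle`. It is not the contents of `model_handler.py`. The class `SketchModelHandler`, the `FakeContext` object, and the placeholder "model" that sums each row are all hypothetical, and the sketch parses CSV with the standard library so that it runs without CatBoost or pandas.

```python
import csv
import io


class SketchModelHandler:
    """Hypothetical stand-in for a handler: load once, then serve requests."""

    def __init__(self):
        self.initialized = False
        self.model = None
        self.model_dir = None

    def initialize(self, context):
        # The server passes the directory of the unpacked model archive.
        self.model_dir = context.system_properties.get("model_dir")
        # A real handler would load a model file from model_dir here;
        # this placeholder "model" just sums the numeric fields of a row.
        self.model = lambda row: sum(float(v) for v in row)
        self.initialized = True

    def handle(self, data, context):
        # Each request carries its payload as bytes under the "body" key.
        text = data[0]["body"].decode()
        rows = list(csv.reader(io.StringIO(text)))
        records = rows[1:]  # skip the CSV header row
        predictions = [self.model(r) for r in records]
        # Return one response element for the single request in this batch.
        return [dict(enumerate(predictions))]


class FakeContext:
    """Hypothetical context object exposing only what the sketch reads."""
    system_properties = {"model_dir": "/tmp/model"}


handler = SketchModelHandler()
handler.initialize(FakeContext())
result = handler.handle([{"body": b"a,b\n1,2\n3,4\n"}], FakeContext())
print(result)
```
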


### Install catboost in the `Dockerfile`

In [8]:
! sed -n '26,30p' container/Dockerfile

RUN pip3 --no-cache-dir install multi-model-server \
                                sagemaker-inference \
                                retrying \
                                catboost \
                                pandas


### Update the `initialize` function in `model_handler.py` with logic to load the model
In this case we are using `CatBoostClassifier()`. Feel free to adapt the loading logic in this function to suit your needs.

In [9]:
! sed -n '22,40p' container/model_handler.py

    def initialize(self, ctx):
        start = time.time()
        self.device = 'cpu'
        
        properties = ctx.system_properties
        self.device = 'cpu'
        model_dir = properties.get('model_dir')
        
        print('model_dir {}'.format(model_dir))
        print(os.system("ls {}".format(model_dir)))

        model_file = CatBoostClassifier()
        
        onlyfiles = [f for f in os.listdir(model_dir) if os.path.isfile(os.path.join(model_dir, f)) and f.endswith(".bin")]
        print(f"Modelhandler:model_file location::{model_dir}:: files:bin:={onlyfiles} :: going to load the first one::")
        #self.model = model_file = model_file.load_model("{}/catboost_model.bin".format(model_dir))
        self.model = model_file = model_file.load_model(onlyfiles[0])
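
The excerpt collects bare file names and passes the first one to `load_model`, so the path resolves against the worker's current working directory. Whether that directory is always `model_dir` depends on how the server launches its workers, which this notebook does not show. As a hedged alternative, the sketch below returns an absolute path instead. The helper name `find_model_file` is hypothetical, and the demonstration uses a placeholder file rather than a real model.

```python
import tempfile
from pathlib import Path


def find_model_file(model_dir, suffix=".bin"):
    """Return the absolute path of the first file in model_dir ending with suffix."""
    candidates = sorted(
        p for p in Path(model_dir).iterdir() if p.is_file() and p.name.endswith(suffix)
    )
    if not candidates:
        raise FileNotFoundError(f"no '{suffix}' file found in {model_dir}")
    return str(candidates[0].resolve())


# Demonstration with a temporary directory holding one placeholder model file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "notes.txt").write_text("not a model")
    (Path(tmp) / "catboost_model.bin").write_bytes(b"placeholder")
    found = find_model_file(tmp)
    print(found)
```

The returned string could then be passed to `load_model` in place of `onlyfiles[0]`.
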




### Update the `handle` function in `model_handler.py` with logic to run inference

In [10]:
! sed -n '70,85p' container/model_handler.py

        
        start = time.time()
        inference_output = dict(enumerate(inference_output.flatten(), 0))
        print(f" perf postprocess {(time.time() - start) * 1000} ms")
        return [inference_output]
    
    def handle(self, data, context):
        """
        Call pre-process, inference and post-process functions
        :param data: input data
        :param context: mms context
        """
        start = time.time()
       
        input_data = data[0]['body'].decode()
        df = pd.read_csv(io.StringIO(input_data))


### Build and Push custom image to ECR

In [11]:
%%sh

# The name of our algorithm
algorithm_name=catboost-sagemaker-multimodel

cd container

account=$(aws sts get-caller-identity --query Account --output text)

# Get the region defined in the current configuration (default to us-east-1 if none defined)
region=$(aws configure get region)
region=${region:-us-east-1}

fullname="${account}.dkr.ecr.${region}.amazonaws.com/${algorithm_name}:latest"

# If the repository doesn't exist in ECR, create it.
aws ecr describe-repositories --repository-names "${algorithm_name}" > /dev/null 2>&1

if [ $? -ne 0 ]
then
    aws ecr create-repository --repository-name "${algorithm_name}" > /dev/null
fi

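# Authenticate Docker to ECR (get-login-password replaces get-login, which AWS CLI v2 removed)
aws ecr get-login-password --region ${region} | docker login --username AWS --password-stdin "${account}.dkr.ecr.${region}.amazonaws.com"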

# Build the docker image locally with the image name and then push it to ECR
# with the full name.

docker build -q -t ${algorithm_name} .
docker tag ${algorithm_name} ${fullname}

docker push ${fullname}

Login Succeeded
sha256:4c8bbad423da767b3079bbcff7c1addbf15538e3165ccf5b7dc749465bb23ee4
The push refers to repository [171503325295.dkr.ecr.us-east-1.amazonaws.com/catboost-sagemaker-multimodel]
2ea202b7122f: Preparing
fae25108c4b1: Preparing
dd6ee5da2792: Preparing
12735dfd8b0e: Preparing
85f683319bd3: Preparing
2caf5a875703: Preparing
53b84bb5ed79: Preparing
fa60aeb2afcf: Preparing
585a1508f408: Preparing
3e549931e024: Preparing
2caf5a875703: Waiting
53b84bb5ed79: Waiting
fa60aeb2afcf: Waiting
585a1508f408: Waiting
3e549931e024: Waiting
2ea202b7122f: Layer already exists
fae25108c4b1: Layer already exists
12735dfd8b0e: Layer already exists
85f683319bd3: Layer already exists
dd6ee5da2792: Layer already exists
2caf5a875703: Layer already exists
53b84bb5ed79: Layer already exists
fa60aeb2afcf: Layer already exists
585a1508f408: Layer already exists
3e549931e024: Layer already exists
latest: digest: sha256:f662c9f7008f3dfc301c49da25224d655c8a0575554637d7c98837f530ea54bb size: 2407




In [27]:
import boto3
from sagemaker import get_execution_role

sm_client = boto3.client(service_name='sagemaker')
runtime_sm_client = boto3.client(service_name='sagemaker-runtime')

account_id = boto3.client('sts').get_caller_identity()['Account']
region = boto3.Session().region_name

role = get_execution_role()

### Create the SageMaker model in `MultiModel` mode

In [28]:
from time import gmtime, strftime

model_name = 'catboost-multimodel-' + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
model_url = 's3://{}/catboost/'.format(s3_bucket) ## MODEL S3 URL
container = '{}.dkr.ecr.{}.amazonaws.com/catboost-sagemaker-multimodel:latest'.format(account_id, region)
instance_type = 'ml.m5.xlarge'

print('Model name: ' + model_name)
print('Model data Url: ' + model_url)
print('Container image: ' + container)

container = {
    'Image': container,
    'ModelDataUrl': model_url,
    'Mode': 'MultiModel'
}

create_model_response = sm_client.create_model(
    ModelName = model_name,
    ExecutionRoleArn = role,
    Containers = [container])

print("Model Arn: " + create_model_response['ModelArn'])

Model name: catboost-multimodel-2022-06-26-23-45-57
Model data Url: s3://sagemaker-us-east-1-171503325295/catboost/
Container image: 171503325295.dkr.ecr.us-east-1.amazonaws.com/catboost-sagemaker-multimodel:latest
Model Arn: arn:aws:sagemaker:us-east-1:171503325295:model/catboost-multimodel-2022-06-26-23-45-57


### Create the SageMaker Endpoint Configuration


In [29]:
endpoint_config_name = 'catboost-multimodel-config' + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
print('Endpoint config name: ' + endpoint_config_name)

create_endpoint_config_response = sm_client.create_endpoint_config(
    EndpointConfigName = endpoint_config_name,
    ProductionVariants=[{
        'InstanceType': instance_type,
        'InitialInstanceCount': 1,
        'InitialVariantWeight': 1,
        'ModelName': model_name,
        'VariantName': 'AllTraffic'}])

print("Endpoint config Arn: " + create_endpoint_config_response['EndpointConfigArn'])

Endpoint config name: catboost-multimodel-config2022-06-26-23-46-04
Endpoint config Arn: arn:aws:sagemaker:us-east-1:171503325295:endpoint-config/catboost-multimodel-config2022-06-26-23-46-04


### Create the SageMaker Multi-Model Endpoint

In [30]:
%%time

import time

endpoint_name = 'catboost-multimodel-endpoint-' + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
print('Endpoint name: ' + endpoint_name)

create_endpoint_response = sm_client.create_endpoint(
    EndpointName=endpoint_name,
    EndpointConfigName=endpoint_config_name)
print('Endpoint Arn: ' + create_endpoint_response['EndpointArn'])

resp = sm_client.describe_endpoint(EndpointName=endpoint_name)
status = resp['EndpointStatus']
print("Endpoint Status: " + status)

print('Waiting for {} endpoint to be in service...'.format(endpoint_name))
waiter = sm_client.get_waiter('endpoint_in_service')
waiter.wait(EndpointName=endpoint_name)

Endpoint name: catboost-multimodel-endpoint-2022-06-26-23-46-11
Endpoint Arn: arn:aws:sagemaker:us-east-1:171503325295:endpoint/catboost-multimodel-endpoint-2022-06-26-23-46-11
Endpoint Status: Creating
Waiting for catboost-multimodel-endpoint-2022-06-26-23-46-11 endpoint to be in service...
CPU times: user 219 ms, sys: 9.56 ms, total: 229 ms
Wall time: 7min 31s


### Invoke each of the 100 smaller models

In [47]:
for i in range(100):
    response = runtime_sm_client.invoke_endpoint(
        EndpointName=endpoint_name,
        TargetModel="catboost-model-{}.tar.gz".format(i),
        Body=df.to_csv(index=False))
    print(json.loads(response['Body'].read().decode('utf-8')))

{'0': 0.020376404499626855, '1': 0.9796235955003731}


### Invoke one of the smaller models 1,000 times


In [52]:
import time
import numpy as np
results = []
for i in range(0,1000):
    start = time.time()
    response = runtime_sm_client.invoke_endpoint(
                EndpointName=endpoint_name,
                TargetModel="catboost-model-1.tar.gz",
                Body=df.to_csv(index=False))
    results.append((time.time() - start) * 1000)
print("\nPredictions for smaller model (end to end): \n")
print('\nP95: ' + str(np.percentile(results, 95)) + ' ms\n')    
print('P90: ' + str(np.percentile(results, 90)) + ' ms\n')
print('Average: ' + str(np.average(results)) + ' ms\n')


Predictions for smaller model (end to end): 


P95: 37.55887746810912 ms

P90: 33.966064453125 ms

Average: 28.13196039199829 ms
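
On a Multi-Model Endpoint, a model is loaded the first time it is invoked, so a first call can be noticeably slower than later ones and can skew the summary. The sketch below shows one way to exclude warm-up calls from the timing. The helpers `percentile` and `time_calls` are hypothetical names, the workload is a placeholder standing in for `invoke_endpoint`, and `percentile` interpolates linearly between ranks, which is also the default behavior of `np.percentile`.

```python
import time


def percentile(values, q):
    """Percentile with linear interpolation between the two closest ranks."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def time_calls(fn, n, warmup=1):
    """Call fn n times after `warmup` untimed calls; return latencies in ms."""
    for _ in range(warmup):
        fn()
    latencies = []
    for _ in range(n):
        start = time.perf_counter()
        fn()
        latencies.append((time.perf_counter() - start) * 1000)
    return latencies


# Placeholder workload standing in for an endpoint invocation.
latencies = time_calls(lambda: sum(range(1000)), n=50)
print(f"P95: {percentile(latencies, 95):.3f} ms")
print(f"P90: {percentile(latencies, 90):.3f} ms")
print(f"Average: {sum(latencies) / len(latencies):.3f} ms")
```
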



*It is also possible to add new models on demand by uploading the tar balls to the S3 prefix*.
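
A minimal sketch of that workflow follows, assuming the new tar ball is uploaded under the same `catboost/` prefix the endpoint was created with. The helper names `add_model` and `invoke_model` are hypothetical. The two stub classes exist only so the sketch runs without AWS access; with real clients you would pass the notebook's `s3` and `runtime_sm_client` objects instead.

```python
import io


def add_model(s3_client, bucket, prefix, local_tarball, model_name):
    """Upload a tar ball under the endpoint's S3 prefix; return the TargetModel value."""
    key = f"{prefix.strip('/')}/{model_name}"
    s3_client.upload_file(local_tarball, bucket, key)
    # TargetModel is given relative to the prefix configured as ModelDataUrl.
    return model_name


def invoke_model(runtime_client, endpoint_name, target_model, payload):
    """Send a CSV payload to one specific model behind the endpoint."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        TargetModel=target_model,
        Body=payload,
    )
    return response["Body"].read().decode("utf-8")


# Dry run with stand-in clients (hypothetical stubs, not real boto3 objects).
class StubS3:
    def __init__(self):
        self.uploads = []

    def upload_file(self, filename, bucket, key):
        self.uploads.append((filename, bucket, key))


class StubRuntime:
    def invoke_endpoint(self, **kwargs):
        self.last_call = kwargs
        return {"Body": io.BytesIO(b'{"0": 0.5, "1": 0.5}')}


stub_s3, stub_runtime = StubS3(), StubRuntime()
target = add_model(stub_s3, "example-bucket", "catboost/",
                   "catboost-model.tar.gz", "catboost-model-new.tar.gz")
reply = invoke_model(stub_runtime, "example-endpoint", target, "f1,f2\n1,2\n")
print(stub_s3.uploads, reply)
```

With the real clients, the equivalent calls would be `add_model(s3, s3_bucket, "catboost", "catboost-model.tar.gz", "catboost-model-new.tar.gz")` followed by `invoke_model(runtime_sm_client, endpoint_name, "catboost-model-new.tar.gz", df.to_csv(index=False))`.
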

### Invoke the bigger model to test the same container serving multiple model types

In [25]:
import pandas as pd
continue_var = ['I' + str(i) for i in range(1, 14)]
cat_features = ['C' + str(i) for i in range(1,27)]
col_names = ['Label'] + continue_var + cat_features

test_data_set_end_point = pd.read_csv('./data/dac_sample_small.txt', sep='\t', names=col_names).fillna(0)
test_data_set_end_point = test_data_set_end_point.iloc[:, 1:] # remove the LABEL for predictions 

payload=test_data_set_end_point.to_csv(index=False)
len(payload)

2638

In [32]:
response = runtime_sm_client.invoke_endpoint(
            EndpointName=endpoint_name,
            TargetModel="catboost-model-big-0.tar.gz",  # any of the 100 uploaded copies (0-99)
            Body=payload)
print(json.loads(response['Body'].read().decode('utf-8')))

{'0': 0.0009840355032754156, '1': 0.9990159644967246, '2': 0.015149877666872924, '3': 0.9848501223331271, '4': 0.890075999848484, '5': 0.10992400015151606, '6': 0.9999999998034538, '7': 1.9654622555312898e-10, '8': 0.9838270558728023, '9': 0.016172944127197726, '10': 0.000477799986809635, '11': 0.9995222000131904, '12': 0.9999999999637554, '13': 3.6244516847519683e-11, '14': 0.999989377804934, '15': 1.0622195066015204e-05, '16': 0.9999999993004461, '17': 6.995538781619641e-10, '18': 0.9999189651246768, '19': 8.103487532315886e-05}
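
Because the handler's post-processing flattens the prediction array before enumerating it, the response is one flat dictionary. The 20 values here appear to be class-probability pairs for 10 input rows, since consecutive pairs sum to 1. The sketch below regroups them per row. The helper name `rows_from_flat` is hypothetical, the two-class layout is an assumption based on the local `predict_proba` output earlier, and the sample values are the first two pairs above, rounded for brevity.

```python
def rows_from_flat(flat, n_classes=2):
    """Regroup a flattened {index: probability} dict into one list per input row."""
    # JSON turns the integer keys into strings, so sort them numerically.
    values = [flat[k] for k in sorted(flat, key=int)]
    if len(values) % n_classes:
        raise ValueError("value count is not a multiple of n_classes")
    return [values[i:i + n_classes] for i in range(0, len(values), n_classes)]


# First two pairs of the response above, rounded for brevity.
flat = {"0": 0.001, "1": 0.999, "2": 0.015, "3": 0.985}
rows = rows_from_flat(flat)
print(rows)
```
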


### Invoke one of the bigger models 1,000 times


In [53]:
import time
import numpy as np
results = []
for i in range(0,1000):
    start = time.time()
    response = runtime_sm_client.invoke_endpoint(
                EndpointName=endpoint_name,
                TargetModel="catboost-model-big-1.tar.gz",
                Body=payload)
    results.append((time.time() - start) * 1000)
print("\nPredictions for big model (end to end): \n")
print('\nP95: ' + str(np.percentile(results, 95)) + ' ms\n')    
print('P90: ' + str(np.percentile(results, 90)) + ' ms\n')
print('Average: ' + str(np.average(results)) + ' ms\n')


Predictions for big model (end to end): 


P95: 39.060032367706285 ms

P90: 36.721587181091316 ms

Average: 32.29228758811951 ms

