# Multi-Model Endpoint - CatBoost

This example notebook shows how to use a custom container to host multiple CatBoost models on a single SageMaker Multi-Model Endpoint.

## Load model and test local inference

The example model is taken from this [CatBoost tutorial](https://github.com/catboost/tutorials/blob/master/python_tutorial_with_tasks.ipynb). We load the model locally using `CatBoostClassifier()`. `test_data.csv` contains a single row of test inference data.

In [None]:
!pip install catboost

In [3]:
import pandas
from catboost import CatBoostClassifier

# load_model() returns the classifier itself, so the calls can be chained
model_file = CatBoostClassifier().load_model("catboost_model.bin")
# Single row of test inference data
df = pandas.read_csv("test_data.csv")

In [5]:
import pandas as pd
import io
import json

out = model_file.predict_proba(df)
print(out)

[[0.0203764 0.9796236]]
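
The printed array holds one row per input row and one probability per class. As a minimal sketch of how such a row can be read, the helper below picks the most likely class index for each row. The function name `summarize_probabilities` is hypothetical, and the sketch assumes the columns follow the model's class order.

```python
def summarize_probabilities(rows):
    """For each row of class probabilities, return (predicted_class_index, probability)."""
    summary = []
    for row in rows:
        # Index of the largest probability in this row
        best = max(range(len(row)), key=lambda i: row[i])
        summary.append((best, row[best]))
    return summary


# The row printed above, copied as plain Python floats
print(summarize_probabilities([[0.0203764, 0.9796236]]))  # prints [(1, 0.9796236)]
```
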


## Create a model tarball

SageMaker requires our model to be packaged in a tar.gz file.

In [6]:
! tar -czvf catboost-model.tar.gz catboost_model.bin

catboost_model.bin
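
The same packaging step can be sketched in Python with the standard `tarfile` module, which may be convenient when the shell is not available. The helper name `make_model_tarball` is hypothetical, and the demonstration uses a placeholder file rather than the real model.

```python
import os
import tarfile
import tempfile


def make_model_tarball(model_path, archive_path):
    """Package a single model file at the root of a gzip-compressed tar archive."""
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname drops any directory prefix so the file sits at the archive root
        tar.add(model_path, arcname=os.path.basename(model_path))
    # Re-open the archive and report its members as a sanity check
    with tarfile.open(archive_path, "r:gz") as tar:
        return tar.getnames()


# Demonstration with a placeholder file; swap in the real catboost_model.bin
workdir = tempfile.mkdtemp()
placeholder = os.path.join(workdir, "catboost_model.bin")
with open(placeholder, "wb") as f:
    f.write(b"not a real model")

members = make_model_tarball(placeholder, os.path.join(workdir, "catboost-model.tar.gz"))
print(members)  # prints ['catboost_model.bin']
```
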


## Upload 100 copies of the model to S3

Multi-Model Endpoints require all models to be stored under a common S3 prefix. Here we upload 100 copies of the model to our default bucket.

In [None]:
import sagemaker

sess = sagemaker.Session()
s3_bucket=sess.default_bucket()  # Replace with your own bucket name if needed
print(s3_bucket)

### Upload the model tarballs using boto3, each with a unique name

In [None]:
import boto3

s3 = boto3.client('s3')
for i in range(0,100):
    with open("catboost-model.tar.gz", "rb") as f:
        s3.upload_fileobj(f, s3_bucket, "catboost/catboost-model-{}.tar.gz".format(i))
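
As a sketch of the naming scheme used above, the key generation can be separated from the upload so that the keys are easy to inspect before anything is sent to S3. The helper names `model_keys` and `upload_copies` are hypothetical; `upload_file` is the standard boto3 S3 client call, and the client is passed in as a parameter.

```python
def model_keys(prefix, stem, count):
    """Return `count` distinct S3 keys under `prefix`, e.g. catboost/catboost-model-0.tar.gz."""
    return ["{}/{}-{}.tar.gz".format(prefix.rstrip("/"), stem, i) for i in range(count)]


def upload_copies(s3_client, local_path, bucket, keys):
    """Upload the same local archive once per key and return the number of uploads."""
    for key in keys:
        s3_client.upload_file(local_path, bucket, key)
    return len(keys)


keys = model_keys("catboost", "catboost-model", 3)
print(keys)
```

With a real client this could be called as `upload_copies(boto3.client("s3"), "catboost-model.tar.gz", s3_bucket, model_keys("catboost", "catboost-model", 100))`.
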


### Upload the bigger model

In [4]:
import boto3

s3 = boto3.client('s3')
for i in range(0,100):
    with open("./models/catboost-model-big.tar.gz", "rb") as f:
        s3.upload_fileobj(f, s3_bucket, "catboost/catboost-model-big-{}.tar.gz".format(i))



### List all models in the S3 prefix we will use for our Multi-Model Endpoint

In [None]:
!aws s3 ls s3://$s3_bucket/catboost/
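
The listing can also be done from Python. The sketch below uses the boto3 `list_objects_v2` paginator so that prefixes with more than one page of results are fully covered. The function name `list_model_artifacts` is hypothetical, and the S3 client is passed in as a parameter.

```python
def list_model_artifacts(s3_client, bucket, prefix="catboost/"):
    """Return the keys of all .tar.gz objects under the prefix, following pagination."""
    paginator = s3_client.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # A page has no "Contents" entry when nothing matches the prefix
        for obj in page.get("Contents", []):
            if obj["Key"].endswith(".tar.gz"):
                keys.append(obj["Key"])
    return keys
```

A call such as `list_model_artifacts(boto3.client("s3"), s3_bucket)` should return both the small and the bigger model archives uploaded above.
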

## Building the custom container

The container folder in this example contains 3 files:
```
├── container
│   ├── dockerd-entrypoint.py
│   ├── Dockerfile
│   └── model_handler.py
```

- `dockerd-entrypoint.py` is the entry point script that starts the multi-model server.
- `Dockerfile` contains the container definition used to assemble the image, including the packages that need to be installed.
- `model_handler.py` contains the logic to load the model and run inference.

Look through the files to see whether there is anything you would like to customize.
The cells below highlight the main parts of each file.


### Install catboost in the `Dockerfile`

In [None]:
! sed -n '26,30p' container/Dockerfile

### Update the `initialize` function in `model_handler.py` with logic to load the model

In this case we are using `CatBoostClassifier()`. Feel free to adapt the loading logic in this function to your needs.

In [None]:
! sed -n '22,40p' container/model_handler.py

### Update the `handle` function in `model_handler.py` with logic to run inference

In [None]:
! sed -n '70,85p' container/model_handler.py
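
The actual `model_handler.py` is not reproduced in this notebook, so the following is only a hypothetical sketch of how the two steps fit together: `initialize` loads a model once, and `handle` turns a CSV payload into predictions. The class name `ModelHandlerSketch`, the injected `loader`, the `StubModel` stand-in, and the assumption that the body arrives as CSV with a header row (as produced by `df.to_csv(index=False)`) are all illustrative. In the container, the loader would wrap `CatBoostClassifier().load_model(...)`.

```python
import csv
import io
import json


class ModelHandlerSketch:
    """Hypothetical handler; the real model_handler.py may differ."""

    def __init__(self, loader):
        # `loader` maps a model directory to an object exposing predict_proba()
        self.loader = loader
        self.model = None

    def initialize(self, model_dir):
        """Load the model once so later requests can reuse it."""
        self.model = self.loader(model_dir)

    def handle(self, body):
        """Parse a CSV payload (header row plus data rows) and return JSON probabilities."""
        text = body.decode("utf-8") if isinstance(body, (bytes, bytearray)) else body
        rows = list(csv.reader(io.StringIO(text)))
        records = rows[1:]  # skip the header row
        probabilities = self.model.predict_proba(records)
        return json.dumps([list(p) for p in probabilities])


class StubModel:
    """Stand-in for a loaded CatBoost model so the sketch runs without catboost."""

    def predict_proba(self, records):
        return [[0.5, 0.5] for _ in records]


handler = ModelHandlerSketch(loader=lambda model_dir: StubModel())
handler.initialize("model_dir_placeholder")
print(handler.handle(b"f1,f2\n1,2\n"))  # prints [[0.5, 0.5]]
```
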

### Build and Push custom image to ECR

In [None]:
%%sh

# The name of our algorithm
algorithm_name=catboost-sagemaker-multimodel

cd container

account=$(aws sts get-caller-identity --query Account --output text)

# Get the region defined in the current configuration (default to us-east-1 if none defined)
region=$(aws configure get region)
region=${region:-us-east-1}

fullname="${account}.dkr.ecr.${region}.amazonaws.com/${algorithm_name}:latest"

# If the repository doesn't exist in ECR, create it.
aws ecr describe-repositories --repository-names "${algorithm_name}" > /dev/null 2>&1

if [ $? -ne 0 ]
then
    aws ecr create-repository --repository-name "${algorithm_name}" > /dev/null
fi

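# Log Docker in to ECR. `aws ecr get-login` was removed in AWS CLI v2;
# `get-login-password` is available in AWS CLI v1.17.10+ and v2.
aws ecr get-login-password --region ${region} | docker login --username AWS --password-stdin "${account}.dkr.ecr.${region}.amazonaws.com"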

# Build the docker image locally with the image name and then push it to ECR
# with the full name.

docker build -q -t ${algorithm_name} .
docker tag ${algorithm_name} ${fullname}

docker push ${fullname}

In [None]:
import boto3
from sagemaker import get_execution_role

sm_client = boto3.client(service_name='sagemaker')
runtime_sm_client = boto3.client(service_name='sagemaker-runtime')

account_id = boto3.client('sts').get_caller_identity()['Account']
region = boto3.Session().region_name

role = get_execution_role()

### Create the SageMaker Model in multi-model mode

In [None]:
from time import gmtime, strftime

model_name = 'catboost-multimodel-' + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
model_url = 's3://{}/catboost/'.format(s3_bucket)  # S3 prefix that holds all model archives
image_uri = '{}.dkr.ecr.{}.amazonaws.com/catboost-sagemaker-multimodel:latest'.format(account_id, region)
instance_type = 'ml.m5.xlarge'

print('Model name: ' + model_name)
print('Model data Url: ' + model_url)
print('Container image: ' + image_uri)

# 'Mode': 'MultiModel' makes SageMaker treat ModelDataUrl as a prefix of model archives
container = {
    'Image': image_uri,
    'ModelDataUrl': model_url,
    'Mode': 'MultiModel'
}

create_model_response = sm_client.create_model(
    ModelName = model_name,
    ExecutionRoleArn = role,
    Containers = [container])

print("Model Arn: " + create_model_response['ModelArn'])

### Create the SageMaker Endpoint Configuration


In [None]:
endpoint_config_name = 'catboost-multimodel-config-' + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
print('Endpoint config name: ' + endpoint_config_name)

create_endpoint_config_response = sm_client.create_endpoint_config(
    EndpointConfigName = endpoint_config_name,
    ProductionVariants=[{
        'InstanceType': instance_type,
        'InitialInstanceCount': 1,
        'InitialVariantWeight': 1,
        'ModelName': model_name,
        'VariantName': 'AllTraffic'}])

print("Endpoint config Arn: " + create_endpoint_config_response['EndpointConfigArn'])

### Create the SageMaker Multi-Model Endpoint

In [None]:
%%time

import time

endpoint_name = 'catboost-multimodel-endpoint-' + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
print('Endpoint name: ' + endpoint_name)

create_endpoint_response = sm_client.create_endpoint(
    EndpointName=endpoint_name,
    EndpointConfigName=endpoint_config_name)
print('Endpoint Arn: ' + create_endpoint_response['EndpointArn'])

resp = sm_client.describe_endpoint(EndpointName=endpoint_name)
status = resp['EndpointStatus']
print("Endpoint Status: " + status)

print('Waiting for {} endpoint to be in service...'.format(endpoint_name))
waiter = sm_client.get_waiter('endpoint_in_service')
waiter.wait(EndpointName=endpoint_name)

### Invoke each of the 100 models

In [None]:
for i in range(100):
    # TargetModel is the archive name relative to the model's S3 prefix
    response = runtime_sm_client.invoke_endpoint(
        EndpointName=endpoint_name,
        TargetModel="catboost-model-{}.tar.gz".format(i),
        Body=df.to_csv(index=False))
    print(json.loads(response['Body'].read().decode('utf-8')))

*New models can also be added on demand by uploading their tarballs to the same S3 prefix.*
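
As a rough sketch of that on-demand flow: SageMaker locates an archive by joining the model's `ModelDataUrl` prefix with the `TargetModel` value, so a new archive only needs to be uploaded under the prefix and then invoked by its relative name. The helper names `resolve_target_model` and `add_and_invoke` and the bucket name `my-bucket` are hypothetical; both clients are passed in as parameters.

```python
def resolve_target_model(model_data_url, target_model):
    """Join the endpoint's S3 prefix with the TargetModel relative path."""
    return model_data_url.rstrip("/") + "/" + target_model.lstrip("/")


def add_and_invoke(s3_client, runtime_client, bucket, prefix, local_archive,
                   target_model, endpoint_name, body):
    """Upload a new archive under the prefix, then invoke it by its relative name."""
    s3_client.upload_file(local_archive, bucket, prefix.strip("/") + "/" + target_model)
    return runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        TargetModel=target_model,
        Body=body,
    )


print(resolve_target_model("s3://my-bucket/catboost/", "catboost-model-7.tar.gz"))
# prints s3://my-bucket/catboost/catboost-model-7.tar.gz
```
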

## Invoke the bigger model

Invoke the bigger model to test that the same container can manage multiple model types.

In [1]:
import pandas as pd
continue_var = ['I' + str(i) for i in range(1, 14)]
cat_features = ['C' + str(i) for i in range(1,27)]
col_names = ['Label'] + continue_var + cat_features

test_data_set_end_point = pd.read_csv('./data/dac_sample_small.txt', sep='\t', names=col_names).fillna(0)
test_data_set_end_point = test_data_set_end_point.iloc[:, 1:] # remove the LABEL for predictions 

payload=test_data_set_end_point.to_csv(index=False)
len(payload)

2638

In [None]:
for i in range(100):
    response = runtime_sm_client.invoke_endpoint(
        EndpointName=endpoint_name,
        TargetModel="catboost-model-big-{}.tar.gz".format(i),
        Body=payload)
    print(json.loads(response['Body'].read().decode('utf-8')))