# Deploy a Machine Learning Model using a Serverless Inference Endpoint
This notebook deploys a pre-trained XGBoost binary classification model, trained on synthetic auto insurance claims data, to a serverless inference endpoint in Amazon SageMaker.

Based on the following AWS sample: https://aws.amazon.com/getting-started/hands-on/deploy-a-machine-learning-model-to-a-serverless-inference-endpoint/

First, install the aiobotocore package, which provides an asynchronous interface to the AWS services that we'll be using. We won't restart the kernel yet, so ignore the restart message.

In [None]:
%pip install --upgrade -q aiobotocore 

We also need to install s3fs, which enables Python to work with S3. After this step, be sure to restart the kernel.

In [None]:
%pip install -q s3fs

Import the libraries we need to build and deploy our model, and configure some parameters, including the S3 locations of the model artifact and test data.

In [None]:
import pandas as pd
import boto3
import sagemaker
import time
import json
import io
from io import StringIO
import base64
import re
import s3fs

from sagemaker.image_uris import retrieve

sess = sagemaker.Session()

region = sess.boto_region_name
s3_client = boto3.client("s3", region_name=region)
sm_client = boto3.client("sagemaker", region_name=region)
sm_runtime_client = boto3.client("sagemaker-runtime", region_name=region)

sagemaker_role = sagemaker.get_execution_role()


# S3 locations used for parameterizing the notebook run
read_bucket = "sagemaker-sample-files"
read_prefix = "datasets/tabular/synthetic_automobile_claims" 
model_prefix = "models/xgb-fraud"

# S3 location of trained model artifact
model_uri = f"s3://{read_bucket}/{model_prefix}/fraud-det-xgb-model.tar.gz"

# S3 location of test data
test_data_uri = f"s3://{read_bucket}/{read_prefix}/test.csv"
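
The S3 locations above are plain `s3://bucket/key` strings. As a minimal sketch, the hypothetical helper below (`split_s3_uri` is not part of the original sample) splits such a URI into its bucket and key, which is the form most boto3 S3 calls expect.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split an s3://bucket/key URI into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not a valid S3 URI: {uri!r}")
    # The path starts with "/", which is not part of the object key
    return parsed.netloc, parsed.path.lstrip("/")


bucket, key = split_s3_uri(
    "s3://sagemaker-sample-files/models/xgb-fraud/fraud-det-xgb-model.tar.gz"
)
print(bucket)  # sagemaker-sample-files
print(key)     # models/xgb-fraud/fraud-det-xgb-model.tar.gz
```

With the bucket and key in hand, a call such as `s3_client.head_object(Bucket=bucket, Key=key)` could confirm that the artifact is readable before any SageMaker resources are created.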

We're using the SageMaker managed XGBoost image. In this step, we retrieve the image URI and set the model name.

In [None]:
# Retrieve the SageMaker managed XGBoost image
training_image = retrieve(framework="xgboost", region=region, version="1.3-1")

# Specify a unique model name that does not already exist
model_name = "fraud-detect-xgb"
primary_container = {
                     "Image": training_image,
                     "ModelDataUrl": model_uri
                    }

model_matches = sm_client.list_models(NameContains=model_name)["Models"]
if not model_matches:
    model = sm_client.create_model(ModelName=model_name,
                                   PrimaryContainer=primary_container,
                                   ExecutionRoleArn=sagemaker_role)
else:
    print(f"Model with name {model_name} already exists! Change model name to create new")
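
Note that `NameContains` is a substring filter, so the check above also reports a match when a differently named model merely contains `model_name`. A minimal sketch of an exact-name check follows. The helper `model_exists` and the stand-in client are hypothetical and only illustrate the idea; in the notebook you would pass the real `sm_client`.

```python
def model_exists(sm_client, model_name: str) -> bool:
    """Return True only if a model with exactly this name is listed."""
    response = sm_client.list_models(NameContains=model_name, MaxResults=100)
    return any(m["ModelName"] == model_name for m in response["Models"])


class FakeSageMakerClient:
    """Hypothetical stand-in for the boto3 SageMaker client, for illustration only."""

    def list_models(self, **kwargs):
        needle = kwargs["NameContains"]
        names = ["fraud-detect-xgb-v2", "churn-model"]
        return {"Models": [{"ModelName": n} for n in names if needle in n]}


fake = FakeSageMakerClient()
print(model_exists(fake, "fraud-detect-xgb"))     # False (only a substring match exists)
print(model_exists(fake, "fraud-detect-xgb-v2"))  # True
```

This sketch looks at a single page of results only; an account with more than 100 matching models would need pagination.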

Here's our endpoint configuration, specifying the memory we want to allocate to the serverless endpoint, and the max concurrent invocations.

In [None]:
# Endpoint Config name
endpoint_config_name = f"{model_name}-serverless-epconfig"

# Endpoint config parameters
production_variant_dict = {
                           "VariantName": "Alltraffic",
                           "ModelName": model_name,
                           "ServerlessConfig": {"MemorySizeInMB": 3072, # Endpoint memory in MB
                                                "MaxConcurrency": 1 # Number of concurrent invocations
                                               }
                          }

# Create endpoint config if one with the same name does not exist
endpoint_config_matches = sm_client.list_endpoint_configs(NameContains=endpoint_config_name)["EndpointConfigs"]
if not endpoint_config_matches:
    endpoint_config_response = sm_client.create_endpoint_config(
                                                                EndpointConfigName=endpoint_config_name,
                                                                ProductionVariants=[production_variant_dict]
                                                               )
else:
    print(f"Endpoint config with name {endpoint_config_name} already exists! Change endpoint config name to create new")
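
Before calling `create_endpoint_config`, it can help to sanity-check the `ServerlessConfig` values locally. The sketch below is a hypothetical helper, not part of the original sample. The limits it encodes (memory in 1 GB steps from 1024 to 6144 MB, concurrency between 1 and 200) are assumptions based on our reading of the SageMaker documentation and should be verified against the current API reference.

```python
# Assumed limits; confirm against the current SageMaker API reference
ALLOWED_MEMORY_MB = (1024, 2048, 3072, 4096, 5120, 6144)
MAX_CONCURRENCY_LIMIT = 200


def validate_serverless_config(config: dict) -> list:
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    memory = config.get("MemorySizeInMB")
    concurrency = config.get("MaxConcurrency")
    if memory not in ALLOWED_MEMORY_MB:
        problems.append(f"MemorySizeInMB={memory} is not one of {ALLOWED_MEMORY_MB}")
    if not isinstance(concurrency, int) or not 1 <= concurrency <= MAX_CONCURRENCY_LIMIT:
        problems.append(
            f"MaxConcurrency={concurrency} is outside 1..{MAX_CONCURRENCY_LIMIT}"
        )
    return problems


print(validate_serverless_config({"MemorySizeInMB": 3072, "MaxConcurrency": 1}))  # []
print(len(validate_serverless_config({"MemorySizeInMB": 3000, "MaxConcurrency": 0})))  # 2
```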

Next, we deploy the model by creating the endpoint from the endpoint configuration that we just created. Deployment might take a few minutes.

In [None]:
# Endpoint name
endpoint_name = f"{model_name}-serverless-ep"

# Create endpoint if one with the same name does not exist
endpoint_matches = sm_client.list_endpoints(NameContains=endpoint_name)["Endpoints"]
if not endpoint_matches:
    endpoint_response = sm_client.create_endpoint(
                                                  EndpointName=endpoint_name,
                                                  EndpointConfigName=endpoint_config_name
                                                 )
else:
    print(f"Endpoint with name {endpoint_name} already exists! Change endpoint name to create new")

resp = sm_client.describe_endpoint(EndpointName=endpoint_name)
status = resp["EndpointStatus"]
while status == "Creating":
    print(f"Endpoint Status: {status}...")
    time.sleep(60)
    resp = sm_client.describe_endpoint(EndpointName=endpoint_name)
    status = resp["EndpointStatus"]
print(f"Endpoint Status: {status}")
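
The loop above stops as soon as the status is anything other than `Creating`, which includes `Failed`, and it has no upper time limit. A minimal sketch of a stricter wait is shown below. The function `wait_for_endpoint`, its default intervals, and the stand-in client are hypothetical; in the notebook you would pass the real `sm_client`.

```python
import time


def wait_for_endpoint(sm_client, endpoint_name, poll_seconds=30,
                      timeout_seconds=900, sleep=time.sleep):
    """Poll describe_endpoint until the endpoint leaves the "Creating" state.

    Returns "InService" on success; raises on timeout or any other final status.
    """
    waited = 0
    while True:
        resp = sm_client.describe_endpoint(EndpointName=endpoint_name)
        status = resp["EndpointStatus"]
        if status != "Creating":
            break
        if waited >= timeout_seconds:
            raise TimeoutError(f"{endpoint_name} still Creating after {waited}s")
        sleep(poll_seconds)
        waited += poll_seconds
    if status != "InService":
        reason = resp.get("FailureReason", "no reason reported")
        raise RuntimeError(f"{endpoint_name} ended in {status}: {reason}")
    return status


class FakeClient:
    """Hypothetical stand-in for the SageMaker client, for illustration only."""

    def __init__(self, statuses):
        self._statuses = iter(statuses)

    def describe_endpoint(self, EndpointName):
        return {"EndpointStatus": next(self._statuses)}


fake = FakeClient(["Creating", "Creating", "InService"])
print(wait_for_endpoint(fake, "demo-endpoint", sleep=lambda s: None))  # InService
```

boto3 also ships a built-in waiter for this case, `sm_client.get_waiter("endpoint_in_service")`, which may be preferable when custom status messages are not needed.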

Invoke the endpoint with a few rows of sample data. The request is serialized to CSV before it is sent, and the response is deserialized into a list of predictions. The model is a binary classifier that predicts whether each sampled insurance claim is fraudulent: 1 means fraud, 0 means not fraud.

In [None]:
# Fetch test data to run predictions with the endpoint
test_df = pd.read_csv(test_data_uri)

# For content type text/csv, payload should be a string with commas separating the values for each feature
# This is the inference request serialization step
# CSV serialization
csv_file = io.StringIO()
test_sample = test_df.drop(["fraud"], axis=1).iloc[:5]
test_sample.to_csv(csv_file, sep=",", header=False, index=False)
payload = csv_file.getvalue()
response = sm_runtime_client.invoke_endpoint(
                                             EndpointName=endpoint_name,
                                             Body=payload,
                                             ContentType="text/csv"
                                            )

# This is the inference response deserialization step
# This is a bytes object
result = response["Body"].read()
# Decoding bytes to a string with comma separated predictions
result = result.decode("utf-8")
# Converting to list of predictions
result = re.split(",|\n",result)

prediction_df = pd.DataFrame()
prediction_df["Prediction"] = result[:5]
prediction_df["Label"] = test_df["fraud"].iloc[:5].values
prediction_df
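
The cell above keeps the predictions as strings. As a minimal sketch, the hypothetical helpers below parse the response body into floats and convert them to 0/1 labels. This assumes the model returns one numeric score per row and that 0.5 is a sensible cut-off; neither is stated in the original sample, so adjust both to your model.

```python
import re


def parse_predictions(body: bytes) -> list:
    """Decode a comma- or newline-separated response body into a list of floats."""
    text = body.decode("utf-8")
    # Skip empty tokens, such as the one left by a trailing newline
    return [float(tok) for tok in re.split(r"[,\n]", text) if tok.strip()]


def to_labels(scores, threshold=0.5):
    """Map scores to 1 (fraud) or 0 (not fraud) using an assumed threshold."""
    return [1 if score >= threshold else 0 for score in scores]


scores = parse_predictions(b"0.02,0.91,0.47\n")
print(scores)             # [0.02, 0.91, 0.47]
print(to_labels(scores))  # [0, 1, 0]
```

In the notebook, `parse_predictions(response["Body"].read())` would replace the manual decode and split steps.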

Finally, clean up by deleting the endpoint, the endpoint configuration, and the model.

In [None]:
# Delete the endpoint first, so nothing still references the configuration
sm_client.delete_endpoint(EndpointName=endpoint_name)

# Delete endpoint configuration
sm_client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)

# Delete model
sm_client.delete_model(ModelName=model_name)