# 4 - Endpoint <a class="anchor" id="top"></a>
* [Introduction](#intro)
* [Setup](#setup)
* [Define and deploy the model](#define)
* [Productionize the endpoint](#prod)
    * [Autoscaling](#autoscale)
    * [Model monitor](#monitor)
* [Testing the endpoint](#test)
* [Cleanup resources](#cleanup)

## Introduction <a class="anchor" id="intro"></a>
In this last section, we create a SageMaker endpoint that serves real-time predictions from our trained model.
After creating the endpoint, we test a simple application that takes in basic flight information and returns
the model's prediction.

## Setup <a class="anchor" id="setup"></a>
First, we import the SageMaker SDK dependencies as well as the modules used in the application below.
We also create the relevant sessions and read in local environment data.

In [1]:
import xml.etree.ElementTree
import json
import uuid
import boto3
import random
import requests
import numpy as np
import pandas as pd
import datetime as dt
import dateutil.parser
import sagemaker as sm
import sagemaker.sparkml as sparkml

In [2]:
# Get relevant sessions.
sm_session = sm.Session()
role = sm.get_execution_role()
boto3_session = boto3.session.Session()
now = dt.datetime.now().strftime(r"%Y%m%dT%H%M%S")

In [3]:
# Get boto3 session attributes.
account = boto3_session.client("sts").get_caller_identity()["Account"]
region = boto3_session.region_name
s3_resource = boto3_session.resource("s3")

In [4]:
# Retrieve data bucket name.
with open("/home/ec2-user/.aiml-bb/stack-data.json", "r") as f:
    data = json.load(f)
    data_bucket = data["data_bucket"]
    model_bucket = data["model_bucket"]

## Define and deploy the model <a class="anchor" id="define"></a>
To allow for a complete inference pipeline, we chain together the preprocessing, model inference/evaluation, and postprocessing.
We will define each of these stages as a SageMaker `Model` object, then chain them together into an inference pipeline.

In [5]:
# Required schema for input into preprocessing step.
preprocess_schema_json = json.dumps({
    "input": [
        {"name": "day_of_week", "type": "int"},
        {"name": "month", "type": "int"},
        {"name": "op_carrier", "type": "string"},
        {"name": "origin_latitude", "type": "double"}, 
        {"name": "origin_longitude", "type": "double"},
        {"name": "dest_latitude", "type": "double"}, 
        {"name": "dest_longitude", "type": "double"},
        {"name": "origin_tmax", "type": "double"}, 
        {"name": "origin_tmin", "type": "double"}, 
        {"name": "origin_prcp", "type": "double"}, 
        {"name": "origin_snow", "type": "double"}, 
        {"name": "origin_snwd", "type": "double"},
        {"name": "dest_tmax", "type": "double"}, 
        {"name": "dest_tmin", "type": "double"}, 
        {"name": "dest_prcp", "type": "double"}, 
        {"name": "dest_snow", "type": "double"}, 
        {"name": "dest_snwd", "type": "double"}
    ],
    "output": {"name": "features", "type": "double", "struct": "vector"}
})
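
The values sent to the endpoint have to line up with the `input` list of this schema, both in order and in type. As a minimal sketch of a client-side check, the snippet below validates a row against a shortened, hypothetical three-column schema of the same shape. `TYPE_CHECKS` and `validate_row` are illustrative helpers and not part of the SageMaker SDK.

```python
import json

# Hypothetical, shortened schema in the same shape as preprocess_schema_json.
schema_json = json.dumps({
    "input": [
        {"name": "day_of_week", "type": "int"},
        {"name": "op_carrier", "type": "string"},
        {"name": "origin_tmax", "type": "double"},
    ],
    "output": {"name": "features", "type": "double", "struct": "vector"},
})
schema = json.loads(schema_json)

# Map schema type names to the Python types accepted for that column.
TYPE_CHECKS = {
    "int": (int,),
    "double": (int, float),
    "string": (str,),
}

def validate_row(row, schema):
    """Return a list of problems found when matching row to schema["input"]."""
    columns = schema["input"]
    if len(row) != len(columns):
        return [f"expected {len(columns)} values, got {len(row)}"]
    problems = []
    for value, column in zip(row, columns):
        allowed = TYPE_CHECKS[column["type"]]
        # bool is rejected explicitly because it is a subclass of int.
        if isinstance(value, bool) or not isinstance(value, allowed):
            problems.append(
                f"{column['name']}: expected {column['type']}, "
                f"got {type(value).__name__}"
            )
    return problems

print(validate_row([1, "B6", 3.5], schema))    # []
print(validate_row(["1", "B6", 3.5], schema))  # one problem for day_of_week
```

Running a check like this before calling the endpoint turns an ordering or typing mistake into a readable message instead of a failed invocation.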

In [6]:
# Define the preprocessing model.
preprocess_model = sparkml.model.SparkMLModel(
    name=f"spark-preprocessor-{now}",
    model_data=f"s3://{model_bucket}/spark-preprocessor/model.tar.gz",
    spark_version="2.4",
    sagemaker_session=sm_session,
    env={"SAGEMAKER_SPARKML_SCHEMA": preprocess_schema_json}
)

In [7]:
# Define inference model using XGBoost, 
# the best performing model.
xgb_container_image = sm.image_uris.retrieve("xgboost", region, "latest")
xgb_inference_model = sm.model.Model(
    image_uri=xgb_container_image,
    model_data=f"s3://{model_bucket}/sagemaker-xgboost-tuned/model.tar.gz"
)

In [8]:
# Define complete inference pipeline model.
pipeline_model = sm.pipeline.PipelineModel(
    name=f"sm-pipeline-{now}",
    role=role,
    models=[
        preprocess_model,
        xgb_inference_model
    ]
)

In [9]:
# Enable data capture and create endpoint.
endpoint_name = f"pipeline-endpoint-{now}"
data_capture_config = sm.model_monitor.DataCaptureConfig(
    enable_capture=True,
    sampling_percentage=100,
    destination_s3_uri=f"s3://{model_bucket}/endpoint-data-capture/{endpoint_name}/"
)
pipeline_model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
    endpoint_name=endpoint_name,
    data_capture_config=data_capture_config
)
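
With data capture enabled, requests and responses are written under the configured S3 prefix as JSON Lines files. The sketch below parses one such line. The sample record is hand-written for illustration, and its field layout and encodings are assumptions based on the documented capture format, so inspect a real capture file before relying on it. `parse_capture_line` is a hypothetical helper.

```python
import json

# Hand-written sample line (assumption: real capture files may differ in detail,
# for example by using BASE64 instead of JSON/CSV encodings).
sample_line = json.dumps({
    "captureData": {
        "endpointInput": {
            "observedContentType": "application/json",
            "mode": "INPUT",
            "data": json.dumps({"data": [1, 1, "B6"]}),
            "encoding": "JSON",
        },
        "endpointOutput": {
            "observedContentType": "text/csv",
            "mode": "OUTPUT",
            "data": "0.6006277799606323",
            "encoding": "CSV",
        },
    },
    "eventMetadata": {
        "eventId": "example-id",
        "inferenceTime": "2022-01-25T12:00:00Z",
    },
    "eventVersion": "0",
})

def parse_capture_line(line):
    """Extract the request payload and the prediction from one captured event."""
    record = json.loads(line)
    capture = record["captureData"]
    return {
        "event_id": record["eventMetadata"]["eventId"],
        "request": json.loads(capture["endpointInput"]["data"]),
        "prediction": float(capture["endpointOutput"]["data"]),
    }

parsed = parse_capture_line(sample_line)
print(parsed["request"], parsed["prediction"])
```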

--------------!

## Productionize the endpoint <a class="anchor" id="prod"></a>
We now ready the model for production by adding autoscaling to ensure high availability and a Model Monitor to continuously track the quality of our model.

In [10]:
# Create necessary resources.
autoscaling_client = boto3.client("application-autoscaling")

### Autoscaling <a class="anchor" id="autoscale"></a>
Attach a CPU-based autoscaling policy that triggers on high CPU utilization.

In [11]:
# Prepare the endpoint for autoscaling.
endpoint_resource_id = f"endpoint/{endpoint_name}/variant/AllTraffic"
register_scalable_target_response = autoscaling_client.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=endpoint_resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=1,
    MaxCapacity=2 # Low for testing purposes.
)

In [12]:
# Apply a CPU-based autoscaling policy.
autoscaling_policy_name = "CPUUtil-ScalingPolicy"
put_scaling_policy_response = autoscaling_client.put_scaling_policy(
    PolicyName=autoscaling_policy_name,
    ServiceNamespace="sagemaker",
    ResourceId=endpoint_resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 90.0,
        "CustomizedMetricSpecification":
        {
            "MetricName": "CPUUtilization",
            "Namespace": "/aws/sagemaker/Endpoints",
            "Dimensions": [
                {"Name": "EndpointName", "Value": endpoint_name },
                {"Name": "VariantName","Value": "AllTraffic"}
            ],
            "Statistic": "Average",
            "Unit": "Percent"
        },
        "ScaleInCooldown": 600,
        "ScaleOutCooldown": 300
    }
)
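
As a rough mental model of what a target tracking policy does with these settings, the sketch below scales the instance count in proportion to how far the metric is from its target, then clamps the result to the registered capacity range. This is a simplification and not the algorithm Application Auto Scaling actually runs: it ignores cooldowns and alarm evaluation periods. `desired_instances` is a hypothetical helper.

```python
import math

def desired_instances(current, metric_value, target_value, min_capacity, max_capacity):
    """Proportional estimate of the instance count that brings the metric back to target."""
    raw = math.ceil(current * metric_value / target_value)
    # Clamp to the capacity range registered for the scalable target.
    return max(min_capacity, min(max_capacity, raw))

# One instance at 135% average CPU against a target of 90 suggests scaling out.
print(desired_instances(1, 135.0, 90.0, min_capacity=1, max_capacity=2))  # 2

# Two instances at 30% average CPU suggest scaling back in to one.
print(desired_instances(2, 30.0, 90.0, min_capacity=1, max_capacity=2))   # 1
```

With `MaxCapacity=2`, any load beyond what two instances can absorb is simply capped, which is acceptable for testing but worth revisiting before production use.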

### Model monitor <a class="anchor" id="monitor"></a>
Create a model monitor baselined against the validation data set.
Note that we monitor only the inference step of our model pipeline and exclude the preprocessing step.
This lets us baseline against our already preprocessed data and isolate inference performance.

In [None]:
# Create model monitor and baseline against validation data.
inference_model_monitor = sm.model_monitor.DefaultModelMonitor(
    role=role,
    instance_count=1,
    instance_type="ml.m5.4xlarge",
    volume_size_in_gb=20,
    max_runtime_in_seconds=3600
)
inference_model_monitor.suggest_baseline(
    baseline_dataset=f"s3://{data_bucket}/preprocessing_output/baseline/",
    dataset_format=sm.model_monitor.dataset_format.DatasetFormat.csv(header=True),
    output_s3_uri=f"s3://{model_bucket}/inference-model-monitor/",
    wait=True
)

In [14]:
latest_baselining_job = inference_model_monitor.latest_baselining_job
latest_baselining_job

<sagemaker.model_monitor.model_monitoring.BaseliningJob at 0x7fd00c094198>

In [15]:
# View statistics on monitor baseline.
latest_baselining_job = inference_model_monitor.latest_baselining_job
schema_df = pd.json_normalize(
    latest_baselining_job.baseline_statistics()
    .body_dict["features"]
)
schema_df.head(10)

| | name | inferred_type | numerical_statistics.common.num_present | numerical_statistics.common.num_missing | numerical_statistics.mean | numerical_statistics.sum | numerical_statistics.std_dev | numerical_statistics.min | numerical_statistics.max | numerical_statistics.distribution.kll.buckets | numerical_statistics.distribution.kll.sketch.parameters.c | numerical_statistics.distribution.kll.sketch.parameters.k | numerical_statistics.distribution.kll.sketch.data | string_statistics.common.num_present | string_statistics.common.num_missing | string_statistics.distinct_count | string_statistics.distribution.categorical.buckets |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | target | Fractional | 18594798.0 | 0.0 | 0.49999 | 9297209.0 | 0.5 | 0.0 | 1.0 | `[{'lower_bound': 0.0, 'upper_bound': 0.1, 'cou...` | 0.64 | 2048.0 | `[[], [1.0], [1.0], [1.0], [], [1.0], [1.0], [1...` | | | | |
| 1 | day_of_week | Integral | 18594798.0 | 0.0 | 3.930799 | 73092410.0 | 1.988336 | 1.0 | 7.0 | `[{'lower_bound': 1.0, 'upper_bound': 1.6, 'cou...` | 0.64 | 2048.0 | `[[], [7.0], [7.0], [7.0], [], [7.0], [7.0], [7...` | | | | |
| 2 | month | Integral | 18594798.0 | 0.0 | 6.373626 | 118516300.0 | 3.371035 | 1.0 | 12.0 | `[{'lower_bound': 1.0, 'upper_bound': 2.1, 'cou...` | 0.64 | 2048.0 | `[[], [12.0], [12.0], [12.0], [], [12.0], [12.0...` | | | | |
| 3 | op_carrier | String | | | | | | | | | | | | 18594798.0 | 0.0 | 21.0 | `[{'value': 'DL', 'count': 2276123}, {'value': ...` |
| 4 | origin_latitude | Fractional | 18594798.0 | 0.0 | 36.70529 | 682527500.0 | 5.82915 | -14.331 | 71.285402 | `[{'lower_bound': -14.3310003281, 'upper_bound'...` | 0.64 | 2048.0 | `[[], [64.81510162], [64.81510162], [64.8151016...` | | | | |
| 5 | origin_longitude | Fractional | 18594798.0 | 0.0 | -94.698524 | -1760900000.0 | 17.936394 | -170.710007 | 145.729004 | `[{'lower_bound': -170.710006714, 'upper_bound'...` | 0.64 | 2048.0 | `[[], [-64.79859924316406], [-64.79859924316406...` | | | | |
| 6 | dest_latitude | Fractional | 18594798.0 | 0.0 | 36.709299 | 682602000.0 | 5.859106 | -14.331 | 71.285402 | `[{'lower_bound': -14.3310003281, 'upper_bound'...` | 0.64 | 2048.0 | `[[], [61.17440032958984], [71.285402], [61.174...` | | | | |
| 7 | dest_longitude | Fractional | 18594798.0 | 0.0 | -94.802398 | -1762831000.0 | 18.110998 | -170.710007 | 145.729004 | `[{'lower_bound': -170.710006714, 'upper_bound'...` | 0.64 | 2048.0 | `[[], [-66.0018005371], [144.796005249], [-66.0...` | | | | |
| 8 | origin_tmax | Fractional | 18594798.0 | 0.0 | 215.662616 | 4010203000.0 | 107.342548 | -733.0 | 600.0 | `[{'lower_bound': -733.0, 'upper_bound': -599.7...` | 0.64 | 2048.0 | `[[], [467.0], [461.0], [467.0, 461.0, 461.0, -...` | | | | |
| 9 | origin_tmin | Fractional | 18594798.0 | 0.0 | 113.047247 | 2102091000.0 | 101.18162 | -733.0 | 600.0 | `[{'lower_bound': -733.0, 'upper_bound': -599.7...` | 0.64 | 2048.0 | `[[], [328.0], [339.0], [317.0, 333.0, 328.0, -...` | | | | |


In [16]:
# View suggested constraints from the monitor baseline.
constraints_df = pd.json_normalize(
    latest_baselining_job.suggested_constraints()
    .body_dict["features"]
)
constraints_df.head(10)

| | name | inferred_type | completeness | num_constraints.is_non_negative | string_constraints.domains |
|---|---|---|---|---|---|
| 0 | target | Fractional | 1.0 | True | |
| 1 | day_of_week | Integral | 1.0 | True | |
| 2 | month | Integral | 1.0 | True | |
| 3 | op_carrier | String | 1.0 | | `[DL, F9, US, OO, 9E, B6, AA, YV, G4, EV, OH, N...` |
| 4 | origin_latitude | Fractional | 1.0 | False | |
| 5 | origin_longitude | Fractional | 1.0 | False | |
| 6 | dest_latitude | Fractional | 1.0 | False | |
| 7 | dest_longitude | Fractional | 1.0 | False | |
| 8 | origin_tmax | Fractional | 1.0 | False | |
| 9 | origin_tmin | Fractional | 1.0 | False | |


In [17]:
# Create a schedule that compares the baseline against real-time traffic.
inference_model_monitor.create_monitoring_schedule(
    monitor_schedule_name="inference-model-monitor-schedule",
    endpoint_input=endpoint_name,
    statistics=inference_model_monitor.baseline_statistics(),
    constraints=inference_model_monitor.suggested_constraints(),
    schedule_cron_expression=sm.model_monitor.CronExpressionGenerator.hourly(),
)
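
To give a feel for what comparing traffic against the suggested constraints means, here is a minimal sketch that checks one record against constraints shaped like the rows of the constraints output. It is a simplification: the monitoring job runs in its own container and performs more checks than this. The constraint subset (including the shortened carrier domain) is hand-picked for illustration, and `check_record` is a hypothetical helper.

```python
# Subset of constraints in the shape returned by suggested_constraints()
# (the carrier domain is shortened here for illustration).
constraints = [
    {"name": "day_of_week", "inferred_type": "Integral", "completeness": 1.0,
     "num_constraints": {"is_non_negative": True}},
    {"name": "op_carrier", "inferred_type": "String", "completeness": 1.0,
     "string_constraints": {"domains": ["DL", "F9", "B6", "AA"]}},
    {"name": "origin_tmax", "inferred_type": "Fractional", "completeness": 1.0,
     "num_constraints": {"is_non_negative": False}},
]

def check_record(record, constraints):
    """Return a list of human-readable constraint violations for one record."""
    violations = []
    for constraint in constraints:
        name = constraint["name"]
        value = record.get(name)
        if value is None:
            # A completeness of 1.0 means the baseline never had missing values.
            if constraint["completeness"] >= 1.0:
                violations.append(f"{name}: missing value")
            continue
        numeric = constraint.get("num_constraints", {})
        if numeric.get("is_non_negative") and value < 0:
            violations.append(f"{name}: negative value {value}")
        domains = constraint.get("string_constraints", {}).get("domains")
        if domains is not None and value not in domains:
            violations.append(f"{name}: unexpected category {value!r}")
    return violations

print(check_record({"day_of_week": 1, "op_carrier": "B6", "origin_tmax": -50.0}, constraints))
print(check_record({"day_of_week": -1, "op_carrier": "ZZ"}, constraints))
```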

## Testing the endpoint <a class="anchor" id="test"></a>
Test the endpoint with a simple application that takes flight information as input and returns a prediction.

In [18]:
# Connect a predictor to the endpoint.
pipeline_predictor = sm.predictor.Predictor(
    endpoint_name=endpoint_name,
    sagemaker_session=sm_session,
    serializer=sm.serializers.JSONSerializer()
)

In [19]:
# User-provided inputs.
origin = "JFK"
dest = "LAX"
carrier = "B6"
fl_date = "2022-01-31"

All code below would be abstracted away from the user.

In [20]:
# Get date attributes.
today = dt.datetime.today().replace(hour=0, minute=0, second=0, microsecond=0)
fl_datetime = dt.datetime.strptime(fl_date, r"%Y-%m-%d")
day_of_week = fl_datetime.weekday() + 1
month = fl_datetime.month
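
`datetime.weekday()` returns 0 for Monday through 6 for Sunday, so adding 1 yields values from 1 to 7, the same range the baseline statistics report for `day_of_week`. Whether the training data also treats Monday as 1 is an assumption worth verifying against the preprocessing step. A small sketch with a hypothetical `date_features` helper:

```python
import datetime as dt

def date_features(fl_date):
    """Return (day_of_week, month) with Monday mapped to 1 and Sunday to 7."""
    parsed = dt.datetime.strptime(fl_date, "%Y-%m-%d")
    return parsed.weekday() + 1, parsed.month

print(date_features("2022-01-31"))  # (1, 1): a Monday in January
print(date_features("2022-02-06"))  # (7, 2): a Sunday in February
```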

In [21]:
# Get latitude and longitudes of airports.
airport_df = pd.read_parquet(f"s3://{data_bucket}/dl_output/airport_data")
get_iata_geolocation = (
    lambda iata: 
    airport_df.loc[airport_df["iata"]==iata, ["latitude", "longitude"]].iloc[0]
)
origin_lat, origin_lon = get_iata_geolocation(origin)
dest_lat, dest_lon = get_iata_geolocation(dest)

In [22]:
# Grab weather data.
forecast_fqdn = "https://graphical.weather.gov"
get_geolocation_forecast = (
    lambda lat, lon:
    xml.etree.ElementTree.fromstring(
        requests.get(
            f"{forecast_fqdn}/xml/SOAP_server/ndfdXMLclient.php",
            params={
                "lat": lat, "lon": lon,
                "begin": today.isoformat(), 
                "end": (today + dt.timedelta(days=7)).isoformat(),
                "Unit": "m",
                "maxt": "maxt", "mint": "mint",
                "qpf": "qpf", "snow": "snow",
                "product": "time-series",
                "Submit": "Submit"
            }
        ).content
    )
)
origin_forecast = get_geolocation_forecast(origin_lat, origin_lon) 
dest_forecast = get_geolocation_forecast(dest_lat, dest_lon)

In [23]:
# Define function to get averages of date values in XML.
def get_avg_xml_value(xml_tree, field, target_date=fl_datetime):
    # Get the key of the time layout that indexes this field's values.
    layout_key = xml_tree.find(f".//*{field}").attrib["time-layout"]

    # Find indices of dates in that layout matching the date in question.
    idxs = []
    for layout in xml_tree.findall(".//time-layout"):
        if layout.findtext("layout-key") != layout_key:
            continue
        for idx, date in enumerate(layout.findall("start-valid-time")):
            valid_time = dateutil.parser.parse(date.text)
            if valid_time.strftime("%Y-%m-%d") == target_date.strftime("%Y-%m-%d"):
                idxs.append(idx)

    if not idxs:
        raise ValueError("Date invalid, no data found for field. Possibly too far into the future.")

    # Data is for different times of day so we take mean.
    # Zero is added so we default in case of no data (e.g. with snow).
    val_sum = 0.0
    for idx, val in enumerate(xml_tree.findall(f".//*{field}/value")):
        if idx in idxs:
            val_sum += float(val.text)

    return val_sum / len(idxs)
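
To make the indexing idea concrete: each forecast parameter points at a time layout, and the n-th value of the parameter belongs to the n-th start time of that layout. The sketch below shows this on a small hand-written XML sample. The sample is a simplified assumption about the response structure, not a real forecast, and `daily_mean` is a hypothetical helper that uses only the standard library.

```python
import datetime as dt
import xml.etree.ElementTree as ET

# Hand-written, simplified sample (assumption: not an actual forecast response).
SAMPLE = """
<dwml>
  <data>
    <time-layout>
      <layout-key>k-p24h-n2-1</layout-key>
      <start-valid-time>2022-01-30T07:00:00-05:00</start-valid-time>
      <start-valid-time>2022-01-31T07:00:00-05:00</start-valid-time>
    </time-layout>
    <time-layout>
      <layout-key>k-p6h-n4-2</layout-key>
      <start-valid-time>2022-01-30T13:00:00-05:00</start-valid-time>
      <start-valid-time>2022-01-30T19:00:00-05:00</start-valid-time>
      <start-valid-time>2022-01-31T01:00:00-05:00</start-valid-time>
      <start-valid-time>2022-01-31T07:00:00-05:00</start-valid-time>
    </time-layout>
    <parameters>
      <temperature type="maximum" time-layout="k-p24h-n2-1">
        <value>3</value>
        <value>5</value>
      </temperature>
      <precipitation type="liquid" time-layout="k-p6h-n4-2">
        <value>0.0</value>
        <value>1.0</value>
        <value>2.0</value>
        <value>4.0</value>
      </precipitation>
    </parameters>
  </data>
</dwml>
"""

def daily_mean(tree, element, kind, day):
    """Average the values of one parameter whose start time falls on `day`."""
    node = tree.find(f".//{element}[@type='{kind}']")
    layout_key = node.attrib["time-layout"]
    # Pick the time layout this parameter refers to.
    layout = next(
        candidate for candidate in tree.iter("time-layout")
        if candidate.findtext("layout-key") == layout_key
    )
    values = node.findall("value")
    matches = [
        float(values[idx].text)
        for idx, start in enumerate(layout.findall("start-valid-time"))
        if dt.datetime.fromisoformat(start.text).date() == day
    ]
    if not matches:
        raise ValueError("no values found for the requested day")
    return sum(matches) / len(matches)

tree = ET.fromstring(SAMPLE.strip())
day = dt.date(2022, 1, 31)
print(daily_mean(tree, "temperature", "maximum", day))   # 5.0
print(daily_mean(tree, "precipitation", "liquid", day))  # 3.0
```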

In [24]:
# Get forecast values and convert to dataset formats.
# In NOAA weather data, all values are scaled by 1/10.
origin_tmax = 0.10 * get_avg_xml_value(origin_forecast, "temperature[@type='maximum']")
origin_tmin = 0.10 * get_avg_xml_value(origin_forecast, "temperature[@type='minimum']")
origin_snwd = 0.10 * get_avg_xml_value(origin_forecast, "precipitation[@type='snow']")
origin_liquid = 0.10 * get_avg_xml_value(origin_forecast, "precipitation[@type='liquid']")

dest_tmax = 0.10 * get_avg_xml_value(dest_forecast, "temperature[@type='maximum']")
dest_tmin = 0.10 * get_avg_xml_value(dest_forecast, "temperature[@type='minimum']")
dest_snwd = 0.10 * get_avg_xml_value(dest_forecast, "precipitation[@type='snow']")
dest_liquid = 0.10 * get_avg_xml_value(dest_forecast, "precipitation[@type='liquid']")

# This snow-to-liquid ratio is a common assumption, but it can be inaccurate.
# It is suitable for demonstration purposes, but may need closer
# inspection in production use cases.
snow_to_liquid_ratio = 10.0

origin_avg = (origin_tmax + origin_tmin) / 2
origin_prcp = origin_liquid if origin_avg > 0 else 0
origin_snow = 0 if origin_avg > 0 else snow_to_liquid_ratio * origin_liquid

dest_avg = (dest_tmax + dest_tmin) / 2
dest_prcp = dest_liquid if dest_avg > 0 else 0
dest_snow = 0 if dest_avg > 0 else snow_to_liquid_ratio * dest_liquid
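
The rain-or-snow rule above is applied twice, once per airport, so it can be expressed as a single function. The sketch below uses a hypothetical `split_precipitation` helper and keeps the same assumptions: precipitation counts as rain when the mean of the daily maximum and minimum temperature is above zero, and otherwise as snow at a fixed 10:1 ratio.

```python
def split_precipitation(liquid, tmax, tmin, snow_to_liquid_ratio=10.0):
    """Return (prcp, snow) for a liquid-equivalent precipitation amount."""
    mean_temp = (tmax + tmin) / 2
    if mean_temp > 0:
        # Above freezing on average: treat everything as rain.
        return liquid, 0.0
    # At or below freezing: convert the liquid equivalent to snowfall.
    return 0.0, snow_to_liquid_ratio * liquid

print(split_precipitation(2.0, tmax=5.0, tmin=-1.0))   # (2.0, 0.0)
print(split_precipitation(2.0, tmax=-1.0, tmin=-5.0))  # (0.0, 20.0)
```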

Send data to the endpoint and make the prediction.

In [26]:
payload = {"data": [
    day_of_week, month, 
    carrier, 
    origin_lat, origin_lon, 
    dest_lat, dest_lon,
    origin_tmax, origin_tmin, origin_prcp, origin_snow, origin_snwd,
    dest_tmax, dest_tmin, dest_prcp, dest_snow, dest_snwd
]}
print(json.dumps(
    pipeline_predictor.predict(payload).decode("utf-8"), 
    indent=4
))

"0.6006277799606323"

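The endpoint returns its score as raw bytes. A minimal sketch of turning that into a decision follows, assuming the XGBoost model was trained with a binary logistic objective so that the response is a score between 0 and 1. The 0.5 threshold and the name `interpret_response` are illustrative assumptions, and what the positive class means depends on how the target was defined during training.

```python
def interpret_response(raw, threshold=0.5):
    """Decode the endpoint response and compare the score to a threshold."""
    score = float(raw.decode("utf-8").strip())
    return {"score": score, "positive": score >= threshold}

print(interpret_response(b"0.6006277799606323"))
```

In a real application, the threshold would be tuned on validation data rather than fixed at 0.5.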

## Cleanup resources <a class="anchor" id="cleanup"></a>
Because this is a temporary project, delete the scaling policy, the monitoring schedule, and the endpoint.

In [None]:
autoscaling_client.delete_scaling_policy(
    PolicyName=autoscaling_policy_name,
    ServiceNamespace="sagemaker",
    ResourceId=endpoint_resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
)
inference_model_monitor.delete_monitoring_schedule()
pipeline_predictor.delete_endpoint()