# Demos

This notebook contains the demos for Feature Store, Model Monitor, and Clarify. The exercises were tested on a __2 vCPU + 4 GiB notebook instance with the Python 3 (TensorFlow 2.1 Python 3.6 CPU Optimized) kernel__.

## Staging

We'll begin by initializing some variables that are used throughout the demos. Code samples in the AWS documentation often assume these are already defined.

In [2]:
import sagemaker
from sagemaker.session import Session
from sagemaker import get_execution_role

role = get_execution_role()
session = sagemaker.Session()
region = session.boto_region_name
bucket = session.default_bucket()

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /root/.config/sagemaker/config.yaml


## Feature Store
---

Feature Store is a purpose-built data store that gives ML systems a consistent source of features across training and inference workloads. It can ingest data in batches (for training) and serve input features to models with very low latency for real-time prediction.

For this demo we'll use the Boston housing dataset, which you can learn more about here: https://www.cs.toronto.edu/~delve/data/boston/bostonDetail.html

In [5]:
from tensorflow.keras.datasets import boston_housing

(x_train, y_train), (x_test, y_test) = boston_housing.load_data(test_split=0.1, seed=1234)

# Manually add headers
train_headers = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE", "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT"]
test_headers = ["MEDV"]

2024-01-24 13:06:17.582177: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2024-01-24 13:06:17.638094: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-01-24 13:06:17.638137: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-01-24 13:06:17.639497: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-01-24 13:06:17.647766: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2024-01-24 13:06:17.650083: I tensorflow/core/platform/cpu_feature_guard.cc:1

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/boston_housing.npz


In [6]:
import pandas as pd
import time
import uuid


boston_train = pd.DataFrame(x_train, columns=train_headers)

Once we have our data, we can create a feature group. Remember to add an event-time column and a record-ID column; Feature Store requires both.

In [7]:
boston_train["EventTime"] = time.time()
boston_train["id"] = range(len(boston_train))

# Create feature group
from sagemaker.feature_store.feature_group import FeatureGroup

feature_group = FeatureGroup(
    name="boston-features", sagemaker_session=session
)

# Load Feature definitions
feature_group.load_feature_definitions(data_frame=boston_train)

[FeatureDefinition(feature_name='CRIM', feature_type=<FeatureTypeEnum.FRACTIONAL: 'Fractional'>, collection_type=None),
 FeatureDefinition(feature_name='ZN', feature_type=<FeatureTypeEnum.FRACTIONAL: 'Fractional'>, collection_type=None),
 FeatureDefinition(feature_name='INDUS', feature_type=<FeatureTypeEnum.FRACTIONAL: 'Fractional'>, collection_type=None),
 FeatureDefinition(feature_name='CHAS', feature_type=<FeatureTypeEnum.FRACTIONAL: 'Fractional'>, collection_type=None),
 FeatureDefinition(feature_name='NOX', feature_type=<FeatureTypeEnum.FRACTIONAL: 'Fractional'>, collection_type=None),
 FeatureDefinition(feature_name='RM', feature_type=<FeatureTypeEnum.FRACTIONAL: 'Fractional'>, collection_type=None),
 FeatureDefinition(feature_name='AGE', feature_type=<FeatureTypeEnum.FRACTIONAL: 'Fractional'>, collection_type=None),
 FeatureDefinition(feature_name='DIS', feature_type=<FeatureTypeEnum.FRACTIONAL: 'Fractional'>, collection_type=None),
 FeatureDefinition(feature_name='RAD', feature

The feature group is not created until we call the `create` method. Let's do that now:

In [8]:
feature_group.create(
    s3_uri=f"s3://{bucket}/features",
    record_identifier_name='id',
    event_time_feature_name="EventTime",
    role_arn=role,
    enable_online_store=True,
)

{'FeatureGroupArn': 'arn:aws:sagemaker:us-east-1:759895829784:feature-group/boston-features',
 'ResponseMetadata': {'RequestId': '14360dfa-58d1-4bba-87c3-e09e0eb52718',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '14360dfa-58d1-4bba-87c3-e09e0eb52718',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '92',
   'date': 'Wed, 24 Jan 2024 13:07:36 GMT'},
  'RetryAttempts': 0}}
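
`create` returns as soon as the request is accepted; the feature group itself is provisioned asynchronously, and ingestion can fail until its status reaches `Created`. Below is a minimal polling sketch, assuming the status is read from the `FeatureGroupStatus` field returned by `feature_group.describe()`. The helper name `wait_for_status` and the stand-in responses are hypothetical and exist only so the sketch runs without AWS access.

```python
import time


def wait_for_status(describe, target="Created", failed="CreateFailed",
                    poll_seconds=5.0, timeout_seconds=600.0):
    """Call `describe()` repeatedly until FeatureGroupStatus equals `target`."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = describe().get("FeatureGroupStatus")
        if status == target:
            return status
        if status == failed:
            raise RuntimeError(f"Feature group creation failed: {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Still {status!r} after {timeout_seconds} s")
        time.sleep(poll_seconds)


# Stand-in for feature_group.describe (hypothetical responses, no AWS call).
_responses = iter([{"FeatureGroupStatus": "Creating"},
                   {"FeatureGroupStatus": "Created"}])
print(wait_for_status(lambda: next(_responses), poll_seconds=0.0))
```

In this notebook the call would look like `wait_for_status(feature_group.describe)`, placed before `feature_group.ingest(...)`.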

For applications, we can create a lightweight client to retrieve data with low latency:

In [9]:
runtime = session.boto_session.client(
  'sagemaker-featurestore-runtime',
  region_name=region
)

data = runtime.get_record(
    FeatureGroupName="boston-features",
    RecordIdentifierValueAsString="0"
)

If we try to get records before we ingest any data, the response comes back empty:

In [10]:
data

{'ResponseMetadata': {'RequestId': '50a1338f-2e55-467e-b3a6-f0e2c7b7763a',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '50a1338f-2e55-467e-b3a6-f0e2c7b7763a',
   'content-type': 'application/json',
   'content-length': '32',
   'date': 'Wed, 24 Jan 2024 13:08:21 GMT'},
  'RetryAttempts': 0}}

In [11]:
feature_group.ingest(data_frame=boston_train, max_workers=3, wait=True)

IngestionManagerPandas(feature_group_name='boston-features', sagemaker_fs_runtime_client_config=<botocore.config.Config object at 0x7f6207cca3e0>, sagemaker_session=<sagemaker.session.Session object at 0x7f620807a230>, max_workers=3, max_processes=1, profile_name=None, _async_result=<multiprocess.pool.MapResult object at 0x7f61cd475150>, _processing_pool=<pool ProcessPool(ncpus=1)>, _failed_indices=[])

In [12]:
data = runtime.get_record(
    FeatureGroupName="boston-features",
    RecordIdentifierValueAsString="0"
)
data

{'ResponseMetadata': {'RequestId': '4c2d4cd7-46df-4c30-83d3-a465a879e5fa',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '4c2d4cd7-46df-4c30-83d3-a465a879e5fa',
   'content-type': 'application/json',
   'content-length': '1110',
   'date': 'Wed, 24 Jan 2024 13:08:58 GMT'},
  'RetryAttempts': 0},
 'Record': [{'FeatureName': 'CRIM', 'ValueAsString': '0.01951'},
  {'FeatureName': 'ZN', 'ValueAsString': '17.5'},
  {'FeatureName': 'INDUS', 'ValueAsString': '1.38'},
  {'FeatureName': 'CHAS', 'ValueAsString': '0.0'},
  {'FeatureName': 'NOX', 'ValueAsString': '0.4161'},
  {'FeatureName': 'RM', 'ValueAsString': '7.104'},
  {'FeatureName': 'AGE', 'ValueAsString': '59.5'},
  {'FeatureName': 'DIS', 'ValueAsString': '9.2229'},
  {'FeatureName': 'RAD', 'ValueAsString': '3.0'},
  {'FeatureName': 'TAX', 'ValueAsString': '216.0'},
  {'FeatureName': 'PTRATIO', 'ValueAsString': '18.6'},
  {'FeatureName': 'B', 'ValueAsString': '393.24'},
  {'FeatureName': 'LSTAT', 'ValueAsString': '8.05'}
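
The `Record` field is a list of name/value pairs, and every value arrives as a string. Applications usually want a plain dictionary instead. The helper below is a small sketch of that conversion; the name `record_to_dict` is hypothetical, and the sample response is a trimmed copy of the output above. It returns `None` when the response has no `Record` key, which is the case we saw before ingestion.

```python
def record_to_dict(response):
    """Return {feature_name: value_string}, or None if no record was found."""
    record = response.get("Record")
    if record is None:
        return None
    return {item["FeatureName"]: item["ValueAsString"] for item in record}


# Trimmed copy of the get_record response shown above.
response = {
    "ResponseMetadata": {"HTTPStatusCode": 200},
    "Record": [
        {"FeatureName": "CRIM", "ValueAsString": "0.01951"},
        {"FeatureName": "ZN", "ValueAsString": "17.5"},
        {"FeatureName": "RM", "ValueAsString": "7.104"},
    ],
}

features = record_to_dict(response)
print(float(features["RM"]))  # values are strings, so convert before use

# A response without a "Record" key means nothing is stored under that ID.
print(record_to_dict({"ResponseMetadata": {"HTTPStatusCode": 200}}))
```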

## Model Monitor

In this demo we create a monitoring schedule for a deployed model. We'll begin by reloading our data from the previous demo.

In [13]:
from tensorflow.keras.datasets import boston_housing
import pandas as pd

(x_train, y_train), (x_test, y_test) = boston_housing.load_data(test_split=0.1, seed=1234)
headers = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE", "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT"]


train = pd.DataFrame(x_train, columns=headers)
train["MEDV"] = y_train

# Target variable must come first per https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost.html
train.set_index(train.pop('MEDV'), inplace=True)
train.reset_index(inplace=True)
train

Unnamed: 0,MEDV,CRIM,ZN,INDUS,CHAS,NOX,RM,AGE,DIS,RAD,TAX,PTRATIO,B,LSTAT
0,33.0,0.01951,17.5,1.38,0.0,0.4161,7.104,59.5,9.2229,3.0,216.0,18.6,393.24,8.05
1,27.5,0.14866,0.0,8.56,0.0,0.5200,6.727,79.9,2.7778,5.0,384.0,20.9,394.76,9.42
2,5.6,25.04610,0.0,18.10,0.0,0.6930,5.987,100.0,1.5888,24.0,666.0,20.2,396.90,26.77
3,21.2,3.67367,0.0,18.10,0.0,0.5830,6.312,51.9,3.9917,24.0,666.0,20.2,388.62,10.58
4,14.9,9.51363,0.0,18.10,0.0,0.7130,6.728,94.1,2.4961,24.0,666.0,20.2,6.68,18.71
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
450,17.9,18.81100,0.0,18.10,0.0,0.5970,4.628,100.0,1.5539,24.0,666.0,20.2,28.79,34.37
451,14.5,8.49213,0.0,18.10,0.0,0.5840,6.348,86.1,2.0527,24.0,666.0,20.2,83.45,17.64
452,12.7,4.66883,0.0,18.10,0.0,0.7130,5.976,87.9,2.5806,24.0,666.0,20.2,10.48,19.01
453,17.8,0.31827,0.0,9.90,0.0,0.5440,5.914,83.2,3.9986,4.0,304.0,18.4,390.70,18.33


In [14]:
test = pd.DataFrame(x_test, columns=headers)
test["MEDV"] = y_test

# Target variable must come first per https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost.html
test.set_index(test.pop('MEDV'), inplace=True)
test.reset_index(inplace=True)

Now we'll upload the data to S3 as train and validation data, then train a model:

In [15]:
train.to_csv("train.csv", header=False, index=False)
test.to_csv("validation.csv", header=False, index=False)

val_location = session.upload_data('./validation.csv', key_prefix="data")
train_location = session.upload_data('./train.csv', key_prefix="data")

s3_input_train = sagemaker.inputs.TrainingInput(s3_data=train_location, content_type='csv')
s3_input_validation = sagemaker.inputs.TrainingInput(s3_data=val_location, content_type='csv')

In [16]:
from sagemaker.model_monitor import DataCaptureConfig

algo_image = sagemaker.image_uris.retrieve("xgboost", region, version='latest')
s3_output_location = f"s3://{bucket}/models/boston_model"

model=sagemaker.estimator.Estimator(
    image_uri=algo_image,
    role=role,
    instance_count=1,
    instance_type='ml.m4.xlarge',
    volume_size=5,
    output_path=s3_output_location,
    sagemaker_session=session  # reuse the session created in Staging
)

model.set_hyperparameters(max_depth=5,
                        eta=0.2,
                        gamma=4,
                        min_child_weight=6,
                        subsample=0.8,
                        objective='reg:linear',
                        early_stopping_rounds=10,
                        num_round=200)


model.fit({'train': s3_input_train, 'validation': s3_input_validation})

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /root/.config/sagemaker/config.yaml


INFO:sagemaker:Creating training-job with name: xgboost-2024-01-24-13-11-43-243


2024-01-24 13:11:43 Starting - Starting the training job...
2024-01-24 13:12:09 Starting - Preparing the instances for training.........
2024-01-24 13:13:32 Downloading - Downloading input data...
2024-01-24 13:14:02 Downloading - Downloading the training image...
2024-01-24 13:14:42 Training - Training image download completed. Training in progress...
Arguments: train
[2024-01-24:13:14:53:INFO] Running standalone xgboost training.
[2024-01-24:13:14:53:INFO] File size need to be processed in the node: 0.04mb. Available memory size in the node: 8542.91mb
[2024-01-24:13:14:53:INFO] Determined delimiter of CSV input is ','
[13:14:53] S3DistributionType set as FullyReplicated
[13:14:53] 455x13 matrix with 5915 entries loaded from /opt/ml/input/data/train?format=csv&label_column=0&delimiter=,
[2024-01-24:13:14:53:INFO] Determined delimiter of CSV input is ','
[13:14:53] S3DistributionType set as FullyReplicated
[13:

Now that the training job has finished, we can configure a deployment for data capture, then deploy:

In [17]:
capture_uri = f's3://{bucket}/data-capture'

data_capture_config = DataCaptureConfig(
    enable_capture=True,
    sampling_percentage=100,
    destination_s3_uri=capture_uri
)

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /root/.config/sagemaker/config.yaml
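
Once the endpoint is deployed with this configuration, each sampled request and response is written under `capture_uri` as JSON Lines. The sketch below shows how one such line could be unpacked, assuming the layout described in the AWS documentation (`captureData.endpointInput`, `captureData.endpointOutput`, `eventMetadata`). The helper name `parse_capture_line` is hypothetical, and the sample line is hand-written with placeholder values rather than taken from this endpoint.

```python
import json


def parse_capture_line(line):
    """Return (input_data, output_data, inference_time) from one capture line."""
    event = json.loads(line)
    capture = event["captureData"]
    return (
        capture["endpointInput"]["data"],
        capture["endpointOutput"]["data"],
        event["eventMetadata"]["inferenceTime"],
    )


# Hand-written placeholder line (assumed layout, not real captured data).
sample_line = json.dumps({
    "captureData": {
        "endpointInput": {
            "observedContentType": "text/csv",
            "mode": "INPUT",
            "data": "0.01951,17.5,1.38,0.0,0.4161,7.104,59.5,9.2229,3.0,216.0,18.6,393.24,8.05",
            "encoding": "CSV",
        },
        "endpointOutput": {
            "observedContentType": "text/csv",
            "mode": "OUTPUT",
            "data": "0.0",  # placeholder, not a real prediction
            "encoding": "CSV",
        },
    },
    "eventMetadata": {
        "eventId": "00000000-0000-0000-0000-000000000000",
        "inferenceTime": "2024-01-24T13:20:00Z",
    },
    "eventVersion": "0",
})

features_csv, prediction_csv, when = parse_capture_line(sample_line)
print(len(features_csv.split(",")), prediction_csv, when)
```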


In [18]:
xgb_predictor = model.deploy(
    initial_instance_count=1, instance_type='ml.m4.xlarge',
    data_capture_config=data_capture_config
)

INFO:sagemaker:Creating model with name: xgboost-2024-01-24-13-16-30-409
INFO:sagemaker:Creating endpoint-config with name xgboost-2024-01-24-13-16-30-409
INFO:sagemaker:Creating endpoint with name xgboost-2024-01-24-13-16-30-409


-------!

Here is some sample code to test the deployed model:

In [19]:
xgb_predictor.serializer = sagemaker.serializers.CSVSerializer()

In [33]:
inputs = test.copy()
yyy = inputs.loc[:,inputs.columns[0]]
inputs = inputs.drop(columns=inputs.columns[0])

x_pred = xgb_predictor.predict(inputs.sample(5).values).decode('utf-8')

In [67]:
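inputs = test.copy()
y_true = inputs.iloc[:, 0]                       # actual MEDV values (first column)
inputs = inputs.drop(columns=inputs.columns[0])  # features only

import random

# Pick a random window of 5 consecutive rows that fits inside the test set
t = min(random.randrange(len(test)), len(test) - 5)
x_pred = xgb_predictor.predict(inputs[t:t+5].values).decode('utf-8')

# Print actual vs. predicted values side by side
for actual, predicted in zip(y_true[t:t+5], x_pred.split(',')):
    print(actual, '\t', round(float(predicted), 1))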


50.0 	 46.1
23.4 	 21.3
21.7 	 21.7
19.3 	 20.3
33.1 	 33.2
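
To put a single number on how close these predictions are, we can compute the mean absolute error over the five rows. This is a minimal sketch: the helper name `mean_absolute_error` is our own, and the inputs are the values printed above (the predictions were already rounded to one decimal, so the result is approximate).

```python
def mean_absolute_error(actual, predicted_csv):
    """Average |actual - predicted| for a comma-separated prediction string."""
    predicted = [float(p) for p in predicted_csv.split(",")]
    if len(predicted) != len(actual):
        raise ValueError("actual and predicted have different lengths")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


actual = [50.0, 23.4, 21.7, 19.3, 33.1]      # left column of the output above
predicted_csv = "46.1,21.3,21.7,20.3,33.2"   # right column, joined as CSV

print(round(mean_absolute_error(actual, predicted_csv), 2))  # 1.42
```

In the notebook, the same helper could be applied directly to `y_true[t:t+5]` and `x_pred`.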


In [25]:
s3_input_train

<sagemaker.inputs.TrainingInput at 0x7f6208cae020>

We define the Model Monitor and suggest a baseline:

In [35]:
from sagemaker.model_monitor import DefaultModelMonitor
from sagemaker.model_monitor.dataset_format import DatasetFormat

my_monitor = DefaultModelMonitor(
    role=role,
    instance_count=1,
    instance_type='ml.m5.xlarge',
    volume_size_in_gb=20,
    max_runtime_in_seconds=3600,
)

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /root/.config/sagemaker/config.yaml


INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: .
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.


In [36]:
my_monitor.suggest_baseline(
    baseline_dataset=f's3://{bucket}/data/train.csv',
    dataset_format=DatasetFormat.csv(header=False),
)

INFO:sagemaker:Creating processing-job with name baseline-suggestion-job-2024-01-24-13-25-01-957


............................
2024-01-24 13:29:41.962652: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2024-01-24 13:29:41.962681: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2024-01-24 13:29:43.573504: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2024-01-24 13:29:43.573530: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
2024-01-24 13:29:43.573550: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (ip-10-0-213-57.ec2.internal): /proc/driver/nvidia/ver

<sagemaker.processing.ProcessingJob at 0x7f61bd271db0>
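
The baseline job writes a statistics file and a constraints file, which the monitor exposes through `my_monitor.baseline_statistics()` and `my_monitor.suggested_constraints()`. The sketch below summarizes a constraints document, assuming it follows the documented `constraints.json` schema (a `features` list with `name`, `inferred_type`, and `completeness`). The helper name `summarize_constraints` is hypothetical. The sample dictionary is hand-written for illustration, including the `_c0`-style column names we assume for a headerless CSV; it is not output from the job above.

```python
def summarize_constraints(constraints):
    """Map each feature name to (inferred_type, completeness)."""
    return {
        feature["name"]: (feature.get("inferred_type"), feature.get("completeness"))
        for feature in constraints.get("features", [])
    }


# Hand-written sample in the assumed constraints.json layout (illustrative only).
sample_constraints = {
    "version": 0.0,
    "features": [
        {"name": "_c0", "inferred_type": "Fractional", "completeness": 1.0,
         "num_constraints": {"is_non_negative": True}},
        {"name": "_c1", "inferred_type": "Fractional", "completeness": 1.0,
         "num_constraints": {"is_non_negative": True}},
    ],
}

for name, (inferred_type, completeness) in summarize_constraints(sample_constraints).items():
    print(name, inferred_type, completeness)
```

In the notebook, the real document would come from `my_monitor.suggested_constraints().body_dict`.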

Lastly, the Model Monitor must be scheduled, or it won't actually run regular processing jobs on the captured data:

In [68]:
from sagemaker.model_monitor import CronExpressionGenerator

my_monitor.create_monitoring_schedule(
    monitor_schedule_name='my-monitoring-schedule',
    endpoint_input=xgb_predictor.endpoint_name,
    statistics=my_monitor.baseline_statistics(),
    constraints=my_monitor.suggested_constraints(),
    schedule_cron_expression=CronExpressionGenerator.hourly(),
)

INFO:sagemaker.model_monitor.model_monitoring:Creating Monitoring Schedule with name: my-monitoring-schedule


## Clarify

This Clarify demo builds on the previous demo: we follow the same define-configure-schedule pattern for our monitor. Clarify, however, needs more configuration. We define a `SHAPConfig` and a `ModelConfig`, wrap them in an `ExplainabilityAnalysisConfig`, and pass that to the scheduling method.
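
The `baseline` given to `SHAPConfig` in the cell below is a single row of per-feature means, truncated to integers, with the target column (`MEDV`, first position) dropped. The sketch below reproduces that computation in plain Python on the first five training rows displayed earlier, so the numbers differ from a baseline built on the full training set. The helper name `mean_baseline` is hypothetical.

```python
# First five rows of `train` as displayed earlier: MEDV first, then 13 features.
rows = [
    [33.0, 0.01951, 17.5, 1.38, 0.0, 0.4161, 7.104, 59.5, 9.2229, 3.0, 216.0, 18.6, 393.24, 8.05],
    [27.5, 0.14866, 0.0, 8.56, 0.0, 0.5200, 6.727, 79.9, 2.7778, 5.0, 384.0, 20.9, 394.76, 9.42],
    [5.6, 25.04610, 0.0, 18.10, 0.0, 0.6930, 5.987, 100.0, 1.5888, 24.0, 666.0, 20.2, 396.90, 26.77],
    [21.2, 3.67367, 0.0, 18.10, 0.0, 0.5830, 6.312, 51.9, 3.9917, 24.0, 666.0, 20.2, 388.62, 10.58],
    [14.9, 9.51363, 0.0, 18.10, 0.0, 0.7130, 6.728, 94.1, 2.4961, 24.0, 666.0, 20.2, 6.68, 18.71],
]


def mean_baseline(rows):
    """Column means truncated to int, target column dropped, wrapped in a list."""
    n = len(rows)
    means = [sum(column) / n for column in zip(*rows)]
    return [[int(m) for m in means[1:]]]  # [1:] skips the MEDV target


print(mean_baseline(rows))
# [[7, 3, 12, 0, 0, 6, 77, 4, 16, 519, 20, 316, 14]]
```

Truncating to integers mirrors `astype(int)` in the notebook code; it flattens small-valued features such as `NOX` and `CHAS` to 0, which is worth keeping in mind when reading the resulting SHAP values.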

In [69]:
model_explainability_monitor = sagemaker.model_monitor.ModelExplainabilityMonitor(
    role=role,
    sagemaker_session=session,
    max_runtime_in_seconds=1800,
)


shap_config = sagemaker.clarify.SHAPConfig(
    baseline=[train.mean().astype(int).to_list()[1:]],
    num_samples=int(x_train.size),
    agg_method="mean_abs",
    save_local_shap_values=False,
)


model_config = sagemaker.clarify.ModelConfig(
    model_name="xgboost-2024-01-24-13-16-30-409",  # must match the model created by model.deploy() above
    instance_count=1,
    instance_type='ml.m4.xlarge',
    content_type="text/csv",
    accept_type="text/csv",
)

analysis_config = sagemaker.model_monitor.ExplainabilityAnalysisConfig(
    explainability_config=shap_config,
    model_config=model_config,
    headers=train.columns.to_list()[1:],
)

explainability_uri = f"s3://{bucket}/model_explainability"
model_explainability_monitor.create_monitoring_schedule(
    output_s3_uri=explainability_uri,
    analysis_config=analysis_config,
    endpoint_input=xgb_predictor.endpoint_name,
    schedule_cron_expression=CronExpressionGenerator.hourly(),
)

INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: 1.0.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.
INFO:sagemaker.model_monitor.clarify_model_monitoring:Uploading analysis config to {s3_uri}.
INFO:sagemaker.model_monitor.model_monitoring:Creating Monitoring Schedule with name: monitoring-schedule-2024-01-24-13-39-31-182

When we're done, we delete both monitoring schedules so they stop launching hourly processing jobs:


In [72]:
model_explainability_monitor.delete_monitoring_schedule()


Deleting Monitoring Schedule with name: monitoring-schedule-2024-01-24-13-39-31-182


INFO:sagemaker.model_monitor.clarify_model_monitoring:Deleting Model Explainability Job Definition with name: model-explainability-job-definition-2024-01-24-13-39-31-182


In [71]:
my_monitor.delete_monitoring_schedule()


Deleting Monitoring Schedule with name: my-monitoring-schedule


INFO:sagemaker.model_monitor.model_monitoring:Deleting Data Quality Job Definition with name: data-quality-job-definition-2024-01-24-13-38-50-482
