### SageMaker Feature Store Notebook showing use of Time Travel

This notebook is part of an AWS blog post that shows how to use "Time Travel" with SageMaker Feature Store. This particular notebook (#2) generates aggregate data attributes (such as averages and sums) from the raw transaction data generated in the previous notebook (#1).

In [1]:
from sagemaker import get_execution_role
import sagemaker
import boto3
import time
import json
import sys

role = get_execution_role()
sm_client = boto3.Session().client(service_name='sagemaker')
smfs_runtime = boto3.Session().client(service_name='sagemaker-featurestore-runtime')

#### Start by Deleting Feature Groups that we will re-create

In [2]:
# Use SageMaker default bucket
BUCKET = sagemaker.Session().default_bucket()
BASE_PREFIX = "sagemaker-featurestore-blog"

# Note that FeatureStore will append this pattern to base prefix -> "{account_id}/sagemaker/{region}/offline-store/"
OFFLINE_STORE_BASE_URI = f's3://{BUCKET}/{BASE_PREFIX}'

print(OFFLINE_STORE_BASE_URI)

CONS_FEATURE_GROUP = "consumer-fg"
CARD_FEATURE_GROUP = "credit-card-fg"

s3://sagemaker-us-east-1-572539092864/sagemaker-featurestore-blog


In [3]:
# Delete feature groups (in case the name already exists)

try:
    sm_client.delete_feature_group(FeatureGroupName=CARD_FEATURE_GROUP)
    print('deleted feature group: ' + CARD_FEATURE_GROUP)
except Exception as e:
    # Usually the group simply does not exist yet; print the error so other causes stay visible
    print(f'Could not delete feature group, it may not exist: {e}')

try:
    sm_client.delete_feature_group(FeatureGroupName=CONS_FEATURE_GROUP)
    print('deleted feature group: ' + CONS_FEATURE_GROUP)
except Exception as e:
    print(f'Could not delete feature group, it may not exist: {e}')

deleted feature group: credit-card-fg
deleted feature group: consumer-fg
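
Deleting a feature group is asynchronous, so re-creating a group with the same name right away can fail while the old one is still being removed. The following is a minimal sketch of a wait helper, not part of the original notebook. The function name, the `poll_seconds` and `timeout_seconds` parameters, and the assumption that a missing group surfaces as an error with code `ResourceNotFound` are all illustrative; verify the error code against your boto3 version.

```python
import time


def wait_for_feature_group_deletion(client, feature_group_name,
                                    poll_seconds=5, timeout_seconds=300):
    """Poll describe_feature_group until the group is gone.

    Returns True once the service reports the group as not found,
    or False if the timeout expires first. (Hypothetical helper.)
    """
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        try:
            client.describe_feature_group(FeatureGroupName=feature_group_name)
        except Exception as err:
            # botocore's ClientError exposes the service error code via err.response.
            # Assumption: a deleted group is reported with code 'ResourceNotFound'.
            code = getattr(err, "response", {}).get("Error", {}).get("Code")
            if code == "ResourceNotFound":
                return True
            raise  # any other error is unexpected, so surface it
        time.sleep(poll_seconds)
    return False
```

In this notebook it could be called as `wait_for_feature_group_deletion(sm_client, CARD_FEATURE_GROUP)` after each `delete_feature_group` call.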


#### Recreate the Feature Groups using Schema definition files
Each feature group contains configuration parameters for Offline and Online stores. The feature group uses a schema definition file (JSON) that dictates the feature names and types. Below we display these local schema files.

#### Schema files in the local 'schema' folder

In [4]:
!pygmentize schema/consumer-fg-schema.json

{
    "description": "Consumer features",
    "features": [
          {
              "name": "consumer_id",
              "type": "string",
              "description": "Consumer ID built from BBAN proxy (Unique)"
          },
          {
              "name": "num_trans_last_7d",
              "type": "bigint",
              "description": "Aggregated Metric: Average number of transactions for the consumer aggregated by past 7 days"
          },
          {
              "name": "avg_amt_last_7d",
              "type": "double",
              "description": "Aggregated Metric: Average transaction amount for the consumer aggregated by past 7 days"

In [5]:
!pygmentize schema/credit-card-fg-schema.json

{
    "description": "Credit card features",
    "features": [
          {
              "name": "cc_num",
              "type": "bigint",
              "description": "Credit Card Number (Unique)"
          },
          {
              "name": "num_trans_last_7d",
              "type": "bigint",
              "description": "Aggregated Metric: Average number of transactions for the card aggregated by past 7 days"
          },
          {
              "name": "avg_amt_last_7d",
              "type": "double",
              "description": "Aggregated Metric: Average transaction amount for the card aggregated by past 7 days"
          },
 

In [6]:
def create_feature_group_from_schema(filename, fg_name, role_arn=None, s3_uri=None):
    # Load the schema definition (JSON) and close the file when done
    with open(filename) as f:
        schema = json.load(f)
    
    feature_defs = []
    
    for col in schema['features']:
        feature = {'FeatureName': col['name']}
        if col['type'] == 'double':
            feature['FeatureType'] = 'Fractional'
        elif col['type'] == 'bigint':
            feature['FeatureType'] = 'Integral'
        else:
            feature['FeatureType'] = 'String'
        feature_defs.append(feature)

    record_identifier_name = schema['record_identifier_feature_name']
    event_time_name = schema['event_time_feature_name']

    if role_arn is None:
        role_arn = get_execution_role()

    if s3_uri is None:
        offline_config = {}
    else:
        print(f'Creating Offline Store at: {s3_uri}')
        offline_config = {'OfflineStoreConfig': {'S3StorageConfig': {'S3Uri': s3_uri}}}
        
    sm_client.create_feature_group(
        FeatureGroupName = fg_name,
        RecordIdentifierFeatureName = record_identifier_name,
        EventTimeFeatureName = event_time_name,
        FeatureDefinitions = feature_defs,
        Description = schema['description'],
        Tags = schema['tags'],
        OnlineStoreConfig = {'EnableOnlineStore': True},
        RoleArn = role_arn,
        **offline_config)

#### Create the new Feature Groups using the schema definition 
Now we will create the feature groups as defined by the schema files. Since Feature Group creation can sometimes take a few minutes, we will wait below for the status to change from `Creating`.

In [7]:
create_feature_group_from_schema('schema/consumer-fg-schema.json', CONS_FEATURE_GROUP, 
                                 role_arn=role, s3_uri=OFFLINE_STORE_BASE_URI)

Creating Offline Store at: s3://sagemaker-us-east-1-572539092864/sagemaker-featurestore-blog


In [8]:
create_feature_group_from_schema('schema/credit-card-fg-schema.json', CARD_FEATURE_GROUP, 
                                 role_arn=role, s3_uri=OFFLINE_STORE_BASE_URI)

Creating Offline Store at: s3://sagemaker-us-east-1-572539092864/sagemaker-featurestore-blog


In [9]:
# Wait for status to change to 'Created'

def wait_for_feature_group_creation_complete(feature_group_name):
    response = sm_client.describe_feature_group(FeatureGroupName=feature_group_name)
    status = response['FeatureGroupStatus']
    while status == "Creating":
        print("Waiting for Feature Group Creation")
        time.sleep(5)
        response = sm_client.describe_feature_group(FeatureGroupName=feature_group_name)
        status = response['FeatureGroupStatus']
    if status != "Created":
        raise RuntimeError(f"Failed to create feature group {feature_group_name}")
    print(f"FeatureGroup {feature_group_name} successfully created.")

wait_for_feature_group_creation_complete(feature_group_name=CONS_FEATURE_GROUP)
wait_for_feature_group_creation_complete(feature_group_name=CARD_FEATURE_GROUP)

Waiting for Feature Group Creation
FeatureGroup consumer-fg successfully created.
FeatureGroup credit-card-fg successfully created.


#### Make sure the new Feature Groups exist

In [10]:
sm_client.list_feature_groups()

{'FeatureGroupSummaries': [{'FeatureGroupName': 'transaction-feature-group-27-20-08-49',
   'FeatureGroupArn': 'arn:aws:sagemaker:us-east-1:572539092864:feature-group/transaction-feature-group-27-20-08-49',
   'CreationTime': datetime.datetime(2021, 4, 27, 20, 8, 58, 393000, tzinfo=tzlocal()),
   'FeatureGroupStatus': 'Created',
   'OfflineStoreStatus': {'Status': 'Active'}},
  {'FeatureGroupName': 'pyspark-fg',
   'FeatureGroupArn': 'arn:aws:sagemaker:us-east-1:572539092864:feature-group/pyspark-fg',
   'CreationTime': datetime.datetime(2021, 4, 20, 18, 54, 48, 557000, tzinfo=tzlocal()),
   'FeatureGroupStatus': 'Created'},
  {'FeatureGroupName': 'orders-feature-group-06-15-47-58',
   'FeatureGroupArn': 'arn:aws:sagemaker:us-east-1:572539092864:feature-group/orders-feature-group-06-15-47-58',
   'CreationTime': datetime.datetime(2021, 5, 6, 15, 48, 15, 394000, tzinfo=tzlocal()),
   'FeatureGroupStatus': 'Created',
   'OfflineStoreStatus': {'Status': 'Active'}},
  {'FeatureGroupName': 
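
`list_feature_groups` returns every feature group in the account and may split the results across pages, so the two new groups can be hard to spot in the raw output. The sketch below, which is an illustrative addition rather than part of the original notebook, follows `NextToken` across pages and reports whether each requested name is present. The helper name `feature_groups_exist` is hypothetical.

```python
def feature_groups_exist(client, names):
    """Return {name: bool} telling whether each feature group name is listed."""
    found = set()
    kwargs = {}
    while True:
        page = client.list_feature_groups(**kwargs)
        found.update(summary["FeatureGroupName"]
                     for summary in page.get("FeatureGroupSummaries", []))
        token = page.get("NextToken")
        if not token:
            break  # no more pages
        kwargs = {"NextToken": token}
    return {name: name in found for name in names}
```

For example, `feature_groups_exist(sm_client, [CONS_FEATURE_GROUP, CARD_FEATURE_GROUP])` should map both names to `True` once creation has finished.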

#### Describe configuration of feature group
Note that each feature group gets its own ARN, allowing you to manage IAM policies that control access to individual feature groups. The feature names and types are displayed, and the required record identifier and event time features are called out specifically. Notice that when we created the Feature Group above, we passed in the `s3_uri` parameter. This parameter dictates the base S3 location where the Offline Store data is written, and can be retrieved from the `describe_feature_group` output within the `OfflineStoreConfig` dictionary. 
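
As a small illustration of reading the Offline Store location out of that response, the sketch below pulls both the base `S3Uri` and the `ResolvedOutputS3Uri` from the `OfflineStoreConfig` dictionary. The helper name `offline_store_location` is hypothetical, and the URIs in the example are placeholders, not values from this account.

```python
def offline_store_location(description):
    """Return (base S3 URI, resolved S3 URI) from a describe_feature_group response.

    Either value is None if the feature group has no offline store configured.
    """
    s3_config = description.get("OfflineStoreConfig", {}).get("S3StorageConfig", {})
    return s3_config.get("S3Uri"), s3_config.get("ResolvedOutputS3Uri")


# Placeholder response fragment, shaped like the describe_feature_group output
example_description = {
    "FeatureGroupName": "consumer-fg",
    "OfflineStoreConfig": {
        "S3StorageConfig": {
            "S3Uri": "s3://example-bucket/example-prefix",
            "ResolvedOutputS3Uri": "s3://example-bucket/example-prefix/resolved-path",
        }
    },
}

base_uri, resolved_uri = offline_store_location(example_description)
print(base_uri)
print(resolved_uri)
```

With a live client, the input would be `sm_client.describe_feature_group(FeatureGroupName=CONS_FEATURE_GROUP)`.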

In [11]:
sm_client.describe_feature_group(FeatureGroupName=CONS_FEATURE_GROUP)

{'FeatureGroupArn': 'arn:aws:sagemaker:us-east-1:572539092864:feature-group/consumer-fg',
 'FeatureGroupName': 'consumer-fg',
 'RecordIdentifierFeatureName': 'consumer_id',
 'EventTimeFeatureName': 'event_time',
 'FeatureDefinitions': [{'FeatureName': 'consumer_id',
   'FeatureType': 'String'},
  {'FeatureName': 'num_trans_last_7d', 'FeatureType': 'Integral'},
  {'FeatureName': 'avg_amt_last_7d', 'FeatureType': 'Fractional'},
  {'FeatureName': 'num_trans_last_1d', 'FeatureType': 'Integral'},
  {'FeatureName': 'avg_amt_last_1d', 'FeatureType': 'Fractional'},
  {'FeatureName': 'event_time', 'FeatureType': 'String'}],
 'CreationTime': datetime.datetime(2021, 5, 26, 14, 48, 57, 742000, tzinfo=tzlocal()),
 'OnlineStoreConfig': {'EnableOnlineStore': True},
 'OfflineStoreConfig': {'S3StorageConfig': {'S3Uri': 's3://sagemaker-us-east-1-572539092864/sagemaker-featurestore-blog',
   'ResolvedOutputS3Uri': 's3://sagemaker-us-east-1-572539092864/sagemaker-featurestore-blog/572539092864/sagemaker/u

In [12]:
sm_client.describe_feature_group(FeatureGroupName=CARD_FEATURE_GROUP)

{'FeatureGroupArn': 'arn:aws:sagemaker:us-east-1:572539092864:feature-group/credit-card-fg',
 'FeatureGroupName': 'credit-card-fg',
 'RecordIdentifierFeatureName': 'cc_num',
 'EventTimeFeatureName': 'event_time',
 'FeatureDefinitions': [{'FeatureName': 'cc_num', 'FeatureType': 'Integral'},
  {'FeatureName': 'num_trans_last_7d', 'FeatureType': 'Integral'},
  {'FeatureName': 'avg_amt_last_7d', 'FeatureType': 'Fractional'},
  {'FeatureName': 'num_trans_last_1d', 'FeatureType': 'Integral'},
  {'FeatureName': 'avg_amt_last_1d', 'FeatureType': 'Fractional'},
  {'FeatureName': 'event_time', 'FeatureType': 'String'}],
 'CreationTime': datetime.datetime(2021, 5, 26, 14, 49, 0, 425000, tzinfo=tzlocal()),
 'OnlineStoreConfig': {'EnableOnlineStore': True},
 'OfflineStoreConfig': {'S3StorageConfig': {'S3Uri': 's3://sagemaker-us-east-1-572539092864/sagemaker-featurestore-blog',
   'ResolvedOutputS3Uri': 's3://sagemaker-us-east-1-572539092864/sagemaker-featurestore-blog/572539092864/sagemaker/us-east