# Hello World, Amazon SageMaker Feature Store
This notebook demonstrates how easy it is to use SageMaker Feature Store. It relies on a small set of utility functions that wrap the Feature Store API, keeping things simple for a data scientist working in Python.

### A few imports

In [1]:
from FeatureStore import Utils
from IPython.display import display, HTML
import pandas as pd
import time
from sklearn.ensemble import RandomForestClassifier

FG_NAME = 'customers-tmp'

#### Install pyarrow for reading a sample parquet file from the offline store

In [2]:
# !pip install pyarrow

### Load some quick and dirty test data

In [3]:
df = pd.read_csv('customers.csv')
ORIGINAL_RECORD_COUNT = df.shape[0]
df.head()

Unnamed: 0,Id,UpdateTime,ZipCode,Persona,City,Churn
0,C1,2020-02-01T00:00:00Z,11111,1,Boston,1
1,C2,2020-02-01T00:00:00Z,11111,2,Boston,1
2,C3,2020-02-01T00:00:00Z,11111,1,Boston,1
3,C4,2020-02-01T00:00:00Z,11111,2,Boston,0
4,C5,2020-02-01T00:00:00Z,11111,1,Boston,0


### Delete my existing feature group, since we're just playing in a test environment
For completeness, also delete the Glue table that SageMaker Feature Store created on your behalf. See the Glue console [here](https://console.aws.amazon.com/glue).

In [4]:
Utils.delete_feature_group(FG_NAME)

Deleting all s3 objects in prefix: offline-store/355151823911/sagemaker/us-east-1/offline-store/customers-tmp in bucket sagemaker-us-east-1-355151823911
Waiting for Feature Group Deletion
Waiting for Feature Group Deletion
Waiting for Feature Group Deletion
Waiting for Feature Group Deletion


### Create a brand new feature group, directly from my dataframe

In [5]:
tags = {'Environment': 'DEV', 
        'CostCenter': 'C20', 
        'Maintainer': 'John Smith', 
        'DocURL': 'https://www.google.com'}
Utils.create_fg_from_df(FG_NAME, df, tags=tags)

Waiting for Feature Group Creation
Waiting for Feature Group Creation
Waiting for Feature Group Creation
FeatureGroup customers-tmp successfully created.
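
The internals of `Utils.create_fg_from_df` are not shown in this notebook, so the following is only a sketch of how feature definitions might be inferred from the data. It assumes a simple mapping from Python value types to the Feature Store feature types `String`, `Integral`, and `Fractional`; the helper name `infer_feature_definitions` and the row-dict input are hypothetical.

```python
def infer_feature_definitions(rows):
    """Infer feature types from a list of row dicts.

    Hypothetical helper: all ints -> Integral, all numeric -> Fractional,
    anything else -> String.
    """
    if not rows:
        raise ValueError("need at least one row to infer types")
    definitions = []
    for name in rows[0]:
        values = [row[name] for row in rows]
        # bool is a subclass of int in Python, so exclude it explicitly
        numeric = all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values)
        if numeric and all(isinstance(v, int) for v in values):
            feature_type = "Integral"
        elif numeric:
            feature_type = "Fractional"
        else:
            feature_type = "String"
        definitions.append({"FeatureName": name, "FeatureType": feature_type})
    return definitions


rows = [
    {"Id": "C1", "UpdateTime": "2020-02-01T00:00:00Z", "ZipCode": 11111,
     "Persona": 1, "City": "Boston", "Churn": 1},
    {"Id": "C2", "UpdateTime": "2020-02-01T00:00:00Z", "ZipCode": 11111,
     "Persona": 2, "City": "Boston", "Churn": 1},
]
definitions = infer_feature_definitions(rows)
for definition in definitions:
    print(definition)
```

Under this mapping, the event time column stays a `String` because it holds ISO-8601 text.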


### See that it actually worked

In [6]:
Utils.list_feature_groups('cust')

[]

In [7]:
Utils.describe_feature_group(FG_NAME)

{'FeatureGroupArn': 'arn:aws:sagemaker:us-east-1:355151823911:feature-group/customers-tmp',
 'FeatureGroupName': 'customers-tmp',
 'RecordIdentifierFeatureName': 'Id',
 'EventTimeFeatureName': 'UpdateTime',
 'FeatureDefinitions': [{'FeatureName': 'Id', 'FeatureType': 'String'},
  {'FeatureName': 'UpdateTime', 'FeatureType': 'String'},
  {'FeatureName': 'ZipCode', 'FeatureType': 'Integral'},
  {'FeatureName': 'Persona', 'FeatureType': 'Integral'},
  {'FeatureName': 'City', 'FeatureType': 'String'},
  {'FeatureName': 'Churn', 'FeatureType': 'Integral'}],
 'CreationTime': datetime.datetime(2021, 6, 3, 23, 4, 45, 678000, tzinfo=tzlocal()),
 'OnlineStoreConfig': {'EnableOnlineStore': True},
 'OfflineStoreConfig': {'S3StorageConfig': {'S3Uri': 's3://sagemaker-us-east-1-355151823911/offline-store',
   'ResolvedOutputS3Uri': 's3://sagemaker-us-east-1-355151823911/offline-store/355151823911/sagemaker/us-east-1/offline-store/customers-tmp-1622761485/data'},
  'DisableGlueTableCreation': False,
 

In [4]:
doc_url = Utils.get_tags(FG_NAME)['DocURL']
print(f'Docs for feature group "{FG_NAME}" are here: {doc_url}')

Docs for feature group "customers-tmp" are here: https://www.google.com


### Ingest the features from my dataframe into my new feature group

In [9]:
Utils.ingest_from_df(FG_NAME, df)

### Show that we can look up the latest feature values from the online store

In [5]:
Utils.get_latest_feature_values(FG_NAME, ['C4','C2','C6'])

[{'Id': 'C4',
  'UpdateTime': '2020-02-03T00:00:00Z',
  'ZipCode': 33333,
  'Persona': 2,
  'City': 'Boston',
  'Churn': 0},
 {'Id': 'C2',
  'UpdateTime': '2020-02-03T00:00:00Z',
  'ZipCode': 33333,
  'Persona': 2,
  'City': 'Boston',
  'Churn': 1},
 {'Id': 'C6',
  'UpdateTime': '2020-02-03T00:00:00Z',
  'ZipCode': 33333,
  'Persona': 2,
  'City': 'Boston',
  'Churn': 0}]

## Show that we can get the history of feature values
The offline store is append-only: each ingestion adds new records and never overwrites existing ones.

#### Now, ingest some new data with later event timestamps
We'll ingest two new sets of records, each with the event timestamp advanced by one day and the zip code changed. We should then have three sets of records in total:

1. Original, event timestamp Feb 1, zip code 11111
2. New set, with event timestamp Feb 2, zip code 22222
3. Final set, with event timestamp Feb 3, zip code 33333

In [5]:
# ZipCode is an Integral feature, so assign integers rather than strings
df['UpdateTime'] = '2020-02-02T00:00:00Z'
df['ZipCode'] = 22222
Utils.ingest_from_df(FG_NAME, df)

df['UpdateTime'] = '2020-02-03T00:00:00Z'
df['ZipCode'] = 33333
Utils.ingest_from_df(FG_NAME, df)

#### Look up the full history for a few IDs
The offline store is populated asynchronously, so wait a few minutes (up to 15) for the data to appear.

In [4]:
ids = ['C5','C6']
features = ['*'] 

mins = 0
while True:
    hist_df = Utils.get_historical_offline_feature_values(FG_NAME, record_ids=ids, feature_names=features)
    if hist_df.shape[0] < ORIGINAL_RECORD_COUNT:
        print('Waiting for offline store data...')
        time.sleep(60)
        mins += 1
    else:
        break

print(f'\nData is available. Waited {mins} minutes\n')
hist_df.sort_values(by=['id','zipcode']).head(30)

Running query:
 SELECT * FROM "customers-tmp-1622761485"  WHERE  Id IN ('C5','C6')

Data is available. Waited 0 minutes



Unnamed: 0,id,updatetime,zipcode,persona,city,churn,write_time,api_invocation_time,is_deleted
1,C5,2020-02-01T00:00:00Z,11111,1,Boston,0,2021-06-03 23:11:37.541,2021-06-03 23:05:07.000,False
4,C5,2020-02-02T00:00:00Z,22222,1,Boston,0,2021-06-03 23:11:37.536,2021-06-03 23:05:18.000,False
5,C5,2020-02-02T00:00:00Z,22222,1,Boston,0,2021-06-03 23:11:37.536,2021-06-03 23:07:36.000,False
2,C5,2020-02-03T00:00:00Z,33333,1,Boston,0,2021-06-03 23:11:37.515,2021-06-03 23:05:19.000,False
3,C5,2020-02-03T00:00:00Z,33333,1,Boston,0,2021-06-03 23:11:37.515,2021-06-03 23:07:37.000,False
0,C6,2020-02-01T00:00:00Z,11111,2,Boston,0,2021-06-03 23:11:05.442,2021-06-03 23:05:07.000,False
6,C6,2020-02-02T00:00:00Z,22222,2,Boston,0,2021-06-03 23:11:05.448,2021-06-03 23:05:18.000,False
7,C6,2020-02-02T00:00:00Z,22222,2,Boston,0,2021-06-03 23:11:05.448,2021-06-03 23:07:36.000,False
8,C6,2020-02-03T00:00:00Z,33333,2,Boston,0,2021-06-03 23:11:05.451,2021-06-03 23:05:19.000,False
9,C6,2020-02-03T00:00:00Z,33333,2,Boston,0,2021-06-03 23:11:05.451,2021-06-03 23:07:37.000,False
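
The wait loop in this cell has no upper bound, so it would spin forever if the data never arrived. One way to bound it is a small polling helper with a timeout. This is a sketch only: `wait_until` and its parameters are hypothetical names, and the 900-second default simply mirrors the "up to 15 minutes" guidance.

```python
import time


def wait_until(check, timeout_s=900, interval_s=60,
               sleep=time.sleep, clock=time.monotonic):
    """Call check() until it returns a truthy value or timeout_s elapses.

    Returns the truthy value; raises TimeoutError otherwise.
    """
    deadline = clock() + timeout_s
    while True:
        result = check()
        if result:
            return result
        if clock() >= deadline:
            raise TimeoutError(f"condition not met within {timeout_s} seconds")
        sleep(interval_s)


# Demo with a fake check that succeeds on its third call and a no-op sleep,
# so the example finishes immediately.
calls = {"n": 0}


def fake_check():
    calls["n"] += 1
    return calls["n"] >= 3


ok = wait_until(fake_check, sleep=lambda seconds: None)
print(ok, calls["n"])
```

In the notebook, `check` could wrap the offline store query and return whether the row count has reached the expected threshold.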


#### Browse the set of offline store files in the S3 console

In [5]:
s3_console_url = Utils.get_offline_store_url(FG_NAME)
print(f'Review offline store partitioned data files here: {s3_console_url}')

Review offline store partitioned data files here: https://s3.console.aws.amazon.com/s3/buckets/sagemaker-us-east-1-355151823911?region=us-east-1&prefix=offline-store/355151823911/sagemaker/us-east-1/offline-store/customers-tmp/data/


#### See the Glue table that can be used for Athena queries

In [6]:
glue_console_url = Utils.get_glue_table_url(FG_NAME)
print(f'To see the Glue table that was created for you, go here: {glue_console_url}')

To see the Glue table that was created for you, go here: https://console.aws.amazon.com/glue/home?region=us-east-1#table:catalog=355151823911;name=customers-tmp-1622761485;namespace=sagemaker_featurestore


#### Examine contents of a sample offline store file

In [7]:
sample_filename = Utils.download_sample_offline_file(FG_NAME)
print(f'Downloaded sample file from offline store: {sample_filename}')
sample_df = pd.read_parquet(sample_filename)
sample_df.head()

Downloaded sample file from offline store: 20200101T000000Z_PDMok5nnyajcc8y2.parquet


Unnamed: 0,YYYY-MM-DD,AWND,PRCP,SNOW,TAVG,TMIN,WSF2,WSF5,WSF2_MPH,WSF5_MPH,...,PRCP_IN,SNOW_IN,TAVG_F,TMIN_F,BAD,IATA,EVENT_TIME,write_time,api_invocation_time,is_deleted
0,2020-01-01,69.0,0.0,0.0,41.0,22.0,130.0,188.0,29.2,42.3,...,0.0,0.0,39.38,35.96,True,BOS,2020-01-01T00:00:00Z,2021-04-18 17:34:07.654000+00:00,2021-04-18 17:32:40+00:00,False


#### See the total record count in the offline store for this feature group

In [8]:
total_record_count = Utils.get_historical_record_count(FG_NAME)
print(f'Found {total_record_count:,d} records in "{FG_NAME}" feature group.')

Found 32 records in "customers-tmp" feature group.


#### Get a sample of records

In [9]:
sample_df = Utils.sample(FG_NAME, n=5)
sample_df.head()

Running query:
 SELECT * FROM "customers-tmp-1622761485" tablesample bernoulli(25) limit 5


Unnamed: 0,id,updatetime,zipcode,persona,city,churn,write_time,api_invocation_time,is_deleted
0,C3,2020-02-02T00:00:00Z,22222,1,Boston,1,2021-06-03 23:11:37.536,2021-06-03 23:05:18.000,False
1,C3,2020-02-02T00:00:00Z,22222,1,Boston,1,2021-06-03 23:11:37.536,2021-06-03 23:07:36.000,False
2,C1,2020-02-02T00:00:00Z,22222,1,Boston,1,2021-06-03 23:11:05.464,2021-06-03 23:05:18.000,False
3,C1,2020-02-02T00:00:00Z,22222,1,Boston,1,2021-06-03 23:11:05.464,2021-06-03 23:07:36.000,False
4,C4,2020-02-02T00:00:00Z,22222,2,Boston,0,2021-06-03 23:11:36.327,2021-06-03 23:05:18.000,False
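
The logged query uses `tablesample bernoulli(25)`, which keeps each row independently with a 25% probability before the `limit` is applied. The following pure-Python sketch illustrates that idea; `bernoulli_sample` is a hypothetical name, not part of the utility module.

```python
import random


def bernoulli_sample(rows, percent, limit, seed=None):
    """Keep each row independently with probability percent/100,
    stopping once `limit` rows have been collected."""
    rng = random.Random(seed)
    sample = []
    for row in rows:
        if len(sample) >= limit:
            break
        # rng.random() is in [0, 1), so percent=100 keeps every row
        if rng.random() * 100 < percent:
            sample.append(row)
    return sample


rows = list(range(100))
everything = bernoulli_sample(rows, percent=100, limit=5)
nothing = bernoulli_sample(rows, percent=0, limit=5)
some = bernoulli_sample(rows, percent=25, limit=5, seed=0)
print(everything, nothing, len(some))
```

Because inclusion is random, a small table sampled this way can yield fewer rows than the requested limit.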


#### Here we retrieve the full history for all IDs

In [14]:
hist_df = Utils.get_historical_offline_feature_values(FG_NAME)
hist_df.sort_values(by=['id','zipcode']).head(100)

Running query:
 SELECT * FROM "customers-tmp-1622761485" 


Unnamed: 0,id,updatetime,zipcode,persona,city,churn,write_time,api_invocation_time,is_deleted
2,1,2020-02-03T00:00:00Z,,,,,2021-06-03 23:37:16.215,2021-06-03 23:32:19.000,True
31,C1,2020-02-01T00:00:00Z,11111.0,1.0,Boston,1.0,2021-06-03 23:11:05.458,2021-06-03 23:05:07.000,False
13,C1,2020-02-02T00:00:00Z,22222.0,1.0,Boston,1.0,2021-06-03 23:11:05.464,2021-06-03 23:05:18.000,False
14,C1,2020-02-02T00:00:00Z,22222.0,1.0,Boston,1.0,2021-06-03 23:11:05.464,2021-06-03 23:07:36.000,False
11,C1,2020-02-03T00:00:00Z,33333.0,1.0,Boston,1.0,2021-06-03 23:11:05.472,2021-06-03 23:05:19.000,False
12,C1,2020-02-03T00:00:00Z,33333.0,1.0,Boston,1.0,2021-06-03 23:11:05.472,2021-06-03 23:07:37.000,False
3,C1,2020-02-03T00:00:00Z,,,,,2021-06-03 23:37:16.215,2021-06-03 23:32:30.000,True
28,C2,2020-02-01T00:00:00Z,11111.0,2.0,Boston,1.0,2021-06-03 23:11:36.330,2021-06-03 23:05:07.000,False
20,C2,2020-02-02T00:00:00Z,22222.0,2.0,Boston,1.0,2021-06-03 23:11:36.327,2021-06-03 23:05:18.000,False
21,C2,2020-02-02T00:00:00Z,22222.0,2.0,Boston,1.0,2021-06-03 23:11:36.327,2021-06-03 23:07:36.000,False


#### Here we retrieve the full history for one id

In [13]:
hist_df = Utils.get_historical_offline_feature_values(FG_NAME, record_ids=['C3'])
hist_df.sort_values(by=['id','zipcode']).head(100)

Running query:
 SELECT * FROM "customers-tmp-1622761485"  WHERE  Id IN ('C3')


Unnamed: 0,id,updatetime,zipcode,persona,city,churn,write_time,api_invocation_time,is_deleted
4,C3,2020-02-01T00:00:00Z,11111,1,Boston,1,2021-06-03 23:11:37.541,2021-06-03 23:05:07.000,False
0,C3,2020-02-02T00:00:00Z,22222,1,Boston,1,2021-06-03 23:11:37.536,2021-06-03 23:05:18.000,False
1,C3,2020-02-02T00:00:00Z,22222,1,Boston,1,2021-06-03 23:11:37.536,2021-06-03 23:07:36.000,False
2,C3,2020-02-03T00:00:00Z,33333,1,Boston,1,2021-06-03 23:11:37.515,2021-06-03 23:05:19.000,False
3,C3,2020-02-03T00:00:00Z,33333,1,Boston,1,2021-06-03 23:11:37.515,2021-06-03 23:07:37.000,False


#### Now let's see what the online store thinks are the latest values

In [15]:
Utils.get_latest_feature_values(FG_NAME, ['C4','C2','C6'])

[{'Id': 'C4',
  'UpdateTime': '2020-02-03T00:00:00Z',
  'ZipCode': 33333,
  'Persona': 2,
  'City': 'Boston',
  'Churn': 0},
 {'Id': 'C2',
  'UpdateTime': '2020-02-03T00:00:00Z',
  'ZipCode': 33333,
  'Persona': 2,
  'City': 'Boston',
  'Churn': 1},
 {'Id': 'C6',
  'UpdateTime': '2020-02-03T00:00:00Z',
  'ZipCode': 33333,
  'Persona': 2,
  'City': 'Boston',
  'Churn': 0}]

## Train a simple model with features extracted from the feature store
For our example, the dataset we want is the latest values of three features (`ZipCode`, `Persona`, and `Churn`) for each record ID.

In [16]:
full_df = Utils.get_latest_offline_feature_values(FG_NAME, feature_names=['Id','ZipCode','Persona','Churn'])
full_df.head(10)

Running query:
 SELECT Id,ZipCode,Persona,Churn FROM (SELECT *, dense_rank() OVER (PARTITION BY Id ORDER BY UpdateTime DESC, Api_Invocation_Time DESC, write_time DESC) AS rank FROM "customers-tmp-1622761485" ) WHERE rank = 1 AND NOT is_deleted


Unnamed: 0,Id,ZipCode,Persona,Churn
0,C2,33333,2,1
1,C5,33333,1,0
2,C6,33333,2,0
3,C3,33333,1,1
4,C4,33333,2,0
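
The logged query ranks the records for each `Id` by event time, then API invocation time, then write time, and keeps the top-ranked one unless it is flagged as deleted. The same logic can be sketched in plain Python over a few rows shaped like the offline store output. The function name `latest_per_id` is hypothetical, and unlike `dense_rank()` this version keeps exactly one record per ID even when ranks tie.

```python
def latest_per_id(records):
    """Return the top-ranked, non-deleted record for each id."""
    best = {}
    for rec in records:
        # Same ordering as the SQL: event time, API invocation time, write time
        key = (rec["updatetime"], rec["api_invocation_time"], rec["write_time"])
        if rec["id"] not in best or key > best[rec["id"]][0]:
            best[rec["id"]] = (key, rec)
    # Equivalent of "AND NOT is_deleted"
    return [rec for _, rec in best.values() if not rec["is_deleted"]]


records = [
    {"id": "C3", "updatetime": "2020-02-01T00:00:00Z", "zipcode": 11111,
     "api_invocation_time": "2021-06-03 23:05:07.000",
     "write_time": "2021-06-03 23:11:37.541", "is_deleted": False},
    {"id": "C3", "updatetime": "2020-02-03T00:00:00Z", "zipcode": 33333,
     "api_invocation_time": "2021-06-03 23:05:19.000",
     "write_time": "2021-06-03 23:11:37.515", "is_deleted": False},
    {"id": "C3", "updatetime": "2020-02-03T00:00:00Z", "zipcode": 33333,
     "api_invocation_time": "2021-06-03 23:07:37.000",
     "write_time": "2021-06-03 23:11:37.515", "is_deleted": False},
    {"id": "C5", "updatetime": "2020-02-02T00:00:00Z", "zipcode": 22222,
     "api_invocation_time": "2021-06-03 23:07:36.000",
     "write_time": "2021-06-03 23:11:37.536", "is_deleted": False},
]
latest = latest_per_id(records)
for rec in latest:
    print(rec["id"], rec["zipcode"], rec["api_invocation_time"])
```

Comparing the timestamps as strings works here only because every value in a column uses the same fixed-width format.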


In [19]:
full_df.drop('Id', axis=1, inplace=True)

In [20]:
train_df = full_df[0:4]
test_df = full_df[4:6]

In [21]:
clf = RandomForestClassifier(max_depth=2, random_state=0)
clf.fit(train_df[['ZipCode','Persona']], train_df[['Churn']].values.ravel())
clf.predict(test_df[['ZipCode','Persona']])

array([0])

## Demonstrate As-Of queries

#### Time travel to get dataset as it looked on 2020-02-02

In [24]:
asof_df = Utils.get_offline_feature_values_as_of(FG_NAME, '2020-02-02T00:00:00Z')
asof_df.sort_values(by=['id','updatetime','write_time'], ascending=[True, False, False]).head(100)

Running query:
 SELECT * FROM (SELECT *, dense_rank() OVER (PARTITION BY Id ORDER BY UpdateTime DESC, Api_Invocation_Time DESC, write_time DESC) AS rank FROM "customers-tmp-1622761485" WHERE UpdateTime <= '2020-02-02T00:00:00Z') WHERE rank = 1 AND NOT is_deleted


Unnamed: 0,id,updatetime,zipcode,persona,city,churn,write_time,api_invocation_time,is_deleted,rank
0,C1,2020-02-02T00:00:00Z,22222,1,Boston,1,2021-06-03 23:11:05.464,2021-06-03 23:07:36.000,False,1
2,C2,2020-02-02T00:00:00Z,22222,2,Boston,1,2021-06-03 23:11:36.327,2021-06-03 23:07:36.000,False,1
4,C3,2020-02-02T00:00:00Z,22222,1,Boston,1,2021-06-03 23:11:37.536,2021-06-03 23:07:36.000,False,1
5,C4,2020-02-02T00:00:00Z,22222,2,Boston,0,2021-06-03 23:11:36.327,2021-06-03 23:07:36.000,False,1
1,C5,2020-02-02T00:00:00Z,22222,1,Boston,0,2021-06-03 23:11:37.536,2021-06-03 23:07:36.000,False,1
3,C6,2020-02-02T00:00:00Z,22222,2,Boston,0,2021-06-03 23:11:05.448,2021-06-03 23:07:36.000,False,1


#### Get dataset as it looked on 2020-02-01

In [25]:
asof_df = Utils.get_offline_feature_values_as_of(FG_NAME, '2020-02-01T00:00:00Z')
asof_df.sort_values(by=['id','updatetime','write_time'], ascending=[True, False, False]).head(100)

Running query:
 SELECT * FROM (SELECT *, dense_rank() OVER (PARTITION BY Id ORDER BY UpdateTime DESC, Api_Invocation_Time DESC, write_time DESC) AS rank FROM "customers-tmp-1622761485" WHERE UpdateTime <= '2020-02-01T00:00:00Z') WHERE rank = 1 AND NOT is_deleted


Unnamed: 0,id,updatetime,zipcode,persona,city,churn,write_time,api_invocation_time,is_deleted,rank
1,C1,2020-02-01T00:00:00Z,11111,1,Boston,1,2021-06-03 23:11:05.458,2021-06-03 23:05:07.000,False,1
0,C2,2020-02-01T00:00:00Z,11111,2,Boston,1,2021-06-03 23:11:36.330,2021-06-03 23:05:07.000,False,1
4,C3,2020-02-01T00:00:00Z,11111,1,Boston,1,2021-06-03 23:11:37.541,2021-06-03 23:05:07.000,False,1
5,C4,2020-02-01T00:00:00Z,11111,2,Boston,0,2021-06-03 23:11:36.330,2021-06-03 23:05:07.000,False,1
2,C5,2020-02-01T00:00:00Z,11111,1,Boston,0,2021-06-03 23:11:37.541,2021-06-03 23:05:07.000,False,1
3,C6,2020-02-01T00:00:00Z,11111,2,Boston,0,2021-06-03 23:11:05.442,2021-06-03 23:05:07.000,False,1
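
An as-of query differs from a latest-values query only in that records with an event time after the cutoff are discarded before ranking. A minimal sketch in plain Python, assuming ISO-8601 UTC timestamps in a uniform format so that string comparison matches chronological order (`as_of` is a hypothetical name):

```python
def as_of(records, cutoff):
    """Return {id: record} holding the latest non-deleted record per id
    whose event time is at or before `cutoff`."""
    latest = {}
    for rec in records:
        if rec["updatetime"] > cutoff:
            continue  # not yet visible at the cutoff
        key = (rec["updatetime"], rec["api_invocation_time"], rec["write_time"])
        if rec["id"] not in latest or key > latest[rec["id"]][0]:
            latest[rec["id"]] = (key, rec)
    return {rid: rec for rid, (_, rec) in latest.items() if not rec["is_deleted"]}


records = [
    {"id": "C1", "updatetime": "2020-02-01T00:00:00Z", "zipcode": 11111,
     "api_invocation_time": "2021-06-03 23:05:07.000",
     "write_time": "2021-06-03 23:11:05.458", "is_deleted": False},
    {"id": "C1", "updatetime": "2020-02-02T00:00:00Z", "zipcode": 22222,
     "api_invocation_time": "2021-06-03 23:07:36.000",
     "write_time": "2021-06-03 23:11:05.464", "is_deleted": False},
    {"id": "C1", "updatetime": "2020-02-03T00:00:00Z", "zipcode": 33333,
     "api_invocation_time": "2021-06-03 23:07:37.000",
     "write_time": "2021-06-03 23:11:05.472", "is_deleted": False},
]
feb1 = as_of(records, "2020-02-01T00:00:00Z")
feb2 = as_of(records, "2020-02-02T00:00:00Z")
jan31 = as_of(records, "2020-01-31T00:00:00Z")
print(feb1["C1"]["zipcode"], feb2["C1"]["zipcode"], jan31)
```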


## Delete a record and show how it impacts lookups (latest, and historical)

In [7]:
Utils.delete_record(FG_NAME, 'C1', '2020-02-03T00:00:00Z')

#### After a record is deleted, the online store will no longer return features for that id

In [8]:
Utils.get_latest_feature_values(FG_NAME, ['C1'])

[]

#### Once the deletion is propagated to the offline store, a new record is added with the is_deleted flag set to True
If you get feature values as of that timestamp, the deleted record will be filtered out.

In [9]:
mins = 0
while True:
    hist_df = Utils.get_offline_feature_values_as_of(FG_NAME, '2020-02-03T00:00:00Z')
    if hist_df.shape[0] == ORIGINAL_RECORD_COUNT:
        print('Waiting for offline store deletion update...')
        time.sleep(60)
        mins += 1
    else:
        break

print(f'\nDeletion was propagated. Waited {mins} minutes\n')
hist_df.head(10)

Running query:
 SELECT * FROM (SELECT *, dense_rank() OVER (PARTITION BY Id ORDER BY UpdateTime DESC, Api_Invocation_Time DESC, write_time DESC) AS rank FROM "customers-tmp-1622761485" WHERE UpdateTime <= '2020-02-03T00:00:00Z') WHERE rank = 1 AND NOT is_deleted
Waiting for offline store deletion update...
Running query:
 SELECT * FROM (SELECT *, dense_rank() OVER (PARTITION BY Id ORDER BY UpdateTime DESC, Api_Invocation_Time DESC, write_time DESC) AS rank FROM "customers-tmp-1622761485" WHERE UpdateTime <= '2020-02-03T00:00:00Z') WHERE rank = 1 AND NOT is_deleted
Waiting for offline store deletion update...
Running query:
 SELECT * FROM (SELECT *, dense_rank() OVER (PARTITION BY Id ORDER BY UpdateTime DESC, Api_Invocation_Time DESC, write_time DESC) AS rank FROM "customers-tmp-1622761485" WHERE UpdateTime <= '2020-02-03T00:00:00Z') WHERE rank = 1 AND NOT is_deleted
Waiting for offline store deletion update...
Running query:
 SELECT * FROM (SELECT *, dense_rank() OVER (PARTITION BY Id 

Unnamed: 0,id,updatetime,zipcode,persona,city,churn,write_time,api_invocation_time,is_deleted,rank
0,C2,2020-02-03T00:00:00Z,33333,2,Boston,1,2021-06-03 23:11:36.334,2021-06-03 23:07:37.000,False,1
1,C5,2020-02-03T00:00:00Z,33333,1,Boston,0,2021-06-03 23:11:37.515,2021-06-03 23:07:37.000,False,1
2,C3,2020-02-03T00:00:00Z,33333,1,Boston,1,2021-06-03 23:11:37.515,2021-06-03 23:07:37.000,False,1
3,C4,2020-02-03T00:00:00Z,33333,2,Boston,0,2021-06-03 23:11:36.334,2021-06-03 23:07:37.000,False,1
4,C6,2020-02-03T00:00:00Z,33333,2,Boston,0,2021-06-03 23:11:05.451,2021-06-03 23:07:37.000,False,1


#### If you retrieve all feature records, you can see that the offline store is an append-only store
A new record is appended with its feature values written as NaN and the `is_deleted` flag set to `True`.
This deletion record has a more recent `write_time` than the records it supersedes.

In [26]:
hist_df = Utils.get_historical_offline_feature_values(FG_NAME)
hist_df.sort_values(by=['id','zipcode']).head(100)

Running query:
 SELECT * FROM "customers-tmp-1622761485" 


Unnamed: 0,id,updatetime,zipcode,persona,city,churn,write_time,api_invocation_time,is_deleted
24,1,2020-02-03T00:00:00Z,,,,,2021-06-03 23:37:16.215,2021-06-03 23:32:19.000,True
10,C1,2020-02-01T00:00:00Z,11111.0,1.0,Boston,1.0,2021-06-03 23:11:05.458,2021-06-03 23:05:07.000,False
20,C1,2020-02-02T00:00:00Z,22222.0,1.0,Boston,1.0,2021-06-03 23:11:05.464,2021-06-03 23:05:18.000,False
21,C1,2020-02-02T00:00:00Z,22222.0,1.0,Boston,1.0,2021-06-03 23:11:05.464,2021-06-03 23:07:36.000,False
6,C1,2020-02-03T00:00:00Z,33333.0,1.0,Boston,1.0,2021-06-03 23:11:05.472,2021-06-03 23:05:19.000,False
7,C1,2020-02-03T00:00:00Z,33333.0,1.0,Boston,1.0,2021-06-03 23:11:05.472,2021-06-03 23:07:37.000,False
25,C1,2020-02-03T00:00:00Z,,,,,2021-06-03 23:37:16.215,2021-06-03 23:32:30.000,True
23,C2,2020-02-01T00:00:00Z,11111.0,2.0,Boston,1.0,2021-06-03 23:11:36.330,2021-06-03 23:05:07.000,False
12,C2,2020-02-02T00:00:00Z,22222.0,2.0,Boston,1.0,2021-06-03 23:11:36.327,2021-06-03 23:05:18.000,False
13,C2,2020-02-02T00:00:00Z,22222.0,2.0,Boston,1.0,2021-06-03 23:11:36.327,2021-06-03 23:07:36.000,False
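
The append-only behavior can be modeled with a tiny in-memory log in which a delete is just another appended record (a tombstone). This is an illustrative sketch, not how Feature Store is implemented: the class `AppendOnlyStore` is hypothetical, and append order stands in for the API invocation and write times that break ties in the real queries.

```python
class AppendOnlyStore:
    """Toy append-only log of feature records with tombstone deletes."""

    def __init__(self):
        self._log = []  # records are only ever appended, never modified

    def put(self, record_id, event_time, features):
        self._log.append({"id": record_id, "event_time": event_time,
                          "features": dict(features), "is_deleted": False})

    def delete(self, record_id, event_time):
        # A delete appends a tombstone that carries no feature values
        self._log.append({"id": record_id, "event_time": event_time,
                          "features": None, "is_deleted": True})

    def history(self, record_id):
        return [rec for rec in self._log if rec["id"] == record_id]

    def latest(self, record_id):
        rows = self.history(record_id)
        if not rows:
            return None
        # Latest event time wins; for equal event times the later append wins
        _, top = max(enumerate(rows), key=lambda pair: (pair[1]["event_time"], pair[0]))
        return None if top["is_deleted"] else top["features"]


store = AppendOnlyStore()
store.put("C1", "2020-02-01T00:00:00Z", {"ZipCode": 11111})
store.put("C1", "2020-02-03T00:00:00Z", {"ZipCode": 33333})
before = store.latest("C1")
store.delete("C1", "2020-02-03T00:00:00Z")
after = store.latest("C1")
print(before, after, len(store.history("C1")))
```

The history grows by one record on delete, while the latest-value lookup stops returning the ID.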


#### Now see what would be retrieved for a training dataset with only the latest values for each id
Notice that the deleted record is not returned.

In [27]:
Utils.get_latest_offline_feature_values(FG_NAME, feature_names=['Id','ZipCode','Persona','Churn'])

Running query:
 SELECT Id,ZipCode,Persona,Churn FROM (SELECT *, dense_rank() OVER (PARTITION BY Id ORDER BY UpdateTime DESC, Api_Invocation_Time DESC, write_time DESC) AS rank FROM "customers-tmp-1622761485" ) WHERE rank = 1 AND NOT is_deleted


Unnamed: 0,Id,ZipCode,Persona,Churn
0,C2,33333,2,1
1,C5,33333,1,0
2,C6,33333,2,0
3,C3,33333,1,1
4,C4,33333,2,0


#### Show that we can get the latest offline features for a specific set of IDs

In [28]:
Utils.get_latest_offline_feature_values(FG_NAME, record_ids=['C3'], feature_names=['Id','ZipCode','Persona','Churn'])

Running query:
 SELECT Id,ZipCode,Persona,Churn FROM (SELECT *, dense_rank() OVER (PARTITION BY Id ORDER BY UpdateTime DESC, Api_Invocation_Time DESC, write_time DESC) AS rank FROM "customers-tmp-1622761485"  WHERE  Id IN ('C3')) WHERE rank = 1 AND NOT is_deleted


Unnamed: 0,Id,ZipCode,Persona,Churn
0,C3,33333,1,1
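
The logged queries all share one shape: an optional column list, an optional `WHERE ... IN (...)` filter on the record identifier, and a `dense_rank()` window to pick the latest record. A query string of that shape could be assembled as follows. This is a sketch under assumptions: `build_latest_query` is a hypothetical function, not the utility module's actual code, and the quote doubling is a basic precaution rather than a complete defense against SQL injection.

```python
def build_latest_query(table, id_name, time_name, record_ids=None, feature_names=None):
    """Build a latest-record-per-id query shaped like the ones logged above."""
    columns = ",".join(feature_names) if feature_names else "*"
    where = ""
    if record_ids:
        # Double any single quote so an id cannot terminate the SQL literal
        quoted = ",".join("'" + rid.replace("'", "''") + "'" for rid in record_ids)
        where = f" WHERE {id_name} IN ({quoted})"
    inner = (
        f"SELECT *, dense_rank() OVER (PARTITION BY {id_name} "
        f"ORDER BY {time_name} DESC, Api_Invocation_Time DESC, write_time DESC) AS rank "
        f'FROM "{table}"{where}'
    )
    return f"SELECT {columns} FROM ({inner}) WHERE rank = 1 AND NOT is_deleted"


query = build_latest_query(
    "customers-tmp-1622761485", "Id", "UpdateTime",
    record_ids=["C3"], feature_names=["Id", "ZipCode", "Persona", "Churn"],
)
print(query)
```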


## Clean up
Delete the feature group and its offline storage.

In [29]:
#Utils.delete_feature_group(FG_NAME)