# Feature store for feature management and online serving

## What is a feature store?

The features generated are great examples of features that we can store in Vertex AI Feature Store. This is because:

* The features are needed for real-time prediction
* Feature values in a feature store can be used for both training and serving
* If needed, features can be shared with other use cases beyond this one

[Vertex AI Feature Store](https://cloud.google.com/vertex-ai/docs/featurestore) provides a centralized repository for organizing, storing, and serving ML features. Using a central featurestore enables an organization to efficiently share, discover, and re-use ML features at scale, which can increase the velocity of developing and deploying new ML applications.

## Load env config

In [1]:
# naming convention for all cloud resources
VERSION        = "v1"                  # TODO
PREFIX         = f'ndr-{VERSION}'      # TODO

print(f"PREFIX = {PREFIX}")

PREFIX = ndr-v1


In [2]:
# staging GCS
GCP_PROJECTS             = !gcloud config get-value project
PROJECT_ID               = GCP_PROJECTS[0]

# GCS bucket and paths
BUCKET_NAME              = f'{PREFIX}-{PROJECT_ID}-bucket'
BUCKET_URI               = f'gs://{BUCKET_NAME}'

config = !gsutil cat {BUCKET_URI}/config/notebook_env.py
print(config.n)
exec(config.n)


PROJECT_ID               = "hybrid-vertex"
PROJECT_NUM              = "934903580331"
LOCATION                 = "us-central1"

REGION                   = "us-central1"
BQ_LOCATION              = "US"
VPC_NETWORK_NAME         = "ucaip-haystack-vpc-network"

VERTEX_SA                = "934903580331-compute@developer.gserviceaccount.com"

PREFIX                   = "ndr-v1"
VERSION                  = "v1"

APP                      = "sp"
MODEL_TYPE               = "2tower"
FRAMEWORK                = "tfrs"
DATA_VERSION             = "v1"
TRACK_HISTORY            = "5"

BUCKET_NAME              = "ndr-v1-hybrid-vertex-bucket"
BUCKET_URI               = "gs://ndr-v1-hybrid-vertex-bucket"
SOURCE_BUCKET            = "spotify-million-playlist-dataset"

DATA_GCS_PREFIX          = "data"
DATA_PATH                = "gs://ndr-v1-hybrid-vertex-bucket/data"
VOCAB_SUBDIR             = "vocabs"
VOCAB_FILENAME           = "vocab_dict.pkl"

CANDIDATE_PREFIX         = "candidates"
TRAIN_DIR_PREFIX      

## Imports

In [3]:
import os

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2' 

In [4]:
import json
import time
import logging
import pandas as pd
import pickle as pkl
from pprint import pprint
from typing import List, Union
from datetime import datetime, timedelta

logging.disable(logging.WARNING)

# tensorflow
import tensorflow as tf
import tensorflow_recommenders as tfrs

# google cloud SDKs
from google.cloud import storage
from google.cloud import bigquery
from google.cloud import aiplatform as vertex_ai
from google.cloud.aiplatform import EntityType, Feature, Featurestore

# this repo
from util import feature_set_utils as feature_utils

In [5]:
vertex_ai.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)

storage_client = storage.Client(project=PROJECT_ID)

bq_client = bigquery.Client(project=PROJECT_ID, location=BQ_LOCATION)

## Helper Function

In [6]:
def run_bq_query(sql: str, show=False) -> Union[pd.DataFrame, None]:
    """
    Run a BigQuery query and optionally return the result as a DataFrame
    Args:
        sql: SQL query, as a string, to execute in BigQuery
        show: If True, return the query result as a Pandas DataFrame
    Returns:
        df: DataFrame of results if show is True, otherwise None.
            Query errors are raised as exceptions.
    """

    # Uses the module-level bq_client created above

    # Dry run first so that invalid SQL fails before anything is executed
    job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    bq_client.query(sql, job_config=job_config)

    # If dry run succeeds without errors, proceed to run query
    job_config = bigquery.QueryJobConfig()
    client_result = bq_client.query(sql, job_config=job_config)

    job_id = client_result.job_id

    # Wait for the job to finish, then optionally return a DataFrame
    result = client_result.result()
    print(f"Finished job_id: {job_id}")

    if show:
        df = result.to_arrow().to_pandas()
        return df

## Create Candidate Track Feature table in BigQuery

For batch ingestions, Vertex AI Feature Store requires user-provided timestamps for the ingested feature values. You can specify a particular timestamp for each value or specify the same timestamp for all values:

* If the timestamps for feature values are different, specify the timestamps in a column in your source data. Each row must have its own timestamp indicating when the feature value was generated. In your ingestion request, you specify the column name to identify the timestamp column.
* If the timestamp for all feature values is the same, you can specify it as a parameter in your ingestion request. You can also specify the timestamp in a column in your source data, where each row has the same timestamp.

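The two timestamp options above can be illustrated with a small pandas sketch. It is only an illustration: the helper name `with_feature_timestamp` and the sample track URIs are placeholders, not part of this repo. The notebook itself uses the second option, adding a shared `feature_ts` column in BigQuery and later passing that column name as `feature_time`.

```python
from datetime import datetime, timezone
from typing import Optional

import pandas as pd


def with_feature_timestamp(
    df: pd.DataFrame,
    column: str = "feature_ts",
    timestamp: Optional[datetime] = None,
) -> pd.DataFrame:
    """Return a copy of df with one shared, timezone-aware timestamp column."""
    ts = timestamp or datetime.now(timezone.utc)
    out = df.copy()
    out[column] = pd.Timestamp(ts)  # the scalar is broadcast to every row
    return out


# Option 1: each row carries its own generation time in a source column.
# (The track URIs here are made-up placeholders.)
per_row = pd.DataFrame(
    {
        "track_uri_can": ["spotify:track:a", "spotify:track:b"],
        "track_pop_can": [10.0, 42.0],
        "feature_ts": pd.to_datetime(
            ["2023-09-19T08:00:00Z", "2023-09-20T14:40:37Z"], utc=True
        ),
    }
)

# Option 2: every row shares one timestamp, similar in spirit to adding
# CURRENT_TIMESTAMP() as a column in the source query.
shared = with_feature_timestamp(
    per_row.drop(columns="feature_ts"),
    timestamp=datetime(2023, 9, 20, 14, 40, 37, tzinfo=timezone.utc),
)

print(per_row["feature_ts"].nunique())  # number of distinct per-row timestamps
print(shared["feature_ts"].nunique())   # number of distinct shared timestamps
```
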
In [7]:
candidate_files = []

# Collect the gs:// URI of every candidate TFRecord file under the data prefix
for blob in storage_client.list_blobs(BUCKET_NAME, prefix=f'data/{DATA_VERSION}/{CANDIDATE_PREFIX}'):
    candidate_files.append(f"gs://{blob.bucket.name}/{blob.name}")

candidate_dataset = tf.data.TFRecordDataset(candidate_files)

parsed_candidate_dataset = candidate_dataset.map(feature_utils.parse_candidate_tfrecord_fn)

# for x in parsed_candidate_dataset.batch(1).take(1):
#     pprint(x)

In [8]:
track_feature_dict = feature_utils.get_candidate_features()
track_feature_dict

{'track_uri_can': FixedLenFeature(shape=(), dtype=tf.string, default_value=None),
 'track_name_can': FixedLenFeature(shape=(), dtype=tf.string, default_value=None),
 'artist_uri_can': FixedLenFeature(shape=(), dtype=tf.string, default_value=None),
 'artist_name_can': FixedLenFeature(shape=(), dtype=tf.string, default_value=None),
 'album_uri_can': FixedLenFeature(shape=(), dtype=tf.string, default_value=None),
 'album_name_can': FixedLenFeature(shape=(), dtype=tf.string, default_value=None),
 'duration_ms_can': FixedLenFeature(shape=(), dtype=tf.float32, default_value=None),
 'track_pop_can': FixedLenFeature(shape=(), dtype=tf.float32, default_value=None),
 'artist_pop_can': FixedLenFeature(shape=(), dtype=tf.float32, default_value=None),
 'artist_genres_can': FixedLenFeature(shape=(), dtype=tf.string, default_value=None),
 'artist_followers_can': FixedLenFeature(shape=(), dtype=tf.float32, default_value=None),
 'track_danceability_can': FixedLenFeature(shape=(), dtype=tf.float32, defa

In [9]:
track_feature_names = list(track_feature_dict.keys())
track_feature_names

['track_uri_can',
 'track_name_can',
 'artist_uri_can',
 'artist_name_can',
 'album_uri_can',
 'album_name_can',
 'duration_ms_can',
 'track_pop_can',
 'artist_pop_can',
 'artist_genres_can',
 'artist_followers_can',
 'track_danceability_can',
 'track_energy_can',
 'track_key_can',
 'track_loudness_can',
 'track_mode_can',
 'track_speechiness_can',
 'track_acousticness_can',
 'track_instrumentalness_can',
 'track_liveness_can',
 'track_valence_can',
 'track_tempo_can',
 'track_time_signature_can']

## Create query to create batch features (source BQ table)

### tracks features

In [10]:
# name of timestamp column
TRACK_FEATURE_TIMESTAMP = "feature_ts"

TRACKS_SRC_TABLE_NAME = "candidates"
TRACKS_FS_TABLE_NAME  = "candidate_tracks_fs"

TRACKS_SRC_BQ_TABLE_URI = f"{PROJECT_ID}.{BQ_DATASET}.{TRACKS_SRC_TABLE_NAME}"
TRACKS_FS_BQ_TABLE_URI = f"{PROJECT_ID}.{BQ_DATASET}.{TRACKS_FS_TABLE_NAME}"

print(f"TRACKS_SRC_BQ_TABLE_URI : {TRACKS_SRC_BQ_TABLE_URI}")
print(f"TRACKS_FS_BQ_TABLE_URI  : {TRACKS_FS_BQ_TABLE_URI}")

TRACKS_SRC_BQ_TABLE_URI : hybrid-vertex.spotify_e2e_test.candidates
TRACKS_FS_BQ_TABLE_URI  : hybrid-vertex.spotify_e2e_test.candidate_tracks_fs


In [11]:
query = f"""
CREATE OR REPLACE TABLE
  `{TRACKS_FS_BQ_TABLE_URI}` AS (
  SELECT
    * EXCEPT(time_signature_can,
      track_mode_can,
      track_key_can),
    CAST(track_mode_can AS STRING) AS track_mode_can,
    CAST(time_signature_can AS STRING) AS track_time_signature_can,
    CAST(track_key_can AS STRING) AS track_key_can,
    CURRENT_TIMESTAMP() AS {TRACK_FEATURE_TIMESTAMP}
  FROM
    `{TRACKS_SRC_BQ_TABLE_URI}` 
  )
"""
print(query)


CREATE OR REPLACE TABLE
  `hybrid-vertex.spotify_e2e_test.candidate_tracks_fs` AS (
  SELECT
    * EXCEPT(time_signature_can,
      track_mode_can,
      track_key_can),
    CAST(track_mode_can AS STRING) AS track_mode_can,
    CAST(time_signature_can AS STRING) AS track_time_signature_can,
    CAST(track_key_can AS STRING) AS track_key_can,
    CURRENT_TIMESTAMP() AS feature_ts
  FROM
    `hybrid-vertex.spotify_e2e_test.candidates` 
  )



In [32]:
run_bq_query(query)

Finished job_id: 6a5f7d83-54e1-4c6d-b1e9-7b86e2cd6101


In [12]:
run_bq_query(
    f"SELECT * FROM `{TRACKS_FS_BQ_TABLE_URI}` LIMIT 5",
    show=True
)

Finished job_id: 6d749bd9-b98f-4d66-a318-03d5d521a2fa


| | track_uri_can | track_name_can | artist_uri_can | artist_name_can | album_uri_can | album_name_can | duration_ms_can | track_pop_can | artist_pop_can | artist_genres_can | ... | track_speechiness_can | track_acousticness_can | track_instrumentalness_can | track_liveness_can | track_valence_can | track_tempo_can | track_mode_can | track_time_signature_can | track_key_can | feature_ts |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | spotify:track:4kUv2uXqqBK061vtafIt8M | Symphony No. 4 In A Major, Op. 90 "Italian": I... | spotify:artist:6OOG8MBBe3FzBl8HSHgPx6 | Massimo Freccia; Orchestra Of The Accademia Di... | spotify:album:40DIhWY7nKJBhVOrbY4VI2 | Mendelssohn: Symphony No. 4 in A Minor "Italia... | 346567.0 | 0.0 | 0.0 | NONE | ... | 0.0392 | 0.862 | 0.838 | 0.162 | 0.318 | 91.022 | 0 | 4 | 9 | 2023-09-20 14:40:37.489429+00:00 |
| 1 | spotify:track:5h2paE6F756caDP6GFV3aQ | Beacause Your Mine | spotify:artist:0GX3m0YvjwhghoTmBaVEUL | Beacause Your Mine | spotify:album:0V5FlGaljcUlM2ooNNfmhT | Old Gold - Rat Pack 08 | 144066.0 | 0.0 | 0.0 | NONE | ... | 0.0332 | 0.851 | 0.0326 | 0.945 | 0.295 | 84.324 | 1 | 5 | 7 | 2023-09-20 14:40:37.489429+00:00 |
| 2 | spotify:track:05Rwhz1NXIdV6O1DmlBXZj | The Treasure in My Heart | spotify:artist:1r2JGDFPIUqJV6JkPGQXvB | Joey and The Ambers | spotify:album:3FayisCn852AXyZ2CRzTfO | Golden Doo Wop, Vol. 6 | 152476.0 | 0.0 | 0.0 | NONE | ... | 0.0281 | 0.322 | 0.0 | 0.246 | 0.638 | 102.005 | 1 | 3 | 4 | 2023-09-20 14:40:37.489429+00:00 |
| 3 | spotify:track:78YVECtBUj8vZgmwHpDZr2 | I Touch Myself (Soda Club Mix) | spotify:artist:3ZKmo0Vpg5g97vAUpP4jNq | Klubbkatz,Sherrie A | spotify:album:491EGqseR2L7QGjPVJ8O0P | Pure Trance 9 (15 Hi-energy Trance Tracks) | 321813.0 | 1.0 | 0.0 | NONE | ... | 0.0379 | 2.5e-05 | 0.188 | 0.0589 | 0.369 | 130.016 | 1 | 4 | 1 | 2023-09-20 14:40:37.489429+00:00 |
| 4 | spotify:track:0jaXigDmpR29Yl5JGjl0Mv | Che Guevara | spotify:artist:4dO2v7PgABfjLPnZukztfd | Çeşitli Sanatçılar | spotify:album:0KYvqxGdGrpsnvHO9iqhdi | Dünya Devrim Şarkıları | 181880.0 | 21.0 | 0.0 | NONE | ... | 0.0288 | 0.963 | 0.917 | 0.107 | 0.226 | 89.956 | 0 | 4 | 0 | 2023-09-20 14:40:37.489429+00:00 |


### playlist features

In [13]:
# # name of timestamp column
# PLAYLIST_FEATURE_TIMESTAMP = "feature_ts"

# # test with smaller dataset (val); TODO - ingest train dataset
# PLAYLIST_SRC_TABLE_NAME = "v2_train_flatten_valid_last_5" # v2_train_flatten_last_5
# PLAYLIST_FS_TABLE_NAME  = "playlist_features_fs"

# PLAYLIST_SRC_BQ_TABLE_URI = f"{PROJECT_ID}.{BQ_DATASET}.{PLAYLIST_SRC_TABLE_NAME}"
# PLAYLIST_FS_BQ_TABLE_URI = f"{PROJECT_ID}.{BQ_DATASET}.{PLAYLIST_FS_TABLE_NAME}"

# print(f"PLAYLIST_SRC_BQ_TABLE_URI : {PLAYLIST_SRC_BQ_TABLE_URI}")
# print(f"PLAYLIST_FS_BQ_TABLE_URI  : {PLAYLIST_FS_BQ_TABLE_URI}")

In [14]:
# query = f"""
# CREATE OR REPLACE TABLE
#   `{PLAYLIST_FS_BQ_TABLE_URI}` AS (
#   SELECT
#     * EXCEPT(time_signature_can,
#       track_mode_can,
#       track_key_can),
#     CAST(track_mode_can AS STRING) AS track_mode_can,
#     CAST(time_signature_can AS STRING) AS track_time_signature_can,
#     CAST(track_key_can AS STRING) AS track_key_can,
#     CURRENT_TIMESTAMP() AS {PLAYLIST_FEATURE_TIMESTAMP}
#   FROM
#     `{PLAYLIST_SRC_BQ_TABLE_URI}` 
#   )
# """
# print(query)

## Create Feature Store

> A featurestore is the top-level container for entity types, features, and feature values. Typically, an organization creates one shared featurestore for feature ingestion, serving, and sharing across all teams in the organization.

Below you create a `featurestore` resource for `candidate tracks`. This will hold track features (e.g., audio features, 

In [17]:
ONLINE_STORAGE_NODES = 1

DEV_VERSION = "v3"

FEATURESTORE_ID = f"{DEV_VERSION}_{APP}_mpd_tracks_{PREFIX}".replace("-", "_")

print(f"FEATURESTORE_ID: {FEATURESTORE_ID}")

FEATURESTORE_ID: v3_sp_mpd_tracks_ndr_v1

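The `.replace("-", "_")` above suggests that hyphens are not accepted in a featurestore ID. A slightly more defensive way to build the ID is sketched below. The exact rule encoded in `_ID_PATTERN` (lowercase letters, digits and underscores, not starting with a digit, at most 60 characters) is an assumption to check against the current Vertex AI documentation, and `make_featurestore_id` is a hypothetical helper, not part of this repo.

```python
import re

# Assumed ID rule: lowercase letters, digits and underscores only,
# not starting with a digit, at most 60 characters in total.
_ID_PATTERN = re.compile(r"^[a-z_][a-z0-9_]{0,59}$")


def make_featurestore_id(*parts: str) -> str:
    """Join parts with underscores and normalise them into a candidate ID."""
    raw = "_".join(parts).lower()
    # Replace every character outside the assumed allowed set with "_".
    candidate = re.sub(r"[^a-z0-9_]", "_", raw)
    if not _ID_PATTERN.match(candidate):
        raise ValueError(f"invalid featurestore ID: {candidate!r}")
    return candidate


print(make_featurestore_id("v3", "sp", "mpd_tracks", "ndr-v1"))
# prints: v3_sp_mpd_tracks_ndr_v1
```

Failing early on a malformed ID is cheaper than waiting for the create call to reject it.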

In [18]:
# Creating new feature store sp_mpd_candidate_tracks_ndr_v1. # tmp

In [19]:
try:
    # Checks if there is already a Featurestore
    ff_feature_store = vertex_ai.Featurestore(f"{FEATURESTORE_ID}")
    print(f"""The feature store {FEATURESTORE_ID} already exists.""")
except Exception:
    # Creates a Featurestore
    print(f"""Creating new feature store {FEATURESTORE_ID}.""")
    ff_feature_store = Featurestore.create(
        featurestore_id=f"{FEATURESTORE_ID}",
        online_store_fixed_node_count=ONLINE_STORAGE_NODES,
        labels={"prefix": f"{PREFIX}", "app": f"{APP}"},
        sync=True,
    )

Creating new feature store v3_sp_mpd_tracks_ndr_v1.


In [20]:
ff_feature_store.location

'us-central1'

In [21]:
# ff_feature_store.

## Create the main entity types and their features

> An entity type is a collection of semantically related features. You define your own entity types, based on the concepts that are relevant to your use case.

Let's create `track` and `playlist` entity types:

**track entity**

In [22]:
TRACK_ENTITY_ID = "track"
print(f"TRACK_ENTITY_ID: {TRACK_ENTITY_ID}")

TRACK_ENTITY_ID: track


In [23]:
try:
    # get entity type, if it already exists
    track_entity_type = ff_feature_store.get_entity_type(entity_type_id=TRACK_ENTITY_ID)
except Exception:
    # else, create entity type
    track_entity_type = ff_feature_store.create_entity_type(
        entity_type_id=TRACK_ENTITY_ID, description="Candidate Track Entity", sync=True
    )
    
TRACK_ENTITY_RESOURCE_NAME = track_entity_type.resource_name
print("Entity type name is", TRACK_ENTITY_RESOURCE_NAME)

Entity type name is projects/934903580331/locations/us-central1/featurestores/v3_sp_mpd_tracks_ndr_v1/entityTypes/track


In [24]:
track_entity_type.list_features()

[]

In [25]:
# ENTITY_ID_FIELD = "track_uri_can"

In [26]:
track_entity_type

<google.cloud.aiplatform.featurestore.entity_type.EntityType object at 0x7f47a0455490> 
resource name: projects/934903580331/locations/us-central1/featurestores/v3_sp_mpd_tracks_ndr_v1/entityTypes/track

**playlist entity**

In [27]:
# PLAYLIST_ENTITY_ID = "playlist"
# print(f"PLAYLIST_ENTITY_ID: {PLAYLIST_ENTITY_ID}")

In [28]:
# try:
#     # get entity type, if it already exists
#     pl_entity_type = ff_feature_store.get_entity_type(entity_type_id=PLAYLIST_ENTITY_ID)
# except:
#     # else, create entity type
#     pl_entity_type = ff_feature_store.create_entity_type(
#         entity_type_id=PLAYLIST_ENTITY_ID, description="Playlist Entity", sync=True
#     )

In [29]:
# pl_entity_type

## Create features for each entity type

* See the available Feature Store data types in the [API reference](https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.featurestores.entityTypes.features#valuetype)
* See the [Spotify Developer API](https://developer.spotify.com/documentation/web-api/reference/get-several-audio-features) docs for more details on the candidate track audio features

In a featurestore, each entity must have a unique ID, and that ID must be of type `STRING`.

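Before ingesting, it can help to check the ID column of the source data against that requirement. The sketch below does this on a pandas DataFrame. `check_entity_ids` is a hypothetical helper and the sample rows are placeholders. The uniqueness check assumes a single-snapshot table like the one built above, where every row shares one timestamp and a repeated ID would be ambiguous.

```python
import pandas as pd


def check_entity_ids(df: pd.DataFrame, id_column: str) -> None:
    """Raise ValueError if the ID column is missing, null, non-string or duplicated."""
    if id_column not in df.columns:
        raise ValueError(f"missing ID column: {id_column}")
    ids = df[id_column]
    if ids.isna().any():
        raise ValueError("entity IDs must not be null")
    if not ids.map(lambda value: isinstance(value, str)).all():
        raise ValueError("entity IDs must be strings")
    duplicated = ids[ids.duplicated()].unique().tolist()
    if duplicated:
        raise ValueError(f"duplicate entity IDs: {duplicated}")


# Placeholder rows; the real table uses Spotify track URIs as IDs.
sample = pd.DataFrame(
    {
        "track_uri_can": ["spotify:track:a", "spotify:track:b"],
        "track_pop_can": [10.0, 42.0],
    }
)
check_entity_ids(sample, "track_uri_can")  # passes silently
```
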
### create features for `track` entity

In [30]:
import src.features.feature_store_configs as fs_configs

TRACK_ENTITY_ID_FIELD = "track_uri_can"

In [32]:
track_feature_configs = fs_configs.TRACK_FEATURE_CONFIGS
# track_feature_configs

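The contents of `fs_configs.TRACK_FEATURE_CONFIGS` are not shown in this notebook. As a rough sketch, a config of the shape that `batch_create_features` expects (a dict keyed by feature ID, each entry holding a `value_type` and an optional `description`) could be derived from a schema like the one printed earlier. The dtype mapping and the helper `build_feature_configs` are assumptions for illustration, not the repo's actual implementation.

```python
from typing import Dict

# Assumed mapping from TFRecord dtype names to Feature Store value types.
DTYPE_TO_VALUE_TYPE = {"string": "STRING", "float32": "DOUBLE", "int64": "INT64"}


def build_feature_configs(
    schema: Dict[str, str], entity_id_field: str
) -> Dict[str, Dict[str, str]]:
    """Build a feature-config dict, skipping the entity ID column."""
    configs = {}
    for name, dtype in schema.items():
        if name == entity_id_field:
            continue  # the entity ID identifies the row; it is not a feature
        configs[name] = {
            "value_type": DTYPE_TO_VALUE_TYPE[dtype],
            "description": f"{name} ({dtype})",
        }
    return configs


# A three-column excerpt of the candidate track schema.
schema_excerpt = {
    "track_uri_can": "string",
    "track_name_can": "string",
    "duration_ms_can": "float32",
}
example_configs = build_feature_configs(schema_excerpt, "track_uri_can")
print(example_configs)
```

Note that in this notebook `track_mode_can`, `track_key_can` and `track_time_signature_can` are cast to `STRING` in the BigQuery table, so their configs would need the `STRING` value type regardless of their TFRecord dtype.
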
### create features for `playlist` entity

**TODO**

In [33]:
# PLAYLIST_ENTITY_ID_FIELD = "pid" # TODO
playlist_feature_configs = fs_configs.PLAYLIST_FEATURE_CONFIGS

## Ingest feature values in Vertex AI Feature Store

In [34]:
# TIMESTAMP = datetime.now().strftime("%Y%m%d%H%M%S")
# print(f"TIMESTAMP            : {TIMESTAMP}")

### Ingest `track` features

In [43]:
# track_entity_type.list_features()

In [47]:
# track_feature_ids = track_entity_type.batch_create_features(
#     feature_configs=track_feature_configs, sync=True
# )
pprint(f"track_feature_ids: {track_feature_ids}")

# track_feature_ids.list_features()

('track_feature_ids: '
 '<google.cloud.aiplatform.featurestore.entity_type.EntityType object at '
 '0x7f47a0455490> \n'
 'resource name: '
 'projects/934903580331/locations/us-central1/featurestores/v3_sp_mpd_tracks_ndr_v1/entityTypes/track')


In [49]:
TRACKS_FEATURES_IDS = [
    feature.name for feature in track_entity_type.list_features()
]
pprint(f"TRACKS_FEATURES_IDS: {TRACKS_FEATURES_IDS}")

# track_entity_type.list_features()

("TRACKS_FEATURES_IDS: ['artist_genres_can', 'track_name_can', "
 "'track_acousticness_can', 'track_key_can', 'album_name_can', "
 "'track_tempo_can', 'track_danceability_can', 'artist_name_can', "
 "'track_liveness_can', 'track_mode_can', 'track_pop_can', "
 "'artist_followers_can', 'artist_uri_can', 'track_instrumentalness_can', "
 "'track_loudness_can', 'duration_ms_can', 'track_time_signature_can', "
 "'track_energy_can', 'album_uri_can', 'track_valence_can', "
 "'track_speechiness_can', 'artist_pop_can']")


In [50]:
# TRACKS_FEATURES_IDS

'track_uri_can'

In [41]:
TRACKS_FS_BQ_URI = f"bq://{TRACKS_FS_BQ_TABLE_URI}"
print(f"TRACKS_BQ_SOURCE_URI : {TRACKS_FS_BQ_URI}")

TRACKS_BQ_SOURCE_URI : bq://hybrid-vertex.spotify_e2e_test.candidate_tracks_fs


In [51]:
start_time = time.time()

track_entity_type.ingest_from_bq(
    feature_ids=TRACKS_FEATURES_IDS,
    feature_time=TRACK_FEATURE_TIMESTAMP,
    bq_source_uri=TRACKS_FS_BQ_URI,
    entity_id_field=TRACK_ENTITY_ID_FIELD,
    disable_online_serving=False,
    worker_count=10,
    sync=True,
)

elapsed_ingest_mins = int((time.time() - start_time) / 60)
print(f"elapsed_ingest_mins: {elapsed_ingest_mins}")

elapsed_ingest_mins: 6


In [52]:
track_entity_type.resource_name

'projects/934903580331/locations/us-central1/featurestores/v3_sp_mpd_tracks_ndr_v1/entityTypes/track'

### Ingest `playlist` features

**TODO**

In [53]:
# pl_feature_ids = track_entity_type.batch_create_features(
#     feature_configs=playlist_feature_configs, sync=True
# )

# PL_FEATURES_IDS = [
#     feature.name for feature in pl_feature_ids.list_features()
# ]
# print(f"PL_FEATURES_IDS: {PL_FEATURES_IDS}")

In [54]:
# PLAYLIST_FS_BQ_URI = f"bq://{TRACKS_FS_BQ_TABLE_URI}"
# print(f"PLAYLIST_FS_BQ_URI : {PLAYLIST_FS_BQ_URI}")

In [55]:
# start_time = time.time()

# track_entity_type.ingest_from_bq(
#     feature_ids=TRACKS_FEATURES_IDS,
#     feature_time=TRACK_FEATURE_TIMESTAMP,
#     bq_source_uri=TRACKS_FS_BQ_URI,
#     entity_id_field=TRACK_ENTITY_ID_FIELD,
#     disable_online_serving=False,
#     worker_count=10,
#     sync=True,
# )

# elapsed_ingest_mins = int((time.time() - start_time) / 60)
# print(f"elapsed_ingest_mins: {elapsed_ingest_mins}")

**Monitor ingestion job in the console.**

> The ingestion job you just created is a long-running operation that can take several minutes to complete (the call above blocks until it finishes because `sync=True`). You can monitor it in the [console](https://console.cloud.google.com/vertex-ai/ingestion-jobs).

In [56]:
# track_entity_type.to_dict()

In [58]:
# feature_names = track_entity_type.list_features()
# list_of_names = []
# for ele in feature_names:
#     list_of_names.append(ele)
    
# list_of_names

In [59]:
# list_of_names[0]

### Search for feature values

> Run a search query on your feature store to validate that some data was ingested as expected.

#### search `track` features

In [60]:
# return all feature fields for an Entity
track_aggregated_features = track_entity_type.read(
    entity_ids=[
        "spotify:track:44FKqeyePqfAcWfJKJkpGy", 
        "spotify:track:2JozsL1ayPPjrZsOlQwHuk", 
        "spotify:track:5VGz4dSlyNwcPgokpwHKtr",
    ],
    feature_ids=TRACKS_FEATURES_IDS,
)

track_aggregated_features

| | entity_id | artist_genres_can | track_name_can | track_acousticness_can | track_key_can | album_name_can | track_tempo_can | track_danceability_can | artist_name_can | track_liveness_can | ... | artist_uri_can | track_instrumentalness_can | track_loudness_can | duration_ms_can | track_time_signature_can | track_energy_can | album_uri_can | track_valence_can | track_speechiness_can | artist_pop_can |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | spotify:track:2JozsL1ayPPjrZsOlQwHuk | 'blues rock', 'jam band' | Tweezer Reprise | 0.594 | 7 | LivePhish 04/03/98 | 96.06 | 0.276 | Phish | 0.829 | ... | spotify:artist:5wbIWUzTPuTxTyG6ouQKqz | 0.167 | -5.03 | 204826.0 | 4 | 0.944 | spotify:album:251YMVId8YBkTapKyYgExP | 0.585 | 0.0502 | 57.0 |
| 1 | spotify:track:44FKqeyePqfAcWfJKJkpGy | 'classic rock', 'cosmic american', 'country ro... | That's It For The Other One [Live in San Franc... | 0.484 | 2 | So Many Roads [1965-1995] | 106.184 | 0.369 | Grateful Dead | 0.367 | ... | spotify:artist:4TMHGUX5WI7OOm53PqSDAT | 0.317 | -13.969 | 1253226.0 | 4 | 0.449 | spotify:album:7mPptCcvPGwhCBGtqNvjk5 | 0.627 | 0.0405 | 68.0 |
| 2 | spotify:track:5VGz4dSlyNwcPgokpwHKtr | 'athens indie', 'jam band', 'roots rock', 'sou... | Junior | 0.000859 | 2 | Ain't Life Grand | 184.642 | 0.179 | Widespread Panic | 0.164 | ... | spotify:artist:54SHZF2YS3W87xuJKSvOVf | 0.00165 | -7.973 | 273640.0 | 4 | 0.796 | spotify:album:1dVzzHkYH4vs1qwtJOk0rU | 0.883 | 0.0373 | 52.0 |


In [61]:
track_aggregated_features.keys()

Index(['entity_id', 'artist_genres_can', 'track_name_can',
       'track_acousticness_can', 'track_key_can', 'album_name_can',
       'track_tempo_can', 'track_danceability_can', 'artist_name_can',
       'track_liveness_can', 'track_mode_can', 'track_pop_can',
       'artist_followers_can', 'artist_uri_can', 'track_instrumentalness_can',
       'track_loudness_can', 'duration_ms_can', 'track_time_signature_can',
       'track_energy_can', 'album_uri_can', 'track_valence_can',
       'track_speechiness_can', 'artist_pop_can'],
      dtype='object')

In [62]:
track_aggregated_features.columns

Index(['entity_id', 'artist_genres_can', 'track_name_can',
       'track_acousticness_can', 'track_key_can', 'album_name_can',
       'track_tempo_can', 'track_danceability_can', 'artist_name_can',
       'track_liveness_can', 'track_mode_can', 'track_pop_can',
       'artist_followers_can', 'artist_uri_can', 'track_instrumentalness_can',
       'track_loudness_can', 'duration_ms_can', 'track_time_signature_can',
       'track_energy_can', 'album_uri_can', 'track_valence_can',
       'track_speechiness_can', 'artist_pop_can'],
      dtype='object')

In [63]:
# return a subset of feature fields for an Entity
read_track_feats_test = track_entity_type.read(
    entity_ids=[
        "spotify:track:44FKqeyePqfAcWfJKJkpGy", 
        "spotify:track:2JozsL1ayPPjrZsOlQwHuk", 
        "spotify:track:5VGz4dSlyNwcPgokpwHKtr",
    ],
    feature_ids=["track_name_can", "artist_name_can", "artist_genres_can"],
)
# display the dataframe
read_track_feats_test.head()

| | entity_id | track_name_can | artist_name_can | artist_genres_can |
|---|---|---|---|---|
| 0 | spotify:track:2JozsL1ayPPjrZsOlQwHuk | Tweezer Reprise | Phish | 'blues rock', 'jam band' |
| 1 | spotify:track:44FKqeyePqfAcWfJKJkpGy | That's It For The Other One [Live in San Franc... | Grateful Dead | 'classic rock', 'cosmic american', 'country ro... |
| 2 | spotify:track:5VGz4dSlyNwcPgokpwHKtr | Junior | Widespread Panic | 'athens indie', 'jam band', 'roots rock', 'sou... |

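Eyeballing the returned DataFrame works for three tracks. For a larger spot check, the same validation can be scripted. The sketch below assumes only what the output above shows, namely that `read` returns a DataFrame with an `entity_id` column plus one column per requested feature. `validate_read` is a hypothetical helper, and the sample frame reuses values from the output above.

```python
from typing import Dict, List

import pandas as pd


def validate_read(
    df: pd.DataFrame, requested_ids: List[str], feature_ids: List[str]
) -> Dict[str, object]:
    """Summarise missing entities, missing feature columns and null counts."""
    returned_ids = set(df["entity_id"])
    present = [f for f in feature_ids if f in df.columns]
    null_counts = df[present].isna().sum()
    return {
        "missing_entities": sorted(set(requested_ids) - returned_ids),
        "missing_columns": [f for f in feature_ids if f not in df.columns],
        "null_counts": {k: int(v) for k, v in null_counts.items() if v > 0},
    }


# Sample frame built from the values shown in the output above.
sample_read = pd.DataFrame(
    {
        "entity_id": [
            "spotify:track:2JozsL1ayPPjrZsOlQwHuk",
            "spotify:track:44FKqeyePqfAcWfJKJkpGy",
            "spotify:track:5VGz4dSlyNwcPgokpwHKtr",
        ],
        "track_name_can": ["Tweezer Reprise", None, "Junior"],
        "artist_name_can": ["Phish", "Grateful Dead", "Widespread Panic"],
    }
)

report = validate_read(
    sample_read,
    requested_ids=list(sample_read["entity_id"]) + ["spotify:track:placeholder"],
    feature_ids=["track_name_can", "artist_name_can", "artist_genres_can"],
)
print(report)
```

One track name is deliberately set to `None` and one placeholder ID is added to the request, so the report shows what a null value and a missing entity look like.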

#### search `playlist` features

In [44]:
# pl_aggregated_features = pl_entity_type.read(
#     entity_ids=[
#         "XXXXX", 
#         "XXXXX", 
#         "XXXXX"
#     ],
#     feature_ids=PL_FEATURES_IDS,
# )

In [45]:
# pl_aggregated_features

In [None]:
# pl_aggregated_features.columns

## Debugging

In [46]:
ff_feature_store.list_entity_types()

[<google.cloud.aiplatform.featurestore.entity_type.EntityType object at 0x7fd960410610> 
 resource name: projects/934903580331/locations/us-central1/featurestores/v2_sp_mpd_candidate_tracks_ndr_v1/entityTypes/tracks]

In [48]:
# ff_feature_store. 

## Clean up

> Run the following command to delete the featurestore. With `force=True`, its entity types and features are deleted as well.

In [None]:
# ff_feature_store.delete(sync=True, force=True)

**Finished**