- Last updated on: 7/10/2024
- Required snowflake-ml-python version: >=1.5.4

# Feature Store API Overview

This notebook provides an overview of the Feature Store APIs. It demonstrates how to manage a Feature Store, feature views, and entities, and how to generate training datasets. The goal is a quick walkthrough of the most common APIs. For the full list of APIs, refer to the [API Reference page](https://docs.snowflake.com/en/developer-guide/snowpark-ml/reference/latest/feature_store).

**Table of contents**:
- [Setup test environment](#setup-test-environment)
- [Manage features in Feature Store](#manage-features-in-feature-store)
  - [Initialize a Feature Store](#initialize-a-feature-store)
  - [Create entities](#create-entities)
  - [Create feature views](#create-feature-views)
  - [Add feature view versions](#add-feature-view-versions)
  - [Update feature views](#update-feature-views)
  - [Operate feature views](#operate-feature-views)
  - [Read values from a feature view](#read-values-from-a-feature-view)
  - [Generate training data](#generate-training-data)
  - [Delete feature views](#delete-feature-views)
  - [Delete entities](#delete-entities)
  - [Cleanup Feature Store](#cleanup-feature-store)
- [Cleanup notebook](#cleanup-notebook)

<a id='setup-test-environment'></a>
## Setup test environment

Let's start by setting up the test environment. We will create a session and a schema. The schema `FS_DEMO_SCHEMA` will be used as the Feature Store and will be cleaned up at the end of the demo. Fill in `connection_parameters` with your Snowflake connection information. See this **[guide](https://docs.snowflake.com/en/developer-guide/snowpark/python/creating-session)** for more details on connecting to Snowflake.

In [2]:
from snowflake.snowpark import Session

connection_parameters = {
    "account": "<your snowflake account>",
    "user": "<your snowflake user>",
    "password": "<your snowflake password>",
    "role": "<your snowflake role>",  # optional
    "warehouse": "<your snowflake warehouse>",  # optional
    "database": "<your snowflake database>",  # optional
    "schema": "<your snowflake schema>",  # optional
}

session = Session.builder.configs(connection_parameters).create()

# Uncomment the lines below to set the database/warehouse if not already specified in connection_parameters
# session.use_database(<your_database>)
# session.use_warehouse(<your_warehouse>)
assert session.get_current_database() is not None, "Session must have a database for the demo."
assert session.get_current_warehouse() is not None, "Session must have a warehouse for the demo."
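
Hard-coding credentials in a notebook cell makes them easy to leak. As a minimal sketch, assuming the connection details live in environment variables, the dictionary could be built as shown below. The variable names (`SNOWFLAKE_ACCOUNT` and so on) and the helper `connection_parameters_from_env` are hypothetical; Snowpark does not define them.

```python
import os

# Hypothetical environment variable names; adjust to your own setup.
REQUIRED = {
    "account": "SNOWFLAKE_ACCOUNT",
    "user": "SNOWFLAKE_USER",
    "password": "SNOWFLAKE_PASSWORD",
}
OPTIONAL = {
    "role": "SNOWFLAKE_ROLE",
    "warehouse": "SNOWFLAKE_WAREHOUSE",
    "database": "SNOWFLAKE_DATABASE",
    "schema": "SNOWFLAKE_SCHEMA",
}


def connection_parameters_from_env(env=None):
    """Build a connection_parameters dict from environment variables."""
    env = os.environ if env is None else env
    missing = [var for var in REQUIRED.values() if not env.get(var)]
    if missing:
        raise KeyError(f"Missing required environment variables: {missing}")
    params = {key: env[var] for key, var in REQUIRED.items()}
    # Optional keys are included only when set, so session defaults still apply.
    params.update({key: env[var] for key, var in OPTIONAL.items() if env.get(var)})
    return params
```

The returned dictionary has the same shape as `connection_parameters` above and can be passed to `Session.builder.configs(...)` in the same way.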

In [3]:
# The schema where the Feature Store is initialized and the test datasets are stored.
FS_DEMO_SCHEMA = "SNOWFLAKE_FEATURE_STORE_NOTEBOOK_DEMO"

session.sql(f"CREATE OR REPLACE SCHEMA {FS_DEMO_SCHEMA}").collect()

[Row(status='Schema SNOWFLAKE_FEATURE_STORE_NOTEBOOK_DEMO successfully created.')]
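
The cell above interpolates `FS_DEMO_SCHEMA` directly into a SQL string. That is fine for a constant, but if the name ever comes from user input it is worth validating first. The sketch below checks a name against the rules for unquoted Snowflake identifiers (a leading letter or underscore, followed by letters, digits, underscores, or dollar signs). The helper `checked_identifier` is hypothetical and not part of Snowpark.

```python
import re

# Unquoted Snowflake identifiers start with a letter or underscore and may
# contain letters, digits, underscores, and dollar signs.
_UNQUOTED_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")


def checked_identifier(name: str) -> str:
    """Return the upper-cased name if it is a valid unquoted identifier."""
    if not _UNQUOTED_IDENTIFIER.fullmatch(name):
        raise ValueError(f"Not a valid unquoted identifier: {name!r}")
    # Unquoted identifiers are resolved in upper case by Snowflake.
    return name.upper()


schema = checked_identifier("snowflake_feature_store_notebook_demo")
statement = f"CREATE OR REPLACE SCHEMA {schema}"
print(statement)
```

The same upper-casing rule explains why lower-case names passed to the Feature Store APIs in this notebook show up in upper case in the listings.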

We have prepared a couple of examples, which you can find in our [open source repo](https://github.com/snowflakedb/snowflake-ml-python/tree/main/snowflake/ml/feature_store/examples). Each example contains the source dataset, feature view, and entity definitions used throughout this demo. The cell below selects an example called "simple_features", and `setup_datasets()` loads its dataset into Snowflake.

In [4]:
from snowflake.ml.feature_store.examples.example_helper import ExampleHelper

helper = ExampleHelper(session, session.get_current_database(), FS_DEMO_SCHEMA)
print(f"All examples: {helper.list_examples()}")

helper.select_example('simple_features')
source_tables = helper.setup_datasets()

All examples: ['simple_features']


We can take a quick look at the newly generated source tables.

In [5]:
for s in source_tables:
    total_rows = session.table(s).count()
    print(f"Total rows in {s}: {total_rows}")

Total rows in "REGTEST_DB".SNOWFLAKE_FEATURE_STORE_NOTEBOOK_DEMO.citibike_trips: 85304


<a id='manage-features-in-feature-store'></a>
## Manage features in Feature Store

Now we're ready to work with the Feature Store. The sections below show how to create a Feature Store, entities, and feature views, and how to operate on them.

<a id='initialize-a-feature-store'></a>
### Initialize a Feature Store

First, we create a new Feature Store or connect to an existing one.

In [6]:
from snowflake.ml.feature_store import (
    FeatureStore,
    FeatureView,
    Entity,
    CreationMode,
    FeatureViewStatus,
)

fs = FeatureStore(
    session=session, 
    database=session.get_current_database(), 
    name=FS_DEMO_SCHEMA, 
    default_warehouse=session.get_current_warehouse(),
    creation_mode=CreationMode.CREATE_IF_NOT_EXIST,
)

<a id='create-entities'></a>
### Create entities

Before creating feature views, we need to create entities. The cell below registers the entities that are predefined for this example and loaded by `helper.load_entities()`.

In [7]:
for e in helper.load_entities():
    fs.register_entity(e)
all_entities_df = fs.list_entities()
assert all_entities_df.count() == 1, "Total 1 entity registered."
all_entities_df.show()

---------------------------------------------------------------------
|"NAME"          |"JOIN_KEYS"         |"DESC"          |"OWNER"     |
---------------------------------------------------------------------
|END_STATION_ID  |["END_STATION_ID"]  |End Station Id  |REGTEST_RL  |
---------------------------------------------------------------------



You can retrieve a registered entity from the Feature Store by name.

In [8]:
my_entity = fs.get_entity('end_station_id')

<a id='create-feature-views'></a>
### Create feature views

Next, we can register feature views. Feature views are pre-defined in our repository. You can find the definitions [here](https://github.com/snowflakedb/snowflake-ml-python/tree/main/snowflake/ml/feature_store/examples).

In [9]:
for fv in helper.load_draft_feature_views():
    fs.register_feature_view(
        feature_view=fv,
        version='1.0'
    )

all_fvs_df = fs.list_feature_views().select('name', 'version', 'desc', 'refresh_freq')
assert all_fvs_df.count() == 2, "Total 2 feature views registered."
all_fvs_df.show()

------------------------------------------------------------------------------------------------------
|"NAME"            |"VERSION"  |"DESC"                                              |"REFRESH_FREQ"  |
------------------------------------------------------------------------------------------------------
|F_STATION_1D      |1.0        |Managed feature view about trip station refresh...  |1 day           |
|F_STATION_STATIC  |1.0        |Static feature view about trip station.             |NULL            |
------------------------------------------------------------------------------------------------------



<a id='add-feature-view-versions'></a>
### Add feature view versions

We can also add a new version of a feature view by registering it with the same name as an existing feature view but a different version.

In [10]:
for fv in helper.load_draft_feature_views():
    fv.desc = f'{fv.name}/2.0 with new desc.'
    fs.register_feature_view(
        feature_view=fv,
        version='2.0'
    )

all_fvs_df = fs.list_feature_views().select('name', 'version', 'desc', 'refresh_freq')
all_fvs_df.show()
assert all_fvs_df.count() == 4, "Total 4 feature views registered."


------------------------------------------------------------------------------------------------------
|"NAME"            |"VERSION"  |"DESC"                                              |"REFRESH_FREQ"  |
------------------------------------------------------------------------------------------------------
|F_STATION_1D      |1.0        |Managed feature view about trip station refresh...  |1 day           |
|F_STATION_1D      |2.0        |F_STATION_1D/2.0 with new desc.                     |1 day           |
|F_STATION_STATIC  |1.0        |Static feature view about trip station.             |NULL            |
|F_STATION_STATIC  |2.0        |F_STATION_STATIC/2.0 with new desc.                 |NULL            |
------------------------------------------------------------------------------------------------------
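
To make the versioning model concrete, here is a toy in-memory registry keyed by `(name, version)`. It only illustrates the behavior shown in the listing above and is not how `FeatureStore` is implemented. `ToyRegistry` and its methods are hypothetical, and the choice to reject duplicate `(name, version)` pairs is an assumption of this sketch, so check the API reference for how the library handles that case.

```python
class ToyRegistry:
    """Toy stand-in for a feature store: one entry per (name, version) pair."""

    def __init__(self):
        self._views = {}

    def register(self, name, version, desc):
        key = (name.upper(), version)
        if key in self._views:
            # Assumption of this sketch: duplicates are rejected.
            raise ValueError(f"{name}/{version} is already registered")
        self._views[key] = desc

    def list(self):
        # Sorted so that versions of the same feature view appear together.
        return sorted(
            (name, version, desc) for (name, version), desc in self._views.items()
        )


registry = ToyRegistry()
for name in ("f_station_1d", "f_station_static"):
    registry.register(name, "1.0", "original desc")
    registry.register(name, "2.0", f"{name.upper()}/2.0 with new desc.")

for row in registry.list():
    print(row)
```

Registering the same name under a new version adds a row rather than replacing the old one, which is why the listing grows from two to four feature views.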



<a id='update-feature-views'></a>
### Update feature views

After a feature view is registered, it is materialized in the Snowflake backend. You can still update some of its metadata with `update_feature_view`. The cell below updates the `desc` of a managed feature view. See the [API reference](https://docs.snowflake.com/en/developer-guide/snowpark-ml/reference/latest/api/feature_store/snowflake.ml.feature_store.FeatureStore) page for the full list of metadata that can be updated.

In [11]:
updated_fv = fs.update_feature_view(
    name='f_station_1d',
    version='1.0',
    desc='Updated desc for f_station_1d.',
)

assert updated_fv.desc == 'Updated desc for f_station_1d.'
fs.list_feature_views(feature_view_name='f_station_1d') \
    .select('name', 'version', 'desc', 'refresh_freq', 'scheduling_state').show()

----------------------------------------------------------------------------------------------------
|"NAME"        |"VERSION"  |"DESC"                           |"REFRESH_FREQ"  |"SCHEDULING_STATE"  |
----------------------------------------------------------------------------------------------------
|F_STATION_1D  |1.0        |Updated desc for f_station_1d.   |1 day           |ACTIVE              |
|F_STATION_1D  |2.0        |F_STATION_1D/2.0 with new desc.  |1 day           |ACTIVE              |
----------------------------------------------------------------------------------------------------



<a id='operate-feature-views'></a>
### Operate feature views

For **managed feature views**, you can suspend, resume, or manually refresh the backend pipelines. A managed feature view is an automated feature pipeline that computes features on a given schedule. You create a managed feature view by setting `refresh_freq`. In contrast, a **static feature view** is created when `refresh_freq` is set to None.

In [12]:
registered_fv = fs.get_feature_view('f_station_1d', '1.0')
suspended_fv = fs.suspend_feature_view(registered_fv)
assert suspended_fv.status == FeatureViewStatus.SUSPENDED
fs.list_feature_views().select('name', 'version', 'desc', 'refresh_freq', 'scheduling_state').show()

----------------------------------------------------------------------------------------------------------------
|"NAME"            |"VERSION"  |"DESC"                                   |"REFRESH_FREQ"  |"SCHEDULING_STATE"  |
----------------------------------------------------------------------------------------------------------------
|F_STATION_1D      |1.0        |Updated desc for f_station_1d.           |1 day           |SUSPENDED           |
|F_STATION_1D      |2.0        |F_STATION_1D/2.0 with new desc.          |1 day           |ACTIVE              |
|F_STATION_STATIC  |1.0        |Static feature view about trip station.  |NULL            |NULL                |
|F_STATION_STATIC  |2.0        |F_STATION_STATIC/2.0 with new desc.      |NULL            |NULL                |
----------------------------------------------------------------------------------------------------------------



In [13]:
resumed_fv = fs.resume_feature_view(suspended_fv)
assert resumed_fv.status == FeatureViewStatus.ACTIVE
fs.list_feature_views().select('name', 'version', 'desc', 'refresh_freq', 'scheduling_state').show()

----------------------------------------------------------------------------------------------------------------
|"NAME"            |"VERSION"  |"DESC"                                   |"REFRESH_FREQ"  |"SCHEDULING_STATE"  |
----------------------------------------------------------------------------------------------------------------
|F_STATION_1D      |1.0        |Updated desc for f_station_1d.           |1 day           |ACTIVE              |
|F_STATION_1D      |2.0        |F_STATION_1D/2.0 with new desc.          |1 day           |ACTIVE              |
|F_STATION_STATIC  |1.0        |Static feature view about trip station.  |NULL            |NULL                |
|F_STATION_STATIC  |2.0        |F_STATION_STATIC/2.0 with new desc.      |NULL            |NULL                |
----------------------------------------------------------------------------------------------------------------
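
The managed/static distinction and the suspend/resume cycle can be summarized with a small state model. This is a toy illustration under stated assumptions, not the library's implementation. `ToyFeatureView` is hypothetical, the state names mirror the `SCHEDULING_STATE` column above, and raising an error when a static view is suspended is a choice made for this sketch only.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyFeatureView:
    """Hypothetical model: refresh_freq decides managed vs. static."""

    name: str
    refresh_freq: Optional[str] = None
    scheduling_state: Optional[str] = field(init=False, default=None)

    def __post_init__(self):
        # Managed views start out scheduled; static views have no schedule.
        self.scheduling_state = "ACTIVE" if self.is_managed else None

    @property
    def is_managed(self) -> bool:
        return self.refresh_freq is not None

    def suspend(self):
        self._require_managed()
        self.scheduling_state = "SUSPENDED"

    def resume(self):
        self._require_managed()
        self.scheduling_state = "ACTIVE"

    def _require_managed(self):
        if not self.is_managed:
            raise ValueError(f"{self.name} is static and has no pipeline to schedule")


managed = ToyFeatureView("F_STATION_1D", refresh_freq="1 day")
static = ToyFeatureView("F_STATION_STATIC")
managed.suspend()
print(managed.scheduling_state, static.scheduling_state)
```

As in the listings above, only the managed view carries a scheduling state; the static view's state stays empty throughout.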



In [14]:
history_df = fs.get_refresh_history(resumed_fv).order_by('REFRESH_START_TIME')
history_df.show()
assert history_df.count() == 1, "Feature view has been refreshed 1 time."

-------------------------------------------------------------------------------------------------------------------------
|"NAME"            |"STATE"    |"REFRESH_START_TIME"              |"REFRESH_END_TIME"                |"REFRESH_ACTION"  |
-------------------------------------------------------------------------------------------------------------------------
|F_STATION_1D$1.0  |SUCCEEDED  |2024-07-10 14:14:30.963000-07:00  |2024-07-10 14:14:31.684000-07:00  |INCREMENTAL       |
-------------------------------------------------------------------------------------------------------------------------



The cell below manually refreshes the feature view, which triggers feature computation on the latest source data. You can check the refresh history with `get_refresh_history()`; the result now differs from the previous call.

In [15]:
fs.refresh_feature_view(resumed_fv)
history_df = fs.get_refresh_history(resumed_fv).order_by('REFRESH_START_TIME')
history_df.show()
assert history_df.count() == 2, "Feature view has been refreshed 2 times."

-------------------------------------------------------------------------------------------------------------------------
|"NAME"            |"STATE"    |"REFRESH_START_TIME"              |"REFRESH_END_TIME"                |"REFRESH_ACTION"  |
-------------------------------------------------------------------------------------------------------------------------
|F_STATION_1D$1.0  |SUCCEEDED  |2024-07-10 14:14:30.963000-07:00  |2024-07-10 14:14:31.684000-07:00  |INCREMENTAL       |
|F_STATION_1D$1.0  |SUCCEEDED  |2024-07-10 14:15:14.269000-07:00  |2024-07-10 14:15:14.687000-07:00  |INCREMENTAL       |
-------------------------------------------------------------------------------------------------------------------------
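
The refresh history is also a convenient place to check how long each refresh takes. The sketch below computes durations from the start and end timestamps shown above, copied here as plain strings so the example runs without a session. In a live notebook you would read them from `history_df` instead.

```python
from datetime import datetime

# (start, end) pairs copied from the REFRESH_START_TIME and
# REFRESH_END_TIME columns in the output above.
history = [
    ("2024-07-10 14:14:30.963000-07:00", "2024-07-10 14:14:31.684000-07:00"),
    ("2024-07-10 14:15:14.269000-07:00", "2024-07-10 14:15:14.687000-07:00"),
]

# Duration of each refresh in seconds.
durations = [
    (datetime.fromisoformat(end) - datetime.fromisoformat(start)).total_seconds()
    for start, end in history
]
print(durations)
```

Both refreshes in this run finish in under a second.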



<a id='read-values-from-a-feature-view'></a>
### Read values from a feature view 

You can read the feature values of a registered feature view with `read_feature_view()`.

In [16]:
feature_value_df = fs.read_feature_view(resumed_fv)
feature_value_df.show()

-------------------------------------------------------------
|"END_STATION_ID"  |"F_COUNT_1D"  |"F_AVG_TRIPDURATION_1D"  |
-------------------------------------------------------------
|505               |483           |733.002070               |
|161               |429           |603.533800               |
|347               |440           |693.865909               |
|466               |425           |572.520000               |
|459               |456           |665.528509               |
|247               |241           |631.452282               |
|127               |481           |603.519751               |
|2000              |121           |840.289256               |
|514               |272           |947.345588               |
|195               |219           |738.191781               |
-------------------------------------------------------------



<a id='generate-training-data'></a>
### Generate training data

We can generate training data and output it as either a [Dataset object](https://docs.snowflake.com/en/developer-guide/snowpark-ml/dataset) or a Snowpark DataFrame.

The cell below creates a spine DataFrame by randomly sampling entity keys from the source table.

In [17]:
entity_key_names = ','.join(my_entity.join_keys)
spine_df = session.sql(f"select {entity_key_names} from {source_tables[0]}").sample(n=1000)

`generate_dataset()` outputs a [Dataset object](https://docs.snowflake.com/en/developer-guide/snowpark-ml/dataset).

In [18]:
training_fv = fs.get_feature_view('f_station_1d', '1.0')

my_dataset = fs.generate_dataset(
    name='my_cool_dataset',
    version='first',
    spine_df=spine_df,
    features=[training_fv],
    desc='This is my dataset joined with feature views',
)

Convert the dataset to a pandas DataFrame and peek at the first 10 rows.

In [19]:
my_dataset.read.to_pandas().head(10)

   END_STATION_ID  F_COUNT_1D  F_AVG_TRIPDURATION_1D
0             309         174            1078.988525
1            2004         226             919.442505
2             495         240             650.279175
3             250         175             607.251404
4            3002         253             767.241089
5             350         258             625.224792
6             228         236            1166.860229
7             521         956             667.517761
8             523         541             662.408508
9             521         956             667.517761


`generate_training_set()` outputs a Snowpark DataFrame.

In [20]:
training_data_df = fs.generate_training_set(
    spine_df=spine_df,
    features=[training_fv]
)

training_data_df.show()

-------------------------------------------------------------
|"END_STATION_ID"  |"F_COUNT_1D"  |"F_AVG_TRIPDURATION_1D"  |
-------------------------------------------------------------
|442               |458           |627.288210               |
|359               |315           |613.644444               |
|483               |105           |590.257143               |
|454               |169           |759.047337               |
|442               |458           |627.288210               |
|472               |432           |533.761574               |
|72                |271           |859.778598               |
|337               |112           |946.883929               |
|545               |375           |632.749333               |
|356               |167           |799.245509               |
-------------------------------------------------------------
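
Conceptually, the spine lists the entity keys you want rows for, and each feature view contributes columns looked up by those keys. The toy sketch below assumes left-join semantics (every spine row is kept, and keys with no features get empty values) and ignores timestamps entirely. `toy_training_set` is a hypothetical helper, and the key `9999` is made up to show the no-match case.

```python
# Feature values keyed by entity key, taken from rows in the output above.
feature_values = {
    442: {"F_COUNT_1D": 458, "F_AVG_TRIPDURATION_1D": 627.288210},
    359: {"F_COUNT_1D": 315, "F_AVG_TRIPDURATION_1D": 613.644444},
}

# The spine may repeat keys; 9999 is a made-up key with no features.
spine = [442, 359, 442, 9999]


def toy_training_set(spine, feature_values):
    """Left-join each spine row to its feature values."""
    empty = {"F_COUNT_1D": None, "F_AVG_TRIPDURATION_1D": None}
    rows = []
    for key in spine:
        rows.append({"END_STATION_ID": key, **feature_values.get(key, empty)})
    return rows


training_rows = toy_training_set(spine, feature_values)
for row in training_rows:
    print(row)
```

Because the spine is sampled from the source table, the same station can appear more than once, which is why station 442 is listed twice with identical feature values in the output above.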



<a id='delete-feature-views'></a>
### Delete feature views

Feature views can be deleted via `delete_feature_view()`.

In [21]:
for row in fs.list_feature_views().collect():
    fv = fs.get_feature_view(row['NAME'], row['VERSION'])
    fs.delete_feature_view(fv)

all_fvs_df = fs.list_feature_views().select('name', 'version') 
assert all_fvs_df.count() == 0, "0 feature views left after deletion."
all_fvs_df.show()

----------------------
|"NAME"  |"VERSION"  |
----------------------
|        |           |
----------------------



<a id='delete-entities'></a>
### Delete entities

You can delete an entity with `delete_entity()`. Note that the deletion fails if any feature views are still registered on the entity, so delete those feature views first.

In [22]:
for row in fs.list_entities().collect():
    fs.delete_entity(row['NAME'])

all_entities_df = fs.list_entities()
assert all_entities_df.count() == 0, "0 entities after deletion."
all_entities_df.show()

-------------------------------------------
|"NAME"  |"JOIN_KEYS"  |"DESC"  |"OWNER"  |
-------------------------------------------
|        |             |        |         |
-------------------------------------------
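
The ordering constraint between feature views and entities can be illustrated with a toy in-memory model. `ToyStore` is hypothetical and only mimics the dependency check described above; the real check happens inside the library.

```python
class ToyStore:
    """Hypothetical in-memory model of the entity/feature-view dependency."""

    def __init__(self):
        self.entities = set()
        self.feature_views = {}  # (name, version) -> entity name

    def delete_feature_view(self, name, version):
        del self.feature_views[(name, version)]

    def delete_entity(self, name):
        # Refuse to delete an entity that feature views still depend on.
        dependents = sorted(
            fv for fv, entity in self.feature_views.items() if entity == name
        )
        if dependents:
            raise ValueError(f"Cannot delete {name}: still referenced by {dependents}")
        self.entities.remove(name)


store = ToyStore()
store.entities.add("END_STATION_ID")
store.feature_views[("F_STATION_1D", "1.0")] = "END_STATION_ID"

try:
    store.delete_entity("END_STATION_ID")
except ValueError as err:
    print(err)

store.delete_feature_view("F_STATION_1D", "1.0")
store.delete_entity("END_STATION_ID")  # succeeds once nothing depends on it
```

This is why the notebook deletes all feature views in the previous section before deleting entities here.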



<a id='cleanup-feature-store'></a>
### Cleanup Feature Store (experimental) 

Currently we provide an experimental API to delete all entities and feature views in a Feature Store. If `dryrun` is set to True (the default), it only prints the objects that would be deleted. Otherwise, it performs the deletion.

In [23]:
fs._clear(dryrun=False)

assert fs.list_feature_views().count() == 0, "0 feature views left after deletion."
assert fs.list_entities().count() == 0, "0 entities left after deletion."
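
The dry-run convention is a common safeguard for destructive operations: the default reports what would happen, and deletion requires an explicit opt-in. A minimal sketch of the pattern follows. `toy_clear` is a hypothetical helper operating on a plain list, not the library's `_clear`.

```python
def toy_clear(objects, dryrun=True):
    """Report what would be deleted, or delete it when dryrun is False."""
    targets = list(objects)
    if dryrun:
        for name in targets:
            print(f"Would delete: {name}")
        return targets
    objects.clear()
    return targets


objects = ["F_STATION_1D/1.0", "END_STATION_ID"]
toy_clear(objects)                 # prints the plan, deletes nothing
remaining_after_dryrun = len(objects)
toy_clear(objects, dryrun=False)   # performs the deletion
print(remaining_after_dryrun, len(objects))
```

Running with the default first and reviewing the printed plan is a cheap way to avoid deleting more than intended.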


<a id='cleanup-notebook'></a>
## Cleanup notebook

In [24]:
session.sql(f"DROP SCHEMA IF EXISTS {FS_DEMO_SCHEMA}").collect()

[Row(status='SNOWFLAKE_FEATURE_STORE_NOTEBOOK_DEMO successfully dropped.')]