Version: 0.0.2  Updated date: 07/05/2024
Conda Environment : py-snowpark_df_ml_fs-1.15.0_v1

# Getting Started with Snowflake Feature Store - Customer Segmentation
We use this use case to show how Snowflake Feature Store (and Model Registry) can be used to maintain and store features, retrieve them for training, and perform micro-batch inference.

In the development (TRAINING) environment we will
- create FeatureViews in the Feature Store that maintain the required customer-behaviour features.
- use these Features to train a model, and save the model in the Snowflake model-registry.
- plot the clusters for the trained model to visually verify. 

In the production (SERVING) environment we will
- re-create the FeatureViews on production data
- generate an Inference FeatureView that uses the saved model to perform incremental inference

# Model Operationalisation in Production

In [16]:
%load_ext autoreload
%autoreload 2


The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload



#### Notebook Packages

In [17]:
# Python packages
import os
import json

# SNOWFLAKE
# Snowpark
from snowflake.snowpark import Session, DataFrame, Window, WindowSpec
#from snowflake.snowpark import Analytics

import snowflake.snowpark.functions as F

# Snowflake Feature Store
from snowflake.ml.feature_store import (FeatureView, Entity)

# COMMON FUNCTIONS
from useful_fns import check_and_update, formatSQL, create_ModelRegistry, create_FeatureStore, create_SF_Session

# Feature Engineering Functions
from feature_engineering_fns import uc01_load_data, uc01_pre_process

### Setup Snowflake connection and database parameters

We point the `tpcxai_schema` variable to our `SERVING` schema, and this one change allows us to recreate the model development pipeline in production.
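As a minimal sketch of why one variable is enough: assuming every environment uses the same table names and the `_{schema}_FEATURE_STORE` naming used later in this notebook, all environment-specific references can be derived from the schema alone. `environment_refs` is a hypothetical helper (it is not part of `useful_fns`), and `TRAINING` as the development schema name is an assumption.

```python
# Hypothetical helper: derive every environment-specific name from one schema value.
SOURCE_TABLES = ["LINEITEM", "ORDERS", "ORDER_RETURNS"]


def environment_refs(database: str, schema: str) -> dict:
    """Return fully qualified table names and the feature store schema for one environment."""
    tables = {name: ".".join([database, schema, name]) for name in SOURCE_TABLES}
    return {"tables": tables, "feature_store": f"_{schema}_FEATURE_STORE"}


# Only the schema argument differs between development and production.
training = environment_refs("TPCXAI_SF0001_QUICKSTART_INC", "TRAINING")  # assumed dev schema name
serving = environment_refs("TPCXAI_SF0001_QUICKSTART_INC", "SERVING")

print(serving["tables"]["ORDERS"])   # TPCXAI_SF0001_QUICKSTART_INC.SERVING.ORDERS
print(serving["feature_store"])      # _SERVING_FEATURE_STORE
```

Keeping the naming rule in one function means production cannot silently drift from development: the pipeline code stays identical and only the schema argument changes.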

In [18]:
# Set the Schema (Environment)
tpcxai_schema = 'SERVING'

In [19]:
fs_qs_role, tpcxai_database, tpcxai_training_schema, session, warehouse_env = create_SF_Session(tpcxai_schema)




Connection Established with the following parameters:
User                        : JARCHEN
Role                        : "FS_QS_ROLE"
Database                    : "TPCXAI_SF0001_QUICKSTART_INC"
Schema                      : "SERVING"
Warehouse                   : "TPCXAI_SF0001_QUICKSTART_WH"
Snowflake version           : 9.37.1
Snowpark for Python version : 1.38.0 



### MODEL OPERATIONALISATION
* Recreate the Entity and FeatureViews in the production Feature Store
* Reuse the model fitted in development/training
* Create a new Inference FeatureView for incremental model inference

#### Setup Production Feature Store and references

In [24]:
# Create/Reference Snowflake Model Registry - Common across Environments
mr = create_ModelRegistry(session, tpcxai_database, 'MODEL_1')

# Create/Reference Snowflake Feature Store for the Production (SERVING) Environment
fs = create_FeatureStore(session, tpcxai_database, f"_{tpcxai_schema}_FEATURE_STORE", warehouse_env)

### Reference Data to Snowflake Dataframe Objects
# Tables
line_item_tbl              = '.'.join([tpcxai_database, tpcxai_schema,'LINEITEM'])
order_tbl                  = '.'.join([tpcxai_database, tpcxai_schema,'ORDERS'])
order_returns_tbl          = '.'.join([tpcxai_database, tpcxai_schema,'ORDER_RETURNS'])

# Snowpark Dataframe
line_item_sdf              = session.table(line_item_tbl)
order_sdf                  = session.table(order_tbl)
order_returns_sdf          = session.table(order_returns_tbl)
print('''--- Created Data References ---''')

# Model Name
model_name = "MODEL_1.UC01_SNOWFLAKEML_RF_REGRESSOR_MODELSKLEARN"


Model Registry (MODEL_1) already exists
Feature Store (_SERVING_FEATURE_STORE) already exists
--- Created Data References ---


We can now rerun the exact same code from our development (TRAINING) process to recreate the feature-engineering pipelines in production.
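Rerunning the pipeline code is safe because each step follows a get-or-create pattern: look the object up by name and version, and only build and register it if the lookup fails. The sketch below shows that pattern with a hypothetical in-memory registry; `InMemoryRegistry` and `get_or_register` are illustrative names, not Feature Store APIs. It also catches a specific "not found" exception rather than using a bare `except`.

```python
class InMemoryRegistry:
    """Hypothetical stand-in for a feature store: objects keyed by (name, version)."""

    def __init__(self):
        self._objects = {}

    def get(self, name, version):
        try:
            return self._objects[(name, version)]
        except KeyError:
            raise LookupError(f"{name}/{version} not found") from None

    def register(self, name, version, obj):
        self._objects[(name, version)] = obj
        return obj


def get_or_register(registry, name, version, build):
    """Return (object, created): reuse an existing object, else build and register it."""
    try:
        return registry.get(name, version), False
    except LookupError:  # only "not found" triggers creation
        return registry.register(name, version, build()), True


reg = InMemoryRegistry()
fv, created = get_or_register(reg, "FV_UC01_PREPROCESS", "V_1", lambda: {"refresh_freq": "60 minute"})
# A second run finds the existing object and never calls build()
fv_again, created_again = get_or_register(reg, "FV_UC01_PREPROCESS", "V_1", lambda: {"refresh_freq": "5 minute"})
print(created, created_again)  # True False
```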

In [21]:
### ORDER Entity
if "ORDER" not in json.loads(fs.list_entities().select(F.to_json(F.array_agg("NAME", True))).collect()[0][0]):
    customer_entity = Entity(name="ORDER", join_keys=["O_CUSTOMER_SK"],desc="Primary Key for CUSTOMER ORDER")
    fs.register_entity(customer_entity)
else:
    customer_entity = fs.get_entity("ORDER")
print('''--- Created CUSTOMER Entity ---''')

### Create & Load Source Data
raw_data = uc01_load_data(order_sdf, line_item_sdf, order_returns_sdf)
print('''--- Created Source Data ---''')

### Create & Run Preprocessing Function 
preprocessed_data = uc01_pre_process(raw_data)
print('''--- Created Preprocessed Data ---''')

### Create Preprocessing FeatureView from Preprocess Dataframe (SQL)
ppd_fv_name = "FV_UC01_PREPROCESS"
ppd_fv_version = "V_1"
# Define descriptions for the FeatureView's Features.  These will be added as comments to the database object
preprocess_features_desc = { "FREQUENCY":"Average yearly order frequency",
                             "RETURN_RATIO":"Average of, Per Order Returns Ratio.  Per order returns ratio : total returns value / total order value" }
# Create the Preprocessing Feature View
try:
    # If the FeatureView already exists, just return a reference to it
    fv_uc01_preprocess = fs.get_feature_view(name=ppd_fv_name, version=ppd_fv_version)
except Exception:
    # Otherwise create the FeatureView instance
    fv_uc01_preprocess_instance = FeatureView(
        name=ppd_fv_name,
        entities=[customer_entity],
        feature_df=preprocessed_data,      # <- We can use the Snowpark DataFrame as-is from our Python code
        timestamp_col="LATEST_ORDER_DATE",
        refresh_freq="60 minute",          # <- specifying the optional refresh_freq creates the FeatureView as a Dynamic Table; otherwise it is created as a View
        refresh_mode="INCREMENTAL",
        desc="Features to support Use Case 01").attach_feature_desc(preprocess_features_desc)

    # Register the FeatureView instance. This creates the object in Snowflake
    fv_uc01_preprocess = fs.register_feature_view(
        feature_view=fv_uc01_preprocess_instance,
        version=ppd_fv_version,
        block=True
    )
    print(f"Feature View : {ppd_fv_name}_{ppd_fv_version} created in {tpcxai_schema}")
else:
    print(f"Feature View : {ppd_fv_name}_{ppd_fv_version} already created in {tpcxai_schema}")

print('''---            DONE               ---''')


--- Created CUSTOMER Entity ---
--- Created Source Data ---
--- Created Preprocessed Data ---
Feature View : FV_UC01_PREPROCESS_V_1 already created in SERVING
---            DONE               ---
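The feature descriptions above define `RETURN_RATIO` as the average of the per-order ratio (total returns value / total order value) and `FREQUENCY` as the average yearly order frequency. The sketch below computes both in plain Python on hypothetical toy orders. The real logic lives in `uc01_pre_process`, which is not shown here, and reading `FREQUENCY` as "orders divided by the number of distinct years with orders" is an assumption.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical toy orders: (customer_sk, order_date, order_value, returned_value)
ORDERS = [
    (1, date(2024, 3, 1), 100.0, 0.0),
    (1, date(2024, 9, 15), 50.0, 25.0),
    (1, date(2025, 1, 10), 80.0, 80.0),
    (2, date(2025, 6, 30), 40.0, 10.0),
]


def customer_features(orders):
    """Aggregate per-order rows into one feature row per customer."""
    by_customer = defaultdict(list)
    for cust, order_date, value, returned in orders:
        by_customer[cust].append((order_date, value, returned))

    features = {}
    for cust, rows in by_customer.items():
        years = {d.year for d, _, _ in rows}
        features[cust] = {
            "LATEST_ORDER_DATE": max(d for d, _, _ in rows),
            # Orders per distinct year in which the customer ordered (assumed definition).
            "FREQUENCY": len(rows) / len(years),
            # Mean of the per-order ratio returned value / order value (order values assumed > 0).
            "RETURN_RATIO": mean(r / v for _, v, r in rows),
        }
    return features


features = customer_features(ORDERS)
print(features[1])
```

Customer 1 has three orders across 2024 and 2025, giving `FREQUENCY` 1.5, and per-order ratios 0, 0.5 and 1, giving `RETURN_RATIO` 0.5.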


#### Create Scheduled Inference Pipeline

We now recreate our model inference process, which will
- retrieve the latest version of the model from the Model Registry
- read features from our feature pipeline (the fv_uc01_preprocess FeatureView)
- pass the features and model into the inference function (uc01_serve) and return an inference DataFrame
- use the inference DataFrame to define a new FeatureView that maintains the inference process

In [22]:
# Create an Inference Dataframe that reads from our feature-engineering pipeline
inference_input_sdf = fs.read_feature_view(fv_uc01_preprocess)
inference_input_sdf.show()

---------------------------------------------------------------------------------------------
|"O_CUSTOMER_SK"  |"LATEST_ORDER_DATE"  |"FREQUENCY"  |"RETURN_RATIO"  |"RETURN_ROW_PRICE"  |
---------------------------------------------------------------------------------------------
|2779             |2025-11-29           |2.000        |1.000           |69.000              |
|2292             |2025-11-29           |1.500        |1.000           |18.000              |
|2879             |2025-11-29           |1.500        |1.000           |59.000              |
|5019             |2025-11-29           |3.000        |1.000           |24.000              |
|5203             |2025-11-29           |2.500        |1.000           |48.000              |
|5378             |2025-11-29           |3.000        |1.000           |71.000              |
|6994             |2025-11-29           |2.000        |1.000           |39.000              |
|6546             |2025-11-29           |2.500        |1.000

In [25]:
# List all versions of the model
m = mr.get_model(model_name)
m.show_versions()

created_on          : 2025-11-29 04:09:42.154000-08:00
name                : KIND_DRAGON_1
aliases             : ["DEFAULT","FIRST","LAST"]
comment             : TPCXAI USE CASE 01 - XGB Regressor
database_name       : TPCXAI_SF0001_QUICKSTART_INC
schema_name         : MODEL_1
model_name          : UC01_SNOWFLAKEML_RF_REGRESSOR_MODELSKLEARN
is_default_version  : True
functions           : ["EXPLAIN","PREDICT"]
metadata            : {}
user_data           : {}
model_attributes    : {"framework":"sklearn","task":"TABULAR_REGRESS...
size                : 4477839
environment         : {"default":{"python_version":"3.10","snowflake...
runnable_in         : ["WAREHOUSE","SNOWPARK_CONTAINER_SERVICES"]
inference_services  : []


In [26]:
# Get the latest version of the model (sorted by creation time rather than relying on row order)
m = mr.get_model(model_name)
latest_version = m.show_versions().sort_values("created_on").iloc[-1]['name']
mv = m.version(latest_version)
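Taking `.iloc[-1]` on its own assumes `show_versions()` lists versions in creation order. A more defensive choice is to pick the row with the greatest `created_on`. The plain-Python sketch below shows the difference on hypothetical version rows: only `KIND_DRAGON_1` and its timestamp come from the output above; the other names and timestamps are made up.

```python
from datetime import datetime

# Hypothetical rows shaped like part of show_versions(); deliberately not in creation order.
versions = [
    {"name": "KIND_DRAGON_1", "created_on": "2025-11-29 04:09:42.154000-08:00"},
    {"name": "CALM_OTTER_2", "created_on": "2025-11-30 09:15:00.000000-08:00"},
    {"name": "BRAVE_FOX_3", "created_on": "2025-11-28 22:00:00.000000-08:00"},
]


def latest_version_name(rows):
    """Return the name of the row with the most recent created_on timestamp."""
    return max(rows, key=lambda r: datetime.fromisoformat(r["created_on"]))["name"]


print(latest_version_name(versions))  # CALM_OTTER_2
print(versions[-1]["name"])           # BRAVE_FOX_3 (the positional pick is not the newest here)
```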

In [27]:
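def uc01_serve(featurevector: DataFrame, model_version) -> DataFrame:
    """Run the registered model version's predict function over a Snowpark DataFrame of features."""
    return model_version.run(featurevector, function_name="predict")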

# Test Inference process
inference_result_sdf = uc01_serve(inference_input_sdf, mv)
inference_result_sdf.sort(F.col('LATEST_ORDER_DATE').desc(), F.col('O_CUSTOMER_SK')).show()


------------------------------------------------------------------------------------------------------------------
|"O_CUSTOMER_SK"  |"LATEST_ORDER_DATE"  |"FREQUENCY"  |"RETURN_RATIO"  |"RETURN_ROW_PRICE"  |"output_feature_0"  |
------------------------------------------------------------------------------------------------------------------
|1859             |2025-11-30           |2.000        |1.000           |35.000              |44.71792984008789   |
|2441             |2025-11-30           |3.000        |1.000           |38.000              |45.39448928833008   |
|3037             |2025-11-30           |1.000        |1.000           |26.000              |46.56095504760742   |
|4135             |2025-11-30           |2.000        |1.000           |16.000              |44.71792984008789   |
|4165             |2025-11-30           |3.000        |1.000           |37.000              |45.39448928833008   |
|4183             |2025-11-30           |2.000        |1.000           |89.000  
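The output shows that `run` returns every input column plus a new `output_feature_0` column, while the generated SQL in the next cell passes only `RETURN_RATIO` and `FREQUENCY` to the model. The sketch below mimics that pass-through shape in plain Python; `run_model` and the linear `toy_predict` are hypothetical stand-ins for the registered regressor, not how `ModelVersion.run` is implemented.

```python
def run_model(rows, predict, input_cols, output_col="output_feature_0"):
    """Return new rows: every input column passed through plus one prediction column."""
    results = []
    for row in rows:
        features = [row[c] for c in input_cols]  # only the model's inputs are sent
        results.append({**row, output_col: predict(features)})
    return results


def toy_predict(features):
    # Hypothetical linear stand-in for the registered regressor.
    return_ratio, frequency = features
    return 40.0 + 2.0 * return_ratio + 1.5 * frequency


inputs = [
    {"O_CUSTOMER_SK": 1859, "FREQUENCY": 2.0, "RETURN_RATIO": 1.0, "RETURN_ROW_PRICE": 35.0},
    {"O_CUSTOMER_SK": 2441, "FREQUENCY": 3.0, "RETURN_RATIO": 1.0, "RETURN_ROW_PRICE": 38.0},
]
scored = run_model(inputs, toy_predict, input_cols=["RETURN_RATIO", "FREQUENCY"])
print(scored[0])
```

Columns such as `RETURN_ROW_PRICE` are carried through untouched even though the model never sees them, which is why the inference DataFrame can be used directly as a FeatureView source.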

The SQL generated for this DataFrame, printed below, shows how our model is packaged and called from SQL: `MODEL_VERSION_ALIAS!PREDICT(RETURN_RATIO, FREQUENCY) AS TMP_RESULT`. Only `RETURN_RATIO` and `FREQUENCY` are passed to the model as inputs.

In [28]:
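# The last query in the plan is the one that produces the DataFrame's result
ind_sql = inference_result_sdf.queries['queries'][-1]
# Keep (at most) the first 1000 lines of the generated SQL
ind_fmtd_sql = "\n".join(ind_sql.splitlines()[:1000])
print(ind_fmtd_sql)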

SELECT 
    "O_CUSTOMER_SK", 
    "LATEST_ORDER_DATE", 
    "FREQUENCY", 
    "RETURN_RATIO", 
    "RETURN_ROW_PRICE", 
     CAST ("TMP_RESULT_N7JQ322FV4"['output_feature_0'] AS FLOAT) AS "output_feature_0"
 FROM (
WITH SNOWPARK_ML_MODEL_INFERENCE_INPUT_6KHH0VUNXD AS (SELECT * FROM TPCXAI_SF0001_QUICKSTART_INC._SERVING_FEATURE_STORE.FV_UC01_PREPROCESS$V_1),MODEL_VERSION_ALIAS_UI7TZVES96 AS MODEL TPCXAI_SF0001_QUICKSTART_INC.MODEL_1.UC01_SNOWFLAKEML_RF_REGRESSOR_MODELSKLEARN VERSION KIND_DRAGON_1
                SELECT *,
                    MODEL_VERSION_ALIAS_UI7TZVES96!PREDICT(RETURN_RATIO, FREQUENCY) AS TMP_RESULT_N7JQ322FV4
                FROM SNOWPARK_ML_MODEL_INFERENCE_INPUT_6KHH0VUNXD
)
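The generated query follows a recognisable pattern: a CTE over the source FeatureView, a `MODEL ... VERSION ...` alias, and a `SELECT *` that calls `<alias>!PREDICT(...)`. The sketch below rebuilds that inner query as a string purely for illustration. `build_inference_sql` is hypothetical and is not the library's SQL generator; its aliases are fixed placeholders, whereas the library appends random suffixes.

```python
def build_inference_sql(source, model, version, inputs,
                        input_alias="INFERENCE_INPUT",
                        model_alias="MODEL_VERSION_ALIAS",
                        result_alias="TMP_RESULT",
                        function="PREDICT"):
    """Illustrative rebuild of the inner query pattern seen in the generated SQL."""
    args = ", ".join(inputs)
    return (
        f"WITH {input_alias} AS (SELECT * FROM {source}),"
        f"{model_alias} AS MODEL {model} VERSION {version}\n"
        "SELECT *,\n"
        f"    {model_alias}!{function}({args}) AS {result_alias}\n"
        f"FROM {input_alias}"
    )


sql = build_inference_sql(
    source="TPCXAI_SF0001_QUICKSTART_INC._SERVING_FEATURE_STORE.FV_UC01_PREPROCESS$V_1",
    model="TPCXAI_SF0001_QUICKSTART_INC.MODEL_1.UC01_SNOWFLAKEML_RF_REGRESSOR_MODELSKLEARN",
    version="KIND_DRAGON_1",
    inputs=["RETURN_RATIO", "FREQUENCY"],
)
print(sql)
```

In the real query the predict call returns an object, and the outer `SELECT` extracts `output_feature_0` from it with a `CAST`.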


### Create & Register Inference-FeatureView to run scheduled Inference

We can now define a new Inference FeatureView from the inference DataFrame, which reads from our feature-engineering pipeline. When the FeatureView is created as a Dynamic Table, it refreshes at the specified `refresh_freq` and automatically performs incremental inference on new data that arrives through the pipeline.
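Conceptually, an incremental refresh re-scores only the rows whose inputs changed since the previous refresh, instead of rerunning inference over the whole table. The sketch below is a simplified change-detection illustration of that idea, using dictionaries keyed by customer; `incremental_refresh` and `demo_predict` are hypothetical, and this is not how Snowflake Dynamic Tables implement incremental refresh internally.

```python
def incremental_refresh(source, previous_inputs, results, predict):
    """Re-score only keys whose input row is new or changed; drop keys deleted upstream.

    source / previous_inputs: {key: feature_row}; results: {key: prediction}.
    Returns the sorted list of keys that were (re)scored.
    """
    changed = {k: row for k, row in source.items() if previous_inputs.get(k) != row}
    for k in set(previous_inputs) - set(source):
        results.pop(k, None)  # upstream row disappeared, so its prediction goes too
    for k, row in changed.items():
        results[k] = predict(row)
    # Remember a snapshot of this refresh's inputs for the next comparison
    previous_inputs.clear()
    previous_inputs.update({k: dict(row) for k, row in source.items()})
    return sorted(changed)


def demo_predict(row):
    # Hypothetical stand-in for the model call.
    return row["FREQUENCY"] * 10


prev_inputs, results = {}, {}
first = incremental_refresh({1: {"FREQUENCY": 2.0}, 2: {"FREQUENCY": 1.5}},
                            prev_inputs, results, demo_predict)
second = incremental_refresh({1: {"FREQUENCY": 2.0}, 2: {"FREQUENCY": 3.0}, 3: {"FREQUENCY": 1.0}},
                             prev_inputs, results, demo_predict)
print(first, second, results)
```

On the second refresh customer 1 is unchanged and is not re-scored; only the changed customer 2 and the new customer 3 go through the model.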

In [29]:
## Create & Register Inference-FeatureView to run scheduled Inference
inf_fvname = "FV_UC01_INFERENCE_RESULT"
inf_fv_version = "V_1"

# Keys should match the feature (column) names produced by inference_result_sdf
inference_features_desc = { "FREQUENCY":"Average yearly order frequency",
                            "RETURN_RATIO":"Average of, Per Order Returns Ratio.  Per order returns ratio : total returns value / total order value",
                            "OUTPUT_RETURN_ROW_PRICE":f"Predicted Return Price for XGB Model (UC01) using Model Registry ({tpcxai_database} MODEL_1) Model ({mv.model_name}) Model-Version({mv.version_name})  Model Comment ({mv.comment})"}

try:
    # If the Inference FeatureView already exists, just return a reference to it
    fv_uc01_inference_result = fs.get_feature_view(name=inf_fvname, version=inf_fv_version)
except Exception:
    fv_uc01_inference_result = FeatureView(
        name=inf_fvname,
        entities=[customer_entity],
        feature_df=inference_result_sdf,
        refresh_freq="60 minute",
        refresh_mode="INCREMENTAL",
        desc="Inference Result from kmeans model for Use Case 01").attach_feature_desc(inference_features_desc)

    fv_uc01_inference_result = fs.register_feature_view(
        feature_view=fv_uc01_inference_result,
        version=inf_fv_version,
        block=True
    )
    print(f"Inference Feature View : fv_uc01_inference_result_{inf_fv_version} created")
else:
    print(f"Inference Feature View : fv_uc01_inference_result_{inf_fv_version} already created")
finally:
    fs_serving_fviews = fs.list_feature_views().filter(F.col("NAME") == inf_fvname).sort(F.col("VERSION").desc())
    fs_serving_fviews.show()

Inference Feature View : fv_uc01_inference_result_V_1 already created
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|"NAME"                    |"VERSION"  |"DATABASE_NAME"               |"SCHEMA_NAME"           |"CREATED_ON"                |"OWNER"     |"DESC"                                              |"ENTITIES"  |"REFRESH_FREQ"  |"REFRESH_MODE"  |"SCHEDULING_STATE"  |"WAREHOUSE"                  |"CLUSTER_BY"       |"ONLINE_CONFIG"                                |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
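Because `attach_feature_desc` works on feature names, it is worth checking the description keys against the DataFrame's columns before registering. In the inference output shown earlier the prediction column is `output_feature_0`, not `OUTPUT_RETURN_ROW_PRICE`. The hypothetical helper below flags such keys, assuming keys are treated as unquoted identifiers (upper-cased, as Snowflake does for unquoted names) while column names are compared exactly as stored.

```python
def unknown_desc_keys(columns, descs):
    """Return description keys that match no column.

    Keys are treated as unquoted identifiers (upper-cased); columns are compared as stored.
    """
    available = set(columns)
    return sorted(k for k in descs if k.upper() not in available)


# Column names as shown in the inference output earlier in this notebook.
inference_columns = ["O_CUSTOMER_SK", "LATEST_ORDER_DATE", "FREQUENCY",
                     "RETURN_RATIO", "RETURN_ROW_PRICE", "output_feature_0"]
descs = {"FREQUENCY": "Average yearly order frequency",
         "RETURN_RATIO": "Per order returns ratio",
         "OUTPUT_RETURN_ROW_PRICE": "Predicted return price"}
print(unknown_desc_keys(inference_columns, descs))  # ['OUTPUT_RETURN_ROW_PRICE']
```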

In [30]:
fv_uc01_inference_result

FeatureView(_name=FV_UC01_INFERENCE_RESULT, _entities=[Entity(name=ORDER, join_keys=['O_CUSTOMER_SK'], owner=None, desc=Primary Key for CUSTOMER ORDER)], _feature_df=<snowflake.snowpark.dataframe.DataFrame object at 0x17810c640>, _timestamp_col=None, _desc=Inference Result from kmeans model for Use Case 01, _infer_schema_df=<snowflake.snowpark.dataframe.DataFrame object at 0x3196e2920>, _query=SELECT 
    "O_CUSTOMER_SK", 
    "LATEST_ORDER_DATE", 
    "RETURN_ROW_PRICE", 
     CAST ("TMP_RESULT_UO4AINIPEE"['RETURN_RATIO'] AS DOUBLE) AS "RETURN_RATIO", 
     CAST ("TMP_RESULT_UO4AINIPEE"['FREQUENCY'] AS DOUBLE) AS "FREQUENCY", 
     CAST ("TMP_RESULT_UO4AINIPEE"['OUTPUT_RETURN_ROW_PRICE'] AS DOUBLE) AS "OUTPUT_RETURN_ROW_PRICE"
 FROM (
WITH SNOWPARK_ML_MODEL_INFERENCE_INPUT_O6SFBJYGYP AS (SELECT * FROM TPCXAI_SF0001_QUICKSTART_INC._SERVING_FEATURE_STORE.FV_UC01_PREPROCESS$V_1),MODEL_VERSION_ALIAS_G89Y7A4V1K AS MODEL TPCXAI_SF0001_QUICKSTART_INC.MODEL_1.UC01_SNOWFLAKEML_XGB_REGRESSOR_MO

In [31]:
fv_uc01_inference_result.feature_df.sort(F.col("LATEST_ORDER_DATE").desc()).show(100)

-------------------------------------------------------------------------------------------------------------------------
|"O_CUSTOMER_SK"  |"LATEST_ORDER_DATE"  |"RETURN_ROW_PRICE"  |"RETURN_RATIO"  |"FREQUENCY"  |"OUTPUT_RETURN_ROW_PRICE"  |
-------------------------------------------------------------------------------------------------------------------------
|5984             |2025-11-30           |52.000              |1.0             |3.5          |46.3164176940918           |
|2163             |2025-11-30           |34.000              |1.0             |3.0          |45.40508270263672          |
|7064             |2025-11-30           |22.000              |1.0             |3.0          |45.40508270263672          |
|5804             |2025-11-30           |27.000              |1.0             |2.0          |44.68193435668945          |
|4135             |2025-11-30           |16.000              |1.0             |2.0          |44.68193435668945          |
|2441             |2025-

## CLEAN UP

In [14]:
session.close()

In [15]:
from datetime import datetime
from zoneinfo import ZoneInfo
formatted_time = datetime.now(ZoneInfo("Australia/Melbourne")).strftime("%A, %B %d, %Y %I:%M:%S %p %Z")

print(f"The last run time in Melbourne is: {formatted_time}")

The last run time in Melbourne is: Friday, November 21, 2025 12:18:52 AM AEDT
