Before running this notebook, make sure you have the following packages:  
* `snowflake-ml-python`

You also need to add the file `demo_data.py` as a **Stage Package**, i.e. go to **Packages** -> **Stage Packages** and add `@snowpark_demo_db.simple_ml_schema.ASSETS/demo_data.py`.

In [1]:
# Import python packages
import streamlit as st
import joblib

from snowflake.ml.feature_store import (
    FeatureStore,
    CreationMode)

from snowflake.ml.registry import Registry
from snowflake.ml.monitoring.entities import model_monitor_config

# from snowflake.snowpark import Session
import snowflake.snowpark.functions as snow_funcs

# Python script stored on a Snowflake stage
from demo_data import generate_demo_data

# Get the Snowpark session
from snowflake.snowpark.context import get_active_session
session = get_active_session()


In [3]:
db_name = "SNOWPARK_DEMO_DB"
schema_name = "SIMPLE_ML_SCHEMA"
fs_schema_name = "SIMPLE_FS_SCHEMA"
mr_schema_name = "SIMPLE_MR_SCHEMA"
wh_name = "SIMPLE_ML_WH"
stage_name = "ASSETS"

session.use_schema(f'{db_name}.{schema_name}')
session.use_warehouse(wh_name)

## Generate new customers for inference

Generate 100 new customers to use for inference. We will have to wait around one minute before they are part of our feature store (the refresh schedule we set up when registering the features).

In [4]:
# Start by generating some new data that we will use for inference
session.use_schema(f'{db_name}.{schema_name}')

# Generate new customers for year 2024
generate_demo_data(session, num_customers=100, month=11, start_year=2024, end_year=2024)

Added 1000 customers to table: CUSTOMER_LIFE_TIME_VALUE
Added 1000 customers to table: CUSTOMER_GENERAL_DATA
Added 1000 customers to table: CUSTOMER_BEHAVIOR_DATA
Added 100 customers to table: CUSTOMER_LIFE_TIME_VALUE
Added 100 customers to table: CUSTOMER_GENERAL_DATA
Added 100 customers to table: CUSTOMER_BEHAVIOR_DATA
------------------------
|"EMAIL"               |
------------------------
|jq7ho1e2K7@KDDoU.com  |
|TynfegHXC5@EK8gf.com  |
|LaU10PhN7X@OCtzz.com  |
|9Wphdu46yz@kmSyh.com  |
|8ktBu2ygwj@U1PcJ.com  |
|QAUnHdGliO@vsOQl.com  |
|9Lnm15ODaM@fO9Ss.com  |
|JVCgRooPMw@sV6cz.com  |
|PZSjb31f9L@Kp66W.com  |
|PW77M5pEsO@epc5D.com  |
------------------------



Create a Spine DataFrame with the new customers, which will be used to retrieve the features we have for them.

In [12]:
# Retrieve new customers
new_customers_df = session.table(f'{db_name}.{schema_name}.CUSTOMER_LIFE_TIME_VALUE').filter(snow_funcs.col('YEAR_MONTH')=='202411').select('EMAIL')
new_customers_df.show()

------------------------
|"EMAIL"               |
------------------------
|jq7ho1e2K7@KDDoU.com  |
|TynfegHXC5@EK8gf.com  |
|LaU10PhN7X@OCtzz.com  |
|9Wphdu46yz@kmSyh.com  |
|8ktBu2ygwj@U1PcJ.com  |
|QAUnHdGliO@vsOQl.com  |
|9Lnm15ODaM@fO9Ss.com  |
|JVCgRooPMw@sV6cz.com  |
|PZSjb31f9L@Kp66W.com  |
|PW77M5pEsO@epc5D.com  |
------------------------
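
The `YEAR_MONTH` filter above is written as the literal `'202411'`. As a minimal sketch, assuming the column always holds a six-character `YYYYMM` string (this is what the filter value suggests, not something confirmed by the table definition), a small helper can build that key from the same year and month values passed to `generate_demo_data`. The name `year_month_key` is hypothetical.

```python
def year_month_key(year: int, month: int) -> str:
    """Build a 'YYYYMM' key such as '202411' (assumed YEAR_MONTH format)."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be in 1..12, got {month}")
    return f"{year:04d}{month:02d}"


print(year_month_key(2024, 11))  # 202411
```

In the notebook, the result could replace the hard-coded string in the `filter(...)` call, so the month only has to be changed in one place.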



Connect to the Feature Store

In [11]:
# Connect to Feature Store
fs = FeatureStore(
    session=session, 
    database=db_name, 
    name=fs_schema_name, 
    default_warehouse=wh_name,
    creation_mode=CreationMode.FAIL_IF_NOT_EXIST,
)

Get the feature views that have the features we want to use

In [13]:
cust_fv = fs.get_feature_view(name="CUSTOMER_GENERAL_DATA_FEATURES",
                                version="V1")
behavior_fv = fs.get_feature_view(name="CUSTOMER_BEHAVIOR_DATA_FEATURES",
                                version="V1")

Retrieve features using the Spine DataFrame (it can take up to a minute until values appear).  
The assumption is that you only have the unique ID `EMAIL` and need to retrieve the features in order to score with the model.

In [14]:
new_customers_features = fs.retrieve_feature_values(new_customers_df, features=[cust_fv, behavior_fv])
new_customers_features.show()

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|"EMAIL"               |"GENDER"  |"MEMBERSHIP_STATUS"  |"MEMBERSHIP_LENGTH_DAYS"  |"AVG_SESSION_LENGTH_MIN"  |"AVG_TIME_ON_APP_MIN"  |"AVG_TIME_ON_WEBSITE_MIN"  |"APP_PRIMARY"  |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|jq7ho1e2K7@KDDoU.com  |MALE      |BRONZE               |204                       |10.6627                   |2.6627                 |8.6627                     |0              |
|TynfegHXC5@EK8gf.com  |MALE      |DIAMOND              |193                       |NULL                      |9.9401                 |9.9401                     |0              |
|LaU10PhN7X@OCtzz.com  |FEMALE    |BASIC                |255                       |NULL            

Connect to the Model Registry

In [15]:
ml_reg = Registry(session=session, database_name=db_name, schema_name=mr_schema_name, options={"enable_monitoring": True})
ml_reg.show_models()

Unnamed: 0,created_on,name,model_type,database_name,schema_name,comment,owner,default_version_name,versions,aliases
0,2024-09-12 04:25:51.307000-07:00,CUSTOMER_LTV_MODEL,USER_MODEL,SNOWPARK_DEMO_DB,SIMPLE_MR_SCHEMA,,SYSADMIN,MY_FIRST_MODEL_VERSION,"[""MY_FIRST_MODEL_VERSION""]","{""DEFAULT"":""MY_FIRST_MODEL_VERSION"",""FIRST"":""M..."


Get a reference to the default version of the model

In [None]:
registered_model = ml_reg.get_model("CUSTOMER_LTV_MODEL").default

Check which functions we can use and the inputs/outputs for them

In [17]:
registered_model.show_functions()

[{'name': 'PREDICT',
  'target_method': 'predict',
  'target_method_function_type': 'FUNCTION',
  'signature': ModelSignature(
                      inputs=[
                          FeatureSpec(dtype=DataType.STRING, name='GENDER'),
  		FeatureSpec(dtype=DataType.STRING, name='MEMBERSHIP_STATUS'),
  		FeatureSpec(dtype=DataType.INT16, name='MEMBERSHIP_LENGTH_DAYS'),
  		FeatureSpec(dtype=DataType.DOUBLE, name='AVG_SESSION_LENGTH_MIN'),
  		FeatureSpec(dtype=DataType.DOUBLE, name='AVG_TIME_ON_APP_MIN'),
  		FeatureSpec(dtype=DataType.DOUBLE, name='AVG_TIME_ON_WEBSITE_MIN'),
  		FeatureSpec(dtype=DataType.INT8, name='APP_PRIMARY')
                      ],
                      outputs=[
                          FeatureSpec(dtype=DataType.INT8, name='GENDER_OHE_FEMALE'),
  		FeatureSpec(dtype=DataType.INT8, name='GENDER_OHE_MALE'),
  		FeatureSpec(dtype=DataType.DOUBLE, name='MEMBERSHIP_STATUS_OE'),
  		FeatureSpec(dtype=DataType.INT16, name='MEMBERSHIP_LENGTH_DAYS'),
  		FeatureSpec(d
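
The input names listed in the signature above can be compared against the columns returned by the Feature Store before calling the model. This is a plain-Python sketch: `missing_inputs` is a hypothetical helper, and the names are copied from the `PREDICT` signature and the `retrieve_feature_values` output shown earlier. In the notebook, the available names would presumably come from `new_customers_features.columns`.

```python
# Input names copied from the PREDICT signature shown above.
MODEL_INPUTS = [
    "GENDER",
    "MEMBERSHIP_STATUS",
    "MEMBERSHIP_LENGTH_DAYS",
    "AVG_SESSION_LENGTH_MIN",
    "AVG_TIME_ON_APP_MIN",
    "AVG_TIME_ON_WEBSITE_MIN",
    "APP_PRIMARY",
]


def missing_inputs(available_columns, required=MODEL_INPUTS):
    """Return the required input names that are absent from available_columns."""
    # Strip surrounding double quotes so '"GENDER"' and 'GENDER' compare equal.
    available = {name.strip('"') for name in available_columns}
    return [name for name in required if name not in available]


# Columns from the retrieve_feature_values output (EMAIL is the join key).
feature_columns = ["EMAIL"] + MODEL_INPUTS
print(missing_inputs(feature_columns))  # []
```

An empty list means every declared input is present; any names returned are the ones the feature views did not supply.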

Create predictions from the registered model using the retrieved features; they are saved as a table in Snowflake further down. First, load the pre-processing pipeline from the stage.

In [None]:
output_stream = session.file.get_stream(f"@{stage_name}/pre_processing.joblib")
pipline_preprocessing = joblib.load(output_stream)

Apply the pre-processing pipeline to the data and get predictions

In [18]:
new_customers_features_pre = pipline_preprocessing.transform(new_customers_features)
new_predictions = registered_model.run(new_customers_features_pre, function_name='predict')

new_predictions.show()

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|"EMAIL"               |"GENDER"  |"MEMBERSHIP_STATUS"  |"AVG_SESSION_LENGTH_MIN"  |"AVG_TIME_ON_APP_MIN"  |"AVG_TIME_ON_WEBSITE_MIN"  |"GENDER_OHE_FEMALE"  |"GENDER_OHE_MALE"  |"MEMBERSHIP_STATUS_OE"  |"MEMBERSHIP_LENGTH_DAYS"  |"AVG_SESSION_LENGTH_MIN_IMP"  |"AVG_TIME_ON_APP_MIN_IMP"  |"AVG_TIME_ON_WEBSITE_MIN_IMP"  |"APP_PRIMARY"  |"LIFE_TIME_VALUE_PREDICTION"  |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Save the predictions into a table so we can use them with the model monitor. Since we do not capture the real LTV in this demo, we will fake it.

In [None]:
new_predictions = new_predictions.with_column("PREDICTION_TIMESTAMP"
                                                , snow_funcs.to_timestamp(snow_funcs.lit("2024-12-01"), snow_funcs.lit("YYYY-MM-DD")))
new_predictions = new_predictions.with_column_renamed('"output_feature_0"' 
                                                      , "LIFE_TIME_VALUE_PREDICTION")
new_predictions = new_predictions.with_column("LIFE_TIME_VALUE" 
                                              , snow_funcs.col("LIFE_TIME_VALUE_PREDICTION") * snow_funcs.uniform(snow_funcs.lit(0.98), snow_funcs.lit(1.05), snow_funcs.random()))

new_predictions.write.save_as_table(f"{db_name}.{schema_name}.MY_PREDICTIONS", mode="overwrite")
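
To see what the faked `LIFE_TIME_VALUE` column looks like, the same idea can be reproduced locally. This is only an illustration in plain Python using `random.uniform`, not the Snowflake `UNIFORM` function, and the prediction values are made up. The helpers `fake_actuals` and `max_relative_deviation` are hypothetical names. The sketch shows that scaling by a factor in [0.98, 1.05] keeps every faked actual within 5% of its prediction.

```python
import random


def fake_actuals(predictions, low, high, seed=None):
    """Scale each prediction by a random factor drawn uniformly from [low, high]."""
    rng = random.Random(seed)
    return [p * rng.uniform(low, high) for p in predictions]


def max_relative_deviation(predictions, actuals):
    """Largest |actual - prediction| / prediction over all rows."""
    return max(abs(a - p) / p for p, a in zip(predictions, actuals))


# Made-up predictions, purely for illustration.
predictions = [250.0, 410.5, 98.2, 530.0]
actuals = fake_actuals(predictions, 0.98, 1.05, seed=42)

# With factors in [0.98, 1.05] no actual is more than 5% away from its prediction.
print(max_relative_deviation(predictions, actuals) <= 0.05 + 1e-9)  # True
```

The second batch later in this notebook uses a wider range (0.75 to 1.11), so its deviations can reach 25%, which is what makes the change visible in the monitor.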


Check the table using SQL

In [None]:
select * from SNOWPARK_DEMO_DB.SIMPLE_ML_SCHEMA.MY_PREDICTIONS;

Create a model monitor. We need to provide two configurations: one for the monitor itself (name, warehouse, etc.) and one for the data it will use.

In [None]:
-- Using SQL until the Python way works
CREATE MODEL MONITOR SNOWPARK_DEMO_DB.SIMPLE_MR_SCHEMA.CUSTOMER_LTV_MONITOR
                WITH
                    MODEL=SNOWPARK_DEMO_DB.SIMPLE_MR_SCHEMA.CUSTOMER_LTV_MODEL
                    VERSION='{{registered_model.version_name}}'
                    FUNCTION='predict'
                    WAREHOUSE=SIMPLE_ML_WH
                    SOURCE=SNOWPARK_DEMO_DB.SIMPLE_ML_SCHEMA.MY_PREDICTIONS
                    ID_COLUMNS=('EMAIL')
                    PREDICTION_SCORE_COLUMNS=('LIFE_TIME_VALUE_PREDICTION')
                    PREDICTION_CLASS_COLUMNS=()
                    ACTUAL_SCORE_COLUMNS=('LIFE_TIME_VALUE')
                    ACTUAL_CLASS_COLUMNS=()
                    TIMESTAMP_COLUMN='PREDICTION_TIMESTAMP'
                    REFRESH_INTERVAL='1 minute'
                    AGGREGATION_WINDOW='1 day'
                    BASELINE=SNOWPARK_DEMO_DB.SIMPLE_ML_SCHEMA.CUSTOMER_LTV_BASELINE;
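
In the SQL cell above, `{{registered_model.version_name}}` appears to be filled in from the Python variable of the same name. Outside a SQL cell, the same statement could be assembled in Python and passed to `session.sql(...)`. The sketch below only builds the string, reusing the clauses of the statement above as they are. `build_monitor_sql` and its parameters are hypothetical names, and nothing here is validated against Snowflake.

```python
def build_monitor_sql(monitor, model, version, warehouse, source, baseline):
    """Assemble the CREATE MODEL MONITOR statement used in this notebook."""
    return f"""CREATE MODEL MONITOR {monitor}
    WITH
        MODEL={model}
        VERSION='{version}'
        FUNCTION='predict'
        WAREHOUSE={warehouse}
        SOURCE={source}
        ID_COLUMNS=('EMAIL')
        PREDICTION_SCORE_COLUMNS=('LIFE_TIME_VALUE_PREDICTION')
        PREDICTION_CLASS_COLUMNS=()
        ACTUAL_SCORE_COLUMNS=('LIFE_TIME_VALUE')
        ACTUAL_CLASS_COLUMNS=()
        TIMESTAMP_COLUMN='PREDICTION_TIMESTAMP'
        REFRESH_INTERVAL='1 minute'
        AGGREGATION_WINDOW='1 day'
        BASELINE={baseline}"""


sql = build_monitor_sql(
    monitor="SNOWPARK_DEMO_DB.SIMPLE_MR_SCHEMA.CUSTOMER_LTV_MONITOR",
    model="SNOWPARK_DEMO_DB.SIMPLE_MR_SCHEMA.CUSTOMER_LTV_MODEL",
    version="MY_FIRST_MODEL_VERSION",  # default version name from show_models()
    warehouse="SIMPLE_ML_WH",
    source="SNOWPARK_DEMO_DB.SIMPLE_ML_SCHEMA.MY_PREDICTIONS",
    baseline="SNOWPARK_DEMO_DB.SIMPLE_ML_SCHEMA.CUSTOMER_LTV_BASELINE",
)
print(sql)
# In the notebook this could then be run with: session.sql(sql).collect()
```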

In [None]:
# Does not work with version 1.7.2; use the SQL above
monitor_config = model_monitor_config.ModelMonitorConfig(
            model_version=registered_model,
            model_function_name="predict",
            background_compute_warehouse_name=wh_name,
            refresh_interval = "1 minute" # For demo purposes; in real life this should reflect how often we get the actuals
        )

table_config = model_monitor_config.ModelMonitorSourceConfig(
            source=f"{db_name}.{schema_name}.MY_PREDICTIONS",
            id_columns=["EMAIL"],
            timestamp_column="PREDICTION_TIMESTAMP",
            prediction_score_columns=["LIFE_TIME_VALUE_PREDICTION"],
            actual_score_columns=["LIFE_TIME_VALUE"],
            baseline=f"{db_name}.{schema_name}.CUSTOMER_LTV_BASELINE"
        )

my_model_monitor = ml_reg.add_monitor(name="customer_ltv_monitor"
                    , source_config = table_config
                    , model_monitor_config = monitor_config
                    )


Let's generate some more customers and run predictions on them as well, faking another prediction date so that we can see something in our model monitoring dashboard.

In [None]:
generate_demo_data(session, num_customers=100, month=12, start_year=2024, end_year=2024)

**Make sure you wait at least 1 minute before executing the next cell so the features are populated...**
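
Instead of waiting a fixed minute by hand, the wait can be expressed as a polling loop. The following is a generic sketch, not part of the Snowflake API: `wait_until` is a hypothetical helper, and the `fake_check` function stands in for a real readiness check. In the notebook, the check might count rows whose feature columns are no longer NULL; the exact condition depends on how the feature views refresh and is not shown in this demo.

```python
import time


def wait_until(check, timeout_s=120.0, poll_s=5.0):
    """Call check() every poll_s seconds until it returns True or timeout_s elapses.

    Returns True if check() succeeded, False if the timeout was reached.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


# Stand-in for a real readiness check: succeeds on the third call.
calls = {"n": 0}


def fake_check():
    calls["n"] += 1
    return calls["n"] >= 3


print(wait_until(fake_check, timeout_s=1.0, poll_s=0.01))  # True
```

A `False` return value signals that the features did not appear in time, which is a clearer failure than scoring rows that are still empty.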

In [None]:
# Retrieve new customers
newer_customers_df = session.table(f'{db_name}.{schema_name}.CUSTOMER_LIFE_TIME_VALUE').filter(snow_funcs.col('YEAR_MONTH')=='202412').select('EMAIL')
newer_customers_features = fs.retrieve_feature_values(newer_customers_df, features=[cust_fv, behavior_fv])

newer_customers_features_pre = pipline_preprocessing.transform(newer_customers_features)
newer_predictions = registered_model.run(newer_customers_features_pre, function_name='predict')

newer_predictions.show()

Append the new predictions to the MY_PREDICTIONS table that we use for our model monitor. We will fake the actual column here as well.

**You have to wait at least 1 minute before this data is used in the monitor**

In [None]:
newer_predictions = newer_predictions.with_column("PREDICTION_TIMESTAMP", snow_funcs.to_timestamp(snow_funcs.lit("2025-01-01"), snow_funcs.lit("YYYY-MM-DD")))
newer_predictions = newer_predictions.with_column_renamed('"output_feature_0"' , "LIFE_TIME_VALUE_PREDICTION")
newer_predictions = newer_predictions.with_column("LIFE_TIME_VALUE" 
                                              , snow_funcs.col("LIFE_TIME_VALUE_PREDICTION") * snow_funcs.uniform(snow_funcs.lit(0.75), snow_funcs.lit(1.11), snow_funcs.random()))

newer_predictions.write.save_as_table(f"{db_name}.{schema_name}.MY_PREDICTIONS", mode="append")


You can now go to Snowsight and navigate to the model to see the monitored values. Be sure to select a date range that covers the dates used to generate the data.
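
The two batches in this notebook are stamped with `PREDICTION_TIMESTAMP` values of 2024-12-01 and 2025-01-01. As a small illustration, a range that encloses both with a one-day margin can be computed as follows. `covering_range` is a hypothetical helper, and the one-day margin is an arbitrary choice for this sketch.

```python
from datetime import date, timedelta


def covering_range(timestamps, margin_days=1):
    """Return (start, end) dates that enclose all timestamps plus a margin."""
    margin = timedelta(days=margin_days)
    return min(timestamps) - margin, max(timestamps) + margin


# PREDICTION_TIMESTAMP values assigned to the two batches in this notebook.
batch_dates = [date(2024, 12, 1), date(2025, 1, 1)]
start, end = covering_range(batch_dates)
print(start, end)  # 2024-11-30 2025-01-02
```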