Version: 0.0.2  Updated date: 07/05/2024
Conda Environment : py-snowpark_df_ml_fs-1.15.0_v1

# Getting Started with Snowflake Feature Store
This use case shows how Snowflake Feature Store (and Model Registry) can be used to maintain and store features, retrieve them for training, and perform micro-batch inference.

In the development (TRAINING) environment we will
- create FeatureViews in the Feature Store that maintain the required customer-behaviour features.
- use these features to train a model, and save the model in the Snowflake Model Registry.
- plot the clusters for the trained model to verify them visually.

In the production (SERVING) environment we will
- re-create the FeatureViews on production data.
- generate an Inference FeatureView that uses the saved model to perform incremental inference.

# Feature Engineering & Model Training

In [9]:
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


#### Notebook Packages

In [10]:
# Python
from time import perf_counter

# ML
import pandas as pd
import xgboost as xgb
from sklearn.compose import ColumnTransformer
from sklearn.metrics import mean_absolute_error, mean_absolute_percentage_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# SNOWFLAKE
# Snowpark
from snowflake.ml.data.data_connector import DataConnector
from snowflake.ml.registry import Registry as ModelRegistry
from snowflake.snowpark import Session, Row
from snowflake.ml.dataset import Dataset
from snowflake.ml.dataset import load_dataset
from snowflake.ml.experiment import ExperimentTracking
from snowflake.ml.experiment.callback.xgboost import SnowflakeXgboostCallback
from snowflake.ml.model.model_signature import infer_signature
from snowflake.snowpark.context import get_active_session

# Custom
from useful_fns import create_SF_Session

### Setup Snowflake connection and database parameters

In [11]:
# Schemas
tpcxai_schema = 'SERVING'

In [12]:
fs_qs_role, tpcxai_database, tpcxai_serving_schema, session, warehouse_env = create_SF_Session(tpcxai_schema, role="ACCOUNTADMIN")

You might have more than one threads sharing the Session object trying to update sql_simplifier_enabled. Updating this while other tasks are running can potentially cause unexpected behavior. Please update the session configuration before starting the threads.



Connection Established with the following parameters:
User                        : JARCHEN
Role                        : "ACCOUNTADMIN"
Database                    : "TPCXAI_SF0001_QUICKSTART_INC"
Schema                      : "SERVING"
Warehouse                   : "TPCXAI_SF0001_QUICKSTART_WH"
Snowflake version           : 9.37.1
Snowpark for Python version : 1.38.0 



In [13]:
# Create compute pool
def create_compute_pool(name: str, instance_family: str, min_nodes: int = 1, max_nodes: int = 10) -> list[Row]:
    query = f"""
        CREATE COMPUTE POOL IF NOT EXISTS {name}
            MIN_NODES = {min_nodes}
            MAX_NODES = {max_nodes}
            INSTANCE_FAMILY = {instance_family}
    """
    return session.sql(query).collect()

compute_pool = "DEMO_POOL_CPU"
create_compute_pool(compute_pool, "CPU_X64_S")

[Row(status='DEMO_POOL_CPU already exists, statement succeeded.')]
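
The helper above interpolates its arguments directly into the SQL text. A minimal sketch of a stricter variant follows, assuming the pool name and instance family are plain unquoted identifiers (a letter or underscore followed by letters, digits, underscores or dollar signs). The function name `build_compute_pool_sql` and the validation rule are illustrative assumptions, not part of this notebook or of Snowpark. The sketch only builds the statement text; the result would still be passed to `session.sql(...).collect()` as before.

```python
import re

# Assumption: only plain unquoted identifiers are accepted
# (a letter or underscore, then letters, digits, underscores or dollar signs).
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")


def build_compute_pool_sql(name: str, instance_family: str,
                           min_nodes: int = 1, max_nodes: int = 10) -> str:
    """Hypothetical helper: validate inputs, then return the CREATE COMPUTE POOL text."""
    for label, value in (("name", name), ("instance_family", instance_family)):
        if not _IDENTIFIER.fullmatch(value):
            raise ValueError(f"{label} is not a plain identifier: {value!r}")
    if not (isinstance(min_nodes, int) and isinstance(max_nodes, int)):
        raise TypeError("min_nodes and max_nodes must be integers")
    if not 1 <= min_nodes <= max_nodes:
        raise ValueError("expected 1 <= min_nodes <= max_nodes")
    return (
        f"CREATE COMPUTE POOL IF NOT EXISTS {name} "
        f"MIN_NODES = {min_nodes} "
        f"MAX_NODES = {max_nodes} "
        f"INSTANCE_FAMILY = {instance_family}"
    )


print(build_compute_pool_sql("DEMO_POOL_CPU", "CPU_X64_S"))
```

Rejecting anything that is not a plain identifier keeps a stray semicolon or quote in a pool name from changing the statement that is executed.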

In [14]:
from snowflake.ml.jobs import remote

@remote("DEMO_POOL_CPU", stage_name="Blah", session=session)
def simple_task(n: int) -> dict:
    """Simple task that runs remotely"""
    import datetime
    result = n * n
    return {
        "input": n,
        "result": result,
        "timestamp": datetime.datetime.now().isoformat(),
        "message": f"Computed {n}^2 = {result} remotely"
    }
train_job = simple_task(
    n=10
)

In [15]:
train_job.wait()
train_job.show_logs()




In [16]:
train_job.result()

{'input': 10,
 'result': 100,
 'timestamp': '2025-12-02T09:16:05.774193',
 'message': 'Computed 10^2 = 100 remotely'}

In [17]:
import time

from snowflake.ml.jobs import submit_file

def test_simple_file_job():
    """Test simple file-based job submission without external dependencies"""
    
    print("=== Testing Simple File Job (No External Dependencies) ===\n")
    
    # Create a very simple Python script using only standard library
    simple_script = '''#!/usr/bin/env python3
import sys
import datetime
import math

def main():
    """Simple computation script using only standard library"""
    print("Starting simple computation job...")
    
    # Get arguments
    number = 42
    for i, arg in enumerate(sys.argv):
        if arg == "--number" and i + 1 < len(sys.argv):
            number = int(sys.argv[i + 1])
    
    # Perform some calculations
    print(f"Processing number: {number}")
    
    results = {
        "input": number,
        "square": number ** 2,
        "cube": number ** 3,
        "square_root": math.sqrt(number) if number >= 0 else "undefined",
        "factorial": math.factorial(number) if number <= 20 and number >= 0 else "too large or negative",
        "timestamp": datetime.datetime.now().isoformat()
    }
    
    print("Results:")
    for key, value in results.items():
        print(f"  {key}: {value}")
    
    print("Job completed successfully!")
    return results

if __name__ == "__main__":
    main()
'''
    
    script_path = "simple_job.py"
    with open(script_path, 'w') as f:
        f.write(simple_script)
    
    print(f"Created simple script: {script_path}")
    print("Submitting simple job (no external dependencies)...")
    
    # Submit without any external dependencies
    job = submit_file(
        str(script_path),
        "SYSTEM_COMPUTE_POOL_CPU",
        stage_name="ML_STAGE",
        session=session,
        args=["--number", "15"]
    )
    
    print(f"Job ID: {job.id}")
    print(f"Initial status: {job.status}")
    
    # Wait for completion
    while job.status in ["PENDING", "RUNNING"]:
        print(f"Status: {job.status}")
        time.sleep(2)
    
    if job.status == "DONE":
        print("✓ Simple job completed successfully!")
        logs = job.get_logs()
        if logs:
            print("Job output:")
            print(logs)
    else:
        print(f"✗ Job failed with status: {job.status}")
        logs = job.get_logs()
        if logs:
            print(f"Error logs: {logs}")
    
    print("\n=== Simple Job Test Complete ===")
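
The script embedded in the cell above finds `--number` by scanning `sys.argv` by hand. As a sketch of an alternative, the same flag and default (42) can be handled with the standard-library `argparse` module, which also rejects non-integer values with a usage message. The function names `parse_number` and `compute` are illustrative and do not appear in the original script; the timestamp field is left out so the result is deterministic.

```python
import argparse
import math


def parse_number(argv: list[str]) -> int:
    """Parse --number from an argument list (defaults to 42, as in the script above)."""
    parser = argparse.ArgumentParser(description="Simple computation job")
    parser.add_argument("--number", type=int, default=42)
    return parser.parse_args(argv).number


def compute(number: int) -> dict:
    """Same calculations as the embedded script, without the timestamp."""
    return {
        "input": number,
        "square": number ** 2,
        "cube": number ** 3,
        "square_root": math.sqrt(number) if number >= 0 else "undefined",
        "factorial": math.factorial(number) if 0 <= number <= 20 else "too large or negative",
    }


# In a real script this would be parse_number(sys.argv[1:]).
results = compute(parse_number(["--number", "15"]))
for key, value in results.items():
    print(f"  {key}: {value}")
```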

In [None]:
test_simple_file_job()

=== Testing Simple File Job (No External Dependencies) ===

Created simple script: simple_job.py
Submitting simple job (no external dependencies)...
Job ID: TPCXAI_SF0001_QUICKSTART_INC.SERVING.SIMPLE_JOB_5OS61GZAK6TI
Initial status: PENDING
Status: PENDING
Status: PENDING
Status: PENDING
Status: PENDING
Status: PENDING
Status: PENDING
Status: PENDING
Status: PENDING
Status: PENDING
Status: PENDING
Status: PENDING
Status: PENDING
Status: PENDING
Status: PENDING
Status: PENDING
Status: PENDING
Status: PENDING
Status: PENDING
Status: RUNNING
Status: RUNNING
Status: RUNNING
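
The polling loop in `test_simple_file_job` has no upper bound, so a job that never leaves `PENDING` or `RUNNING` keeps the cell busy indefinitely. The sketch below shows one way to bound the wait. It is written against a generic status callable so it runs without a Snowflake connection; `wait_for_terminal_status`, its parameters, and the scripted status sequence are assumptions made for illustration, not part of `snowflake.ml.jobs`.

```python
import time
from typing import Callable, Iterable

# The two states the notebook's loop treats as "still in progress".
ACTIVE_STATES = frozenset({"PENDING", "RUNNING"})


def wait_for_terminal_status(get_status: Callable[[], str],
                             timeout_s: float = 600.0,
                             poll_interval_s: float = 2.0,
                             active_states: Iterable[str] = ACTIVE_STATES) -> str:
    """Poll get_status() until it leaves the active states or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    active = set(active_states)
    last = None
    while True:
        status = get_status()
        if status != last:
            # Print only on change, so a long PENDING phase yields one line.
            print(f"Status: {status}")
            last = status
        if status not in active:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status} after {timeout_s} s")
        time.sleep(poll_interval_s)


# Stand-in for a real job: a scripted sequence of statuses.
_fake_statuses = iter(["PENDING", "PENDING", "RUNNING", "DONE"])
final = wait_for_terminal_status(lambda: next(_fake_statuses),
                                 timeout_s=5, poll_interval_s=0.01)
print(f"Final status: {final}")
```

With the job object from the cell above, the callable would be `lambda: job.status`. The notebook also calls `train_job.wait()` in an earlier cell, which is the simpler choice when no per-poll output is needed.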


## CLEAN UP

In [None]:
# session.close()

In [None]:
from datetime import datetime
from zoneinfo import ZoneInfo
formatted_time = datetime.now(ZoneInfo("Australia/Melbourne")).strftime("%A, %B %d, %Y %I:%M:%S %p %Z")

print(f"The last run time in Melbourne is: {formatted_time}")