# Feature Engineering with Snowflake Feature Store

This notebook derives features from raw continuous customer data using **Snowflake Feature Store**.

The Feature Store provides:
- Centralized feature definitions and management
- Automatic feature refresh on schedule
- Point-in-time correct features for training
- Feature lineage and discovery
- Consistent features for training and inference

**Prerequisites**: Run `generate_continuous_data.ipynb` first to create raw data tables.

In [None]:
import numpy as np
import pandas as pd
from datetime import datetime, timedelta
from snowflake.snowpark.context import get_active_session
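from snowflake.snowpark import functions as F, Window
from snowflake.snowpark.version import VERSION as SNOWPARK_VERSION
from snowflake.ml.feature_store import (
    FeatureStore,
    FeatureView,
    Entity,
    CreationMode
)

session = get_active_session()
# Report the installed Snowpark client library version (VERSION is a tuple of ints)
print(f"Snowpark version: {'.'.join(str(v) for v in SNOWPARK_VERSION if v is not None)}")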

## Configuration

In [None]:
DATABASE = 'ML_DEMO'
SCHEMA = 'PUBLIC'
FEATURE_STORE_NAME = 'CLV_FEATURE_STORE'
WAREHOUSE = 'ML_DEMO_WH'

session.use_database(DATABASE)
session.use_schema(SCHEMA)
session.use_warehouse(WAREHOUSE)

OBSERVATION_DATE = datetime(2024, 6, 30)

print(f"Database: {DATABASE}")
print(f"Schema: {SCHEMA}")
print(f"Feature Store: {FEATURE_STORE_NAME}")
print(f"Warehouse: {WAREHOUSE}")
print(f"Observation Date: {OBSERVATION_DATE}")

## Create or Connect to Feature Store

A feature store in Snowflake is a schema that contains feature views (backed by dynamic tables or views).

In [None]:
fs = FeatureStore(
    session=session,
    database=DATABASE,
    name=FEATURE_STORE_NAME,
    default_warehouse=WAREHOUSE,
    creation_mode=CreationMode.CREATE_IF_NOT_EXIST
)

print(f"✓ Feature Store ready: {DATABASE}.{FEATURE_STORE_NAME}")

## Register Entity

Entities organize features by subject. Here we create a CUSTOMER entity with CUSTOMER_ID as the join key.

In [None]:
try:
    customer_entity = fs.get_entity("CUSTOMER")
    print("✓ CUSTOMER entity already exists")
except Exception:  # entity not registered yet; avoid a bare except that also traps KeyboardInterrupt
    customer_entity = Entity(
        name="CUSTOMER",
        join_keys=["CUSTOMER_ID"],
        desc="Customer entity for CLV prediction"
    )
    fs.register_entity(customer_entity)
    print("✓ Created CUSTOMER entity")

print(f"  Join keys: {customer_entity.join_keys}")

## Feature View 1: RFM Features

**RFM** (Recency, Frequency, Monetary) features capture customer purchase behavior:
- **Recency**: Days since last purchase
- **Frequency**: Total number of purchases  
- **Monetary**: Total and average spending
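
To make the three definitions concrete, here is a minimal plain-Python sketch that computes them for a single customer. The sample transactions and the `rfm` helper are hypothetical and only illustrate the arithmetic; the observation date matches the one set in the Configuration section.

```python
from datetime import date

# Hypothetical sample data: (transaction_date, amount) for one customer
transactions = [
    (date(2024, 1, 15), 40.0),
    (date(2024, 3, 1), 25.0),
    (date(2024, 6, 20), 55.0),
]
observation_date = date(2024, 6, 30)


def rfm(transactions, observation_date):
    """Return recency, frequency, monetary, and tenure values for one customer."""
    dates = [d for d, _ in transactions]
    amounts = [a for _, a in transactions]
    return {
        "recency_days": (observation_date - max(dates)).days,  # days since last purchase
        "frequency": len(transactions),                        # number of purchases
        "monetary_total": sum(amounts),
        "monetary_avg": sum(amounts) / len(amounts),
        "tenure_days": (observation_date - min(dates)).days,   # days since first purchase
    }


result = rfm(transactions, observation_date)
print(result)
```

The Snowpark cell that follows performs the same aggregation per `CUSTOMER_ID` inside Snowflake rather than in local Python.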

In [None]:
transactions_df = session.table(f"{DATABASE}.{SCHEMA}.CONTINUOUS_TRANSACTIONS")

rfm_df = transactions_df.group_by("CUSTOMER_ID").agg([
    F.datediff(
        "day",
        F.max("TRANSACTION_DATE"),
        F.lit(OBSERVATION_DATE)
    ).alias("RECENCY_DAYS"),
    F.count("TRANSACTION_ID").alias("FREQUENCY"),
    F.sum("AMOUNT").alias("MONETARY_TOTAL"),
    F.avg("AMOUNT").alias("MONETARY_AVG"),
    F.min("TRANSACTION_DATE").alias("FIRST_PURCHASE_DATE"),
    F.max("TRANSACTION_DATE").alias("LAST_PURCHASE_DATE")
])

rfm_df = rfm_df.with_column(
    "CUSTOMER_TENURE_DAYS",
    F.datediff("day", F.col("FIRST_PURCHASE_DATE"), F.lit(OBSERVATION_DATE))
)

print("RFM feature DataFrame:")
rfm_df.show(5)

In [None]:
rfm_fv = FeatureView(
    name="RFM_FEATURES",
    entities=[customer_entity],
    feature_df=rfm_df,
    refresh_freq="1 day",
    desc="Recency, Frequency, Monetary features from transaction history"
).attach_feature_desc({
    "RECENCY_DAYS": "Days since last purchase (lower = more recent)",
    "FREQUENCY": "Total number of purchases (count of transactions)",
    "MONETARY_TOTAL": "Total amount spent across all transactions",
    "MONETARY_AVG": "Average transaction amount",
    "CUSTOMER_TENURE_DAYS": "Days since first purchase (customer age)"
})

rfm_fv_registered = fs.register_feature_view(
    feature_view=rfm_fv,
    version="1.0",
    block=True
)

print("✓ Registered RFM_FEATURES feature view")

## Feature View 2: Purchase Pattern Features

Behavioral features derived from transaction history:
- Product category diversity
- Total items purchased
- Recent activity windows (30 and 90 days): spending and transaction counts
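
The recent activity windows are conditional sums: a transaction contributes only if it is dated on or after the window start. The sketch below shows that logic in plain Python under stated assumptions; the sample transactions and the `window_totals` helper are hypothetical.

```python
from datetime import date, timedelta

observation_date = date(2024, 6, 30)

# Hypothetical sample data: (transaction_date, amount)
transactions = [
    (date(2024, 6, 25), 30.0),  # inside both windows
    (date(2024, 5, 31), 20.0),  # exactly 30 days before: still inside the 30-day window
    (date(2024, 4, 15), 50.0),  # inside the 90-day window only
    (date(2024, 1, 10), 80.0),  # outside both windows
]


def window_totals(transactions, observation_date, days):
    """Sum amounts and count transactions dated on or after the window start."""
    start = observation_date - timedelta(days=days)
    in_window = [amount for d, amount in transactions if d >= start]
    return sum(in_window), len(in_window)


amount_30d, count_30d = window_totals(transactions, observation_date, 30)
amount_90d, count_90d = window_totals(transactions, observation_date, 90)
print(amount_30d, count_30d, amount_90d, count_90d)
```

The comparison only bounds the window from below, so any transaction dated after the observation date would also be counted. If the raw table can contain such rows, an upper bound may be worth adding.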

In [None]:
# Only the transactions table is needed for these aggregates
transactions_df = session.table(f"{DATABASE}.{SCHEMA}.CONTINUOUS_TRANSACTIONS")

purchase_patterns_df = transactions_df.group_by("CUSTOMER_ID").agg([
    F.count_distinct("PRODUCT_CATEGORY").alias("UNIQUE_CATEGORIES_PURCHASED"),
    F.sum("QUANTITY").alias("TOTAL_ITEMS_PURCHASED"),
    F.sum(
        F.when(
            F.col("TRANSACTION_DATE") >= F.dateadd("day", F.lit(-30), F.lit(OBSERVATION_DATE)),
            F.col("AMOUNT")
        ).otherwise(F.lit(0))
    ).alias("RECENT_30D_AMOUNT"),
    F.sum(
        F.when(
            F.col("TRANSACTION_DATE") >= F.dateadd("day", F.lit(-30), F.lit(OBSERVATION_DATE)),
            F.lit(1)
        ).otherwise(F.lit(0))
    ).alias("RECENT_30D_COUNT"),
    F.sum(
        F.when(
            F.col("TRANSACTION_DATE") >= F.dateadd("day", F.lit(-90), F.lit(OBSERVATION_DATE)),
            F.col("AMOUNT")
        ).otherwise(F.lit(0))
    ).alias("RECENT_90D_AMOUNT"),
    F.sum(
        F.when(
            F.col("TRANSACTION_DATE") >= F.dateadd("day", F.lit(-90), F.lit(OBSERVATION_DATE)),
            F.lit(1)
        ).otherwise(F.lit(0))
    ).alias("RECENT_90D_COUNT")
])

print("Purchase patterns feature DataFrame:")
purchase_patterns_df.show(5)

In [None]:
purchase_patterns_fv = FeatureView(
    name="PURCHASE_PATTERNS",
    entities=[customer_entity],
    feature_df=purchase_patterns_df,
    refresh_freq="1 day",
    desc="Purchase behavior patterns and trends"
).attach_feature_desc({
    "UNIQUE_CATEGORIES_PURCHASED": "Number of distinct product categories purchased",
    "TOTAL_ITEMS_PURCHASED": "Total quantity of items purchased",
    "RECENT_30D_AMOUNT": "Total spending in last 30 days",
    "RECENT_30D_COUNT": "Number of transactions in last 30 days",
    "RECENT_90D_AMOUNT": "Total spending in last 90 days",
    "RECENT_90D_COUNT": "Number of transactions in last 90 days"
})

purchase_patterns_fv_registered = fs.register_feature_view(
    feature_view=purchase_patterns_fv,
    version="1.0",
    block=True
)

print("✓ Registered PURCHASE_PATTERNS feature view")

## Feature View 3: Engagement Features

Non-purchase engagement signals:
- Website visits
- Email interactions
- Support tickets
- Product views and cart adds
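
Each engagement feature is a count of events of one type, and `EMAIL_ENGAGEMENT_RATE` is a ratio that must tolerate customers with no email opens. The following plain-Python sketch illustrates both ideas; the event list and the local `div0` helper are hypothetical stand-ins that mimic the zero-on-zero-divisor behavior of Snowflake's `DIV0`.

```python
from collections import Counter

# Hypothetical event log for one customer
events = [
    "website_visit",
    "email_open",
    "email_open",
    "email_click",
    "product_view",
    "cart_add",
]


def div0(numerator, denominator):
    """Return 0 instead of raising when the denominator is 0."""
    return 0.0 if denominator == 0 else numerator / denominator


counts = Counter(events)  # missing event types count as 0
email_engagement_rate = div0(counts["email_click"], counts["email_open"])
rate_without_opens = div0(0, 0)
print(counts["support_ticket"], email_engagement_rate, rate_without_opens)
```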

In [None]:
interactions_df = session.table(f"{DATABASE}.{SCHEMA}.CONTINUOUS_INTERACTIONS")

engagement_df = interactions_df.group_by("CUSTOMER_ID").agg([
    F.count("INTERACTION_ID").alias("TOTAL_INTERACTIONS"),
    F.sum(
        F.when(F.col("EVENT_TYPE") == F.lit("website_visit"), F.lit(1))
        .otherwise(F.lit(0))
    ).alias("WEBSITE_VISITS"),
    F.sum(
        F.when(F.col("EVENT_TYPE") == F.lit("email_open"), F.lit(1))
        .otherwise(F.lit(0))
    ).alias("EMAIL_OPENS"),
    F.sum(
        F.when(F.col("EVENT_TYPE") == F.lit("email_click"), F.lit(1))
        .otherwise(F.lit(0))
    ).alias("EMAIL_CLICKS"),
    F.sum(
        F.when(F.col("EVENT_TYPE") == F.lit("support_ticket"), F.lit(1))
        .otherwise(F.lit(0))
    ).alias("SUPPORT_TICKETS"),
    F.sum(
        F.when(F.col("EVENT_TYPE") == F.lit("product_view"), F.lit(1))
        .otherwise(F.lit(0))
    ).alias("PRODUCT_VIEWS"),
    F.sum(
        F.when(F.col("EVENT_TYPE") == F.lit("cart_add"), F.lit(1))
        .otherwise(F.lit(0))
    ).alias("CART_ADDS")
])

engagement_df = engagement_df.with_column(
    "EMAIL_ENGAGEMENT_RATE",
    F.div0(F.col("EMAIL_CLICKS"), F.col("EMAIL_OPENS"))
)

print("Engagement feature DataFrame:")
engagement_df.show(5)

In [None]:
engagement_fv = FeatureView(
    name="ENGAGEMENT_FEATURES",
    entities=[customer_entity],
    feature_df=engagement_df,
    refresh_freq="1 day",
    desc="Customer engagement and interaction features"
).attach_feature_desc({
    "TOTAL_INTERACTIONS": "Total count of all customer interactions",
    "WEBSITE_VISITS": "Number of website visits",
    "EMAIL_OPENS": "Number of emails opened",
    "EMAIL_CLICKS": "Number of email links clicked",
    "SUPPORT_TICKETS": "Number of support tickets created",
    "PRODUCT_VIEWS": "Number of product views",
    "CART_ADDS": "Number of items added to cart",
    "EMAIL_ENGAGEMENT_RATE": "Email click-through rate (clicks / opens)"
})

engagement_fv_registered = fs.register_feature_view(
    feature_view=engagement_fv,
    version="1.0",
    block=True
)

print("✓ Registered ENGAGEMENT_FEATURES feature view")

## Generate Training Dataset

Create a training dataset that combines:
- Customer profiles (spine)
- All registered features from Feature Store

The target variable (FUTURE_12M_LTV) is added in the next section.
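
Conceptually, combining a spine with feature views is a keyed join on `CUSTOMER_ID`. The sketch below assumes left-join semantics, meaning every spine row is kept and features missing for a customer become nulls. The in-memory tables and the `join_features` helper are hypothetical and do not reflect how the Feature Store is implemented.

```python
# Hypothetical spine and feature tables keyed by CUSTOMER_ID
spine = [{"CUSTOMER_ID": 1}, {"CUSTOMER_ID": 2}, {"CUSTOMER_ID": 3}]
rfm_features = {1: {"FREQUENCY": 4}, 2: {"FREQUENCY": 1}}
engagement_features = {1: {"TOTAL_INTERACTIONS": 12}, 3: {"TOTAL_INTERACTIONS": 5}}


def join_features(spine, feature_tables):
    """Left-join each feature table onto the spine by CUSTOMER_ID."""
    columns = sorted(
        {name for table in feature_tables for row in table.values() for name in row}
    )
    rows = []
    for spine_row in spine:
        row = dict(spine_row)
        for name in columns:
            row[name] = None  # default when no feature row matches this customer
        for table in feature_tables:
            row.update(table.get(spine_row["CUSTOMER_ID"], {}))
        rows.append(row)
    return rows


training_rows = join_features(spine, [rfm_features, engagement_features])
for row in training_rows:
    print(row)
```

Under this assumption, customers with no transactions or no interactions still appear in the training set, with null feature values that downstream steps need to handle.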

In [None]:
spine_df = session.table(f"{DATABASE}.{SCHEMA}.CONTINUOUS_CUSTOMERS_PROFILE")

print(f"Spine DataFrame: {spine_df.count()} customers")
spine_df.show(5)

In [None]:
training_df = fs.generate_training_set(
    spine_df=spine_df,
    features=[
        rfm_fv_registered,
        purchase_patterns_fv_registered,
        engagement_fv_registered
    ],
    save_as="CONTINUOUS_TRAINING_DATA"
)

print(f"\n✓ Training dataset created: {training_df.count()} rows")
print(f"  Columns: {len(training_df.columns)}")
print(f"\nColumn names:")
for col in training_df.columns:
    print(f"  - {col}")

## Add Target Variable

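FUTURE_12M_LTV is synthesized from the features for this demo: total spend is scaled by recency, frequency, and engagement factors, then multiplied by uniform random noise between 0.7 and 1.3. The result is saved to a new table alongside the features.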
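
The target formula in the next cell can be read as a product of bounded factors. The following plain-Python sketch mirrors its constants so the floor and cap are easy to check by hand. The `synthetic_ltv` function and the sample inputs are hypothetical, and Python's `random` module stands in for Snowflake's random number generator.

```python
import random


def synthetic_ltv(monetary_total, recency_days, frequency, total_interactions, noise=1.0):
    """Compute the demo target: spend scaled by bounded behavioral factors."""
    recency_factor = max(0.5, 1.5 - recency_days / 180)  # floors at 0.5 from 180 days on
    frequency_factor = min(2.0, 1 + frequency / 20)      # caps at 2.0 from 20 purchases on
    engagement_factor = 1 + total_interactions / 500
    return (
        monetary_total * 0.6 * recency_factor * frequency_factor * engagement_factor * noise
    )


# Noise-free value for a mid-range customer: 1000 * 0.6 * 1.0 * 1.5 * 1.5
base = synthetic_ltv(1000.0, 90, 10, 250)

# Stale but very frequent customer: recency floor and frequency cap both apply
capped = synthetic_ltv(1000.0, 365, 40, 0)

# Same mid-range customer with multiplicative noise drawn from [0.7, 1.3]
rng = random.Random(0)
noisy = synthetic_ltv(1000.0, 90, 10, 250, noise=rng.uniform(0.7, 1.3))
print(base, capped, noisy)
```

Because the target is a function of the features plus noise, a model trained on it is relearning this formula rather than predicting observed future spend.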

In [None]:
training_with_target = training_df.with_column(
    "FUTURE_12M_LTV",
    (
        F.col("MONETARY_TOTAL") * F.lit(0.6) *
        F.greatest(F.lit(0.5), (F.lit(1.5) - F.col("RECENCY_DAYS") / F.lit(180))) *
        F.least(F.lit(2.0), (F.lit(1) + F.col("FREQUENCY") / F.lit(20))) *
        (F.lit(1) + F.col("TOTAL_INTERACTIONS") / F.lit(500)) *
        F.uniform(F.lit(0.7), F.lit(1.3), F.random())
    )
)

training_with_target.write.mode("overwrite").save_as_table("CONTINUOUS_TRAINING_DATA_WITH_TARGET")

print("✓ Saved training data with target to CONTINUOUS_TRAINING_DATA_WITH_TARGET")

stats_df = session.table("CONTINUOUS_TRAINING_DATA_WITH_TARGET").select(
    F.avg("FUTURE_12M_LTV").alias("AVG_LTV"),
    F.median("FUTURE_12M_LTV").alias("MEDIAN_LTV"),
    F.min("FUTURE_12M_LTV").alias("MIN_LTV"),
    F.max("FUTURE_12M_LTV").alias("MAX_LTV")
).collect()[0]

print(f"\nTarget variable statistics:")
print(f"  Average: ${stats_df['AVG_LTV']:.2f}")
print(f"  Median:  ${stats_df['MEDIAN_LTV']:.2f}")
print(f"  Min:     ${stats_df['MIN_LTV']:.2f}")
print(f"  Max:     ${stats_df['MAX_LTV']:.2f}")

## List All Feature Views

In [None]:
print("All registered feature views:")
fs.list_feature_views(entity_name="CUSTOMER").show()

## Summary

This notebook demonstrated:

1. **Feature Store Setup**: Created feature store and registered CUSTOMER entity
2. **Feature Views**: Created 3 managed feature views:
   - RFM_FEATURES (recency, frequency, monetary)
   - PURCHASE_PATTERNS (behavioral patterns)
   - ENGAGEMENT_FEATURES (non-purchase interactions)
3. **Training Dataset**: Generated training data combining all features
4. **Benefits**:
   - Centralized feature definitions
   - Automatic refresh (1 day schedule)
   - Reusable for training and inference
   - Feature lineage and discovery

**Next Steps**:
- Use `CONTINUOUS_TRAINING_DATA_WITH_TARGET` table for model training
- Features automatically refresh daily from raw data
- Use same feature views for inference to ensure consistency