# Varo ML Models - Model Registry

This notebook trains ML models for the Varo Intelligence Agent:
- **Transaction Fraud Detection** - Classify transactions as fraud or legitimate
- **Cash Advance Eligibility** - Predict advance repayment success
- **Customer Lifetime Value** - Predict customer LTV

All models are registered to Snowflake Model Registry and can be added as tools to the Intelligence Agent.

## Prerequisites

**Required Packages** (configured automatically):
- `snowflake-ml-python`
- `scikit-learn`

**Database Context:**
- **Database:** VARO_INTELLIGENCE  
- **Schema:** ANALYTICS  
- **Warehouse:** VARO_FEATURE_WH

**Note:** This notebook uses Snowflake Model Registry. Ensure you have appropriate permissions to create and register models.


## Import Required Packages


In [None]:
# Import Python packages
import pandas as pd
import warnings
warnings.filterwarnings('ignore')

# Import Snowpark
from snowflake.snowpark.context import get_active_session
import snowflake.snowpark.functions as F
import snowflake.snowpark.types as T

# Import Snowpark ML
from snowflake.ml.modeling.preprocessing import StandardScaler, OneHotEncoder
from snowflake.ml.modeling.pipeline import Pipeline
from snowflake.ml.modeling.linear_model import LogisticRegression
from snowflake.ml.modeling.ensemble import RandomForestClassifier, GradientBoostingRegressor
from snowflake.ml.modeling.metrics import accuracy_score, mean_absolute_error, mean_squared_error
from snowflake.ml.registry import Registry

print("✅ Packages imported successfully")


## Connect to Snowflake

Get the active session and set the context to the Varo database, schema, and warehouse.


In [None]:
# Get active Snowflake session
session = get_active_session()

# Set context
session.use_database('VARO_INTELLIGENCE')
session.use_schema('ANALYTICS')
session.use_warehouse('VARO_FEATURE_WH')

print(f"✅ Connected - Role: {session.get_current_role()}")
print(f"   Warehouse: {session.get_current_warehouse()}")
print(f"   Database.Schema: {session.get_fully_qualified_current_schema()}")


---
# MODEL 1: Transaction Fraud Detection

Classify transactions as fraudulent or legitimate using customer and transaction features.


### Prepare Fraud Training Data


In [None]:
# Get transaction data with customer features for fraud detection
fraud_df = session.sql("""
SELECT
    t.transaction_id,
    t.customer_id,
    t.amount::FLOAT AS amount,
    t.merchant_category,
    t.transaction_type,
    t.is_international::BOOLEAN AS is_international,
    c.credit_score::FLOAT AS credit_score,
    c.risk_tier,
    a.current_balance::FLOAT AS account_balance,
    -- Target: Is fraud (based on fraud_score threshold)
    (t.fraud_score > 0.7)::BOOLEAN AS is_fraud
FROM RAW.TRANSACTIONS t
LEFT JOIN RAW.CUSTOMERS c ON t.customer_id = c.customer_id
LEFT JOIN RAW.ACCOUNTS a ON t.account_id = a.account_id
WHERE t.transaction_date >= DATEADD('month', -6, CURRENT_DATE())
  AND t.amount > 10
LIMIT 10000
""")

print(f"Fraud detection data: {fraud_df.count()} transactions")
fraud_df.show(5)


### Train Fraud Classification Model


In [None]:
# Train/test split (80/20)
train_fraud, test_fraud = fraud_df.random_split([0.8, 0.2], seed=42)

# Drop ID columns
train_fraud = train_fraud.drop("TRANSACTION_ID", "CUSTOMER_ID")
test_fraud = test_fraud.drop("TRANSACTION_ID", "CUSTOMER_ID")

# Create pipeline with preprocessing and classification
fraud_pipeline = Pipeline([
    ("Encoder", OneHotEncoder(
        input_cols=["MERCHANT_CATEGORY", "TRANSACTION_TYPE", "RISK_TIER"],
        output_cols=["MERCHANT_CATEGORY_ENC", "TRANSACTION_TYPE_ENC", "RISK_TIER_ENC"],
        drop_input_cols=True,
        handle_unknown="ignore"
    )),
    ("Scaler", StandardScaler(
        input_cols=["AMOUNT", "CREDIT_SCORE", "ACCOUNT_BALANCE"],
        output_cols=["AMOUNT_SCALED", "CREDIT_SCORE_SCALED", "ACCOUNT_BALANCE_SCALED"]
    )),
    ("Classifier", RandomForestClassifier(
        label_cols=["IS_FRAUD"],
        output_cols=["FRAUD_PREDICTION"],
        n_estimators=100,
        max_depth=10
    ))
])

# Train model
fraud_pipeline.fit(train_fraud)
print("✅ Fraud detection model trained")
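
To make the two preprocessing steps in the pipeline concrete, here is a minimal pure-Python sketch of what one-hot encoding with `handle_unknown="ignore"` and standard scaling compute. The helper names (`fit_one_hot`, `one_hot`, `fit_scaler`, `scale`) and the sample values are hypothetical and are not part of Snowpark ML. The scaler is assumed to divide by the population standard deviation, as scikit-learn does.

```python
from statistics import mean, pstdev

def fit_one_hot(values):
    """Learn the sorted list of categories seen during training."""
    return sorted(set(values))

def one_hot(categories, value):
    """Encode one value; an unseen category yields all zeros ("ignore")."""
    return [1 if value == c else 0 for c in categories]

def fit_scaler(values):
    """Learn the mean and population standard deviation from training values."""
    return mean(values), pstdev(values)

def scale(params, value):
    """Standardize one value as (value - mean) / std."""
    mu, sigma = params
    return (value - mu) / sigma if sigma else 0.0

# Hypothetical merchant categories seen at training time
categories = fit_one_hot(["GROCERY", "TRAVEL", "GROCERY", "DINING"])
print(categories)
print(one_hot(categories, "TRAVEL"))   # known category
print(one_hot(categories, "CRYPTO"))   # unseen category: every slot is 0

# Hypothetical transaction amounts
params = fit_scaler([10.0, 20.0, 30.0])
print([round(scale(params, v), 3) for v in [10.0, 20.0, 30.0]])
```

Because unseen categories map to an all-zero vector, a new merchant category at inference time does not raise an error; the model simply sees no category signal for that row.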


### Evaluate and Register Fraud Model


In [None]:
# Make predictions on test set
fraud_predictions = fraud_pipeline.predict(test_fraud)

# Calculate metrics
fraud_accuracy = accuracy_score(df=fraud_predictions, y_true_col_names="IS_FRAUD", y_pred_col_names="FRAUD_PREDICTION")
fraud_metrics = {"accuracy": round(fraud_accuracy, 4)}
print(f"Fraud model metrics: {fraud_metrics}")

# Register model
reg = Registry(session)
reg.log_model(
    model=fraud_pipeline,
    model_name="FRAUD_DETECTION_MODEL",
    version_name="V1",
    comment="Predicts transaction fraud using Random Forest based on transaction and customer features",
    metrics=fraud_metrics
)

print("✅ Fraud model registered to Model Registry as FRAUD_DETECTION_MODEL")
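
Accuracy alone can look strong on fraud data, where positive rows are rare. The sketch below is a pure-Python illustration with hypothetical counts (2 fraud rows out of 100) and hypothetical helper names. It shows why precision and recall are worth tracking alongside accuracy; it is not the metric code used by the cell above.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for boolean labels and predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    return tp, fp, fn, tn

def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, and recall, guarding against zero division."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }

# Hypothetical test set: 2 fraudulent rows out of 100
y_true = [True] * 2 + [False] * 98
# A degenerate model that never flags fraud
y_pred = [False] * 100

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

The degenerate model scores 98% accuracy while catching no fraud at all, which is why recall on the positive class is usually the more informative number for this kind of model.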


---
# MODEL 2: Cash Advance Repayment Success

Predict whether cash advances will be repaid successfully.


### Prepare Advance Training Data


In [None]:
# Get cash advance data with customer features
advance_df = session.sql("""
SELECT
    ca.advance_id,
    ca.customer_id,
    ca.advance_amount::FLOAT AS advance_amount,
    ca.fee_amount::FLOAT AS fee_amount,
    ca.eligibility_score::FLOAT AS eligibility_score,
    c.credit_score::FLOAT AS credit_score,
    c.risk_tier,
    c.employment_status,
    -- Count direct deposits
    COUNT(DISTINCT dd.deposit_id)::FLOAT AS deposit_count,
    -- Average deposit amount
    AVG(dd.amount)::FLOAT AS avg_deposit_amount,
    -- Target: Was repaid successfully
    (ca.advance_status = 'REPAID')::BOOLEAN AS was_repaid
FROM RAW.CASH_ADVANCES ca
INNER JOIN RAW.CUSTOMERS c ON ca.customer_id = c.customer_id
INNER JOIN RAW.DIRECT_DEPOSITS dd ON ca.customer_id = dd.customer_id
WHERE ca.advance_date >= DATEADD('month', -12, CURRENT_DATE())
  AND ca.eligibility_score IS NOT NULL
  AND c.credit_score IS NOT NULL
  AND c.risk_tier IS NOT NULL
  AND c.employment_status IS NOT NULL
  AND dd.amount IS NOT NULL
GROUP BY ca.advance_id, ca.customer_id, ca.advance_amount, ca.fee_amount, ca.eligibility_score,
         c.credit_score, c.risk_tier, c.employment_status, ca.advance_status
HAVING AVG(dd.amount) IS NOT NULL
  AND COUNT(DISTINCT dd.deposit_id) > 0
LIMIT 5000
""")

print(f"Advance data: {advance_df.count()} advances")
advance_df.show(5)


### Train Advance Repayment Model


In [None]:
# Split data
train_advance, test_advance = advance_df.random_split([0.8, 0.2], seed=42)

# Drop ID columns
train_advance = train_advance.drop("ADVANCE_ID", "CUSTOMER_ID")
test_advance = test_advance.drop("ADVANCE_ID", "CUSTOMER_ID")

# Create pipeline
advance_pipeline = Pipeline([
    ("Encoder", OneHotEncoder(
        input_cols=["RISK_TIER", "EMPLOYMENT_STATUS"],
        output_cols=["RISK_TIER_ENC", "EMPLOYMENT_STATUS_ENC"],
        drop_input_cols=True,
        handle_unknown="ignore"
    )),
    ("Scaler", StandardScaler(
        input_cols=["ADVANCE_AMOUNT", "FEE_AMOUNT", "ELIGIBILITY_SCORE", "CREDIT_SCORE", "DEPOSIT_COUNT"],
        output_cols=["ADVANCE_AMOUNT_SCALED", "FEE_AMOUNT_SCALED", "ELIGIBILITY_SCORE_SCALED", "CREDIT_SCORE_SCALED", "DEPOSIT_COUNT_SCALED"]
    )),
    ("Classifier", LogisticRegression(
        label_cols=["WAS_REPAID"],
        output_cols=["REPAYMENT_PREDICTION"]
    ))
])

# Train
advance_pipeline.fit(train_advance)
print("✅ Advance repayment model trained")


### Evaluate and Register Advance Model


In [None]:
# Make predictions
advance_predictions = advance_pipeline.predict(test_advance)

# Calculate metrics
advance_accuracy = accuracy_score(df=advance_predictions, y_true_col_names="WAS_REPAID", y_pred_col_names="REPAYMENT_PREDICTION")
advance_metrics = {"accuracy": round(advance_accuracy, 4)}
print(f"Advance model metrics: {advance_metrics}")

# Register model
reg.log_model(
    model=advance_pipeline,
    model_name="ADVANCE_ELIGIBILITY_MODEL",
    version_name="V1",
    comment="Predicts cash advance repayment success using Logistic Regression based on customer creditworthiness and deposit patterns",
    metrics=advance_metrics
)

print("✅ Advance model registered to Model Registry as ADVANCE_ELIGIBILITY_MODEL")


---
# MODEL 3: Customer Lifetime Value Prediction

Predict customer lifetime value using engagement and behavior metrics.


### Prepare LTV Training Data


In [None]:
# Get customer LTV data with features
ltv_df = session.sql("""
SELECT
    c.customer_id,
    c.lifetime_value::FLOAT AS lifetime_value,
    DATEDIFF('month', c.acquisition_date, CURRENT_DATE())::FLOAT AS tenure_months,
    c.credit_score::FLOAT AS credit_score,
    c.risk_tier,
    c.acquisition_channel,
    -- Product count
    COUNT(DISTINCT a.account_id)::FLOAT AS product_count,
    -- Average account balance (handle NULL)
    COALESCE(AVG(a.current_balance), 0)::FLOAT AS avg_account_balance,
    -- Transaction count (last 90 days)
    COUNT(DISTINCT CASE WHEN t.transaction_date >= DATEADD('day', -90, CURRENT_DATE())
                   THEN t.transaction_id END)::FLOAT AS recent_transaction_count,
    -- Has direct deposit
    (COUNT(DISTINCT dd.deposit_id) > 0)::BOOLEAN AS has_direct_deposit
FROM RAW.CUSTOMERS c
LEFT JOIN RAW.ACCOUNTS a ON c.customer_id = a.customer_id
LEFT JOIN RAW.TRANSACTIONS t ON c.customer_id = t.customer_id
LEFT JOIN RAW.DIRECT_DEPOSITS dd ON c.customer_id = dd.customer_id
WHERE c.customer_status = 'ACTIVE'
  AND c.lifetime_value > 0
GROUP BY c.customer_id, c.lifetime_value, c.acquisition_date, c.credit_score, c.risk_tier, c.acquisition_channel
LIMIT 5000
""")

print(f"LTV data: {ltv_df.count()} customers")
ltv_df.show(5)


### Train LTV Regression Model


In [None]:
# Split data
train_ltv, test_ltv = ltv_df.random_split([0.8, 0.2], seed=42)

# Drop CUSTOMER_ID
train_ltv = train_ltv.drop("CUSTOMER_ID")
test_ltv = test_ltv.drop("CUSTOMER_ID")

# Create pipeline
ltv_pipeline = Pipeline([
    ("Encoder", OneHotEncoder(
        input_cols=["RISK_TIER", "ACQUISITION_CHANNEL"],
        output_cols=["RISK_TIER_ENC", "ACQUISITION_CHANNEL_ENC"],
        drop_input_cols=True,
        handle_unknown="ignore"
    )),
    ("Scaler", StandardScaler(
        input_cols=["TENURE_MONTHS", "CREDIT_SCORE", "PRODUCT_COUNT", "AVG_ACCOUNT_BALANCE", "RECENT_TRANSACTION_COUNT"],
        output_cols=["TENURE_MONTHS_SCALED", "CREDIT_SCORE_SCALED", "PRODUCT_COUNT_SCALED", "AVG_ACCOUNT_BALANCE_SCALED", "RECENT_TRANSACTION_COUNT_SCALED"]
    )),
    ("Regressor", GradientBoostingRegressor(
        label_cols=["LIFETIME_VALUE"],
        output_cols=["PREDICTED_LTV"],
        n_estimators=100,
        max_depth=6
    ))
])

# Train
ltv_pipeline.fit(train_ltv)
print("✅ LTV prediction model trained")


### Evaluate and Register LTV Model


In [None]:
# Predict on test set
ltv_predictions = ltv_pipeline.predict(test_ltv)

# Calculate metrics
ltv_mae = mean_absolute_error(df=ltv_predictions, y_true_col_names="LIFETIME_VALUE", y_pred_col_names="PREDICTED_LTV")
ltv_rmse = mean_squared_error(df=ltv_predictions, y_true_col_names="LIFETIME_VALUE", y_pred_col_names="PREDICTED_LTV") ** 0.5
ltv_metrics = {"mae": round(ltv_mae, 2), "rmse": round(ltv_rmse, 2)}
print(f"LTV model metrics: {ltv_metrics}")

# Register model
reg.log_model(
    model=ltv_pipeline,
    model_name="CUSTOMER_LTV_MODEL",
    version_name="V1",
    comment="Predicts customer lifetime value using Gradient Boosting based on engagement and behavior metrics",
    metrics=ltv_metrics
)

print("✅ LTV model registered to Model Registry as CUSTOMER_LTV_MODEL")
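
The cell above reports MAE and derives RMSE by taking the square root of the mean squared error. As a worked illustration, the following pure-Python sketch computes both on four hypothetical LTV values; the function names and numbers are made up for the example.

```python
from math import sqrt

def mae(y_true, y_pred):
    """Mean absolute error: the average size of the errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error: square root of the average squared error."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return sqrt(mse)

# Hypothetical actual and predicted lifetime values
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 300.0, 500.0]

print(f"MAE:  {mae(actual, predicted):.2f}")
print(f"RMSE: {rmse(actual, predicted):.2f}")
```

The single 100-unit miss dominates RMSE far more than MAE, so a large gap between the two suggests a few customers with very large prediction errors.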


---
# Verify Models in Registry


In [None]:
# Show all models in the registry
print("Models in registry:")
reg.show_models()

# Show versions for fraud model
print("\nFraud Detection Model versions:")
reg.get_model("FRAUD_DETECTION_MODEL").show_versions()

# Show versions for advance model  
print("\nAdvance Eligibility Model versions:")
reg.get_model("ADVANCE_ELIGIBILITY_MODEL").show_versions()

# Show versions for LTV model
print("\nCustomer LTV Model versions:")
reg.get_model("CUSTOMER_LTV_MODEL").show_versions()

print("\n✅ All models registered and ready to add to Intelligence Agent")


---
# Test Model Inference

Call each registered model on a small sample to confirm that it returns predictions.


In [None]:
# Test fraud detection on sample transactions
fraud_model = reg.get_model("FRAUD_DETECTION_MODEL").default
sample_fraud = fraud_df.limit(5).drop("TRANSACTION_ID", "CUSTOMER_ID")
fraud_preds = fraud_model.run(sample_fraud, function_name="predict")
print("Fraud Detection predictions:")
fraud_preds.select("IS_FRAUD", "FRAUD_PREDICTION").show()

# Test advance repayment on sample advances
advance_model = reg.get_model("ADVANCE_ELIGIBILITY_MODEL").default
sample_advance = advance_df.limit(5).drop("ADVANCE_ID", "CUSTOMER_ID")
advance_preds = advance_model.run(sample_advance, function_name="predict")
print("\nAdvance Repayment predictions:")
advance_preds.select("WAS_REPAID", "REPAYMENT_PREDICTION").show()

# Test LTV prediction on sample customers
ltv_model = reg.get_model("CUSTOMER_LTV_MODEL").default
sample_ltv = ltv_df.limit(5).drop("CUSTOMER_ID")
ltv_preds = ltv_model.run(sample_ltv, function_name="predict")
print("\nCustomer LTV predictions:")
ltv_preds.select("LIFETIME_VALUE", "PREDICTED_LTV").show()

print("\n✅ All models tested successfully!")


---
# Next Steps

## Add Models to Intelligence Agent

**Using the SQL script (recommended):** run `sql/agent/10_create_intelligence_agent.sql`, which automatically configures all ML model procedures.

The Python procedures in `sql/ml/09_create_model_functions.sql` use the models registered above.

## Example Questions for Agent

- "Is this $500 international transaction likely fraud?"
- "Check if customer CUST00001234 is eligible for a cash advance"
- "Predict the lifetime value for our newest customer cohort"
- "Which customers show high fraud risk patterns?"

The models will now be available as tools your agent can use!


# Varo ML Models with Snowpark and Feature Store

**Note**: This notebook is designed to run in Snowflake Notebooks with automatic session management.

This notebook demonstrates how to:
1. Connect to Varo's Feature Store
2. Create training datasets with point-in-time features
3. Train ML models using Snowpark ML
4. Deploy models for real-time serving
5. Monitor model performance

## Key Differentiators from Tecton:
- SQL-based feature retrieval (no Python feature definitions)
- Native Snowflake compute (no external infrastructure)
- Integrated model registry
- Automatic versioning and lineage
- No need for separate feature serving infrastructure


In [None]:
# Setup and Imports
# Get active session in Snowflake Notebooks
from snowflake.snowpark.context import get_active_session
session = get_active_session()

from snowflake.snowpark import functions as F
from snowflake.snowpark import types as T
from snowflake.ml.modeling.preprocessing import StandardScaler, OneHotEncoder
from snowflake.ml.modeling.ensemble import RandomForestClassifier, GradientBoostingRegressor
from snowflake.ml.modeling.model_selection import GridSearchCV
from snowflake.ml.registry import Registry
import pandas as pd
import numpy as np
from datetime import datetime, timedelta

# Verify we're in the right context
print(f"Current Database: {session.get_current_database()}")
print(f"Current Schema: {session.get_current_schema()}")
print(f"Current Warehouse: {session.get_current_warehouse()}")

# Switch to Feature Store schema
session.use_database("VARO_INTELLIGENCE")
session.use_schema("FEATURE_STORE")
session.use_warehouse("VARO_FEATURE_WH")

print(f"\nSwitched to: {session.get_current_database()}.{session.get_current_schema()}")


## 1. Create Training Dataset from Feature Store

Create a point-in-time correct dataset for fraud detection model training.
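
Point-in-time correctness means that each training row only sees feature values that were recorded at or before the event it describes. The sketch below illustrates the lookup in pure Python; `point_in_time_value` and `balance_history` are hypothetical names and values, not part of the Feature Store or Snowpark API.

```python
from bisect import bisect_right
from datetime import date

def point_in_time_value(history, as_of):
    """Return the latest value recorded at or before `as_of`, or None.

    `history` is a list of (effective_date, value) tuples sorted by date.
    """
    dates = [d for d, _ in history]
    idx = bisect_right(dates, as_of)  # number of entries with date <= as_of
    return history[idx - 1][1] if idx else None

# Hypothetical balance snapshots for one account
balance_history = [
    (date(2024, 1, 1), 500.0),
    (date(2024, 3, 1), 1200.0),
    (date(2024, 5, 1), 50.0),
]

# A transaction on 2024-03-15 must see the March balance, not the later May one
print(point_in_time_value(balance_history, date(2024, 3, 15)))
# A transaction before any snapshot has no feature value
print(point_in_time_value(balance_history, date(2023, 12, 31)))
```

A join against a table that only holds current state (for example a current balance column) does not give this guarantee, because the value may postdate the transaction being labeled.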


In [None]:
# Placeholder cell: label creation and feature retrieval are combined in the next cell


In [None]:
# Combine labels and features - using Snowpark DataFrame API
trans = session.table("RAW.TRANSACTIONS")
cust = session.table("RAW.CUSTOMERS")
acct = session.table("RAW.ACCOUNTS")

training_df = trans.sample(n=10000).filter(
    (F.col("transaction_date").between("2024-01-01", "2024-06-30")) & 
    (F.col("amount") > 10)
).join(
    cust,
    trans["customer_id"] == cust["customer_id"]
).join(
    acct,
    trans["account_id"] == acct["account_id"],
    "left"
).select(
    trans["transaction_id"],
    trans["customer_id"],
    trans["amount"],
    trans["merchant_category"],
    trans["is_international"],
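    # Label: declined with fraud_score > 0.7, or any transaction with fraud_score > 0.8
    F.when((trans["status"] == "DECLINED") & (trans["fraud_score"] > 0.7), 1)
     .when(trans["fraud_score"] > 0.8, 1)
     .otherwise(0).alias("is_fraud"),
    # NOTE: is_fraud is derived from fraud_score, so exposing fraud_score as a
    # feature leaks the label; replace it with an independent risk signal if one exists
    trans["fraud_score"].alias("customer_historical_risk"),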
    trans["transaction_type"],
    cust["credit_score"],
    cust["risk_tier"],
    acct["current_balance"].alias("account_avg_balance")
)

print(f"Training dataset: {training_df.count()} rows")
print(f"Label distribution:")
training_df.group_by('is_fraud').count().show()
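
The labeling rule in the cell above can be restated as a small pure-Python function, which makes the two thresholds easier to check by hand. This is only an illustrative sketch: `is_fraud_label` is a hypothetical helper, and the `"APPROVED"` status value is an assumption (only `"DECLINED"` appears in the notebook).

```python
def is_fraud_label(status, fraud_score):
    """Mirror the F.when(...).when(...).otherwise(0) chain used for the label."""
    if status == "DECLINED" and fraud_score > 0.7:
        return 1
    if fraud_score > 0.8:
        return 1
    return 0

# Hypothetical (status, fraud_score) pairs
examples = [
    ("DECLINED", 0.75),  # declined and above 0.7
    ("APPROVED", 0.75),  # above 0.7 but not declined
    ("APPROVED", 0.85),  # above 0.8 regardless of status
    ("DECLINED", 0.70),  # thresholds are strict, so exactly 0.7 does not qualify
]
for status, score in examples:
    print(status, score, is_fraud_label(status, score))
```

Since the label is a deterministic function of `fraud_score` and `status`, any model that receives `fraud_score` as an input can reproduce most of the label directly, so evaluation metrics will overstate real-world performance.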


## 2. Train Fraud Detection Model

Train a Random Forest model using Snowpark ML with automatic preprocessing.


In [None]:
# Prepare features and labels
feature_columns = [
    'amount',
    'customer_historical_risk',
    'credit_score',
    'account_avg_balance'
]

categorical_columns = ['merchant_category', 'is_international', 'transaction_type', 'risk_tier']
label_column = 'is_fraud'

# Split data into train/test
train_df, test_df = training_df.random_split([0.8, 0.2], seed=42)
print(f"Training set: {train_df.count()} rows")
print(f"Test set: {test_df.count()} rows")
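
The printed row counts will typically be close to, but not exactly, an 80/20 split. The sketch below shows one way a weighted random split can behave, assuming each row is assigned independently from a seeded random draw. That is an assumption about the general technique, not a description of Snowpark's internals, and `random_split` here is a hypothetical stand-in written in plain Python.

```python
import random

def random_split(rows, weights, seed):
    """Assign each row to train or test using an independent seeded draw."""
    rng = random.Random(seed)
    cut = weights[0] / sum(weights)  # normalized share of the first split
    train, test = [], []
    for row in rows:
        (train if rng.random() < cut else test).append(row)
    return train, test

rows = list(range(10000))
train, test = random_split(rows, [0.8, 0.2], seed=42)
print(len(train), len(test))

# The same seed reproduces the same assignment
train_again, test_again = random_split(rows, [0.8, 0.2], seed=42)
print(train == train_again and test == test_again)
```

Fixing the seed keeps the split reproducible across runs, which matters when comparing metrics between model versions.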


In [None]:
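# Train Random Forest model with Snowpark ML
from snowflake.ml.modeling.pipeline import Pipeline
from snowflake.ml.modeling.ensemble import RandomForestClassifier
from snowflake.ml.modeling.metrics import accuracy_score, precision_recall_curve, roc_auc_score

# The underlying scikit-learn estimator cannot consume string columns, so
# one-hot encode the string categoricals; the boolean IS_INTERNATIONAL passes through
string_categoricals = [c.upper() for c in categorical_columns if c != 'is_international']

# Keep the name rf_model so the registration cell logs the whole pipeline
rf_model = Pipeline([
    ("Encoder", OneHotEncoder(
        input_cols=string_categoricals,
        output_cols=[f"{c}_ENC" for c in string_categoricals],
        drop_input_cols=True,
        handle_unknown="ignore"
    )),
    ("Classifier", RandomForestClassifier(
        n_estimators=100,
        max_depth=10,
        random_state=42,
        label_cols=[label_column.upper()],
        output_cols=["FRAUD_PREDICTION"]
    ))
])

# Drop ID columns so the classifier does not pick them up as features
id_columns = ["TRANSACTION_ID", "CUSTOMER_ID"]

# Train the model
print("Training Random Forest model...")
rf_model.fit(train_df.drop(*id_columns))
print("Model training completed!")

# Make predictions
predictions = rf_model.predict(test_df.drop(*id_columns))
print(f"Prediction rows: {predictions.count()}")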


## 3. Register Model in Snowflake Model Registry

Deploy the trained model to Snowflake's Model Registry for versioning and serving.


In [None]:
# Register model in Snowflake Model Registry
from snowflake.ml.registry import Registry

# Create registry connection
reg = Registry(session=session)

# Register the model
model_name = "FRAUD_DETECTION_MODEL"
model_version = reg.log_model(
    rf_model,
    model_name=model_name,
    version_name="v1",
    metrics={
        # Placeholder, not a measured value: replace with accuracy computed from test-set predictions
        "training_accuracy": 0.95,
        "feature_count": len(feature_columns) + len(categorical_columns)
    },
    comment="Random Forest fraud detection model trained on Feature Store data"
)

print(f"Model registered: {model_name} version {model_version.version_name}")

# Show model details
model_ref = reg.get_model(model_name)
print(f"Model versions: {[v.version_name for v in model_ref.versions()]}")


## 4. Model Registration Complete

The model is now registered and ready for use via the SCORE_TRANSACTION_FRAUD procedure defined in `09_create_model_functions.sql`.


In [None]:
# Model is registered and ready
# The SCORE_TRANSACTION_FRAUD Python procedure in file 09 will use this model
print(f"✓ {model_name} registered successfully")
print(f"✓ Ready for production use")
print(f"✓ Use via: CALL VARO_INTELLIGENCE.ANALYTICS.SCORE_TRANSACTION_FRAUD(...)")


## 5. Train Cash Advance Eligibility Model

Train a Gradient Boosting model to predict cash advance eligibility and limits.


In [None]:
# Create training data for advance eligibility - using Snowpark API
adv = session.table("RAW.CASH_ADVANCES")
cust2 = session.table("RAW.CUSTOMERS")

advance_df = adv.sample(n=5000).filter(
    F.col("advance_date") >= "2024-01-01"
).join(
    cust2,
    adv["customer_id"] == cust2["customer_id"]
).select(
    adv["customer_id"],
    adv["advance_id"],
    adv["advance_amount"],
    adv["eligibility_score"],
    cust2["credit_score"],
    cust2["risk_tier"],
    cust2["employment_status"]
)

print(f"Advance dataset: {advance_df.count()} rows")

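# Train model with available features
from snowflake.ml.modeling.pipeline import Pipeline

advance_features = ['credit_score', 'eligibility_score']
advance_cat_features = ['risk_tier', 'employment_status']

# One-hot encode the string categoricals (the regressor needs numeric input);
# gb_model keeps its name so the registration below logs the whole pipeline
advance_cat_upper = [c.upper() for c in advance_cat_features]
gb_model = Pipeline([
    ("Encoder", OneHotEncoder(
        input_cols=advance_cat_upper,
        output_cols=[f"{c}_ENC" for c in advance_cat_upper],
        drop_input_cols=True,
        handle_unknown="ignore"
    )),
    ("Regressor", GradientBoostingRegressor(
        n_estimators=100,
        max_depth=5,
        learning_rate=0.1,
        random_state=42,
        label_cols=['ADVANCE_AMOUNT'],
        output_cols=['PREDICTED_ADVANCE_AMOUNT']
    ))
])

print("Training Advance Eligibility model...")
# Drop ID columns so they are not picked up as features
gb_model.fit(advance_df.drop("CUSTOMER_ID", "ADVANCE_ID"))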

# Register model
model_name_2 = "ADVANCE_ELIGIBILITY_MODEL"
model_version_2 = reg.log_model(
    gb_model,
    model_name=model_name_2,
    version_name="v1",
    comment="Gradient Boosting model for cash advance eligibility and limit prediction"
)
print(f"Model registered: {model_name_2}")


## 6. Train Customer Lifetime Value Model

Train a Gradient Boosting model to predict customer lifetime value.


In [None]:
# Create training data for LTV prediction - using Snowpark API
cust3 = session.table("RAW.CUSTOMERS")
acct2 = session.table("RAW.ACCOUNTS")

ltv_df = cust3.sample(n=5000).filter(
    (F.col("customer_status") == "ACTIVE") & (F.col("lifetime_value") > 0)
).join(
    acct2,
    cust3["customer_id"] == acct2["customer_id"],
    "left"
).group_by(
    cust3["customer_id"], 
    cust3["lifetime_value"], 
    cust3["acquisition_date"], 
    cust3["credit_score"], 
    cust3["risk_tier"], 
    cust3["acquisition_channel"]
).agg(
    F.count_distinct(acct2["account_id"]).alias("product_count")
).select(
    F.col("customer_id"),
    F.col("lifetime_value"),
    F.datediff("month", F.col("acquisition_date"), F.current_date()).alias("tenure_months"),
    F.col("credit_score"),
    F.col("risk_tier"),
    F.col("acquisition_channel"),
    F.col("product_count")
)

print(f"LTV dataset: {ltv_df.count()} rows")

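# Train model with available features
from snowflake.ml.modeling.pipeline import Pipeline

ltv_features = ['tenure_months', 'credit_score', 'product_count']
ltv_cat_features = ['risk_tier', 'acquisition_channel']

# One-hot encode the string categoricals (the regressor needs numeric input);
# ltv_model keeps its name so the registration below logs the whole pipeline
ltv_cat_upper = [c.upper() for c in ltv_cat_features]
ltv_model = Pipeline([
    ("Encoder", OneHotEncoder(
        input_cols=ltv_cat_upper,
        output_cols=[f"{c}_ENC" for c in ltv_cat_upper],
        drop_input_cols=True,
        handle_unknown="ignore"
    )),
    ("Regressor", GradientBoostingRegressor(
        n_estimators=150,
        max_depth=6,
        learning_rate=0.05,
        random_state=42,
        label_cols=['LIFETIME_VALUE'],
        output_cols=['PREDICTED_LTV']
    ))
])

print("Training Customer LTV model...")
# Drop the ID column so it is not picked up as a feature
ltv_model.fit(ltv_df.drop("CUSTOMER_ID"))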

# Register model
model_name_3 = "CUSTOMER_LTV_MODEL"
model_version_3 = reg.log_model(
    ltv_model,
    model_name=model_name_3,
    version_name="v1",
    comment="Gradient Boosting model for customer lifetime value prediction"
)
print(f"Model registered: {model_name_3}")


## 7. Summary - All Models Registered

All 3 ML models are now registered in Snowflake Model Registry and ready for the Intelligence Agent.


In [None]:
# Display all registered models
print("=" * 60)
print("VARO ML MODELS - REGISTERED IN MODEL REGISTRY")
print("=" * 60)
print(f"1. {model_name} - Fraud detection using Random Forest")
print(f"2. {model_name_2} - Cash advance eligibility using Gradient Boosting")
print(f"3. {model_name_3} - Customer LTV prediction using Gradient Boosting")
print("=" * 60)
print("\nAll models are ready for use by:")
print("- SCORE_TRANSACTION_FRAUD procedure")
print("- CALCULATE_ADVANCE_ELIGIBILITY procedure")
print("- PREDICT_CUSTOMER_LTV procedure")
print("\nNext steps:")
print("1. Run the procedures in file 09_create_model_functions.sql")
print("2. Deploy the Intelligence Agent in file 10_create_intelligence_agent.sql")
print("3. Test the agent in Snowsight AI & ML > Agents")
