# Axon ML Models - Model Registry

This notebook trains ML models for the Axon Intelligence Agent:
- **Evidence Upload Volume Forecasting** - Forecast monthly evidence upload volume from historical order data
- **Agency Churn Prediction** - Classify agencies at risk of churning
- **Device Deployment Success** - Predict which device deployments will convert to production orders

All models are registered in the Snowflake Model Registry and can then be added as tools to the Intelligence Agent.

## Before You Begin

**Add these packages** in the Packages dropdown (upper right):
- `snowflake-ml-python`
- `scikit-learn`
- `xgboost`
- `matplotlib`

**Database:** AXON_INTELLIGENCE  
**Schema:** ANALYTICS  
**Warehouse:** AXON_WH


## Import Required Packages


In [None]:
# Import Python packages
import pandas as pd
import warnings
warnings.filterwarnings('ignore')

# Import Snowpark
from snowflake.snowpark.context import get_active_session
import snowflake.snowpark.functions as F
import snowflake.snowpark.types as T
from snowflake.snowpark import Window

# Import Snowpark ML
from snowflake.ml.modeling.preprocessing import StandardScaler, OneHotEncoder
from snowflake.ml.modeling.pipeline import Pipeline
from snowflake.ml.modeling.linear_model import LinearRegression, LogisticRegression
from snowflake.ml.modeling.ensemble import RandomForestClassifier
from snowflake.ml.modeling.metrics import mean_squared_error, mean_absolute_error, accuracy_score, roc_auc_score
from snowflake.ml.registry import Registry

print("✅ Packages imported successfully")


## Connect to Snowflake

Get the active session and set the database, schema, and warehouse context.


In [None]:
# Get active Snowflake session
session = get_active_session()

# Set context
session.use_database('AXON_INTELLIGENCE')
session.use_schema('ANALYTICS')
session.use_warehouse('AXON_WH')

print(f"✅ Connected - Role: {session.get_current_role()}")
print(f"   Warehouse: {session.get_current_warehouse()}")
print(f"   Database.Schema: {session.get_fully_qualified_current_schema()}")


---
# MODEL 1: Evidence Upload Volume Forecasting

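Forecast monthly evidence upload volume from historical order data. The code refers to this target as revenue (`TOTAL_REVENUE`); it is computed as the monthly sum of `order_amount`.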


### Prepare Revenue Training Data


In [None]:
# Get monthly evidence_upload_volume data with features
evidence_upload_volume_df = session.sql("""
SELECT
    DATE_TRUNC('month', order_date)::DATE AS order_month,
    MONTH(order_date) AS month_num,
    YEAR(order_date) AS year_num,
    SUM(order_amount)::FLOAT AS total_revenue,  -- label column used by the pipeline (TOTAL_REVENUE)
    COUNT(DISTINCT order_id)::FLOAT AS order_count,
    COUNT(DISTINCT agency_id)::FLOAT AS agency_count,  -- matches AGENCY_COUNT in the scaler's input_cols
    AVG(order_amount)::FLOAT AS avg_order_value
FROM RAW.ORDERS
WHERE order_date >= DATEADD('month', -30, CURRENT_DATE())
  AND payment_status = 'COMPLETED'
GROUP BY DATE_TRUNC('month', order_date), MONTH(order_date), YEAR(order_date)
ORDER BY order_month
""")

print(f"Revenue data: {evidence_upload_volume_df.count()} months")
evidence_upload_volume_df.show(5)
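
The aggregation in the query above can be sketched in plain Python to make the feature definitions concrete. The sample orders and the `monthly_features` helper are hypothetical; they only mirror the grouping logic of the SQL and are not part of the notebook's pipeline.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample rows: (order_id, agency_id, order_date, order_amount)
orders = [
    ("O1", "A1", date(2024, 1, 5), 100.0),
    ("O2", "A2", date(2024, 1, 20), 300.0),
    ("O3", "A1", date(2024, 2, 3), 50.0),
]

def monthly_features(rows):
    """Group orders by calendar month and compute the same aggregates as the SQL."""
    buckets = defaultdict(list)
    for order_id, agency_id, order_date, amount in rows:
        # Equivalent of DATE_TRUNC('month', order_date)
        buckets[order_date.replace(day=1)].append((order_id, agency_id, amount))

    features = []
    for month in sorted(buckets):
        group = buckets[month]
        amounts = [amount for _, _, amount in group]
        features.append({
            "order_month": month,
            "month_num": month.month,
            "year_num": month.year,
            "total_amount": sum(amounts),
            "order_count": float(len({oid for oid, _, _ in group})),
            "distinct_agencies": float(len({aid for _, aid, _ in group})),
            "avg_order_value": sum(amounts) / len(amounts),
        })
    return features

features = monthly_features(orders)
```

Each dictionary corresponds to one row of the monthly training table, with one row per calendar month.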


### Split Data and Train Revenue Model


In [None]:
# Train/test split (last 6 months for testing)
train_evidence_upload_volume = evidence_upload_volume_df.filter(F.col("ORDER_MONTH") < F.dateadd("month", F.lit(-6), F.current_date()))
test_evidence_upload_volume = evidence_upload_volume_df.filter(F.col("ORDER_MONTH") >= F.dateadd("month", F.lit(-6), F.current_date()))

# Drop ORDER_MONTH (DATE type not supported in pipeline)
train_evidence_upload_volume = train_evidence_upload_volume.drop("ORDER_MONTH")
test_evidence_upload_volume = test_evidence_upload_volume.drop("ORDER_MONTH")

# Create pipeline
evidence_upload_volume_pipeline = Pipeline([
    ("Scaler", StandardScaler(
        input_cols=["MONTH_NUM", "ORDER_COUNT", "AGENCY_COUNT", "AVG_ORDER_VALUE"],
        output_cols=["MONTH_NUM_SCALED", "ORDER_COUNT_SCALED", "AGENCY_COUNT_SCALED", "AVG_ORDER_VALUE_SCALED"]
    )),
    ("LinearRegression", LinearRegression(
        label_cols=["TOTAL_REVENUE"],
        output_cols=["PREDICTED_REVENUE"]
    ))
])

# Train model
evidence_upload_volume_pipeline.fit(train_evidence_upload_volume)
print("✅ Revenue forecasting model trained")
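
The time-based split above holds out the most recent six months for testing. A minimal sketch of the same cutoff logic in plain Python, assuming month arithmetic that clamps to the last day of a shorter month (as `DATEADD('month', ...)` does in Snowflake). The `months_before` and `split_by_cutoff` helpers are hypothetical names, not Snowpark functions.

```python
import calendar
from datetime import date

def months_before(d, n):
    """Return the date n months before d, clamping the day to the month's length."""
    total = d.year * 12 + (d.month - 1) - n
    year, month0 = divmod(total, 12)
    month = month0 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

def split_by_cutoff(months, today, holdout_months=6):
    """Rows before the cutoff go to training; rows on or after it go to testing."""
    cutoff = months_before(today, holdout_months)
    train = [m for m in months if m < cutoff]
    test = [m for m in months if m >= cutoff]
    return train, test

# Hypothetical month-start dates, as produced by DATE_TRUNC('month', ...)
months = [date(2024, m, 1) for m in range(1, 13)]
train, test = split_by_cutoff(months, today=date(2024, 12, 15))
```

Because the cutoff is computed from the current date rather than from a month boundary, the month that contains the cutoff lands in the training set unless the notebook runs on the first day of a month.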


### Evaluate and Register Revenue Model


In [None]:
# Make predictions on test set
test_predictions = evidence_upload_volume_pipeline.predict(test_evidence_upload_volume)

# Calculate metrics
mae = mean_absolute_error(df=test_predictions, y_true_col_names="TOTAL_REVENUE", y_pred_col_names="PREDICTED_REVENUE")
mse = mean_squared_error(df=test_predictions, y_true_col_names="TOTAL_REVENUE", y_pred_col_names="PREDICTED_REVENUE")
rmse = mse ** 0.5

metrics = {"mae": round(mae, 2), "rmse": round(rmse, 2)}
print(f"Model metrics: {metrics}")

# Register model (use different name to avoid conflict with ML Functions)
reg = Registry(session)
reg.log_model(
    model=evidence_upload_volume_pipeline,
    model_name="EVIDENCE_VOLUME_PREDICTOR",
    version_name="V1",
    comment="Predicts monthly evidence_upload_volume based on historical order patterns using Linear Regression",
    metrics=metrics
)

print("✅ Revenue model registered to Model Registry as EVIDENCE_VOLUME_PREDICTOR")
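
For reference, the two metrics reported above can be written out in plain Python. This is only a sketch of the definitions; the `mae_score` and `rmse_score` names and the sample values are hypothetical, and the notebook itself uses the `snowflake.ml.modeling.metrics` functions.

```python
import math

def mae_score(y_true, y_pred):
    """Mean absolute error: average magnitude of the prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse_score(y_true, y_pred):
    """Root mean squared error: square root of the average squared error."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)

# Hypothetical actual and predicted monthly totals
actual = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 330.0]

mae = mae_score(actual, predicted)
rmse = rmse_score(actual, predicted)
```

RMSE is never smaller than MAE on the same data, and the gap widens when a few months have much larger errors than the rest.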


---
# MODEL 2: Agency Churn Prediction

Classify agencies as likely to churn or not based on behavior patterns.


### Prepare Churn Training Data


In [None]:
# Get agency features for churn prediction
churn_df = session.sql("""
SELECT
    c.agency_id,
    c.customer_segment AS agency_segment,  -- aliased to match AGENCY_SEGMENT in the encoder's input_cols
    c.industry_vertical,
    c.lifetime_value::FLOAT AS lifetime_value,
    c.credit_risk_score::FLOAT AS credit_risk_score,
    -- Recent orders (last 3 months)
    COUNT(DISTINCT CASE WHEN o.order_date >= DATEADD('month', -3, CURRENT_DATE()) 
                   THEN o.order_id END)::FLOAT AS recent_orders,
    -- Historical average
    (COUNT(DISTINCT CASE WHEN o.order_date < DATEADD('month', -3, CURRENT_DATE()) 
                    THEN o.order_id END) / 9.0)::FLOAT AS historical_avg_orders,
    -- Support satisfaction
    AVG(CASE WHEN st.created_date >= DATEADD('month', -6, CURRENT_DATE()) 
        THEN st.customer_satisfaction_score::FLOAT END) AS avg_csat,
    -- Quality issues
    COUNT(DISTINCT qi.quality_issue_id)::FLOAT AS quality_issue_count,
    -- Recent device deployments (last 12 months)
    COUNT(DISTINCT CASE WHEN dw.device_deployment_date >= DATEADD('month', -12, CURRENT_DATE()) 
                   THEN dw.device_deployment_id END)::FLOAT AS recent_device_deployments,
    -- Target: Is churned
    (c.customer_status = 'CHURNED' 
     OR (COUNT(DISTINCT CASE WHEN o.order_date >= DATEADD('month', -3, CURRENT_DATE()) 
                        THEN o.order_id END) = 0 
         AND COUNT(DISTINCT CASE WHEN o.order_date < DATEADD('month', -3, CURRENT_DATE()) 
                            THEN o.order_id END) > 5))::BOOLEAN AS is_churned
FROM RAW.AGENCIES c
LEFT JOIN RAW.ORDERS o ON c.agency_id = o.agency_id
LEFT JOIN RAW.SUPPORT_TICKETS st ON c.agency_id = st.agency_id
LEFT JOIN RAW.QUALITY_ISSUES qi ON c.agency_id = qi.agency_id
LEFT JOIN RAW.DEVICE_DEPLOYMENTS dw ON c.agency_id = dw.agency_id
GROUP BY c.agency_id, c.customer_segment, c.industry_vertical, c.lifetime_value, c.credit_risk_score, c.customer_status
HAVING COUNT(DISTINCT o.order_id) > 10
""")

print(f"Churn data: {churn_df.count()} agencies")
churn_df.show(5)
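
The churn label defined in the query can be restated as a small Python function to make the rule explicit. The function name and the sample values are hypothetical; the logic follows the `is_churned` expression in the SQL.

```python
def is_churned(customer_status, recent_orders, historical_orders):
    """Label rule from the query: explicitly churned, or previously active
    (more than 5 orders before the last 3 months) with no recent orders."""
    went_quiet = recent_orders == 0 and historical_orders > 5
    return customer_status == "CHURNED" or went_quiet

# Hypothetical agencies: (status, orders in last 3 months, orders before that)
examples = [
    ("ACTIVE", 4, 20),   # still ordering
    ("ACTIVE", 0, 12),   # went quiet after a history of orders
    ("CHURNED", 2, 15),  # flagged as churned by status
    ("ACTIVE", 0, 3),    # no recent orders, but too little history to count
]
labels = [is_churned(*row) for row in examples]
```

Note that the recent order count appears both in this rule and as the `recent_orders` feature, so part of the label is determined directly by one of the model's inputs. That is worth keeping in mind when reading the accuracy figure.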


### Train Churn Classification Model


In [None]:
# Train/test split (80/20)
train_churn, test_churn = churn_df.random_split([0.8, 0.2], seed=42)

# Drop AGENCY_ID (an identifier, not a feature); the encoder drops the original string columns
train_churn = train_churn.drop("AGENCY_ID")
test_churn = test_churn.drop("AGENCY_ID")

# Create pipeline with preprocessing and classification
churn_pipeline = Pipeline([
    ("Encoder", OneHotEncoder(
        input_cols=["AGENCY_SEGMENT", "INDUSTRY_VERTICAL"],
        output_cols=["AGENCY_SEGMENT_ENCODED", "INDUSTRY_VERTICAL_ENCODED"],
        drop_input_cols=True,  # Drop original string columns after encoding
        handle_unknown="ignore"
    )),
    ("Classifier", RandomForestClassifier(
        label_cols=["IS_CHURNED"],
        output_cols=["CHURN_PREDICTION"],
        n_estimators=100,
        max_depth=10
    ))
])

# Train model
churn_pipeline.fit(train_churn)
print("✅ Churn classification model trained")
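
The encoder step above uses `handle_unknown="ignore"`. A minimal sketch of what that setting means, written in plain Python rather than with the Snowpark ML encoder: a category that was not seen during fitting is encoded as all zeros instead of raising an error. The helper names and segment values are hypothetical.

```python
def fit_categories(values):
    """Learn the sorted set of categories seen in the training data."""
    return sorted(set(values))

def one_hot(value, categories):
    """Encode one value; an unseen category yields a vector of zeros."""
    return [1.0 if value == category else 0.0 for category in categories]

# Hypothetical training values for a segment column
categories = fit_categories(["ENTERPRISE", "SMB", "ENTERPRISE", "MID_MARKET"])

known = one_hot("SMB", categories)        # category seen during fitting
unknown = one_hot("FEDERAL", categories)  # category not seen during fitting
```

This matters at inference time: an agency with a new segment value can still be scored, although the model receives no segment signal for it.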


### Evaluate and Register Churn Model


In [None]:
# Make predictions
churn_predictions = churn_pipeline.predict(test_churn)

# Calculate metrics
accuracy = accuracy_score(df=churn_predictions, y_true_col_names="IS_CHURNED", y_pred_col_names="CHURN_PREDICTION")
# Note: ROC AUC needs probability scores rather than class labels, so only accuracy is reported here
churn_metrics = {"accuracy": round(accuracy, 4)}
print(f"Churn model metrics: {churn_metrics}")

# Register model (use different name to avoid conflict)
reg.log_model(
    model=churn_pipeline,
    model_name="AGENCY_CHURN_PREDICTOR",
    version_name="V1",
    comment="Predicts agency churn (binary class) using Random Forest based on behavior patterns",
    metrics=churn_metrics
)

print("✅ Churn model registered to Model Registry as AGENCY_CHURN_PREDICTOR")
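
The note about ROC AUC in the cell above can be illustrated with a small sketch. ROC AUC is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, which is why it needs scores rather than hard class labels. The `roc_auc` function and the sample scores below are hypothetical and written in plain Python; they are not the `roc_auc_score` function imported earlier.

```python
def roc_auc(labels, scores):
    """Pairwise definition of ROC AUC; ties between a positive and a negative count as half."""
    positives = [s for y, s in zip(labels, scores) if y]
    negatives = [s for y, s in zip(labels, scores) if not y]
    if not positives or not negatives:
        raise ValueError("ROC AUC needs at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical churn labels and predicted churn probabilities
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
```

With only class predictions available, every score is 0 or 1 and most pairs tie, so the metric loses most of its ranking information.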


---
# MODEL 3: Device Deployment Success Prediction

Predict which device deployments (design wins) are likely to convert to production orders.


### Prepare Device Deployment Success Data


In [None]:
# Get design win features
deployment_success_df = session.sql("""
SELECT
    dw.device_deployment_id,
    p.product_family,
    c.customer_segment AS agency_segment,  -- aliased to match AGENCY_SEGMENT in the encoder's input_cols
    c.industry_vertical,
    dw.estimated_annual_volume::FLOAT AS estimated_volume,
    dw.competitive_displacement::BOOLEAN AS is_competitive_win,
    -- Has this design gone to production?
    (EXISTS (SELECT 1 FROM RAW.PRODUCTION_ORDERS po 
             WHERE po.device_deployment_id = dw.device_deployment_id))::BOOLEAN AS converted_to_production
FROM RAW.DEVICE_DEPLOYMENTS dw
JOIN RAW.PRODUCT_CATALOG p ON dw.product_id = p.product_id
JOIN RAW.AGENCIES c ON dw.agency_id = c.agency_id
WHERE dw.device_deployment_date >= DATEADD('month', -24, CURRENT_DATE())
""")

print(f"Design win data: {deployment_success_df.count()} design wins")
deployment_success_df.show(5)


### Train Conversion Model


In [None]:
# Split data
train_deployment_success, test_deployment_success = deployment_success_df.random_split([0.8, 0.2], seed=42)

# Drop DEVICE_DEPLOYMENT_ID (an identifier, not a feature)
train_deployment_success = train_deployment_success.drop("DEVICE_DEPLOYMENT_ID")
test_deployment_success = test_deployment_success.drop("DEVICE_DEPLOYMENT_ID")

# Create pipeline
deployment_success_pipeline = Pipeline([
    ("Encoder", OneHotEncoder(
        input_cols=["PRODUCT_FAMILY", "AGENCY_SEGMENT", "INDUSTRY_VERTICAL"],
        output_cols=["PRODUCT_FAMILY_ENC", "AGENCY_SEGMENT_ENC", "INDUSTRY_VERTICAL_ENC"],
        drop_input_cols=True,  # Drop original string columns after encoding
        handle_unknown="ignore"
    )),
    ("Classifier", LogisticRegression(
        label_cols=["CONVERTED_TO_PRODUCTION"],
        output_cols=["CONVERSION_PREDICTION"]
    ))
])

# Train
deployment_success_pipeline.fit(train_deployment_success)
print("✅ Deployment success model trained")
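
As background for the classifier step above, a logistic regression turns a weighted sum of features into a probability with the sigmoid function and then applies a threshold. The sketch below assumes the conventional 0.5 threshold; the weights, bias, and feature vectors are hypothetical and are not the values learned by the notebook's model.

```python
import math

def sigmoid(z):
    """Map any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(features, weights, bias):
    """Probability of the positive class for one feature vector."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

def predict_class(features, weights, bias, threshold=0.5):
    """Hard class decision derived from the probability."""
    return predict_proba(features, weights, bias) >= threshold

# Hypothetical model with two features
weights = [2.0, -1.0]
bias = -1.0

likely = predict_class([1.0, 0.0], weights, bias)    # z = 1.0
unlikely = predict_class([0.0, 1.0], weights, bias)  # z = -2.0
```

Reporting the probability alongside the class makes it possible to rank deployments by how likely they are to convert, rather than only splitting them into two groups.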


### Evaluate and Register Conversion Model


In [None]:
# Predict on test set
deployment_success_predictions = deployment_success_pipeline.predict(test_deployment_success)

# Calculate accuracy
conv_accuracy = accuracy_score(df=deployment_success_predictions, 
                                y_true_col_names="CONVERTED_TO_PRODUCTION",
                                y_pred_col_names="CONVERSION_PREDICTION")
conv_metrics = {"accuracy": round(conv_accuracy, 4)}
print(f"Conversion model metrics: {conv_metrics}")

# Register model (use different name to avoid conflict)
reg.log_model(
    model=deployment_success_pipeline,
    model_name="DEPLOYMENT_SUCCESS_PREDICTOR",
    version_name="V1",
    comment="Predicts whether a device deployment converts to a production order using Logistic Regression",
    metrics=conv_metrics
)

print("✅ Conversion model registered to Model Registry as DEPLOYMENT_SUCCESS_PREDICTOR")


---
# Verify Models in Registry


In [None]:
# Show all models in the registry
print("Models in registry:")
reg.show_models()

# Show versions for evidence_upload_volume model
print("\nRevenue model versions:")
reg.get_model("EVIDENCE_VOLUME_PREDICTOR").show_versions()

# Show versions for churn model  
print("\nChurn model versions:")
reg.get_model("AGENCY_CHURN_PREDICTOR").show_versions()

# Show versions for deployment_success model
print("\nConversion model versions:")
reg.get_model("DEPLOYMENT_SUCCESS_PREDICTOR").show_versions()

print("\n✅ All models registered and ready to add to Intelligence Agent")


---
# Test Model Inference

Test calling each model to make predictions.


In [None]:
# Test evidence_upload_volume forecast on recent data
evidence_upload_volume_model = reg.get_model("EVIDENCE_VOLUME_PREDICTOR").default
recent_evidence_upload_volume = evidence_upload_volume_df.limit(3).drop("ORDER_MONTH")
evidence_upload_volume_preds = evidence_upload_volume_model.run(recent_evidence_upload_volume, function_name="predict")
print("Revenue predictions:")
evidence_upload_volume_preds.select("TOTAL_REVENUE", "PREDICTED_REVENUE").show()

# Test churn prediction on sample agencies
churn_model = reg.get_model("AGENCY_CHURN_PREDICTOR").default
sample_agencies = churn_df.limit(5).drop("AGENCY_ID")
churn_preds = churn_model.run(sample_agencies, function_name="predict")
print("\nChurn predictions:")
churn_preds.select("IS_CHURNED", "CHURN_PREDICTION").show()

# Test deployment_success prediction
deployment_success_model = reg.get_model("DEPLOYMENT_SUCCESS_PREDICTOR").default
sample_designs = deployment_success_df.limit(5).drop("DEVICE_DEPLOYMENT_ID")
deployment_success_preds = deployment_success_model.run(sample_designs, function_name="predict")
print("\nConversion predictions:")
deployment_success_preds.select("CONVERTED_TO_PRODUCTION", "CONVERSION_PREDICTION").show()

print("\n✅ All models tested successfully!")


---
# Next Steps

## Add Models to Intelligence Agent

1. In Snowsight → AI & ML → Agents → AXON_INTELLIGENCE_AGENT
2. Go to Tools → + Add → Model
3. Add each registered model:
   - **EVIDENCE_VOLUME_PREDICTOR**
   - **AGENCY_CHURN_PREDICTOR**
   - **DEPLOYMENT_SUCCESS_PREDICTOR**

## Example Questions for Agent

- "Forecast evidence upload volume for the next quarter using the evidence volume predictor"
- "Which agencies are predicted to churn according to the churn predictor?"
- "Show me device deployments with a high probability of success using the deployment success predictor"

The models will now be available as tools your agent can use!
