# Hootsuite Intelligence Agent - ML Models

**Training 3 Machine Learning Models for Social Media Intelligence**

This notebook trains three ML models for the Hootsuite Intelligence Agent and registers each one in the Snowflake Model Registry:
1. **CHURN_RISK_PREDICTOR** - Predicts customer churn risk (Low/Medium/High)
2. **CAMPAIGN_ROI_PREDICTOR** - Predicts campaign ROI (Low/Medium/High)
3. **TICKET_PRIORITY_CLASSIFIER** - Classifies support ticket priority (Low/Medium/High/Urgent)

---

## Prerequisites
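- Database: `HOOTSUITE_INTELLIGENCE`
- Schema: `ML_MODELS`
- Warehouse: `HOOTSUITE_WH`
- Feature views in the `ANALYTICS` schema: `V_CHURN_RISK_FEATURES`, `V_CAMPAIGN_ROI_FEATURES`, `V_TICKET_PRIORITY_FEATURES`
- Packages: `snowflake-ml-python`, `scikit-learn`, `pandas`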
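
Before running the cells below, it may help to confirm that the listed packages are importable in the notebook environment. The following is a minimal sketch, assuming the usual import names (`snowflake.ml` for `snowflake-ml-python`, `sklearn` for `scikit-learn`); the helper name `check_packages` is hypothetical and not part of this project.

```python
import importlib.util

# Assumed mapping from the package names listed above to their import names.
REQUIRED = {
    "snowflake-ml-python": "snowflake.ml",
    "scikit-learn": "sklearn",
    "pandas": "pandas",
}

def check_packages(required=REQUIRED):
    """Return {package_name: True/False} depending on whether the module can be found."""
    status = {}
    for package_name, module_name in required.items():
        try:
            status[package_name] = importlib.util.find_spec(module_name) is not None
        except (ImportError, ValueError):
            # ImportError is raised when a parent package (e.g. `snowflake`) is missing.
            status[package_name] = False
    return status

for name, ok in check_packages().items():
    print(f"{name}: {'found' if ok else 'MISSING'}")
```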

In [None]:
from snowflake.snowpark import Session
from snowflake.ml.modeling.ensemble import RandomForestClassifier
from snowflake.ml.modeling.linear_model import LogisticRegression
from snowflake.ml.modeling.preprocessing import OneHotEncoder
from snowflake.ml.modeling.pipeline import Pipeline
from snowflake.ml.registry import Registry
import pandas as pd
import warnings

warnings.filterwarnings('ignore')

# ==============================================================================
# Setup Session
# ==============================================================================
# Get the current session
session = Session.builder.getOrCreate()

# Set context
session.use_database("HOOTSUITE_INTELLIGENCE")
session.use_schema("ML_MODELS")
session.use_warehouse("HOOTSUITE_WH")

# Initialize Model Registry
registry = Registry(
    session=session, 
    database_name="HOOTSUITE_INTELLIGENCE", 
    schema_name="ML_MODELS"
)

print("✅ Session and Registry configured")

--- 
## Model 1: Churn Risk Predictor

- **Objective**: Predict customer churn risk category based on plan, industry, and usage metrics.
- **Algorithm**: Random Forest Classifier
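
The first pipeline step one-hot encodes the categorical columns, so each distinct plan type or industry becomes its own 0/1 feature before the classifier sees it. A plain-Python sketch of that idea follows. The sample values (`"Pro"`, `"Retail"`, and so on) and the helper `one_hot` are made up for illustration, and the column names it produces are not necessarily the ones the Snowflake encoder emits.

```python
def one_hot(rows, column):
    """Replace `column` in each row dict with one 0/1 field per distinct value."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Drop the input column, mirroring drop_input_cols=True in the pipeline.
        new_row = {k: v for k, v in row.items() if k != column}
        for category in categories:
            new_row[f"{column}_{category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded

# Hypothetical rows; the real feature view's values are not shown in this notebook.
rows = [
    {"PLAN_TYPE": "Pro", "INDUSTRY": "Retail"},
    {"PLAN_TYPE": "Team", "INDUSTRY": "Retail"},
    {"PLAN_TYPE": "Pro", "INDUSTRY": "Media"},
]

for col in ("PLAN_TYPE", "INDUSTRY"):
    rows = one_hot(rows, col)

print(rows[0])
```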

In [None]:
print("Training Churn Risk Model...")

# Load data
churn_df = session.table("HOOTSUITE_INTELLIGENCE.ANALYTICS.V_CHURN_RISK_FEATURES")

# Split data
train_churn, test_churn = churn_df.random_split([0.8, 0.2], seed=42)

# Define Pipeline
churn_pipeline = Pipeline([
    ("Encoder", OneHotEncoder(
        input_cols=["PLAN_TYPE", "INDUSTRY"], 
        output_cols=["PLAN_TYPE_ENC", "INDUSTRY_ENC"], 
        drop_input_cols=True
    )),
    ("Classifier", RandomForestClassifier(
        label_cols=["CHURN_RISK_LABEL"], 
        output_cols=["PREDICTED_RISK"], 
        n_estimators=20,
        max_depth=5
    ))
])

# Train
churn_pipeline.fit(train_churn)

# Register Model
registry.log_model(
    model=churn_pipeline,
    model_name="CHURN_RISK_PREDICTOR",
    version_name="v1",
    target_platforms=['WAREHOUSE'],
    sample_input_data=train_churn.drop("CHURN_RISK_LABEL").limit(10),
    comment="Predicts customer churn risk (Low/Medium/High)"
)

print("✅ Churn Risk Model Registered.")

--- 
## Model 2: Campaign ROI Predictor

- **Objective**: Predict campaign ROI category based on objective and budget.
- **Algorithm**: Logistic Regression
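
With three ROI classes, logistic regression gives each class a linear score and converts the scores to probabilities; the predicted label is the most probable class. The sketch below shows the softmax form of that step in plain Python. The weights and the two input features are invented for illustration, and whether the underlying estimator uses this multinomial form or a one-vs-rest scheme depends on solver settings that this notebook does not specify.

```python
import math

CLASSES = ["Low", "Medium", "High"]

# Made-up coefficients: one (weights, bias) pair per class.
WEIGHTS = {
    "Low": ([-1.0, -0.5], 0.5),
    "Medium": ([0.0, 0.2], 0.0),
    "High": ([1.0, 0.6], -0.5),
}

def predict_proba(features):
    """Softmax over per-class linear scores."""
    scores = [
        sum(w * x for w, x in zip(WEIGHTS[c][0], features)) + WEIGHTS[c][1]
        for c in CLASSES
    ]
    peak = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return {c: e / total for c, e in zip(CLASSES, exps)}

def predict(features):
    """Return the class with the highest probability."""
    proba = predict_proba(features)
    return max(proba, key=proba.get)

probs = predict_proba([2.0, 1.0])
print(probs, predict([2.0, 1.0]))
```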

In [None]:
print("Training Campaign ROI Model...")

# Load data
roi_df = session.table("HOOTSUITE_INTELLIGENCE.ANALYTICS.V_CAMPAIGN_ROI_FEATURES")

# Split data
train_roi, test_roi = roi_df.random_split([0.8, 0.2], seed=42)

# Define Pipeline
roi_pipeline = Pipeline([
    ("Encoder", OneHotEncoder(
        input_cols=["OBJECTIVE"], 
        output_cols=["OBJECTIVE_ENC"], 
        drop_input_cols=True
    )),
    ("Classifier", LogisticRegression(
        label_cols=["ROI_LABEL"], 
        output_cols=["PREDICTED_ROI"], 
        max_iter=100
    ))
])

# Train
roi_pipeline.fit(train_roi)

# Register Model
registry.log_model(
    model=roi_pipeline,
    model_name="CAMPAIGN_ROI_PREDICTOR",
    version_name="v1",
    target_platforms=['WAREHOUSE'],
    sample_input_data=train_roi.drop("ROI_LABEL").limit(10),
    comment="Predicts campaign ROI (Low/Medium/High)"
)

print("✅ Campaign ROI Model Registered.")

--- 
## Model 3: Ticket Priority Classifier

- **Objective**: Classify support ticket priority based on category and summary.
- **Algorithm**: Random Forest Classifier
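
A random forest combines many decision trees. In scikit-learn, which the Snowflake ML estimator wraps, the forest averages the class probabilities estimated by each tree and predicts the class with the highest mean. The sketch below illustrates that averaging step only; the per-tree probabilities are invented, and three trees stand in for the 20 configured in this notebook.

```python
PRIORITIES = ["Low", "Medium", "High", "Urgent"]

def forest_predict(tree_probas):
    """Average per-tree class probabilities and return (label, mean_probabilities)."""
    n_trees = len(tree_probas)
    mean = [
        sum(tree[i] for tree in tree_probas) / n_trees
        for i in range(len(PRIORITIES))
    ]
    best = max(range(len(PRIORITIES)), key=lambda i: mean[i])
    return PRIORITIES[best], mean

# Invented outputs from three trees for a single ticket.
tree_probas = [
    [0.10, 0.20, 0.60, 0.10],
    [0.05, 0.15, 0.30, 0.50],
    [0.10, 0.30, 0.50, 0.10],
]

label, mean = forest_predict(tree_probas)
print(label, [round(p, 3) for p in mean])
```

Note that the second tree alone would have voted `Urgent`; averaging lets the other trees outweigh it.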

In [None]:
print("Training Ticket Priority Model...")

# Load data
ticket_df = session.table("HOOTSUITE_INTELLIGENCE.ANALYTICS.V_TICKET_PRIORITY_FEATURES")

# Split data
train_ticket, test_ticket = ticket_df.random_split([0.8, 0.2], seed=42)

# Define Pipeline
ticket_pipeline = Pipeline([
    ("Encoder", OneHotEncoder(
        input_cols=["CATEGORY"], 
        output_cols=["CATEGORY_ENC"], 
        drop_input_cols=True
    )),
    ("Classifier", RandomForestClassifier(
        label_cols=["PRIORITY_LABEL"], 
        output_cols=["PREDICTED_PRIORITY"], 
        n_estimators=20,
        max_depth=5
    ))
])

# Train
ticket_pipeline.fit(train_ticket)

# Register Model
registry.log_model(
    model=ticket_pipeline,
    model_name="TICKET_PRIORITY_CLASSIFIER",
    version_name="v1",
    target_platforms=['WAREHOUSE'],
    sample_input_data=train_ticket.drop("PRIORITY_LABEL").limit(10),
    comment="Classifies ticket priority (Low/Medium/High/Urgent)"
)

print("✅ Ticket Priority Model Registered.")

--- 
## Verification
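
The cells above create `test_churn`, `test_roi`, and `test_ticket` but never score them. One way to check a model is to compare its predictions with the true labels on the held-out split. The helper below is a plain-Python sketch that assumes the label/prediction pairs have already been pulled into a list, for example by calling a pipeline's `predict` on a test split and collecting the label and prediction columns. The function name `summarize` and the sample pairs are hypothetical.

```python
from collections import Counter

def summarize(pairs):
    """Return overall accuracy and per-class recall for (true_label, predicted_label) pairs."""
    if not pairs:
        raise ValueError("no predictions to evaluate")
    correct = Counter()  # correct predictions per true class
    support = Counter()  # number of rows per true class
    for truth, predicted in pairs:
        support[truth] += 1
        if truth == predicted:
            correct[truth] += 1
    accuracy = sum(correct.values()) / len(pairs)
    recall = {label: correct[label] / n for label, n in support.items()}
    return accuracy, recall

# Invented pairs standing in for (CHURN_RISK_LABEL, PREDICTED_RISK) rows.
pairs = [("Low", "Low"), ("Low", "Medium"), ("High", "High"), ("Medium", "Medium")]
accuracy, recall = summarize(pairs)
print(f"accuracy={accuracy:.2f}", recall)
```

Per-class recall is worth reporting alongside accuracy because a model can score well overall while missing most rows of a rare class such as `High`.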

In [None]:
# List the registry contents so the three new models can be checked by eye
print(registry.show_models())
print("All 3 models trained and registered.")
print("Run sql/ml/hootsuite_07_ml_model_functions.sql to create SQL wrappers.")