# 📈 Cortex ML Interactive Tutorial
## Learn by Doing: Time Series Forecasting & Anomaly Detection

**Author:** Li Ma  
**Date:** February 24, 2026  
**Project:** DIA v2.0 - Direct Marketing Analytics Intelligence

---

## 📚 What You'll Learn

This interactive notebook teaches you how to:
1. ✅ Forecast future metrics (email volume, open rates, revenue)
2. ✅ Detect anomalies in time series data
3. ✅ Understand confidence intervals
4. ✅ Monitor data quality and system health
5. ✅ Build predictive analytics pipelines

## 🎯 Prerequisites

- Docker containers running (`docker-compose up`)
- Snowflake credentials configured in `.env` file
- Time series data in Snowflake (minimum 14 days)

## 🧠 What is Cortex ML?

**Cortex ML** provides machine learning functions for time series data:

### 1. FORECAST() - Predict Future Values
- Predict next week's email volume
- Forecast expected open rates
- Estimate future revenue
- Capacity planning

### 2. ANOMALY_DETECTION() - Find Unusual Patterns
- Detect sudden bounce rate spikes
- Identify data quality issues
- Alert on system anomalies
- Monitor KPI deviations

**Use Cases:**
- Performance forecasting
- Capacity planning
- Quality monitoring
- Trend analysis
- Alert systems

---

## 📊 Data Requirements

For accurate ML predictions:
- ✅ **Time series data** (date/timestamp column)
- ✅ **Numeric metrics** (values to predict)
- ✅ **Minimum 14 days** of historical data
- ✅ **Regular intervals** (daily, hourly, etc.)
- ✅ **No missing dates** (fill gaps with 0 or interpolate)
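
The last requirement (no missing dates) can be handled before the data reaches the service. The following is a minimal standard-library sketch, assuming a daily series held as a `{date: value}` dictionary; `fill_missing_days` is a hypothetical helper and not part of `CortexML`.

```python
from datetime import date, timedelta
from typing import Dict


def fill_missing_days(series: Dict[date, float], fill_value: float = 0.0) -> Dict[date, float]:
    """Return a copy of a daily series with every missing date set to fill_value."""
    if not series:
        return {}
    filled: Dict[date, float] = {}
    current, end = min(series), max(series)
    while current <= end:
        # Keep the observed value when present, otherwise insert the filler
        filled[current] = series.get(current, fill_value)
        current += timedelta(days=1)
    return filled


# Feb 3 and Feb 4 are missing from this made-up sample
raw = {date(2026, 2, 1): 120.0, date(2026, 2, 2): 135.0, date(2026, 2, 5): 110.0}
filled = fill_missing_days(raw)
print(len(filled))  # 5
```

Zero-filling tends to suit count metrics such as emails sent; for rates, interpolating between neighbouring days may be the better choice.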

---

**💡 Tip:** Run each cell with `Shift + Enter` and experiment with your own data!

In [None]:
# Install required packages for this notebook
# Run this cell once to install dependencies
import sys
import subprocess

packages = [
    'structlog',
    'python-dotenv',
    'snowflake-snowpark-python',
    'pandas',
    'matplotlib'
]

print("📦 Installing required packages...")
for package in packages:
    print(f"   Installing {package}...")
    try:
        subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", package])
        print(f"   ✅ {package} installed")
    except subprocess.CalledProcessError as e:
        print(f"   ❌ Failed to install {package}: {e}")

print("\n✅ Installation complete!")
print("⚠️  If this is the first install, please RESTART THE KERNEL:")
print("   Jupyter menu: Kernel → Restart Kernel")

In [None]:
# Setup Python paths and import libraries
import sys
import os

# Calculate the project paths dynamically
notebook_dir = os.getcwd()
project_root = os.path.abspath(os.path.join(notebook_dir, '..'))
orchestrator_path = os.path.join(project_root, 'orchestrator')

# Add paths for both local and Docker environments
sys.path.insert(0, orchestrator_path)
sys.path.insert(0, project_root)
sys.path.insert(0, '/app')

print(f"📁 Python paths added:")
print(f"   Project Root: {project_root}")
print(f"   Orchestrator: {orchestrator_path}")

# Verify orchestrator path exists
if os.path.exists(orchestrator_path):
    print(f"   ✅ Orchestrator directory found")
else:
    print(f"   ⚠️  Orchestrator directory NOT found at: {orchestrator_path}")

# Core Python libraries
import json
from typing import Dict, List, Any, Optional
from dataclasses import dataclass
from datetime import datetime, timedelta

# Data manipulation and visualization
import pandas as pd
import matplotlib.pyplot as plt

# Snowflake libraries
from snowflake.snowpark import Session

# Environment and logging
from dotenv import load_dotenv

# Try to import custom logger with fallback
try:
    from utils.logging import get_logger
    logger = get_logger(__name__)
    print(f"   ✅ Using custom structlog logger")
except ImportError as e:
    import logging
    logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
    logger = logging.getLogger(__name__)
    print(f"   ⚠️  Using standard logging (utils.logging not found)")

# Load environment variables from .env file
load_dotenv()

print("\n✅ All libraries imported successfully!")
print(f"   Python version: {sys.version.split()[0]}")

## 📦 Understanding the Response Data Models

The ML service uses these data structures for predictions and anomalies.

In [None]:
@dataclass
class ForecastResult:
    """
    Single forecast prediction with confidence interval.
    
    Attributes:
        timestamp: Date/time of prediction
        forecast: Predicted value
        lower_bound: Lower bound of the confidence interval (pessimistic case)
        upper_bound: Upper bound of the confidence interval (optimistic case)
    """
    timestamp: Any
    forecast: float
    lower_bound: Optional[float] = None
    upper_bound: Optional[float] = None


@dataclass
class ForecastResponse:
    """
    Complete forecast response with all predictions.
    
    Attributes:
        table: Source data table
        metric: Column being predicted
        forecasts: List of predictions
        metadata: Model info (algorithm, accuracy, etc.)
        error: Error message if something went wrong
    """
    table: str
    metric: str
    forecasts: Optional[List[ForecastResult]] = None
    metadata: Optional[Dict[str, Any]] = None
    error: Optional[str] = None


@dataclass
class AnomalyResult:
    """
    Single anomaly detection result.
    
    Attributes:
        timestamp: Date/time of data point
        value: Actual observed value
        expected: Expected value based on patterns
        is_anomaly: True if unusual, False if normal
        score: Anomaly score (higher = more unusual)
    """
    timestamp: Any
    value: float
    expected: Optional[float] = None
    is_anomaly: bool = False
    score: Optional[float] = None


@dataclass
class AnomalyResponse:
    """
    Complete anomaly detection response.
    
    Attributes:
        table: Source data table
        metric: Column being analyzed
        anomalies: List of all data points (marked if anomaly)
        metadata: Detection info (threshold, sensitivity, etc.)
        error: Error message if something went wrong
    """
    table: str
    metric: str
    anomalies: Optional[List[AnomalyResult]] = None
    metadata: Optional[Dict[str, Any]] = None
    error: Optional[str] = None
    
    @property
    def anomaly_count(self) -> int:
        """Count how many anomalies were found"""
        if not self.anomalies:
            return 0
        return sum(1 for a in self.anomalies if a.is_anomaly)
    
    @property
    def has_anomalies(self) -> bool:
        """Check if any anomalies were found"""
        return self.anomaly_count > 0


# Test it out!
sample_forecast = ForecastResult(
    timestamp="2026-03-01",
    forecast=25000.0,
    lower_bound=23000.0,
    upper_bound=27000.0
)

sample_anomaly = AnomalyResult(
    timestamp="2026-02-20",
    value=15.2,
    expected=8.5,
    is_anomaly=True,
    score=0.95
)

print("✅ ML data models created!")
print(f"\n📈 Forecast Example:")
print(f"   Date: {sample_forecast.timestamp}")
print(f"   Predicted: {sample_forecast.forecast:,.0f}")
print(f"   Range: {sample_forecast.lower_bound:,.0f} - {sample_forecast.upper_bound:,.0f}")

print(f"\n⚠️  Anomaly Example:")
print(f"   Date: {sample_anomaly.timestamp}")
print(f"   Actual: {sample_anomaly.value}")
print(f"   Expected: {sample_anomaly.expected}")
print(f"   Is Anomaly: {sample_anomaly.is_anomaly}")
print(f"   Score: {sample_anomaly.score}")

## 🔧 Import CortexML Service

Now let's import the `CortexML` class from the services module.

In [None]:
# Import the CortexML service class
try:
    from services.cortex_ml import CortexML
    print("✅ CortexML class imported successfully!")
    print("   Ready to forecast and detect anomalies")
except ImportError as e:
    print(f"❌ Failed to import CortexML: {e}")
    print("\n💡 Troubleshooting:")
    print("   1. Make sure you ran Cell 2 (path setup)")
    print("   2. Check that orchestrator/services/cortex_ml.py exists")

## 📈 Example 1: Forecast Future Metrics

Predict future email volume for the next 30 days.

**⚠️ Note:** Update the table and column names to match your data!

In [None]:
# Forecast email volume for next 30 days
TABLE_NAME = "VW_SFMC_EMAIL_PERFORMANCE"  # Your table name
TIMESTAMP_COL = "SEND_DATE"                # Date column
TARGET_COL = "EMAILS_SENT"                 # Metric to predict

try:
    ml = CortexML()

    print(f"📈 Forecasting {TARGET_COL} from {TABLE_NAME}...")
    print(f"   Time column: {TIMESTAMP_COL}")
    print(f"   Forecasting: 30 days ahead\n")

    forecast_response = ml.forecast(
        table=TABLE_NAME,
        timestamp_col=TIMESTAMP_COL,
        target_col=TARGET_COL,
        periods=30  # Predict next 30 days
    )

    if forecast_response.error:
        print(f"❌ Error: {forecast_response.error}")
        print("\n💡 Common issues:")
        print("   • Less than 14 days of historical data")
        print("   • Missing dates in time series")
        print("   • Table/column names don't exist")
    else:
        forecasts = forecast_response.forecasts or []  # None when nothing was returned
        print(f"✅ Forecast generated: {len(forecasts)} predictions\n")

        # Show first 7 days
        print("📊 First 7 Days Forecast:")
        print("-" * 70)
        for forecast in forecasts[:7]:
            print(f"{forecast.timestamp}: {forecast.forecast:,.0f}")
            # Bounds are Optional in ForecastResult, so format them only when present
            if forecast.lower_bound is not None and forecast.upper_bound is not None:
                print(f"   Range: {forecast.lower_bound:,.0f} - {forecast.upper_bound:,.0f}")

        # Summary statistics: average over the rows actually returned, not a fixed 30
        if forecasts:
            total_predicted = sum(f.forecast for f in forecasts)
            metadata = forecast_response.metadata or {}
            print(f"\n📊 {len(forecasts)}-Day Summary:")
            print(f"   Total Predicted: {total_predicted:,.0f}")
            print(f"   Daily Average: {total_predicted / len(forecasts):,.0f}")
            print(f"   Algorithm: {metadata.get('algorithm', 'N/A')}")

except Exception as e:
    print(f"❌ Forecast failed: {e}")
    print("\n💡 Make sure:")
    print("   1. Table exists and has time series data")
    print("   2. Column names are correct")
    print("   3. Minimum 14 days of historical data")

## 📊 Visualize the Forecast

Let's plot the forecast to see the trend visually.

In [None]:
# Visualize forecast (if forecast was successful)
try:
    if 'forecast_response' in locals() and not forecast_response.error and forecast_response.forecasts:
        # Extract data for plotting
        dates = [f.timestamp for f in forecast_response.forecasts]
        predictions = [f.forecast for f in forecast_response.forecasts]
        lower = [f.lower_bound for f in forecast_response.forecasts]
        upper = [f.upper_bound for f in forecast_response.forecasts]
        
        # Create plot
        plt.figure(figsize=(12, 6))
        plt.plot(dates, predictions, 'b-', linewidth=2, label='Forecast')
        plt.fill_between(dates, lower, upper, alpha=0.3, label='Confidence Interval')
        plt.xlabel('Date')
        plt.ylabel(TARGET_COL)
        plt.title(f'30-Day Forecast: {TARGET_COL}')
        plt.legend()
        plt.grid(True, alpha=0.3)
        plt.xticks(rotation=45)
        plt.tight_layout()
        plt.show()
        
        print("✅ Forecast visualization complete!")
    else:
        print("⚠️  No forecast data to visualize")
        print("   Run the forecast cell first!")

except Exception as e:
    print(f"❌ Visualization error: {e}")

## ⚠️ Example 2: Detect Anomalies

Find unusual patterns in bounce rates or other metrics.

In [None]:
# Detect anomalies in bounce rate
TARGET_COL_ANOMALY = "BOUNCE_RATE"  # Metric to check for anomalies

try:
    ml = CortexML()

    print(f"🔍 Detecting anomalies in {TARGET_COL_ANOMALY}...")
    print(f"   Table: {TABLE_NAME}")
    print(f"   Sensitivity: 0.95 (95% confidence)\n")

    anomaly_response = ml.detect_anomalies(
        table=TABLE_NAME,
        timestamp_col=TIMESTAMP_COL,
        target_col=TARGET_COL_ANOMALY,
        sensitivity=0.95  # 0.9 to 0.99 (higher = more sensitive)
    )

    if anomaly_response.error:
        print(f"❌ Error: {anomaly_response.error}")
    else:
        print(f"✅ Anomaly detection complete!\n")
        points = anomaly_response.anomalies or []  # None when nothing was returned

        if anomaly_response.has_anomalies:
            print(f"⚠️  Found {anomaly_response.anomaly_count} anomalies!\n")
            print("📊 Anomalies Detected:")
            print("-" * 70)

            for anomaly in points:
                if anomaly.is_anomaly:
                    print(f"🔴 {anomaly.timestamp}")
                    print(f"   Actual: {anomaly.value:.2f}")
                    # expected and score are Optional, so format them only when present
                    if anomaly.expected is not None:
                        print(f"   Expected: {anomaly.expected:.2f}")
                        print(f"   Deviation: {abs(anomaly.value - anomaly.expected):.2f}")
                    if anomaly.score is not None:
                        print(f"   Score: {anomaly.score:.3f}")
                    print()
        else:
            print(f"✅ No anomalies detected - all values are normal!")
            print(f"   Analyzed {len(points)} data points")

        metadata = anomaly_response.metadata or {}
        print(f"\n📊 Detection Settings:")
        print(f"   Sensitivity: {metadata.get('sensitivity', 0.95)}")
        print(f"   Algorithm: {metadata.get('algorithm', 'Statistical')}")

except Exception as e:
    print(f"❌ Anomaly detection failed: {e}")

## 📉 Visualize Anomalies

Plot the metric with anomalies highlighted in red.

In [None]:
# Visualize anomalies (if detection was successful)
try:
    if 'anomaly_response' in locals() and not anomaly_response.error:
        # Extract data
        dates = [a.timestamp for a in anomaly_response.anomalies]
        values = [a.value for a in anomaly_response.anomalies]
        expected = [a.expected for a in anomaly_response.anomalies]
        is_anomaly = [a.is_anomaly for a in anomaly_response.anomalies]
        
        # Split into normal and anomaly points
        normal_dates = [d for d, is_anom in zip(dates, is_anomaly) if not is_anom]
        normal_values = [v for v, is_anom in zip(values, is_anomaly) if not is_anom]
        anomaly_dates = [d for d, is_anom in zip(dates, is_anomaly) if is_anom]
        anomaly_values = [v for v, is_anom in zip(values, is_anomaly) if is_anom]
        
        # Create plot
        plt.figure(figsize=(14, 6))
        plt.plot(dates, expected, 'g--', alpha=0.5, label='Expected', linewidth=1)
        plt.scatter(normal_dates, normal_values, c='blue', alpha=0.6, label='Normal', s=30)
        plt.scatter(anomaly_dates, anomaly_values, c='red', alpha=0.8, label='Anomaly', s=100, marker='x', linewidths=3)
        plt.xlabel('Date')
        plt.ylabel(TARGET_COL_ANOMALY)
        plt.title(f'Anomaly Detection: {TARGET_COL_ANOMALY}')
        plt.legend()
        plt.grid(True, alpha=0.3)
        plt.xticks(rotation=45)
        plt.tight_layout()
        plt.show()
        
        print("✅ Anomaly visualization complete!")
        print(f"   🔵 Blue dots: Normal values ({len(normal_dates)} points)")
        print(f"   🔴 Red X: Anomalies ({len(anomaly_dates)} points)")

    else:
        print("⚠️  No anomaly data to visualize")
        print("   Run the anomaly detection cell first!")

except Exception as e:
    print(f"❌ Visualization error: {e}")

## 🎯 Example 3: Forecast Multiple Metrics

Compare forecasts for different metrics side-by-side.

In [None]:
# Forecast multiple metrics
metrics_to_forecast = [
    "OPEN_RATE",
    "CLICK_RATE",
    "BOUNCE_RATE"
]

forecast_results = {}

print("📈 Forecasting Multiple Metrics:\n")
print("=" * 70)

try:
    ml = CortexML()

    for metric in metrics_to_forecast:
        print(f"\n🔄 Forecasting {metric}...")

        try:
            response = ml.forecast(
                table=TABLE_NAME,
                timestamp_col=TIMESTAMP_COL,
                target_col=metric,
                periods=7  # Next 7 days
            )

            if response.error:
                print(f"   ❌ Error: {response.error}")
            elif not response.forecasts:
                # Guard against an empty result before averaging
                print(f"   ⚠️  No predictions returned")
            else:
                forecast_results[metric] = response
                avg_forecast = sum(f.forecast for f in response.forecasts) / len(response.forecasts)
                print(f"   ✅ 7-day average forecast: {avg_forecast:.2f}")

        except Exception as e:
            print(f"   ❌ Failed: {e}")

    print("\n" + "=" * 70)
    print("\n📊 Summary:")
    for metric, response in forecast_results.items():
        avg = sum(f.forecast for f in response.forecasts) / len(response.forecasts)
        print(f"   {metric}: {avg:.2f}")

except Exception as e:
    print(f"❌ Multiple forecast failed: {e}")

## 🔔 Example 4: Alert System

Build an anomaly alert system that monitors metrics.

In [None]:
# Anomaly alert system
def check_for_alerts(table: str, metrics: List[str], sensitivity: float = 0.95,
                     timestamp_col: Optional[str] = None):
    """
    Monitor multiple metrics and alert on anomalies.

    Args:
        table: Table to monitor
        metrics: List of metric columns to check
        sensitivity: Anomaly detection threshold
        timestamp_col: Date column (defaults to the notebook's TIMESTAMP_COL)

    Returns:
        Dictionary of alerts by metric
    """
    ml = CortexML()
    alerts = {}
    # Fall back to the notebook-level setting when no column is passed in
    timestamp_col = timestamp_col or TIMESTAMP_COL

    for metric in metrics:
        try:
            response = ml.detect_anomalies(
                table=table,
                timestamp_col=timestamp_col,
                target_col=metric,
                sensitivity=sensitivity
            )

            if not response.error and response.has_anomalies:
                alerts[metric] = [
                    a for a in response.anomalies if a.is_anomaly
                ]
        except Exception as e:
            print(f"⚠️  Error checking {metric}: {e}")

    return alerts


# Run alert check
print("🔔 Running Anomaly Alert System...\n")

alert_metrics = ["BOUNCE_RATE", "UNSUBSCRIBE_RATE"]

try:
    alerts = check_for_alerts(TABLE_NAME, alert_metrics, sensitivity=0.95)

    if alerts:
        print(f"⚠️  ALERTS DETECTED!\n")
        for metric, anomalies in alerts.items():
            print(f"🔴 {metric}: {len(anomalies)} anomalies")
            for a in anomalies[:3]:  # Show first 3
                # expected is Optional, so avoid formatting None
                expected = f"{a.expected:.2f}" if a.expected is not None else "n/a"
                print(f"   • {a.timestamp}: {a.value:.2f} (expected {expected})")
            print()
    else:
        print("✅ All systems normal - no anomalies detected!")

except Exception as e:
    print(f"❌ Alert system error: {e}")

## 🎓 Summary: What You Learned

Congratulations! You've learned:

✅ **Cortex ML Fundamentals**
- Time series forecasting
- Anomaly detection
- Confidence intervals
- Sensitivity tuning

✅ **Practical Applications**
- Forecast email metrics
- Detect unusual patterns
- Monitor multiple KPIs
- Build alert systems

✅ **Data Visualization**
- Plot forecasts with confidence intervals
- Highlight anomalies
- Compare multiple metrics

✅ **Python Skills**
- Dataclasses for structured data
- Matplotlib for visualization
- List comprehensions
- Error handling

---

## 🚀 Next Steps

**Try These Experiments:**
1. Forecast different time horizons (7, 14, 30 days)
2. Compare sensitivity levels (0.90, 0.95, 0.99)
3. Monitor daily vs weekly aggregations
4. Build automated alert emails
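
Experiment 2 can be run as a small loop. The sketch below relies only on what the earlier cells show: a detection call that accepts `sensitivity` and returns an object with `error` and `anomaly_count`. The helper `count_by_sensitivity`, the `FakeResponse` stand-in, and the demo counts are all illustrative and not part of the service.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional


def count_by_sensitivity(
    detect: Callable[[float], object],
    levels: Iterable[float] = (0.90, 0.95, 0.99),
) -> Dict[float, Optional[int]]:
    """Call detect(level) for each level and map level -> anomaly count (None on error)."""
    counts: Dict[float, Optional[int]] = {}
    for level in levels:
        response = detect(level)
        if getattr(response, "error", None):
            counts[level] = None  # keep the failed level visible in the result
        else:
            counts[level] = response.anomaly_count
    return counts


@dataclass
class FakeResponse:  # stand-in for AnomalyResponse, used only for this demo
    anomaly_count: int
    error: Optional[str] = None


# Made-up counts, purely to show the shape of the output
demo_counts = {0.90: 2, 0.95: 5, 0.99: 11}
result = count_by_sensitivity(lambda level: FakeResponse(demo_counts[level]))
for level, count in result.items():
    print(f"sensitivity={level:.2f}: {count} anomalies")
```

With the notebook's objects, the callable might be `lambda s: ml.detect_anomalies(table=TABLE_NAME, timestamp_col=TIMESTAMP_COL, target_col="BOUNCE_RATE", sensitivity=s)`.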

**Advanced Use Cases:**

### 1. Capacity Planning
```python
# Predict infrastructure needs
server_forecast = ml.forecast(
    table="SYSTEM_METRICS",
    timestamp_col="HOUR",
    target_col="API_REQUESTS",
    periods=168  # Next week (hourly)
)
```
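
One way to turn such a forecast into a provisioning number is to size for the highest upper bound plus a safety margin. This is a sketch under that assumption: `required_capacity`, the 25% default headroom, and the sample values are illustrative, and `Point` stands in for the tutorial's `ForecastResult`.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Point:  # stand-in carrying the two ForecastResult fields used here
    forecast: float
    upper_bound: Optional[float] = None


def required_capacity(points: Iterable[Point], headroom: float = 0.25) -> int:
    """Highest upper bound (or forecast when no bound exists) plus headroom, rounded up."""
    peaks = [p.upper_bound if p.upper_bound is not None else p.forecast for p in points]
    if not peaks:
        raise ValueError("no forecast points supplied")
    return math.ceil(max(peaks) * (1 + headroom))


hourly = [Point(900.0, 1000.0), Point(1200.0, 1600.0), Point(800.0)]
print(required_capacity(hourly))  # 2000
```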

### 2. Performance Monitoring
```python
# Track KPI trends
for metric in ["OPEN_RATE", "CTR", "CONVERSION_RATE"]:
    forecast = ml.forecast(
        table="DAILY_METRICS",
        timestamp_col="DATE",
        target_col=metric,
        periods=30
    )
    # Alert if forecast < target
```
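
The closing comment in that loop could be implemented roughly as follows. This is a sketch: `days_below_target` and the sample values are hypothetical, and the objects only need the `timestamp` and `forecast` fields of the `ForecastResult` dataclass defined earlier.

```python
from dataclasses import dataclass
from typing import Any, Iterable, List


@dataclass
class ForecastPoint:  # stand-in with the same field names as ForecastResult
    timestamp: Any
    forecast: float


def days_below_target(points: Iterable[ForecastPoint], target: float) -> List[Any]:
    """Return the timestamps whose forecast falls below the target."""
    return [p.timestamp for p in points if p.forecast < target]


sample = [
    ForecastPoint("2026-03-01", 22.5),
    ForecastPoint("2026-03-02", 19.0),
    ForecastPoint("2026-03-03", 18.2),
]
below = days_below_target(sample, target=20.0)
if below:
    print(f"ALERT: forecast below target on {len(below)} day(s): {below}")
```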

### 3. Data Quality Checks
```python
# Detect data issues
quality_check = ml.detect_anomalies(
    table="RAW_DATA",
    timestamp_col="LOAD_DATE",
    target_col="NULL_COUNT",
    sensitivity=0.99
)
```

---

## 🔗 Related Resources

- **Documentation:** `guides/02_STEP_2.1_CORTEX_SERVICES.md`
- **Service Code:** `orchestrator/services/cortex_ml.py`
- **Other Notebooks:**
  - `cortex_analyst_interactive.ipynb` - SQL generation
  - `cortex_complete_interactive.ipynb` - Text generation
  - `cortex_search_interactive.ipynb` - Semantic search

---

## 📝 Key Concepts

### Confidence Intervals

Forecasts include uncertainty ranges:
- **Forecast:** Most likely value
- **Lower Bound:** Pessimistic estimate (90% confident actual will be above this)
- **Upper Bound:** Optimistic estimate (90% confident actual will be below this)

Example: Forecast = 1000, Range = 800-1200
- Most likely: 1000 emails
- Could be as low as: 800
- Could be as high as: 1200
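
Two quick checks are useful with these ranges: how wide the interval is relative to the forecast, and whether the actual value landed inside it once it is known. A small sketch using the example numbers above; both helper names are illustrative.

```python
def relative_width(forecast: float, lower: float, upper: float) -> float:
    """Interval width as a fraction of the forecast (0.4 means roughly +/-20%)."""
    if forecast == 0:
        raise ValueError("relative width is undefined for a zero forecast")
    return (upper - lower) / abs(forecast)


def within_interval(actual: float, lower: float, upper: float) -> bool:
    """True when the observed value falls inside the forecast range."""
    return lower <= actual <= upper


print(relative_width(1000, 800, 1200))   # 0.4
print(within_interval(950, 800, 1200))   # True
print(within_interval(1300, 800, 1200))  # False
```

A wide relative width signals an uncertain forecast, and actuals that repeatedly fall outside the range suggest the model needs more or cleaner history.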

### Anomaly Sensitivity

Controls how strict detection is:
- **0.90 (90%)**: Only catch obvious anomalies
- **0.95 (95%)**: Balanced - good default
- **0.99 (99%)**: Very sensitive - catch subtle issues

Higher sensitivity = More anomalies detected = More false positives

### Time Series Requirements

**Good Data:**
- ✅ Daily values for 30+ days
- ✅ No missing dates
- ✅ Consistent intervals

**Bad Data:**
- ❌ Only 5 days of history
- ❌ Gaps in dates (missing weeks)
- ❌ Irregular intervals (random dates)
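
These rules can be checked before a table is sent to the service. A minimal sketch, assuming daily data and the 14-day minimum stated earlier; `check_daily_series` is a hypothetical helper.

```python
from datetime import date, timedelta
from typing import List, Sequence


def check_daily_series(dates: Sequence[date], min_days: int = 14) -> List[str]:
    """Return a list of problems found; an empty list means the dates look usable."""
    problems: List[str] = []
    unique = sorted(set(dates))
    if len(unique) < min_days:
        problems.append(f"only {len(unique)} days of history (need {min_days})")
    if len(unique) != len(dates):
        problems.append("duplicate dates")
    # Count every place where consecutive dates are not exactly one day apart
    gaps = sum(1 for a, b in zip(unique, unique[1:]) if b - a != timedelta(days=1))
    if gaps:
        problems.append(f"{gaps} gap(s) between consecutive dates")
    return problems


good = [date(2026, 1, 1) + timedelta(days=i) for i in range(30)]
bad = [date(2026, 1, 1), date(2026, 1, 2), date(2026, 1, 9), date(2026, 1, 10), date(2026, 1, 20)]
print(check_daily_series(good))  # []
print(check_daily_series(bad))
```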

---

## 💡 Combining All Cortex Services

```python
# Complete workflow example
def analyze_performance():
    # 1. Get historical data (Cortex Analyst)
    analyst = CortexAnalyst()
    data = analyst.send_message("What was daily open rate last 90 days?")
    
    # 2. Detect anomalies (Cortex ML)
    ml = CortexML()
    anomalies = ml.detect_anomalies(
        table="EMAIL_METRICS",
        timestamp_col="DATE",
        target_col="OPEN_RATE"
    )
    
    # 3. Forecast future (Cortex ML)
    forecast = ml.forecast(
        table="EMAIL_METRICS",
        timestamp_col="DATE",
        target_col="OPEN_RATE",
        periods=30
    )
    
    # 4. Generate insights (Cortex Complete)
    llm = CortexComplete()
    summary = llm.complete(f"""
    Analyze this performance data:
    - Historical: {data.results}
    - Anomalies: {anomalies.anomaly_count}
    - Forecast: {forecast.forecasts[0].forecast}
    
    Provide 3 key insights:
    """)
    
    return summary
```

---

**Status:** ✅ Tutorial Complete  
**Congratulations!** You've mastered all 4 Cortex services! 🎉