# Fiddler Alert Testing Notebook
This notebook creates a test environment to verify whether Fiddler's alert emails reach your inbox through corporate firewalls and email filtering. We'll:

- Create a test project and model with a synthetic dataset
- Set up a baseline and various alert rules
- Publish events that intentionally violate these alert rules


# Fiddler Alert Testing Notebook Documentation

## Purpose
This notebook creates a complete testing environment to verify Fiddler's alert functionality and email delivery. It automates the entire workflow from creating test resources to triggering real alerts that send notifications.

## Key Features
- Creates a self-contained test project with synthetic data
- Configures alert rules across five alert categories: data integrity, data drift, traffic, performance, and custom metrics
- Generates and publishes events that trigger each alert type
- Tests email delivery through corporate networks/firewalls

## Prerequisites
- Fiddler URL
- API token (from the Credentials tab in Settings)
- Email address(es) for receiving test alerts
- `fiddler-client` 3.7.0 or later (installed by the notebook)

## Alert Types Demonstrated
1. **Data Integrity Alerts**
   - Range violations (numeric values outside expected bounds)
   - Null value violations (unexpected missing data)
   - Type violations (data of incorrect type)

2. **Data Drift Alerts**
   - Distribution shift detection via Jensen-Shannon Distance

3. **Traffic Alerts**
   - Low volume detection

4. **Performance Alerts**
   - Model precision degradation

5. **Custom Metric Alerts**
   - Business-relevant metrics (e.g., revenue loss from churned customers)
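The drift alert compares how a column is distributed in recent events with how it is distributed in the baseline. Below is a minimal sketch of Jensen-Shannon distance between two binned distributions. It assumes log base 2 (which keeps the distance between 0 and 1) and a binning you choose yourself. It is an illustration of the measure, not Fiddler's internal implementation, and the histograms are hypothetical.

```python
import math


def js_distance(p, q):
    """Jensen-Shannon distance between two discrete distributions.

    p and q are counts or probabilities over the same bins. Log base 2 keeps
    the result in [0, 1]; the base and binning are assumptions of this sketch.
    """
    # Normalize so raw counts can be passed directly
    total_p, total_q = sum(p), sum(q)
    p = [x / total_p for x in p]
    q = [x / total_q for x in q]
    # Mixture distribution halfway between p and q
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # Kullback-Leibler divergence; bins where a is 0 contribute nothing
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    # JS divergence is the mean KL to the mixture; the distance is its square root.
    # max() guards against a tiny negative value from floating-point rounding.
    return math.sqrt(max(0.0, 0.5 * kl(p, m) + 0.5 * kl(q, m)))


# Hypothetical age histograms over bins 18-39, 40-59, 60-79, 80-99
baseline_hist = [175, 160, 165, 0]
drifted_hist = [0, 0, 25, 25]
print(round(js_distance(baseline_hist, drifted_hist), 3))
```

A roughly uniform baseline compared with a batch concentrated at ages 60-99 gives a distance well above the 0.2 critical threshold used for the drift alert later in this notebook.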

## Implementation Details
- Creates synthetic customer churn dataset with varied column types
- Sets appropriate thresholds for each alert type
- Configures email notifications for each alert
- Publishes batches of violation events to trigger alerts
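Each rule pairs a warning threshold and a critical threshold with a condition. For `GREATER`, the critical threshold sits above the warning one. For `LESSER`, it sits below. The sketch below shows that two-level check using a hypothetical `alert_severity` helper. It assumes strict comparisons and does not cover how Fiddler treats a value exactly equal to a threshold.

```python
def alert_severity(value, warning, critical, condition):
    """Return 'CRITICAL', 'WARNING', or None for one metric value.

    condition is 'GREATER' or 'LESSER', mirroring the AlertCondition names
    used in this notebook. Comparisons are strict (an assumption).
    """
    if condition == 'GREATER':
        if value > critical:
            return 'CRITICAL'
        if value > warning:
            return 'WARNING'
    elif condition == 'LESSER':
        if value < critical:
            return 'CRITICAL'
        if value < warning:
            return 'WARNING'
    else:
        raise ValueError(f'unknown condition: {condition!r}')
    return None


print(alert_severity(4.76, 5, 10, 'GREATER'))  # None
print(alert_severity(0.2, 0.7, 0.5, 'LESSER'))  # CRITICAL
```

For example, a null-violation percentage of 4.76 checked against thresholds of 5 and 10 produces no alert. Running this kind of check before publishing confirms that a test batch actually crosses a threshold.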

## Usage
1. Configure URL, API token and email recipients
2. Run all cells sequentially
3. Verify alerts in the Fiddler UI
4. Check email inbox for alert notifications

## Troubleshooting
If emails aren't received, check spam folders or verify with IT that emails from Fiddler's domain aren't being blocked.


## Configuration and Setup


In [1]:
# Fiddler connection details
URL = ''  # Your Fiddler URL (e.g., 'https://your_company_name.fiddler.ai')
TOKEN = ''  # Your API token from the Credentials tab in Settings

# Project and model details
PROJECT_NAME = 'alert_test_project'
MODEL_NAME = 'alert_test_model'
BASELINE_NAME = 'baseline_dataset'

# Alert test settings
RECIPIENT_EMAIL = ['email1', 'email2']  # List of email addresses that should receive the test alerts
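Replace the placeholder values above (`''` and `'email1'`) before running the rest of the notebook. A small pre-flight check can catch a missed setting early. The `check_config` helper below is hypothetical and is not part of the Fiddler client. Its email pattern is only a loose shape check, not full address validation.

```python
import re

# Loose "something@domain.tld" shape check, not full RFC 5322 validation
EMAIL_PATTERN = re.compile(r'^[^@\s]+@[^@\s]+\.[^@\s]+$')


def check_config(url, token, recipients):
    """Return a list of problems with the notebook settings (empty if none)."""
    problems = []
    if not url.startswith(('http://', 'https://')):
        problems.append('URL should be the full address of your Fiddler instance')
    if not token:
        problems.append('TOKEN is empty')
    if isinstance(recipients, str):
        # A bare string would be iterated character by character
        problems.append('RECIPIENT_EMAIL should be a list of addresses, not a string')
    else:
        for address in recipients:
            if not EMAIL_PATTERN.match(address):
                problems.append(f'Not an email address: {address!r}')
    return problems


# With the placeholder values from the cell above, every setting is flagged
for problem in check_config('', '', ['email1', 'email2']):
    print(problem)
```

In the notebook itself, you would call `check_config(URL, TOKEN, RECIPIENT_EMAIL)` and stop if the returned list is not empty.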

In [None]:
# Quote the requirement so the shell does not treat ">" as output redirection
%pip install "fiddler-client>=3.7.0"

In [None]:
import numpy as np
import pandas as pd
import time
from datetime import datetime, timedelta
import random
from uuid import uuid4
import fiddler as fdl

# Configure logging
import logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[logging.StreamHandler()],
)

# Initialize connection to Fiddler
fdl.init(url=URL, token=TOKEN)
logging.info(f"Connected to Fiddler. Client version: {fdl.__version__}")

## Create a Synthetic Dataset with Various Column Types

In [None]:
def generate_synthetic_dataset(rows=500, seed=42):
    """Generate a synthetic dataset with various column types for alert testing"""
    np.random.seed(seed)
    random.seed(seed)
    
    # Generate a timestamp column
    end_date = datetime.now()
    start_date = end_date - timedelta(days=30)
    timestamps = [start_date + timedelta(
        seconds=random.randint(0, int((end_date - start_date).total_seconds()))) 
        for _ in range(rows)]
    
    # Convert timestamps to milliseconds
    timestamp_ms = [int(ts.timestamp() * 1000) for ts in timestamps]
    
    # Generate data
    data = {
        # Integer feature
        'age': np.random.randint(18, 80, rows),
        
        # Float feature with 2 decimal places
        'income': np.round(np.random.uniform(20000, 200000, rows), 2),
        
        # Categorical feature with 3 categories
        'location': np.random.choice(['Urban', 'Suburban', 'Rural'], rows),
        
        # Boolean feature
        'has_subscription': np.random.choice([True, False], rows),
        
        # String feature
        'customer_id': [f"CUST-{str(uuid4())[:8]}" for _ in range(rows)],
        
        # Target column (binary)
        'churn': np.random.choice([0, 1], rows, p=[0.8, 0.2]),
        
        # Model prediction (probability)
        'predicted_churn': np.random.beta(2, 5, rows),
        
        # Timestamp column
        'timestamp': timestamp_ms
    }
    
    return pd.DataFrame(data)

# Generate baseline dataset
baseline_df = generate_synthetic_dataset(rows=500)
print(f"Generated baseline dataset with {len(baseline_df)} rows")
baseline_df.head()
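`np.random.randint(18, 80, rows)` excludes its upper bound, so baseline ages run from 18 to 79. This sketch assumes that the schema `Model.from_data` infers records the observed minimum and maximum of each numeric column. That is an assumption, so check the generated schema in the Fiddler UI. Under it, any later age outside that range counts toward `range_violation_count`.

The stdlib code below stands in for the NumPy generator and shows which injected ages fall outside the bounds. The helper names are hypothetical.

```python
import random


def inferred_range(values):
    """(min, max) bounds, as a schema inferred from data might record them."""
    return min(values), max(values)


def out_of_range(values, bounds):
    """Values that fall outside the inclusive [low, high] bounds."""
    low, high = bounds
    return [v for v in values if v < low or v > high]


# Stand-in for the baseline ages: random.randint includes both ends, so
# randint(18, 79) covers the same values as np.random.randint(18, 80)
rng = random.Random(42)
baseline_ages = [rng.randint(18, 79) for _ in range(500)]
bounds = inferred_range(baseline_ages)

# Ages injected by the range-violation block of generate_violation_events
injected_ages = [5, 10, 95, 100, 120]
print(bounds, out_of_range(injected_ages, bounds))
```

Under the same assumption, the drift block's ages of 80 to 99 also fall outside these bounds, so they add to the range-violation count as well.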

## Create a Project, Model, and Upload the Baseline

In [None]:
# Create or get the project
try:
    project = fdl.Project(name=PROJECT_NAME).create()
    print(f'Created new project with id = {project.id} and name = {project.name}')
except fdl.Conflict:
    project = fdl.Project.from_name(name=PROJECT_NAME)
    print(f'Using existing project with id = {project.id} and name = {project.name}')

# Define the model spec
model_spec = fdl.ModelSpec(
    inputs=['age', 'income', 'location', 'has_subscription'],
    outputs=['predicted_churn'],
    targets=['churn'],
    metadata=['customer_id', 'timestamp']
)

# Set model task parameters
model_task = fdl.ModelTask.BINARY_CLASSIFICATION
task_params = fdl.ModelTaskParams(target_class_order=[0, 1])

# Create the model
try:
    model = fdl.Model.from_data(
        name=MODEL_NAME,
        project_id=project.id,
        source=baseline_df,
        spec=model_spec,
        task=model_task,
        task_params=task_params,
        event_id_col='customer_id',
        event_ts_col='timestamp'
    )
    model.create()
    print(f'Created new model with id = {model.id} and name = {model.name}')
except fdl.Conflict:
    model = fdl.Model.from_name(name=MODEL_NAME, project_id=project.id)
    print(f'Using existing model with id = {model.id} and name = {model.name}')

# Upload the baseline dataset
baseline_publish_job = model.publish(
    source=baseline_df,
    environment=fdl.EnvType.PRE_PRODUCTION,
    dataset_name=BASELINE_NAME
)
print(f'Initiated baseline upload with Job ID = {baseline_publish_job.id}')

# Wait for the baseline upload to complete
baseline_publish_job.wait()
print("Baseline upload completed successfully")

# Get the dataset
dataset = fdl.Dataset.from_name(name=BASELINE_NAME, model_id=model.id)
print(f'Retrieved dataset with id = {dataset.id} and name = {dataset.name}')

baseline = fdl.Baseline.from_name(name=BASELINE_NAME, model_id=model.id)
print(f'Retrieved baseline with id = {baseline.id} and name = {baseline.name}')
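The cell above repeats a pattern of creating a resource and falling back to a lookup on `fdl.Conflict`, and the custom-metric code later repeats it again. A hypothetical `get_or_create` helper could factor this out. Here it runs against an in-memory stand-in so the logic can be checked offline. `Conflict` is a local placeholder for `fdl.Conflict`.

```python
class Conflict(Exception):
    """Local placeholder for fdl.Conflict so the sketch runs offline."""


def get_or_create(create, fetch, conflict_exc=Conflict):
    """Call create(); if it raises conflict_exc, return fetch() instead.

    Returns (obj, created) so the caller can log which path was taken.
    """
    try:
        return create(), True
    except conflict_exc:
        return fetch(), False


# In-memory stand-in for the server, used to exercise both branches
registry = {}


def create_project(name):
    if name in registry:
        raise Conflict(name)
    registry[name] = {'name': name}
    return registry[name]


def fetch_project(name):
    return registry[name]


first, created_first = get_or_create(lambda: create_project('alert_test_project'),
                                     lambda: fetch_project('alert_test_project'))
second, created_second = get_or_create(lambda: create_project('alert_test_project'),
                                       lambda: fetch_project('alert_test_project'))
print(created_first, created_second, first is second)  # True False True
```

With the real client, it could be called as `get_or_create(lambda: fdl.Project(name=PROJECT_NAME).create(), lambda: fdl.Project.from_name(name=PROJECT_NAME), fdl.Conflict)`, using the same calls this notebook already makes.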


## Set Alert Rules
This section creates one alert rule for each alert type listed above and sends its notifications to `RECIPIENT_EMAIL`.

In [None]:
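# Create an alert rule and attach email notifications to it
def create_alert_with_notification(alert_rule):
    """Create alert_rule and send its notifications to RECIPIENT_EMAIL.

    If creation raises fdl.Conflict, the rule is left as it is on the server
    and no notification settings are changed.
    """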
    try:
        alert_rule.create()
        print(f'Created alert rule: {alert_rule.name}')
        
        # Set email notification
        alert_rule.set_notification_config(emails=RECIPIENT_EMAIL)
        print(f'Set email notification for {alert_rule.name} to {RECIPIENT_EMAIL}')
    except fdl.Conflict:
        print(f'Alert rule {alert_rule.name} already exists')

# 1. Data Integrity Alert - Range Violation (age)
range_alert = fdl.AlertRule(
    name='Age Range Violation Alert',
    model_id=model.id,
    metric_id='range_violation_count',
    bin_size=fdl.BinSize.HOUR,
    compare_to=fdl.CompareTo.RAW_VALUE,
    priority=fdl.Priority.HIGH,
    warning_threshold=1,
    critical_threshold=3,
    condition=fdl.AlertCondition.GREATER,
    columns=['age']
)
create_alert_with_notification(range_alert)

# 2. Data Integrity Alert - Null Value Violation
null_alert = fdl.AlertRule(
    name='Null Value Violation Alert',
    model_id=model.id,
    metric_id='null_violation_percentage',
    bin_size=fdl.BinSize.HOUR,
    compare_to=fdl.CompareTo.RAW_VALUE,
    priority=fdl.Priority.HIGH,
    warning_threshold=5,
    critical_threshold=10,
    condition=fdl.AlertCondition.GREATER,
    columns=['income']
)
create_alert_with_notification(null_alert)

# 3. Data Integrity Alert - Type Violation
type_alert = fdl.AlertRule(
    name='Location Type Violation Alert',
    model_id=model.id,
    metric_id='type_violation_count',
    bin_size=fdl.BinSize.HOUR,
    compare_to=fdl.CompareTo.RAW_VALUE,
    priority=fdl.Priority.HIGH,
    warning_threshold=1,
    critical_threshold=3,
    condition=fdl.AlertCondition.GREATER,
    columns=['location']
)
create_alert_with_notification(type_alert)

# 4. Data Drift Alert - Jensen-Shannon Distance
jsd_alert = fdl.AlertRule(
    name='Age Data Drift Alert',
    model_id=model.id,
    metric_id='jsd',
    bin_size=fdl.BinSize.HOUR,
    compare_to=fdl.CompareTo.RAW_VALUE,
    priority=fdl.Priority.MEDIUM,
    warning_threshold=0.1,
    critical_threshold=0.2,
    condition=fdl.AlertCondition.GREATER,
    columns=['age'],
    baseline_id=baseline.id,
)
create_alert_with_notification(jsd_alert)

# 5. Traffic Alert
traffic_alert = fdl.AlertRule(
    name='Low Traffic Alert',
    model_id=model.id,
    metric_id='traffic',
    bin_size=fdl.BinSize.HOUR,
    compare_to=fdl.CompareTo.RAW_VALUE,
    priority=fdl.Priority.MEDIUM,
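    # All violation events share one timestamp, so on a first run they form a
    # single hourly bin of 105 events; both thresholds sit above that count
    warning_threshold=200,
    critical_threshold=150,
    condition=fdl.AlertCondition.LESSER,  # Alert if traffic is low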
)
create_alert_with_notification(traffic_alert)

# 6. Performance Alert - Precision
precision_alert = fdl.AlertRule(
    name='Low Precision Alert',
    model_id=model.id,
    metric_id='precision',
    bin_size=fdl.BinSize.HOUR,
    compare_to=fdl.CompareTo.RAW_VALUE,
    priority=fdl.Priority.HIGH,
    warning_threshold=0.7,
    critical_threshold=0.5,
    condition=fdl.AlertCondition.LESSER,  # Alert if precision is low
)
create_alert_with_notification(precision_alert)

# 7. Custom Metric Alert
# First, create a custom metric
lost_revenue_metric = fdl.CustomMetric(
    name='Lost Revenue',
    model_id=model.id,
    description='Revenue lost for customers who churn',
    definition="sum(if(\"churn\"==1, \"income\", 0))",
)
try:
    lost_revenue_metric.create()
    print(f'Created custom metric: {lost_revenue_metric.name}')
except fdl.Conflict:
    print(f'Custom metric {lost_revenue_metric.name} already exists')
    lost_revenue_metric = fdl.CustomMetric.from_name(
        name='Lost Revenue', 
        model_id=model.id
    )

# Create alert based on custom metric
custom_metric_alert = fdl.AlertRule(
    name='High Lost Revenue Alert',
    model_id=model.id,
    metric_id=lost_revenue_metric.id,
    bin_size=fdl.BinSize.HOUR,
    compare_to=fdl.CompareTo.RAW_VALUE,
    priority=fdl.Priority.HIGH,
    warning_threshold=50000,
    critical_threshold=100000,
    condition=fdl.AlertCondition.GREATER,
)
create_alert_with_notification(custom_metric_alert)

# List all alert rules to verify
alert_rules = list(fdl.AlertRule.list(model_id=model.id))
print(f"\nModel now has {len(alert_rules)} alert rules:")
for alert in alert_rules:
    print(f"- {alert.name}")
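The `Lost Revenue` definition `sum(if("churn"==1, "income", 0))` adds up `income` over rows labeled as churned within each bin. The plain-Python sketch below estimates that value for a batch before publishing it. Skipping rows with a null income is an assumption about how nulls behave inside `sum()`, not documented FQL behavior.

```python
def lost_revenue(rows):
    """Sum 'income' over rows where 'churn' == 1, mirroring the FQL definition.

    Rows whose income is None are skipped; how FQL treats nulls inside sum()
    is an assumption of this sketch.
    """
    return sum(r['income'] for r in rows
               if r['churn'] == 1 and r['income'] is not None)


rows = [
    {'churn': 1, 'income': 150000.0},
    {'churn': 0, 'income': 90000.0},
    {'churn': 1, 'income': None},
    {'churn': 1, 'income': 40000.0},
]
print(lost_revenue(rows))  # 190000.0
```

For the custom-metric block in `generate_violation_events`, 15 churned rows with incomes of at least 150,000 already sum to at least 2,250,000. That is far above the 100,000 critical threshold.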

## Publish Events to trigger alerts
When published, these generated events intentionally violate each of the alert rules configured above.
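Every block in the cell below is stamped with the same `now` timestamp. Assuming hourly bins aligned to UTC hour boundaries, all events therefore land in a single bin. The sketch below does that bookkeeping. It assigns each event to its hour and computes what share of the bin a given number of null `income` values represents. Block sizes are copied from `generate_violation_events`, and the timestamp is an arbitrary example.

```python
from collections import Counter

HOUR_MS = 60 * 60 * 1000  # one hour in milliseconds


def hourly_bin(ts_ms):
    """Start (epoch ms, UTC) of the hour containing a millisecond timestamp."""
    return ts_ms - ts_ms % HOUR_MS


# Rows per block in generate_violation_events
block_sizes = {'range': 5, 'null': 10, 'type': 5, 'drift': 50,
               'performance': 20, 'lost_revenue': 15}
total_events = sum(block_sizes.values())

# All blocks share one `now`, so every event falls in the same hourly bin
now = 1_700_000_123_456  # arbitrary example timestamp in milliseconds
events_per_bin = Counter(hourly_bin(now) for _ in range(total_events))


def null_percentage(null_rows, rows_in_bin):
    """Null values in a column as a percentage of all rows in the bin."""
    return 100 * null_rows / rows_in_bin


print(total_events, len(events_per_bin))           # 105 1
print(round(null_percentage(5, total_events), 2))  # 4.76
print(round(null_percentage(10, total_events), 2)) # 9.52
```

Five null incomes out of 105 events fall just under a 5% threshold, while nulling all ten rows of that block clears it. Small test batches deserve this kind of arithmetic check.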

In [None]:
def generate_violation_events():
    """Generate events that will trigger various alerts"""
    now = int(time.time() * 1000)  # Current time in milliseconds
    
    # 1. Range Violation - Create events with ages outside valid range
    range_violations = generate_synthetic_dataset(rows=5, seed=100)
    range_violations['age'] = np.random.choice([5, 10, 95, 100, 120], 5)  # Invalid ages
    range_violations['timestamp'] = [now] * 5
    
    # 2. Null Violation - Null out income in all 10 rows: 10 of the 105 combined
    # events (about 9.5%), above the 5% warning threshold
    null_violations = generate_synthetic_dataset(rows=10, seed=101)
    null_violations['income'] = np.nan
    null_violations['timestamp'] = [now] * 10
    
    # 3. Type Violation - Create events with wrong types
    type_violations = generate_synthetic_dataset(rows=5, seed=102)
    type_violations['location'] = [1, 2, 3, 4, 5]  # Should be strings, not integers
    type_violations['timestamp'] = [now] * 5
    
    # 4. Data Drift - Create events with significant distribution shift
    data_drift = generate_synthetic_dataset(rows=50, seed=103)
    data_drift['age'] = np.random.randint(60, 100, 50)  # Shift age distribution
    data_drift['timestamp'] = [now] * 50
    
    # 5. Low Traffic - No dedicated events: all 105 events share one hourly bin,
    # so the traffic alert fires only if its thresholds exceed that count
    
    # 6. Performance Alert - Create false positives, which lower precision:
    # customers who did not churn but receive high churn scores
    performance_issues = generate_synthetic_dataset(rows=20, seed=104)
    performance_issues['churn'] = 0  # None churned
    performance_issues['predicted_churn'] = np.random.beta(10, 1, 20)  # High predictions
    performance_issues['timestamp'] = [now] * 20
    
    # 7. Custom Metric Alert - High income customers who churned
    high_lost_revenue = generate_synthetic_dataset(rows=15, seed=105)
    high_lost_revenue['churn'] = 1  # All churned
    high_lost_revenue['income'] = np.random.uniform(150000, 200000, 15)  # High income
    high_lost_revenue['timestamp'] = [now] * 15
    
    # Combine all datasets into one frame with a fresh 0..n-1 index
    all_violations = pd.concat([
        range_violations,
        null_violations,
        type_violations,
        data_drift,
        performance_issues,
        high_lost_revenue
    ], ignore_index=True)
    
    return all_violations

# Generate events that will trigger alerts
violation_events = generate_violation_events()
print(f"Generated {len(violation_events)} violation events")

# Publish events in batches (to avoid timeouts)
batch_size = 20
for i in range(0, len(violation_events), batch_size):
    batch_df = violation_events.iloc[i:i+batch_size].copy()
    print(f"Publishing batch {i//batch_size + 1} with {len(batch_df)} events...")
    events_job = model.publish(source=batch_df)
    events_job.wait()
    print(f"Batch {i//batch_size + 1} published successfully")
    time.sleep(2)  # Add a small delay between batches

print("\nAll violation events have been published!")
print("Alerts should be triggered and emails sent to:", RECIPIENT_EMAIL)
print("Please check your inbox for alert notifications.")
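Precision only looks at rows the model predicts as positive. Churned customers with low scores (false negatives) therefore lower recall rather than precision. Non-churners with high scores (false positives) are what pull precision down. The sketch below makes that concrete. The 0.5 decision threshold and the `>=` comparison are assumptions, not confirmed Fiddler settings.

```python
import math


def precision(labels, scores, threshold=0.5):
    """Precision of the positive class (1) at an assumed score threshold."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for y, p in zip(labels, predicted) if p and y == 1)
    fp = sum(1 for y, p in zip(labels, predicted) if p and y == 0)
    # Undefined when nothing is predicted positive
    return tp / (tp + fp) if tp + fp else math.nan


# Churned customers scored low: false negatives, no positive predictions at all
print(precision([1, 1, 1, 1], [0.05, 0.1, 0.2, 0.3]))  # nan
# Two true positives alongside two non-churners scored high (false positives)
print(precision([1, 1, 0, 0], [0.9, 0.8, 0.95, 0.7]))  # 0.5
```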

## Verify Alert Status in the UI

- Navigate to the Fiddler UI and go to your test project
- Click on the 'Alerts' tab to view triggered alerts
- Check your email inbox for alert notifications
    - If emails haven't arrived:
        - Check your spam/junk folder
        - Verify with your IT department that emails from Fiddler's domain aren't being blocked