# Anomaly Detection Methods

## Purpose

This notebook shows how to detect anomalies in AM-QADF data. It covers statistical, clustering, machine learning, and rule-based detection techniques, all configured through interactive widgets.

## Learning Objectives

By the end of this notebook, you will be able to:
- ✅ Use statistical anomaly detectors (Z-Score, IQR, Mahalanobis, etc.)
- ✅ Use clustering-based detectors (DBSCAN, Isolation Forest, LOF, etc.)
- ✅ Use ML-based detectors (Autoencoder, LSTM, VAE, etc.)
- ✅ Use rule-based detectors (Threshold, Pattern, Spatial, Temporal)
- ✅ Detect anomalies in voxel data
- ✅ Compare detection results across methods
- ✅ Choose an appropriate detector for your use case

## Estimated Duration

60-90 minutes

---

## Overview

Anomaly detection identifies unusual patterns in process data that may indicate quality issues or process deviations. The AM-QADF framework groups its detectors into five families:

- 📊 **Statistical Methods**: Z-Score, Modified Z-Score, IQR, Mahalanobis, Grubbs
- 🔍 **Clustering Methods**: DBSCAN, Isolation Forest, LOF, One-Class SVM, K-Means
- 🤖 **ML Methods**: Autoencoder, LSTM, VAE, Random Forest
- 📋 **Rule-Based Methods**: Threshold, Pattern, Spatial, Temporal, Multi-Signal
- 🎯 **Ensemble Methods**: Voting, Weighted voting

Use the interactive widgets below to configure detectors and visualize detection results - no coding required!
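
For readers who want to see what the statistical family computes, here is a minimal sketch of two of its simplest members, Z-Score and IQR, applied to a synthetic signal. The helper names `zscore_anomalies` and `iqr_anomalies`, the synthetic data, and the default cutoffs (3.0 and 1.5) are assumptions made for illustration; they are not the AM-QADF detector classes or their API.

```python
import numpy as np

def zscore_anomalies(signal, threshold=3.0):
    """Flag samples whose absolute z-score exceeds the threshold (hypothetical helper)."""
    signal = np.asarray(signal, dtype=float)
    std = signal.std()
    if std == 0:
        # A constant signal has no outliers; this also avoids division by zero
        return np.zeros(signal.shape, dtype=bool)
    return np.abs((signal - signal.mean()) / std) > threshold

def iqr_anomalies(signal, k=1.5):
    """Flag samples outside [Q1 - k*IQR, Q3 + k*IQR] (hypothetical helper)."""
    signal = np.asarray(signal, dtype=float)
    q1, q3 = np.percentile(signal, [25, 75])
    iqr = q3 - q1
    return (signal < q1 - k * iqr) | (signal > q3 + k * iqr)

# Synthetic temperature-like signal with two injected spikes (illustrative data only)
rng = np.random.default_rng(0)
signal = rng.normal(200.0, 2.0, size=500)
signal[[50, 300]] += 40.0

z_flags = zscore_anomalies(signal)
iqr_flags = iqr_anomalies(signal)
print("z-score flagged indices:", np.flatnonzero(z_flags))
print("IQR flagged indices:", np.flatnonzero(iqr_flags))
```

The IQR rule depends only on quartiles, so a few extreme values barely move its bounds, whereas the same values inflate the mean and standard deviation used by the z-score. That difference is one reason the two methods can disagree on borderline samples.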


In [1]:
# Setup: Import required libraries
import sys
from pathlib import Path
import warnings
warnings.filterwarnings('ignore')

# Add parent directory and src directory to path for imports
notebook_dir = Path().resolve()
project_root = notebook_dir.parent
src_dir = project_root / 'src'

# Add project root to path (for src.infrastructure imports)
if str(project_root) not in sys.path:
    sys.path.insert(0, str(project_root))

# Add src directory to path (for am_qadf imports)
if str(src_dir) not in sys.path:
    sys.path.insert(0, str(src_dir))

# Core imports
import ipywidgets as widgets
from ipywidgets import (
    VBox, HBox, Accordion, Tab, Dropdown, RadioButtons, 
    Checkbox, Button, Output, Text, IntSlider, FloatSlider,
    Layout, Box, Label, FloatText, IntText, SelectMultiple
)
from IPython.display import display, Markdown, HTML, clear_output
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from scipy import stats
from sklearn.cluster import DBSCAN
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM
from sklearn.cluster import KMeans
from datetime import datetime
from typing import Optional, Tuple, Dict, Any, List

# Load environment variables from development.env
import os
env_file = project_root / 'development.env'
if env_file.exists():
    with open(env_file, 'r') as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith('#') and '=' in line:
                key, value = line.split('=', 1)
                # Strip surrounding whitespace and optional quotes
                os.environ[key.strip()] = value.strip().strip('"\'')
    print("✅ Environment variables loaded from development.env")

# Try to import anomaly detection classes
DETECTOR_AVAILABLE = False
try:
    from am_qadf.anomaly_detection.detectors.statistical import ZScoreDetector, IQRDetector, MahalanobisDetector
    from am_qadf.anomaly_detection.detectors.clustering import DBSCANDetector, IsolationForestDetector, LOFDetector
    DETECTOR_AVAILABLE = True
    print("✅ Anomaly detection classes available")
except ImportError as e:
    print(f"⚠️ Anomaly detection classes not available: {e} - using demo mode")

# MongoDB connection setup
INFRASTRUCTURE_AVAILABLE = False
mongo_client = None
voxel_storage = None
stl_client = None

try:
    from src.infrastructure.config import MongoDBConfig
    from src.infrastructure.database import MongoDBClient
    from am_qadf.voxel_domain import VoxelGridStorage
    from am_qadf.query import STLModelClient
    
    # Initialize MongoDB connection
    config = MongoDBConfig.from_env()
    if not config.username:
        config.username = os.getenv('MONGO_ROOT_USERNAME', 'admin')
    if not config.password:
        config.password = os.getenv('MONGO_ROOT_PASSWORD', 'password')
    
    mongo_client = MongoDBClient(config=config)
    if mongo_client.is_connected():
        voxel_storage = VoxelGridStorage(mongo_client=mongo_client)
        stl_client = STLModelClient(mongo_client=mongo_client)
        INFRASTRUCTURE_AVAILABLE = True
        print(f"✅ Connected to MongoDB: {config.database}")
    else:
        print("⚠️ MongoDB connection failed")
except Exception as e:
    print(f"⚠️ MongoDB not available: {e} - using demo mode")

print("✅ Setup complete!")


✅ Environment variables loaded from development.env
✅ Anomaly detection classes available
⚠️ MongoDB not available: localhost:27017: [Errno 111] Connection refused (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms), Timeout: 30.0s, Topology Description: <TopologyDescription id: 696013bb2adb1fa02ed0d659, topology_type: Unknown, servers: [<ServerDescription ('localhost', 27017) server_type: Unknown, rtt: None, error=AutoReconnect('localhost:27017: [Errno 111] Connection refused (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms)')>]> - using demo mode
✅ Setup complete!


## Interactive Anomaly Detection Interface

Use the widgets below to configure anomaly detectors and visualize detection results. Select a detector type, set its parameters, and compare methods interactively.
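
Comparing detectors comes down to scoring each detector's boolean flags against known labels, and an ensemble combines those flags by voting. The sketch below illustrates both ideas on a tiny hand-made example. The function names `score_flags` and `majority_vote` and the three toy detectors are hypothetical and are not part of AM-QADF; the sketch also assumes ground-truth labels are available, as they are for the demo data.

```python
import numpy as np

def score_flags(flags, truth):
    """Precision, recall and F1 for boolean anomaly flags against boolean labels."""
    flags = np.asarray(flags, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = int(np.sum(flags & truth))    # flagged and truly anomalous
    fp = int(np.sum(flags & ~truth))   # flagged but normal
    fn = int(np.sum(~flags & truth))   # missed anomalies
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

def majority_vote(flag_sets):
    """Flag a sample when more than half of the detectors flag it."""
    votes = np.sum(np.asarray(flag_sets, dtype=bool), axis=0)
    return votes > len(flag_sets) / 2

# Toy labels and detector outputs (illustrative values only)
truth = np.array([0, 0, 1, 0, 1, 0], dtype=bool)
detector_a = np.array([0, 0, 1, 1, 1, 0], dtype=bool)  # one false positive
detector_b = np.array([0, 0, 1, 0, 0, 0], dtype=bool)  # one missed anomaly
detector_c = np.array([1, 0, 1, 0, 1, 0], dtype=bool)  # one false positive

for name, flags in [("A", detector_a), ("B", detector_b), ("C", detector_c)]:
    print(name, score_flags(flags, truth))

ensemble = majority_vote([detector_a, detector_b, detector_c])
print("ensemble", score_flags(ensemble, truth))
```

In this toy case the detectors make different mistakes, so the majority vote cancels them out. Voting helps much less when detectors share the same blind spots.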


In [2]:
# Create Interactive Anomaly Detection Interface

# Global state
detection_results = {}
sample_data = None
ground_truth = None
current_model_id = None
current_grid_id = None
loaded_grid_data = None
signal_arrays = {}

# ============================================
# Helper Functions for Demo Data
# ============================================

def generate_sample_data_with_anomalies(n_points=1000, n_anomalies=50, seed=42):
    """Generate sample signal data with known anomalies."""
    np.random.seed(seed)
    
    # Normal data (temperature signal)
    t = np.linspace(0, 100, n_points)
    normal_temp = 200 + 10 * np.sin(2 * np.pi * t / 20) + np.random.normal(0, 2, n_points)
    
    # Add anomalies
    anomaly_indices = np.random.choice(n_points, n_anomalies, replace=False)
    ground_truth_labels = np.zeros(n_points, dtype=bool)
    
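    for idx in anomaly_indices:
        # Different types of anomalies
        if np.random.rand() < 0.5:
            # Spike anomaly: a single-point jump of 20-40 units
            normal_temp[idx] += np.random.choice([-1, 1]) * np.random.uniform(20, 40)
            ground_truth_labels[idx] = True
        else:
            # Drift anomaly: shift up to 5 consecutive points (slicing clips at the array end)
            normal_temp[idx:idx+5] += np.random.choice([-1, 1]) * np.random.uniform(10, 20)
            # Label every shifted point, not only the first one
            ground_truth_labels[idx:idx+5] = True
    
    # Create spatial coordinates (random 3D positions)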
    x = np.random.uniform(0, 10, n_points)
    y = np.random.uniform(0, 10, n_points)
    z = np.random.uniform(0, 5, n_points)
    
    return {
        'temperature': normal_temp,
        'x': x,
        'y': y,
        'z': z,
        'time': t
    }, ground_truth_labels

# Initialize sample data
sample_data, ground_truth = generate_sample_data_with_anomalies()

# ============================================
# Top Panel: Data Source and Grid Selection
# ============================================

# Data source mode
data_source_label = widgets.HTML("<b>Data Source:</b>")
data_source_mode = RadioButtons(
    options=[('MongoDB', 'mongodb'), ('Sample Data', 'sample')],
    value='mongodb',
    description='Source:',
    style={'description_width': 'initial'}
)

# Model selection (for MongoDB)
model_label = widgets.HTML("<b>Model:</b>")
model_options = [("━━━ Select Model ━━━", None)]
if stl_client and mongo_client:
    try:
        models = stl_client.list_models(limit=100)
        model_options.extend([
            (f"{m.get('filename', m.get('original_stem', m.get('model_name', 'Unknown')))} ({m.get('model_id', '')[:8]}...)", m.get('model_id'))
            for m in models
        ])
    except Exception as e:
        print(f"⚠️ Error loading models: {e}")

model_dropdown = Dropdown(
    options=model_options,
    value=None,
    description='Model:',
    style={'description_width': 'initial'},
    layout=Layout(width='400px')
)

# Grid type filter
grid_type_label = widgets.HTML("<b>Grid Type:</b>")
grid_type_filter = Dropdown(
    options=[
        ('All Grids', 'all'),
        ('Fused', 'fused'),
        ('Corrected', 'corrected'),
        ('Processed', 'processed'),
        ('Signal-Mapped', 'signal_mapped'),
        ('Raw', 'raw')
    ],
    value='fused',  # Default to fused grids
    description='Type:',
    style={'description_width': 'initial'}
)

# Grid selection (for MongoDB)
grid_label = widgets.HTML("<b>Grid:</b>")
grid_dropdown = Dropdown(
    options=[("━━━ Select Grid ━━━", None)],
    value=None,
    description='Grid:',
    style={'description_width': 'initial'},
    layout=Layout(width='500px')
)

load_grid_button = Button(
    description='Load Grid',
    button_style='info',
    icon='folder-open',
    layout=Layout(width='120px')
)

# Detector type
detector_type = Dropdown(
    options=[
        ('Statistical', 'statistical'),
        ('Clustering', 'clustering'),
        ('ML', 'ml'),
        ('Rule-Based', 'rule_based'),
        ('Ensemble', 'ensemble')
    ],
    value='statistical',
    description='Detector Type:',
    style={'description_width': 'initial'}
)

# Signal selector (will be updated when grid is loaded)
signal_label = widgets.HTML("<b>Signal:</b>")
signal_selector = Dropdown(
    options=[('Temperature', 'temperature'), ('Density', 'density'), ('Roughness', 'roughness')],
    value='temperature',
    description='Signal:',
    style={'description_width': 'initial'}
)

execute_button = Button(
    description='Execute Detection',
    button_style='success',
    icon='search',
    layout=Layout(width='180px')
)

compare_button = Button(
    description='Compare Detectors',
    button_style='primary',
    icon='balance-scale',
    layout=Layout(width='180px')
)

top_panel = VBox([
    HBox([data_source_label, data_source_mode, detector_type, signal_label, signal_selector]),
    HBox([model_label, model_dropdown, grid_type_label, grid_type_filter]),
    HBox([grid_label, grid_dropdown, load_grid_button]),
    HBox([execute_button, compare_button])
], layout=Layout(padding='10px', border='1px solid #ccc'))

# ============================================
# Left Panel: Detector Configuration
# ============================================

# Statistical Detectors Configuration
stat_method = Dropdown(
    options=[('Z-Score', 'zscore'), ('Modified Z-Score', 'modified_zscore'), ('IQR', 'iqr'), 
             ('Mahalanobis', 'mahalanobis'), ('Grubbs', 'grubbs')],
    value='zscore',
    description='Method:',
    style={'description_width': 'initial'}
)
stat_threshold = FloatSlider(value=3.0, min=1.0, max=5.0, step=0.1, description='Threshold:', style={'description_width': 'initial'})
stat_window = IntSlider(value=100, min=10, max=1000, step=10, description='Window Size:', style={'description_width': 'initial'})

stat_config = VBox([
    widgets.HTML("<b>Statistical Detectors:</b>"),
    stat_method,
    stat_threshold,
    stat_window
], layout=Layout(padding='5px', border='1px solid #ddd'))

# Clustering Detectors Configuration
cluster_method = Dropdown(
    options=[('DBSCAN', 'dbscan'), ('Isolation Forest', 'isolation_forest'), ('LOF', 'lof'), 
             ('One-Class SVM', 'one_class_svm'), ('K-Means', 'kmeans')],
    value='dbscan',
    description='Method:',
    style={'description_width': 'initial'}
)
cluster_epsilon = FloatSlider(value=0.5, min=0.01, max=10.0, step=0.1, description='Epsilon:', style={'description_width': 'initial'})
cluster_min_samples = IntSlider(value=5, min=1, max=100, step=1, description='Min Samples:', style={'description_width': 'initial'})
cluster_contamination = FloatSlider(value=0.1, min=0.01, max=0.5, step=0.01, description='Contamination:', style={'description_width': 'initial'})
cluster_n_neighbors = IntSlider(value=20, min=5, max=100, step=1, description='N Neighbors:', style={'description_width': 'initial'})

cluster_config = VBox([
    widgets.HTML("<b>Clustering Detectors:</b>"),
    cluster_method,
    cluster_epsilon,
    cluster_min_samples,
    cluster_contamination,
    cluster_n_neighbors
], layout=Layout(padding='5px', border='1px solid #ddd'))

# ML Detectors Configuration
ml_method = Dropdown(
    options=[('Autoencoder', 'autoencoder'), ('LSTM', 'lstm'), ('VAE', 'vae'), ('Random Forest', 'random_forest')],
    value='autoencoder',
    description='Method:',
    style={'description_width': 'initial'}
)
ml_hidden_layers = IntSlider(value=3, min=1, max=10, step=1, description='Hidden Layers:', style={'description_width': 'initial'})
ml_latent_dim = IntSlider(value=10, min=2, max=100, step=1, description='Latent Dim:', style={'description_width': 'initial'})
ml_sequence_length = IntSlider(value=20, min=5, max=100, step=1, description='Sequence Length:', style={'description_width': 'initial'})
ml_training_split = FloatSlider(value=0.7, min=0.5, max=0.9, step=0.05, description='Training Split:', style={'description_width': 'initial'})

ml_config = VBox([
    widgets.HTML("<b>ML Detectors:</b>"),
    ml_method,
    ml_hidden_layers,
    ml_latent_dim,
    ml_sequence_length,
    ml_training_split
], layout=Layout(padding='5px', border='1px solid #ddd'))

# Rule-Based Detectors Configuration
rule_types = SelectMultiple(
    options=[('Threshold', 'threshold'), ('Pattern', 'pattern'), ('Spatial', 'spatial'), 
             ('Temporal', 'temporal'), ('Multi-Signal', 'multi_signal')],
    value=['threshold'],
    description='Rule Types:',
    style={'description_width': 'initial'}
)
rule_threshold = FloatSlider(value=220.0, min=150.0, max=250.0, step=1.0, description='Threshold Value:', style={'description_width': 'initial'})
rule_pattern_type = Dropdown(
    options=[('Spike', 'spike'), ('Drift', 'drift'), ('Oscillation', 'oscillation')],
    value='spike',
    description='Pattern Type:',
    style={'description_width': 'initial'}
)
rule_spatial_radius = FloatSlider(value=1.0, min=0.1, max=10.0, step=0.1, description='Spatial Radius:', style={'description_width': 'initial'})
rule_temporal_window = IntSlider(value=100, min=1, max=1000, step=10, description='Temporal Window:', style={'description_width': 'initial'})

rule_config = VBox([
    widgets.HTML("<b>Rule-Based Detectors:</b>"),
    rule_types,
    rule_threshold,
    rule_pattern_type,
    rule_spatial_radius,
    rule_temporal_window
], layout=Layout(padding='5px', border='1px solid #ddd'))

# Ensemble Detectors Configuration
ensemble_detectors = SelectMultiple(
    options=[('Z-Score', 'zscore'), ('IQR', 'iqr'), ('DBSCAN', 'dbscan'), ('Isolation Forest', 'isolation_forest')],
    value=['zscore', 'iqr'],
    description='Detectors:',
    style={'description_width': 'initial'}
)
ensemble_voting = RadioButtons(
    options=[('Majority', 'majority'), ('Weighted', 'weighted'), ('Unanimous', 'unanimous')],
    value='majority',
    description='Voting:',
    style={'description_width': 'initial'}
)
ensemble_weight1 = FloatSlider(value=0.5, min=0.0, max=1.0, step=0.1, description='Weight 1:', style={'description_width': 'initial'})
ensemble_weight2 = FloatSlider(value=0.5, min=0.0, max=1.0, step=0.1, description='Weight 2:', style={'description_width': 'initial'})

ensemble_config = VBox([
    widgets.HTML("<b>Ensemble Detectors:</b>"),
    ensemble_detectors,
    ensemble_voting,
    ensemble_weight1,
    ensemble_weight2
], layout=Layout(padding='5px', border='1px solid #ddd'))

# Dynamic configuration accordion
config_accordion = Accordion(children=[
    stat_config,
    cluster_config,
    ml_config,
    rule_config,
    ensemble_config
])
config_accordion.set_title(0, 'Statistical')
config_accordion.set_title(1, 'Clustering')
config_accordion.set_title(2, 'ML')
config_accordion.set_title(3, 'Rule-Based')
config_accordion.set_title(4, 'Ensemble')

left_panel = VBox([
    widgets.HTML("<h3>Detector Configuration</h3>"),
    config_accordion
], layout=Layout(width='300px', padding='10px', border='1px solid #ccc'))

# ============================================
# Center Panel: Visualization
# ============================================

viz_mode = RadioButtons(
    options=[('Spatial', 'spatial'), ('Temporal', 'temporal'), ('Comparison', 'comparison'), ('Metrics', 'metrics')],
    value='temporal',
    description='View:',
    style={'description_width': 'initial'}
)

viz_output = Output(layout=Layout(height='600px', overflow='auto'))

center_panel = VBox([
    widgets.HTML("<h3>Detection Visualization</h3>"),
    viz_mode,
    viz_output
], layout=Layout(flex='1 1 auto', padding='10px', border='1px solid #ccc'))

# ============================================
# Right Panel: Results
# ============================================

# Detection Metrics
metrics_label = widgets.HTML("<b>Detection Metrics:</b>")
metrics_display = widgets.HTML("No detection performed yet")
metrics_section = VBox([
    metrics_label,
    metrics_display
], layout=Layout(padding='5px'))

# Anomaly Statistics
anomaly_stats_label = widgets.HTML("<b>Anomaly Statistics:</b>")
anomaly_stats_display = widgets.HTML("No anomalies detected")
anomaly_stats_section = VBox([
    anomaly_stats_label,
    anomaly_stats_display
], layout=Layout(padding='5px'))

# Detector Performance
performance_label = widgets.HTML("<b>Detector Performance:</b>")
performance_display = widgets.HTML("No performance data")
performance_section = VBox([
    performance_label,
    performance_display
], layout=Layout(padding='5px'))

# Comparison Results
comparison_label = widgets.HTML("<b>Comparison:</b>")
comparison_display = widgets.HTML("No comparison available")
comparison_section = VBox([
    comparison_label,
    comparison_display
], layout=Layout(padding='5px'))

# Export Options
export_label = widgets.HTML("<b>Export:</b>")
export_anomalies_button = Button(description='Export Anomalies', button_style='', layout=Layout(width='150px'))
export_map_button = Button(description='Export Map', button_style='', layout=Layout(width='150px'))
export_metrics_button = Button(description='Export Metrics', button_style='', layout=Layout(width='150px'))
save_config_button = Button(description='Save Config', button_style='', layout=Layout(width='150px'))

export_section = VBox([
    export_label,
    export_anomalies_button,
    export_map_button,
    export_metrics_button,
    save_config_button
], layout=Layout(padding='5px'))

right_panel = VBox([
    metrics_section,
    anomaly_stats_section,
    performance_section,
    comparison_section,
    export_section
], layout=Layout(width='250px', padding='10px', border='1px solid #ccc'))

# ============================================
# Bottom Panel: Status and Progress
# ============================================

status_display = widgets.HTML("<b>Status:</b> Ready to detect anomalies")
progress_bar = widgets.IntProgress(
    value=0,
    min=0,
    max=100,
    description='Progress:',
    bar_style='info',
    layout=Layout(width='100%')
)
info_display = widgets.HTML("")

bottom_panel = VBox([
    status_display,
    progress_bar,
    info_display
], layout=Layout(padding='10px', border='1px solid #ccc'))

# ============================================
# Helper Functions for MongoDB
# ============================================

def update_grid_dropdown(change=None):
    """Update grid dropdown when model or grid type changes."""
    global current_model_id
    
    model_id = model_dropdown.value
    grid_type = grid_type_filter.value
    
    if not model_id:
        grid_dropdown.options = [("━━━ Select Grid ━━━", None)]
        return
    
    current_model_id = model_id
    
    if not voxel_storage:
        grid_dropdown.options = [("━━━ MongoDB not available ━━━", None)]
        return
    
    try:
        # Get all grids for this model
        grids = voxel_storage.list_grids(model_id=model_id, limit=100)
        
        grid_options = [("━━━ Select Grid ━━━", None)]
        for grid in grids:
            metadata = grid.get('metadata', {})
            config_meta = metadata.get('configuration_metadata', {})
            if not config_meta:
                config_meta = metadata
            
            # Determine grid type
            is_fused = config_meta.get('fusion_applied', False)
            is_corrected = config_meta.get('correction_applied', False)
            is_processed = config_meta.get('processing_applied', False)
            has_signals = len(grid.get('available_signals', [])) > 0
            
            grid_type_match = False
            if grid_type == 'all':
                grid_type_match = True
            elif grid_type == 'fused' and is_fused:
                grid_type_match = True
            elif grid_type == 'corrected' and is_corrected:
                grid_type_match = True
            elif grid_type == 'processed' and is_processed:
                grid_type_match = True
            elif grid_type == 'signal_mapped' and has_signals and not is_corrected and not is_processed and not is_fused:
                grid_type_match = True
            elif grid_type == 'raw' and not has_signals:
                grid_type_match = True
            
            if grid_type_match:
                grid_id = grid.get('grid_id', str(grid.get('_id', '')))
                grid_name = grid.get('grid_name', 'Unknown')
                n_signals = len(grid.get('available_signals', []))
                
                # Build status label
                status_parts = []
                if is_fused:
                    status_parts.append('fused')
                if is_corrected:
                    status_parts.append('corrected')
                if is_processed:
                    status_parts.append('processed')
                if has_signals and not status_parts:
                    status_parts.append('mapped')
                if not status_parts:
                    status_parts.append('raw')
                
                status_str = ', '.join(status_parts)
                label = f"{grid_name} ({n_signals} signal(s), {status_str}) ({grid_id[:8]}...)"
                grid_options.append((label, grid_id))
        
        if len(grid_options) == 1:
            grid_options.append(("No grids found matching filter", None))
        
        grid_dropdown.options = grid_options
    except Exception as e:
        grid_dropdown.options = [("━━━ Error loading grids ━━━", None)]
        print(f"⚠️ Error loading grids: {e}")

def load_grid_from_mongodb(button):
    """Load selected grid from MongoDB."""
    global current_model_id, current_grid_id, loaded_grid_data, signal_arrays
    
    if not voxel_storage or not grid_dropdown.value:
        status_display.value = "<b>Status:</b> <span style='color: red;'>⚠️ Please select a grid to load</span>"
        return
    
    grid_id = grid_dropdown.value
    current_grid_id = grid_id
    
    status_display.value = "<b>Status:</b> Loading grid from MongoDB..."
    progress_bar.value = 0
    
    try:
        # Load grid from MongoDB
        grid_data = voxel_storage.load_voxel_grid(grid_id=grid_id)
        
        if not grid_data:
            status_display.value = "<b>Status:</b> <span style='color: red;'>⚠️ Failed to load grid</span>"
            return
        
        # Extract data from dictionary
        signal_arrays = grid_data.get('signal_arrays', {})
        metadata = grid_data.get('metadata', {})
        grid_name = grid_data.get('grid_name', 'Unknown')
        
        if not signal_arrays:
            status_display.value = "<b>Status:</b> <span style='color: orange;'>⚠️ Grid has no signals</span>"
            signal_selector.options = [('Temperature', 'temperature')]
            return
        
        # Store loaded data
        loaded_grid_data = {
            'grid_data': grid_data,
            'metadata': metadata,
            'signal_arrays': signal_arrays
        }
        
        # Update signal selector with available signals
        signal_options = [(name, name) for name in sorted(signal_arrays.keys())]
        signal_selector.options = signal_options
        if len(signal_options) > 0:
            signal_selector.value = signal_options[0][1]
        
        progress_bar.value = 100
        status_display.value = f"<b>Status:</b> <span style='color: green;'>✅ Loaded grid: {grid_name} ({len(signal_arrays)} signal(s))</span>"
        
    except Exception as e:
        status_display.value = f"<b>Status:</b> <span style='color: red;'>❌ Error loading grid: {str(e)}</span>"
        progress_bar.value = 0
        import traceback
        traceback.print_exc()

# Function to update UI based on data source mode
def update_data_source_mode(change):
    """Show/hide MongoDB widgets based on data source mode."""
    if change['new'] == 'mongodb':
        model_dropdown.layout.display = 'flex'
        grid_type_filter.layout.display = 'flex'
        grid_dropdown.layout.display = 'flex'
        load_grid_button.layout.display = 'flex'
    else:
        model_dropdown.layout.display = 'none'
        grid_type_filter.layout.display = 'none'
        grid_dropdown.layout.display = 'none'
        load_grid_button.layout.display = 'none'

# Connect events
data_source_mode.observe(update_data_source_mode, names='value')
update_data_source_mode({'new': data_source_mode.value})
model_dropdown.observe(update_grid_dropdown, names='value')
grid_type_filter.observe(update_grid_dropdown, names='value')
load_grid_button.on_click(load_grid_from_mongodb)

# ============================================
# Detection Functions
# ============================================

def detect_anomalies_statistical(data, method, threshold, window_size):
    """Detect anomalies using statistical methods.

    Returns a boolean array that is True at anomalous samples.
    window_size is currently unused; every method here uses global statistics.
    """
    # Prefer the 'temperature' signal; otherwise fall back to the first signal in data
    if 'temperature' in data:
        signal = data['temperature']
    else:
        signal_key = next(iter(data), None)
        signal = data[signal_key] if signal_key is not None else np.array([])
    signal = np.asarray(signal, dtype=float)
    anomalies = np.zeros(signal.shape, dtype=bool)
    if signal.size == 0:
        return anomalies
    
    if method in ('zscore', 'mahalanobis', 'grubbs'):
        # 'mahalanobis' and 'grubbs' are simplified: for a single signal the
        # Mahalanobis distance equals the absolute z-score, and the Grubbs branch
        # uses the same fixed cutoff instead of a t-distribution critical value.
        mean = np.mean(signal)
        std = np.std(signal)
        if std > 0:  # constant signal: nothing to flag, and avoids division by zero
            anomalies = np.abs((signal - mean) / std) > threshold
    
    elif method == 'modified_zscore':
        median = np.median(signal)
        mad = np.median(np.abs(signal - median))  # median absolute deviation
        if mad > 0:
            # 0.6745 rescales the MAD so the score is comparable to a z-score for normal data
            modified_z_scores = 0.6745 * (signal - median) / mad
            anomalies = np.abs(modified_z_scores) > threshold
    
    elif method == 'iqr':
        q1 = np.percentile(signal, 25)
        q3 = np.percentile(signal, 75)
        iqr = q3 - q1
        # threshold acts as the IQR multiplier (the conventional Tukey fence uses 1.5)
        lower_bound = q1 - threshold * iqr
        upper_bound = q3 + threshold * iqr
        anomalies = (signal < lower_bound) | (signal > upper_bound)
    
    return anomalies

def detect_anomalies_clustering(data, method, **kwargs):
    """Detect anomalies using clustering methods."""
    # Prepare features (temperature, x, y, z)
    features = np.column_stack([
        data['temperature'],
        data['x'],
        data['y'],
        data['z']
    ])
    
    # Normalize features
    features = (features - features.mean(axis=0)) / (features.std(axis=0) + 1e-8)
    
    # Expected fraction of anomalies, shared by all methods that need one
    contamination = kwargs.get('contamination', 0.1)
    
    if method == 'dbscan':
        epsilon = kwargs.get('epsilon', 0.5)
        min_samples = kwargs.get('min_samples', 5)
        dbscan = DBSCAN(eps=epsilon, min_samples=min_samples)
        labels = dbscan.fit_predict(features)
        anomalies = labels == -1  # -1 is noise/anomaly in DBSCAN
    
    elif method == 'isolation_forest':
        iso_forest = IsolationForest(contamination=contamination, random_state=42)
        labels = iso_forest.fit_predict(features)
        anomalies = labels == -1
    
    elif method == 'lof':
        n_neighbors = kwargs.get('n_neighbors', 20)
        lof = LocalOutlierFactor(n_neighbors=n_neighbors, contamination=contamination)
        labels = lof.fit_predict(features)
        anomalies = labels == -1
    
    elif method == 'one_class_svm':
        # nu is an upper bound on the fraction of training points treated as outliers
        svm = OneClassSVM(gamma='scale', nu=contamination)
        labels = svm.fit_predict(features)
        anomalies = labels == -1
    
    elif method == 'kmeans':
        n_clusters = 5
        kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=42)
        kmeans.fit(features)
        # transform() returns the distance to every center; keep the nearest one
        distances = kmeans.transform(features).min(axis=1)
        # Anomalies are the points farthest from their nearest cluster center
        threshold = np.percentile(distances, 100 * (1 - contamination))
        anomalies = distances > threshold
    
    else:
        raise ValueError(f"Unknown clustering method: {method}")
    
    return anomalies

def detect_anomalies_rule_based(data, rule_types, **kwargs):
    """Detect anomalies using rule-based methods."""
    signal = data['temperature']
    n = len(signal)
    anomalies = np.zeros(n, dtype=bool)
    
    if 'threshold' in rule_types:
        threshold = kwargs.get('threshold', 220.0)
        anomalies |= signal > threshold
    
    if 'pattern' in rule_types:
        # Simple spike detection
        diff = np.abs(np.diff(signal))
        spike_threshold = np.percentile(diff, 95)
        spike_indices = np.where(diff > spike_threshold)[0]
        anomalies[spike_indices] = True
    
    if 'spatial' in rule_types:
        # Simplified spatial detection: flag the 10% of points farthest from the
        # centroid. spatial_radius is read but not used by this simplified rule.
        radius = kwargs.get('spatial_radius', 1.0)
        x_mean = np.mean(data['x'])
        y_mean = np.mean(data['y'])
        z_mean = np.mean(data['z'])
        distances = np.sqrt((data['x'] - x_mean)**2 + (data['y'] - y_mean)**2 + (data['z'] - z_mean)**2)
        distance_threshold = np.percentile(distances, 90)
        anomalies |= distances > distance_threshold
    
    if 'temporal' in rule_types:
        # Temporal window detection
        window = kwargs.get('temporal_window', 100)
        for i in range(0, n, window):
            window_data = signal[i:min(i+window, n)]
            if len(window_data) > 0:
                mean = np.mean(window_data)
                std = np.std(window_data)
                window_anomalies = np.abs(window_data - mean) > 3 * std
                anomalies[i:min(i+window, n)] |= window_anomalies
    
    return anomalies

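def calculate_metrics(predictions, ground_truth):
    """Calculate detection metrics (all zeros when no ground truth is available)."""
    if ground_truth is None:
        # Unlabeled data (e.g. a MongoDB grid): metrics cannot be computed
        return {key: 0 for key in ('tp', 'fp', 'tn', 'fn', 'accuracy', 'precision', 'recall', 'f1')}
    predictions = np.asarray(predictions, dtype=bool)
    ground_truth = np.asarray(ground_truth, dtype=bool)
    tp = np.sum(predictions & ground_truth)
    fp = np.sum(predictions & ~ground_truth)
    tn = np.sum(~predictions & ~ground_truth)
    fn = np.sum(~predictions & ground_truth)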
    
    accuracy = (tp + tn) / (tp + tn + fp + fn) if (tp + tn + fp + fn) > 0 else 0
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) > 0 else 0
    
    return {
        'tp': tp,
        'fp': fp,
        'tn': tn,
        'fn': fn,
        'accuracy': accuracy,
        'precision': precision,
        'recall': recall,
        'f1': f1
    }

def execute_detection(button):
    """Execute anomaly detection."""
    global detection_results, sample_data, ground_truth, loaded_grid_data, signal_arrays
    
    status_display.value = "<b>Status:</b> Detecting anomalies..."
    progress_bar.value = 0
    info_display.value = ""
    
    try:
        detector_type_val = detector_type.value
        progress_bar.value = 20
        
        # Load data based on mode
        if data_source_mode.value == 'mongodb':
            if not loaded_grid_data or not signal_arrays:
                status_display.value = "<b>Status:</b> <span style='color: red;'>⚠️ Please load a grid from MongoDB first</span>"
                return
            
            # Extract selected signal from loaded grid
            selected_signal = signal_selector.value
            if selected_signal not in signal_arrays:
                status_display.value = f"<b>Status:</b> <span style='color: red;'>⚠️ Signal '{selected_signal}' not found in grid</span>"
                return
            
            # Convert signal array to 1D for detection
            signal_array = signal_arrays[selected_signal]
            signal_1d = signal_array.flatten()
            
            # Create data structure for detection functions
            n_points = len(signal_1d)
            detection_data = {
                selected_signal: signal_1d,
                'time': np.linspace(0, n_points, n_points),
                'x': np.random.uniform(0, 10, n_points),  # Placeholder spatial coordinates
                'y': np.random.uniform(0, 10, n_points),
                'z': np.random.uniform(0, 5, n_points)
            }
            
            # Use detection_data instead of sample_data
            current_data = detection_data
            # No ground truth for real data
            current_ground_truth = None
            progress_bar.value = 30
        else:
            # Use sample data
            current_data = sample_data
            current_ground_truth = ground_truth
            progress_bar.value = 30
        
        if detector_type_val == 'statistical':
            method = stat_method.value
            threshold = stat_threshold.value
            window_size = stat_window.value
            anomalies = detect_anomalies_statistical(current_data, method, threshold, window_size)
        
        elif detector_type_val == 'clustering':
            method = cluster_method.value
            anomalies = detect_anomalies_clustering(
                current_data,
                method,
                epsilon=cluster_epsilon.value,
                min_samples=cluster_min_samples.value,
                contamination=cluster_contamination.value,
                n_neighbors=cluster_n_neighbors.value
            )
        
        elif detector_type_val == 'ml':
            # Simplified ML detection (using statistical as proxy)
            info_display.value = "<span style='color: orange;'>⚠️ ML detectors require training - using simplified detection</span>"
            anomalies = detect_anomalies_statistical(current_data, 'zscore', 3.0, 100)
        
        elif detector_type_val == 'rule_based':
            rule_types_val = list(rule_types.value)
            anomalies = detect_anomalies_rule_based(
                current_data,
                rule_types_val,
                threshold=rule_threshold.value,
                spatial_radius=rule_spatial_radius.value,
                temporal_window=rule_temporal_window.value
            )
        
        elif detector_type_val == 'ensemble':
            # Combine multiple detectors
            detector_list = list(ensemble_detectors.value)
            all_anomalies = []
            
            for det in detector_list:
                if det == 'zscore':
                    anom = detect_anomalies_statistical(current_data, 'zscore', 3.0, 100)
                elif det == 'iqr':
                    anom = detect_anomalies_statistical(current_data, 'iqr', 1.5, 100)
                elif det == 'dbscan':
                    anom = detect_anomalies_clustering(current_data, 'dbscan', epsilon=0.5, min_samples=5)
                elif det == 'isolation_forest':
                    anom = detect_anomalies_clustering(current_data, 'isolation_forest', contamination=0.1)
                else:
                    continue  # Skip unknown detector names instead of reusing a stale result
                all_anomalies.append(anom)
            
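            # Voting (requires at least one selected detector)
            if not all_anomalies:
                raise ValueError("Select at least one ensemble detector")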
            if ensemble_voting.value == 'majority':
                anomalies = np.sum(all_anomalies, axis=0) > len(all_anomalies) / 2
            elif ensemble_voting.value == 'unanimous':
                anomalies = np.all(all_anomalies, axis=0)
            else:  # weighted: only the first two selected detectors are weighted
                weights = [ensemble_weight1.value, ensemble_weight2.value]
                weighted_sum = np.sum([w * anom for w, anom in zip(weights, all_anomalies[:2])], axis=0)
                anomalies = weighted_sum > 0.5
        
        progress_bar.value = 60
        
        # Calculate metrics
        metrics = calculate_metrics(anomalies, current_ground_truth)
        
        detection_results = {
            'anomalies': anomalies,
            'metrics': metrics,
            'detector_type': detector_type_val,
            'n_anomalies': np.sum(anomalies),
            'anomaly_percentage': 100 * np.sum(anomalies) / len(anomalies),
            'data': current_data,  # Store the data used for detection
            'ground_truth': current_ground_truth,  # Store ground truth if available
            'selected_signal': selected_signal if data_source_mode.value == 'mongodb' else 'temperature'
        }
        
        progress_bar.value = 80
        
        # Update displays
        update_results_display()
        update_visualization()
        
        progress_bar.value = 100
        status_display.value = "<b>Status:</b> <span style='color: green;'>✅ Detection completed</span>"
        info_display.value = f"<p>Detected: <b>{detection_results['n_anomalies']}</b> anomalies ({detection_results['anomaly_percentage']:.1f}%)</p>"
        
    except Exception as e:
        info_display.value = f"<span style='color: red;'>❌ Error: {str(e)}</span>"
        status_display.value = "<b>Status:</b> <span style='color: red;'>Error during detection</span>"
        progress_bar.value = 0

def update_results_display():
    """Update results displays."""
    global detection_results
    
    if not detection_results:
        return
    
    # Detection metrics
    metrics = detection_results['metrics']
    metrics_html = f"<p><b>Accuracy:</b> {metrics['accuracy']:.3f}</p>"
    metrics_html += f"<p><b>Precision:</b> {metrics['precision']:.3f}</p>"
    metrics_html += f"<p><b>Recall:</b> {metrics['recall']:.3f}</p>"
    metrics_html += f"<p><b>F1 Score:</b> {metrics['f1']:.3f}</p>"
    metrics_html += f"<p><b>TP:</b> {metrics['tp']} | <b>FP:</b> {metrics['fp']}</p>"
    metrics_html += f"<p><b>TN:</b> {metrics['tn']} | <b>FN:</b> {metrics['fn']}</p>"
    metrics_display.value = metrics_html
    
    # Anomaly statistics
    stats_html = f"<p><b>Anomalies:</b> {detection_results['n_anomalies']}</p>"
    stats_html += f"<p><b>Percentage:</b> {detection_results['anomaly_percentage']:.1f}%</p>"
    stats_html += f"<p><b>Detector:</b> {detection_results['detector_type']}</p>"
    anomaly_stats_display.value = stats_html
    
    # Performance (placeholder values; nothing is measured here)
    performance_html = "<p><b>Computation Time:</b> 0.5s (placeholder)</p>"
    performance_html += "<p><b>Memory Usage:</b> 50 MB (placeholder)</p>"
    performance_html += f"<p><b>Detection Rate:</b> {detection_results['n_anomalies'] / len(detection_results['anomalies']) * 100:.1f}%</p>"
    performance_display.value = performance_html

def update_visualization():
    """Update visualization display."""
    global detection_results
    
    with viz_output:
        clear_output(wait=True)
        
        if not detection_results:
            display(HTML("<p>Execute detection to see visualization</p>"))
            return
        
        viz = viz_mode.value
        anomalies = detection_results['anomalies']
        data = detection_results.get('data', {})
        ground_truth = detection_results.get('ground_truth', None)
        signal_name = detection_results.get('selected_signal', 'temperature')
        
        if not data:
            display(HTML("<p>No data available for visualization</p>"))
            return
        
        # Get signal values
        if signal_name in data:
            signal_values = data[signal_name]
        else:
            # Fallback to first available signal
            signal_keys = [k for k in data.keys() if k not in ['time', 'x', 'y', 'z']]
            if signal_keys:
                signal_name = signal_keys[0]
                signal_values = data[signal_name]
            else:
                display(HTML("<p>No signal data available for visualization</p>"))
                return
        
        # Get time array
        time_array = data.get('time', np.arange(len(signal_values)))
        
        if viz == 'temporal':
            # Temporal visualization
            fig, axes = plt.subplots(2, 1, figsize=(14, 8))
            
            # Time series with anomalies
            axes[0].plot(time_array, signal_values, 'b-', alpha=0.7, label='Signal')
            axes[0].scatter(time_array[anomalies], signal_values[anomalies], 
                          c='red', s=50, label='Detected Anomalies', zorder=5)
            if ground_truth is not None:
                axes[0].scatter(time_array[ground_truth], signal_values[ground_truth], 
                              c='orange', s=30, marker='x', label='Ground Truth', zorder=4, alpha=0.5)
            axes[0].set_xlabel('Time')
            axes[0].set_ylabel(signal_name.replace('_', ' ').title())
            axes[0].set_title(f'Time Series with Detected Anomalies ({signal_name})')
            axes[0].legend()
            axes[0].grid(True, alpha=0.3)
            
            # Anomaly timeline
            axes[1].plot(time_array, anomalies.astype(int), 'r-', alpha=0.7, label='Anomaly Indicator')
            axes[1].fill_between(time_array, 0, anomalies.astype(int), alpha=0.3, color='red')
            axes[1].set_xlabel('Time')
            axes[1].set_ylabel('Anomaly (0/1)')
            axes[1].set_title('Anomaly Timeline')
            axes[1].set_ylim(-0.1, 1.1)
            axes[1].grid(True, alpha=0.3)
            
            plt.tight_layout()
            plt.show()
        
        elif viz == 'spatial':
            # Check if spatial coordinates are available
            if 'x' in data and 'y' in data:
                # Spatial visualization
                fig = plt.figure(figsize=(14, 6))
                
                # 3D scatter plot (if z is available)
                if 'z' in data:
                    ax1 = fig.add_subplot(121, projection='3d')
                    normal_mask = ~anomalies
                    ax1.scatter(data['x'][normal_mask], data['y'][normal_mask], 
                               data['z'][normal_mask], c='blue', alpha=0.3, s=10, label='Normal')
                    ax1.scatter(data['x'][anomalies], data['y'][anomalies], 
                               data['z'][anomalies], c='red', s=50, label='Anomalies')
                    ax1.set_xlabel('X')
                    ax1.set_ylabel('Y')
                    ax1.set_zlabel('Z')
                    ax1.set_title('3D Anomaly Map')
                    ax1.legend()
                else:
                    ax1 = fig.add_subplot(121)
                    normal_mask = ~anomalies
                    ax1.scatter(data['x'][normal_mask], data['y'][normal_mask], 
                               c='blue', alpha=0.3, s=10, label='Normal')
                    ax1.scatter(data['x'][anomalies], data['y'][anomalies], 
                               c='red', s=50, label='Anomalies')
                    ax1.set_xlabel('X')
                    ax1.set_ylabel('Y')
                    ax1.set_title('2D Anomaly Map')
                    ax1.legend()
                    ax1.grid(True, alpha=0.3)
                
                # 2D projection
                ax2 = fig.add_subplot(122)
                normal_mask = ~anomalies
                ax2.scatter(data['x'][normal_mask], data['y'][normal_mask], 
                           c='blue', alpha=0.3, s=10, label='Normal')
                ax2.scatter(data['x'][anomalies], data['y'][anomalies], 
                           c='red', s=50, label='Anomalies')
                ax2.set_xlabel('X')
                ax2.set_ylabel('Y')
                ax2.set_title('2D Anomaly Map (XY Projection)')
                ax2.legend()
                ax2.grid(True, alpha=0.3)
                
                plt.tight_layout()
                plt.show()
            else:
                display(HTML("<p>Spatial visualization requires x, y coordinates. Not available in current data.</p>"))
        
        elif viz == 'metrics':
            # Metrics visualization
            metrics = detection_results['metrics']
            
            fig, axes = plt.subplots(1, 2, figsize=(14, 5))
            
            # Confusion matrix (simplified)
            cm_data = [[metrics['tn'], metrics['fp']], [metrics['fn'], metrics['tp']]]
            im = axes[0].imshow(cm_data, cmap='Blues', aspect='auto')
            axes[0].set_xticks([0, 1])
            axes[0].set_xticklabels(['Normal', 'Anomaly'])
            axes[0].set_yticks([0, 1])
            axes[0].set_yticklabels(['Normal', 'Anomaly'])
            axes[0].set_xlabel('Predicted')
            axes[0].set_ylabel('Actual')
            axes[0].set_title('Confusion Matrix')
            for i in range(2):
                for j in range(2):
                    axes[0].text(j, i, cm_data[i][j], ha='center', va='center', color='black', fontsize=14)
            plt.colorbar(im, ax=axes[0])
            
            # Performance metrics bar chart
            metric_names = ['Accuracy', 'Precision', 'Recall', 'F1']
            metric_values = [metrics['accuracy'], metrics['precision'], metrics['recall'], metrics['f1']]
            axes[1].bar(metric_names, metric_values, color=['blue', 'green', 'orange', 'red'], alpha=0.7)
            axes[1].set_ylabel('Score')
            axes[1].set_title('Detection Performance Metrics')
            axes[1].set_ylim(0, 1)
            axes[1].grid(True, alpha=0.3, axis='y')
            for i, v in enumerate(metric_values):
                axes[1].text(i, v + 0.02, f'{v:.3f}', ha='center', va='bottom')
            
            plt.tight_layout()
            plt.show()
        
        elif viz == 'comparison':
            if detection_results:
                display(HTML("<p>Comparison view - compare multiple detectors side-by-side</p>"))
            else:
                display(HTML("<p>Execute detection first to enable comparison</p>"))
        else:
            display(HTML("<p>Select a visualization mode to see results</p>"))

# Update configuration visibility based on detector type
def update_config_visibility(change):
    """Update which configuration section is visible."""
    detector_type_val = change['new']
    
    # Open the accordion section that matches the selected detector type
    config_accordion.selected_index = {
        'statistical': 0,
        'clustering': 1,
        'ml': 2,
        'rule_based': 3,
        'ensemble': 4
    }.get(detector_type_val, 0)

detector_type.observe(update_config_visibility, names='value')

# Connect events
execute_button.on_click(execute_detection)
viz_mode.observe(lambda x: update_visualization(), names='value')

# ============================================
# Main Layout
# ============================================

main_layout = VBox([
    top_panel,
    HBox([left_panel, center_panel, right_panel]),
    bottom_panel
])

# Display the interface
display(main_layout)
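
The Performance panel in the cell above shows fixed values rather than measurements. As a minimal sketch of how real numbers could be collected, the helper below times a detector call with `time.perf_counter` and records peak traced memory with `tracemalloc`. The names `measure` and `threshold_detector` are hypothetical and not part of AM-QADF; `tracemalloc` reports allocations it can trace, not total process memory.

```python
import time
import tracemalloc


def measure(func, *args, **kwargs):
    """Run func once and return (result, elapsed_seconds, peak_bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = func(*args, **kwargs)
    finally:
        # Always stop tracing, even if the detector raises
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return result, elapsed, peak


def threshold_detector(values, limit=220.0):
    """Hypothetical stand-in detector: flag values above a fixed limit."""
    return [v > limit for v in values]


signal = [200.0, 215.0, 230.0, 210.0, 250.0]
flags, elapsed, peak = measure(threshold_detector, signal, limit=220.0)
print(flags)
print(f"Computation Time: {elapsed:.6f}s | Peak traced memory: {peak / 1024:.1f} KiB")
```

The same wrapper could be placed around any of the `detect_anomalies_*` calls, with the measured values passed on to the results display.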



## Summary

Congratulations! You've learned how to detect anomalies using various detection methods.

### Key Takeaways

1. **Statistical Detectors**: Z-Score, Modified Z-Score, IQR, Mahalanobis, Grubbs for univariate/multivariate detection
2. **Clustering Detectors**: DBSCAN, Isolation Forest, LOF, One-Class SVM, K-Means for density-, isolation-, boundary-, and distance-based detection
3. **ML Detectors**: Autoencoder, LSTM, VAE, Random Forest for learned pattern detection
4. **Rule-Based Detectors**: Threshold, Pattern, Spatial, Temporal, Multi-Signal for domain-specific rules
5. **Ensemble Detectors**: Voting, Weighted voting for combining multiple detectors
6. **Visualization**: Spatial (3D maps), Temporal (time series), Metrics (confusion matrix, performance), Comparison views
7. **Metrics**: Accuracy, Precision, Recall, F1 Score, TP/FP/TN/FN
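
Takeaways 5 and 7 can be illustrated together. The sketch below assumes three detectors have each produced a boolean mask, combines them by majority vote, and scores the result against a labeled mask. The masks are made-up illustration data, and `majority_vote` and `confusion_metrics` are hypothetical helper names rather than AM-QADF functions.

```python
import numpy as np


def majority_vote(masks):
    """Flag a point when more than half of the detectors flag it."""
    stacked = np.asarray(masks, dtype=bool)
    return stacked.sum(axis=0) > stacked.shape[0] / 2


def confusion_metrics(predictions, ground_truth):
    """Return TP/FP/FN counts with precision, recall, and F1 for two boolean masks."""
    predictions = np.asarray(predictions, dtype=bool)
    ground_truth = np.asarray(ground_truth, dtype=bool)
    tp = int(np.sum(predictions & ground_truth))
    fp = int(np.sum(predictions & ~ground_truth))
    fn = int(np.sum(~predictions & ground_truth))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {'tp': tp, 'fp': fp, 'fn': fn,
            'precision': precision, 'recall': recall, 'f1': f1}


# Made-up outputs of three detectors on six points, plus labels
zscore_mask = [True, False, False, True, False, False]
iqr_mask = [True, True, False, False, False, False]
dbscan_mask = [True, False, False, True, False, True]
truth = [True, False, False, True, True, False]

ensemble = majority_vote([zscore_mask, iqr_mask, dbscan_mask])
metrics = confusion_metrics(ensemble, truth)
print(ensemble.tolist())  # [True, False, False, True, False, False]
print(metrics)
```

Here the vote keeps only the points that at least two detectors agree on, which removes both single-detector false alarms but still misses the fifth point that no detector flagged, so precision is perfect while recall is two thirds.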

### Next Steps

Proceed to:
- **14_Anomaly_Detection_Workflow.ipynb** - Complete anomaly detection workflow with pipelines

### Related Resources

- Anomaly Detection Documentation: `../docs/AM_QADF/05-modules/anomaly-detection.md`
- API Reference: `../docs/AM_QADF/06-api-reference/anomaly-detection-api.md`
- Examples: `../examples/`
