# Comprehensive Anomaly Classification Guide

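This notebook walks through Pynomaly's anomaly classification service, which labels each anomaly with a severity (critical, high, medium, or low) and a type. It covers the default classifiers, custom severity thresholds, the batch-processing and dashboard configurations, and domain-specific custom classifiers.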

## Table of Contents
1. [Basic Classification](#basic-classification)
2. [Custom Severity Thresholds](#custom-severity-thresholds)
3. [Batch Processing](#batch-processing)
4. [Dashboard Integration](#dashboard-integration)
5. [Domain-Specific Examples](#domain-specific-examples)
6. [Performance Comparison](#performance-comparison)

## Setup and Imports

```python
import numpy as np
import pandas as pd
from pynomaly.application.services.anomaly_classification_service import AnomalyClassificationService
from pynomaly.domain.entities.anomaly import Anomaly
from pynomaly.domain.value_objects.anomaly_score import AnomalyScore
from pynomaly.domain.services.anomaly_classifiers import (
    DefaultSeverityClassifier,
    DefaultTypeClassifier,
    BatchProcessingSeverityClassifier,
    DashboardTypeClassifier
)

# Seed NumPy's global random generator so the synthetic scores below are reproducible
np.random.seed(42)
print("Setup complete!")
```

## Basic Classification

Let's start with basic classification using the service's default severity and type classifiers.

```python
# Initialize the classification service
service = AnomalyClassificationService()

# Create sample anomalies with different scores
anomalies = [
    Anomaly(
        id="critical-001",
        score=AnomalyScore(0.95),
        data_point={"feature1": 10.0, "feature2": 15.0}
    ),
    Anomaly(
        id="high-001",
        score=AnomalyScore(0.8),
        data_point={"feature1": 5.0, "feature2": 8.0}
    ),
    Anomaly(
        id="medium-001",
        score=AnomalyScore(0.6),
        data_point={"feature1": 3.0, "feature2": 4.0}
    ),
    Anomaly(
        id="low-001",
        score=AnomalyScore(0.3),
        data_point={"feature1": 1.0, "feature2": 2.0}
    )
]

# Classify all anomalies; classify() stores its results in each anomaly's metadata
for anomaly in anomalies:
    service.classify(anomaly)
    print(f"Anomaly {anomaly.id}:")
    print(f"  Score: {anomaly.score.value:.2f}")
    print(f"  Severity: {anomaly.metadata.get('severity')}")
    print(f"  Type: {anomaly.metadata.get('type')}")
    print()
```
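The loop above relies on `classify()` writing its results into `anomaly.metadata` under the `severity` and `type` keys, while the actual decisions come from pluggable classifier objects (later sections pass them in as `severity_classifier=` and `type_classifier=`). The following is a minimal, self-contained sketch of that delegation pattern. `SimpleAnomaly`, `ThresholdSeverity`, `FeatureCountType`, and `SketchClassificationService` are hypothetical stand-ins, not Pynomaly classes, and their thresholds and type rules are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class SimpleAnomaly:
    """Hypothetical stand-in for pynomaly's Anomaly entity."""
    id: str
    score: float
    data_point: Dict[str, Any]
    metadata: Dict[str, Any] = field(default_factory=dict)


class ThresholdSeverity:
    """Assumed rule: the most severe level whose threshold the score reaches wins."""
    def classify_severity(self, anomaly: SimpleAnomaly) -> str:
        if anomaly.score >= 0.9:
            return "critical"
        if anomaly.score >= 0.7:
            return "high"
        if anomaly.score >= 0.5:
            return "medium"
        return "low"


class FeatureCountType:
    """Assumed rule: one feature means a point anomaly, several mean a collective one."""
    def classify_type(self, anomaly: SimpleAnomaly) -> str:
        return "point" if len(anomaly.data_point) == 1 else "collective"


class SketchClassificationService:
    """Hypothetical service: delegates to injected classifiers and mutates metadata in place."""
    def __init__(self, severity_classifier=None, type_classifier=None):
        self.severity_classifier = severity_classifier or ThresholdSeverity()
        self.type_classifier = type_classifier or FeatureCountType()

    def classify(self, anomaly: SimpleAnomaly) -> None:
        anomaly.metadata["severity"] = self.severity_classifier.classify_severity(anomaly)
        anomaly.metadata["type"] = self.type_classifier.classify_type(anomaly)


demo = SimpleAnomaly(id="demo-001", score=0.8, data_point={"feature1": 5.0})
SketchClassificationService().classify(demo)
print(demo.metadata)  # {'severity': 'high', 'type': 'point'}
```

Because the service only calls one method on each injected object, swapping in a different rule set requires no change to the service itself.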

## Custom Severity Thresholds

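Custom severity thresholds let you adapt classification to domain-specific requirements. Pass a threshold dictionary to `DefaultSeverityClassifier` and inject it into the service through the `severity_classifier` argument.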

```python
# Financial services example with stricter thresholds
financial_classifier = DefaultSeverityClassifier({
    'critical': 0.95,  # Fraud indicators
    'high': 0.85,      # Suspicious transactions
    'medium': 0.7,     # Unusual patterns
    'low': 0.5         # Minor deviations
})

# Create a new service that uses the custom severity classifier
financial_service = AnomalyClassificationService(
    severity_classifier=financial_classifier
)

# Test with a financial transaction anomaly
transaction_anomaly = Anomaly(
    id="transaction-001",
    score=AnomalyScore(0.82),
    data_point={
        "amount": 50000,
        "location": "foreign",
        "time": "3am"
    }
)

# Classify with default and custom thresholds.
# Each classify() call replaces the severity stored in the anomaly's metadata,
# so capture the result after each call before reclassifying.
service.classify(transaction_anomaly)
default_severity = transaction_anomaly.metadata.get('severity')

financial_service.classify(transaction_anomaly)
financial_severity = transaction_anomaly.metadata.get('severity')

print(f"Transaction Anomaly (Score: {transaction_anomaly.score.value:.2f})")
print(f"Default severity: {default_severity}")
print(f"Financial severity: {financial_severity}")
print(f"Type: {transaction_anomaly.metadata.get('type')}")
```
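To see how a 0.82 score maps onto the financial thresholds, here is a sketch of threshold-based severity lookup. It assumes, without confirmation from Pynomaly's source, that a score receives the most severe level whose threshold it meets or exceeds (`>=`), and that scores below every threshold fall back to a default label. `validate_thresholds` and `severity_for_score` are hypothetical helpers.

```python
from typing import Dict

FINANCIAL_THRESHOLDS: Dict[str, float] = {
    "critical": 0.95,
    "high": 0.85,
    "medium": 0.7,
    "low": 0.5,
}


def validate_thresholds(thresholds: Dict[str, float]) -> None:
    """Check that thresholds lie in [0, 1] and strictly decrease from critical to low."""
    values = [thresholds[level] for level in ("critical", "high", "medium", "low")]
    if not all(0.0 <= v <= 1.0 for v in values):
        raise ValueError("thresholds must lie in [0, 1]")
    if not all(a > b for a, b in zip(values, values[1:])):
        raise ValueError("thresholds must decrease from critical to low")


def severity_for_score(score: float, thresholds: Dict[str, float], default: str = "low") -> str:
    """Return the most severe level whose threshold the score meets (assumed >= semantics)."""
    validate_thresholds(thresholds)
    # Check levels from the highest threshold down so the first match is the most severe
    for level, threshold in sorted(thresholds.items(), key=lambda kv: kv[1], reverse=True):
        if score >= threshold:
            return level
    return default  # assumption: scores below every threshold get the default label


print(severity_for_score(0.82, FINANCIAL_THRESHOLDS))  # medium
```

Under these assumptions the transaction falls just short of the 0.85 `high` threshold, which is why a stricter threshold set can report a lower severity than a looser one for the same score.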

## Batch Processing

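For high-throughput scenarios, `use_batch_processing_classifiers()` switches the service to classifiers optimized for batch workloads. These classifiers keep a cache, so call `clear_classifier_cache()` once the batch is finished.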

```python
import time
from collections import Counter

# Generate a large batch of anomalies with uniformly distributed scores
batch_size = 1000
scores = np.random.uniform(0.0, 1.0, batch_size)

batch_anomalies = []
for i, score in enumerate(scores):
    anomaly = Anomaly(
        id=f"batch-{i:04d}",
        score=AnomalyScore(score),
        data_point={"feature1": np.random.randn(), "feature2": np.random.randn()}
    )
    batch_anomalies.append(anomaly)

# Switch the service to the batch-processing classifiers
batch_service = AnomalyClassificationService()
batch_service.use_batch_processing_classifiers()

# Measure classification time with a monotonic, high-resolution clock
start_time = time.perf_counter()

for anomaly in batch_anomalies:
    batch_service.classify(anomaly)

processing_time = time.perf_counter() - start_time

print(f"Processed {batch_size} anomalies in {processing_time:.2f} seconds")
print(f"Average time per anomaly: {processing_time / batch_size * 1000:.2f} ms")

# Tally the classification results
severity_counts = Counter(a.metadata.get('severity') for a in batch_anomalies)
type_counts = Counter(a.metadata.get('type') for a in batch_anomalies)

# List severities from most to least severe; unexpected labels (or None) go last
severity_rank = {'critical': 0, 'high': 1, 'medium': 2, 'low': 3}

print("\nSeverity Distribution:")
for severity, count in sorted(severity_counts.items(),
                              key=lambda item: severity_rank.get(item[0], len(severity_rank))):
    percentage = (count / batch_size) * 100
    print(f"  {severity}: {count} ({percentage:.1f}%)")

print("\nType Distribution:")
# Sort on the string form so a missing (None) type cannot break the comparison
for type_category, count in sorted(type_counts.items(), key=lambda item: str(item[0])):
    percentage = (count / batch_size) * 100
    print(f"  {type_category}: {count} ({percentage:.1f}%)")

# Release the classifier cache once the batch is done
batch_service.clear_classifier_cache()
print("\nCache cleared successfully!")
```
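The loop above still calls `classify()` once per anomaly. When only score-based severity is needed, one way to label a whole batch in a single call is to vectorize the threshold lookup with NumPy's `np.digitize`. This sketch is independent of Pynomaly and is not a description of how its batch-processing classifiers work; the cut points 0.5, 0.7, and 0.9 are illustrative assumptions.

```python
import numpy as np

# Illustrative (assumed) cut points: [0.5, 0.7) medium, [0.7, 0.9) high, >= 0.9 critical
bin_edges = np.array([0.5, 0.7, 0.9])
labels = np.array(["low", "medium", "high", "critical"])


def classify_scores(scores: np.ndarray) -> np.ndarray:
    """Map every score to a severity label in one vectorized call."""
    # With right=False, digitize returns i such that bin_edges[i-1] <= score < bin_edges[i],
    # so a score exactly on a cut point is promoted to the higher level (>= semantics).
    return labels[np.digitize(scores, bin_edges)]


rng = np.random.default_rng(42)
batch_scores = rng.uniform(0.0, 1.0, 1000)
batch_labels = classify_scores(batch_scores)

# np.unique returns the labels in sorted order together with their counts
levels, counts = np.unique(batch_labels, return_counts=True)
for level, count in zip(levels, counts):
    print(f"{level}: {count} ({count / batch_scores.size:.1%})")
```

The per-item Python overhead disappears because indexing `labels` with the array returned by `np.digitize` happens entirely inside NumPy.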

## Dashboard Integration

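For user-facing applications, `use_dashboard_classifiers()` switches the service to dashboard-optimized classifiers. The cell below shows their output as a summary table followed by per-severity counts.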

```python
# Create a service that uses the dashboard classifiers
dashboard_service = AnomalyClassificationService()
dashboard_service.use_dashboard_classifiers()

# Sample anomalies for the dashboard; the comments give the type each one is meant to illustrate
dashboard_anomalies = [
    Anomaly(
        id="dashboard-001",
        score=AnomalyScore(0.92),
        data_point={"sensor1": 100.0}  # Single sensor: point anomaly
    ),
    Anomaly(
        id="dashboard-002",
        score=AnomalyScore(0.78),
        data_point={"sensor1": 10.0, "sensor2": 15.0, "sensor3": 20.0, "sensor4": 25.0},
        metadata={"temporal_context": True}  # Temporal context: contextual anomaly
    ),
    Anomaly(
        id="dashboard-003",
        score=AnomalyScore(0.65),
        data_point={"s1": 1.0, "s2": 2.0, "s3": 3.0, "s4": 4.0, "s5": 5.0}  # Multiple sensors: collective anomaly
    )
]

# Classify each anomaly and collect one row per anomaly for display
dashboard_data = []
for anomaly in dashboard_anomalies:
    dashboard_service.classify(anomaly)
    dashboard_data.append({
        'id': anomaly.id,
        'score': anomaly.score.value,
        'severity': anomaly.metadata.get('severity'),
        'type': anomaly.metadata.get('type'),
        'features': len(anomaly.data_point)
    })

# Display as a dashboard table
dashboard_df = pd.DataFrame(dashboard_data)
print("Dashboard Anomaly Summary:")
print(dashboard_df.to_string(index=False))

# Count anomalies per severity level, listing every level even when its count is zero
severity_summary = (
    dashboard_df['severity']
    .value_counts()
    .reindex(['critical', 'high', 'medium', 'low'], fill_value=0)
)

print("\nDashboard Summary Statistics:")
print(f"Total Anomalies: {len(dashboard_anomalies)}")
for severity, count in severity_summary.items():
    print(f"{severity.capitalize()}: {count}")
```
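The inline comments in the dashboard cell imply a set of type rules: a single feature suggests a point anomaly, a flagged temporal context suggests a contextual one, and many features suggest a collective one. The function below sketches such a rule set so the three samples land on their intended types. It is a hypothetical illustration, not the documented behavior of `DashboardTypeClassifier`; in particular, the rule order and the `collective_min_features` cutoff of 3 are assumptions.

```python
from typing import Any, Dict, Optional


def sketch_dashboard_type(data_point: Dict[str, Any],
                          metadata: Optional[Dict[str, Any]] = None,
                          collective_min_features: int = 3) -> str:
    """Hypothetical type rules inferred from the comments in the dashboard cell."""
    metadata = metadata or {}
    # Checked first: a flagged temporal context wins over the feature count
    if metadata.get("temporal_context"):
        return "contextual"
    # Assumed cutoff: deviations spread across several features count as collective
    if len(data_point) >= collective_min_features:
        return "collective"
    return "point"


print(sketch_dashboard_type({"sensor1": 100.0}))  # point
print(sketch_dashboard_type({"sensor1": 10.0, "sensor2": 15.0, "sensor3": 20.0, "sensor4": 25.0},
                            {"temporal_context": True}))  # contextual
print(sketch_dashboard_type({"s1": 1.0, "s2": 2.0, "s3": 3.0, "s4": 4.0, "s5": 5.0}))  # collective
```

Rule order matters here: if the feature-count check ran first, the four-sensor sample with temporal context would be labeled collective instead of contextual.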

## Domain-Specific Examples

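Let's implement domain-specific classifiers for two industries: an IoT type classifier and a security severity classifier. Each one is passed to the service through the `type_classifier` or `severity_classifier` constructor argument.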

```python
# IoT Type Classifier Example
class IoTTypeClassifier:
    def classify_type(self, anomaly):
        data_point = anomaly.data_point
        metadata = anomaly.metadata

        # Check for sensor failure patterns: a single reading far out of range
        if len(data_point) == 1 and next(iter(data_point.values())) > 90:
            return "sensor_failure"

        # Check for environmental anomalies
        if metadata.get('temporal_context'):
            return "environmental"

        # Check for cascade failures across several sensors
        if len(data_point) > 3:
            return "cascade_failure"

        return "normal_deviation"

# Security Severity Classifier Example
class SecuritySeverityClassifier:
    def classify_severity(self, anomaly):
        threat_indicators = anomaly.metadata.get('threat_indicators', [])

        # Explicit threat indicators take precedence over the score
        if 'malware' in threat_indicators:
            return "critical"
        elif 'intrusion_attempt' in threat_indicators:
            return "high"
        elif 'policy_violation' in threat_indicators:
            return "medium"
        else:
            # Fall back to score-based classification
            score = anomaly.score.value
            if score >= 0.9:
                return "critical"
            elif score >= 0.7:
                return "high"
            elif score >= 0.5:
                return "medium"
            else:
                return "low"

# Test IoT classifier
iot_anomaly = Anomaly(
    id="iot-001",
    score=AnomalyScore(0.85),
    data_point={"temperature_sensor": 95.0}
)

iot_service = AnomalyClassificationService(type_classifier=IoTTypeClassifier())
iot_service.classify(iot_anomaly)

print("IoT Classification Example:")
print(f"  Severity: {iot_anomaly.metadata.get('severity')}")
print(f"  Type: {iot_anomaly.metadata.get('type')}")

# Test security classifier
security_anomaly = Anomaly(
    id="security-001",
    score=AnomalyScore(0.75),
    data_point={"network_traffic": 1000},
    metadata={"threat_indicators": ["intrusion_attempt"]}
)

security_service = AnomalyClassificationService(severity_classifier=SecuritySeverityClassifier())
security_service.classify(security_anomaly)

print("\nSecurity Classification Example:")
print(f"  Severity: {security_anomaly.metadata.get('severity')}")
print(f"  Type: {security_anomaly.metadata.get('type')}")
```
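The IoT and security classifiers above do not inherit from any base class; in these examples the service appears to need only a `classify_type(anomaly)` or `classify_severity(anomaly)` method. If you want that contract checked before classification starts, `typing.Protocol` can express it structurally. The protocol names, `ScoreOnlySeverity`, and the `check_classifiers` helper are hypothetical; Pynomaly may define its own base classes for this purpose.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class SeverityClassifier(Protocol):
    """Hypothetical interface: any object with classify_severity(anomaly) -> str."""
    def classify_severity(self, anomaly: Any) -> str: ...


@runtime_checkable
class TypeClassifier(Protocol):
    """Hypothetical interface: any object with classify_type(anomaly) -> str."""
    def classify_type(self, anomaly: Any) -> str: ...


class ScoreOnlySeverity:
    """Satisfies SeverityClassifier structurally, without inheriting from it."""
    def classify_severity(self, anomaly: Any) -> str:
        return "high" if anomaly.score.value >= 0.7 else "low"


def check_classifiers(severity_classifier: Any = None, type_classifier: Any = None) -> None:
    """Fail fast if an injected classifier lacks the expected method."""
    # runtime_checkable isinstance() only checks that the methods exist, not their signatures
    if severity_classifier is not None and not isinstance(severity_classifier, SeverityClassifier):
        raise TypeError(f"{type(severity_classifier).__name__} has no classify_severity method")
    if type_classifier is not None and not isinstance(type_classifier, TypeClassifier):
        raise TypeError(f"{type(type_classifier).__name__} has no classify_type method")


check_classifiers(severity_classifier=ScoreOnlySeverity())  # passes silently
print(isinstance(ScoreOnlySeverity(), TypeClassifier))  # False
```

A structural protocol keeps the duck-typed style of the examples above while letting a type checker or a start-up check catch a misspelled method name.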

## Performance Comparison

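Let's compare the three service configurations by timing each one on the same 100 synthetic anomalies. With this few items and a single pass per configuration, treat the numbers as rough indications rather than a benchmark.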

```python
import time

# Build a shared test set of anomalies with three random features each
test_size = 100
test_anomalies = []

for i in range(test_size):
    anomaly = Anomaly(
        id=f"perf-{i:03d}",
        score=AnomalyScore(np.random.uniform(0.0, 1.0)),
        data_point={f"feature{j}": np.random.randn() for j in range(3)}
    )
    test_anomalies.append(anomaly)

# Test different configurations
configurations = {
    'Default': AnomalyClassificationService(),
    'Batch Processing': AnomalyClassificationService(),
    'Dashboard': AnomalyClassificationService()
}

configurations['Batch Processing'].use_batch_processing_classifiers()
configurations['Dashboard'].use_dashboard_classifiers()

results = {}

# Use a distinct loop variable so the `service` from Basic Classification is not overwritten
for config_name, config_service in configurations.items():
    start_time = time.perf_counter()

    for anomaly in test_anomalies:
        config_service.classify(anomaly)

    processing_time = time.perf_counter() - start_time

    results[config_name] = {
        'total_time': processing_time,
        'avg_time_ms': (processing_time / test_size) * 1000
    }

    # Clear any cached classifications so they do not carry over between runs
    config_service.clear_classifier_cache()

print("Performance Comparison Results:")
print("=" * 50)
for config_name, metrics in results.items():
    print(f"{config_name}:")
    print(f"  Total time: {metrics['total_time']:.4f} seconds")
    print(f"  Average per anomaly: {metrics['avg_time_ms']:.2f} ms")
    print()
```
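Single-pass timings over 100 items are sensitive to warm-up effects and timer noise. A common remedy is to run a warm-up pass and then report the best and median of several repeats measured with `time.perf_counter()` (the standard-library `timeit` module follows the same idea). The `benchmark` helper below is a hypothetical sketch of that approach, demonstrated on a trivial function.

```python
import statistics
import time
from typing import Callable, Dict, Sequence


def benchmark(fn: Callable[[object], object], items: Sequence[object],
              repeats: int = 5, warmup: int = 1) -> Dict[str, float]:
    """Time fn over all items several times and report per-item statistics in milliseconds."""
    # Warm-up passes absorb one-off costs (lazy initialisation, cache filling)
    for _ in range(warmup):
        for item in items:
            fn(item)

    per_item_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        for item in items:
            fn(item)
        elapsed = time.perf_counter() - start
        per_item_ms.append(elapsed / len(items) * 1000)

    return {
        "best_ms": min(per_item_ms),
        "median_ms": statistics.median(per_item_ms),
    }


stats = benchmark(lambda x: x * x, list(range(10_000)))
print(f"best {stats['best_ms']:.6f} ms, median {stats['median_ms']:.6f} ms per item")
```

With the notebook's objects, a configuration could be timed as `benchmark(AnomalyClassificationService().classify, test_anomalies)`. Note that warm-up passes fill any classifier cache, so for a caching configuration the result reflects warm-cache speed.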

## Summary

This notebook demonstrated:

1. **Basic Classification**: Using default severity and type classifiers
2. **Custom Thresholds**: Adapting classification for domain-specific requirements
3. **Batch Processing**: Optimizing performance for high-throughput scenarios
4. **Dashboard Integration**: Creating user-friendly classification outputs
5. **Domain-Specific Classifiers**: Implementing custom logic for specialized use cases
6. **Performance Analysis**: Comparing different configuration options

Because the severity and type classifiers are pluggable, the same `AnomalyClassificationService` can apply default rules, domain-specific thresholds, or fully custom logic, while always recording its results under the `severity` and `type` metadata keys.