# Performance Degradation Detection Demo

This notebook demonstrates how to use Pynomaly's performance degradation detection to monitor the runtime performance of detection operations (execution time, memory usage, CPU usage, and throughput) over time, and to trigger automated retraining when those metrics degrade past configured thresholds.

## Features Demonstrated

- Real-time performance monitoring
- Performance baseline management
- Degradation detection and alerting
- Automated retraining integration
- Performance trend analysis


In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datetime import timedelta
import time
import warnings
warnings.filterwarnings('ignore')

# Pynomaly imports
from pynomaly.application.services.performance_monitoring_service import PerformanceMonitoringService
from pynomaly.infrastructure.monitoring.performance_monitor import PerformanceMonitor
from pynomaly.domain.entities import Dataset
from pynomaly.infrastructure.adapters.sklearn_adapter import SklearnAdapter
from pynomaly.domain.value_objects import ContaminationRate

## 1. Initialize Performance Monitoring

Set up the performance monitoring system with configurable thresholds and monitoring intervals.
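
The thresholds configured below do not all point the same way: execution time, memory, and CPU usage read as upper limits, while `samples_per_second` is a lower limit. The following sketch shows one way such a check could be written. `find_violations` and `LOWER_IS_WORSE` are hypothetical names used for illustration, not part of Pynomaly, and the library's own alerting logic may differ.

```python
# Hypothetical helper, not part of Pynomaly: one possible reading of the thresholds.
# Assumption: throughput is a floor, every other metric is a ceiling.
LOWER_IS_WORSE = {"samples_per_second"}


def find_violations(metrics, thresholds, lower_is_worse=LOWER_IS_WORSE):
    """Return {metric: (observed, limit)} for every threshold that is breached."""
    violations = {}
    for name, limit in thresholds.items():
        observed = metrics.get(name)
        if observed is None:
            continue  # metric was not recorded for this operation
        if name in lower_is_worse:
            breached = observed < limit
        else:
            breached = observed > limit
        if breached:
            violations[name] = (observed, limit)
    return violations


thresholds = {
    "execution_time": 5.0,       # seconds (upper limit)
    "memory_usage": 500.0,       # MB (upper limit)
    "cpu_usage": 70.0,           # percentage (upper limit)
    "samples_per_second": 50.0,  # lower limit
}
observed = {
    "execution_time": 6.2,
    "memory_usage": 120.0,
    "cpu_usage": 35.0,
    "samples_per_second": 40.0,
}
violations = find_violations(observed, thresholds)
print(violations)
```

Here the slow run breaches two limits at once: it takes longer than 5.0 s and processes fewer than 50 samples per second.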

In [None]:
# Configure performance monitoring
performance_monitor = PerformanceMonitor(
    max_history=1000,
    alert_thresholds={
        "execution_time": 5.0,      # seconds
        "memory_usage": 500.0,      # MB
        "cpu_usage": 70.0,          # percentage
        "samples_per_second": 50.0  # minimum throughput
    },
    monitoring_interval=1.0
)

# Initialize monitoring service
monitoring_service = PerformanceMonitoringService(
    performance_monitor=performance_monitor,
    auto_start_monitoring=True
)

print("Performance monitoring system initialized successfully!")

## 2. Create Sample Data and Detector

Generate synthetic data and create an anomaly detector for demonstration purposes.

In [None]:
# Generate synthetic data with anomalies
np.random.seed(42)

def generate_sample_data(n_samples=1000, n_features=5, contamination=0.1):
    """Return a shuffled DataFrame of Gaussian inliers and uniform outliers."""
    # Derive one count from the other so the two always sum to n_samples,
    # even when n_samples * contamination is not a whole number
    n_anomalies = int(round(n_samples * contamination))
    n_normal = n_samples - n_anomalies

    # Generate normal data
    normal_data = np.random.normal(0, 1, (n_normal, n_features))

    # Generate anomalies
    anomalies = np.random.uniform(-4, 4, (n_anomalies, n_features))

    # Combine data
    data = np.vstack([normal_data, anomalies])

    # Shuffle
    indices = np.random.permutation(len(data))
    data = data[indices]

    return pd.DataFrame(data, columns=[f'feature_{i}' for i in range(n_features)])

# Create sample dataset
sample_data = generate_sample_data()
dataset = Dataset(name="Sample Production Data", data=sample_data)

# Create detector
detector = SklearnAdapter(
    algorithm_name="IsolationForest",
    name="Production Detector",
    contamination_rate=ContaminationRate(0.1),
    random_state=42
)

print(f"Created dataset with {len(sample_data)} samples")
print(f"Dataset shape: {sample_data.shape}")

## 3. Monitor Detection Operations

Execute detection operations while monitoring performance metrics.
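
To make the reported metrics less of a black box, here is a minimal sketch of how a wrapper could time an operation and record its peak memory using only the standard library. `measure_operation` is a hypothetical function written for this illustration; it is not how `monitor_detection_operation` is implemented, and `tracemalloc` only sees allocations made through Python's allocator, so its numbers will not match process-level memory figures.

```python
import time
import tracemalloc


def measure_operation(func, *args, n_samples):
    """Run func(*args) and return (result, metrics) with timing and peak memory."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = func(*args)
    finally:
        # Collect measurements even if the operation raises.
        elapsed = time.perf_counter() - start
        _, peak_bytes = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    metrics = {
        "execution_time": elapsed,                    # seconds
        "memory_usage": peak_bytes / (1024 * 1024),   # MB of traced Python allocations
        "samples_per_second": n_samples / elapsed if elapsed > 0 else float("inf"),
    }
    return result, metrics


# Stand-in workload: summing a list takes the place of a detection run.
values = list(range(100_000))
total, op_metrics = measure_operation(sum, values, n_samples=len(values))
print(f"result={total}, time={op_metrics['execution_time']:.4f}s")
```

Throughput is derived from the sample count and elapsed time, which is why the caller has to pass `n_samples` explicitly in this sketch.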

In [None]:
# Detection function
def run_detection(detector, dataset):
    """Run anomaly detection"""
    detector.fit(dataset)
    result = detector.detect(dataset)
    return result

# Monitor detection operation
print("Running monitored detection operation...")
result, metrics = monitoring_service.monitor_detection_operation(
    detector=detector,
    dataset=dataset,
    operation_func=run_detection
)

# Display results
print("\nDetection Results:")
print(f"  Anomalies detected: {len(result.anomalies)}")
print(f"  Execution time: {metrics.execution_time:.3f}s")
print(f"  Memory usage: {metrics.memory_usage:.1f}MB")
print(f"  CPU usage: {metrics.cpu_usage:.1f}%")
print(f"  Throughput: {metrics.samples_per_second:.1f} samples/sec")

## 4. Set Performance Baselines

Establish baseline performance metrics for comparison.
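
The baseline values in the cell below are hard-coded. As an alternative, a baseline can be derived from a few warm-up measurements. The sketch below takes the per-metric median, which is less sensitive to a single slow run than the mean. `derive_baseline` and the sample numbers are hypothetical and only illustrate the idea.

```python
from statistics import median


def derive_baseline(samples):
    """Return the per-metric median of a list of measurement dicts."""
    if not samples:
        raise ValueError("need at least one measurement to derive a baseline")
    names = samples[0].keys()
    return {name: median(s[name] for s in samples) for name in names}


# Illustrative warm-up measurements (made-up numbers, not real output).
warmup_runs = [
    {"execution_time": 0.8, "memory_usage": 40.0},
    {"execution_time": 0.9, "memory_usage": 42.0},
    {"execution_time": 1.0, "memory_usage": 45.0},
]
derived_baseline = derive_baseline(warmup_runs)
print(derived_baseline)
```

A dictionary of this shape could then be passed as `baseline_metrics` in place of the literal values.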

In [None]:
# Set baseline performance expectations
monitoring_service.set_performance_baseline(
    operation_name="detection_IsolationForest",
    baseline_metrics={
        "execution_time": 1.0,    # seconds
        "memory_usage": 50.0,     # MB
        "cpu_usage": 30.0         # percentage
    }
)

print("Performance baselines established successfully!")

# Display baselines
dashboard_data = monitoring_service.get_monitoring_dashboard_data()
print("\nCurrent Performance Baselines:")
for operation, baseline in dashboard_data['performance_baselines'].items():
    print(f"  {operation}:")
    for metric, value in baseline.items():
        print(f"    {metric}: {value}")

## 5. Simulate Performance Degradation

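Simulate a scenario where performance degrades over time by running detection on progressively larger datasets, so that execution time and memory usage are expected to grow with each iteration.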

In [None]:
# Simulate performance degradation by running multiple operations
print("Simulating performance degradation over time...")

performance_history = []

# Run multiple detection operations with gradually degrading performance
for i in range(5):
    # Simulate degradation by increasing data size
    degraded_data = generate_sample_data(n_samples=1000 + i * 500, n_features=5)
    degraded_dataset = Dataset(name=f"Degraded Data {i+1}", data=degraded_data)
    
    # Monitor operation
    result, metrics = monitoring_service.monitor_detection_operation(
        detector=detector,
        dataset=degraded_dataset,
        operation_func=run_detection
    )
    
    performance_history.append({
        'iteration': i + 1,
        'data_size': len(degraded_data),
        'execution_time': metrics.execution_time,
        'memory_usage': metrics.memory_usage,
        'throughput': metrics.samples_per_second
    })
    
    print(f"  Iteration {i+1}: {metrics.execution_time:.3f}s, {metrics.memory_usage:.1f}MB")
    
    # Small delay to simulate real-world timing
    time.sleep(0.5)

print("\nPerformance degradation simulation completed!")

## 6. Check for Performance Regression

Analyze the recent performance data to detect regression.
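
Conceptually, a regression check compares recent measurements against the stored baseline and flags metrics that have drifted too far. The sketch below is a simplified, hypothetical version of that comparison (`detect_regressions` and the 20% tolerance are assumptions, not Pynomaly's actual rule). It only covers metrics where a higher value is worse, and it reuses the `baseline`, `current`, and `degradation_percent` keys that the cell below prints.

```python
from statistics import mean


def detect_regressions(recent, baseline, tolerance_percent=20.0):
    """Flag metrics whose recent mean exceeds the baseline by more than the tolerance."""
    regressions = {}
    for name, base in baseline.items():
        values = [m[name] for m in recent if name in m]
        if not values or base <= 0:
            continue  # nothing to compare, or the baseline is unusable
        current = mean(values)
        degradation = (current - base) / base * 100.0
        if degradation > tolerance_percent:
            regressions[name] = {
                "baseline": base,
                "current": current,
                "degradation_percent": degradation,
            }
    return regressions


example_baseline = {"execution_time": 1.0, "memory_usage": 50.0}
example_recent = [
    {"execution_time": 1.4, "memory_usage": 52.0},
    {"execution_time": 1.6, "memory_usage": 54.0},
]
example_regressions = detect_regressions(example_recent, example_baseline)
for metric, info in example_regressions.items():
    print(f"{metric}: {info['degradation_percent']:.1f}% above baseline")
```

With these made-up numbers, execution time is 50% above its baseline and is flagged, while memory usage is about 6% above and stays within tolerance.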

In [None]:
# Check for performance regression
regression_result = monitoring_service.check_performance_regression(
    operation_name="detection_IsolationForest",
    recent_window=timedelta(minutes=5)
)

print("Performance Regression Analysis:")
print(f"  Operations analyzed: {regression_result.get('recent_operations', 0)}")
print(f"  Regressions detected: {regression_result.get('regressions_detected', 0)}")

if regression_result.get('regressions_detected', 0) > 0:
    print("\n⚠️  Performance regressions detected!")
    for metric, regression in regression_result['regressions'].items():
        print(f"    {metric}:")
        print(f"      Baseline: {regression['baseline']:.3f}")
        print(f"      Current: {regression['current']:.3f}")
        print(f"      Degradation: {regression['degradation_percent']:.1f}%")
else:
    print("\n✅ No significant performance regressions detected.")

# Display current vs baseline performance
print("\nCurrent vs Baseline Performance:")
if 'current_performance' in regression_result:
    current = regression_result['current_performance']
    baseline = regression_result.get('baseline_performance', {})
    
    for metric in current.keys():
        # Only compute a percentage when a non-zero baseline exists (avoids KeyError and ZeroDivisionError)
        if baseline.get(metric):
            change = ((current[metric] - baseline[metric]) / baseline[metric]) * 100
            print(f"  {metric}: {current[metric]:.3f} vs {baseline[metric]:.3f} ({change:+.1f}%)")
        else:
            print(f"  {metric}: {current[metric]:.3f} (no usable baseline)")

## 7. Visualize Performance Trends

Create visualizations to show performance trends over time.

In [None]:
# Create performance trend visualization
fig, axes = plt.subplots(2, 2, figsize=(15, 10))
fig.suptitle('Performance Degradation Analysis', fontsize=16, fontweight='bold')

# Convert to DataFrame for easier plotting
df = pd.DataFrame(performance_history)

# Plot execution time
axes[0, 0].plot(df['iteration'], df['execution_time'], 'b-o', linewidth=2)
axes[0, 0].set_title('Execution Time Over Time')
axes[0, 0].set_xlabel('Iteration')
axes[0, 0].set_ylabel('Execution Time (seconds)')
axes[0, 0].grid(True, alpha=0.3)

# Plot memory usage
axes[0, 1].plot(df['iteration'], df['memory_usage'], 'r-o', linewidth=2)
axes[0, 1].set_title('Memory Usage Over Time')
axes[0, 1].set_xlabel('Iteration')
axes[0, 1].set_ylabel('Memory Usage (MB)')
axes[0, 1].grid(True, alpha=0.3)

# Plot throughput
axes[1, 0].plot(df['iteration'], df['throughput'], 'g-o', linewidth=2)
axes[1, 0].set_title('Throughput Over Time')
axes[1, 0].set_xlabel('Iteration')
axes[1, 0].set_ylabel('Samples/Second')
axes[1, 0].grid(True, alpha=0.3)

# Plot data size
axes[1, 1].plot(df['iteration'], df['data_size'], 'purple', marker='o', linewidth=2)
axes[1, 1].set_title('Data Size Over Time')
axes[1, 1].set_xlabel('Iteration')
axes[1, 1].set_ylabel('Number of Samples')
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Print summary statistics (signed, because a metric can move in either direction)
def percent_change(series):
    """Percent change from the first to the last value; NaN if the first value is zero."""
    first, last = series.iloc[0], series.iloc[-1]
    return (last - first) / first * 100 if first else float('nan')
print("\nPerformance Summary (first vs. last iteration):")
print(f"  Execution time change: {percent_change(df['execution_time']):+.1f}%")
print(f"  Memory usage change: {percent_change(df['memory_usage']):+.1f}%")
print(f"  Throughput change: {percent_change(df['throughput']):+.1f}%")

## 8. Set Up Automated Alert Handling

Configure automated responses to performance alerts.
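
Alert handlers are plain callables that receive an alert object. The sketch below shows one way a dispatcher could fan an alert out to several handlers while keeping a failing handler from blocking the others. `Alert` and `AlertDispatcher` are hypothetical stand-ins that mirror the attribute names used in this notebook; they are not Pynomaly classes.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    """Hypothetical alert record using the attribute names from this notebook."""
    severity: str
    metric_name: str
    current_value: float
    threshold_value: float
    operation_name: str


class AlertDispatcher:
    """Calls every registered handler and isolates handler failures."""

    def __init__(self):
        self._handlers = []

    def add_handler(self, handler):
        self._handlers.append(handler)

    def dispatch(self, alert):
        """Invoke all handlers; return (handler_name, exception) for each failure."""
        failures = []
        for handler in self._handlers:
            try:
                handler(alert)
            except Exception as exc:  # one broken handler must not stop the rest
                failures.append((getattr(handler, "__name__", repr(handler)), exc))
        return failures


def broken_handler(alert):
    raise RuntimeError("notification channel down")


received = []
dispatcher = AlertDispatcher()
dispatcher.add_handler(broken_handler)
dispatcher.add_handler(received.append)

failures = dispatcher.dispatch(
    Alert("high", "execution_time", 7.5, 5.0, "detection_IsolationForest")
)
print(f"{len(received)} alert delivered, {len(failures)} handler failed")
```

Registering more than one handler this way matches how the notebook later adds a second handler for retraining decisions alongside the logging handler.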

In [None]:
# Define alert handler
def handle_performance_alert(alert):
    """Handle performance alerts with automated responses"""
    print(f"\n🚨 Performance Alert: {alert.severity.upper()}")
    print(f"   Metric: {alert.metric_name}")
    print(f"   Current Value: {alert.current_value:.3f}")
    print(f"   Threshold: {alert.threshold_value:.3f}")
    print(f"   Operation: {alert.operation_name}")
    print(f"   Timestamp: {alert.timestamp}")
    
    # Automated response based on severity
    if alert.severity == "critical":
        print("   🔥 CRITICAL: Triggering emergency response...")
        # In a real scenario, this would trigger immediate retraining
        # trigger_emergency_retraining(alert.operation_name)
    elif alert.severity == "high":
        print("   ⚠️  HIGH: Scheduling retraining...")
        # schedule_retraining(alert.operation_name)
    elif alert.severity == "medium":
        print("   ⚡ MEDIUM: Monitoring closely...")
        # increase_monitoring_frequency(alert.operation_name)
    
    # Log the alert
    print("   📝 Alert logged for further analysis")

# Add alert handler to monitoring service
monitoring_service.add_alert_handler(handle_performance_alert)

print("Performance alert handling configured successfully!")

# Display current system status
dashboard_data = monitoring_service.get_monitoring_dashboard_data()
print("\nCurrent System Status:")
print(f"  Monitoring enabled: {dashboard_data['system_status']['monitoring_enabled']}")
print(f"  Total operations monitored: {dashboard_data['system_status']['total_operations_monitored']}")
print(f"  Active alerts: {dashboard_data['system_status']['alert_count']}")
print(f"  Failed operations: {dashboard_data['system_status']['failed_operations']}")

## 9. Performance Trends Analysis

Analyze performance trends over different time windows.
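
One common way to label a trend is to fit a straight line through the bucketed values and look at its slope. The sketch below does this with an ordinary least-squares slope over equally spaced observations. `classify_trend`, the 5% tolerance, and the labels `"decreasing"` and `"stable"` are assumptions made for illustration; only `"increasing"` appears in this notebook, and Pynomaly may compute trends differently. The sketch assumes non-negative metrics such as time, memory, or throughput.

```python
def classify_trend(values, relative_tolerance=0.05):
    """Label a series as increasing, decreasing, or stable using a least-squares slope."""
    n = len(values)
    if n < 2:
        return "stable"
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    numerator = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    denominator = sum((i - x_mean) ** 2 for i in range(n))
    slope = numerator / denominator
    # Change implied by the fitted line over the whole series,
    # judged relative to the average level of the metric.
    total_change = slope * (n - 1)
    if y_mean == 0 or abs(total_change) <= relative_tolerance * abs(y_mean):
        return "stable"
    return "increasing" if total_change > 0 else "decreasing"


print(classify_trend([0.10, 0.14, 0.19, 0.25, 0.31]))       # execution time creeping up
print(classify_trend([100.0, 101.0, 99.5, 100.5, 100.0]))   # noise around a flat level
print(classify_trend([50.0, 40.0, 30.0]))                   # throughput dropping
```

Whether a rising trend is good or bad depends on the metric: rising execution time is a warning sign, while rising throughput is an improvement.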

In [None]:
# Get performance trends
trends = monitoring_service.get_performance_trends(
    operation_name="detection_IsolationForest",
    time_window=timedelta(minutes=10),
    bucket_size=timedelta(minutes=1)
)

print("Performance Trends Analysis:")
print(f"  Analysis period: {trends.get('time_window_hours', 0):.1f} hours")
print(f"  Total operations analyzed: {trends.get('total_operations', 0)}")
print(f"  Time buckets: {len(trends.get('time_buckets', []))}")

if 'trends' in trends:
    print("\nTrend Analysis:")
    for metric, trend in trends['trends'].items():
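        # A flat or unrecognized trend gets a neutral icon ("decreasing" is an assumed label)
        trend_icon = {"increasing": "📈", "decreasing": "📉"}.get(trend, "➡️")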
        print(f"  {trend_icon} {metric}: {trend}")

# Algorithm performance comparison
print("\nAlgorithm Performance Comparison:")
comparison = monitoring_service.get_algorithm_performance_comparison(
    time_window=timedelta(minutes=10),
    min_operations=1
)

if comparison.get('algorithms'):
    for algorithm, stats in comparison['algorithms'].items():
        print(f"\n  {algorithm}:")
        print(f"    Operations: {stats['operation_count']}")
        print(f"    Avg execution time: {stats['avg_execution_time']:.3f}s")
        print(f"    Avg memory usage: {stats['avg_memory_usage']:.1f}MB")
        print(f"    Avg throughput: {stats['avg_throughput']:.1f} samples/sec")
        print(f"    Reliability score: {stats['reliability_score']:.3f}")

    # Display rankings
    if 'rankings' in comparison:
        print("\nAlgorithm Rankings:")
        for criterion, ranking in comparison['rankings'].items():
            print(f"  {criterion}: {' > '.join(ranking)}")

## 10. Integration with Automated Retraining

Demonstrate how performance monitoring integrates with automated retraining.

In [None]:
# Simulated retraining integration
def simulate_retraining_decision(alert):
    """Simulate automated retraining decision based on performance alert"""
    if alert.metric_name in ['execution_time', 'memory_usage'] and alert.severity in ['high', 'critical']:
        print(f"\n🔄 Retraining Decision for {alert.operation_name}:")
        
        # Simulate retraining decision logic
        degradation_percent = ((alert.current_value - alert.threshold_value) / alert.threshold_value) * 100
        
        if degradation_percent > 50:
            decision = "IMMEDIATE_RETRAINING"
            confidence = 0.95
        elif degradation_percent > 25:
            decision = "SCHEDULED_RETRAINING"
            confidence = 0.8
        else:
            decision = "MONITOR_CLOSELY"
            confidence = 0.6
        
        print(f"   Decision: {decision}")
        print(f"   Confidence: {confidence:.2f}")
        print(f"   Degradation: {degradation_percent:.1f}%")
        
        if decision == "IMMEDIATE_RETRAINING":
            print("   🚀 Triggering immediate retraining...")
            # In a real scenario:
            # auto_retraining_service.trigger_immediate_retraining(model_id)
        elif decision == "SCHEDULED_RETRAINING":
            print("   📅 Scheduling retraining for next maintenance window...")
            # auto_retraining_service.schedule_retraining(model_id, schedule_time)
        else:
            print("   👀 Increasing monitoring frequency...")
            # monitoring_service.increase_monitoring_frequency(operation_name)

# Add retraining decision handler
monitoring_service.add_alert_handler(simulate_retraining_decision)

print("Automated retraining integration configured successfully!")

# Test the system with a larger dataset to exercise the alert handlers
print("\nTesting alert system with large dataset...")
large_data = generate_sample_data(n_samples=5000, n_features=10)
large_dataset = Dataset(name="Large Test Data", data=large_data)

# Alerts fire only if a metric crosses a threshold configured in section 1
# (for example, execution time above 5.0 s or memory usage above 500 MB)
result, metrics = monitoring_service.monitor_detection_operation(
    detector=detector,
    dataset=large_dataset,
    operation_func=run_detection
)

print(f"\nLarge dataset test completed:")
print(f"  Execution time: {metrics.execution_time:.3f}s")
print(f"  Memory usage: {metrics.memory_usage:.1f}MB")
print(f"  Throughput: {metrics.samples_per_second:.1f} samples/sec")

## 11. Export and Reporting

Export performance metrics and generate reports.
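
For readers who want to post-process metrics outside Pynomaly, the sketch below shows how a list of metric records could be serialized to JSON or CSV with the standard library. `export_records` and the sample records are hypothetical. Unlike the `export_metrics` call in the cell below, whose JSON result is indexed like a dictionary, this sketch returns a string for both formats.

```python
import csv
import io
import json


def export_records(records, format_type="json"):
    """Serialize a list of flat metric dicts to a JSON or CSV string."""
    if format_type == "json":
        return json.dumps({"total_metrics": len(records), "metrics": records}, indent=2)
    if format_type == "csv":
        if not records:
            return ""
        buffer = io.StringIO()
        writer = csv.DictWriter(
            buffer, fieldnames=list(records[0].keys()), lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(records)
        return buffer.getvalue()
    raise ValueError(f"unsupported format: {format_type!r}")


# Made-up records for illustration only.
sample_records = [
    {"operation_name": "detection_IsolationForest", "execution_time": 0.91, "memory_usage": 41.5},
    {"operation_name": "detection_IsolationForest", "execution_time": 1.37, "memory_usage": 48.2},
]
json_text = export_records(sample_records, "json")
csv_text = export_records(sample_records, "csv")
print(csv_text)
```

The CSV output has one header row plus one row per record, so `len(csv_text.splitlines())` gives the record count plus one.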

In [None]:
# Export performance metrics
print("Exporting performance metrics...")

# Export to JSON
json_export = monitoring_service.monitor.export_metrics(
    format_type="json",
    time_window=timedelta(minutes=10)
)

print(f"JSON export contains {json_export['total_metrics']} metrics")

# Export to CSV
csv_export = monitoring_service.monitor.export_metrics(
    format_type="csv",
    time_window=timedelta(minutes=10)
)

print(f"CSV export generated ({len(csv_export.splitlines())} lines)")

# Display sample of exported data
print("\nSample exported data (first 3 metrics):")
for i, metric in enumerate(json_export['metrics'][:3]):
    print(f"\n  Metric {i+1}:")
    print(f"    Operation: {metric['operation_name']}")
    print(f"    Algorithm: {metric['algorithm_name']}")
    print(f"    Execution time: {metric['execution_time']:.3f}s")
    print(f"    Memory usage: {metric['memory_usage']:.1f}MB")
    print(f"    Timestamp: {metric['timestamp']}")

# Generate final dashboard summary
dashboard_data = monitoring_service.get_monitoring_dashboard_data()
print("\nFinal Dashboard Summary:")
print(f"  Total operations monitored: {dashboard_data['system_status']['total_operations_monitored']}")
print(f"  Active alerts: {len(dashboard_data['active_alerts'])}")
print(f"  Failed operations: {dashboard_data['system_status']['failed_operations']}")
print(f"  Monitoring enabled: {dashboard_data['system_status']['monitoring_enabled']}")

if dashboard_data['active_alerts']:
    print("\nActive Alerts:")
    for alert in dashboard_data['active_alerts']:
        print(f"  - {alert['severity'].upper()}: {alert['message']}")

# Stop monitoring
monitoring_service.stop_monitoring()
print("\n✅ Performance monitoring demonstration completed successfully!")

## Summary

This notebook walked through Pynomaly's performance degradation detection workflow:

1. **Real-time Monitoring**: Continuous tracking of execution time, memory usage, and throughput
2. **Baseline Management**: Setting and maintaining performance baselines for comparison
3. **Degradation Detection**: Automatic detection of performance regression with configurable thresholds
4. **Intelligent Alerting**: Multi-level alerting system with automated response capabilities
5. **Trend Analysis**: Historical performance analysis and trend visualization
6. **Automated Retraining**: A simulated decision handler showing where automated retraining hooks attach for proactive model maintenance
7. **Export and Reporting**: JSON and CSV export of collected metrics for further analysis

### Key Benefits

- **Proactive Monitoring**: Detect performance issues before they impact production
- **Automated Response**: Reduce manual intervention with intelligent alerting and retraining
- **Comprehensive Metrics**: Track execution time, memory usage, CPU usage, and throughput in one system
- **Scalable Architecture**: Designed to handle high-volume production environments
- **Easy Integration**: Simple API for integration with existing ML pipelines

### Next Steps

1. Integrate with your existing ML pipeline
2. Configure appropriate thresholds for your use case
3. Set up automated retraining workflows
4. Configure alerting and notification systems
5. Monitor and adjust baselines over time
