# Algorithm Comparison Tutorial

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/organization/anomaly-detection/blob/main/docs/notebooks/02_algorithm_comparison_tutorial.ipynb)

**Objective**: Compare different anomaly detection algorithms side-by-side with interactive visualizations and parameter tuning.

**Duration**: 45 minutes  
**Level**: Intermediate  
**Prerequisites**: Basic Python and ML knowledge

## 📋 What You'll Learn

- Compare Isolation Forest, LOF, and One-Class SVM algorithms
- Understand algorithm strengths and weaknesses
- Interactive parameter tuning with real-time feedback
- Performance metrics and visualization
- How to choose the right algorithm for your data

## 🚀 Getting Started

In [None]:
# Install required packages (uncomment if running in Colab)
# !pip install anomaly-detection plotly ipywidgets scikit-learn

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots
import ipywidgets as widgets
from IPython.display import display, HTML
import warnings
warnings.filterwarnings('ignore')

# Set plotting style
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")

print("🎉 All packages imported successfully!")
print("📊 Ready to compare anomaly detection algorithms")

## 📊 Mock Implementation for Tutorial

Since this tutorial focuses on the concepts, we'll use a mock `DetectionService` built on scikit-learn that mimics the interface of the real package:

In [None]:
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score
import time

class MockDetectionService:
    """Mock implementation of the DetectionService for educational purposes."""
    
    def __init__(self):
        self.algorithms = {
            'isolation_forest': IsolationForest,
            'lof': LocalOutlierFactor,
            'one_class_svm': OneClassSVM
        }
    
    def detect(self, data, algorithm='isolation_forest', **kwargs):
        """Detect anomalies using the specified algorithm."""
        start_time = time.perf_counter()  # monotonic, high-resolution timer
        
        if algorithm == 'isolation_forest':
            model = IsolationForest(
                contamination=kwargs.get('contamination', 0.1),
                n_estimators=kwargs.get('n_estimators', 100),
                random_state=kwargs.get('random_state', 42)
            )
            predictions = model.fit_predict(data)
            scores = model.score_samples(data)  # lower = more anomalous
            
        elif algorithm == 'lof':
            model = LocalOutlierFactor(
                n_neighbors=kwargs.get('n_neighbors', 20),
                contamination=kwargs.get('contamination', 0.1)
            )
            predictions = model.fit_predict(data)
            # With novelty=False (the default), LOF only scores the points it was fitted on
            scores = model.negative_outlier_factor_  # lower = more anomalous
            
        elif algorithm == 'one_class_svm':
            model = OneClassSVM(
                kernel=kwargs.get('kernel', 'rbf'),
                gamma=kwargs.get('gamma', 'scale'),
                nu=kwargs.get('nu', 0.1)
            )
            predictions = model.fit_predict(data)
            scores = model.score_samples(data)  # lower = more anomalous
        
        else:
            raise ValueError(f"Unknown algorithm: {algorithm!r}")
        
        processing_time = time.perf_counter() - start_time
        
        return MockResult(predictions, scores, len(data), processing_time, algorithm)

class MockResult:
    """Mock result class that mimics the real DetectionResult."""
    
    def __init__(self, predictions, scores, total_samples, processing_time, algorithm):
        self.predictions = predictions
        self.scores = scores
        self.total_samples = total_samples
        self.processing_time = processing_time
        self.algorithm = algorithm
        self.anomaly_count = np.sum(predictions == -1)
        self.anomaly_rate = self.anomaly_count / total_samples

# Initialize the mock service
detection_service = MockDetectionService()
print("✅ Mock DetectionService initialized")
print("📝 Available algorithms:", list(detection_service.algorithms.keys()))
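All three scikit-learn estimators behind the mock share one convention: lower scores mean "more anomalous", and `predict` flags a point once its score falls below a fitted threshold. The sketch below checks this for `IsolationForest` on a small toy dataset (not the tutorial data), where `decision_function` is `score_samples` minus the fitted `offset_`. LOF's `negative_outlier_factor_` and One-Class SVM's `score_samples` follow the same lower-is-more-anomalous convention, but each on its own scale.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Toy data: 95 inliers around the origin plus 5 far-away points
X_demo = np.vstack([rng.normal(0, 1, size=(95, 2)), rng.normal(6, 0.5, size=(5, 2))])

model = IsolationForest(contamination=0.05, random_state=42).fit(X_demo)

raw = model.score_samples(X_demo)          # lower = more anomalous
shifted = model.decision_function(X_demo)  # raw score minus the fitted threshold
labels = model.predict(X_demo)             # -1 = anomaly, 1 = normal

# decision_function is score_samples shifted by offset_ ...
assert np.allclose(shifted, raw - model.offset_)
# ... and predict flags exactly the points whose shifted score is negative
assert np.array_equal(labels == -1, shifted < 0)
print(f"offset_ = {model.offset_:.3f}, flagged = {np.sum(labels == -1)}")
```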

## 🎲 Generate Synthetic Dataset

Let's create a synthetic dataset with known anomalies to test our algorithms:

In [None]:
def generate_synthetic_data(n_samples=1000, n_features=2, contamination=0.1, random_state=42):
    """Generate synthetic data with known anomalies."""
    np.random.seed(random_state)
    
    # Normal data (multivariate normal distribution)
    n_normal = int(n_samples * (1 - contamination))
    n_anomalies = n_samples - n_normal
    
    # Covariance matrix: unit variances, 0.3 correlation between neighbouring features
    if n_features == 2:
        cov_matrix = [[1, 0.3], [0.3, 1]]
    else:
        cov_matrix = np.eye(n_features)
        # Add some correlation
        for i in range(n_features-1):
            cov_matrix[i, i+1] = 0.3
            cov_matrix[i+1, i] = 0.3
    
    # Normal samples
    normal_data = np.random.multivariate_normal(
        mean=np.zeros(n_features), 
        cov=cov_matrix, 
        size=n_normal
    )
    
    # Anomalous samples (different distribution)
    anomaly_data = np.random.multivariate_normal(
        mean=np.ones(n_features) * 4, 
        cov=np.eye(n_features) * 0.5, 
        size=n_anomalies
    )
    
    # Combine data
    X = np.vstack([normal_data, anomaly_data])
    y_true = np.hstack([np.ones(n_normal), -np.ones(n_anomalies)])
    
    # Shuffle the data
    indices = np.random.permutation(n_samples)
    X = X[indices]
    y_true = y_true[indices]
    
    return X, y_true

# Generate the dataset
X, y_true = generate_synthetic_data(n_samples=1000, n_features=2, contamination=0.1)

print(f"📊 Generated dataset:")
print(f"   Samples: {X.shape[0]}")
print(f"   Features: {X.shape[1]}")
print(f"   Normal samples: {np.sum(y_true == 1)}")
print(f"   Anomalous samples: {np.sum(y_true == -1)}")
print(f"   Contamination rate: {np.sum(y_true == -1) / len(y_true):.1%}")
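Why should these anomalies be detectable at all? In the 2-D case, normal points follow a Gaussian with mean 0 and covariance `Σ = [[1, 0.3], [0.3, 1]]`, while anomalies are centred at (4, 4). The Mahalanobis distance `sqrt(dᵀ Σ⁻¹ d)` of that centre from the normal mean measures their separation in units that account for the correlation. A quick check, assuming the same 2-D covariance as `generate_synthetic_data`:

```python
import numpy as np

# Covariance of the normal class and centre of the anomaly cluster (2-D case above)
cov_normal = np.array([[1.0, 0.3], [0.3, 1.0]])
anomaly_center = np.array([4.0, 4.0])

# Mahalanobis distance sqrt(d^T Σ^{-1} d) from the normal mean (the origin);
# np.linalg.solve avoids forming the inverse explicitly
d = anomaly_center - np.zeros(2)
mahalanobis = float(np.sqrt(d @ np.linalg.solve(cov_normal, d)))
print(f"Mahalanobis distance of anomaly centre: {mahalanobis:.2f}")  # 4.96
```

A distance of about 4.96 puts the anomaly cluster roughly five "standard deviations" away, which makes this an easy benchmark. Note also that the anomalies form their own dense cluster (covariance 0.5·I), so density-based methods such as LOF may rate its members as locally normal.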

## 📈 Visualize the Dataset

Let's visualize our synthetic dataset to understand the ground truth:

In [None]:
def plot_dataset(X, y_true, title="Synthetic Dataset"):
    """Plot the 2D dataset with true labels."""
    fig = go.Figure()
    
    # Normal points
    normal_mask = y_true == 1
    fig.add_trace(go.Scatter(
        x=X[normal_mask, 0],
        y=X[normal_mask, 1],
        mode='markers',
        name='Normal',
        marker=dict(color='blue', size=6, opacity=0.7),
        hovertemplate='Normal<br>X: %{x:.2f}<br>Y: %{y:.2f}<extra></extra>'
    ))
    
    # Anomalous points
    anomaly_mask = y_true == -1
    fig.add_trace(go.Scatter(
        x=X[anomaly_mask, 0],
        y=X[anomaly_mask, 1],
        mode='markers',
        name='Anomaly (True)',
        marker=dict(color='red', size=8, opacity=0.9, symbol='x'),
        hovertemplate='Anomaly<br>X: %{x:.2f}<br>Y: %{y:.2f}<extra></extra>'
    ))
    
    fig.update_layout(
        title=title,
        xaxis_title='Feature 1',
        yaxis_title='Feature 2',
        showlegend=True,
        width=600,
        height=500
    )
    
    return fig

# Plot the dataset
fig = plot_dataset(X, y_true, "Ground Truth: Normal vs Anomalous Points")
fig.show()

print("💡 The red X marks show the true anomalies we want to detect!")
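Plotly figures don't render in every viewer (for example, some static notebook previews). The notebook already imports matplotlib, so a static fallback is short. This is a sketch; `plot_dataset_static` is our own hypothetical helper, not part of the package:

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_dataset_static(X, y_true, title="Synthetic Dataset"):
    """Static matplotlib version of plot_dataset (hypothetical helper)."""
    fig, ax = plt.subplots(figsize=(6, 5))
    normal = y_true == 1
    # Normal points as small blue dots, true anomalies as red crosses
    ax.scatter(X[normal, 0], X[normal, 1], s=12, c="blue", alpha=0.7, label="Normal")
    ax.scatter(X[~normal, 0], X[~normal, 1], s=30, c="red", marker="x", label="Anomaly (True)")
    ax.set(title=title, xlabel="Feature 1", ylabel="Feature 2")
    ax.legend()
    return fig, ax
```

In the notebook, call `plot_dataset_static(X, y_true)` followed by `plt.show()`.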

## 🔬 Algorithm Comparison Function

Let's create a comprehensive function to compare algorithms:

In [None]:
def compare_algorithms(X, y_true, algorithms_config):
    """Compare multiple algorithms and return results."""
    results = {}
    
    for algo_name, config in algorithms_config.items():
        print(f"🔍 Testing {algo_name}...")
        
        # Detect anomalies
        result = detection_service.detect(X, algorithm=config['algorithm'], **config['params'])
        
        # Calculate metrics with "anomaly" as the positive class (1 = anomaly, 0 = normal),
        # since detecting anomalies is what we care about
        y_pred = result.predictions
        y_true_binary = (y_true == -1).astype(int)
        y_pred_binary = (y_pred == -1).astype(int)
        
        # labels=[0, 1] guarantees a 2x2 matrix even if one class is never predicted
        tn, fp, fn, tp = confusion_matrix(y_true_binary, y_pred_binary, labels=[0, 1]).ravel()
        
        precision = tp / (tp + fp) if (tp + fp) > 0 else 0  # share of flagged points that are true anomalies
        recall = tp / (tp + fn) if (tp + fn) > 0 else 0     # share of true anomalies that were flagged
        f1 = 2 * (precision * recall) / (precision + recall) if (precision + recall) > 0 else 0
        accuracy = (tp + tn) / (tp + tn + fp + fn)
        
        results[algo_name] = {
            'result': result,
            'predictions': y_pred,
            'scores': result.scores,
            'processing_time': result.processing_time,
            'anomaly_count': result.anomaly_count,
            'accuracy': accuracy,
            'precision': precision,
            'recall': recall,
            'f1_score': f1,
            'true_positives': tp,    # anomalies correctly flagged
            'false_positives': fp,   # normal points wrongly flagged
            'true_negatives': tn,    # normal points correctly passed
            'false_negatives': fn    # anomalies missed
        }
        
        print(f"   ✅ Completed in {result.processing_time:.3f}s")
        print(f"   📊 Detected {result.anomaly_count} anomalies ({result.anomaly_rate:.1%})")
    
    return results

print("✅ Algorithm comparison function ready!")
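Because the detectors label anomalies as `-1`, scikit-learn's metric functions can score the anomaly class directly via `pos_label=-1`, without building a confusion matrix by hand. This is a convenient cross-check for hand-computed metrics when anomalies are treated as the positive class. The toy labels below are made up for illustration:

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score, f1_score

# Toy labels in the detectors' convention: 1 = normal, -1 = anomaly
y_true_toy = np.array([1, 1, 1, -1, -1])
y_pred_toy = np.array([1, -1, 1, -1, 1])

# pos_label=-1 makes "anomaly" the positive class
p = precision_score(y_true_toy, y_pred_toy, pos_label=-1)  # 1 of 2 flagged points is a true anomaly
r = recall_score(y_true_toy, y_pred_toy, pos_label=-1)     # 1 of 2 true anomalies was flagged
f = f1_score(y_true_toy, y_pred_toy, pos_label=-1)
print(p, r, f)  # 0.5 0.5 0.5
```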

## ⚙️ Run Algorithm Comparison

Now let's compare three popular algorithms with default parameters:

In [None]:
# Define algorithms and their configurations
algorithms_config = {
    'Isolation Forest': {
        'algorithm': 'isolation_forest',
        'params': {
            'contamination': 0.1,
            'n_estimators': 100,
            'random_state': 42
        }
    },
    'Local Outlier Factor': {
        'algorithm': 'lof',
        'params': {
            'n_neighbors': 20,
            'contamination': 0.1
        }
    },
    'One-Class SVM': {
        'algorithm': 'one_class_svm',
        'params': {
            'kernel': 'rbf',
            'gamma': 'scale',
            'nu': 0.1
        }
    }
}

# Run comparison
print("🚀 Starting algorithm comparison...\n")
comparison_results = compare_algorithms(X, y_true, algorithms_config)
print("\n✅ Algorithm comparison complete!")
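The thresholded metrics above depend on the `contamination`/`nu` settings. A threshold-free complement is ROC AUC computed from the raw scores, using the `roc_auc_score` function the notebook already imports. Since all three detectors give lower scores to more anomalous points, this sketch negates the scores so that higher means more anomalous. The helper name `anomaly_auc` is our own:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def anomaly_auc(y_true, scores):
    """ROC AUC for the anomaly class (y_true == -1), given scores where lower = more anomalous."""
    return roc_auc_score((y_true == -1).astype(int), -np.asarray(scores))

# Toy check: the anomalies (-1) have the lowest scores, so the ranking is perfect
y_toy = np.array([1, 1, -1, -1])
s_toy = np.array([0.5, 0.4, -0.3, -0.6])
print(anomaly_auc(y_toy, s_toy))  # 1.0
```

With the results above, `anomaly_auc(y_true, comparison_results['Isolation Forest']['scores'])` shows how well a detector ranks the true anomalies, independent of where its threshold sits.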

## 📊 Results Visualization

Let's create comprehensive visualizations to compare the algorithms:

In [None]:
def create_comparison_dashboard(X, y_true, results):
    """Create a comprehensive comparison dashboard."""
    
    # Create subplots
    fig = make_subplots(
        rows=2, cols=2,
        subplot_titles=('Algorithm Predictions', 'Performance Metrics', 
                       'Processing Time', 'Anomaly Score Distributions'),
        specs=[[{"type": "scatter"}, {"type": "bar"}],
               [{"type": "bar"}, {"type": "histogram"}]]
    )
    
    # Color palette for algorithms
    colors = ['red', 'green', 'purple']
    
    # 1. Algorithm Predictions (scatter plot)
    for i, (algo_name, result) in enumerate(results.items()):
        y_pred = result['predictions']
        
        # Predicted anomalies
        anomaly_mask = y_pred == -1
        fig.add_trace(
            go.Scatter(
                x=X[anomaly_mask, 0],
                y=X[anomaly_mask, 1],
                mode='markers',
                name=f'{algo_name}',
                marker=dict(color=colors[i], size=8, opacity=0.7, symbol='circle-open'),
                legendgroup=algo_name,
                hovertemplate=f'{algo_name}<br>X: %{{x:.2f}}<br>Y: %{{y:.2f}}<extra></extra>'
            ),
            row=1, col=1
        )
    
    # Add true anomalies
    true_anomaly_mask = y_true == -1
    fig.add_trace(
        go.Scatter(
            x=X[true_anomaly_mask, 0],
            y=X[true_anomaly_mask, 1],
            mode='markers',
            name='True Anomalies',
            marker=dict(color='black', size=6, symbol='x'),
            hovertemplate='True Anomaly<br>X: %{x:.2f}<br>Y: %{y:.2f}<extra></extra>'
        ),
        row=1, col=1
    )
    
    # 2. Performance Metrics (bar chart)
    metrics = ['accuracy', 'precision', 'recall', 'f1_score']
    for i, (algo_name, result) in enumerate(results.items()):
        metric_values = [result[metric] for metric in metrics]
        fig.add_trace(
            go.Bar(
                x=metrics,
                y=metric_values,
                name=algo_name,
                marker_color=colors[i],
                legendgroup=algo_name,
                showlegend=False
            ),
            row=1, col=2
        )
    
    # 3. Processing Time (bar chart)
    algo_names = list(results.keys())
    processing_times = [results[name]['processing_time'] for name in algo_names]
    
    fig.add_trace(
        go.Bar(
            x=algo_names,
            y=processing_times,
            marker_color=colors,
            name='Processing Time',
            showlegend=False,
            hovertemplate='%{x}<br>Time: %{y:.3f}s<extra></extra>'
        ),
        row=2, col=1
    )
    
    # 4. Anomaly Score Distributions (histograms of raw scores; scales differ between algorithms)
    for i, (algo_name, result) in enumerate(results.items()):
        scores = result['scores']
        fig.add_trace(
            go.Histogram(
                x=scores,
                name=f'{algo_name} Scores',
                marker_color=colors[i],
                opacity=0.7,
                legendgroup=algo_name,
                showlegend=False
            ),
            row=2, col=2
        )
    
    # Update layout
    fig.update_layout(
        height=800,
        title_text="Anomaly Detection Algorithm Comparison Dashboard",
        showlegend=True
    )
    
    # Update axes labels
    fig.update_xaxes(title_text="Feature 1", row=1, col=1)
    fig.update_yaxes(title_text="Feature 2", row=1, col=1)
    fig.update_yaxes(title_text="Score", row=1, col=2)
    fig.update_yaxes(title_text="Time (seconds)", row=2, col=1)
    fig.update_xaxes(title_text="Anomaly Score", row=2, col=2)
    fig.update_yaxes(title_text="Count", row=2, col=2)
    
    return fig

# Create and display the dashboard
dashboard_fig = create_comparison_dashboard(X, y_true, comparison_results)
dashboard_fig.show()

print("📊 Comparison dashboard displayed!")
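The "Anomaly Score Distributions" panel overlays raw scores that live on different scales: Isolation Forest scores lie between -1 and 0, LOF's `negative_outlier_factor_` sits near -1 for inliers and keeps decreasing for outliers, and One-Class SVM scores depend on the kernel. One simple way to compare their shapes is min-max rescaling. A sketch; `minmax_anomaly_scale` is a hypothetical helper:

```python
import numpy as np

def minmax_anomaly_scale(scores):
    """Rescale scores (lower = more anomalous) to [0, 1], where 1 = most anomalous."""
    s = np.asarray(scores, dtype=float)
    span = s.max() - s.min()
    if span == 0:
        return np.zeros_like(s)  # every point scored identically
    # Flip the direction so the lowest raw score maps to 1
    return (s.max() - s) / span

print(minmax_anomaly_scale([0.2, -0.5, 0.0]))
```

In the dashboard, passing `minmax_anomaly_scale(result['scores'])` as `x` to `go.Histogram` puts all three distributions on the same [0, 1] axis. Min-max scaling keeps each distribution's shape but is sensitive to a single extreme score; rank-based scaling is more robust but flattens every histogram to a uniform shape, which defeats the purpose of this particular plot.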

## 📈 Performance Summary Table

Let's create a detailed performance comparison table:

In [None]:
def create_performance_table(results):
    """Create a comprehensive performance comparison table."""
    
    performance_data = []
    
    for algo_name, result in results.items():
        performance_data.append({
            'Algorithm': algo_name,
            'Accuracy': f"{result['accuracy']:.3f}",
            'Precision': f"{result['precision']:.3f}",
            'Recall': f"{result['recall']:.3f}",
            'F1-Score': f"{result['f1_score']:.3f}",
            'Anomalies Detected': result['anomaly_count'],
            'Processing Time (s)': f"{result['processing_time']:.3f}",
            'True Positives': result['true_positives'],
            'False Positives': result['false_positives'],
            'True Negatives': result['true_negatives'],
            'False Negatives': result['false_negatives']
        })
    
    df_performance = pd.DataFrame(performance_data)
    
    return df_performance

# Create and display the performance table
performance_df = create_performance_table(comparison_results)

print("📊 Algorithm Performance Comparison")
print("=" * 80)
display(performance_df)

# Find the best algorithm for each metric
print("\n🏆 Best Performing Algorithm by Metric:")
print("-" * 40)

numeric_cols = ['Accuracy', 'Precision', 'Recall', 'F1-Score']
for col in numeric_cols:
    performance_df[col] = performance_df[col].astype(float)
    best_idx = performance_df[col].idxmax()
    best_algo = performance_df.loc[best_idx, 'Algorithm']
    best_score = performance_df.loc[best_idx, col]
    print(f"📈 {col}: {best_algo} ({best_score:.3f})")

# Find fastest algorithm
performance_df['Processing Time (s)'] = performance_df['Processing Time (s)'].astype(float)
fastest_idx = performance_df['Processing Time (s)'].idxmin()
fastest_algo = performance_df.loc[fastest_idx, 'Algorithm']
fastest_time = performance_df.loc[fastest_idx, 'Processing Time (s)']
print(f"⚡ Fastest: {fastest_algo} ({fastest_time:.3f}s)")
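The loop above finds the best algorithm one metric at a time. If the metrics are kept as numbers rather than pre-formatted strings, pandas can do it in one call with `DataFrame.idxmax`. The values below are placeholders for three hypothetical algorithms, not results from this notebook:

```python
import pandas as pd

# Placeholder numbers for three hypothetical algorithms (not results from this notebook)
df = pd.DataFrame(
    {
        "Algorithm": ["algo_a", "algo_b", "algo_c"],
        "Precision": [0.9, 0.6, 0.7],
        "Recall": [0.5, 0.8, 0.7],
        "Processing Time (s)": [0.20, 0.05, 0.10],
    }
).set_index("Algorithm")

# idxmax over several columns at once returns a Series: metric -> best algorithm
best_by_metric = df[["Precision", "Recall"]].idxmax()
fastest = df["Processing Time (s)"].idxmin()
print(best_by_metric.to_dict(), fastest)
```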

## 🎛️ Interactive Parameter Tuning

Now let's create interactive widgets to tune algorithm parameters in real-time:

In [None]:
def create_interactive_tuning():
    """Create interactive parameter tuning widgets."""
    
    # Output widget to display results
    output = widgets.Output()
    
    # Algorithm selection
    algorithm_dropdown = widgets.Dropdown(
        options=['isolation_forest', 'lof', 'one_class_svm'],
        value='isolation_forest',
        description='Algorithm:',
        style={'description_width': 'initial'}
    )
    
    # Common parameters
    contamination_slider = widgets.FloatSlider(
        value=0.1,
        min=0.01,
        max=0.5,
        step=0.01,
        description='Contamination:',
        style={'description_width': 'initial'}
    )
    
    # Isolation Forest specific
    n_estimators_slider = widgets.IntSlider(
        value=100,
        min=10,
        max=300,
        step=10,
        description='N Estimators:',
        style={'description_width': 'initial'}
    )
    
    # LOF specific
    n_neighbors_slider = widgets.IntSlider(
        value=20,
        min=5,
        max=100,
        step=5,
        description='N Neighbors:',
        style={'description_width': 'initial'}
    )
    
    # One-Class SVM specific
    kernel_dropdown = widgets.Dropdown(
        options=['rbf', 'linear', 'poly'],
        value='rbf',
        description='Kernel:',
        style={'description_width': 'initial'}
    )
    
    nu_slider = widgets.FloatSlider(
        value=0.1,
        min=0.01,
        max=0.5,
        step=0.01,
        description='Nu:',
        style={'description_width': 'initial'}
    )
    
    def update_visualization(*args):
        """Update visualization based on parameter changes."""
        with output:
            output.clear_output()
            
            # Get current parameter values
            algorithm = algorithm_dropdown.value
            contamination = contamination_slider.value
            
            # Prepare parameters based on algorithm
            if algorithm == 'isolation_forest':
                params = {
                    'contamination': contamination,
                    'n_estimators': n_estimators_slider.value,
                    'random_state': 42
                }
            elif algorithm == 'lof':
                params = {
                    'contamination': contamination,
                    'n_neighbors': n_neighbors_slider.value
                }
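            else:  # one_class_svm
                # One-Class SVM has no contamination parameter; nu (an upper bound on the
                # fraction of training errors) plays a similar role
                params = {
                    'kernel': kernel_dropdown.value,
                    'nu': nu_slider.value
                }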
            
            # Run detection
            result = detection_service.detect(X, algorithm=algorithm, **params)
            
            # Calculate metrics with "anomaly" as the positive class (1 = anomaly, 0 = normal)
            y_pred = result.predictions
            y_true_binary = (y_true == -1).astype(int)
            y_pred_binary = (y_pred == -1).astype(int)
            
            # labels=[0, 1] guarantees a 2x2 matrix even if one class is never predicted
            tn, fp, fn, tp = confusion_matrix(y_true_binary, y_pred_binary, labels=[0, 1]).ravel()
            accuracy = (tp + tn) / (tp + tn + fp + fn)
            precision = tp / (tp + fp) if (tp + fp) > 0 else 0
            recall = tp / (tp + fn) if (tp + fn) > 0 else 0
            f1 = 2 * (precision * recall) / (precision + recall) if (precision + recall) > 0 else 0
            
            # Create visualization
            fig = go.Figure()
            
            # Normal points
            normal_mask = y_pred == 1
            fig.add_trace(go.Scatter(
                x=X[normal_mask, 0],
                y=X[normal_mask, 1],
                mode='markers',
                name='Predicted Normal',
                marker=dict(color='lightblue', size=6, opacity=0.7)
            ))
            
            # Predicted anomalies
            anomaly_mask = y_pred == -1
            fig.add_trace(go.Scatter(
                x=X[anomaly_mask, 0],
                y=X[anomaly_mask, 1],
                mode='markers',
                name='Predicted Anomaly',
                marker=dict(color='orange', size=8, opacity=0.8, symbol='circle-open')
            ))
            
            # True anomalies
            true_anomaly_mask = y_true == -1
            fig.add_trace(go.Scatter(
                x=X[true_anomaly_mask, 0],
                y=X[true_anomaly_mask, 1],
                mode='markers',
                name='True Anomaly',
                marker=dict(color='red', size=6, symbol='x')
            ))
            
            fig.update_layout(
                title=f'{algorithm.replace("_", " ").title()} Results',
                xaxis_title='Feature 1',
                yaxis_title='Feature 2',
                width=600,
                height=400
            )
            
            fig.show()
            
            # Display metrics
            print(f"📊 Performance Metrics:")
            print(f"   Accuracy: {accuracy:.3f}")
            print(f"   Precision: {precision:.3f}")
            print(f"   Recall: {recall:.3f}")
            print(f"   F1-Score: {f1:.3f}")
            print(f"   Detected Anomalies: {result.anomaly_count}")
            print(f"   Processing Time: {result.processing_time:.3f}s")
    
    # Connect widgets to update function
    algorithm_dropdown.observe(update_visualization, names='value')
    contamination_slider.observe(update_visualization, names='value')
    n_estimators_slider.observe(update_visualization, names='value')
    n_neighbors_slider.observe(update_visualization, names='value')
    kernel_dropdown.observe(update_visualization, names='value')
    nu_slider.observe(update_visualization, names='value')
    
    # Show/hide algorithm-specific widgets
    def update_widget_visibility(*args):
        algorithm = algorithm_dropdown.value
        
        if algorithm == 'isolation_forest':
            n_estimators_slider.layout.display = 'block'
            n_neighbors_slider.layout.display = 'none'
            kernel_dropdown.layout.display = 'none'
            nu_slider.layout.display = 'none'
        elif algorithm == 'lof':
            n_estimators_slider.layout.display = 'none'
            n_neighbors_slider.layout.display = 'block'
            kernel_dropdown.layout.display = 'none'
            nu_slider.layout.display = 'none'
        else:  # one_class_svm
            n_estimators_slider.layout.display = 'none'
            n_neighbors_slider.layout.display = 'none'
            kernel_dropdown.layout.display = 'block'
            nu_slider.layout.display = 'block'
    
    algorithm_dropdown.observe(update_widget_visibility, names='value')
    update_widget_visibility()  # Initial call
    
    # Create widget layout
    controls = widgets.VBox([
        widgets.HTML("<h3>🎛️ Interactive Parameter Tuning</h3>"),
        algorithm_dropdown,
        contamination_slider,
        n_estimators_slider,
        n_neighbors_slider,
        kernel_dropdown,
        nu_slider
    ])
    
    # Initial visualization
    update_visualization()
    
    return widgets.HBox([controls, output])

# Create and display interactive tuning interface
print("🎛️ Creating interactive parameter tuning interface...")
interactive_widget = create_interactive_tuning()
display(interactive_widget)
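What does the contamination slider actually change? For Isolation Forest (and likewise LOF), contamination does not affect the anomaly scores; it only sets the threshold `offset_`, i.e. how many of the lowest-scoring points get flagged. A sketch verifying this directly with scikit-learn on toy data (same `random_state`, so both forests are identical):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(1)
X_demo = rng.normal(size=(200, 2))

# Same random_state -> identical trees; only contamination differs
low = IsolationForest(contamination=0.05, random_state=42).fit(X_demo)
high = IsolationForest(contamination=0.20, random_state=42).fit(X_demo)

# The scores are identical ...
assert np.allclose(low.score_samples(X_demo), high.score_samples(X_demo))
# ... but the higher contamination raises the threshold, so more points are flagged
n_low = int(np.sum(low.predict(X_demo) == -1))
n_high = int(np.sum(high.predict(X_demo) == -1))
print(f"offset_: {low.offset_:.3f} vs {high.offset_:.3f}; flagged: {n_low} vs {n_high}")
```

One-Class SVM behaves differently: `nu` is a training parameter, so moving the Nu slider refits a different boundary rather than just shifting a threshold.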

## 💡 Key Insights and Algorithm Selection Guide

Based on our comparison, here are the key insights:

In [None]:
def generate_insights(results):
    """Generate insights and recommendations based on results."""
    
    print("💡 Algorithm Comparison Insights")
    print("=" * 50)
    
    # Find best algorithm for each metric
    best_accuracy = max(results.items(), key=lambda x: x[1]['accuracy'])
    best_f1 = max(results.items(), key=lambda x: x[1]['f1_score'])
    fastest = min(results.items(), key=lambda x: x[1]['processing_time'])
    
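    print(f"🎯 Highest Accuracy: {best_accuracy[0]} (Accuracy: {best_accuracy[1]['accuracy']:.3f})")
    print("   ℹ️ On imbalanced data accuracy is inflated (flagging nothing already scores 0.9 at 10% contamination); prefer F1.")
    print(f"⚖️ Best F1-Score: {best_f1[0]} (F1: {best_f1[1]['f1_score']:.3f})")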
    print(f"⚡ Fastest Algorithm: {fastest[0]} ({fastest[1]['processing_time']:.3f}s)")
    
    print("\n📋 Algorithm Characteristics:")
    print("-" * 30)
    
    algorithm_chars = {
        'Isolation Forest': {
            'strengths': ['Fast training', 'Good for large datasets', 'Handles high dimensions well'],
            'weaknesses': ['Less interpretable', 'May struggle with local anomalies'],
            'best_for': 'General purpose, large-scale anomaly detection'
        },
        'Local Outlier Factor': {
            'strengths': ['Good for local anomalies', 'Interpretable scores', 'Handles varying densities'],
            'weaknesses': ['Slower on large datasets', 'Memory intensive', 'Sensitive to parameters'],
            'best_for': 'Local anomaly detection, interpretability needed'
        },
        'One-Class SVM': {
            'strengths': ['Good boundary definition', 'Kernel flexibility', 'nu bounds the fraction of training outliers'],
            'weaknesses': ['Slow training', 'Hard to tune', 'Not suitable for large datasets'],
            'best_for': 'Small to medium datasets, complex boundaries'
        }
    }
    
    for algo_name, chars in algorithm_chars.items():
        if algo_name in results:
            result = results[algo_name]
            print(f"\n🔍 {algo_name}:")
            print(f"   ✅ Strengths: {', '.join(chars['strengths'])}")
            print(f"   ⚠️ Weaknesses: {', '.join(chars['weaknesses'])}")
            print(f"   🎯 Best for: {chars['best_for']}")
            print(f"   📊 Our results: Accuracy={result['accuracy']:.3f}, F1={result['f1_score']:.3f}, Time={result['processing_time']:.3f}s")
    
    print("\n🎯 Selection Guidelines:")
    print("-" * 25)
    print("• For large datasets (>10K samples): Use Isolation Forest")
    print("• For local anomaly patterns (varying densities): Use Local Outlier Factor")
    print("• For complex non-linear boundaries on small/medium data: Use One-Class SVM")
    print("• For more robust accuracy: Consider ensemble methods (combine multiple algorithms)")
    print("• For real-time detection: Use Isolation Forest (typically fast at prediction time)")
    print("• For interpretability: Use Local Outlier Factor (scores are local density ratios)")

# Generate insights
generate_insights(comparison_results)
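The guidelines above suggest combining algorithms. One simple way to do that (a sketch, not necessarily the approach of the ensemble notebook) is to rescale each detector's scores to a common [0, 1] range and average them. The helpers `to_unit_anomaly` and `average_ensemble` are our own hypothetical names; the estimators are plain scikit-learn:

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM

def to_unit_anomaly(scores):
    """Min-max rescale so 1 = most anomalous (input: lower = more anomalous)."""
    s = np.asarray(scores, dtype=float)
    span = s.max() - s.min()
    return np.zeros_like(s) if span == 0 else (s.max() - s) / span

def average_ensemble(X, contamination=0.1, random_state=42):
    """Average the rescaled scores of three detectors and flag the top `contamination` share."""
    iso = IsolationForest(contamination=contamination, random_state=random_state).fit(X)
    lof = LocalOutlierFactor(n_neighbors=20, contamination=contamination).fit(X)
    svm = OneClassSVM(kernel="rbf", gamma="scale", nu=contamination).fit(X)

    combined = np.mean(
        [
            to_unit_anomaly(iso.score_samples(X)),
            to_unit_anomaly(lof.negative_outlier_factor_),
            to_unit_anomaly(svm.score_samples(X)),
        ],
        axis=0,
    )
    # Flag the highest combined scores; ties at the cut-off may flag a few extra points
    threshold = np.quantile(combined, 1 - contamination)
    predictions = np.where(combined >= threshold, -1, 1)
    return predictions, combined
```

In the notebook, `preds, combined = average_ensemble(X)` returns labels in the same `-1`/`1` convention, so they can be scored exactly like the individual detectors. Averaging min-max-scaled scores is easy to reason about but lets one detector's extreme score dominate; rank averaging or voting are common alternatives.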

## 🚀 Next Steps and Advanced Topics

Congratulations! You have successfully compared different anomaly detection algorithms. Here are the next steps:

In [None]:
print("🎓 Learning Path: What's Next?")
print("=" * 40)

next_steps = [
    {
        'title': '🎯 Ensemble Methods',
        'description': 'Combine multiple algorithms for better performance',
        'notebook': '06_ensemble_methods_deep_dive.ipynb',
        'difficulty': 'Advanced',
        'time': '55 minutes'
    },
    {
        'title': '🏦 Real-world Use Case: Fraud Detection',
        'description': 'End-to-end fraud detection pipeline',
        'notebook': '03_fraud_detection_end_to_end.ipynb',
        'difficulty': 'Intermediate',
        'time': '60 minutes'
    },
    {
        'title': '📡 Real-time Streaming Detection',
        'description': 'Detect anomalies in streaming data',
        'notebook': '07_real_time_streaming_detection.ipynb',
        'difficulty': 'Advanced',
        'time': '50 minutes'
    },
    {
        'title': '🔍 Model Explainability',
        'description': 'Understand why data points are anomalous',
        'notebook': '08_model_explainability_tutorial.ipynb',
        'difficulty': 'Intermediate',
        'time': '40 minutes'
    }
]

for i, step in enumerate(next_steps, 1):
    print(f"\n{i}. {step['title']}")
    print(f"   📝 {step['description']}")
    print(f"   📓 Notebook: {step['notebook']}")
    print(f"   🎯 Difficulty: {step['difficulty']}")
    print(f"   ⏱️ Time: {step['time']}")

print("\n🔗 Additional Resources:")
print("-" * 25)
print("• 📖 Algorithm Documentation: ../algorithms.md")
print("• 🎯 Quickstart Templates: ../quickstart.md")
print("• 📊 Example Datasets: ../datasets/README.md")
print("• 🏗️ Production Deployment: ../deployment.md")
print("• 🔧 Troubleshooting Guide: ../troubleshooting.md")

print("\n✅ Tutorial Complete!")
print("🎉 You now understand how to compare and select anomaly detection algorithms!")

## 📝 Summary

In this tutorial, you learned:

✅ **Algorithm Comparison**: How to systematically compare different anomaly detection algorithms  
✅ **Performance Metrics**: Understanding accuracy, precision, recall, and F1-score for anomaly detection  
✅ **Interactive Tuning**: Real-time parameter adjustment with immediate visual feedback  
✅ **Algorithm Selection**: Guidelines for choosing the right algorithm for your use case  
✅ **Visualization**: Creating comprehensive dashboards to analyze results  

### 🎯 Key Takeaways

1. **No single algorithm is best for all cases** - algorithm choice depends on your data characteristics
2. **Parameter tuning is crucial** - default parameters may not be optimal for your specific dataset
3. **Visualization helps understanding** - always plot your results to gain insights
4. **Consider trade-offs** - balance between accuracy, speed, and interpretability
5. **Ensemble methods** often provide the best overall performance

### 🔗 Related Notebooks

- **[Ensemble Methods Deep Dive](06_ensemble_methods_deep_dive.ipynb)** - Learn to combine algorithms
- **[Fraud Detection End-to-End](03_fraud_detection_end_to_end.ipynb)** - Apply these concepts to real-world fraud detection
- **[Model Explainability](08_model_explainability_tutorial.ipynb)** - Understand model decisions

**Happy anomaly detecting!** 🚀