# 2.2 Interactive Outlier Detection for Semiconductor Manufacturing

## 📚 Learning Objectives

By the end of this section, you will:
- **Master** multiple outlier detection algorithms (statistical and ML-based)
- **Implement** real-time anomaly detection for semiconductor processes
- **Build** consensus outlier detection frameworks
- **Create** interactive visualizations for outlier analysis
- **Apply** domain-specific validation rules for manufacturing
- **Design** production-ready outlier monitoring systems

## 🎯 What You'll Build

In this interactive notebook, you'll develop a comprehensive **Outlier Detection Pipeline** featuring:

1. **🔍 Statistical Methods**: Z-Score, Modified Z-Score, IQR, and Mahalanobis distance with interactive controls
2. **🤖 ML-Based Detection**: Isolation Forest, One-Class SVM, Local Outlier Factor
3. **🎯 Consensus Framework**: Combine multiple methods for robust detection
4. **📊 Interactive Visualizations**: Plotly charts driven by ipywidgets controls
5. **⚡ Real-Time Simulation**: Streaming data outlier detection demo
6. **🏭 Domain Rules**: Semiconductor-specific physics-based validation
7. **📋 Comprehensive Reports**: Automated outlier analysis summaries

Let's dive into the fascinating world of anomaly detection! 🚀

In [None]:
# Essential imports for outlier detection
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
from datetime import datetime, timedelta
from typing import Dict, List, Tuple, Optional, Union, Any
import time

# Scientific computing
from scipy import stats
from scipy.spatial.distance import mahalanobis
from sklearn.preprocessing import StandardScaler, RobustScaler
from sklearn.ensemble import IsolationForest
from sklearn.svm import OneClassSVM
from sklearn.neighbors import LocalOutlierFactor
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Interactive widgets and visualization
import ipywidgets as widgets
from ipywidgets import interact, interactive, fixed, interact_manual
from IPython.display import display, HTML, clear_output
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.io as pio

# Configure settings
warnings.filterwarnings('ignore')
plt.style.use('default')
sns.set_palette("husl")
pio.templates.default = "plotly_white"

# Random seed for reproducibility
np.random.seed(42)

print("✅ All libraries imported successfully!")
print(f"📊 Pandas: {pd.__version__} | NumPy: {np.__version__}")
print(f"🔧 Scikit-learn ready | 📈 Plotly ready | 🎛️ Widgets ready")
print("🚀 Ready to detect outliers!")

## 📊 Data Preparation: Creating Realistic Semiconductor Data

We'll create a **synthetic semiconductor manufacturing dataset** that mimics real-world conditions:

- **Multiple process parameters** (temperature, pressure, flow rates, etc.)
- **Realistic correlations** between related sensors
- **Planted outliers** of different types (point, contextual, collective)
- **Missing values** and **noise** typical in manufacturing
- **Recipe-based** process variations

Every planted outlier is labeled in a `true_outlier` column, so you can measure each detector's precision and recall here before applying it to real data, where such labels are rarely available.
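
A common way to simulate correlated sensors is to give them a shared latent factor. The sketch below uses a hypothetical helper, `correlated_group`, which is not part of the notebook. It mixes one common signal into several columns with loading `r`. Each column keeps unit variance, and any two columns in the group end up with correlation `r**2`. A loading of 0.7 therefore yields a pairwise correlation of about 0.49, not 0.7.

```python
import numpy as np

def correlated_group(n_samples, n_features, loading, rng):
    """Columns sharing one latent factor: x_j = loading * z + sqrt(1 - loading**2) * e_j.

    Illustrative helper: each column has unit variance, and any two columns
    have correlation loading**2.
    """
    z = rng.normal(0.0, 1.0, size=(n_samples, 1))           # shared latent factor
    e = rng.normal(0.0, 1.0, size=(n_samples, n_features))  # independent sensor noise
    return loading * z + np.sqrt(1.0 - loading**2) * e

rng = np.random.default_rng(42)
X = correlated_group(100_000, 3, loading=0.7, rng=rng)
corr = np.corrcoef(X, rowvar=False)
print("column std devs:", np.round(X.std(axis=0), 3))  # each approximately 1
print("corr(x0, x1):", round(corr[0, 1], 3))           # approximately 0.7**2 = 0.49
```

The generator below builds its correlated groups the same way. Its `corr_strength` values are therefore loadings, not target correlations.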

In [None]:
def create_semiconductor_dataset(n_samples=2000, n_features=20, outlier_rate=0.05, random_state=42):
    """
    Create a realistic semiconductor manufacturing dataset with planted outliers.
    
    Parameters:
    -----------
    n_samples : int
        Number of process runs to simulate
    n_features : int  
        Number of process parameters/sensors
    outlier_rate : float
        Fraction of samples that should be outliers
    random_state : int
        Random seed for reproducibility
    
    Returns:
    --------
    pd.DataFrame : Semiconductor process data with outliers
    """
    np.random.seed(random_state)
    
    # Define realistic semiconductor process parameters
    param_names = [
        'chamber_pressure', 'plasma_temperature', 'rf_power', 'gas_flow_ar',
        'gas_flow_o2', 'etch_rate', 'substrate_temp', 'plasma_density',
        'ion_energy', 'residence_time', 'chuck_temp', 'wall_temp',
        'impedance_match', 'dc_bias', 'optical_emission', 'mass_spec_signal',
        'pump_speed', 'throttle_valve', 'backside_pressure', 'leak_rate'
    ]
    
    # Ensure we have enough parameter names
    if n_features > len(param_names):
        param_names.extend([f'sensor_{i:03d}' for i in range(len(param_names), n_features)])
    
    param_names = param_names[:n_features]
    
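    # Start from independent standard-normal sensor readings
    data = np.random.normal(0, 1, (n_samples, n_features))
    
    # Add correlations through shared latent factors. Each group member becomes
    # r * latent + sqrt(1 - r**2) * (its current value), which keeps unit variance and
    # gives pairwise correlations of about r**2 inside the group. Mixing into the current
    # column (rather than overwriting it) keeps a sensor that belongs to two groups
    # (plasma_temperature, index 1) correlated with both.
    correlations = [
        ([0, 1, 2], 0.7),      # Pressure, temperature, RF power
        ([3, 4], 0.6),         # Gas flows
        ([1, 6, 11], 0.5),     # Various temperatures
        ([7, 8, 14], 0.4),     # Plasma-related parameters
    ]
    
    for group, corr_strength in correlations:
        if max(group) < n_features:
            base_signal = np.random.normal(0, 1, n_samples)
            for idx in group:
                data[:, idx] = (corr_strength * base_signal
                                + np.sqrt(1 - corr_strength**2) * data[:, idx])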
    
    # Add process recipes (different operating conditions)
    recipes = ['Recipe_A', 'Recipe_B', 'Recipe_C', 'Recipe_D']
    recipe_column = np.random.choice(recipes, n_samples)
    
    # Apply recipe-specific offsets
    recipe_offsets = {
        'Recipe_A': [0, 0, 0],      # Baseline
        'Recipe_B': [1, -0.5, 0.8], # Higher pressure, lower temp
        'Recipe_C': [-1, 1, -0.5],  # Lower pressure, higher temp
        'Recipe_D': [0.5, 0.5, 1.2] # Mixed conditions
    }
    
    for recipe, offsets in recipe_offsets.items():
        mask = recipe_column == recipe
        for i, offset in enumerate(offsets):
            if i < n_features:
                data[mask, i] += offset
    
    # Plant different types of outliers
    n_outliers = int(outlier_rate * n_samples)
    outlier_indices = np.random.choice(n_samples, n_outliers, replace=False)
    true_outliers = np.zeros(n_samples, dtype=bool)
    true_outliers[outlier_indices] = True
    
    outlier_types = []
    
    for idx in outlier_indices:
        outlier_type = np.random.choice(['point', 'contextual', 'collective'], p=[0.6, 0.3, 0.1])
        outlier_types.append(outlier_type)
        
        if outlier_type == 'point':
            # Point outliers: extreme values in few features
            n_features_affected = np.random.randint(1, min(4, n_features))
            features_affected = np.random.choice(n_features, n_features_affected, replace=False)
            
            for feat in features_affected:
                magnitude = np.random.uniform(4, 8)  # 4-8 standard deviations
                sign = np.random.choice([-1, 1])
                data[idx, feat] = sign * magnitude
                
        elif outlier_type == 'contextual':
            # Contextual outliers: values normal individually but abnormal for recipe
            current_recipe = recipe_column[idx]
            wrong_recipe = np.random.choice([r for r in recipes if r != current_recipe])
            wrong_offsets = recipe_offsets[wrong_recipe]
            
            for i, offset in enumerate(wrong_offsets):
                if i < n_features:
                    # Apply wrong recipe parameters
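                    data[idx, i] += offset * 2  # Amplify the difference
        
        elif outlier_type == 'collective':
            # Collective outliers: a short run of consecutive process runs sharing a common
            # drift. Each point is only mildly shifted (2-3 std devs), but the run is anomalous
            # as a group. Marking the whole run raises the final outlier fraction somewhat
            # above outlier_rate.
            run = np.arange(idx, min(idx + 5, n_samples))
            drift_features = np.random.choice(n_features, min(3, n_features), replace=False)
            data[np.ix_(run, drift_features)] += np.random.uniform(2, 3)
            true_outliers[run] = True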
    
    # Create DataFrame
    df = pd.DataFrame(data, columns=param_names)
    
    # Add metadata columns
    df['recipe'] = recipe_column
    df['timestamp'] = pd.date_range(start='2024-01-01', periods=n_samples, freq='5min')
    df['true_outlier'] = true_outliers
    
    # Add some missing values (typical in manufacturing)
    missing_rate = 0.02
    for col in param_names:
        n_missing = int(np.random.poisson(missing_rate * n_samples))
        if n_missing > 0:
            missing_indices = np.random.choice(n_samples, min(n_missing, n_samples), replace=False)
            df.loc[missing_indices, col] = np.nan
    
    return df

# Create our semiconductor dataset
print("🏭 Generating synthetic semiconductor manufacturing data...")
semiconductor_data = create_semiconductor_dataset(n_samples=2000, n_features=15, outlier_rate=0.08)

print(f"✅ Dataset created successfully!")
print(f"📏 Shape: {semiconductor_data.shape}")
print(f"🎯 True outliers: {semiconductor_data['true_outlier'].sum()} ({semiconductor_data['true_outlier'].mean():.1%})")
print(f"📝 Recipes: {semiconductor_data['recipe'].value_counts().to_dict()}")
print(f"❓ Missing values: {semiconductor_data.isnull().sum().sum()} total")

# Display first few rows
display(semiconductor_data.head())
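
The contextual outliers planted above look ordinary against the dataset as a whole. They only stand out relative to their own recipe. One way to capture this is to compute the modified z-score within each recipe group using pandas `groupby(...).transform`. The sketch below assumes each recipe is a stable operating point. The helper `recipe_robust_z` and the toy data are illustrative and not part of the notebook.

```python
import numpy as np
import pandas as pd

def recipe_robust_z(df, feature, group_col="recipe"):
    """Modified z-score of `feature`, computed separately within each recipe (illustrative helper)."""
    median = df.groupby(group_col)[feature].transform("median")
    deviation = df[feature] - median
    # Median absolute deviation (MAD) within each recipe
    mad = deviation.abs().groupby(df[group_col]).transform("median")
    return 0.6745 * deviation / mad

rng = np.random.default_rng(0)
recipe = np.repeat(["Recipe_A", "Recipe_B"], 200)
# Toy data: Recipe_B runs at a chamber pressure about 3 units higher than Recipe_A
pressure = np.where(recipe == "Recipe_A", 0.0, 3.0) + rng.normal(0.0, 0.3, size=400)
toy = pd.DataFrame({"recipe": recipe, "chamber_pressure": pressure})

# Contextual outlier: a Recipe_A run sitting at Recipe_B's typical pressure
toy.loc[10, "chamber_pressure"] = 3.0

col = toy["chamber_pressure"]
global_z = (col - col.mean()) / col.std()
local_z = recipe_robust_z(toy, "chamber_pressure")
print(f"global |z| = {abs(global_z[10]):.2f}; within-recipe |modified z| = {abs(local_z[10]):.2f}")
```

The global z-score stays near 1, so a threshold of 3 misses this run. The within-recipe score is far above 3.5. On the notebook's data, the same helper could be called once per parameter, for example `recipe_robust_z(semiconductor_data, 'chamber_pressure')`.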

## 👀 Data Preview: Understanding Our Manufacturing Dataset

Let's explore our dataset interactively to understand the process parameters and identify obvious patterns.

In [None]:
# Interactive data exploration widget
def create_data_explorer():
    """Create interactive data exploration dashboard."""
    
    # Get feature columns (exclude metadata)
    feature_cols = [col for col in semiconductor_data.columns 
                   if col not in ['recipe', 'timestamp', 'true_outlier']]
    
    # Create widgets
    feature_selector = widgets.Dropdown(
        options=feature_cols,
        value=feature_cols[0],
        description='Parameter:',
        style={'description_width': 'initial'}
    )
    
    plot_type = widgets.RadioButtons(
        options=['Histogram', 'Box Plot', 'Time Series', 'Scatter vs True Outliers'],
        value='Histogram',
        description='Plot Type:',
        style={'description_width': 'initial'}
    )
    
    recipe_filter = widgets.SelectMultiple(
        options=['All'] + list(semiconductor_data['recipe'].unique()),
        value=['All'],
        description='Recipes:',
        style={'description_width': 'initial'}
    )
    
    def update_plot(feature, plot_style, recipes):
        """Update the plot based on widget selections."""
        
        # Filter data by recipe
        if 'All' in recipes:
            plot_data = semiconductor_data.copy()
        else:
            plot_data = semiconductor_data[semiconductor_data['recipe'].isin(recipes)].copy()
        
        if len(plot_data) == 0:
            print("❌ No data matches the selected filters")
            return
        
        # Create the plot
        fig = make_subplots(rows=1, cols=1)
        
        if plot_style == 'Histogram':
            # Separate normal and outlier data
            normal_data = plot_data[~plot_data['true_outlier']][feature].dropna()
            outlier_data = plot_data[plot_data['true_outlier']][feature].dropna()
            
            fig.add_trace(go.Histogram(
                x=normal_data,
                name='Normal',
                opacity=0.7,
                nbinsx=30,
                marker_color='blue'
            ))
            
            if len(outlier_data) > 0:
                fig.add_trace(go.Histogram(
                    x=outlier_data,
                    name='True Outliers',
                    opacity=0.7,
                    nbinsx=30,
                    marker_color='red'
                ))
            
            fig.update_layout(
                title=f"Distribution of {feature}",
                xaxis_title=feature,
                yaxis_title="Frequency",
                barmode='overlay'
            )
            
        elif plot_style == 'Box Plot':
            # Box plot by recipe
            for recipe in plot_data['recipe'].unique():
                recipe_data = plot_data[plot_data['recipe'] == recipe]
                fig.add_trace(go.Box(
                    y=recipe_data[feature],
                    name=recipe,
                    boxpoints='outliers'
                ))
            
            fig.update_layout(
                title=f"Box Plot of {feature} by Recipe",
                yaxis_title=feature
            )
            
        elif plot_style == 'Time Series':
            # Time series plot
            normal_data = plot_data[~plot_data['true_outlier']]
            outlier_data = plot_data[plot_data['true_outlier']]
            
            fig.add_trace(go.Scatter(
                x=normal_data['timestamp'],
                y=normal_data[feature],
                mode='markers',
                name='Normal',
                marker=dict(color='blue', size=4, opacity=0.6)
            ))
            
            if len(outlier_data) > 0:
                fig.add_trace(go.Scatter(
                    x=outlier_data['timestamp'],
                    y=outlier_data[feature],
                    mode='markers',
                    name='True Outliers',
                    marker=dict(color='red', size=8, symbol='x')
                ))
            
            fig.update_layout(
                title=f"Time Series of {feature}",
                xaxis_title="Time",
                yaxis_title=feature
            )
            
        elif plot_style == 'Scatter vs True Outliers':
            # Scatter plot colored by outlier status
            fig.add_trace(go.Scatter(
                x=plot_data.index,
                y=plot_data[feature],
                mode='markers',
                marker=dict(
                    color=plot_data['true_outlier'].astype(int),  # numeric 0/1 so the colorscale applies
                    colorscale=['blue', 'red'],
                    size=6,
                    colorbar=dict(title="Is Outlier")
                ),
                text=plot_data['recipe'],
                hovertemplate=f"{feature}: %{{y:.3f}}<br>Recipe: %{{text}}<br>Outlier: %{{marker.color}}<extra></extra>"
            ))
            
            fig.update_layout(
                title=f"Scatter Plot of {feature} (Colored by Outlier Status)",
                xaxis_title="Sample Index",
                yaxis_title=feature
            )
        
        fig.update_layout(
            height=500,
            showlegend=True
        )
        
        fig.show()
        
        # Show summary statistics
        print(f"\n📊 Summary Statistics for {feature}:")
        print("-" * 50)
        summary = plot_data[feature].describe()
        for stat, value in summary.items():
            print(f"{stat:>10}: {value:>10.3f}")
        
        print(f"\n🔍 Outlier Analysis:")
        print("-" * 50)
        normal_mean = plot_data[~plot_data['true_outlier']][feature].mean()
        outlier_mean = plot_data[plot_data['true_outlier']][feature].mean()
        print(f"Normal mean:   {normal_mean:.3f}")
        print(f"Outlier mean:  {outlier_mean:.3f}")
        print(f"Difference:    {abs(outlier_mean - normal_mean):.3f}")
    
    # Create interactive widget
    widget = widgets.interactive(
        update_plot,
        feature=feature_selector,
        plot_style=plot_type,
        recipes=recipe_filter
    )
    
    return widget

# Display the interactive explorer
print("🎛️ Interactive Data Explorer")
print("Use the controls below to explore different process parameters and visualizations:")
data_explorer = create_data_explorer()
display(data_explorer)

## 📈 Statistical Outlier Detection Methods

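Now let's implement and compare four statistical methods for outlier detection. These classical methods are fast, interpretable baselines:

1. **Z-Score**: Flags values more than a threshold number of standard deviations from the mean (default 3)
2. **Modified Z-Score**: A robust variant that replaces the mean and standard deviation with the median and the median absolute deviation (MAD)
3. **IQR Method**: Flags values outside Tukey's fences, [Q1 - k·IQR, Q3 + k·IQR], where IQR = Q3 - Q1 and k = 1.5 by default
4. **Mahalanobis Distance**: Measures distance from the multivariate mean, scaled by the covariance matrix, so it accounts for correlations between features

Each method suits different data and outlier types. The first three score every feature independently and flag a run if any selected feature exceeds its limit. Mahalanobis distance scores all selected features jointly.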
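
A small sample shows the robustness gap between the first two methods clearly. Pandas computes the sample standard deviation with ddof=1. Under that convention, no value in a sample of n points can have |z| larger than (n - 1)/√n, which is about 2.85 for n = 10. A threshold of 3 therefore cannot fire, however extreme the value. By contrast, a single gross error barely moves the median and MAD used by the modified z-score. The readings below are made-up values for illustration.

```python
import numpy as np
import pandas as pd

# Ten made-up chamber-pressure readings; the last one is a gross error
x = pd.Series([9.8, 10.1, 9.9, 10.0, 10.2, 9.7, 10.3, 10.0, 9.9, 25.0])
n = len(x)

# Classic z-score: the outlier inflates both the mean and the standard deviation
z = (x - x.mean()) / x.std()  # pandas .std() uses ddof=1

# Modified z-score (Iglewicz-Hoaglin): median and MAD ignore a single extreme value
median = x.median()
mad = (x - median).abs().median()
modified_z = 0.6745 * (x - median) / mad

print(f"max |z|          = {z.abs().max():.2f}  (upper bound {(n - 1) / np.sqrt(n):.2f})")
print(f"max |modified z| = {modified_z.abs().max():.2f}")
print("flagged by |z| > 3:          ", list(x[z.abs() > 3].index))
print("flagged by |modified z| > 3.5:", list(x[modified_z.abs() > 3.5].index))
```

Here the z-score rule flags nothing, while the modified z-score flags only the last reading (index 9). This masking effect is why the robust variant is usually preferred when outliers are present in the data used to estimate the baseline.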

In [None]:
class InteractiveStatisticalDetectors:
    """Interactive statistical outlier detection methods."""
    
    def __init__(self, data):
        self.data = data
        self.feature_cols = [col for col in data.columns 
                           if col not in ['recipe', 'timestamp', 'true_outlier']]
        self.results = {}
    
    def z_score_detection(self, threshold=3.0, features=None):
        """Z-Score based outlier detection."""
        if features is None:
            features = self.feature_cols
        
        feature_data = self.data[features].copy()
        
        # Calculate Z-scores
        z_scores = np.abs((feature_data - feature_data.mean()) / feature_data.std())
        
        # Identify outliers (any feature exceeds threshold)
        outliers = (z_scores > threshold).any(axis=1)
        
        # Anomaly score is the maximum Z-score
        scores = z_scores.max(axis=1)
        
        self.results['z_score'] = {
            'outliers': outliers,
            'scores': scores,
            'threshold': threshold,
            'n_outliers': outliers.sum(),
            'percentage': outliers.mean() * 100
        }
        
        return outliers, scores
    
    def modified_z_score_detection(self, threshold=3.5, features=None):
        """Modified Z-Score based outlier detection (robust)."""
        if features is None:
            features = self.feature_cols
        
        feature_data = self.data[features].copy()
        
        # Calculate Modified Z-scores using median and MAD
        median = feature_data.median()
        mad = (feature_data - median).abs().median()
        modified_z_scores = 0.6745 * np.abs((feature_data - median) / mad)
        
        # Identify outliers
        outliers = (modified_z_scores > threshold).any(axis=1)
        scores = modified_z_scores.max(axis=1)
        
        self.results['modified_z_score'] = {
            'outliers': outliers,
            'scores': scores,
            'threshold': threshold,
            'n_outliers': outliers.sum(),
            'percentage': outliers.mean() * 100
        }
        
        return outliers, scores
    
    def iqr_detection(self, k=1.5, features=None):
        """IQR based outlier detection."""
        if features is None:
            features = self.feature_cols
        
        feature_data = self.data[features].copy()
        
        q1 = feature_data.quantile(0.25)
        q3 = feature_data.quantile(0.75)
        iqr = q3 - q1
        
        lower_bound = q1 - k * iqr
        upper_bound = q3 + k * iqr
        
        # Identify outliers
        outliers_mask = (feature_data < lower_bound) | (feature_data > upper_bound)
        outliers = outliers_mask.any(axis=1)
        
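        # Score: distance beyond the nearest fence, in IQR units so features on
        # different scales are comparable (0 for values inside the fences)
        dist_lower = np.maximum(0, lower_bound - feature_data) / iqr
        dist_upper = np.maximum(0, feature_data - upper_bound) / iqr
        scores = np.maximum(dist_lower, dist_upper).max(axis=1)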
        
        self.results['iqr'] = {
            'outliers': outliers,
            'scores': scores,
            'k': k,
            'n_outliers': outliers.sum(),
            'percentage': outliers.mean() * 100
        }
        
        return outliers, scores
    
    def mahalanobis_detection(self, threshold_percentile=95, features=None):
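        """Mahalanobis distance based outlier detection.
        
        The threshold is the given percentile of the observed distances, so about
        (100 - threshold_percentile)% of complete rows are always flagged. Rows with
        missing values get a score of 0 and are never flagged.
        """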
        if features is None:
            features = self.feature_cols
        
        feature_data = self.data[features].dropna().copy()
        
        if len(feature_data) < len(features):
            print("⚠️ Insufficient data for Mahalanobis distance calculation")
            return np.zeros(len(self.data), dtype=bool), np.zeros(len(self.data))
        
        # Calculate mean and covariance
        mean = feature_data.mean().values
        cov_matrix = feature_data.cov().values
        
        # Use pseudo-inverse for numerical stability
        inv_cov = np.linalg.pinv(cov_matrix)
        
        # Calculate Mahalanobis distances
        distances = []
        for _, row in self.data[features].iterrows():
            if row.isnull().any():
                distances.append(np.nan)
            else:
                dist = mahalanobis(row.values, mean, inv_cov)
                distances.append(dist)
        
        distances = np.array(distances)
        
        # Set threshold based on percentile
        threshold = np.nanpercentile(distances, threshold_percentile)
        outliers = distances > threshold
        
        # Rows with missing values have a NaN distance: the comparison above leaves them
        # unflagged, and their reported score is set to 0
        distances = np.nan_to_num(distances, nan=0)
        
        self.results['mahalanobis'] = {
            'outliers': outliers,
            'scores': distances,
            'threshold': threshold,
            'threshold_percentile': threshold_percentile,
            'n_outliers': outliers.sum(),
            'percentage': outliers.mean() * 100
        }
        
        return outliers, distances

# Initialize the detector
stat_detector = InteractiveStatisticalDetectors(semiconductor_data)

print("✅ Statistical outlier detectors initialized!")
print(f"📊 Ready to analyze {len(stat_detector.feature_cols)} process parameters")
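
`mahalanobis_detection` flags a fixed top percentage of distances. It therefore reports about 5% outliers even on perfectly clean data. It also estimates the mean and covariance with the outliers included. A common alternative compares the squared distance D² with a chi-square quantile, using p degrees of freedom for p features. It estimates location and covariance with scikit-learn's Minimum Covariance Determinant estimator (`sklearn.covariance.MinCovDet`). The sketch below assumes the in-control data are roughly multivariate normal. The helper name `robust_mahalanobis_outliers` and the `alpha` default are illustrative choices.

```python
import numpy as np
from scipy import stats
from sklearn.covariance import MinCovDet

def robust_mahalanobis_outliers(X, alpha=0.025, random_state=0):
    """Flag rows whose robust squared Mahalanobis distance exceeds a chi-square cutoff.

    Illustrative helper. Assumes in-control rows are roughly multivariate normal,
    so squared distances follow a chi-square distribution with p = X.shape[1]
    degrees of freedom. X must not contain NaN.
    """
    mcd = MinCovDet(random_state=random_state).fit(X)  # robust location and covariance
    d2 = mcd.mahalanobis(X)                             # squared distances to the robust center
    cutoff = stats.chi2.ppf(1 - alpha, df=X.shape[1])
    return d2 > cutoff, d2, cutoff

# Toy check: 500 clean 3-feature rows, the first 10 shifted by +6 in every feature
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
X[:10] += 6.0
flags, d2, cutoff = robust_mahalanobis_outliers(X)
print(f"cutoff D^2 = {cutoff:.2f}; flagged {flags.sum()} rows; planted caught: {flags[:10].sum()}/10")
```

`MinCovDet` needs complete rows. With the notebook's data, one option is `robust_mahalanobis_outliers(semiconductor_data[features].dropna().to_numpy())`, followed by mapping the flags back to the retained row index.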

In [None]:
def create_statistical_detector_widget():
    """Create interactive widget for statistical outlier detection."""
    
    # Method selection
    method_selector = widgets.Dropdown(
        options=['Z-Score', 'Modified Z-Score', 'IQR', 'Mahalanobis'],
        value='Z-Score',
        description='Method:',
        style={'description_width': 'initial'}
    )
    
    # Parameter controls
    z_threshold = widgets.FloatSlider(
        value=3.0,
        min=1.0,
        max=5.0,
        step=0.1,
        description='Z Threshold:',
        style={'description_width': 'initial'}
    )
    
    mod_z_threshold = widgets.FloatSlider(
        value=3.5,
        min=1.0,
        max=5.0,
        step=0.1,
        description='Mod-Z Threshold:',
        style={'description_width': 'initial'}
    )
    
    iqr_k = widgets.FloatSlider(
        value=1.5,
        min=0.5,
        max=3.0,
        step=0.1,
        description='IQR K:',
        style={'description_width': 'initial'}
    )
    
    mahal_percentile = widgets.IntSlider(
        value=95,
        min=85,
        max=99,
        step=1,
        description='Mahal %ile:',
        style={'description_width': 'initial'}
    )
    
    # Feature selection
    feature_selector = widgets.SelectMultiple(
        options=stat_detector.feature_cols,
        value=stat_detector.feature_cols[:5],  # Select first 5 by default
        description='Features:',
        style={'description_width': 'initial'},
        layout={'height': '120px'}
    )
    
    # Run button
    run_button = widgets.Button(
        description='🔍 Detect Outliers',
        button_style='primary',
        icon='search'
    )
    
    # Output area
    output = widgets.Output()
    
    def run_detection(button):
        """Run the selected detection method."""
        with output:
            clear_output(wait=True)
            
            method = method_selector.value
            features = list(feature_selector.value)
            
            if not features:
                print("❌ Please select at least one feature")
                return
            
            print(f"🔍 Running {method} detection on {len(features)} features...")
            
            # Run the appropriate detection method
            if method == 'Z-Score':
                outliers, scores = stat_detector.z_score_detection(
                    threshold=z_threshold.value, features=features
                )
                result_key = 'z_score'
                
            elif method == 'Modified Z-Score':
                outliers, scores = stat_detector.modified_z_score_detection(
                    threshold=mod_z_threshold.value, features=features
                )
                result_key = 'modified_z_score'
                
            elif method == 'IQR':
                outliers, scores = stat_detector.iqr_detection(
                    k=iqr_k.value, features=features
                )
                result_key = 'iqr'
                
            elif method == 'Mahalanobis':
                outliers, scores = stat_detector.mahalanobis_detection(
                    threshold_percentile=mahal_percentile.value, features=features
                )
                result_key = 'mahalanobis'
            
            # Display results
            result = stat_detector.results[result_key]
            
            print(f"\n📊 {method} Results:")
            print("-" * 40)
            print(f"Outliers detected: {result['n_outliers']} ({result['percentage']:.1f}%)")
            
            # True outlier comparison
            true_outliers = semiconductor_data['true_outlier'].values
            tp = np.sum(outliers & true_outliers)  # True positives
            fp = np.sum(outliers & ~true_outliers)  # False positives
            fn = np.sum(~outliers & true_outliers)  # False negatives
            tn = np.sum(~outliers & ~true_outliers)  # True negatives
            
            precision = tp / (tp + fp) if (tp + fp) > 0 else 0
            recall = tp / (tp + fn) if (tp + fn) > 0 else 0
            f1 = 2 * precision * recall / (precision + recall) if (precision + recall) > 0 else 0
            
            print(f"\n🎯 Performance vs True Outliers:")
            print("-" * 40)
            print(f"True Positives:  {tp:>4d}")
            print(f"False Positives: {fp:>4d}")
            print(f"False Negatives: {fn:>4d}")
            print(f"True Negatives:  {tn:>4d}")
            print(f"Precision:       {precision:>7.3f}")
            print(f"Recall:          {recall:>7.3f}")
            print(f"F1-Score:        {f1:>7.3f}")
            
            # Create visualization
            create_outlier_visualization(outliers, scores, method, features)
    
    def create_outlier_visualization(outliers, scores, method, features):
        """Create visualization of outlier detection results."""
        
        # Create subplots
        fig = make_subplots(
            rows=2, cols=2,
            # One title per subplot: the top panel spans both columns, so there are three
            subplot_titles=[
                f'{method} Outlier Detection Results',
                'Anomaly Score Distribution',
                'Outliers in Time Series'
            ],
            specs=[[{"colspan": 2}, None],
                   [{}, {}]]
        )
        
        # 1. Scatter plot of detection results
        detected_outliers = semiconductor_data[outliers]
        normal_points = semiconductor_data[~outliers]
        true_outliers = semiconductor_data[semiconductor_data['true_outlier']]
        
        # Normal points
        fig.add_trace(
            go.Scatter(
                x=normal_points.index,
                y=scores[~outliers],
                mode='markers',
                name='Normal',
                marker=dict(color='blue', size=4, opacity=0.6),
                hovertemplate="Index: %{x}<br>Score: %{y:.3f}<extra></extra>"
            ), row=1, col=1
        )
        
        # Detected outliers
        if len(detected_outliers) > 0:
            fig.add_trace(
                go.Scatter(
                    x=detected_outliers.index,
                    y=scores[outliers],
                    mode='markers',
                    name='Detected Outliers',
                    marker=dict(color='red', size=8, symbol='diamond'),
                    hovertemplate="Index: %{x}<br>Score: %{y:.3f}<extra></extra>"
                ), row=1, col=1
            )
        
        # True outliers overlay
        fig.add_trace(
            go.Scatter(
                x=true_outliers.index,
                y=scores[true_outliers.index],
                mode='markers',
                name='True Outliers',
                marker=dict(color='orange', size=10, symbol='x', line=dict(width=2)),
                hovertemplate="Index: %{x}<br>Score: %{y:.3f}<br>True Outlier<extra></extra>"
            ), row=1, col=1
        )
        
        # 2. Score distribution
        fig.add_trace(
            go.Histogram(
                x=scores,
                name='Score Distribution',
                nbinsx=30,
                opacity=0.7,
                marker_color='lightblue'
            ), row=2, col=1
        )
        
        # Add threshold line if applicable
        result_key = method.lower().replace('-', '_').replace(' ', '_')
        if result_key in stat_detector.results:
            threshold_val = stat_detector.results[result_key].get('threshold')
            if threshold_val:
                fig.add_vline(
                    x=threshold_val,
                    line_dash="dash",
                    line_color="red",
                    annotation_text=f"Threshold: {threshold_val:.2f}",
                    row=2, col=1
                )
        
        # 3. Time series view
        if len(features) > 0:
            feature = features[0]  # Use first selected feature
            
            fig.add_trace(
                go.Scatter(
                    x=normal_points['timestamp'],
                    y=normal_points[feature],
                    mode='markers',
                    name=f'{feature} (Normal)',
                    marker=dict(color='blue', size=3, opacity=0.5),
                    hovertemplate=f"{feature}: %{{y:.3f}}<br>%{{x}}<extra></extra>"
                ), row=2, col=2
            )
            
            if len(detected_outliers) > 0:
                fig.add_trace(
                    go.Scatter(
                        x=detected_outliers['timestamp'],
                        y=detected_outliers[feature],
                        mode='markers',
                        name=f'{feature} (Outliers)',
                        marker=dict(color='red', size=8, symbol='diamond'),
                        hovertemplate=f"{feature}: %{{y:.3f}}<br>%{{x}}<extra></extra>"
                    ), row=2, col=2
                )
        
        # Update layout
        fig.update_layout(
            height=800,
            title=f"{method} Outlier Detection Analysis",
            showlegend=True
        )
        
        fig.update_xaxes(title_text="Sample Index", row=1, col=1)
        fig.update_yaxes(title_text="Anomaly Score", row=1, col=1)
        fig.update_xaxes(title_text="Anomaly Score", row=2, col=1)
        fig.update_yaxes(title_text="Frequency", row=2, col=1)
        fig.update_xaxes(title_text="Time", row=2, col=2)
        fig.update_yaxes(title_text="Feature Value", row=2, col=2)
        
        fig.show()
    
    # Connect button to function
    run_button.on_click(run_detection)
    
    # Create parameter control panel that shows/hides based on method
    def update_parameter_visibility(change):
        method = change['new']
        z_threshold.layout.display = 'block' if method == 'Z-Score' else 'none'
        mod_z_threshold.layout.display = 'block' if method == 'Modified Z-Score' else 'none'
        iqr_k.layout.display = 'block' if method == 'IQR' else 'none'
        mahal_percentile.layout.display = 'block' if method == 'Mahalanobis' else 'none'
    
    method_selector.observe(update_parameter_visibility, names='value')
    
    # Initial parameter visibility
    mod_z_threshold.layout.display = 'none'
    iqr_k.layout.display = 'none'
    mahal_percentile.layout.display = 'none'
    
    # Layout
    controls = widgets.VBox([
        widgets.HTML("<h3>🔧 Detection Parameters</h3>"),
        method_selector,
        z_threshold,
        mod_z_threshold,
        iqr_k,
        mahal_percentile,
        widgets.HTML("<h3>📊 Feature Selection</h3>"),
        feature_selector,
        run_button
    ])
    
    return widgets.VBox([controls, output])

# Display the interactive statistical detector
print("🎛️ Interactive Statistical Outlier Detection")
print("Use the controls below to experiment with different detection methods and parameters:")
statistical_widget = create_statistical_detector_widget()
display(statistical_widget)
  },
  {
   "cell_type": "markdown",
   "id": "VSC-ml-methods",
   "metadata": {},
   "source": [
    "## 🤖 Machine Learning-Based Outlier Detection\n",
    "\n",
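    "Statistical methods are fast and easy to interpret, but most of them (Z-Score, IQR, Modified Z-Score) examine one parameter at a time and rely on simple distributional rules. Machine learning approaches can capture non-linear, multivariate relationships between process parameters. Let's explore three unsupervised ML-based outlier detectors from scikit-learn:\n",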
    "\n",
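    "1. **🌲 Isolation Forest**: Builds random trees that split on randomly chosen features and thresholds; outliers are isolated in fewer splits (shorter average path length)\n",
    "2. **🎯 One-Class SVM**: Learns a boundary in kernel space that encloses most of the normal data; `nu` upper-bounds the fraction of training points left outside it\n",
    "3. **📍 Local Outlier Factor (LOF)**: Compares each point's local density with that of its `n_neighbors` nearest neighbors; points much sparser than their neighbors get high scores\n",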
    "\n",
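    "These methods are particularly useful for:\n",
    "- Multivariate data with many interacting process parameters\n",
    "- Complex, non-linear patterns\n",
    "- Data with multiple normal operating clusters (e.g., different recipes)\n",
    "- Settings where labeled outliers are scarce: all three are unsupervised, and the `true_outlier` column is used here only for evaluation"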
   ]
  },
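To see how the three detectors' outputs line up before running them on the process data, here is a minimal sketch on synthetic 2-D data with five planted outliers. The data, seed, and parameter values are illustrative assumptions, not the notebook's semiconductor dataset. It also shows the sign convention the detector class relies on: for all three scikit-learn estimators, `fit_predict` returns `-1` for outliers, and larger `decision_function` / `negative_outlier_factor_` values mean *more normal*, so negating them gives scores where higher means more anomalous.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import IsolationForest
from sklearn.svm import OneClassSVM
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)

# Synthetic data (assumption): 200 normal samples near the origin + 5 planted outliers far away
normal = rng.normal(0.0, 1.0, size=(200, 2))
planted = rng.uniform(6.0, 8.0, size=(5, 2))
X = StandardScaler().fit_transform(np.vstack([normal, planted]))
is_planted = np.r_[np.zeros(200, dtype=bool), np.ones(5, dtype=bool)]

detectors = {
    "Isolation Forest": IsolationForest(contamination=0.05, random_state=42),
    "One-Class SVM": OneClassSVM(nu=0.05, kernel="rbf", gamma="scale"),
    "LOF": LocalOutlierFactor(n_neighbors=20, contamination=0.05),
}

results = {}
for name, model in detectors.items():
    labels = model.fit_predict(X)                  # +1 = inlier, -1 = outlier
    if name == "LOF":
        raw = model.negative_outlier_factor_       # about -1 for inliers, more negative for outliers
    else:
        raw = model.decision_function(X)           # > 0 on the inlier side, < 0 on the outlier side
    scores = -raw                                  # flip so that higher = more anomalous
    flagged = labels == -1
    results[name] = (flagged, scores)
    caught = np.sum(flagged & is_planted)
    print(f"{name:<17} flagged={flagged.sum():3d}  planted caught={caught}/5")
```

Because each method sets its threshold differently (`contamination` for Isolation Forest and LOF, `nu` for One-Class SVM), the flagged counts can differ even when the score rankings broadly agree.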
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "VSC-ml-detectors",
   "metadata": {},
   "outputs": [],
   "source": [
    "class InteractiveMLDetectors:\n",
    "    \"\"\"Interactive machine learning-based outlier detection methods.\"\"\"\n",
    "    \n",
    "    def __init__(self, data):\n",
    "        self.data = data\n",
    "        self.feature_cols = [col for col in data.columns \n",
    "                           if col not in ['recipe', 'timestamp', 'true_outlier']]\n",
    "        self.results = {}\n",
    "        self.models = {}\n",
    "    \n",
    "    def isolation_forest_detection(self, contamination=0.1, n_estimators=100, features=None, random_state=42):\n",
    "        \"\"\"Isolation Forest based outlier detection.\"\"\"\n",
    "        if features is None:\n",
    "            features = self.feature_cols\n",
    "        \n",
    "        feature_data = self.data[features].copy()\n",
    "        \n",
    "        # Handle missing values\n",
    "        feature_data = feature_data.fillna(feature_data.median())\n",
    "        \n",
    "        # Scale the data\n",
    "        scaler = StandardScaler()\n",
    "        data_scaled = scaler.fit_transform(feature_data)\n",
    "        \n",
    "        # Train Isolation Forest\n",
    "        model = IsolationForest(\n",
    "            contamination=contamination,\n",
    "            n_estimators=n_estimators,\n",
    "            random_state=random_state,\n",
    "            n_jobs=-1\n",
    "        )\n",
    "        \n",
    "        # Fit and predict\n",
    "        predictions = model.fit_predict(data_scaled)\n",
    "        scores = model.decision_function(data_scaled)\n",
    "        \n",
    "        # Convert predictions to boolean outliers\n",
    "        outliers = predictions == -1\n",
    "        \n",
    "        # Negate so higher = more anomalous (flagged outliers get scores > 0)\n",
    "        anomaly_scores = -scores\n",
    "        \n",
    "        # Store model and results\n",
    "        self.models['isolation_forest'] = {'model': model, 'scaler': scaler}\n",
    "        self.results['isolation_forest'] = {\n",
    "            'outliers': outliers,\n",
    "            'scores': anomaly_scores,\n",
    "            'contamination': contamination,\n",
    "            'n_estimators': n_estimators,\n",
    "            'n_outliers': outliers.sum(),\n",
    "            'percentage': outliers.mean() * 100\n",
    "        }\n",
    "        \n",
    "        return outliers, anomaly_scores\n",
    "    \n",
    "    def oneclass_svm_detection(self, nu=0.1, kernel='rbf', gamma='scale', features=None):\n",
    "        \"\"\"One-Class SVM based outlier detection.\"\"\"\n",
    "        if features is None:\n",
    "            features = self.feature_cols\n",
    "        \n",
    "        feature_data = self.data[features].copy()\n",
    "        \n",
    "        # Handle missing values\n",
    "        feature_data = feature_data.fillna(feature_data.median())\n",
    "        \n",
    "        # Scale the data (SVM requires scaling)\n",
    "        scaler = StandardScaler()\n",
    "        data_scaled = scaler.fit_transform(feature_data)\n",
    "        \n",
    "        # Train One-Class SVM\n",
    "        model = OneClassSVM(nu=nu, kernel=kernel, gamma=gamma)\n",
    "        \n",
    "        # Fit and predict\n",
    "        predictions = model.fit_predict(data_scaled)\n",
    "        scores = model.decision_function(data_scaled)\n",
    "        \n",
    "        # Convert predictions to boolean outliers\n",
    "        outliers = predictions == -1\n",
    "        \n",
    "        # Negate so higher = more anomalous (points outside the boundary get scores > 0)\n",
    "        anomaly_scores = -scores\n",
    "        \n",
    "        # Store model and results\n",
    "        self.models['oneclass_svm'] = {'model': model, 'scaler': scaler}\n",
    "        self.results['oneclass_svm'] = {\n",
    "            'outliers': outliers,\n",
    "            'scores': anomaly_scores,\n",
    "            'nu': nu,\n",
    "            'kernel': kernel,\n",
    "            'gamma': gamma,\n",
    "            'n_outliers': outliers.sum(),\n",
    "            'percentage': outliers.mean() * 100\n",
    "        }\n",
    "        \n",
    "        return outliers, anomaly_scores\n",
    "    \n",
    "    def lof_detection(self, n_neighbors=20, contamination=0.1, features=None):\n",
    "        \"\"\"Local Outlier Factor based outlier detection.\"\"\"\n",
    "        if features is None:\n",
    "            features = self.feature_cols\n",
    "        \n",
    "        feature_data = self.data[features].copy()\n",
    "        \n",
    "        # Handle missing values\n",
    "        feature_data = feature_data.fillna(feature_data.median())\n",
    "        \n",
    "        # Scale the data\n",
    "        scaler = StandardScaler()\n",
    "        data_scaled = scaler.fit_transform(feature_data)\n",
    "        \n",
    "        # Train LOF\n",
    "        model = LocalOutlierFactor(\n",
    "            n_neighbors=n_neighbors,\n",
    "            contamination=contamination\n",
    "        )\n",
    "        \n",
    "        # Fit and predict\n",
    "        predictions = model.fit_predict(data_scaled)\n",
    "        scores = model.negative_outlier_factor_\n",
    "        \n",
    "        # Convert predictions to boolean outliers\n",
    "        outliers = predictions == -1\n",
    "        \n",
    "        # Negate to recover the LOF value itself (~1 for inliers, larger = more anomalous)\n",
    "        anomaly_scores = -scores\n",
    "        \n",
    "        # Store results only: LOF with novelty=False (the default) cannot score new samples\n",
    "        self.results['lof'] = {\n",
    "            'outliers': outliers,\n",
    "            'scores': anomaly_scores,\n",
    "            'n_neighbors': n_neighbors,\n",
    "            'contamination': contamination,\n",
    "            'n_outliers': outliers.sum(),\n",
    "            'percentage': outliers.mean() * 100\n",
    "        }\n",
    "        \n",
    "        return outliers, anomaly_scores\n",
    "    \n",
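    "    def predict_new_data(self, new_data, method='isolation_forest'):\n",
    "        \"\"\"Predict outliers on new data with a previously trained model.\n",
    "        \n",
    "        Supports 'isolation_forest' and 'oneclass_svm'. LOF is fit with\n",
    "        novelty=False, so it cannot score unseen samples.\n",
    "        \"\"\"\n",
    "        if method not in self.models:\n",
    "            raise ValueError(f\"Model '{method}' not trained yet (available: {list(self.models)})\")\n",
    "        \n",
    "        model_info = self.models[method]\n",
    "        model = model_info['model']\n",
    "        scaler = model_info['scaler']\n",
    "        \n",
    "        # Use exactly the columns the scaler was fit on, in the same order\n",
    "        train_cols = list(scaler.feature_names_in_)\n",
    "        new_features = new_data[train_cols]\n",
    "        \n",
    "        # Impute with training statistics rather than the new batch's own median,\n",
    "        # which can itself be NaN for small batches. The scaler stores training means,\n",
    "        # not the medians used at fit time, so this is a close approximation.\n",
    "        train_means = pd.Series(scaler.mean_, index=train_cols)\n",
    "        new_data_scaled = scaler.transform(new_features.fillna(train_means))\n",
    "        \n",
    "        # Predict\n",
    "        predictions = model.predict(new_data_scaled)\n",
    "        scores = model.decision_function(new_data_scaled)\n",
    "        \n",
    "        outliers = predictions == -1\n",
    "        anomaly_scores = -scores\n",
    "        \n",
    "        return outliers, anomaly_scores\n",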
    "\n",
    "# Initialize the ML detector\n",
    "ml_detector = InteractiveMLDetectors(semiconductor_data)\n",
    "\n",
    "print(\"✅ Machine Learning outlier detectors initialized!\")\n",
    "print(f\"🤖 Ready to train models on {len(ml_detector.feature_cols)} process parameters\")"
   ]
  },
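`lof_detection` fits LOF in scikit-learn's default transductive mode (`novelty=False`), which is why there is no LOF model to reuse for new data. If you need to score incoming wafers with LOF, scikit-learn's `LocalOutlierFactor` accepts `novelty=True`: fit on reference data, then call `predict` or `decision_function` on new samples (`fit_predict` is not available in that mode). The sketch below assumes a clean reference batch and a synthetic "new batch" with two shifted samples; the data and variable names are illustrative only.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)

# Reference batch (assumption): mostly normal process data with 4 parameters
X_ref = rng.normal(0.0, 1.0, size=(300, 4))

# New batch: 20 normal samples plus 2 shifted samples simulating a process excursion
X_new = np.vstack([
    rng.normal(0.0, 1.0, size=(20, 4)),
    rng.normal(6.0, 0.5, size=(2, 4)),
])

# novelty=True enables predict/decision_function on unseen data; putting the scaler
# in the same pipeline guarantees new data is transformed with training statistics
lof_novelty = make_pipeline(
    StandardScaler(),
    LocalOutlierFactor(n_neighbors=20, contamination=0.05, novelty=True),
)
lof_novelty.fit(X_ref)

labels = lof_novelty.predict(X_new)                       # +1 inlier, -1 outlier
anomaly_scores = -lof_novelty.decision_function(X_new)    # higher = more anomalous
print(labels[-2:], np.round(anomaly_scores[-2:], 2))
```

Wrapping the scaler and detector in one pipeline replaces the separate `scaler` bookkeeping that `InteractiveMLDetectors` keeps in `self.models`.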
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "VSC-interactive-ml",
   "metadata": {},
   "outputs": [],
   "source": [
    "def create_ml_detector_widget():\n",
    "    \"\"\"Create interactive widget for machine learning outlier detection.\"\"\"\n",
    "    \n",
    "    # Method selection\n",
    "    method_selector = widgets.Dropdown(\n",
    "        options=['Isolation Forest', 'One-Class SVM', 'Local Outlier Factor'],\n",
    "        value='Isolation Forest',\n",
    "        description='ML Method:',\n",
    "        style={'description_width': 'initial'}\n",
    "    )\n",
    "    \n",
    "    # Isolation Forest parameters\n",
    "    if_contamination = widgets.FloatSlider(\n",
    "        value=0.1,\n",
    "        min=0.01,\n",
    "        max=0.3,\n",
    "        step=0.01,\n",
    "        description='Contamination:',\n",
    "        style={'description_width': 'initial'}\n",
    "    )\n",
    "    \n",
    "    if_n_estimators = widgets.IntSlider(\n",
    "        value=100,\n",
    "        min=50,\n",
    "        max=300,\n",
    "        step=25,\n",
    "        description='N Estimators:',\n",
    "        style={'description_width': 'initial'}\n",
    "    )\n",
    "    \n",
    "    # One-Class SVM parameters\n",
    "    svm_nu = widgets.FloatSlider(\n",
    "        value=0.1,\n",
    "        min=0.01,\n",
    "        max=0.5,\n",
    "        step=0.01,\n",
    "        description='Nu:',\n",
    "        style={'description_width': 'initial'}\n",
    "    )\n",
    "    \n",
    "    svm_kernel = widgets.Dropdown(\n",
    "        options=['rbf', 'linear', 'poly', 'sigmoid'],\n",
    "        value='rbf',\n",
    "        description='Kernel:',\n",
    "        style={'description_width': 'initial'}\n",
    "    )\n",
    "    \n",
    "    svm_gamma = widgets.Dropdown(\n",
    "        options=['scale', 'auto'],\n",
    "        value='scale',\n",
    "        description='Gamma:',\n",
    "        style={'description_width': 'initial'}\n",
    "    )\n",
    "    \n",
    "    # LOF parameters\n",
    "    lof_neighbors = widgets.IntSlider(\n",
    "        value=20,\n",
    "        min=5,\n",
    "        max=50,\n",
    "        step=5,\n",
    "        description='N Neighbors:',\n",
    "        style={'description_width': 'initial'}\n",
    "    )\n",
    "    \n",
    "    lof_contamination = widgets.FloatSlider(\n",
    "        value=0.1,\n",
    "        min=0.01,\n",
    "        max=0.3,\n",
    "        step=0.01,\n",
    "        description='Contamination:',\n",
    "        style={'description_width': 'initial'}\n",
    "    )\n",
    "    \n",
    "    # Feature selection\n",
    "    feature_selector = widgets.SelectMultiple(\n",
    "        options=ml_detector.feature_cols,\n",
    "        value=ml_detector.feature_cols[:8],  # Select first 8 by default\n",
    "        description='Features:',\n",
    "        style={'description_width': 'initial'},\n",
    "        layout={'height': '150px'}\n",
    "    )\n",
    "    \n",
    "    # Run button\n",
    "    run_button = widgets.Button(\n",
    "        description='🤖 Train & Detect',\n",
    "        button_style='success',\n",
    "        icon='cogs'\n",
    "    )\n",
    "    \n",
    "    # Compare button\n",
    "    compare_button = widgets.Button(\n",
    "        description='⚖️ Compare Methods',\n",
    "        button_style='info',\n",
    "        icon='balance-scale'\n",
    "    )\n",
    "    \n",
    "    # Output area\n",
    "    output = widgets.Output()\n",
    "    \n",
    "    def run_ml_detection(button):\n",
    "        \"\"\"Run the selected ML detection method.\"\"\"\n",
    "        with output:\n",
    "            clear_output(wait=True)\n",
    "            \n",
    "            method = method_selector.value\n",
    "            features = list(feature_selector.value)\n",
    "            \n",
    "            if not features:\n",
    "                print(\"❌ Please select at least one feature\")\n",
    "                return\n",
    "            \n",
    "            print(f\"🤖 Training {method} on {len(features)} features...\")\n",
    "            start_time = time.time()\n",
    "            \n",
    "            # Run the appropriate detection method\n",
    "            if method == 'Isolation Forest':\n",
    "                outliers, scores = ml_detector.isolation_forest_detection(\n",
    "                    contamination=if_contamination.value,\n",
    "                    n_estimators=if_n_estimators.value,\n",
    "                    features=features\n",
    "                )\n",
    "                result_key = 'isolation_forest'\n",
    "                \n",
    "            elif method == 'One-Class SVM':\n",
    "                outliers, scores = ml_detector.oneclass_svm_detection(\n",
    "                    nu=svm_nu.value,\n",
    "                    kernel=svm_kernel.value,\n",
    "                    gamma=svm_gamma.value,\n",
    "                    features=features\n",
    "                )\n",
    "                result_key = 'oneclass_svm'\n",
    "                \n",
    "            elif method == 'Local Outlier Factor':\n",
    "                outliers, scores = ml_detector.lof_detection(\n",
    "                    n_neighbors=lof_neighbors.value,\n",
    "                    contamination=lof_contamination.value,\n",
    "                    features=features\n",
    "                )\n",
    "                result_key = 'lof'\n",
    "            \n",
    "            training_time = time.time() - start_time\n",
    "            \n",
    "            # Display results\n",
    "            result = ml_detector.results[result_key]\n",
    "            \n",
    "            print(f\"\\n📊 {method} Results:\")\n",
    "            print(\"-\" * 40)\n",
    "            print(f\"Training time: {training_time:.2f} seconds\")\n",
    "            print(f\"Outliers detected: {result['n_outliers']} ({result['percentage']:.1f}%)\")\n",
    "            \n",
    "            # Performance evaluation\n",
    "            # Cast to bool so ~ is a logical NOT even if the column is stored as 0/1\n",
    "            true_outliers = semiconductor_data['true_outlier'].to_numpy(dtype=bool)\n",
    "            tp = np.sum(outliers & true_outliers)\n",
    "            fp = np.sum(outliers & ~true_outliers)\n",
    "            fn = np.sum(~outliers & true_outliers)\n",
    "            tn = np.sum(~outliers & ~true_outliers)\n",
    "            \n",
    "            precision = tp / (tp + fp) if (tp + fp) > 0 else 0\n",
    "            recall = tp / (tp + fn) if (tp + fn) > 0 else 0\n",
    "            f1 = 2 * precision * recall / (precision + recall) if (precision + recall) > 0 else 0\n",
    "            accuracy = (tp + tn) / len(true_outliers)\n",
    "            \n",
    "            print(f\"\\n🎯 Performance Metrics:\")\n",
    "            print(\"-\" * 40)\n",
    "            print(f\"Accuracy:        {accuracy:>7.3f}\")\n",
    "            print(f\"Precision:       {precision:>7.3f}\")\n",
    "            print(f\"Recall:          {recall:>7.3f}\")\n",
    "            print(f\"F1-Score:        {f1:>7.3f}\")\n",
    "            \n",
    "            # Model-specific info\n",
    "            print(f\"\\n⚙️ Model Parameters:\")\n",
    "            print(\"-\" * 40)\n",
    "            if method == 'Isolation Forest':\n",
    "                print(f\"Contamination:   {result['contamination']:.3f}\")\n",
    "                print(f\"N Estimators:    {result['n_estimators']}\")\n",
    "            elif method == 'One-Class SVM':\n",
    "                print(f\"Nu:              {result['nu']:.3f}\")\n",
    "                print(f\"Kernel:          {result['kernel']}\")\n",
    "                print(f\"Gamma:           {result['gamma']}\")\n",
    "            elif method == 'Local Outlier Factor':\n",
    "                print(f\"N Neighbors:     {result['n_neighbors']}\")\n",
    "                print(f\"Contamination:   {result['contamination']:.3f}\")\n",
    "            \n",
    "            # Create visualization\n",
    "            create_ml_visualization(outliers, scores, method, features)\n",
    "    \n",
    "    def compare_all_methods(button):\n",
    "        \"\"\"Compare all ML methods side by side.\"\"\"\n",
    "        with output:\n",
    "            clear_output(wait=True)\n",
    "            \n",
    "            features = list(feature_selector.value)\n",
    "            \n",
    "            if not features:\n",
    "                print(\"❌ Please select at least one feature\")\n",
    "                return\n",
    "            \n",
    "            print(f\"⚖️ Comparing all ML methods on {len(features)} features...\")\n",
    "            \n",
    "            # Run all methods\n",
    "            methods_results = {}\n",
    "            \n",
    "            # Isolation Forest\n",
    "            print(\"🌲 Running Isolation Forest...\")\n",
    "            start_time = time.time()\n",
    "            if_outliers, if_scores = ml_detector.isolation_forest_detection(\n",
    "                contamination=if_contamination.value,\n",
    "                n_estimators=if_n_estimators.value,\n",
    "                features=features\n",
    "            )\n",
    "            methods_results['Isolation Forest'] = {\n",
    "                'outliers': if_outliers,\n",
    "                'scores': if_scores,\n",
    "                'time': time.time() - start_time\n",
    "            }\n",
    "            \n",
    "            # One-Class SVM\n",
    "            print(\"🎯 Running One-Class SVM...\")\n",
    "            start_time = time.time()\n",
    "            svm_outliers, svm_scores = ml_detector.oneclass_svm_detection(\n",
    "                nu=svm_nu.value,\n",
    "                kernel=svm_kernel.value,\n",
    "                gamma=svm_gamma.value,\n",
    "                features=features\n",
    "            )\n",
    "            methods_results['One-Class SVM'] = {\n",
    "                'outliers': svm_outliers,\n",
    "                'scores': svm_scores,\n",
    "                'time': time.time() - start_time\n",
    "            }\n",
    "            \n",
    "            # LOF\n",
    "            print(\"📍 Running Local Outlier Factor...\")\n",
    "            start_time = time.time()\n",
    "            lof_outliers, lof_scores = ml_detector.lof_detection(\n",
    "                n_neighbors=lof_neighbors.value,\n",
    "                contamination=lof_contamination.value,\n",
    "                features=features\n",
    "            )\n",
    "            methods_results['LOF'] = {\n",
    "                'outliers': lof_outliers,\n",
    "                'scores': lof_scores,\n",
    "                'time': time.time() - start_time\n",
    "            }\n",
    "            \n",
    "            # Create comparison visualization\n",
    "            create_ml_comparison_visualization(methods_results, features)\n",
    "    \n",
    "    def create_ml_visualization(outliers, scores, method, features):\n",
    "        \"\"\"Create visualization for ML outlier detection results.\"\"\"\n",
    "        \n",
    "        # Create subplots\n",
    "        fig = make_subplots(\n",
    "            rows=2, cols=2,\n",
    "            subplot_titles=[\n",
    "                f'{method} Detection Results',\n",
    "                'Anomaly Score Distribution',\n",
    "                'Feature Space (First Two Selected Features)',\n",
    "                'Confusion Matrix'\n",
    "            ]\n",
    "        )\n",
    "        \n",
    "        # 1. Detection results scatter\n",
    "        normal_indices = np.where(~outliers)[0]\n",
    "        outlier_indices = np.where(outliers)[0]\n",
    "        true_outlier_indices = np.where(semiconductor_data['true_outlier'])[0]\n",
    "        \n",
    "        # Normal points\n",
    "        fig.add_trace(\n",
    "            go.Scatter(\n",
    "                x=normal_indices,\n",
    "                y=scores[normal_indices],\n",
    "                mode='markers',\n",
    "                name='Normal',\n",
    "                marker=dict(color='blue', size=4, opacity=0.6)\n",
    "            ), row=1, col=1\n",
    "        )\n",
    "        \n",
    "        # Detected outliers\n",
    "        if len(outlier_indices) > 0:\n",
    "            fig.add_trace(\n",
    "                go.Scatter(\n",
    "                    x=outlier_indices,\n",
    "                    y=scores[outlier_indices],\n",
    "                    mode='markers',\n",
    "                    name='Detected Outliers',\n",
    "                    marker=dict(color='red', size=8, symbol='diamond')\n",
    "                ), row=1, col=1\n",
    "            )\n",
    "        \n",
    "        # True outliers overlay\n",
    "        fig.add_trace(\n",
    "            go.Scatter(\n",
    "                x=true_outlier_indices,\n",
    "                y=scores[true_outlier_indices],\n",
    "                mode='markers',\n",
    "                name='True Outliers',\n",
    "                marker=dict(color='orange', size=10, symbol='x', line=dict(width=2))\n",
    "            ), row=1, col=1\n",
    "        )\n",
    "        \n",
    "        # 2. Score distribution\n",
    "        fig.add_trace(\n",
    "            go.Histogram(\n",
    "                x=scores,\n",
    "                name='Anomaly Scores',\n",
    "                nbinsx=30,\n",
    "                opacity=0.7,\n",
    "                marker_color='lightblue'\n",
    "            ), row=1, col=2\n",
    "        )\n",
    "        \n",
    "        # 3. 2D feature space (if we have at least 2 features)\n",
    "        if len(features) >= 2:\n",
    "            feat1, feat2 = features[0], features[1]\n",
    "            \n",
    "            # Normal points\n",
    "            fig.add_trace(\n",
    "                go.Scatter(\n",
    "                    x=semiconductor_data.loc[~outliers, feat1],\n",
    "                    y=semiconductor_data.loc[~outliers, feat2],\n",
    "                    mode='markers',\n",
    "                    name=f'Normal ({feat1} vs {feat2})',\n",
    "                    marker=dict(color='blue', size=4, opacity=0.6)\n",
    "                ), row=2, col=1\n",
    "            )\n",
    "            \n",
    "            # Outliers\n",
    "            if len(outlier_indices) > 0:\n",
    "                fig.add_trace(\n",
    "                    go.Scatter(\n",
    "                        x=semiconductor_data.loc[outliers, feat1],\n",
    "                        y=semiconductor_data.loc[outliers, feat2],\n",
    "                        mode='markers',\n",
    "                        name=f'Outliers ({feat1} vs {feat2})',\n",
    "                        marker=dict(color='red', size=8, symbol='diamond')\n",
    "                    ), row=2, col=1\n",
    "                )\n",
    "        \n",
    "        # 4. Performance metrics visualization\n",
    "        # Cast to bool so ~ is a logical NOT even if the column is stored as 0/1\n",
    "        true_outliers = semiconductor_data['true_outlier'].to_numpy(dtype=bool)\n",
    "        tp = np.sum(outliers & true_outliers)\n",
    "        fp = np.sum(outliers & ~true_outliers)\n",
    "        fn = np.sum(~outliers & true_outliers)\n",
    "        tn = np.sum(~outliers & ~true_outliers)\n",
    "        \n",
    "        # Confusion matrix heatmap\n",
    "        conf_matrix = np.array([[tn, fp], [fn, tp]])\n",
    "        \n",
    "        fig.add_trace(\n",
    "            go.Heatmap(\n",
    "                z=conf_matrix,\n",
    "                x=['Predicted Normal', 'Predicted Outlier'],\n",
    "                y=['Actual Normal', 'Actual Outlier'],\n",
    "                colorscale='Blues',\n",
    "                text=conf_matrix,\n",
    "                texttemplate=\"%{text}\",\n",
    "                textfont={\"size\": 16},\n",
    "                showscale=False\n",
    "            ), row=2, col=2\n",
    "        )\n",
    "        \n",
    "        # Update layout\n",
    "        fig.update_layout(\n",
    "            height=800,\n",
    "            title=f\"{method} Outlier Detection Analysis\",\n",
    "            showlegend=True\n",
    "        )\n",
    "        \n",
    "        # Update axes labels\n",
    "        fig.update_xaxes(title_text=\"Sample Index\", row=1, col=1)\n",
    "        fig.update_yaxes(title_text=\"Anomaly Score\", row=1, col=1)\n",
    "        fig.update_xaxes(title_text=\"Anomaly Score\", row=1, col=2)\n",
    "        fig.update_yaxes(title_text=\"Frequency\", row=1, col=2)\n",
    "        \n",
    "        if len(features) >= 2:\n",
    "            fig.update_xaxes(title_text=features[0], row=2, col=1)\n",
    "            fig.update_yaxes(title_text=features[1], row=2, col=1)\n",
    "        \n",
    "        fig.show()\n",
    "    \n",
    "    def create_ml_comparison_visualization(methods_results, features):\n",
    "        \"\"\"Create comparison visualization for all ML methods.\"\"\"\n",
    "        \n",
    "        # Calculate performance metrics for each method\n",
    "        # Cast to bool so ~ is a logical NOT even if the column is stored as 0/1\n",
    "        true_outliers = semiconductor_data['true_outlier'].to_numpy(dtype=bool)\n",
    "        performance_data = []\n",
    "        \n",
    "        for method_name, results in methods_results.items():\n",
    "            outliers = results['outliers']\n",
    "            tp = np.sum(outliers & true_outliers)\n",
    "            fp = np.sum(outliers & ~true_outliers)\n",
    "            fn = np.sum(~outliers & true_outliers)\n",
    "            tn = np.sum(~outliers & ~true_outliers)\n",
    "            \n",
    "            precision = tp / (tp + fp) if (tp + fp) > 0 else 0\n",
    "            recall = tp / (tp + fn) if (tp + fn) > 0 else 0\n",
    "            f1 = 2 * precision * recall / (precision + recall) if (precision + recall) > 0 else 0\n",
    "            accuracy = (tp + tn) / len(true_outliers)\n",
    "            \n",
    "            performance_data.append({\n",
    "                'Method': method_name,\n",
    "                'Accuracy': accuracy,\n",
    "                'Precision': precision,\n",
    "                'Recall': recall,\n",
    "                'F1-Score': f1,\n",
    "                'Time (s)': results['time'],\n",
    "                'N_Outliers': outliers.sum()\n",
    "            })\n",
    "        \n",
    "        # Create comparison visualizations\n",
    "        fig = make_subplots(\n",
    "            rows=2, cols=2,\n",
    "            subplot_titles=[\n",
    "                'Performance Metrics Comparison',\n",
    "                'Training Time Comparison',\n",
    "                'Outlier Count Comparison',\n",
    "                'Method Agreement Analysis'\n",
    "            ]\n",
    "        )\n",
    "        \n",
    "        methods = list(methods_results.keys())\n",
    "        \n",
    "        # 1. Performance metrics\n",
    "        metrics = ['Accuracy', 'Precision', 'Recall', 'F1-Score']\n",
    "        for i, metric in enumerate(metrics):\n",
    "            values = [p[metric] for p in performance_data]\n",
    "            fig.add_trace(\n",
    "                go.Bar(\n",
    "                    x=methods,\n",
    "                    y=values,\n",
    "                    name=metric,\n",
    "                    text=[f\"{v:.3f}\" for v in values],\n",
    "                    textposition='auto'\n",
    "                ), row=1, col=1\n",
    "            )\n",
    "        \n",
    "        # 2. Training time\n",
    "        times = [p['Time (s)'] for p in performance_data]\n",
    "        fig.add_trace(\n",
    "            go.Bar(\n",
    "                x=methods,\n",
    "                y=times,\n",
    "                name='Training Time',\n",
    "                marker_color='orange',\n",
    "                text=[f\"{t:.2f}s\" for t in times],\n",
    "                textposition='auto'\n",
    "            ), row=1, col=2\n",
    "        )\n",
    "        \n",
    "        # 3. Outlier counts\n",
    "        outlier_counts = [p['N_Outliers'] for p in performance_data]\n",
    "        true_count = np.sum(true_outliers)\n",
    "        \n",
    "        fig.add_trace(\n",
    "            go.Bar(\n",
    "                x=methods,\n",
    "                y=outlier_counts,\n",
    "                name='Detected Outliers',\n",
    "                marker_color='purple',\n",
    "                text=outlier_counts,\n",
    "                textposition='auto'\n",
    "            ), row=2, col=1\n",
    "        )\n",
    "        \n",
    "        # Add true outlier count line\n",
    "        fig.add_hline(\n",
    "            y=true_count,\n",
    "            line_dash=\"dash\",\n",
    "            line_color=\"red\",\n",
    "            annotation_text=f\"True Outliers: {true_count}\",\n",
    "            row=2, col=1\n",
    "        )\n",
    "        \n",
    "        # 4. Method agreement matrix\n",
    "        agreement_matrix = np.zeros((len(methods), len(methods)))\n",
    "        for i, method1 in enumerate(methods):\n",
    "            for j, method2 in enumerate(methods):\n",
    "                outliers1 = methods_results[method1]['outliers']\n",
    "                outliers2 = methods_results[method2]['outliers']\n",
    "                # Fraction of samples labeled identically (includes points both call normal)\n",
    "                agreement = np.sum(outliers1 == outliers2) / len(outliers1)\n",
    "                agreement_matrix[i, j] = agreement\n",
    "        \n",
    "        fig.add_trace(\n",
    "            go.Heatmap(\n",
    "                z=agreement_matrix,\n",
    "                x=methods,\n",
    "                y=methods,\n",
    "                colorscale='RdYlBu_r',\n",
    "                text=np.round(agreement_matrix, 3),\n",
    "                texttemplate=\"%{text}\",\n",
    "                textfont={\"size\": 12},\n",
    "                colorbar=dict(title=\"Agreement\")\n",
    "            ), row=2, col=2\n",
    "        )\n",
    "        \n",
    "        # Update layout\n",
    "        fig.update_layout(\n",
    "            height=800,\n",
    "            title=\"ML Methods Comparison Dashboard\",\n",
    "            showlegend=True\n",
    "        )\n",
    "        \n",
    "        fig.show()\n",
    "        \n",
    "        # Print summary table\n",
    "        print(\"\\n📊 Performance Summary Table:\")\n",
    "        print(\"=\" * 80)\n",
    "        print(f\"{'Method':<18} {'Accuracy':<10} {'Precision':<10} {'Recall':<10} {'F1-Score':<10} {'Time (s)':<10}\")\n",
    "        print(\"=\" * 80)\n",
    "        for p in performance_data:\n",
    "            print(f\"{p['Method']:<18} {p['Accuracy']:<10.3f} {p['Precision']:<10.3f} {p['Recall']:<10.3f} {p['F1-Score']:<10.3f} {p['Time (s)']:<10.2f}\")\n",
    "        print(\"=\" * 80)\n",
    "    \n",
    "    # Connect buttons to functions\n",
    "    run_button.on_click(run_ml_detection)\n",
    "    compare_button.on_click(compare_all_methods)\n",
    "    \n",
    "    # Create parameter control panels that show/hide based on method\n",
    "    def update_parameter_visibility(change):\n",
    "        method = change['new']\n",
    "        # Isolation Forest parameters\n",
    "        if_contamination.layout.display = 'block' if method == 'Isolation Forest' else 'none'\n",
    "        if_n_estimators.layout.display = 'block' if method == 'Isolation Forest' else 'none'\n",
    "        # SVM parameters\n",
    "        svm_nu.layout.display = 'block' if method == 'One-Class SVM' else 'none'\n",
    "        svm_kernel.layout.display = 'block' if method == 'One-Class SVM' else 'none'\n",
    "        svm_gamma.layout.display = 'block' if method == 'One-Class SVM' else 'none'\n",
    "        # LOF parameters\n",
    "        lof_neighbors.layout.display = 'block' if method == 'Local Outlier Factor' else 'none'\n",
    "        lof_contamination.layout.display = 'block' if method == 'Local Outlier Factor' else 'none'\n",
    "    \n",
    "    method_selector.observe(update_parameter_visibility, names='value')\n",
    "    \n",
    "    # Initial parameter visibility\n",
    "    svm_nu.layout.display = 'none'\n",
    "    svm_kernel.layout.display = 'none'\n",
    "    svm_gamma.layout.display = 'none'\n",
    "    lof_neighbors.layout.display = 'none'\n",
    "    lof_contamination.layout.display = 'none'\n",
    "    \n",
    "    # Layout\n",
    "    controls = widgets.VBox([\n",
    "        widgets.HTML(\"<h3>🤖 ML Algorithm Selection</h3>\"),\n",
    "        method_selector,\n",
    "        widgets.HTML(\"<h3>⚙️ Algorithm Parameters</h3>\"),\n",
    "        # Isolation Forest\n",
    "        if_contamination,\n",
    "        if_n_estimators,\n",
    "        # One-Class SVM\n",
    "        svm_nu,\n",
    "        svm_kernel,\n",
    "        svm_gamma,\n",
    "        # LOF\n",
    "        lof_neighbors,\n",
    "        lof_contamination,\n",
    "        widgets.HTML(\"<h3>📊 Feature Selection</h3>\"),\n",
    "        feature_selector,\n",
    "        widgets.HTML(\"<h3>🚀 Actions</h3>\"),\n",
    "        widgets.HBox([run_button, compare_button])\n",
    "    ])\n",
    "    \n",
    "    return widgets.VBox([controls, output])\n",
    "\n",
    "# Display the interactive ML detector\n",
    "print(\"🤖 Interactive Machine Learning Outlier Detection\")\n",
    "print(\"Experiment with different ML algorithms and compare their performance:\")\n",
    "ml_widget = create_ml_detector_widget()\n",
    "display(ml_widget)"