# Validation and Benchmarking

## Purpose

This notebook teaches you how to validate AM-QADF framework outputs against reference data, such as Melt Pool Monitoring (MPM) systems, and how to benchmark framework performance. It covers validation concepts, benchmarking techniques, MPM comparison workflows, accuracy validation, and statistical validation. Everything runs from a single interactive interface with real-time progress tracking and detailed logging.

## Learning Objectives

By the end of this notebook, you will:
- ✅ Understand validation concepts and their importance in AM-QADF
- ✅ Benchmark framework operations for performance analysis
- ✅ Compare framework outputs with Melt Pool Monitoring (MPM) systems
- ✅ Perform accuracy validation against ground truth data
- ✅ Conduct statistical validation using hypothesis testing
- ✅ Generate comprehensive validation reports
- ✅ Interpret validation results and make data-driven decisions
- ✅ Monitor validation progress with real-time status and logs

## Estimated Duration

90-120 minutes

---

## Overview

Validation and benchmarking are critical for ensuring framework reliability and performance. The AM-QADF validation module provides:

- ⏱️ **Performance Benchmarking**: Measure execution time, memory usage, and throughput
- 🔬 **MPM Comparison**: Compare framework outputs with Melt Pool Monitoring systems
- 🎯 **Accuracy Validation**: Validate against ground truth with RMSE, MAE, and R² metrics
- 📊 **Statistical Validation**: Hypothesis testing, correlation analysis, significance tests
- 📈 **Comprehensive Reports**: Generate detailed validation reports with visualizations
- 📊 **Real-Time Monitoring**: Track progress with status bars and detailed execution logs
- ⏱️ **Time Tracking**: Monitor execution time for all validation operations
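
For reference, the demo-mode accuracy code in this notebook computes RMSE, MAE, and R² with the standard definitions below, for framework values $\hat{y}_i$, reference (ground-truth) values $y_i$, $i = 1, \dots, n$, and reference mean $\bar{y}$. The relative-error line is an assumption about how a scalar MPM comparison might normalize differences; the exact definition used by `MPMComparisonEngine` may differ.

$$
\begin{aligned}
\mathrm{RMSE} &= \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2} \\
\mathrm{MAE} &= \frac{1}{n}\sum_{i=1}^{n}\left\lvert \hat{y}_i - y_i \right\rvert \\
R^2 &= 1 - \frac{\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2} \\
\text{relative error} &= \frac{\left\lvert \hat{y} - y \right\rvert}{\left\lvert y \right\rvert}
\end{aligned}
$$

An R² close to 1 means the framework reproduces most of the variance in the reference signal; RMSE penalizes large errors more heavily than MAE.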

The notebook features a unified interactive interface with:
- **Progress Tracking**: Visual progress bars showing completion percentage
- **Status Monitoring**: Real-time status updates with elapsed time
- **Detailed Logging**: Timestamped logs with success/warning/error indicators for all operations
- **Error Handling**: Comprehensive error messages and tracebacks in the logs

Use the interactive widgets below to validate and benchmark without writing any code. Track progress in real time with the status bar and the logs section at the bottom of the interface.

In [1]:
# Setup: Import required libraries
import sys
from pathlib import Path
import warnings
warnings.filterwarnings('ignore')

# Add parent directory and src directory to path for imports
notebook_dir = Path().resolve()
project_root = notebook_dir.parent
src_dir = project_root / 'src'

# Add project root to path (for src.infrastructure imports)
if str(project_root) not in sys.path:
    sys.path.insert(0, str(project_root))

# Add src directory to path (for am_qadf imports)
if str(src_dir) not in sys.path:
    sys.path.insert(0, str(src_dir))

# Core imports
import ipywidgets as widgets
from ipywidgets import (
    VBox, HBox, Accordion, Tab, Dropdown, RadioButtons, 
    Checkbox, Button, Output, Text, IntSlider, FloatSlider,
    Layout, Box, Label, FloatText, IntText, SelectMultiple,
    HTML as WidgetHTML, Textarea, FileUpload
)
from IPython.display import display, Markdown, HTML, clear_output
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime
import time
import json
from typing import Optional, Tuple, Dict, Any, List

# Set style for plots
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

# Load environment variables from development.env
import os
env_file = project_root / 'development.env'
if env_file.exists():
    with open(env_file, 'r') as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith('#') and '=' in line:
                key, value = line.split('=', 1)
                # Strip surrounding whitespace and quotes from key and value
                key = key.strip()
                value = value.strip().strip('"\'')
                os.environ[key] = value
    print("✅ Environment variables loaded from development.env")

# Try to import validation classes
VALIDATION_AVAILABLE = False
validation_client = None
quality_client = None

try:
    from am_qadf.validation import (
        ValidationClient, ValidationConfig,
        PerformanceBenchmarker, BenchmarkResult,
        MPMComparisonEngine, MPMComparisonResult,
        AccuracyValidator, AccuracyValidationResult,
        StatisticalValidator, StatisticalValidationResult
    )
    from am_qadf.validation.benchmarking import benchmark
    VALIDATION_AVAILABLE = True
    print("✅ Validation classes available")
except ImportError as e:
    print(f"⚠️ Validation classes not available: {e} - using demo mode")

# Try to import quality assessment client
try:
    from am_qadf.quality.quality_assessment_client import QualityAssessmentClient
    quality_client = QualityAssessmentClient(enable_validation=VALIDATION_AVAILABLE)
    print("✅ Quality assessment client with validation available")
except ImportError as e:
    print(f"⚠️ Quality assessment client not available: {e}")

# MongoDB connection setup (optional, for loading real data)
INFRASTRUCTURE_AVAILABLE = False
mongo_client = None
voxel_storage = None

try:
    from src.infrastructure.config import MongoDBConfig
    from src.infrastructure.database import MongoDBClient
    from am_qadf.voxel_domain import VoxelGridStorage
    
    config = MongoDBConfig.from_env()
    if not config.username:
        config.username = os.getenv('MONGO_ROOT_USERNAME', 'admin')
    if not config.password:
        config.password = os.getenv('MONGO_ROOT_PASSWORD', 'password')
    
    mongo_client = MongoDBClient(config=config)
    if mongo_client.is_connected():
        voxel_storage = VoxelGridStorage(mongo_client=mongo_client)
        INFRASTRUCTURE_AVAILABLE = True
        print(f"✅ Connected to MongoDB: {config.database}")
    else:
        print("⚠️ MongoDB connection failed - using demo mode")
except Exception as e:
    print(f"⚠️ MongoDB not available: {e} - using demo mode")

print("✅ Setup complete!")

✅ Environment variables loaded from development.env
✅ Validation classes available
✅ Quality assessment client with validation available
✅ Connected to MongoDB: am_qadf_data
✅ Setup complete!
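
The setup cell parses `development.env` inline and writes straight into `os.environ`, which makes the parsing hard to test. Below is a minimal sketch, assuming the same simple `KEY=VALUE` format, that separates parsing from environment mutation. `parse_env_lines` and `load_env_file` are hypothetical helpers, not part of AM-QADF, and unlike the setup cell they strip only one matching pair of surrounding quotes. For richer formats (`export` prefixes, multi-line values), a dedicated library such as python-dotenv is the usual choice.

```python
from pathlib import Path
from typing import Dict, Iterable


def parse_env_lines(lines: Iterable[str]) -> Dict[str, str]:
    """Parse KEY=VALUE lines into a dict (hypothetical helper).

    Blank lines and lines starting with '#' are skipped; only the first '='
    separates key from value; whitespace is stripped from both, and one
    matching pair of surrounding single or double quotes is removed.
    """
    env: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith('#') or '=' not in line:
            continue
        key, value = line.split('=', 1)
        key, value = key.strip(), value.strip()
        # Remove one pair of matching surrounding quotes, if present
        if len(value) >= 2 and value[0] == value[-1] and value[0] in ('"', "'"):
            value = value[1:-1]
        env[key] = value
    return env


def load_env_file(path: Path) -> Dict[str, str]:
    """Return parsed variables from `path`, or an empty dict if it is missing."""
    if not path.exists():
        return {}
    with open(path, 'r') as f:
        return parse_env_lines(f)
```

In the setup cell, the loop would then reduce to `os.environ.update(load_env_file(project_root / 'development.env'))`.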


## Interactive Validation and Benchmarking Interface

Use the widgets below to benchmark framework performance, compare framework outputs with MPM systems, validate accuracy against ground truth, and run statistical tests. All validation tasks are available from this single interface.

In [None]:
# Create Interactive Validation and Benchmarking Interface

# Global state
validation_results = {}
benchmark_results = {}
mpm_comparison_results = {}
accuracy_results = {}
statistical_results = {}
current_validation_type = None

# ============================================
# Helper Functions for Demo Data Generation
# ============================================

def generate_demo_framework_metrics():
    """Generate demo framework quality metrics."""
    np.random.seed(42)
    return {
        'overall_quality_score': 0.90,
        'data_quality_score': 0.85,
        'signal_quality_score': 0.92,
        'alignment_score': 0.88,
        'completeness_score': 0.95,
        'completeness': 0.90,
        'snr': 25.5,
        'alignment_accuracy': 0.95,
    }

def generate_demo_mpm_metrics():
    """Generate demo MPM quality metrics."""
    np.random.seed(43)
    return {
        'overall_quality_score': 0.88,
        'data_quality_score': 0.83,
        'signal_quality_score': 0.90,
        'alignment_score': 0.86,
        'completeness_score': 0.93,
        'completeness': 0.88,
        'snr': 24.8,
        'alignment_accuracy': 0.94,
    }

def generate_demo_ground_truth_signal(shape=(50, 50, 10), noise_level=0.0):
    """Generate a smooth demo ground-truth signal with the given (x, y, z) shape."""
    np.random.seed(42)
    signal = np.zeros(shape)
    if len(shape) == 3:
        # Lay out the index arrays so broadcasting yields an array of exactly `shape`
        x_coords = np.arange(shape[0])[:, np.newaxis, np.newaxis]
        y_coords = np.arange(shape[1])[np.newaxis, :, np.newaxis]
        z_coords = np.arange(shape[2])[np.newaxis, np.newaxis, :]
        signal = 100 + 10 * np.sin(x_coords * 0.1) + 10 * np.cos(y_coords * 0.1) + 5 * np.sin(z_coords * 0.2)
    if noise_level > 0:
        signal += np.random.normal(0, noise_level, shape)
    return signal

def generate_demo_framework_signal(ground_truth, noise_level=0.05):
    """Generate demo framework signal with some error."""
    np.random.seed(42)
    error = np.random.normal(0, np.std(ground_truth) * noise_level, ground_truth.shape)
    return ground_truth + error

def generate_demo_coordinates(n_points=1000):
    """Generate demo coordinate arrays."""
    np.random.seed(42)
    ground_truth = np.random.rand(n_points, 3) * 10.0
    framework = ground_truth + np.random.normal(0, 0.01, (n_points, 3))
    return ground_truth, framework
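
# --- Sketch: reproducible demo data without reseeding the global RNG ---
# The generators above call np.random.seed(), which resets NumPy's global
# random state on every call and can silently affect other code that draws
# random numbers. A local generator avoids that side effect. This hypothetical
# helper (not used by the widgets) uses the standard library's random.Random;
# with NumPy the equivalent is np.random.default_rng(seed).
import random

def generate_demo_coordinates_local(n_points=1000, seed=42, noise_std=0.01):
    """Hypothetical local-RNG variant of generate_demo_coordinates.

    Returns two lists of (x, y, z) tuples: ground-truth points drawn uniformly
    from [0, 10) and framework points perturbed by Gaussian noise.
    """
    rng = random.Random(seed)  # local generator; the global random state is untouched
    ground_truth = [tuple(rng.random() * 10.0 for _ in range(3)) for _ in range(n_points)]
    framework = [tuple(c + rng.gauss(0.0, noise_std) for c in point) for point in ground_truth]
    return ground_truth, framework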

# ============================================
# Top Panel: Validation Type Selection and Actions
# ============================================

validation_type_label = WidgetHTML("<b>Validation Type:</b>")
validation_type = RadioButtons(
    options=[
        ('Performance Benchmarking', 'benchmarking'),
        ('MPM Comparison', 'mpm'),
        ('Accuracy Validation', 'accuracy'),
        ('Statistical Validation', 'statistical'),
        ('Comprehensive Workflow', 'comprehensive')
    ],
    value='benchmarking',
    description='Type:',
    style={'description_width': 'initial'}
)

data_source_label = WidgetHTML("<b>Data Source:</b>")
data_source_mode = RadioButtons(
    options=[('Demo Data', 'demo'), ('MongoDB', 'mongodb')],
    value='demo',
    description='Source:',
    style={'description_width': 'initial'}
)

execute_button = Button(
    description='Execute Validation',
    button_style='success',
    icon='check',
    layout=Layout(width='180px')
)

export_button = Button(
    description='Export Report',
    button_style='',
    icon='download',
    layout=Layout(width='150px')
)

top_panel = VBox([
    HBox([validation_type_label, validation_type]),
    HBox([data_source_label, data_source_mode, execute_button, export_button])
], layout=Layout(padding='10px', border='1px solid #ccc'))

# ============================================
# Left Panel: Configuration Accordion
# ============================================

# 1. Benchmarking Configuration
benchmarking_label = WidgetHTML("<b>Benchmarking Configuration:</b>")
benchmark_operation = Dropdown(
    options=[
        ('Quality Assessment', 'quality_assessment'),
        ('Signal Mapping', 'signal_mapping'),
        ('Data Fusion', 'data_fusion'),
        ('Query Operation', 'query')
    ],
    value='quality_assessment',
    description='Operation:',
    style={'description_width': 'initial'}
)

benchmark_data_size = Dropdown(
    options=[('Small', 'small'), ('Medium', 'medium'), ('Large', 'large')],
    value='medium',
    description='Data Size:',
    style={'description_width': 'initial'}
)

benchmark_iterations = IntSlider(
    value=5,
    min=1,
    max=50,
    step=1,
    description='Iterations:',
    style={'description_width': 'initial'}
)

benchmark_warmup = IntSlider(
    value=2,
    min=0,
    max=10,
    step=1,
    description='Warmup:',
    style={'description_width': 'initial'}
)

benchmarking_config = VBox([
    benchmarking_label,
    benchmark_operation,
    benchmark_data_size,
    benchmark_iterations,
    benchmark_warmup
], layout=Layout(padding='5px', border='1px solid #ddd'))

# 2. MPM Comparison Configuration
mpm_label = WidgetHTML("<b>MPM Comparison Configuration:</b>")
mpm_correlation_threshold = FloatSlider(
    value=0.85,
    min=0.0,
    max=1.0,
    step=0.05,
    description='Correlation Threshold:',
    style={'description_width': 'initial'}
)

mpm_max_error = FloatSlider(
    value=0.1,
    min=0.0,
    max=1.0,
    step=0.01,
    description='Max Relative Error:',
    style={'description_width': 'initial'}
)

mpm_metrics_select = SelectMultiple(
    options=['overall_quality_score', 'data_quality_score', 'signal_quality_score', 
             'alignment_score', 'completeness_score', 'completeness', 'snr', 'alignment_accuracy'],
    value=['overall_quality_score', 'completeness', 'snr'],
    description='Metrics:',
    style={'description_width': 'initial'}
)

mpm_config = VBox([
    mpm_label,
    mpm_correlation_threshold,
    mpm_max_error,
    mpm_metrics_select
], layout=Layout(padding='5px', border='1px solid #ddd'))

# 3. Accuracy Validation Configuration
accuracy_label = WidgetHTML("<b>Accuracy Validation Configuration:</b>")
accuracy_type = RadioButtons(
    options=[
        ('Signal Mapping', 'signal_mapping'),
        ('Spatial Alignment', 'spatial'),
        ('Temporal Alignment', 'temporal'),
        ('Quality Metrics', 'quality')
    ],
    value='signal_mapping',
    description='Type:',
    style={'description_width': 'initial'}
)

accuracy_max_error = FloatSlider(
    value=0.1,
    min=0.01,
    max=1.0,
    step=0.01,
    description='Max Acceptable Error:',
    style={'description_width': 'initial'}
)

accuracy_tolerance = FloatSlider(
    value=5.0,
    min=0.0,
    max=20.0,
    step=0.5,
    description='Tolerance (%):',
    style={'description_width': 'initial'}
)

accuracy_config = VBox([
    accuracy_label,
    accuracy_type,
    accuracy_max_error,
    accuracy_tolerance
], layout=Layout(padding='5px', border='1px solid #ddd'))

# 4. Statistical Validation Configuration
statistical_label = WidgetHTML("<b>Statistical Validation Configuration:</b>")
statistical_test = Dropdown(
    options=[
        ('T-test', 't_test'),
        ('Mann-Whitney U', 'mann_whitney'),
        ('Correlation Test', 'correlation'),
        ('ANOVA', 'anova'),
        ('Normality Test', 'normality')
    ],
    value='t_test',
    description='Test:',
    style={'description_width': 'initial'}
)

statistical_significance = FloatSlider(
    value=0.05,
    min=0.001,
    max=0.1,
    step=0.001,
    description='Significance Level (α):',
    style={'description_width': 'initial'}
)

statistical_alternative = RadioButtons(
    options=[('Two-sided', 'two-sided'), ('Less', 'less'), ('Greater', 'greater')],
    value='two-sided',
    description='Alternative:',
    style={'description_width': 'initial'}
)

statistical_config = VBox([
    statistical_label,
    statistical_test,
    statistical_significance,
    statistical_alternative
], layout=Layout(padding='5px', border='1px solid #ddd'))

# Combine into Accordion
config_accordion = Accordion(children=[
    benchmarking_config,
    mpm_config,
    accuracy_config,
    statistical_config
])

config_accordion.set_title(0, '⏱️ Benchmarking')
config_accordion.set_title(1, '🔬 MPM Comparison')
config_accordion.set_title(2, '🎯 Accuracy Validation')
config_accordion.set_title(3, '📊 Statistical Validation')

left_panel = VBox([
    WidgetHTML("<h3>Validation Configuration</h3>"),
    config_accordion
], layout=Layout(width='300px', padding='10px', border='1px solid #ccc'))

# ============================================
# Center Panel: Visualization and Results
# ============================================

viz_mode = RadioButtons(
    options=[
        ('Benchmark Results', 'benchmark'),
        ('MPM Comparison', 'mpm'),
        ('Accuracy Metrics', 'accuracy'),
        ('Statistical Tests', 'statistical'),
        ('Comprehensive Report', 'report')
    ],
    value='benchmark',
    description='View:',
    style={'description_width': 'initial'}
)

main_output = Output(layout=Layout(height='600px', overflow='auto'))

center_panel = VBox([
    WidgetHTML("<h3>Validation Results</h3>"),
    viz_mode,
    main_output
], layout=Layout(flex='1 1 auto', padding='10px', border='1px solid #ccc'))

# ============================================
# Right Panel: Status and Summary
# ============================================

status_label = WidgetHTML("<b>Status:</b>")
status_display = WidgetHTML("Ready to validate")
status_section = VBox([
    status_label,
    status_display
], layout=Layout(padding='5px', border='2px solid #4CAF50'))

results_summary_label = WidgetHTML("<b>Results Summary:</b>")
results_summary_display = WidgetHTML("No validation executed yet")
results_summary_section = VBox([
    results_summary_label,
    results_summary_display
], layout=Layout(padding='5px'))

metrics_display_label = WidgetHTML("<b>Key Metrics:</b>")
metrics_display = WidgetHTML("No metrics available")
metrics_section = VBox([
    metrics_display_label,
    metrics_display
], layout=Layout(padding='5px'))

validation_status_label = WidgetHTML("<b>Validation Status:</b>")
validation_status_display = WidgetHTML("Not validated")
validation_status_section = VBox([
    validation_status_label,
    validation_status_display
], layout=Layout(padding='5px'))

right_panel = VBox([
    status_section,
    results_summary_section,
    metrics_section,
    validation_status_section
], layout=Layout(width='250px', padding='10px', border='1px solid #ccc'))

# ============================================
# Execute Validation Function
# ============================================

def execute_validation(b):
    """Execute validation based on selected type."""
    global operation_start_time
    operation_start_time = time.time()
    
    # Clear logs
    with validation_logs:
        clear_output(wait=True)
        display(HTML("<p><i>Validation logs will appear here...</i></p>"))
    
    with main_output:
        clear_output(wait=True)
        val_type = validation_type.value
        
        log_message(f"Starting {validation_type.label}...", 'info')
        update_status(f"Executing {validation_type.label}...", 0)
        
        try:
            if val_type == 'benchmarking':
                execute_benchmarking()
            elif val_type == 'mpm':
                execute_mpm_comparison()
            elif val_type == 'accuracy':
                execute_accuracy_validation()
            elif val_type == 'statistical':
                execute_statistical_validation()
            elif val_type == 'comprehensive':
                execute_comprehensive_workflow()
            
            log_message(f"{validation_type.label} completed successfully", 'success')
            update_status(f"{validation_type.label} complete", 100)
        except Exception as e:
            log_message(f"Error during {validation_type.label}: {str(e)}", 'error')
            import traceback
            log_message(f"Traceback: {traceback.format_exc()}", 'error')
            warning_display.value = f"<span style='color: red;'>❌ Error: {str(e)}</span>"
            update_status(f"Error during {validation_type.label}", 0)

def execute_benchmarking():
    """Execute performance benchmarking."""
    log_message("Performance Benchmarking", 'info')
    log_message(f"Operation: {benchmark_operation.label}", 'info')
    log_message(f"Data Size: {benchmark_data_size.value}", 'info')
    log_message(f"Iterations: {benchmark_iterations.value}, Warmup: {benchmark_warmup.value}", 'info')
    update_status("Initializing benchmark...", 10)
    
    print("⏱️ Performance Benchmarking")
    print("=" * 60)
    print(f"Operation: {benchmark_operation.label}")
    print(f"Data Size: {benchmark_data_size.value}")
    print(f"Iterations: {benchmark_iterations.value}")
    print(f"Warmup: {benchmark_warmup.value}")
    print()
    
    # Size mapping
    size_map = {'small': (20, 20, 5), 'medium': (50, 50, 10), 'large': (100, 100, 20)}
    shape = size_map[benchmark_data_size.value]
    
    def demo_operation():
        data = np.random.rand(*shape) * 100
        time.sleep(0.01 * (np.prod(shape) / 25000))
        return np.mean(data)
    
    if VALIDATION_AVAILABLE:
        try:
            update_status("Running benchmark iterations...", 30)
            log_message("Creating PerformanceBenchmarker...", 'info')
            benchmarker = PerformanceBenchmarker()
            
            log_message("Executing benchmark...", 'info')
            result = benchmarker.benchmark_operation(
                benchmark_operation.value,
                demo_operation,
                iterations=benchmark_iterations.value,
                warmup_iterations=benchmark_warmup.value
            )
            benchmark_results['latest'] = result
            
            update_status("Benchmark complete", 80)
            log_message(f"Average execution time: {result.execution_time:.4f}s", 'success')
            log_message(f"Memory usage: {result.memory_usage:.2f} MB", 'info')
            log_message(f"Throughput: {result.throughput:.2f} elements/s", 'info')
            
            print("✅ Benchmark Complete!")
            print(f"\n📊 Results:")
            print(f"   Average Time: {result.execution_time:.4f}s")
            print(f"   Memory Usage: {result.memory_usage:.2f} MB")
            print(f"   Throughput: {result.throughput:.2f} elements/s")
            
            # Visualization: plot min/average/max only when the benchmarker reports them
            metadata = getattr(result, 'metadata', None) or {}
            if 'min_time' in metadata and 'max_time' in metadata:
                fig, ax = plt.subplots(figsize=(10, 6))
                times = [metadata['min_time'], result.execution_time, metadata['max_time']]
                labels = ['Min', 'Average', 'Max']
                colors = ['green', 'blue', 'red']
                ax.bar(labels, times, color=colors, alpha=0.7)
                ax.set_ylabel('Time (seconds)')
                ax.set_title('Benchmark Results')
                ax.grid(True, alpha=0.3)
                plt.tight_layout()
                plt.show()
            
            # Update status
            status_display.value = f"✅ Benchmark complete: {result.execution_time:.4f}s"
            results_summary_display.value = f"Operation: {result.operation_name}<br>Time: {result.execution_time:.4f}s<br>Memory: {result.memory_usage:.2f} MB"
            
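        except Exception as e:
            log_message(f"Benchmark failed: {str(e)}", 'error')
            print(f"❌ Benchmark failed: {e}")
            status_display.value = f"❌ Error: {e}"
            warning_display.value = f"<span style='color: red;'>❌ Error: {str(e)}</span>"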
    else:
        # Demo mode
        log_message("Running benchmark in demo mode...", 'warning')
        times = []
        # Run warmup + measured iterations; only the measured iterations are recorded
        for i in range(benchmark_iterations.value + benchmark_warmup.value):
            start = time.perf_counter()
            demo_operation()
            elapsed = time.perf_counter() - start
            if i >= benchmark_warmup.value:
                times.append(elapsed)
        
        avg_time = np.mean(times)
        min_time = np.min(times)
        max_time = np.max(times)
        
        log_message(f"Average execution time: {avg_time:.4f}s", 'success')
        log_message(f"Min time: {min_time:.4f}s, Max time: {max_time:.4f}s", 'info')
        
        print("✅ Benchmark Complete! (Demo Mode)")
        print(f"\n📊 Results:")
        print(f"   Average Time: {avg_time:.4f}s")
        print(f"   Min Time: {min_time:.4f}s")
        print(f"   Max Time: {max_time:.4f}s")
        
        update_status("Benchmark complete (demo)", 80)
        status_display.value = f"✅ Benchmark complete (demo): {avg_time:.4f}s"
        results_summary_display.value = f"Benchmark (Demo Mode):<br>Avg Time: {avg_time:.4f}s<br>Min: {min_time:.4f}s<br>Max: {max_time:.4f}s"
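
# --- Sketch: timing summary with warmup and robust statistics ---
# Like the demo loop above, this hypothetical helper discards warmup runs, but
# it also reports the median and standard deviation, which are less sensitive
# to one-off outliers (garbage collection, OS scheduling) than the mean alone.
# It is an illustrative stand-in, not PerformanceBenchmarker's implementation,
# and it does not measure memory usage or throughput.
import statistics
import time

def time_callable(func, iterations=5, warmup=2):
    """Hypothetical helper: time func() and summarize the measured runs."""
    if iterations < 1:
        raise ValueError("iterations must be >= 1")
    for _ in range(warmup):
        func()  # warmup runs are executed but not recorded
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)
    return {
        'mean': statistics.mean(samples),
        'median': statistics.median(samples),
        'stdev': statistics.stdev(samples) if len(samples) > 1 else 0.0,
        'min': min(samples),
        'max': max(samples),
        'samples': samples,
    }

# e.g. time_callable(lambda: sum(range(10_000)), iterations=5, warmup=2)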

def execute_mpm_comparison():
    """Execute MPM comparison."""
    log_message("MPM Comparison", 'info')
    update_status("Generating demo metrics...", 10)
    
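    # Keep only the metrics selected in the MPM configuration panel
    selected_metrics = set(mpm_metrics_select.value)
    framework_metrics = {k: v for k, v in generate_demo_framework_metrics().items() if k in selected_metrics}
    mpm_metrics = {k: v for k, v in generate_demo_mpm_metrics().items() if k in selected_metrics}
    
    log_message(f"Framework metrics: {len(framework_metrics)} metrics", 'info')
    log_message(f"MPM metrics: {len(mpm_metrics)} metrics", 'info')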
    
    print("üî¨ MPM Comparison")
    print("=" * 60)
    
    print("Framework Metrics:")
    for key, value in framework_metrics.items():
        print(f"  {key}: {value:.3f}")
    print()
    print("MPM Metrics:")
    for key, value in mpm_metrics.items():
        print(f"  {key}: {value:.3f}")
    print()
    
    if VALIDATION_AVAILABLE:
        try:
            update_status("Creating MPM comparison engine...", 30)
            log_message(f"Correlation threshold: {mpm_correlation_threshold.value}", 'info')
            log_message(f"Max relative error: {mpm_max_error.value}", 'info')
            
            mpm_comparer = MPMComparisonEngine(
                correlation_threshold=mpm_correlation_threshold.value,
                max_relative_error=mpm_max_error.value
            )
            
            update_status("Comparing metrics...", 50)
            log_message("Comparing framework and MPM metrics...", 'info')
            results = mpm_comparer.compare_quality_metrics(framework_metrics, mpm_metrics)
            mpm_comparison_results['latest'] = results
            
            update_status("Comparison complete", 80)
            log_message(f"Compared {len(results)} metrics", 'success')
            
            print("✅ Comparison Complete!")
            print("\n📊 Comparison Results:")
            
            comparison_data = []
            for metric_name, result in results.items():
                comparison_data.append({
                    'Metric': metric_name,
                    'Framework': result.framework_value,
                    'MPM': result.mpm_value,
                    'Difference': result.difference,
                    'Relative Error %': result.relative_error,
                    'Correlation': result.correlation,
                    'Valid': '✓' if result.is_valid else '✗'
                })
                
                if result.is_valid:
                    log_message(f"Metric '{metric_name}': Valid (corr={result.correlation:.3f}, err={result.relative_error:.2f}%)", 'success')
                else:
                    log_message(f"Metric '{metric_name}': Invalid (corr={result.correlation:.3f}, err={result.relative_error:.2f}%)", 'warning')
            
            df = pd.DataFrame(comparison_data)
            display(df.style.format({
                'Framework': '{:.3f}',
                'MPM': '{:.3f}',
                'Difference': '{:.4f}',
                'Relative Error %': '{:.2f}',
                'Correlation': '{:.3f}'
            }))
            
            valid_count = sum(1 for r in results.values() if r.is_valid)
            status_display.value = f"✅ MPM comparison: {valid_count}/{len(results)} metrics valid"
            results_summary_display.value = f"MPM Comparison:<br>{valid_count}/{len(results)} metrics validated<br>Correlation threshold: {mpm_correlation_threshold.value}"
            
        except Exception as e:
            log_message(f"MPM comparison failed: {str(e)}", 'error')
            print(f"❌ MPM comparison failed: {e}")
            status_display.value = f"❌ Error: {e}"
            warning_display.value = f"<span style='color: red;'>❌ Error: {str(e)}</span>"
    else:
        log_message("Running MPM comparison in demo mode...", 'warning')
        print("✅ Comparison Complete! (Demo Mode)")
        update_status("MPM comparison complete (demo)", 80)
        status_display.value = "✅ MPM comparison complete (demo)"
        results_summary_display.value = "MPM Comparison (Demo Mode):<br>Comparison completed with demo data"
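
# --- Sketch: per-metric relative-error check for scalar MPM comparison ---
# One plausible way to decide per-metric validity from the "Max Relative Error"
# slider, assuming the slider value is a fraction (0.1 == 10%). This
# hypothetical helper is an illustration only: MPMComparisonEngine may define
# relative error and is_valid differently, and it also reports a correlation,
# which requires paired series rather than the single scalar values used here.
def compare_scalar_metrics(framework_metrics, mpm_metrics, max_relative_error=0.1):
    """Hypothetical helper: relative-error check for metrics present in both dicts.

    Relative error is |framework - mpm| / |mpm|. A metric whose MPM value is 0
    is valid only if the framework value is also 0.
    """
    results = {}
    for name in sorted(set(framework_metrics) & set(mpm_metrics)):
        fw = float(framework_metrics[name])
        ref = float(mpm_metrics[name])
        diff = fw - ref
        if ref != 0.0:
            rel_err = abs(diff) / abs(ref)
        else:
            rel_err = 0.0 if diff == 0.0 else float('inf')
        results[name] = {
            'framework': fw,
            'mpm': ref,
            'difference': diff,
            'relative_error': rel_err,
            'is_valid': rel_err <= max_relative_error,
        }
    return results

# e.g. compare_scalar_metrics(generate_demo_framework_metrics(), generate_demo_mpm_metrics(), mpm_max_error.value)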

def execute_accuracy_validation():
    """Execute accuracy validation."""
    log_message("Accuracy Validation", 'info')
    update_status("Generating ground truth data...", 10)
    
    ground_truth = generate_demo_ground_truth_signal()
    framework_signal = generate_demo_framework_signal(ground_truth)
    
    log_message(f"Ground truth shape: {ground_truth.shape}", 'info')
    log_message(f"Framework signal shape: {framework_signal.shape}", 'info')
    
    print("üéØ Accuracy Validation")
    print("=" * 60)
    
    if VALIDATION_AVAILABLE:
        try:
            update_status("Creating accuracy validator...", 30)
            log_message(f"Max acceptable error: {accuracy_max_error.value}", 'info')
            log_message(f"Tolerance: {accuracy_tolerance.value}%", 'info')
            
            validator = AccuracyValidator(
                max_acceptable_error=accuracy_max_error.value,
                tolerance_percent=accuracy_tolerance.value
            )
            
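            update_status("Validating signal mapping...", 50)
            if accuracy_type.value != 'signal_mapping':
                # Only signal-mapping validation is wired up in this workflow
                log_message(f"{accuracy_type.label} is not handled here; validating signal mapping instead", 'warning')
            log_message("Calculating accuracy metrics...", 'info')
            result = validator.validate_signal_mapping(framework_signal, ground_truth, "demo_signal")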
            accuracy_results['latest'] = result
            
            update_status("Accuracy validation complete", 80)
            log_message(f"RMSE: {result.rmse:.6f}", 'info')
            log_message(f"MAE: {result.mae:.6f}", 'info')
            log_message(f"R² Score: {result.r2_score:.4f}", 'success' if result.r2_score > 0.9 else 'warning')
            log_message(f"Within tolerance: {'Yes' if result.within_tolerance else 'No'}", 'success' if result.within_tolerance else 'warning')
            
            print("✅ Validation Complete!")
            print(f"\n📊 Accuracy Metrics:")
            print(f"   RMSE: {result.rmse:.6f}")
            print(f"   MAE: {result.mae:.6f}")
            print(f"   R² Score: {result.r2_score:.4f}")
            print(f"   Max Error: {result.max_error:.6f}")
            print(f"   Within Tolerance: {'✓ Yes' if result.within_tolerance else '✗ No'}")
            
            status_display.value = f"✅ Accuracy validation: R² = {result.r2_score:.4f}"
            metrics_display.value = f"RMSE: {result.rmse:.6f}<br>MAE: {result.mae:.6f}<br>R²: {result.r2_score:.4f}<br>Max Error: {result.max_error:.6f}"
            validation_status_display.value = "✓ Validated" if result.within_tolerance else "✗ Outside Tolerance"
            
        except Exception as e:
            log_message(f"Accuracy validation failed: {str(e)}", 'error')
            print(f"❌ Accuracy validation failed: {e}")
            status_display.value = f"❌ Error: {e}"
            warning_display.value = f"<span style='color: red;'>❌ Error: {str(e)}</span>"
    else:
        log_message("Running accuracy validation in demo mode...", 'warning')
        # Calculate demo metrics
        errors = (framework_signal - ground_truth).flatten()
        rmse = np.sqrt(np.mean(errors**2))
        mae = np.mean(np.abs(errors))
        r2 = 1 - (np.sum(errors**2) / np.sum((ground_truth.flatten() - np.mean(ground_truth))**2))
        max_error = np.max(np.abs(errors))
        
        log_message(f"RMSE: {rmse:.6f}, MAE: {mae:.6f}, R²: {r2:.4f}", 'info')
        
        print("✅ Validation Complete! (Demo Mode)")
        print(f"\n📊 Accuracy Metrics (Demo):")
        print(f"   RMSE: {rmse:.6f}")
        print(f"   MAE: {mae:.6f}")
        print(f"   R² Score: {r2:.4f}")
        print(f"   Max Error: {max_error:.6f}")
        
        update_status("Accuracy validation complete (demo)", 80)
        status_display.value = f"✅ Accuracy validation complete (demo): R² = {r2:.4f}"
        metrics_display.value = f"RMSE: {rmse:.6f}<br>MAE: {mae:.6f}<br>R²: {r2:.4f}<br>Max Error: {max_error:.6f}"
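
# --- Sketch: permutation test as a distribution-free cross-check ---
# A permutation test repeatedly shuffles the pooled samples into two groups of
# the original sizes and counts how often the shuffled mean difference is at
# least as extreme as the observed one. It makes no normality assumption, so it
# can sanity-check the t-test or ANOVA results below. This hypothetical helper
# is a swapped-in technique for illustration, not StatisticalValidator's
# implementation. Compare p_value with the significance level (alpha) chosen in
# the Statistical Validation panel.
import random
import statistics

def permutation_test_mean_diff(a, b, n_permutations=2000, alternative='two-sided', seed=0):
    """Hypothetical helper: permutation test on mean(a) - mean(b).

    `alternative` follows SciPy's convention: 'greater' tests mean(a) > mean(b),
    'less' tests mean(a) < mean(b). Returns (observed_difference, p_value); the
    p-value uses the (count + 1) / (n_permutations + 1) correction, so it is
    never exactly zero.
    """
    if alternative not in ('two-sided', 'less', 'greater'):
        raise ValueError(f"unknown alternative: {alternative}")
    a, b = list(a), list(b)
    n_a = len(a)
    observed = statistics.mean(a) - statistics.mean(b)
    pooled = a + b
    rng = random.Random(seed)  # local RNG for reproducible permutations
    count = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:])
        if alternative == 'two-sided':
            extreme = abs(diff) >= abs(observed)
        elif alternative == 'greater':
            extreme = diff >= observed
        else:  # 'less'
            extreme = diff <= observed
        if extreme:
            count += 1
    p_value = (count + 1) / (n_permutations + 1)
    return observed, p_value

# e.g. with the baseline/improved samples from execute_statistical_validation:
# observed, p_value = permutation_test_mean_diff(baseline, improved)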

def execute_statistical_validation():
    """Execute statistical validation."""
    log_message("Statistical Validation", 'info')
    update_status("Generating test data...", 10)
    
    np.random.seed(42)
    baseline = np.random.normal(0.85, 0.05, 50)
    improved = np.random.normal(0.90, 0.05, 50)
    
    log_message(f"Baseline: mean={np.mean(baseline):.3f}, std={np.std(baseline):.3f}", 'info')
    log_message(f"Improved: mean={np.mean(improved):.3f}, std={np.std(improved):.3f}", 'info')
    
    print("üìä Statistical Validation")
    print("=" * 60)
    
    if VALIDATION_AVAILABLE:
        try:
            update_status("Creating statistical validator...", 30)
            log_message(f"Test: {statistical_test.label}", 'info')
            log_message(f"Significance level (α): {statistical_significance.value}", 'info')
            log_message(f"Alternative: {statistical_alternative.value}", 'info')
            
            validator = StatisticalValidator(significance_level=statistical_significance.value)
            
            update_status("Performing statistical test...", 50)
            if statistical_test.value == 't_test':
                log_message("Running t-test...", 'info')
                result = validator.t_test(baseline, improved, alternative=statistical_alternative.value)
            elif statistical_test.value == 'correlation':
                log_message("Running correlation test...", 'info')
                x = np.linspace(0, 10, 50)
                y = x + np.random.normal(0, 0.5, 50)
                result = validator.correlation_test(x, y)
            elif statistical_test.value == 'mann_whitney':
                log_message("Running Mann-Whitney U test...", 'info')
                result = validator.mann_whitney_u_test(baseline, improved)
            elif statistical_test.value == 'anova':
                log_message("Running ANOVA test...", 'info')
                result = validator.anova_test([baseline, improved])
            else:
                result = validator.t_test(baseline, improved)
            
            statistical_results['latest'] = result
            
            update_status("Statistical test complete", 80)
            log_message(f"Test statistic: {result.test_statistic:.4f}", 'info')
            log_message(f"P-value: {result.p_value:.6f}", 'info')
            log_message(f"Significant: {'Yes' if result.is_significant else 'No'} (α = {result.significance_level})", 'success' if result.is_significant else 'warning')
            
            print("✅ Statistical Test Complete!")
            print(f"\n📊 Results:")
            print(f"   Test: {result.test_name}")
            print(f"   P-value: {result.p_value:.6f}")
            print(f"   Significant: {'✓ Yes' if result.is_significant else '✗ No'}")
            print(f"   Conclusion: {result.conclusion}")
            
            status_display.value = f"✅ Statistical test: p = {result.p_value:.4f}"
            metrics_display.value = f"Test: {result.test_name}<br>P-value: {result.p_value:.6f}<br>Significant: {'Yes' if result.is_significant else 'No'}<br>Statistic: {result.test_statistic:.4f}"
            validation_status_display.value = "✓ Significant" if result.is_significant else "✗ Not Significant"
            
        except Exception as e:
            log_message(f"Statistical validation failed: {str(e)}", 'error')
            import traceback
            log_message(f"Traceback: {traceback.format_exc()}", 'error')
            print(f"❌ Statistical validation failed: {e}")
            status_display.value = f"❌ Error: {e}"
            warning_display.value = f"<span style='color: red;'>❌ Error: {str(e)}</span>"
    else:
        log_message("Running statistical test in demo mode...", 'warning')
        try:
            from scipy import stats
            if statistical_test.value == 't_test':
                t_stat, p_value = stats.ttest_ind(baseline, improved)
                log_message(f"T-statistic: {t_stat:.4f}, P-value: {p_value:.6f}", 'info')
            elif statistical_test.value == 'correlation':
                x = np.linspace(0, 10, 50)
                y = x + np.random.normal(0, 0.5, 50)
                corr, p_value = stats.pearsonr(x, y)
                t_stat = corr  # report the correlation coefficient as the test statistic
                log_message(f"Correlation: {corr:.4f}, P-value: {p_value:.6f}", 'info')
            elif statistical_test.value == 'mann_whitney':
                t_stat, p_value = stats.mannwhitneyu(baseline, improved, alternative='two-sided')
                log_message(f"U-statistic: {t_stat:.4f}, P-value: {p_value:.6f}", 'info')
            elif statistical_test.value == 'anova':
                t_stat, p_value = stats.f_oneway(baseline, improved)
                log_message(f"F-statistic: {t_stat:.4f}, P-value: {p_value:.6f}", 'info')
            else:
                t_stat, p_value = stats.ttest_ind(baseline, improved)
                log_message(f"Demo test - T-statistic: {t_stat:.4f}, P-value: {p_value:.6f}", 'info')
        except ImportError:
            # Fallback if scipy is not available: fixed placeholder values, not real test output
            p_value = 0.01
            t_stat = 2.5
            log_message("scipy not available, using demo values", 'warning')
        
        is_significant = p_value < statistical_significance.value
        log_message(f"Significant: {'Yes' if is_significant else 'No'} (p={p_value:.6f}, α={statistical_significance.value})", 'success' if is_significant else 'warning')
        
        print("✅ Statistical Test Complete! (Demo Mode)")
        print(f"\n📊 {statistical_test.label} Results (Demo):")
        print(f"   Statistic: {t_stat:.4f}")
        print(f"   P-value: {p_value:.6f}")
        print(f"   Significant: {'✓ Yes' if is_significant else '✗ No'} (α = {statistical_significance.value})")
        
        update_status("Statistical test complete (demo)", 80)
        status_display.value = f"✅ Statistical test complete (demo): p = {p_value:.4f}"
        metrics_display.value = f"Test: {statistical_test.label} (Demo)<br>P-value: {p_value:.6f}<br>Significant: {'Yes' if is_significant else 'No'}<br>Statistic: {t_stat:.4f}"
        validation_status_display.value = "✓ Significant" if is_significant else "✗ Not Significant"

def execute_comprehensive_workflow():
    """Execute comprehensive validation workflow."""
    log_message("Comprehensive Validation Workflow", 'info')
    log_message("Running all validation steps in sequence...", 'info')
    
    print("üöÄ Comprehensive Validation Workflow")
    print("=" * 60)
    print("\nRunning all validation steps...\n")
    
    # Step 1: Benchmark
    update_status("Step 1/4: Performance Benchmarking", 10)
    log_message("=" * 60, 'info')
    log_message("Step 1: Performance Benchmarking", 'info')
    print("Step 1: Performance Benchmarking")
    print("-" * 60)
    execute_benchmarking()
    print()
    
    # Step 2: MPM Comparison
    update_status("Step 2/4: MPM Comparison", 30)
    log_message("=" * 60, 'info')
    log_message("Step 2: MPM Comparison", 'info')
    print("Step 2: MPM Comparison")
    print("-" * 60)
    execute_mpm_comparison()
    print()
    
    # Step 3: Accuracy Validation
    update_status("Step 3/4: Accuracy Validation", 50)
    log_message("=" * 60, 'info')
    log_message("Step 3: Accuracy Validation", 'info')
    print("Step 3: Accuracy Validation")
    print("-" * 60)
    execute_accuracy_validation()
    print()
    
    # Step 4: Statistical Validation
    update_status("Step 4/4: Statistical Validation", 70)
    log_message("=" * 60, 'info')
    log_message("Step 4: Statistical Validation", 'info')
    print("Step 4: Statistical Validation")
    print("-" * 60)
    execute_statistical_validation()
    print()
    
    # Generate report
    update_status("Generating comprehensive report...", 90)
    log_message("=" * 60, 'info')
    log_message("Generating validation report...", 'info')
    
    if VALIDATION_AVAILABLE:
        try:
            v_client = ValidationClient()
            all_results = {
                'mpm_comparison': mpm_comparison_results.get('latest', {}),
                'accuracy': accuracy_results.get('latest'),
                'statistical': statistical_results.get('latest'),
                'benchmark': benchmark_results.get('latest')
            }
            report = v_client.generate_validation_report(all_results)
            validation_reports['latest'] = report
            log_message(f"Validation report generated ({len(report)} characters)", 'success')
        except Exception as e:
            log_message(f"Report generation: {e}", 'warning')
    
    log_message("=" * 60, 'info')
    log_message("✅ Comprehensive Validation Workflow Complete!", 'success')
    print("=" * 60)
    print("✅ Comprehensive Validation Workflow Complete!")
    print("=" * 60)
    
    update_status("Comprehensive workflow complete", 100)
    status_display.value = "✅ Comprehensive workflow complete"
    results_summary_display.value = "All validation steps finished; check the logs for any warnings or errors"

# Connect button
execute_button.on_click(execute_validation)

# Update view based on validation type
def update_view(change):
    """Update visualization mode based on validation type."""
    val_type = change['new']
    if val_type == 'benchmarking':
        viz_mode.value = 'benchmark'
    elif val_type == 'mpm':
        viz_mode.value = 'mpm'
    elif val_type == 'accuracy':
        viz_mode.value = 'accuracy'
    elif val_type == 'statistical':
        viz_mode.value = 'statistical'
    elif val_type == 'comprehensive':
        viz_mode.value = 'report'

validation_type.observe(update_view, names='value')

# ============================================
# Export Button Functionality
# ============================================

def export_report(b):
    """Export validation report."""
    if not validation_reports.get('latest'):
        warning_display.value = "<span style='color: orange;'>⚠️ No validation report available. Run validation first.</span>"
        log_message("Export attempted but no report available", 'warning')
        return
    
    try:
        report = validation_reports['latest']
        timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
        filename = f"validation_report_{timestamp}.txt"
        
        # In a real implementation, this would download the file
        # For now, we'll display it and log
        log_message(f"Report ready for export: {filename} ({len(report)} characters)", 'info')
        warning_display.value = f"<span style='color: green;'>✅ Report ready: {filename}</span>"
        
        # Display first part of report
        with main_output:
            clear_output(wait=True)
            print("üìÑ Validation Report")
            print("=" * 60)
            print(report[:3000] + "..." if len(report) > 3000 else report)
        
    except Exception as e:
        log_message(f"Export failed: {str(e)}", 'error')
        warning_display.value = f"<span style='color: red;'>❌ Export failed: {str(e)}</span>"

export_button.on_click(export_report)

# ============================================
# Bottom Panel: Status, Progress, and Logs
# ============================================

# Current operation status
current_operation = WidgetHTML(value='<b>Status:</b> Ready to validate')

# Progress bar
progress_bar = widgets.IntProgress(
    value=0,
    min=0,
    max=100,
    description='Progress:',
    bar_style='info',
    layout=Layout(width='100%')
)

# Validation logs output
validation_logs = Output(layout=Layout(height='200px', border='1px solid #ccc', overflow_y='auto'))

# Initialize logs
with validation_logs:
    display(HTML("<p><i>Validation logs will appear here...</i></p>"))

# Bottom status bar (shows Status | Progress | Time)
bottom_status = WidgetHTML(value='<b>Status:</b> Ready | <b>Progress:</b> 0% | <b>Time:</b> 0:00')
bottom_progress = widgets.IntProgress(
    value=0,
    min=0,
    max=100,
    description='Overall:',
    bar_style='info',
    layout=Layout(width='100%')
)

# Warning display
warning_display = WidgetHTML("")

# Global time tracking
operation_start_time = None

# Bottom panel
bottom_panel = VBox([
    current_operation,
    progress_bar,
    WidgetHTML("<b>Validation Logs:</b>"),
    validation_logs,
    WidgetHTML("<hr>"),
    bottom_status,
    bottom_progress,
    warning_display
], layout=Layout(padding='10px', border='1px solid #ccc'))

# ============================================
# Logging Functions
# ============================================

def log_message(message: str, level: str = 'info'):
    """Log a message to the validation logs with timestamp and emoji."""
    timestamp = datetime.now().strftime('%H:%M:%S')
    icons = {'info': 'ℹ️', 'success': '✅', 'warning': '⚠️', 'error': '❌'}
    icon = icons.get(level, 'ℹ️')
    with validation_logs:
        print(f"[{timestamp}] {icon} {message}")

def update_status(operation: str, progress: int = None):
    """Update the status line, both progress bars, and the elapsed-time readout."""
    current_operation.value = f'<b>Status:</b> {operation}'
    if progress is not None:
        progress_bar.value = progress
        bottom_progress.value = progress
    # Elapsed time since the current operation started (0:00 if no timer is running)
    if operation_start_time:
        elapsed = time.strftime("%M:%S", time.gmtime(time.time() - operation_start_time))
    else:
        elapsed = "0:00"
    bottom_status.value = (f'<b>Status:</b> {operation} | <b>Progress:</b> {progress_bar.value}% '
                           f'| <b>Time:</b> {elapsed}')

# ============================================
# Complete Interface Layout
# ============================================

main_layout = VBox([
    top_panel,
    HBox([left_panel, center_panel, right_panel], layout=Layout(width='100%', border='2px solid #333', padding='10px')),
    bottom_panel
])

display(main_layout)


## Summary and Next Steps

### Key Takeaways

You've learned how to:

1. **Benchmark Framework Operations**: Measure performance metrics (execution time, memory, throughput) with detailed progress tracking
2. **Compare with MPM Systems**: Validate framework outputs against Melt Pool Monitoring reference data
3. **Validate Accuracy**: Calculate error metrics (RMSE, MAE, R¬≤) against ground truth with real-time status updates
4. **Perform Statistical Tests**: Use hypothesis testing to validate improvements and significance
5. **Generate Comprehensive Reports**: Combine all validation results into detailed reports
6. **Monitor Validation Progress**: Use the status bar and logs section to track validation operations in real time
7. **Interpret Logs**: Understand timestamped log messages with success/warning/error indicators
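The accuracy metrics listed above (RMSE, MAE, R²) can be reproduced without the validation module. The following is a minimal pure-Python sketch that mirrors the demo-mode formulas in `execute_accuracy_validation`: R² = 1 − SSE/SST, plus the maximum absolute error. `accuracy_metrics` is a hypothetical helper written for illustration. It is not part of the AM-QADF API.

```python
import math
from typing import Dict, Sequence


def accuracy_metrics(predicted: Sequence[float], reference: Sequence[float]) -> Dict[str, float]:
    """Compute RMSE, MAE, R² and max absolute error between two equal-length signals."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("signals must be non-empty and of equal length")
    errors = [p - r for p, r in zip(predicted, reference)]
    n = len(errors)
    sse = sum(e * e for e in errors)                    # sum of squared errors
    mean_ref = sum(reference) / n
    sst = sum((r - mean_ref) ** 2 for r in reference)   # total sum of squares of the reference
    return {
        "rmse": math.sqrt(sse / n),
        "mae": sum(abs(e) for e in errors) / n,
        # R² = 1 - SSE/SST is undefined when the reference signal is constant
        "r2": 1.0 - sse / sst if sst > 0 else float("nan"),
        "max_error": max(abs(e) for e in errors),
    }


metrics = accuracy_metrics([1.1, 1.9, 3.2, 3.8], [1.0, 2.0, 3.0, 4.0])
print({k: round(v, 4) for k, v in metrics.items()})
# → {'rmse': 0.1581, 'mae': 0.15, 'r2': 0.98, 'max_error': 0.2}
```

Guarding the constant-reference case matters here. The inline demo formula would otherwise divide by zero.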

### Interface Features

The notebook provides a comprehensive validation interface with:

- **Status Progress Bar**: Visual indication of validation progress (0-100%)
- **Real-Time Status Display**: Shows current operation, progress percentage, and elapsed time
- **Detailed Logs Section**: Timestamped execution logs with emoji indicators:
  - ℹ️ Information messages
  - ✅ Success messages
  - ⚠️ Warning messages
  - ❌ Error messages (with full tracebacks)
- **Time Tracking**: Automatic tracking of execution time for all validation operations
- **Error Handling**: Comprehensive error messages displayed in both the logs and status sections
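The status bar formats elapsed time with `time.strftime("%M:%S", time.gmtime(elapsed))`, which wraps back to `00:00` after an hour. The sketch below shows one way to time a step and produce the same `[HH:MM:SS]`-style log lines outside the widgets. `format_elapsed` and `timed_step` are hypothetical helpers, not framework functions. The sketch uses `time.perf_counter()`, which is monotonic and generally preferred over `time.time()` for measuring durations.

```python
import time
from contextlib import contextmanager
from datetime import datetime
from typing import Callable, Iterator


def format_elapsed(seconds: float) -> str:
    """Format a duration as M:SS, switching to H:MM:SS once it reaches an hour."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


@contextmanager
def timed_step(name: str, log: Callable[[str], None] = print) -> Iterator[None]:
    """Emit timestamped start/end log lines around a block and report its duration."""
    start = time.perf_counter()          # monotonic clock, suited to measuring durations
    log(f"[{datetime.now():%H:%M:%S}] started: {name}")
    try:
        yield
    finally:                             # runs even if the block raises
        elapsed = time.perf_counter() - start
        log(f"[{datetime.now():%H:%M:%S}] finished: {name} in {format_elapsed(elapsed)}")


with timed_step("demo benchmark"):
    total = sum(i * i for i in range(100_000))
```

Passing a different `log` callable (for example, a function that writes into an `Output` widget) keeps the timing logic separate from the display.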

### Best Practices

- **Regular Benchmarking**: Benchmark operations regularly to track performance over time
- **Multiple Validation Methods**: Use multiple validation approaches for robust verification
- **Statistical Significance**: Always test statistical significance when comparing methods
- **Document Results**: Keep detailed validation reports for reproducibility (use the Export button)
- **Set Thresholds**: Define clear acceptance criteria (correlation, error tolerance, p-values)
- **Monitor Logs**: Check the logs section for detailed execution information and any warnings
- **Review Progress**: Use the status bar to monitor long-running validation operations
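The "Statistical Significance" and "Set Thresholds" practices combine naturally: fix α before testing, then compare the p-value against it. The sketch below uses a two-sided permutation test on the difference in means. This is a swapped-in nonparametric technique for illustration, not what `StatisticalValidator` or `scipy.stats.ttest_ind` compute. The data imitate the notebook's synthetic baseline/improved samples (n=50, means 0.85 vs 0.90, sd 0.05), but they are drawn with Python's `random` module, so the values differ from the notebook's. `permutation_test` and `ALPHA` are hypothetical names.

```python
import random
from typing import Sequence


def _mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


def permutation_test(a: Sequence[float], b: Sequence[float],
                     n_permutations: int = 5000, seed: int = 42) -> float:
    """Two-sided permutation-test p-value for a difference in means between a and b."""
    rng = random.Random(seed)            # seeded so the p-value is reproducible
    observed = abs(_mean(b) - _mean(a))
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)              # randomly reassign group labels
        diff = abs(_mean(pooled[n_a:]) - _mean(pooled[:n_a]))
        if diff >= observed:
            extreme += 1
    # Add-one correction: a Monte Carlo p-value should never be exactly 0
    return (extreme + 1) / (n_permutations + 1)


ALPHA = 0.05  # hypothetical acceptance threshold, fixed before running the test

# Synthetic data shaped like the notebook's demo (n=50, means 0.85 vs 0.90, sd 0.05)
data_rng = random.Random(0)
baseline = [data_rng.gauss(0.85, 0.05) for _ in range(50)]
improved = [data_rng.gauss(0.90, 0.05) for _ in range(50)]

p = permutation_test(baseline, improved)
print(f"p = {p:.4f}, significant at alpha={ALPHA}: {p < ALPHA}")
```

A permutation test makes no normality assumption. The trade-off is that the p-value is estimated by resampling, so its resolution is limited to roughly `1 / n_permutations`.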

### Next Steps

- Explore the validation module API for advanced use cases
- Integrate validation into your analysis workflows
- Customize validation thresholds for your specific requirements
- Review validation logs to optimize performance and identify issues
- Export validation reports for documentation and sharing
- Contribute validation results to framework documentation
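When integrating validation into your own workflows, record each step's outcome explicitly. The interface's comprehensive workflow catches errors inside each step and still prints its completion message, so a failed step shows up only in the logs. The sketch below shows one way to run named steps in order, keep going past failures, and summarize pass/fail per step. `run_validation_steps` is a hypothetical helper, not part of `ValidationClient`. The step callables stand in for your own validation code.

```python
from typing import Callable, Dict, List, Tuple


def run_validation_steps(steps: List[Tuple[str, Callable[[], object]]]) -> Dict[str, dict]:
    """Run named validation steps in order, recording each result or error instead of stopping."""
    summary: Dict[str, dict] = {}
    for name, step in steps:
        try:
            summary[name] = {"ok": True, "result": step()}
        except Exception as exc:  # record the failure and continue with the next step
            summary[name] = {"ok": False, "error": f"{type(exc).__name__}: {exc}"}
    return summary


def failing_step():
    raise RuntimeError("reference data not found")


report = run_validation_steps([
    ("accuracy", lambda: {"rmse": 0.12}),
    ("statistical", failing_step),
])
passed = sum(r["ok"] for r in report.values())
print(f"{passed}/{len(report)} steps passed")
for name, r in report.items():
    print(name, "OK" if r["ok"] else r["error"])
```

The returned dictionary can feed a report generator or a CI check that fails whenever any step reports `ok: False`.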

### Additional Resources

- Validation Module Documentation: `src/am_qadf/validation/`
- Test Examples: `tests/unit/validation/` and `tests/integration/validation/`
- Benchmarking Guide: `implementation_plans/BENCHMARKING_USAGE_GUIDE.md`
- Validation Test Plan: `implementation_plans/VALIDATION_TEST_PLAN.md`

---

**Congratulations!** You've completed the Validation and Benchmarking notebook. You now have the tools to validate framework accuracy, benchmark performance, and ensure reliability in your AM-QADF workflows. The real-time progress tracking and detailed logging features help you monitor and troubleshoot validation operations effectively. 🎉