# Statistical Analysis

## Purpose

This notebook shows how to run statistical analyses on voxel data through interactive widgets. You'll work through descriptive statistics, correlation analysis, trend detection, pattern recognition, multivariate analysis, time series analysis, and regression.

## Learning Objectives

By the end of this notebook, you will:
- ✅ Calculate descriptive statistics (mean, median, std, percentiles, skewness, kurtosis)
- ✅ Perform correlation analysis (Pearson, Spearman, Kendall)
- ✅ Detect trends (temporal, spatial, linear, polynomial)
- ✅ Identify patterns (clusters, periodicity, anomalies)
- ✅ Apply multivariate techniques (PCA, clustering)
- ✅ Analyze time series (trends, seasonality, autocorrelation)
- ✅ Perform regression analysis (linear, polynomial, multiple)
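
As a point of reference for the first objective, here is a minimal sketch of the same descriptive statistics computed directly with NumPy and SciPy. The `describe` helper and its input values are hypothetical and not part of AM-QADF; the sketch also assumes that non-finite entries (for example, empty voxels stored as NaN) should be dropped before anything is computed.

```python
import numpy as np
from scipy import stats


def describe(values, percentiles=(25, 50, 75)):
    """Return common descriptive statistics for array-like data (hypothetical helper)."""
    values = np.asarray(values, dtype=float).ravel()
    values = values[np.isfinite(values)]  # assumption: drop NaN/inf entries
    return {
        "mean": float(np.mean(values)),
        "median": float(np.median(values)),
        "std": float(np.std(values)),  # population std (ddof=0), NumPy's default
        "min": float(np.min(values)),
        "max": float(np.max(values)),
        "percentiles": {p: float(np.percentile(values, p)) for p in percentiles},
        "skewness": float(stats.skew(values)),
        "kurtosis": float(stats.kurtosis(values)),  # excess kurtosis (normal -> 0)
    }


# One large outlier pulls the mean far above the median
summary = describe([1.0, 2.0, 3.0, 4.0, 100.0, np.nan])
print(summary["mean"], summary["median"], summary["skewness"])
```

Comparing the mean with the median, as in the last line, is a quick way to spot skewed signals before trusting a mean-based summary.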

## Estimated Duration

60-90 minutes

---

## Overview

Statistical analysis reveals patterns in, and relationships between, the signals stored in a voxel grid. The AM-QADF framework supports the following kinds of analysis:

- 📊 **Descriptive Statistics**: Mean, median, std, min, max, percentiles, skewness, kurtosis
- 🔗 **Correlation Analysis**: Pearson, Spearman, Kendall correlation methods
- 📈 **Trend Analysis**: Temporal, spatial, linear, polynomial trends
- 🔍 **Pattern Detection**: Clusters, periodicity, anomalies
- 🎯 **Multivariate Analysis**: PCA, clustering, dimensionality reduction
- ⏱️ **Time Series Analysis**: Trends, seasonality, autocorrelation
- 📉 **Regression Analysis**: Linear, polynomial, multiple regression
- 📊 **Non-Parametric Methods**: Methods that don't assume specific distributions
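
The three correlation methods listed above measure different things: Pearson captures linear association, while Spearman and Kendall are rank-based and respond to any monotone relationship. The following is a minimal sketch with SciPy on synthetic data; the `correlate` helper and the signals are illustrative assumptions, not AM-QADF APIs.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr, kendalltau

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y_linear = 2.0 * x + rng.normal(scale=0.5, size=200)  # roughly linear in x
y_monotone = np.exp(x)                                 # monotone but nonlinear in x

METHODS = {"pearson": pearsonr, "spearman": spearmanr, "kendall": kendalltau}


def correlate(a, b, method="pearson"):
    """Return (coefficient, p_value) for the chosen method (hypothetical helper)."""
    coefficient, p_value = METHODS[method](a, b)
    return float(coefficient), float(p_value)


for name in METHODS:
    r_lin, _ = correlate(x, y_linear, name)
    r_mono, _ = correlate(x, y_monotone, name)
    print(f"{name:>8}: linear={r_lin:.3f}  monotone={r_mono:.3f}")
```

For the exponential signal the rank-based coefficients reach their maximum because the ordering is perfectly preserved, whereas Pearson falls short of it; this is the usual reason to prefer Spearman or Kendall when a relationship is monotone but not linear.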

Use the interactive widgets below to perform statistical analysis - no coding required!


In [1]:
# Setup: Import required libraries
import sys
from pathlib import Path
import warnings
warnings.filterwarnings('ignore')

# Add parent directory and src directory to path for imports
notebook_dir = Path().resolve()
project_root = notebook_dir.parent
src_dir = project_root / 'src'

# Add project root to path (for src.infrastructure imports)
if str(project_root) not in sys.path:
    sys.path.insert(0, str(project_root))

# Add src directory to path (for am_qadf imports)
if str(src_dir) not in sys.path:
    sys.path.insert(0, str(src_dir))

# Core imports
import ipywidgets as widgets
from ipywidgets import (
    VBox, HBox, Accordion, Tab, Dropdown, RadioButtons, 
    Checkbox, Button, Output, Text, IntSlider, FloatSlider,
    Layout, Box, Label, FloatText, IntText, SelectMultiple
)
from IPython.display import display, Markdown, HTML, clear_output
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
from scipy.stats import pearsonr, spearmanr, kendalltau
from datetime import datetime
from typing import Optional, Tuple, Dict, Any, List

# Load environment variables from development.env
import os
env_file = project_root / 'development.env'
if env_file.exists():
    with open(env_file, 'r') as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith('#') and '=' in line:
                key, value = line.split('=', 1)
                # Strip whitespace around the key and value before removing surrounding quotes
                value = value.strip().strip('"\'')
                os.environ[key.strip()] = value
    print("✅ Environment variables loaded from development.env")

# Try to import statistical analysis classes
STATS_AVAILABLE = False
try:
    from am_qadf.analytics.statistical_analysis.client import AdvancedAnalyticsClient
    STATS_AVAILABLE = True
    print("✅ Statistical analysis classes available")
except ImportError as e:
    print(f"⚠️ Statistical analysis classes not available: {e} - using demo mode")

# MongoDB connection setup
INFRASTRUCTURE_AVAILABLE = False
mongo_client = None
voxel_storage = None
stl_client = None

try:
    from src.infrastructure.config import MongoDBConfig
    from src.infrastructure.database import MongoDBClient
    from am_qadf.voxel_domain import VoxelGridStorage
    from am_qadf.query import STLModelClient
    
    # Initialize MongoDB connection
    config = MongoDBConfig.from_env()
    if not config.username:
        config.username = os.getenv('MONGO_ROOT_USERNAME', 'admin')
    if not config.password:
        config.password = os.getenv('MONGO_ROOT_PASSWORD', 'password')
    
    mongo_client = MongoDBClient(config=config)
    if mongo_client.is_connected():
        voxel_storage = VoxelGridStorage(mongo_client=mongo_client)
        stl_client = STLModelClient(mongo_client=mongo_client)
        INFRASTRUCTURE_AVAILABLE = True
        print(f"✅ Connected to MongoDB: {config.database}")
    else:
        print("⚠️ MongoDB connection failed")
except Exception as e:
    print(f"⚠️ MongoDB not available: {e} - using demo mode")

print("✅ Setup complete!")


✅ Environment variables loaded from development.env


✅ Statistical analysis classes available


Failed to connect to MongoDB: localhost:27017: [Errno 111] Connection refused (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms), Timeout: 30.0s, Topology Description: <TopologyDescription id: 6960128795023d1b8f82f838, topology_type: Unknown, servers: [<ServerDescription ('localhost', 27017) server_type: Unknown, rtt: None, error=AutoReconnect('localhost:27017: [Errno 111] Connection refused (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms)')>]>


⚠️ MongoDB not available: localhost:27017: [Errno 111] Connection refused (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms), Timeout: 30.0s, Topology Description: <TopologyDescription id: 6960128795023d1b8f82f838, topology_type: Unknown, servers: [<ServerDescription ('localhost', 27017) server_type: Unknown, rtt: None, error=AutoReconnect('localhost:27017: [Errno 111] Connection refused (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms)')>]> - using demo mode
✅ Setup complete!


## Interactive Statistical Analysis Interface

Use the widgets below to perform statistical analysis. Select analysis type, configure parameters, and visualize results interactively!
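
For orientation, this is roughly what a linear trend analysis amounts to outside the widgets: fit a first-degree polynomial and report how much of the variance it explains. The synthetic signal and variable names below are assumptions made for illustration; the notebook's own implementation may differ in detail.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic signal: baseline of 200, a slow upward drift, and Gaussian noise
x = np.linspace(0, 100, 1000)
signal = 200 + 0.1 * x + rng.normal(0, 5, x.size)

# Least-squares line; np.polyfit returns coefficients from highest degree down
slope, intercept = np.polyfit(x, signal, 1)
fitted = slope * x + intercept

# Coefficient of determination: share of variance explained by the trend line
ss_res = np.sum((signal - fitted) ** 2)
ss_tot = np.sum((signal - signal.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

print(f"slope={slope:.4f}, intercept={intercept:.2f}, R^2={r_squared:.3f}")
```

A low R² with a clearly non-zero slope, as is likely here given the noise level, means the trend is real but explains only part of the variation; raising the polynomial degree is worth trying only if the residuals show visible structure.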


In [2]:
# Create Interactive Statistical Analysis Interface

# Global state
analysis_data = {}
analysis_results = {}
current_model_id = None
current_grid_id = None
loaded_grid_data = None
signal_arrays = {}

# ============================================
# Helper Functions for Demo Data
# ============================================

def generate_sample_signals():
    """Generate sample signal data for analysis."""
    np.random.seed(42)
    
    n_points = 1000
    
    # Signal 1: Temperature (with trend)
    x = np.linspace(0, 100, n_points)
    temperature = 200 + 50 * np.sin(2 * np.pi * x / 20) + 10 * x / 100 + np.random.normal(0, 5, n_points)
    
    # Signal 2: Power (correlated with temperature)
    power = 0.8 * temperature + 50 + np.random.normal(0, 10, n_points)
    
    # Signal 3: Density (periodic pattern)
    density = 7.8 + 0.5 * np.sin(2 * np.pi * x / 15) + np.random.normal(0, 0.1, n_points)
    
    # Signal 4: Stress (inverse correlation with temperature)
    stress = 300 - 0.5 * temperature + np.random.normal(0, 15, n_points)
    
    return {
        'temperature': temperature,
        'power': power,
        'density': density,
        'stress': stress,
        'x': x
    }

# ============================================
# Top Panel: Data Source and Grid Selection
# ============================================

# Data source mode
data_source_label = widgets.HTML("<b>Data Source:</b>")
data_source_mode = RadioButtons(
    options=[('MongoDB', 'mongodb'), ('Sample Data', 'sample')],
    value='mongodb',
    description='Source:',
    style={'description_width': 'initial'}
)

# Model selection (for MongoDB)
model_label = widgets.HTML("<b>Model:</b>")
model_options = [("━━━ Select Model ━━━", None)]
if stl_client and mongo_client:
    try:
        models = stl_client.list_models(limit=100)
        model_options.extend([
            (f"{m.get('filename', m.get('original_stem', m.get('model_name', 'Unknown')))} ({m.get('model_id', '')[:8]}...)", m.get('model_id'))
            for m in models
        ])
    except Exception as e:
        print(f"⚠️ Error loading models: {e}")

model_dropdown = Dropdown(
    options=model_options,
    value=None,
    description='Model:',
    style={'description_width': 'initial'},
    layout=Layout(width='400px')
)

# Grid type filter
grid_type_label = widgets.HTML("<b>Grid Type:</b>")
grid_type_filter = Dropdown(
    options=[
        ('All Grids', 'all'),
        ('Fused', 'fused'),
        ('Corrected', 'corrected'),
        ('Processed', 'processed'),
        ('Signal-Mapped', 'signal_mapped'),
        ('Raw', 'raw')
    ],
    value='fused',  # Default to fused grids
    description='Type:',
    style={'description_width': 'initial'}
)

# Grid selection (for MongoDB)
grid_label = widgets.HTML("<b>Grid:</b>")
grid_dropdown = Dropdown(
    options=[("━━━ Select Grid ━━━", None)],
    value=None,
    description='Grid:',
    style={'description_width': 'initial'},
    layout=Layout(width='500px')
)

load_grid_button = Button(
    description='Load Grid',
    button_style='info',
    icon='folder-open',
    layout=Layout(width='120px')
)

# Analysis type
analysis_type = Dropdown(
    options=[
        ('Descriptive', 'descriptive'),
        ('Correlation', 'correlation'),
        ('Trends', 'trends'),
        ('Patterns', 'patterns'),
        ('Multivariate', 'multivariate'),
        ('Time Series', 'time_series'),
        ('Regression', 'regression'),
        ('Non-Parametric', 'nonparametric')
    ],
    value='descriptive',
    description='Analysis:',
    style={'description_width': 'initial'}
)

# Signal selector (will be updated when grid is loaded)
signal_label = widgets.HTML("<b>Signals:</b>")
signal_selector = SelectMultiple(
    options=[('Temperature', 'temperature'), ('Power', 'power'), ('Density', 'density'), ('Stress', 'stress')],
    value=('temperature', 'power'),
    description='Signals:',
    style={'description_width': 'initial'}
)

execute_button = Button(
    description='Execute Analysis',
    button_style='success',
    icon='play',
    layout=Layout(width='160px')
)

compare_button = Button(
    description='Compare Methods',
    button_style='',
    icon='copy',
    layout=Layout(width='160px')
)

top_panel = VBox([
    HBox([data_source_label, data_source_mode, analysis_type]),
    HBox([model_label, model_dropdown, grid_type_label, grid_type_filter]),
    HBox([grid_label, grid_dropdown, load_grid_button]),
    HBox([signal_label, signal_selector, execute_button, compare_button])
], layout=Layout(padding='10px', border='1px solid #ccc'))

# ============================================
# Left Panel: Analysis Configuration
# ============================================

# Descriptive Statistics Section
desc_label = widgets.HTML("<b>Descriptive Statistics:</b>")
stat_mean = Checkbox(value=True, description='Mean', style={'description_width': 'initial'})
stat_median = Checkbox(value=True, description='Median', style={'description_width': 'initial'})
stat_std = Checkbox(value=True, description='Std', style={'description_width': 'initial'})
stat_min = Checkbox(value=True, description='Min', style={'description_width': 'initial'})
stat_max = Checkbox(value=True, description='Max', style={'description_width': 'initial'})
stat_percentiles = Checkbox(value=False, description='Percentiles', style={'description_width': 'initial'})
stat_skewness = Checkbox(value=False, description='Skewness', style={'description_width': 'initial'})
stat_kurtosis = Checkbox(value=False, description='Kurtosis', style={'description_width': 'initial'})

percentile_values = Text(
    value='25,50,75',
    description='Percentiles:',
    style={'description_width': 'initial'},
    layout=Layout(width='200px')
)

desc_section = VBox([
    desc_label,
    stat_mean,
    stat_median,
    stat_std,
    stat_min,
    stat_max,
    stat_percentiles,
    percentile_values,
    stat_skewness,
    stat_kurtosis
], layout=Layout(padding='5px', border='1px solid #ddd'))

# Correlation Analysis Section
corr_label = widgets.HTML("<b>Correlation Analysis:</b>")
corr_method = RadioButtons(
    options=[('Pearson', 'pearson'), ('Spearman', 'spearman'), ('Kendall', 'kendall')],
    value='pearson',
    description='Method:',
    style={'description_width': 'initial'}
)
significance_level = FloatSlider(value=0.05, min=0.01, max=0.10, step=0.01, description='Significance:', style={'description_width': 'initial'})
show_heatmap = Checkbox(value=True, description='Show Heatmap', style={'description_width': 'initial'})

corr_section = VBox([
    corr_label,
    corr_method,
    significance_level,
    show_heatmap
], layout=Layout(padding='5px', border='1px solid #ddd'))

# Trend Analysis Section
trend_label = widgets.HTML("<b>Trend Analysis:</b>")
trend_type = RadioButtons(
    options=[('Temporal', 'temporal'), ('Spatial', 'spatial'), ('Both', 'both')],
    value='temporal',
    description='Type:',
    style={'description_width': 'initial'}
)
trend_method = Dropdown(
    options=[('Linear', 'linear'), ('Polynomial', 'polynomial'), ('Moving Average', 'moving_avg')],
    value='linear',
    description='Method:',
    style={'description_width': 'initial'}
)
poly_degree = IntSlider(value=2, min=1, max=10, step=1, description='Degree:', style={'description_width': 'initial'})

trend_section = VBox([
    trend_label,
    trend_type,
    trend_method,
    poly_degree
], layout=Layout(padding='5px', border='1px solid #ddd'))

# Pattern Detection Section
pattern_label = widgets.HTML("<b>Pattern Detection:</b>")
pattern_clusters = Checkbox(value=False, description='Clusters', style={'description_width': 'initial'})
pattern_periodicity = Checkbox(value=False, description='Periodicity', style={'description_width': 'initial'})
pattern_anomalies = Checkbox(value=False, description='Anomalies', style={'description_width': 'initial'})
clustering_method = Dropdown(
    options=[('K-Means', 'kmeans'), ('DBSCAN', 'dbscan'), ('Hierarchical', 'hierarchical')],
    value='kmeans',
    description='Clustering:',
    style={'description_width': 'initial'}
)
n_clusters = IntSlider(value=3, min=2, max=20, step=1, description='Clusters:', style={'description_width': 'initial'})

pattern_section = VBox([
    pattern_label,
    pattern_clusters,
    pattern_periodicity,
    pattern_anomalies,
    clustering_method,
    n_clusters
], layout=Layout(padding='5px', border='1px solid #ddd'))

# Multivariate Analysis Section
multivar_label = widgets.HTML("<b>Multivariate Analysis:</b>")
multivar_method = RadioButtons(
    options=[('PCA', 'pca'), ('Clustering', 'clustering'), ('Dimensionality Reduction', 'dim_reduction')],
    value='pca',
    description='Method:',
    style={'description_width': 'initial'}
)
n_components = IntSlider(value=2, min=1, max=10, step=1, description='Components:', style={'description_width': 'initial'})

multivar_section = VBox([
    multivar_label,
    multivar_method,
    n_components
], layout=Layout(padding='5px', border='1px solid #ddd'))

# Time Series Analysis Section
timeseries_label = widgets.HTML("<b>Time Series Analysis:</b>")
ts_trend = Checkbox(value=True, description='Trend', style={'description_width': 'initial'})
ts_seasonality = Checkbox(value=False, description='Seasonality', style={'description_width': 'initial'})
ts_autocorr = Checkbox(value=False, description='Autocorrelation', style={'description_width': 'initial'})
seasonal_period = IntSlider(value=24, min=1, max=365, step=1, description='Seasonal Period:', style={'description_width': 'initial'})

timeseries_section = VBox([
    timeseries_label,
    ts_trend,
    ts_seasonality,
    seasonal_period,
    ts_autocorr
], layout=Layout(padding='5px', border='1px solid #ddd'))

# Regression Analysis Section
regression_label = widgets.HTML("<b>Regression Analysis:</b>")
regression_type = RadioButtons(
    options=[('Linear', 'linear'), ('Polynomial', 'polynomial'), ('Multiple', 'multiple')],
    value='linear',
    description='Type:',
    style={'description_width': 'initial'}
)
reg_poly_degree = IntSlider(value=2, min=1, max=10, step=1, description='Degree:', style={'description_width': 'initial'})

regression_section = VBox([
    regression_label,
    regression_type,
    reg_poly_degree
], layout=Layout(padding='5px', border='1px solid #ddd'))

# Show/hide sections based on analysis type
def update_analysis_sections(change):
    """Show/hide analysis sections based on type."""
    analysis = change['new']
    desc_section.layout.display = 'none'
    corr_section.layout.display = 'none'
    trend_section.layout.display = 'none'
    pattern_section.layout.display = 'none'
    multivar_section.layout.display = 'none'
    timeseries_section.layout.display = 'none'
    regression_section.layout.display = 'none'
    
    if analysis == 'descriptive':
        desc_section.layout.display = 'flex'
    elif analysis == 'correlation':
        corr_section.layout.display = 'flex'
    elif analysis == 'trends':
        trend_section.layout.display = 'flex'
    elif analysis == 'patterns':
        pattern_section.layout.display = 'flex'
    elif analysis == 'multivariate':
        multivar_section.layout.display = 'flex'
    elif analysis == 'time_series':
        timeseries_section.layout.display = 'flex'
    elif analysis == 'regression':
        regression_section.layout.display = 'flex'

analysis_type.observe(update_analysis_sections, names='value')
update_analysis_sections({'new': analysis_type.value})

left_panel = VBox([
    desc_section,
    corr_section,
    trend_section,
    pattern_section,
    multivar_section,
    timeseries_section,
    regression_section
], layout=Layout(width='300px', padding='10px', border='1px solid #ccc'))

# ============================================
# Center Panel: Visualization
# ============================================

viz_mode = RadioButtons(
    options=[('Results', 'results'), ('Comparison', 'comparison'), ('Distribution', 'distribution')],
    value='results',
    description='View:',
    style={'description_width': 'initial'}
)

viz_output = Output(layout=Layout(height='600px', overflow='auto'))

center_panel = VBox([
    widgets.HTML("<h3>Analysis Visualization</h3>"),
    viz_mode,
    viz_output
], layout=Layout(flex='1 1 auto', padding='10px', border='1px solid #ccc'))

# ============================================
# Right Panel: Results
# ============================================

# Statistical Summary
summary_label = widgets.HTML("<b>Statistical Summary:</b>")
summary_display = widgets.HTML("No analysis performed yet")
summary_section = VBox([
    summary_label,
    summary_display
], layout=Layout(padding='5px'))

# Correlation Results
corr_results_label = widgets.HTML("<b>Correlation Results:</b>")
corr_results_display = widgets.HTML("No correlation results")
corr_results_section = VBox([
    corr_results_label,
    corr_results_display
], layout=Layout(padding='5px'))

# Trend Results
trend_results_label = widgets.HTML("<b>Trend Results:</b>")
trend_results_display = widgets.HTML("No trend results")
trend_results_section = VBox([
    trend_results_label,
    trend_results_display
], layout=Layout(padding='5px'))

# Pattern Results
pattern_results_label = widgets.HTML("<b>Pattern Results:</b>")
pattern_results_display = widgets.HTML("No pattern results")
pattern_results_section = VBox([
    pattern_results_label,
    pattern_results_display
], layout=Layout(padding='5px'))

# Export Options
export_label = widgets.HTML("<b>Export:</b>")
export_results_button = Button(description='Export Results', button_style='', layout=Layout(width='150px'))
export_plots_button = Button(description='Export Plots', button_style='', layout=Layout(width='150px'))
export_report_button = Button(description='Export Report', button_style='', layout=Layout(width='150px'))
save_config_button = Button(description='Save Config', button_style='', layout=Layout(width='150px'))

export_section = VBox([
    export_label,
    export_results_button,
    export_plots_button,
    export_report_button,
    save_config_button
], layout=Layout(padding='5px'))

right_panel = VBox([
    summary_section,
    corr_results_section,
    trend_results_section,
    pattern_results_section,
    export_section
], layout=Layout(width='250px', padding='10px', border='1px solid #ccc'))

# ============================================
# Bottom Panel: Status and Progress
# ============================================

status_display = widgets.HTML("<b>Status:</b> Ready to analyze")
progress_bar = widgets.IntProgress(
    value=0,
    min=0,
    max=100,
    description='Progress:',
    bar_style='info',
    layout=Layout(width='100%')
)
info_display = widgets.HTML("")

bottom_panel = VBox([
    status_display,
    progress_bar,
    info_display
], layout=Layout(padding='10px', border='1px solid #ccc'))

# ============================================
# Helper Functions for MongoDB
# ============================================

def update_grid_dropdown(change=None):
    """Update grid dropdown when model or grid type changes."""
    global current_model_id
    
    model_id = model_dropdown.value
    grid_type = grid_type_filter.value
    
    if not model_id:
        grid_dropdown.options = [("━━━ Select Grid ━━━", None)]
        return
    
    current_model_id = model_id
    
    if not voxel_storage:
        grid_dropdown.options = [("━━━ MongoDB not available ━━━", None)]
        return
    
    try:
        # Get all grids for this model
        grids = voxel_storage.list_grids(model_id=model_id, limit=100)
        
        grid_options = [("━━━ Select Grid ━━━", None)]
        for grid in grids:
            metadata = grid.get('metadata', {})
            config_meta = metadata.get('configuration_metadata', {})
            if not config_meta:
                config_meta = metadata
            
            # Determine grid type
            is_fused = config_meta.get('fusion_applied', False)
            is_corrected = config_meta.get('correction_applied', False)
            is_processed = config_meta.get('processing_applied', False)
            has_signals = len(grid.get('available_signals', [])) > 0
            
            grid_type_match = False
            if grid_type == 'all':
                grid_type_match = True
            elif grid_type == 'fused' and is_fused:
                grid_type_match = True
            elif grid_type == 'corrected' and is_corrected:
                grid_type_match = True
            elif grid_type == 'processed' and is_processed:
                grid_type_match = True
            elif grid_type == 'signal_mapped' and has_signals and not is_corrected and not is_processed and not is_fused:
                grid_type_match = True
            elif grid_type == 'raw' and not has_signals:
                grid_type_match = True
            
            if grid_type_match:
                grid_id = grid.get('grid_id', str(grid.get('_id', '')))
                grid_name = grid.get('grid_name', 'Unknown')
                n_signals = len(grid.get('available_signals', []))
                
                # Build status label
                status_parts = []
                if is_fused:
                    status_parts.append('fused')
                if is_corrected:
                    status_parts.append('corrected')
                if is_processed:
                    status_parts.append('processed')
                if has_signals and not status_parts:
                    status_parts.append('mapped')
                if not status_parts:
                    status_parts.append('raw')
                
                status_str = ', '.join(status_parts)
                label = f"{grid_name} ({n_signals} signal(s), {status_str}) ({grid_id[:8]}...)"
                grid_options.append((label, grid_id))
        
        if len(grid_options) == 1:
            grid_options.append(("No grids found matching filter", None))
        
        grid_dropdown.options = grid_options
    except Exception as e:
        grid_dropdown.options = [("━━━ Error loading grids ━━━", None)]
        print(f"⚠️ Error loading grids: {e}")

def load_grid_from_mongodb(button):
    """Load selected grid from MongoDB."""
    global analysis_data, current_model_id, current_grid_id, loaded_grid_data, signal_arrays
    
    if not voxel_storage or not grid_dropdown.value:
        status_display.value = "<b>Status:</b> <span style='color: red;'>⚠️ Please select a grid to load</span>"
        return
    
    grid_id = grid_dropdown.value
    current_grid_id = grid_id
    
    status_display.value = "<b>Status:</b> Loading grid from MongoDB..."
    progress_bar.value = 0
    
    try:
        # Load grid from MongoDB
        grid_data = voxel_storage.load_voxel_grid(grid_id=grid_id)
        
        if not grid_data:
            status_display.value = "<b>Status:</b> <span style='color: red;'>⚠️ Failed to load grid</span>"
            return
        
        # Extract data from dictionary
        signal_arrays = grid_data.get('signal_arrays', {})
        metadata = grid_data.get('metadata', {})
        grid_name = grid_data.get('grid_name', 'Unknown')
        
        if not signal_arrays or len(signal_arrays) == 0:
            status_display.value = "<b>Status:</b> <span style='color: orange;'>⚠️ Grid has no signals</span>"
            signal_selector.options = []
            return
        
        # Store loaded data
        loaded_grid_data = {
            'grid_data': grid_data,
            'metadata': metadata,
            'signal_arrays': signal_arrays
        }
        
        # Update signal selector with available signals
        signal_options = [(name, name) for name in sorted(signal_arrays.keys())]
        signal_selector.options = signal_options
        if len(signal_options) > 0:
            # Select first few signals by default
            default_signals = signal_options[:min(4, len(signal_options))]
            signal_selector.value = tuple([s[1] for s in default_signals])
        
        progress_bar.value = 100
        status_display.value = f"<b>Status:</b> <span style='color: green;'>✅ Loaded grid: {grid_name} ({len(signal_arrays)} signal(s))</span>"
        
    except Exception as e:
        status_display.value = f"<b>Status:</b> <span style='color: red;'>❌ Error loading grid: {str(e)}</span>"
        progress_bar.value = 0
        import traceback
        traceback.print_exc()

# Function to update UI based on data source mode
def update_data_source_mode(change):
    """Show/hide MongoDB widgets based on data source mode."""
    if change['new'] == 'mongodb':
        model_dropdown.layout.display = 'flex'
        grid_type_filter.layout.display = 'flex'
        grid_dropdown.layout.display = 'flex'
        load_grid_button.layout.display = 'flex'
    else:
        model_dropdown.layout.display = 'none'
        grid_type_filter.layout.display = 'none'
        grid_dropdown.layout.display = 'none'
        load_grid_button.layout.display = 'none'

# Connect events
data_source_mode.observe(update_data_source_mode, names='value')
update_data_source_mode({'new': data_source_mode.value})
model_dropdown.observe(update_grid_dropdown, names='value')
grid_type_filter.observe(update_grid_dropdown, names='value')
load_grid_button.on_click(load_grid_from_mongodb)

# ============================================
# Analysis Functions
# ============================================

def execute_analysis(button):
    """Execute statistical analysis based on current settings."""
    global analysis_data, analysis_results, loaded_grid_data, signal_arrays
    
    status_display.value = "<b>Status:</b> Analyzing data..."
    progress_bar.value = 0
    info_display.value = ""
    
    try:
        # Load data from the selected source (MongoDB grid or generated sample data)
        if data_source_mode.value == 'mongodb':
            if not loaded_grid_data or not signal_arrays:
                status_display.value = "<b>Status:</b> <span style='color: red;'>⚠️ Please load a grid from MongoDB first</span>"
                return
            
            # Extract selected signals from loaded grid
            selected_signals = list(signal_selector.value)
            if not selected_signals:
                info_display.value = "<span style='color: orange;'>⚠️ Please select at least one signal</span>"
                return
            
            # Convert signal arrays to analysis format (flattened arrays)
            analysis_data = {}
            for signal_name in selected_signals:
                if signal_name in signal_arrays:
                    signal_array = signal_arrays[signal_name]
                    # Flatten 3D array to 1D for statistical analysis
                    analysis_data[signal_name] = signal_array.flatten()
                else:
                    info_display.value = f"<span style='color: orange;'>⚠️ Signal '{signal_name}' not found in grid</span>"
                    return
            
            # Add metadata for reference
            analysis_data['_metadata'] = loaded_grid_data.get('metadata', {})
            progress_bar.value = 20
        else:
            # Use sample data
            analysis_data = generate_sample_signals()
            selected_signals = list(signal_selector.value)
            if not selected_signals:
                info_display.value = "<span style='color: orange;'>⚠️ Please select at least one signal</span>"
                return
            progress_bar.value = 20
        
        analysis = analysis_type.value
        progress_bar.value = 40
        
        # Perform analysis based on type
        if analysis == 'descriptive':
            results = perform_descriptive_analysis(selected_signals)
        elif analysis == 'correlation':
            results = perform_correlation_analysis(selected_signals)
        elif analysis == 'trends':
            results = perform_trend_analysis(selected_signals)
        elif analysis == 'patterns':
            results = perform_pattern_analysis(selected_signals)
        elif analysis == 'multivariate':
            results = perform_multivariate_analysis(selected_signals)
        elif analysis == 'time_series':
            results = perform_timeseries_analysis(selected_signals)
        elif analysis == 'regression':
            results = perform_regression_analysis(selected_signals)
        else:
            results = {}
        
        analysis_results = results
        progress_bar.value = 80
        
        # Update displays
        update_results_display()
        update_visualization()
        
        progress_bar.value = 100
        status_display.value = "<b>Status:</b> <span style='color: green;'>✅ Analysis completed</span>"
        info_display.value = f"<p>Analysis type: <b>{analysis}</b> | Signals: <b>{', '.join(selected_signals)}</b></p>"
        
    except Exception as e:
        info_display.value = f"<span style='color: red;'>❌ Error: {str(e)}</span>"
        status_display.value = "<b>Status:</b> <span style='color: red;'>Error during analysis</span>"
        progress_bar.value = 0

def perform_descriptive_analysis(signals):
    """Perform descriptive statistics analysis."""
    results = {}
    for signal_name in signals:
        signal_data = analysis_data[signal_name]
        stats_dict = {}
        
        if stat_mean.value:
            stats_dict['mean'] = np.mean(signal_data)
        if stat_median.value:
            stats_dict['median'] = np.median(signal_data)
        if stat_std.value:
            # Population standard deviation (ddof=0, NumPy's default)
            stats_dict['std'] = np.std(signal_data)
        if stat_min.value:
            stats_dict['min'] = np.min(signal_data)
        if stat_max.value:
            stats_dict['max'] = np.max(signal_data)
        if stat_percentiles.value:
            # Parse a comma-separated list such as "25, 50, 75", skipping empty entries
            percentiles = [float(p) for p in percentile_values.value.split(',') if p.strip()]
            stats_dict['percentiles'] = {p: np.percentile(signal_data, p) for p in percentiles}
        if stat_skewness.value:
            stats_dict['skewness'] = stats.skew(signal_data)
        if stat_kurtosis.value:
            # scipy's default is excess (Fisher) kurtosis: 0 for a normal distribution
            stats_dict['kurtosis'] = stats.kurtosis(signal_data)
        
        results[signal_name] = stats_dict
    
    return results

def perform_correlation_analysis(signals):
    """Perform correlation analysis."""
    if len(signals) < 2:
        return {}
    
    results = {}
    corr_matrix = np.zeros((len(signals), len(signals)))
    
    for i, sig1 in enumerate(signals):
        for j, sig2 in enumerate(signals):
            data1 = analysis_data[sig1]
            data2 = analysis_data[sig2]
            
            if corr_method.value == 'pearson':
                corr, p_value = pearsonr(data1, data2)
            elif corr_method.value == 'spearman':
                corr, p_value = spearmanr(data1, data2)
            else:  # kendall
                corr, p_value = kendalltau(data1, data2)
            
            corr_matrix[i, j] = corr
            if i != j:
                results[f'{sig1}-{sig2}'] = {
                    'correlation': corr,
                    'p_value': p_value,
                    'significant': p_value < significance_level.value
                }
    
    results['matrix'] = corr_matrix
    results['signals'] = signals
    return results

def perform_trend_analysis(signals):
    """Perform trend analysis."""
    results = {}
    x = analysis_data['x']
    
    for signal_name in signals:
        signal_data = analysis_data[signal_name]
        
        if trend_method.value == 'linear':
            coeffs = np.polyfit(x, signal_data, 1)
            trend_line = np.polyval(coeffs, x)
            results[signal_name] = {
                'slope': coeffs[0],
                'intercept': coeffs[1],
                'trend_line': trend_line
            }
        elif trend_method.value == 'polynomial':
            coeffs = np.polyfit(x, signal_data, poly_degree.value)
            trend_line = np.polyval(coeffs, x)
            results[signal_name] = {
                'coefficients': coeffs,
                'trend_line': trend_line
            }
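        else:  # moving average
            # Cap the window at the signal length so the output stays aligned with x;
            # mode='same' zero-pads, so values near both ends are biased toward zero.
            window = min(50, len(signal_data))
            trend_line = np.convolve(signal_data, np.ones(window)/window, mode='same')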
            results[signal_name] = {
                'trend_line': trend_line
            }
    
    results['x'] = x
    return results

def perform_pattern_analysis(signals):
    """Perform pattern detection analysis."""
    results = {}
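    # Simplified placeholder: records which pattern checks were requested via the
    # checkboxes; no detection algorithm is run on the signal data here.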
    for signal_name in signals:
        signal_data = analysis_data[signal_name]
        results[signal_name] = {
            'has_clusters': pattern_clusters.value,
            'has_periodicity': pattern_periodicity.value,
            'has_anomalies': pattern_anomalies.value
        }
    return results

def perform_multivariate_analysis(signals):
    """Perform multivariate analysis."""
    from sklearn.decomposition import PCA
    from sklearn.cluster import KMeans
    
    # Prepare data matrix
    data_matrix = np.column_stack([analysis_data[s] for s in signals])
    
    if multivar_method.value == 'pca':
        pca = PCA(n_components=n_components.value)
        pca_result = pca.fit_transform(data_matrix)
        return {
            'method': 'PCA',
            'components': pca_result,
            'explained_variance': pca.explained_variance_ratio_,
            'n_components': n_components.value
        }
    elif multivar_method.value == 'clustering':
        kmeans = KMeans(n_clusters=n_clusters.value, random_state=42)
        clusters = kmeans.fit_predict(data_matrix)
        return {
            'method': 'Clustering',
            'clusters': clusters,
            'n_clusters': n_clusters.value
        }
    return {}

def perform_timeseries_analysis(signals):
    """Perform time series analysis."""
    results = {}
    x = analysis_data['x']
    
    for signal_name in signals:
        signal_data = analysis_data[signal_name]
        result = {}
        
        if ts_trend.value:
            coeffs = np.polyfit(x, signal_data, 1)
            result['trend_slope'] = coeffs[0]
        
        if ts_seasonality.value:
            # Placeholder: reports the user-supplied period; no seasonality is estimated
            result['seasonal_period'] = seasonal_period.value
        
        results[signal_name] = result
    
    return results

def perform_regression_analysis(signals):
    """Perform regression analysis."""
    if len(signals) < 1:
        return {}
    
    x = analysis_data['x']
    y = analysis_data[signals[0]]
    
    if regression_type.value == 'linear':
        coeffs = np.polyfit(x, y, 1)
        y_pred = np.polyval(coeffs, x)
        r2 = np.corrcoef(y, y_pred)[0, 1]**2
        return {
            'type': 'linear',
            'coefficients': coeffs,
            'r_squared': r2,
            'y_pred': y_pred
        }
    elif regression_type.value == 'polynomial':
        coeffs = np.polyfit(x, y, reg_poly_degree.value)
        y_pred = np.polyval(coeffs, x)
        r2 = np.corrcoef(y, y_pred)[0, 1]**2
        return {
            'type': 'polynomial',
            'degree': reg_poly_degree.value,
            'coefficients': coeffs,
            'r_squared': r2,
            'y_pred': y_pred
        }
    return {}

def update_results_display():
    """Update results displays."""
    global analysis_results
    
    if not analysis_results:
        return
    
    analysis = analysis_type.value
    
    # Statistical summary
    if analysis == 'descriptive':
        summary_html = "<table border='1' style='border-collapse: collapse; width: 100%;'><tr><th>Signal</th><th>Metric</th><th>Value</th></tr>"
        for signal, stats_dict in analysis_results.items():
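            for metric, value in stats_dict.items():
                if metric == 'percentiles':
                    # Percentiles are stored as a {percentile: value} dict; emit one row each
                    for p, p_val in value.items():
                        summary_html += f"<tr><td>{signal}</td><td>p{p:g}</td><td>{p_val:.4f}</td></tr>"
                else:
                    summary_html += f"<tr><td>{signal}</td><td>{metric}</td><td>{value:.4f}</td></tr>"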
        summary_html += "</table>"
        summary_display.value = summary_html
    
    # Correlation results
    if analysis == 'correlation':
        corr_html = "<table border='1' style='border-collapse: collapse; width: 100%;'><tr><th>Pair</th><th>Correlation</th><th>P-value</th><th>Significant</th></tr>"
        for key, value in analysis_results.items():
            if key not in ['matrix', 'signals']:
                sig = 'Yes' if value['significant'] else 'No'
                corr_html += f"<tr><td>{key}</td><td>{value['correlation']:.4f}</td><td>{value['p_value']:.4f}</td><td>{sig}</td></tr>"
        corr_html += "</table>"
        corr_results_display.value = corr_html
    
    # Trend results
    if analysis == 'trends':
        trend_html = "<ul>"
        for signal, result in analysis_results.items():
            if signal != 'x':
                if 'slope' in result:
                    trend_html += f"<li><b>{signal}:</b> Slope = {result['slope']:.4f}</li>"
        trend_html += "</ul>"
        trend_results_display.value = trend_html

def update_visualization():
    """Update visualization display."""
    global analysis_results, analysis_data
    
    with viz_output:
        clear_output(wait=True)
        
        if not analysis_results:
            display(HTML("<p>Execute analysis to see visualization</p>"))
            return
        
        analysis = analysis_type.value
        mode = viz_mode.value
        
        if analysis == 'descriptive' and mode == 'results':
            fig, axes = plt.subplots(2, 2, figsize=(12, 10))
            signals = list(signal_selector.value)
            
            for idx, signal_name in enumerate(signals[:4]):
                ax = axes[idx // 2, idx % 2]
                signal_data = analysis_data[signal_name]
                ax.hist(signal_data, bins=50, alpha=0.7, edgecolor='black')
                ax.set_title(f'{signal_name.capitalize()} Distribution')
                ax.set_xlabel('Value')
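                ax.set_ylabel('Frequency')
            
            # Hide unused panels when fewer than four signals are selected
            for unused_ax in axes.ravel()[min(len(signals), 4):]:
                unused_ax.set_visible(False)
            
            plt.tight_layout()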
            plt.show()
        
        elif analysis == 'correlation' and mode == 'results':
            if 'matrix' in analysis_results:
                fig, ax = plt.subplots(figsize=(8, 6))
                im = ax.imshow(analysis_results['matrix'], cmap='coolwarm', vmin=-1, vmax=1)
                signals = analysis_results['signals']
                ax.set_xticks(range(len(signals)))
                ax.set_yticks(range(len(signals)))
                ax.set_xticklabels(signals, rotation=45)
                ax.set_yticklabels(signals)
                ax.set_title('Correlation Matrix')
                plt.colorbar(im, ax=ax, label='Correlation')
                plt.tight_layout()
                plt.show()
        
        elif analysis == 'trends' and mode == 'results':
            fig, axes = plt.subplots(len(signal_selector.value), 1, figsize=(12, 4 * len(signal_selector.value)))
            if len(signal_selector.value) == 1:
                axes = [axes]
            
            x = analysis_results.get('x', analysis_data['x'])
            for idx, signal_name in enumerate(signal_selector.value):
                signal_data = analysis_data[signal_name]
                axes[idx].plot(x, signal_data, alpha=0.5, label='Data')
                if signal_name in analysis_results and 'trend_line' in analysis_results[signal_name]:
                    axes[idx].plot(x, analysis_results[signal_name]['trend_line'], 
                                  'r-', linewidth=2, label='Trend')
                axes[idx].set_title(f'{signal_name.capitalize()} Trend Analysis')
                axes[idx].set_xlabel('X')
                axes[idx].set_ylabel('Value')
                axes[idx].legend()
                axes[idx].grid(True, alpha=0.3)
            
            plt.tight_layout()
            plt.show()
        
        elif analysis == 'regression' and mode == 'results':
            if 'y_pred' in analysis_results:
                fig, ax = plt.subplots(figsize=(10, 6))
                signal_name = signal_selector.value[0]
                x = analysis_data['x']
                y = analysis_data[signal_name]
                ax.scatter(x, y, alpha=0.5, label='Data')
                ax.plot(x, analysis_results['y_pred'], 'r-', linewidth=2, 
                       label=f"Regression (R²={analysis_results['r_squared']:.3f})")
                ax.set_xlabel('X')
                ax.set_ylabel('Y')
                ax.set_title('Regression Analysis')
                ax.legend()
                ax.grid(True, alpha=0.3)
                plt.tight_layout()
                plt.show()
        
        elif mode == 'distribution':
            fig, axes = plt.subplots(1, len(signal_selector.value), figsize=(5 * len(signal_selector.value), 4))
            if len(signal_selector.value) == 1:
                axes = [axes]
            
            for idx, signal_name in enumerate(signal_selector.value):
                signal_data = analysis_data[signal_name]
                axes[idx].boxplot(signal_data)
                axes[idx].set_title(f'{signal_name.capitalize()} Box Plot')
                axes[idx].set_ylabel('Value')
            
            plt.tight_layout()
            plt.show()

# Connect events
execute_button.on_click(execute_analysis)
viz_mode.observe(lambda x: update_visualization(), names='value')
analysis_type.observe(lambda x: update_visualization(), names='value')

# ============================================
# Main Layout
# ============================================

main_layout = VBox([
    top_panel,
    HBox([left_panel, center_panel, right_panel]),
    bottom_panel
])

# Display the interface
display(main_layout)



## Summary

Congratulations! You've learned how to perform comprehensive statistical analysis on voxel data.

### Key Takeaways

1. **Descriptive Statistics**: Mean, median, std, percentiles, skewness, kurtosis
2. **Correlation Analysis**: Pearson, Spearman, Kendall correlation methods
3. **Trend Analysis**: Temporal, spatial, linear, polynomial, moving average trends
4. **Pattern Detection**: Clusters, periodicity, anomalies
5. **Multivariate Analysis**: PCA, clustering, dimensionality reduction
6. **Time Series Analysis**: Trends, seasonality, autocorrelation
7. **Regression Analysis**: Linear, polynomial, multiple regression
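
The first three takeaways can also be reproduced without the widgets. The block below is a minimal sketch on synthetic data, not the AM-QADF API: the helper names `summarize_signal` and `linear_trend` and the `temperature` and `power` signals are hypothetical and chosen only for illustration. It assumes NumPy and SciPy are installed.

```python
import numpy as np
from scipy import stats


def summarize_signal(values):
    """Return basic descriptive statistics for a 1-D signal."""
    values = np.asarray(values, dtype=float)
    return {
        "mean": float(np.mean(values)),
        "median": float(np.median(values)),
        "std": float(np.std(values, ddof=1)),       # sample standard deviation
        "skewness": float(stats.skew(values)),
        "kurtosis": float(stats.kurtosis(values)),  # excess kurtosis (normal = 0)
    }


def linear_trend(x, y):
    """Fit y = slope * x + intercept by least squares and report R^2."""
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return {
        "slope": float(slope),
        "intercept": float(intercept),
        "r_squared": float(1.0 - ss_res / ss_tot),
    }


# Synthetic signals (assumed for illustration; not taken from the notebook data)
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 200)
temperature = 2.0 * x + rng.normal(0.0, 1.0, x.size)
power = 0.5 * temperature + rng.normal(0.0, 1.0, x.size)

summary = summarize_signal(temperature)
rho, p_value = stats.spearmanr(temperature, power)  # rank-based correlation
trend = linear_trend(x, temperature)

print(summary)
print(f"Spearman rho = {rho:.3f} (p = {p_value:.3g})")
print(f"slope = {trend['slope']:.3f}, R^2 = {trend['r_squared']:.3f}")
```

Here R² is computed from the residual and total sums of squares. For a least-squares fit with an intercept this matches the squared correlation between the data and the fitted values, which is the form used in the notebook's regression code.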

### Next Steps

Proceed to:
- **10_Sensitivity_Analysis.ipynb** - Learn sensitivity analysis methods
- **11_Anomaly_Detection.ipynb** - Learn anomaly detection techniques

### Related Resources

- Statistical Analysis Documentation: `../docs/AM_QADF/05-modules/analytics.md`
- API Reference: `../docs/AM_QADF/06-api-reference/analytics-api.md`
- Examples: `../examples/`
