# Preprocessing and Data Cleaning

This notebook demonstrates how to preprocess and clean segmented X-ray computed tomography (XCT) data before analysis. You will learn how to:

- **Load segmented data** from CSV/Excel files
- **Filter objects** by volume, sphericity, spatial bounds, and aspect ratio
- **Remove edge objects** and artifacts
- **Analyze object properties** interactively
- **Fit statistical distributions** (Gaussian, Poisson)
- **Assess data quality** after filtering
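
To make the volume filter concrete, here is a minimal sketch of one way it can work. It assumes objects are the 6-connected components of a binary volume (SciPy's default labeling structure in 3D) and that an object's volume is its voxel count times the product of the voxel sizes. `filter_by_volume_sketch` is a hypothetical helper; the framework's `filter_by_volume` may use a different connectivity or return type.

```python
import numpy as np
from scipy import ndimage


def filter_by_volume_sketch(binary, voxel_size, min_volume=None, max_volume=None):
    """Keep connected objects whose physical volume lies within [min_volume, max_volume].

    Hypothetical helper, not the framework's filter_by_volume.
    voxel_size is (dx, dy, dz) in mm, so volumes are in mm^3.
    """
    # Label 6-connected foreground components (scipy's default 3D structure)
    labeled, num_objects = ndimage.label(binary > 0)
    voxel_volume = float(np.prod(voxel_size))
    # Voxel count per label; index 0 is the background
    counts = np.bincount(labeled.ravel(), minlength=num_objects + 1)
    volumes = counts * voxel_volume
    keep = np.ones(num_objects + 1, dtype=bool)
    keep[0] = False  # never keep the background
    if min_volume is not None:
        keep &= volumes >= min_volume
    if max_volume is not None:
        keep &= volumes <= max_volume
    # Look up each voxel's label in the keep table to build the filtered mask
    return keep[labeled], int(keep.sum())
```

For example, with 0.1 mm voxels a 2 x 2 x 2-voxel object has a volume of 0.008 mm³, so `min_volume=0.005` keeps it while discarding single-voxel specks.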

## 🎯 Learning Objectives

By the end of this notebook, you will be able to:
1. Load segmented data from various formats
2. Apply multiple filters to clean data
3. Analyze object properties (volume, sphericity, aspect ratio)
4. Fit statistical distributions to object properties
5. Evaluate filtering effectiveness
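
Sphericity and aspect ratio have several definitions. A minimal sketch, assuming Wadell's sphericity `psi = pi**(1/3) * (6*V)**(2/3) / A` (equal to 1 for a perfect sphere) and an aspect ratio defined as the square root of the ratio of the largest to smallest eigenvalue of the covariance of an object's voxel coordinates. Both helpers are hypothetical; the framework's `analyze_object_properties` may define these properties differently.

```python
import numpy as np


def sphericity(volume, surface_area):
    """Wadell sphericity: 1.0 for a perfect sphere, smaller for other shapes."""
    return (np.pi ** (1 / 3)) * (6 * volume) ** (2 / 3) / surface_area


def aspect_ratio(mask, voxel_size=(1.0, 1.0, 1.0)):
    """Ratio of the longest to shortest principal-axis spread of one object.

    Principal axes come from the eigenvalues of the covariance of the voxel
    coordinates (scaled to physical units). Assumes the object spans more than
    one voxel along every axis; otherwise the smallest eigenvalue is zero.
    """
    coords = np.argwhere(mask) * np.asarray(voxel_size, dtype=float)
    eigvals = np.linalg.eigvalsh(np.cov(coords, rowvar=False))
    # Eigenvalues are variances, so take the square root to compare lengths
    return float(np.sqrt(eigvals.max() / eigvals.min()))
```

A unit cube (V = 1, A = 6) gives a sphericity of about 0.806, and an elongated 2 x 2 x 8-voxel box gives an aspect ratio of sqrt(21), roughly 4.58.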

## ⚠️ Prerequisites

- **Notebook 01**: Basic understanding of loading and segmenting volumes
- **Required packages**: Same as Notebook 01
- **Segmented data**: CSV/Excel files with object coordinates or binary volumes
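
If your segmentation is stored as a table, it has to be rasterized back into a 3D array before objects can be labeled. A minimal sketch, assuming each row describes one foreground voxel, coordinates are non-negative, and the dashboard's default column names `x,y,z` and `value` are used. Coordinates are treated as voxel indices unless a voxel size in mm is supplied. `points_to_volume` is hypothetical and is not the framework's `load_segmented_data`.

```python
import numpy as np
import pandas as pd


def points_to_volume(df, coordinate_columns=("x", "y", "z"), value_column="value",
                     voxel_size=None):
    """Rasterize a table with one row per foreground voxel into a 3D array.

    Hypothetical helper. Coordinates are voxel indices, or mm positions that are
    converted to indices when voxel_size (mm per voxel) is given.
    Assumes all coordinates are non-negative.
    """
    coords = df[list(coordinate_columns)].to_numpy(dtype=float)
    if voxel_size is not None:
        coords = coords / np.asarray(voxel_size, dtype=float)
    # Round to the nearest voxel index (absorbs float error such as 2.9999...)
    idx = np.rint(coords).astype(int)
    if (idx < 0).any():
        raise ValueError("negative coordinates are not supported by this sketch")
    shape = tuple(idx.max(axis=0) + 1)
    # Use the value column if present, otherwise mark every listed voxel as 1
    if value_column is not None and value_column in df.columns:
        values = df[value_column].to_numpy()
    else:
        values = np.ones(len(df), dtype=np.uint8)
    volume = np.zeros(shape, dtype=values.dtype)
    volume[idx[:, 0], idx[:, 1], idx[:, 2]] = values
    return volume
```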

## 📖 Usage

1. Run all cells to initialize the widgets
2. Load segmented data (CSV/Excel or binary volume)
3. Configure filter parameters
4. Apply filters and view results
5. Analyze object properties
6. Fit distributions to filtered data
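
One of the filters you can apply in step 4 is edge removal. Objects cut off by the boundary of the scanned volume have truncated shapes, which biases their measured volume and sphericity. A minimal sketch, assuming an object counts as an edge object when any of its voxels lies within `edge_margin` voxels of a volume face; `remove_edge_objects_sketch` is hypothetical and not the framework's `remove_edge_objects`.

```python
import numpy as np
from scipy import ndimage


def remove_edge_objects_sketch(binary, edge_margin=1):
    """Remove every object with a voxel within edge_margin voxels of a volume face.

    Hypothetical helper, not the framework's remove_edge_objects.
    """
    labeled, _ = ndimage.label(binary > 0)
    # Mark the outer shell of thickness edge_margin on every face
    shell = np.ones(labeled.shape, dtype=bool)
    interior = tuple(slice(edge_margin, max(n - edge_margin, edge_margin)) for n in labeled.shape)
    shell[interior] = False
    # Labels that appear anywhere in the shell touch the edge
    touching = np.unique(labeled[shell])
    touching = touching[touching != 0]  # ignore the background label
    return (labeled > 0) & ~np.isin(labeled, touching)
```

With `edge_margin=0` the shell is empty and nothing is removed; larger margins discard objects that come close to, but do not touch, the boundary.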


## 1. Setup and Imports


In [1]:
# Standard imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
import sys
import warnings
from typing import Dict, List, Optional, Tuple, Any

warnings.filterwarnings('ignore')

# Set style
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

# Check for ipywidgets
try:
    import ipywidgets as widgets
    from ipywidgets import HBox, VBox, Output, Tab, interactive
    from IPython.display import display, clear_output, HTML
    WIDGETS_AVAILABLE = True
except ImportError:
    WIDGETS_AVAILABLE = False
    print("❌ ipywidgets not available!")
    print("   Install with: pip install ipywidgets")

# Find project root: one level up when running from notebooks/, otherwise the current directory
current_dir = Path().resolve()
project_root = current_dir.parent if current_dir.name == 'notebooks' else current_dir

# Add to path
sys.path.insert(0, str(project_root))
sys.path.insert(0, str(project_root / 'src'))

print("📦 Preprocessing and Data Cleaning")
print(f"   Project root: {project_root}")
print(f"   Widgets available: {WIDGETS_AVAILABLE}")


📦 Preprocessing and Data Cleaning
   Project root: /mnt/c/Users/kanha/Independent_Research/pbf-lbm-nosql-data-warehouse/XCT_Thermomagnetic_Analysis
   Widgets available: True


## 2. Load Framework Modules


In [None]:
# Load preprocessing modules
try:
    from src.analyzer import XCTAnalyzer
    from src.preprocessing.preprocessing import (
        filter_by_volume, filter_by_sphericity, filter_by_spatial_bounds,
        filter_by_aspect_ratio, remove_edge_objects, apply_filters,
        analyze_object_properties
    )
    from src.preprocessing.statistics import (
        fit_gaussian, fit_poisson, compare_fits, evaluate_fit_quality
    )
    from src.utils.utils import load_volume, load_segmented_data, normalize_path
    from scipy import ndimage
    
    print("✅ All modules loaded successfully")
except ImportError as e:
    print(f"❌ Error loading modules: {e}")
    import traceback
    traceback.print_exc()
    raise


✅ All modules loaded successfully


## 3. Interactive Preprocessing Dashboard

Use the interactive widgets below to load, filter, and analyze segmented data.
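
The spatial bounds fields in the dashboard take ranges in mm. One plausible rule, sketched below, keeps an object when its centroid falls inside all three ranges. The helper name, the centroid rule, inclusive bounds, and the mapping of array axes 0, 1, 2 to x, y, z are all assumptions, not the documented behaviour of `filter_by_spatial_bounds`.

```python
import numpy as np
from scipy import ndimage


def filter_by_centroid_bounds(binary, voxel_size, x_range, y_range, z_range):
    """Keep objects whose centroid (in mm) lies inside all three (min, max) ranges.

    Hypothetical helper: assumes array axes 0, 1, 2 are x, y, z and that the
    bounds are inclusive.
    """
    foreground = (binary > 0).astype(float)
    labeled, num_objects = ndimage.label(foreground)
    if num_objects == 0:
        return np.zeros(binary.shape, dtype=bool)
    labels = np.arange(1, num_objects + 1)
    # One (i, j, k) centroid per label, in voxel units, then converted to mm
    centroids = np.array(ndimage.center_of_mass(foreground, labeled, labels))
    centroids_mm = centroids * np.asarray(voxel_size, dtype=float)
    lo = np.array([x_range[0], y_range[0], z_range[0]], dtype=float)
    hi = np.array([x_range[1], y_range[1], z_range[1]], dtype=float)
    inside = np.all((centroids_mm >= lo) & (centroids_mm <= hi), axis=1)
    return np.isin(labeled, labels[inside])
```

A centroid rule keeps objects that straddle a bound as long as their centre is inside; a stricter variant would require every voxel to be inside.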


In [3]:
if not WIDGETS_AVAILABLE:
    print("❌ Cannot create widgets - ipywidgets not available")
else:
    print("🎨 Creating interactive widgets...")
    
    # Initialize state
    analyzer = None
    current_volume = None
    filtered_volume = None
    object_properties = []
    filter_stats = {}
    
    # ============================================
    # Section 1: Data Loading
    # ============================================
    
    data_source = widgets.Dropdown(
        options=['CSV/Excel File', 'Binary Volume File'],
        value='CSV/Excel File',
        description='Data Source:',
        style={'description_width': 'initial'}
    )
    
    file_path_text = widgets.Text(
        value='',
        placeholder='Enter file path (e.g., ../data/segmented_data/Sample_01_segmented.csv)',
        description='File Path:',
        style={'description_width': 'initial'},
        layout=widgets.Layout(width='500px')
    )
    
    # For CSV/Excel files
    coord_columns_text = widgets.Text(
        value='x,y,z',
        placeholder='x,y,z or x_col,y_col,z_col',
        description='Coordinate Columns:',
        style={'description_width': 'initial'},
        disabled=False
    )
    
    value_column_text = widgets.Text(
        value='value',
        placeholder='value or intensity',
        description='Value Column:',
        style={'description_width': 'initial'},
        disabled=False
    )
    
    # For binary volumes
    file_format_dropdown = widgets.Dropdown(
        options=['Auto-detect', 'DICOM', 'TIFF', 'RAW', 'NIfTI'],
        value='Auto-detect',
        description='Format:',
        style={'description_width': 'initial'},
        disabled=True
    )
    
    # Voxel size
    voxel_size_x = widgets.FloatText(value=0.1, description='Voxel X (mm):', style={'description_width': 'initial'})
    voxel_size_y = widgets.FloatText(value=0.1, description='Voxel Y (mm):', style={'description_width': 'initial'})
    voxel_size_z = widgets.FloatText(value=0.1, description='Voxel Z (mm):', style={'description_width': 'initial'})
    
    load_button = widgets.Button(
        description='üìÇ Load Data',
        button_style='primary',
        layout=widgets.Layout(width='150px', height='40px')
    )
    
    data_info_display = widgets.HTML(
        value="<p><i>No data loaded</i></p>",
        layout=widgets.Layout(height='120px', overflow='auto')
    )
    
    # ============================================
    # Section 2: Filtering Controls
    # ============================================
    
    # Volume filter
    filter_volume_check = widgets.Checkbox(
        value=False,
        description='Filter by Volume',
        indent=False
    )
    min_volume = widgets.FloatText(
        value=0.0,
        description='Min Volume (mm³):',
        style={'description_width': 'initial'},
        disabled=True
    )
    max_volume = widgets.FloatText(
        value=1000.0,
        description='Max Volume (mm³):',
        style={'description_width': 'initial'},
        disabled=True
    )
    
    # Sphericity filter
    filter_sphericity_check = widgets.Checkbox(
        value=False,
        description='Filter by Sphericity',
        indent=False
    )
    min_sphericity = widgets.FloatSlider(
        value=0.0,
        min=0.0,
        max=1.0,
        step=0.01,
        description='Min Sphericity:',
        style={'description_width': 'initial'},
        disabled=True
    )
    max_sphericity = widgets.FloatSlider(
        value=1.0,
        min=0.0,
        max=1.0,
        step=0.01,
        description='Max Sphericity:',
        style={'description_width': 'initial'},
        disabled=True
    )
    
    # Spatial bounds filter
    filter_spatial_check = widgets.Checkbox(
        value=False,
        description='Filter by Spatial Bounds',
        indent=False
    )
    x_min = widgets.FloatText(value=0.0, description='X Min (mm):', style={'description_width': 'initial'}, disabled=True)
    x_max = widgets.FloatText(value=100.0, description='X Max (mm):', style={'description_width': 'initial'}, disabled=True)
    y_min = widgets.FloatText(value=0.0, description='Y Min (mm):', style={'description_width': 'initial'}, disabled=True)
    y_max = widgets.FloatText(value=100.0, description='Y Max (mm):', style={'description_width': 'initial'}, disabled=True)
    z_min = widgets.FloatText(value=0.0, description='Z Min (mm):', style={'description_width': 'initial'}, disabled=True)
    z_max = widgets.FloatText(value=100.0, description='Z Max (mm):', style={'description_width': 'initial'}, disabled=True)
    
    # Aspect ratio filter
    filter_aspect_ratio_check = widgets.Checkbox(
        value=False,
        description='Filter by Aspect Ratio',
        indent=False
    )
    max_aspect_ratio = widgets.FloatText(
        value=10.0,
        description='Max Aspect Ratio:',
        style={'description_width': 'initial'},
        disabled=True
    )
    
    # Edge removal
    remove_edge_check = widgets.Checkbox(
        value=False,
        description='Remove Edge Objects',
        indent=False
    )
    edge_margin = widgets.IntText(
        value=1,
        description='Edge Margin (voxels):',
        style={'description_width': 'initial'},
        disabled=True
    )
    
    apply_filters_button = widgets.Button(
        description='üîß Apply Filters',
        button_style='success',
        layout=widgets.Layout(width='150px', height='40px')
    )
    
    filter_stats_display = widgets.HTML(
        value="<p><i>No filters applied</i></p>",
        layout=widgets.Layout(height='150px', overflow='auto')
    )
    
    # ============================================
    # Section 3: Object Properties
    # ============================================
    
    analyze_properties_button = widgets.Button(
        description='üìä Analyze Properties',
        button_style='info',
        layout=widgets.Layout(width='150px')
    )
    
    properties_output = Output(layout=widgets.Layout(height='400px', overflow='auto'))
    
    # ============================================
    # Section 4: Statistical Fitting
    # ============================================
    
    variable_to_fit = widgets.Dropdown(
        options=['Select variable...'],
        value='Select variable...',
        description='Variable:',
        style={'description_width': 'initial'},
        disabled=True
    )
    
    distribution_type = widgets.Dropdown(
        options=['Gaussian', 'Poisson', 'Compare All'],
        value='Gaussian',
        description='Distribution:',
        style={'description_width': 'initial'},
        disabled=True
    )
    
    fit_button = widgets.Button(
        description='üìà Fit Distribution',
        button_style='warning',
        layout=widgets.Layout(width='150px'),
        disabled=True
    )
    
    fitting_results_output = Output(layout=widgets.Layout(height='500px', overflow='auto'))
    
    # ============================================
    # Progress and Status
    # ============================================
    
    progress_bar = widgets.IntProgress(
        value=0,
        min=0,
        max=100,
        description='Progress:',
        style={'bar_color': '#2ecc71'},
        layout=widgets.Layout(width='400px')
    )
    
    status_display = widgets.HTML(
        value="<p>Ready</p>",
        layout=widgets.Layout(height='60px', overflow='auto')
    )
    
    print("✅ Widgets created successfully!")


🎨 Creating interactive widgets...
✅ Widgets created successfully!


## 4. Define and Attach Callbacks


In [4]:
if WIDGETS_AVAILABLE:
    
    def update_data_source(change):
        """Enable/disable widgets based on data source"""
        if change['new'] == 'CSV/Excel File':
            coord_columns_text.disabled = False
            value_column_text.disabled = False
            file_format_dropdown.disabled = True
        else:
            coord_columns_text.disabled = True
            value_column_text.disabled = True
            file_format_dropdown.disabled = False
    
    def update_volume_filter(change):
        """Enable/disable volume filter controls"""
        if change['new']:
            min_volume.disabled = False
            max_volume.disabled = False
        else:
            min_volume.disabled = True
            max_volume.disabled = True
    
    def update_sphericity_filter(change):
        """Enable/disable sphericity filter controls"""
        if change['new']:
            min_sphericity.disabled = False
            max_sphericity.disabled = False
        else:
            min_sphericity.disabled = True
            max_sphericity.disabled = True
    
    def update_spatial_filter(change):
        """Enable/disable spatial filter controls"""
        if change['new']:
            x_min.disabled = False
            x_max.disabled = False
            y_min.disabled = False
            y_max.disabled = False
            z_min.disabled = False
            z_max.disabled = False
        else:
            x_min.disabled = True
            x_max.disabled = True
            y_min.disabled = True
            y_max.disabled = True
            z_min.disabled = True
            z_max.disabled = True
    
    def update_aspect_ratio_filter(change):
        """Enable/disable aspect ratio filter controls"""
        if change['new']:
            max_aspect_ratio.disabled = False
        else:
            max_aspect_ratio.disabled = True
    
    def update_edge_removal(change):
        """Enable/disable edge margin control"""
        if change['new']:
            edge_margin.disabled = False
        else:
            edge_margin.disabled = True
    
    def load_data_callback(button):
        """Load segmented data"""
        global analyzer, current_volume, filtered_volume
        
        file_path = file_path_text.value.strip()
        if not file_path:
            status_display.value = "<p style='color: red;'>Please enter a file path</p>"
            return
        
        file_path_obj = Path(file_path)
        if not file_path_obj.exists():
            data_path = project_root / 'data' / file_path
            if data_path.exists():
                file_path_obj = data_path
            else:
                status_display.value = f"<p style='color: red;'>File not found: {file_path}</p>"
                return
        
        status_display.value = "<p>Loading data...</p>"
        progress_bar.value = 20
        
        try:
            voxel_size = (float(voxel_size_x.value), float(voxel_size_y.value), float(voxel_size_z.value))
            analyzer = XCTAnalyzer(voxel_size=voxel_size, target_unit='mm')
            progress_bar.value = 40
            
            if data_source.value == 'CSV/Excel File':
                # Load CSV/Excel
                coord_cols = [c.strip() for c in coord_columns_text.value.split(',')]
                value_col = value_column_text.value.strip()
                
                current_volume, metadata = load_segmented_data(
                    str(file_path_obj),
                    coordinate_columns=coord_cols if len(coord_cols) == 3 else None,
                    value_column=value_col if value_col else None,
                    voxel_size=voxel_size
                )
            else:
                # Load binary volume
                analyzer.load_volume(str(file_path_obj), normalize=True)
                current_volume = analyzer.volume
            
            filtered_volume = None
            progress_bar.value = 80
            
            # Count objects
            labeled, num_objects = ndimage.label(current_volume > 0)
            
            info_html = f"""
            <h4>Data Information</h4>
            <p><b>Shape:</b> {current_volume.shape}</p>
            <p><b>Data Type:</b> {current_volume.dtype}</p>
            <p><b>Voxel Size:</b> {voxel_size} mm</p>
            <p><b>Number of Objects:</b> {num_objects}</p>
            <p><b>Volume Size:</b> {current_volume.nbytes / (1024**2):.2f} MB</p>
            """
            data_info_display.value = info_html
            
            progress_bar.value = 100
            status_display.value = "<p style='color: green;'>✅ Data loaded successfully!</p>"
            
        except Exception as e:
            status_display.value = f"<p style='color: red;'>Error loading data: {e}</p>"
            progress_bar.value = 0
            import traceback
            traceback.print_exc()
    
    def apply_filters_callback(button):
        """Apply preprocessing filters"""
        global analyzer, current_volume, filtered_volume, filter_stats
        
        if current_volume is None:
            status_display.value = "<p style='color: red;'>Please load data first</p>"
            return
        
        status_display.value = "<p>Applying filters...</p>"
        progress_bar.value = 20
        
        try:
            filters = {}
            
            if filter_volume_check.value:
                filters['min_volume'] = float(min_volume.value) if min_volume.value > 0 else None
                filters['max_volume'] = float(max_volume.value) if max_volume.value < 1e10 else None
            
            if filter_sphericity_check.value:
                filters['min_sphericity'] = float(min_sphericity.value)
                filters['max_sphericity'] = float(max_sphericity.value)
            
            if filter_spatial_check.value:
                filters['x_range'] = (float(x_min.value), float(x_max.value))
                filters['y_range'] = (float(y_min.value), float(y_max.value))
                filters['z_range'] = (float(z_min.value), float(z_max.value))
            
            if filter_aspect_ratio_check.value:
                filters['max_aspect_ratio'] = float(max_aspect_ratio.value)
            
            if remove_edge_check.value:
                filters['remove_edge_objects'] = True
                filters['edge_margin'] = int(edge_margin.value)
            
            if not filters:
                status_display.value = "<p style='color: orange;'>No filters selected</p>"
                progress_bar.value = 0
                return
            
            progress_bar.value = 40
            
            # Apply filters
            voxel_size = (float(voxel_size_x.value), float(voxel_size_y.value), float(voxel_size_z.value))
            filtered_volume, filter_stats = apply_filters(current_volume, voxel_size, filters)
            
            progress_bar.value = 80
            
            # Update display (guard against division by zero when no objects were found)
            initial = filter_stats['initial_objects']
            removal_rate = filter_stats['removed_objects'] / initial * 100 if initial else 0.0
            stats_html = f"""
            <h4>Filtering Statistics</h4>
            <p><b>Initial Objects:</b> {initial}</p>
            <p><b>Final Objects:</b> {filter_stats['final_objects']}</p>
            <p><b>Removed Objects:</b> {filter_stats['removed_objects']}</p>
            <p><b>Removal Rate:</b> {removal_rate:.1f}%</p>
            <p><b>Filters Applied:</b> {', '.join(filter_stats['filters_applied'])}</p>
            """
            filter_stats_display.value = stats_html
            
            progress_bar.value = 100
            status_display.value = f"<p style='color: green;'>✅ Filters applied! Removed {filter_stats['removed_objects']} objects</p>"
            
        except Exception as e:
            status_display.value = f"<p style='color: red;'>Error applying filters: {e}</p>"
            progress_bar.value = 0
            import traceback
            traceback.print_exc()
    
    def analyze_properties_callback(button):
        """Analyze object properties"""
        global object_properties, filtered_volume, current_volume
        
        volume_to_analyze = filtered_volume if filtered_volume is not None else current_volume
        if volume_to_analyze is None:
            status_display.value = "<p style='color: red;'>Please load data first</p>"
            return
        
        status_display.value = "<p>Analyzing object properties...</p>"
        progress_bar.value = 20
        
        try:
            voxel_size = (float(voxel_size_x.value), float(voxel_size_y.value), float(voxel_size_z.value))
            object_properties = analyze_object_properties(volume_to_analyze, voxel_size)
            progress_bar.value = 80
            
            with properties_output:
                clear_output()
                if object_properties:
                    df = pd.DataFrame(object_properties)
                    print(f"📊 Object Properties ({len(df)} objects):")
                    print("=" * 80)
                    display(df.head(50))  # Show first 50 objects
                    if len(df) > 50:
                        print(f"\n... and {len(df) - 50} more objects")
                    
                    # Summary statistics
                    print("\n📈 Summary Statistics:")
                    print("=" * 80)
                    numeric_cols = df.select_dtypes(include=[np.number]).columns
                    display(df[numeric_cols].describe())
                    
                    # Update variable options for fitting
                    options = ['Select variable...']
                    for col in numeric_cols:
                        if col not in ['object_id']:
                            options.append(col)
                    variable_to_fit.options = options
                    variable_to_fit.disabled = False
                    distribution_type.disabled = False
                    fit_button.disabled = False
                else:
                    print("No objects found")
            
            progress_bar.value = 100
            status_display.value = "<p style='color: green;'>✅ Properties analyzed!</p>"
            
        except Exception as e:
            status_display.value = f"<p style='color: red;'>Error analyzing properties: {e}</p>"
            progress_bar.value = 0
            import traceback
            traceback.print_exc()
    
    def fit_distribution_callback(button):
        """Fit distribution to selected variable"""
        global object_properties
        
        if variable_to_fit.value == 'Select variable...' or not object_properties:
            status_display.value = "<p style='color: red;'>Please analyze properties first and select a variable</p>"
            return
        
        status_display.value = "<p>Fitting distribution...</p>"
        progress_bar.value = 20
        
        try:
            df = pd.DataFrame(object_properties)
            data = df[variable_to_fit.value].values
            data_clean = data[np.isfinite(data)]
            
            if len(data_clean) == 0:
                status_display.value = "<p style='color: red;'>No valid data points</p>"
                progress_bar.value = 0
                return
            
            progress_bar.value = 50
            
            # Perform fitting
            if distribution_type.value == 'Gaussian':
                fit_result = fit_gaussian(data_clean)
            elif distribution_type.value == 'Poisson':
                fit_result = fit_poisson(data_clean)
            elif distribution_type.value == 'Compare All':
                fit_result = compare_fits(data_clean, distributions=['gaussian', 'poisson'])
            
            progress_bar.value = 80
            
            # Display results
            with fitting_results_output:
                clear_output()
                print(f"📈 Distribution Fit: {variable_to_fit.value}")
                print("=" * 80)
                
                if distribution_type.value == 'Compare All' and isinstance(fit_result, dict) and 'best_fit' in fit_result:
                    print(f"\n✅ Best Fit: {fit_result.get('best_distribution', 'N/A')}")
                    best = fit_result['best_fit']
                    print("\nBest Fit Parameters:")
                    for key, value in best.items():
                        if key not in ['fitted', 'n_samples', 'distribution'] and isinstance(value, (int, float)):
                            print(f"  {key}: {value:.6f}" if isinstance(value, float) else f"  {key}: {value}")
                    
                    # Plot comparison
                    fig, axes = plt.subplots(1, 2, figsize=(14, 5))
                    
                    # Histogram with fits
                    # At least one bin, even for very small samples (bins=0 raises an error)
                    n_bins = max(1, min(50, len(data_clean) // 5))
                    axes[0].hist(data_clean, bins=n_bins, density=True,
                                 alpha=0.7, color='skyblue', edgecolor='black', label='Data')
                    
                    # Plot fitted distributions; only the Gaussian is overlaid as a curve.
                    # Import here so scipy_stats is bound before use (it is a local name in this function).
                    from scipy import stats as scipy_stats
                    x_range = np.linspace(data_clean.min(), data_clean.max(), 200)
                    if 'fits' in fit_result:
                        for dist_name, dist_fit in fit_result['fits'].items():
                            if dist_fit.get('fitted', False):
                                if dist_name == 'gaussian':
                                    y = scipy_stats.norm.pdf(x_range, dist_fit['mean'], dist_fit['std'])
                                    axes[0].plot(x_range, y, label=f"Gaussian (R²={dist_fit.get('r_squared', 0):.3f})", linewidth=2)
                    
                    axes[0].set_xlabel(variable_to_fit.value)
                    axes[0].set_ylabel('Density')
                    axes[0].set_title(f'Distribution Fits: {variable_to_fit.value}')
                    axes[0].legend()
                    axes[0].grid(True, alpha=0.3)
                    
                    # Q-Q plot
                    from scipy import stats as scipy_stats
                    scipy_stats.probplot(data_clean, dist='norm', plot=axes[1])
                    axes[1].set_title('Q-Q Plot (Gaussian)')
                    axes[1].grid(True, alpha=0.3)
                    
                    plt.tight_layout()
                    plt.show()
                    
                else:
                    # Single distribution fit
                    print("\n✅ Fit Parameters:")
                    for key, value in fit_result.items():
                        if key not in ['fitted', 'n_samples', 'distribution', 'params'] and isinstance(value, (int, float)):
                            print(f"  {key}: {value:.6f}" if isinstance(value, float) else f"  {key}: {value}")
                    
                    # Evaluate quality
                    quality = evaluate_fit_quality(fit_result, data_clean)
                    print(f"\n📊 Fit Quality: {quality.get('interpretation', 'N/A')}")
                    
                    # Plot fit
                    fig, axes = plt.subplots(1, 2, figsize=(14, 5))
                    
                    # Histogram with fit
                    n_bins = max(1, min(50, len(data_clean) // 5))  # at least one bin
                    hist, bins, _ = axes[0].hist(data_clean, bins=n_bins,
                                                 density=True, alpha=0.7, color='skyblue',
                                                 edgecolor='black', label='Data')
                    
                    # Overlay fitted distribution
                    x_range = np.linspace(data_clean.min(), data_clean.max(), 200)
                    if fit_result.get('distribution') == 'gaussian':
                        from scipy import stats as scipy_stats
                        y = scipy_stats.norm.pdf(x_range, fit_result['mean'], fit_result['std'])
                        axes[0].plot(x_range, y, 'r-', linewidth=2,
                                     label=f"Gaussian Fit (R²={fit_result.get('r_squared', 0):.3f})")
                    
                    axes[0].set_xlabel(variable_to_fit.value)
                    axes[0].set_ylabel('Density')
                    axes[0].set_title(f'{distribution_type.value} Fit: {variable_to_fit.value}')
                    axes[0].legend()
                    axes[0].grid(True, alpha=0.3)
                    
                    # Q-Q plot
                    from scipy import stats as scipy_stats
                    scipy_stats.probplot(data_clean, dist='norm', plot=axes[1])
                    axes[1].set_title('Q-Q Plot')
                    axes[1].grid(True, alpha=0.3)
                    
                    plt.tight_layout()
                    plt.show()
            
            progress_bar.value = 100
            status_display.value = "<p style='color: green;'>✅ Distribution fit complete!</p>"
            
        except Exception as e:
            status_display.value = f"<p style='color: red;'>Error fitting distribution: {e}</p>"
            progress_bar.value = 0
            import traceback
            traceback.print_exc()
    
    # Attach callbacks
    data_source.observe(update_data_source, names='value')
    filter_volume_check.observe(update_volume_filter, names='value')
    filter_sphericity_check.observe(update_sphericity_filter, names='value')
    filter_spatial_check.observe(update_spatial_filter, names='value')
    filter_aspect_ratio_check.observe(update_aspect_ratio_filter, names='value')
    remove_edge_check.observe(update_edge_removal, names='value')
    load_button.on_click(load_data_callback)
    apply_filters_button.on_click(apply_filters_callback)
    analyze_properties_button.on_click(analyze_properties_callback)
    fit_button.on_click(fit_distribution_callback)
    
    print("✅ Callback functions attached!")


✅ Callback functions attached!


## 5. Display Interactive Dashboard


In [5]:
if WIDGETS_AVAILABLE:
    
    # Create data loading panel
    loading_panel = widgets.VBox([
        widgets.HTML("<h2>📂 Load Segmented Data</h2>"),
        data_source,
        file_path_text,
        HBox([
            coord_columns_text,
            value_column_text
        ]),
        file_format_dropdown,
        HBox([
            widgets.HTML("<b>Voxel Size:</b>"),
            voxel_size_x,
            voxel_size_y,
            voxel_size_z
        ]),
        HBox([load_button, data_info_display])
    ])
    
    # Create filtering panel
    filtering_panel = widgets.VBox([
        widgets.HTML("<h3>🔧 Filtering Options</h3>"),
        widgets.HTML("<b>Volume Filter:</b>"),
        filter_volume_check,
        HBox([min_volume, max_volume]),
        widgets.HTML("<b>Sphericity Filter:</b>"),
        filter_sphericity_check,
        HBox([min_sphericity, max_sphericity]),
        widgets.HTML("<b>Spatial Bounds Filter:</b>"),
        filter_spatial_check,
        HBox([x_min, x_max]),
        HBox([y_min, y_max]),
        HBox([z_min, z_max]),
        widgets.HTML("<b>Aspect Ratio Filter:</b>"),
        filter_aspect_ratio_check,
        max_aspect_ratio,
        widgets.HTML("<b>Edge Removal:</b>"),
        remove_edge_check,
        edge_margin,
        apply_filters_button,
        filter_stats_display
    ])
    
    # Create properties panel
    properties_panel = widgets.VBox([
        widgets.HTML("<h3>📊 Object Properties</h3>"),
        analyze_properties_button,
        properties_output
    ])
    
    # Create statistical fitting panel
    fitting_panel = widgets.VBox([
        widgets.HTML("<h3>📈 Statistical Fitting</h3>"),
        variable_to_fit,
        distribution_type,
        fit_button,
        fitting_results_output
    ])
    
    # Create tabs
    results_tabs = Tab(children=[filtering_panel, properties_panel, fitting_panel])
    results_tabs.set_title(0, 'üîß Filtering')
    results_tabs.set_title(1, 'üìä Properties')
    results_tabs.set_title(2, 'üìà Statistics')
    
    # Create main dashboard
    dashboard = widgets.VBox([
        widgets.HTML("<h1>🔧 Preprocessing and Data Cleaning</h1>"),
        loading_panel,
        widgets.HTML("<hr>"),
        widgets.HTML("<h2>📊 Analysis</h2>"),
        results_tabs,
        widgets.HTML("<hr>"),
        progress_bar,
        status_display
    ])
    
    # Display the dashboard
    display(dashboard)
    print("\n✅ Dashboard displayed! Start preprocessing your data.")
    print("\n💡 Tips:")
    print("   1. Load segmented data (CSV/Excel or binary volume)")
    print("   2. Configure filter parameters")
    print("   3. Apply filters to clean data")
    print("   4. Analyze object properties")
    print("   5. Fit distributions to understand data characteristics")
    
else:
    print("❌ Cannot display dashboard - ipywidgets not available")


VBox(children=(HTML(value='<h1>üîß Preprocessing and Data Cleaning</h1>'), VBox(children=(HTML(value='<h2>üìÇ Load‚Ä¶


✅ Dashboard displayed! Start preprocessing your data.

💡 Tips:
   1. Load segmented data (CSV/Excel or binary volume)
   2. Configure filter parameters
   3. Apply filters to clean data
   4. Analyze object properties
   5. Fit distributions to understand data characteristics


## 6. Summary

### What We Learned

1. **Loading Segmented Data**:
   - CSV/Excel files with coordinate columns
   - Binary volume files
   - Automatic object counting via connected-component labeling

2. **Filtering Objects**:
   - Volume filtering (min/max)
   - Sphericity filtering
   - Spatial bounds filtering
   - Aspect ratio filtering
   - Edge object removal

3. **Object Property Analysis**:
   - Volume, surface area, sphericity
   - Aspect ratios
   - Spatial coordinates
   - Comprehensive property tables

4. **Statistical Fitting**:
   - Gaussian distribution fitting
   - Poisson distribution fitting
   - Distribution comparison
   - Goodness of fit evaluation
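
The Gaussian-versus-Poisson comparison can be sketched with `scipy.stats`. This hypothetical helper (not the framework's `compare_fits`) fits each model by maximum likelihood and picks the one with the lower Akaike information criterion, AIC = 2k - 2 ln L. It only tries the Poisson model on non-negative integer data, and comparing a Gaussian density with a Poisson probability mass is a rough approximation that is reasonable only for unit-spaced integer data such as counts.

```python
import numpy as np
from scipy import stats


def compare_gaussian_poisson(data):
    """Fit Gaussian and (where valid) Poisson models by MLE; return the lower-AIC name.

    Hypothetical helper, not the framework's compare_fits.
    """
    data = np.asarray(data, dtype=float)
    results = {}
    # Gaussian MLE: sample mean and biased (ddof=0) standard deviation
    mu, sigma = stats.norm.fit(data)
    ll = stats.norm.logpdf(data, mu, sigma).sum()
    results["gaussian"] = {"mean": mu, "std": sigma, "aic": 2 * 2 - 2 * ll}  # k = 2
    # Poisson is only defined for non-negative integers (e.g. object counts)
    if np.all(data >= 0) and np.all(data == np.round(data)):
        lam = data.mean()  # the Poisson MLE is the sample mean
        ll = stats.poisson.logpmf(data.astype(int), lam).sum()
        results["poisson"] = {"lambda": lam, "aic": 2 * 1 - 2 * ll}  # k = 1
    best = min(results, key=lambda name: results[name]["aic"])
    return best, results
```

For skewed, low-mean count data the Poisson model usually wins; continuous measurements such as volumes in mm³ only ever get the Gaussian fit here.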

### Next Steps

- **Notebook 03**: Core Analysis - Morphology and Porosity
  - Filament diameter estimation
  - Channel width analysis
  - Porosity distribution

- **Notebook 04**: Experimental Analysis
  - Flow analysis
  - Thermal analysis
  - Energy conversion

### Resources

- [Framework Documentation](../docs/README.md)
- [Preprocessing Module](../docs/modules.md#preprocessing-modules)
- [Statistical Fitting](../docs/statistical_fitting.md)
