# EEG Memory Recognition Analysis - ICA Pipeline

This notebook implements Independent Component Analysis (ICA) for artifact rejection.

## ICA Strategy
- **Manual ICA**: 1 subject (sub-003) - Human review of components
- **Automated ICA**: 9 subjects - ICLabel for automatic artifact classification

## ICA Process
1. **Preparation**: High-pass filter data at 1 Hz for ICA stability
2. **ICA Fitting**: FastICA algorithm to decompose signals
3. **Component Selection**: Manual vs. Automated
4. **Application**: Remove selected components from data
5. **Save**: Save cleaned data for epoching
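
Step 4 (removing selected components) can be illustrated without any EEG library. The following is a minimal sketch, assuming a known 2x2 mixing matrix; in the real pipeline ICA estimates the unmixing from the data. The names `matmul`, `invert_2x2`, `remove_components`, and the toy signals are hypothetical and exist only for this illustration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def invert_2x2(m):
    """Invert a 2x2 matrix (sufficient for this toy example)."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("mixing matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def remove_components(data, mixing, exclude):
    """Unmix channels into sources, zero the excluded sources, and remix."""
    sources = matmul(invert_2x2(mixing), data)
    kept = [[0.0] * len(row) if i in exclude else row for i, row in enumerate(sources)]
    return matmul(mixing, kept)


# Toy sources: row 0 stands in for brain activity, row 1 for a blink artifact.
brain = [1.0, -1.0, 0.5, -0.5]
blink = [0.0, 5.0, 0.0, 0.0]
mixing = [[1.0, 0.8], [1.0, 0.2]]  # channels = mixing @ sources
channels = matmul(mixing, [brain, blink])

# Excluding source 1 leaves only the brain contribution in both channels.
cleaned = remove_components(channels, mixing, exclude={1})
```

Because the brain source projects onto both channels with weight 1.0, each cleaned channel should match `brain` up to floating-point error.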

## 1. Setup and Imports

In [None]:
import os
import sys
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import mne
import yaml
import logging
from tqdm import tqdm
from pathlib import Path

notebook_dir = Path.cwd()
project_root = (notebook_dir / "..").resolve()
src_dir = project_root / "src"

if str(src_dir) not in sys.path:
    sys.path.insert(0, str(src_dir))

try:
    from utils.pathing import ensure_src_on_path, project_paths
    ensure_src_on_path()
    from utils.data_loader import EEGDataLoader
    from utils.ica_plotting import plot_component_comprehensive
    from preprocessing.quality_assessment import EEGQualityAssessment
    from preprocessing.ica_pipeline import ICAPipeline
    print("✅ Imports successful!")
except ImportError as e:
    print(f"⚠️ Import note: {e}")
    print("Will define pipeline class in next cell")

print(f"Project root: {project_root}")

logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)

plt.rcParams['figure.figsize'] = (12, 8)
plt.rcParams['font.size'] = 12
sns.set_style("whitegrid")

print("✓ Setup complete")
print(f"MNE version: {mne.__version__}")

⚠️ Import note: cannot import name 'ICAPipeline' from 'preprocessing.ica_pipeline' (/Users/leeyelim/Documents/EEG/src/preprocessing/ica_pipeline.py)
Will define pipeline class in next cell
Project root: /Users/leeyelim/Documents/EEG
✓ Setup complete
MNE version: 1.8.0


## 2. Load Configuration

In [2]:
config_path = project_root / 'config' / 'analysis_config.yaml'
with open(config_path, 'r') as f:
    config = yaml.safe_load(f)

selected_subjects = config['subjects']['selected']
manual_ica_subject = config['subjects']['manual_ica_subject']

print("✅ Configuration loaded!")
print(f"- Selected subjects: {len(selected_subjects)}")
print(f"- Manual ICA subject: {manual_ica_subject}")

data_loader = EEGDataLoader(config_path=str(config_path))
print("\n✓ Data loader initialized")

2025-11-12 22:46:14,557 - INFO - EEGDataLoader initialized
  Project root: /Users/leeyelim/Documents/EEG
  Config: /Users/leeyelim/Documents/EEG/config/analysis_config.yaml
  Raw dir: /Users/leeyelim/Documents/EEG/ds002680 (exists=True)
  Preprocessed dir: /Users/leeyelim/Documents/EEG/data/preprocessed (exists=True)
  Derivatives dir: /Users/leeyelim/Documents/EEG/data/derivatives (exists=True)


✅ Configuration loaded!
- Selected subjects: 10
- Manual ICA subject: sub-003

✓ Data loader initialized


## 3. ICA Pipeline

**Option 1**: Use the `src/preprocessing/ica_pipeline.py` module (recommended)  
**Option 2**: Implement ICA steps manually in this notebook

For this analysis, we'll iteratively run ICA on every run of the manual subject with interactive review, using the status/selection tools below to avoid re-processing completed runs.

### 3A. Run-Level Status & Selection

Use the cell below to list every session/run for the manual ICA subject, check which runs are already finished, and select the next pending run to process. If you want a different pending run, set `RUN_SELECTION_INDEX` in that cell before re-running it. After saving a run, re-run the status cell to refresh the table and move to the next run.
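
The status check reduces to two small operations: reading the run number out of the file name and testing which output files already exist. The sketch below shows that logic in isolation, assuming the same file-name suffixes as the status cell; the function names and the status labels `complete`, `cleaned`, and `pending` are hypothetical.

```python
from pathlib import Path
import tempfile


def parse_run_id(file_stem):
    """Return the text after 'run-' in a BIDS-like file stem, or 'unknown'."""
    for part in file_stem.split("_"):
        if part.startswith("run-"):
            return part.split("-", 1)[1]
    return "unknown"


def run_status(raw_path, output_dir):
    """Classify a run from the files present in output_dir."""
    cleaned = output_dir / raw_path.name.replace("after_rereferencing", "ica_cleaned")
    annotated = output_dir / raw_path.name.replace(
        "after_rereferencing", "ica_cleaned_annotated"
    )
    if annotated.exists():
        return "complete"   # ICA applied and contaminated segments annotated
    if cleaned.exists():
        return "cleaned"    # ICA applied, annotations still missing
    return "pending"


# Demonstration with an empty placeholder file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp)
    raw_a = Path("sub-003_ses-01_run-1_preprocessed_after_rereferencing.fif")
    raw_b = Path("sub-003_ses-01_run-2_preprocessed_after_rereferencing.fif")
    (out_dir / "sub-003_ses-01_run-1_preprocessed_ica_cleaned_annotated.fif").touch()
    statuses = {parse_run_id(p.stem): run_status(p, out_dir) for p in (raw_a, raw_b)}
```

Only runs that are not `complete` would be offered as the next run to process.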


In [3]:
print(f"🔎 Checking manual ICA runs for {manual_ica_subject}")

preprocessed_root = project_root / 'data' / 'preprocessed'
after_rereferencing_dir = preprocessed_root / 'after_rereferencing'
after_ica_dir = preprocessed_root / 'after_ica'
manual_after_reref_dir = after_rereferencing_dir / manual_ica_subject
manual_after_ica_dir = after_ica_dir / manual_ica_subject

current_run_info = None
manual_run_records = []
pending_runs = []
completed_runs = []

if not manual_after_reref_dir.exists():
    print(f"❌ Directory not found: {manual_after_reref_dir}")
else:
    for session_dir in sorted(manual_after_reref_dir.glob('ses-*')):
        fif_paths = sorted(session_dir.glob('*_preprocessed_after_rereferencing.fif'))
        for raw_path in fif_paths:
            file_stem = raw_path.stem
            run_token = next((part for part in file_stem.split('_') if part.startswith('run-')), None)
            run_id = run_token.split('-')[1] if run_token and '-' in run_token else (run_token or 'unknown')
            try:
                run_int = int(run_id)
            except (TypeError, ValueError):
                run_int = None
            output_session_dir = manual_after_ica_dir / session_dir.name
            cleaned_path = output_session_dir / raw_path.name.replace('after_rereferencing', 'ica_cleaned')
            annotated_path = output_session_dir / raw_path.name.replace('after_rereferencing', 'ica_cleaned_annotated')
            record = {
                'session': session_dir.name,
                'run': run_id,
                'run_int': run_int,
                'raw_path': raw_path,
                'cleaned_path': cleaned_path,
                'annotated_path': annotated_path,
                'ica_cleaned_exists': cleaned_path.exists(),
                'annotated_exists': annotated_path.exists(),
            }
            manual_run_records.append(record)

    manual_run_records = sorted(
        manual_run_records,
        key=lambda rec: (
            rec['session'],
            rec['run_int'] if rec['run_int'] is not None else float('inf'),
            rec['run']
        )
    )

    if manual_run_records:
        status_rows = []
        for rec in manual_run_records:
            if rec['annotated_exists']:
                status = "✅ ICA + annotations saved"
            elif rec['ica_cleaned_exists']:
                status = "🟡 ICA cleaned (annotations missing)"
            else:
                status = "⏳ Pending"
            status_rows.append({
                'session': rec['session'],
                'run': rec['run'],
                'status': status,
                'raw_file': rec['raw_path'].name
            })
        status_df = pd.DataFrame(status_rows)
        display(status_df)

        pending_runs = [rec for rec in manual_run_records if not rec['annotated_exists']]
        completed_runs = [rec for rec in manual_run_records if rec['annotated_exists']]

        print(f"Total runs found: {len(manual_run_records)}")
        print(f"✅ Completed runs: {len(completed_runs)}")
        print(f"⏳ Pending runs: {len(pending_runs)}")

        try:
            RUN_SELECTION_INDEX
        except NameError:
            RUN_SELECTION_INDEX = 0

        if pending_runs:
            if RUN_SELECTION_INDEX < 0 or RUN_SELECTION_INDEX >= len(pending_runs):
                print(f"⚠️ RUN_SELECTION_INDEX {RUN_SELECTION_INDEX} is out of range for {len(pending_runs)} pending runs. Using 0.")
                RUN_SELECTION_INDEX = 0
            current_run_info = pending_runs[RUN_SELECTION_INDEX]
            print(f"\n➡️ Next run to process: {current_run_info['session']} / run-{current_run_info['run']}")
            print("   (Update RUN_SELECTION_INDEX in this cell to choose a different pending run.)")
            print("   Re-run this cell after saving a run to refresh the status table.")
        else:
            print("\n🎉 All runs already have ICA and annotated contaminated segments saved!")
    else:
        print(f"⚠️ No preprocessed runs found in {manual_after_reref_dir}")


🔎 Checking manual ICA runs for sub-003


| | session | run | status | raw_file |
|---|---|---|---|---|
| 0 | ses-01 | 1 | ✅ ICA + annotations saved | sub-003_ses-01_run-1_preprocessed_after_rerefe... |
| 1 | ses-01 | 2 | ✅ ICA + annotations saved | sub-003_ses-01_run-2_preprocessed_after_rerefe... |
| 2 | ses-01 | 3 | ✅ ICA + annotations saved | sub-003_ses-01_run-3_preprocessed_after_rerefe... |
| 3 | ses-01 | 4 | ✅ ICA + annotations saved | sub-003_ses-01_run-4_preprocessed_after_rerefe... |
| 4 | ses-01 | 5 | ✅ ICA + annotations saved | sub-003_ses-01_run-5_preprocessed_after_rerefe... |
| 5 | ses-01 | 6 | ✅ ICA + annotations saved | sub-003_ses-01_run-6_preprocessed_after_rerefe... |
| 6 | ses-01 | 7 | ✅ ICA + annotations saved | sub-003_ses-01_run-7_preprocessed_after_rerefe... |
| 7 | ses-01 | 8 | ✅ ICA + annotations saved | sub-003_ses-01_run-8_preprocessed_after_rerefe... |
| 8 | ses-01 | 9 | ✅ ICA + annotations saved | sub-003_ses-01_run-9_preprocessed_after_rerefe... |
| 9 | ses-01 | 10 | ✅ ICA + annotations saved | sub-003_ses-01_run-10_preprocessed_after_reref... |


Total runs found: 25
✅ Completed runs: 24
⏳ Pending runs: 1

➡️ Next run to process: ses-02 / run-12
   (Update RUN_SELECTION_INDEX in this cell to choose a different pending run.)
   Re-run this cell after saving a run to refresh the status table.


In [21]:
# Load preprocessed data for manual ICA subject
print(f"\n🔬 Loading preprocessed data for {manual_ica_subject}...")

current_run_label = None

if 'current_run_info' not in globals() or current_run_info is None:
    print(f"✅ All runs for {manual_ica_subject} already have ICA cleaning and annotated contaminated segments.")
    raw = None
    raw_file = None
    raw_ica = None
    raw_cleaned = None
    ica = None
    bad_components = []
else:
    raw_file = current_run_info['raw_path']
    session_label = current_run_info['session']
    run_label = current_run_info['run']
    current_run_label = f"{manual_ica_subject}_{session_label}_run-{run_label}"
    print(f"Processing session {session_label}, run-{run_label}")
    print(f"Loading: {raw_file}")
    raw = mne.io.read_raw_fif(str(raw_file), preload=True, verbose=False)
    print(f"✅ Loaded: {raw.info['nchan']} channels, {raw.times[-1]:.1f}s")
    # Reset per-run intermediates to avoid carrying over previous results
    raw_ica = None
    raw_cleaned = None
    ica = None
    bad_components = []


🔬 Loading preprocessed data for sub-003...
✅ All runs for sub-003 already have ICA cleaning and annotated contaminated segments.


## 4. Prepare Data for ICA

In [None]:
if raw is not None:
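    # High-pass filter a copy at 1 Hz for ICA fitting only (slow drifts degrade
    # the decomposition); the original `raw` is kept and cleaned in Section 8.
    print("\n🔄 Preparing data for ICA (1 Hz high-pass filter)...")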
    raw_ica = raw.copy()
    raw_ica.filter(l_freq=1, h_freq=None, picks='eeg', 
                   method='iir', iir_params=dict(order=4, ftype='butter'),
                   verbose=False)
    print("✅ Data prepared for ICA")
else:
    print("⚠️ Skipping: No data loaded")

⚠️ Skipping: No data loaded


## 5. Fit ICA

In [None]:
if raw is not None:
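    # Fit ICA
    print("\n🔄 Fitting ICA...")
    # Count only the good EEG channels, since those are what ICA is fitted on;
    # subtract one because re-referencing typically reduces the data rank by one.
    n_eeg = len(mne.pick_types(raw_ica.info, eeg=True))
    n_components = min(50, n_eeg - 1)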
    
    ica = mne.preprocessing.ICA(
        n_components=n_components,
        method='fastica',
        random_state=42,
        max_iter=800,
        verbose=False
    )
    
    ica.fit(raw_ica, picks='eeg', verbose=False)
    print(f"✅ ICA completed with {n_components} components")
else:
    print("⚠️ Skipping: No data loaded")

⚠️ Skipping: No data loaded


## 6. ICA component topographical map preview

In [24]:
if raw is not None:
    print("\n📊 ICA Component Review Interface")
    print("=" * 60)
    print("\n🔍 COMPREHENSIVE COMPONENT VISUALIZATION")
    print("\nFor each component in the interactive browser, you will see:")
    print("  1. 🗺️  Scalp topography (spatial distribution with variance explained)")
    print("  2. 📈 Component time series (2.5s preview with event markers)")
    print("  3. 🌊 Power spectrum 3-40 Hz (Log Power Spectral Density 10*log10(µV²/Hz))")
    print("  4. 🌊 Power spectrum 3-80 Hz (Log Power Spectral Density 10*log10(µV²/Hz))")
    print("  5. 📊 ERP image heatmap (trial-by-trial activity, RMS µV per channel)")
    print("  6. 📈 Average ERP (trial-averaged activity, µV units)")
    
    print("\n" + "=" * 60)
    print("👁️ ARTIFACT IDENTIFICATION GUIDE:")
    print("=" * 60)
    print("\n🔴 EYE BLINKS:")
    print("   - Topography: Strong frontal (FP1, FP2)")
    print("   - Time course: Large spikes at regular intervals")
    print("   - Power: Strong low frequency (<5 Hz)")
    print("   - ERP image: Vertical bands (synchronized events)")
    
    print("\n💙 EYE MOVEMENTS:")
    print("   - Topography: Lateral frontal (FP1/FP2 asymmetric)")
    print("   - Time course: Slower deflections")
    print("   - Power: Low frequency (<3 Hz)")
    
    print("\n❤️ HEART/ECG:")
    print("   - Topography: Temporal or diffuse")
    print("   - Time course: Regular peaks ~1 Hz (60-80 bpm)")
    print("   - Power: Peak at ~1 Hz")
    print("   - ERP image: Diagonal stripes")
    
    print("\n💪 MUSCLE ARTIFACTS:")
    print("   - Topography: Temporal, occipital edges")
    print("   - Time course: Irregular bursts")
    print("   - Power: Broad high frequency (>20 Hz)")
    print("   - ERP image: Scattered patches")
    
    print("\n⚡ LINE NOISE:")
    print("   - Topography: Diffuse/widespread")
    print("   - Time course: Continuous")
    print("   - Power: Sharp peak at 50 Hz")
    
    print("\n🧠 BRAIN SIGNALS (keep these!):")
    print("   - Topography: Focal, sensible locations")
    print("   - Time course: Smooth, varied")
    print("   - Power: Strong alpha (8-12 Hz) or theta (4-7 Hz)")
    print("   - ERP image: Structured patterns")
    
    print("\n" + "=" * 60)
    print("✅ Ready for interactive component review!")
    print("   → Use the Interactive Component Browser (Section 6C) below")
    print("=" * 60)
    
else:
    print("⚠️ Skipping: No data loaded")

⚠️ Skipping: No data loaded


## 6B. Review Individual Components (Optional)

This cell prints the share of variance captured by each component of the fitted decomposition and lists the ten largest. For a detailed look at individual components, use the Interactive Component Browser (Section 6C).
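
If the stored values are raw variances rather than fractions, they need to be normalised before being printed as percentages. A minimal sketch of that step, assuming a plain list of per-component variances; `explained_variance_ratio` and the example numbers are hypothetical and not part of the project code.

```python
def explained_variance_ratio(variances):
    """Convert raw per-component variances into fractions of the total."""
    total = sum(variances)
    if total <= 0:
        raise ValueError("total variance must be positive")
    return [v / total for v in variances]


# Made-up variances for four components, largest first.
ratios = explained_variance_ratio([4.0, 3.0, 2.0, 1.0])

# Rank components by their share and keep the two largest.
top = sorted(enumerate(ratios), key=lambda item: item[1], reverse=True)[:2]

for idx, share in top:
    print(f"Component {idx:2d}: {share:6.2%}")
```

The fractions sum to one across all components, so a partial sum over the retained components reads directly as "variance explained".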


In [25]:
if raw is not None:
    print("\n📊 ICA Component Variance Information")
    print("=" * 60)
    
    # Show variance explained by each PCA component that ICA was fitted on.
    # pca_explained_variance_ stores raw variances, so convert them to
    # fractions of the total before formatting them as percentages.
    if hasattr(ica, 'pca_explained_variance_'):
        pca_var = np.asarray(ica.pca_explained_variance_, dtype=float)
        var_ratio = pca_var / pca_var.sum()
        print("\n📈 Variance Explained by Each PCA Component:")
        print("-" * 50)
        
        total_var = 0
        for i in range(n_components):
            var_explained = var_ratio[i]
            total_var += var_explained
            print(f"Component {i:2d}: {var_explained:6.2%}")
        
        print("-" * 50)
        print(f"Total variance explained: {total_var:6.2%}")
        print(f"Components extracted: {n_components}")
        
        # Show top 10 components by variance
        print(f"\n🏆 Top 10 Components by Variance:")
        print("-" * 50)
        component_vars = [(i, var_ratio[i]) for i in range(n_components)]
        component_vars.sort(key=lambda x: x[1], reverse=True)
        
        for i, (comp_idx, var) in enumerate(component_vars[:10]):
            print(f"{i+1:2d}. Component {comp_idx:2d}: {var:6.2%}")
    else:
        print("⚠️ Variance information not available")
    
    print("\n" + "=" * 60)
    print("✅ Variance information displayed!")
    print("   → Use the Interactive Component Browser (Section 6C) for detailed review")
    print("=" * 60)
    
else:
    print("⚠️ Skipping: No data loaded")


⚠️ Skipping: No data loaded


## 6C. Interactive Component Browser (Optional - Advanced)

This cell creates an interactive widget to browse components with a slider.
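
Independently of the widget code, the browser's result is a mapping from component index to category, which is then reduced to a list of components to remove. A minimal sketch of that reduction, assuming the six categories used in this notebook; `components_to_remove` and the example labels are hypothetical.

```python
# Categories treated as artifacts; 'Brain' and uncategorised components are kept.
BAD_CATEGORIES = {"Muscle", "Eye", "Line", "Channel", "Other"}


def components_to_remove(component_categories, n_components):
    """Return the sorted indices of components labelled with an artifact category."""
    return sorted(
        idx
        for idx, category in component_categories.items()
        if category in BAD_CATEGORIES and 0 <= idx < n_components
    )


# Example labels as they might be collected while browsing components.
labels = {0: "Eye", 1: "Brain", 4: "Muscle", 7: "Brain"}
bad = components_to_remove(labels, n_components=10)
print(f"bad_components = {bad}")
```

Components that were never categorised stay in the data, so it is worth checking the "Uncategorized" count before exporting.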

In [None]:
if raw is not None:
    try:
        from ipywidgets import interact, IntSlider, Button, VBox, HBox, Label, Output, HTML
        from IPython.display import display, clear_output
        
        # Initialize bad_components as a global variable
        bad_components = []
        
        print("🎮 Interactive Component Browser with Categorization")
        print("=" * 60)
        print(f"Use the slider to browse all {n_components} components and buttons to categorize them")
        print(f"📊 Reviewing all {n_components} ICA components")
        
        print("\n📋 COMPONENT CATEGORIZATION GUIDE:")
        print("🧠 Brain - KEEP (neural activity)")
        print("💪 Muscle - REMOVE (muscle artifacts)")
        print("👁️ Eye - REMOVE (eye blinks/movements)")
        print("⚡ Line - REMOVE (line noise/electrical interference)")
        print("📡 Channel - REMOVE (channel-specific artifacts)")
        print("❔ Other - REMOVE (other artifacts)")
        print()
        
        # Track component categories
        component_categories = {}  # {comp_idx: category}
        categories = ['Brain', 'Muscle', 'Eye', 'Line', 'Channel', 'Other']
        
        # Define which categories are marked as bad components (artifacts)
        bad_categories = ['Muscle', 'Eye', 'Line', 'Channel', 'Other']
        good_categories = ['Brain']
        
        # Load events for comprehensive plotting
        # Try to get events from the raw data that matches what we're analyzing
        events = None
        try:
            # Method 1: Try to find events in the raw object's annotations
            if hasattr(raw, 'annotations') and len(raw.annotations) > 0:
                events, event_id = mne.events_from_annotations(raw)
                print(f"✓ Loaded {len(events)} events from raw.annotations")
            else:
                # Method 2: Try to load from data_loader
                events = data_loader.load_events()
                if events is not None:
                    if isinstance(events, tuple):
                        # Unpack so that `events` is always the events array,
                        # matching what Method 1 produces
                        events, event_id = events
                    print(f"✓ Loaded {len(events)} events from data_loader")
        except Exception as e:
            print(f"⚠️ Could not load events: {e}")
            print("   ERP images and Average ERP will not be available")
            events = None
        
        # Create output widget for plots
        plot_output = Output()
        
        # Create status label
        status_label = HTML(value="<b>Status:</b> Select a component to begin")
        
        # Current component tracker
        current_comp = {'idx': 0}
        
        def update_plot(comp_idx):
            """Update the plot for the given component"""
            with plot_output:
                clear_output(wait=True)
                try:
                    # Use comprehensive plotting function
                    fig = plot_component_comprehensive(ica, raw_ica, comp_idx, 
                                                      events=events, data_loader=data_loader)
                    plt.show()
                except Exception as e:
                    print(f"Error plotting component {comp_idx}: {e}")
                    import traceback
                    traceback.print_exc()
        
        def update_status():
            """Update the status display"""
            comp_idx = current_comp['idx']
            category = component_categories.get(comp_idx, 'Uncategorized')
            
            # Count categories
            cat_counts = {cat: 0 for cat in categories}
            uncategorized = n_components
            for cat in component_categories.values():
                if cat in cat_counts:
                    cat_counts[cat] += 1
                    uncategorized -= 1
            
            # Create status HTML
            if category == 'Uncategorized':
                cat_color = 'gray'
                cat_icon = '❓'
            elif category == 'Brain':
                cat_color = 'green'
                cat_icon = '🧠'
            elif category == 'Muscle':
                cat_color = 'orange'
                cat_icon = '💪'
            elif category == 'Eye':
                cat_color = 'blue'
                cat_icon = '👁️'
            elif category == 'Line':
                cat_color = 'red'
                cat_icon = '⚡'
            elif category == 'Channel':
                cat_color = 'purple'
                cat_icon = '📡'
            else:  # Other
                cat_color = 'brown'
                cat_icon = '❔'
            
            status_html = f"""
            <div style="padding: 10px; background-color: #f0f0f0; border-radius: 5px; margin: 10px 0;">
                <b>Component {comp_idx}:</b> 
                <span style="color: {cat_color}; font-weight: bold;">{cat_icon} {category}</span>
                <br><br>
                <b>Summary:</b><br>
                🧠 Brain: {cat_counts['Brain']} | 
                💪 Muscle: {cat_counts['Muscle']} | 
                👁️ Eye: {cat_counts['Eye']}<br>
                ⚡ Line: {cat_counts['Line']} | 
                📡 Channel: {cat_counts['Channel']} | 
                ❔ Other: {cat_counts['Other']}<br>
                ❓ Uncategorized: {uncategorized}
            </div>
            """
            status_label.value = status_html
        
        def categorize_component(category):
            """Categorize the current component"""
            comp_idx = current_comp['idx']
            component_categories[comp_idx] = category
            update_status()
            print(f"✓ Component {comp_idx} categorized as: {category}")
        
        def on_slider_change(change):
            """Handle slider value change"""
            current_comp['idx'] = change['new']
            update_plot(change['new'])
            update_status()
        
        # Create slider
        slider = IntSlider(min=0, max=n_components-1, step=1, value=0, 
                           description='Component:', continuous_update=False,
                           style={'description_width': '100px'},
                           layout={'width': '600px'})
        slider.observe(on_slider_change, names='value')
        
        # Create category buttons
        button_style_map = {
            'Brain': 'success',      # green
            'Muscle': 'warning',     # orange
            'Eye': 'info',           # blue
            'Line': 'danger',        # red
            'Channel': '',           # default
            'Other': ''              # default
        }
        
        buttons = []
        for category in categories:
            button = Button(description=category, 
                           button_style=button_style_map.get(category, ''),
                           layout={'width': '100px'})
            button.on_click(lambda b, cat=category: categorize_component(cat))
            buttons.append(button)
        
        # Create reset button
        def reset_category(b):
            comp_idx = current_comp['idx']
            if comp_idx in component_categories:
                del component_categories[comp_idx]
                update_status()
                print(f"✓ Category reset for component {comp_idx}")
        
        reset_button = Button(description='Reset', button_style='',
                             layout={'width': '100px'})
        reset_button.on_click(reset_category)
        
        # Create export button
        def export_categories(b):
            global bad_components  # Allow modification of global variable
            
            print("\n" + "=" * 60)
            print("📋 COMPONENT CATEGORIZATION SUMMARY")
            print("=" * 60)
            
            # Group components by category
            by_category = {cat: [] for cat in categories}
            by_category['Uncategorized'] = []
            
            for i in range(n_components):
                cat = component_categories.get(i, 'Uncategorized')
                by_category[cat].append(i)
            
            for cat in categories + ['Uncategorized']:
                comps = by_category[cat]
                if comps:
                    print(f"\n{cat}: {comps}")
            
            # Generate bad components list (all categorized as artifacts)
            bad_comps = []
            for i in range(n_components):
                cat = component_categories.get(i, 'Uncategorized')
                if cat in bad_categories:
                    bad_comps.append(i)
            
            bad_components = sorted(bad_comps)  # Update global variable
            
            print("\n" + "=" * 60)
            print("💡 BAD COMPONENTS FOR REMOVAL:")
            print("=" * 60)
            print(f"bad_components = {bad_components}")
            print(f"\n✅ AUTOMATIC UPDATE:")
            print(f"   • The 'bad_components' variable has been automatically updated")
            print(f"   • {len(bad_components)} components marked for removal")
            print(f"   • Components: {bad_components}")
            print(f"\n📋 NEXT STEPS:")
            print("1. Proceed to Section 7: Select Components to Remove")
            print("2. Run Section 7 to apply the component removal")
            print("3. The bad_components list will be automatically used")
            print("=" * 60)
        
        export_button = Button(description='Export Summary', button_style='primary',
                             layout={'width': '150px'})
        export_button.on_click(export_categories)
        
        # Layout
        button_row1 = HBox(buttons[:3])
        button_row2 = HBox(buttons[3:] + [reset_button])
        controls = VBox([
            slider,
            HTML(value="<b>Categorize this component:</b>"),
            button_row1,
            button_row2,
            export_button,
            status_label
        ])
        
        # Display interface
        display(controls)
        display(plot_output)
        
        # Initial plot
        update_plot(0)
        update_status()
        
        print("\n" + "=" * 60)
        print("📝 How to use:")
        print("   1. Use slider to navigate through components")
        print("   2. Click category buttons to classify each component")
        print("   3. Click 'Export Summary' when done to get bad_components list")
        print("=" * 60)
        
    except ImportError as e:
        print(f"⚠️ ipywidgets not available: {e}")
        print("   Install with: pip install ipywidgets")
        print("   Or use Section 6B to review components manually")
else:
    print("⚠️ Skipping: No data loaded")

⚠️ Skipping: No data loaded


## 7. Select Components to Remove

**AUTOMATIC MODE**: If you used the Interactive GUI and clicked "Export Summary", the `bad_components` variable has been automatically set. Just run this cell to apply the removal.

**MANUAL MODE**: If you want to manually specify components, modify the `bad_components = []` line in the cell below.

### Component Categories:
- 🧠 **Brain** - KEEP (neural activity)
- 💪 **Muscle** - REMOVE (muscle artifacts)
- 👁️ **Eye** - REMOVE (eye blinks/movements)
- ⚡ **Line** - REMOVE (line noise/electrical interference)
- 📡 **Channel** - REMOVE (channel-specific artifacts)
- ❔ **Other** - REMOVE (other artifacts)
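
Before removal, the selection is checked for out-of-range indices and for how large a share of the decomposition it discards. The sketch below isolates that check, assuming the 30% and 50% thresholds used in the selection cell; `check_selection` and its return labels are hypothetical.

```python
def check_selection(bad_components, n_components):
    """Validate component indices and grade the fraction marked for removal."""
    invalid = [c for c in bad_components if c < 0 or c >= n_components]
    if invalid:
        raise ValueError(
            f"invalid component indices {invalid}; valid range is 0 to {n_components - 1}"
        )
    pct_removed = 100.0 * len(bad_components) / n_components
    if pct_removed > 50:
        level = "warning"   # data quality may be too poor to keep the recording
    elif pct_removed > 30:
        level = "note"      # double-check that these really are artifacts
    else:
        level = "ok"
    return pct_removed, level


pct, level = check_selection([0, 3, 7], n_components=50)
print(f"{pct:.1f}% of components selected ({level})")
```

Raising on invalid indices, rather than only printing, prevents a later cell from applying a selection that was never validated.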


In [None]:
if raw is not None:
    # ============================================================
    # COMPONENT SELECTION
    # ============================================================
    # This cell will use bad_components from the Interactive GUI if available
    # Or you can manually define it here
    # ============================================================
    
    # Check if bad_components was set by Interactive GUI
    if 'bad_components' not in globals() or bad_components is None:
        # Not set by GUI - allow manual input
        bad_components = []  # <-- MANUALLY set components here if not using GUI
    
    # Validation and feedback
    print("=" * 60)
    print("📋 COMPONENT SELECTION SUMMARY")
    print("=" * 60)
    
    if len(bad_components) == 0:
        print("\n⚠️  No components selected for removal")
        print("    Please review the plots above and update bad_components list")
        print("\n    Common artifacts to look for:")
        print("    - Eye blinks (frontal, regular spikes)")
        print("    - Eye movements (lateral frontal)")
        print("    - Heart/ECG (regular ~1 Hz)")
        print("    - Muscle (high frequency, temporal)")
        print("    - Line noise (50 Hz peak)")
    else:
        # Validate component indices
        invalid = [c for c in bad_components if c >= n_components or c < 0]
        if invalid:
            print(f"\n❌ ERROR: Invalid component indices: {invalid}")
            print(f"   Valid range: 0 to {n_components-1}")
        else:
            ica.exclude = bad_components
            pct_removed = (len(bad_components) / n_components) * 100
            
            print(f"\n✅ Selected {len(bad_components)} components for removal: {sorted(bad_components)}")
            print(f"   ({pct_removed:.1f}% of total components)")
            
            if pct_removed > 50:
                print("\n   ⚠️  WARNING: Removing >50% of components")
                print("      Consider data quality - may need to exclude this recording")
            elif pct_removed > 30:
                print("\n   ⚠️  Note: Removing >30% of components")
                print("      Ensure these are truly artifacts")
            else:
                print("\n   ✓ Reasonable number of components to remove")
            
            print("\n📊 Next step: Run Section 8 to apply ICA and save cleaned data")
    
    print("=" * 60)
    
else:
    print("⚠️ Skipping: No data loaded")


⚠️ Skipping: No data loaded


## 8. Apply ICA and Save

In [None]:
if raw is not None and len(bad_components) > 0:
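    # Apply ICA to remove bad components
    print("\n🔄 Applying ICA...")
    raw_cleaned = raw.copy()
    # Pass the selection explicitly so removal does not depend on Section 7
    # having set ica.exclude; anything already in ica.exclude is removed too.
    ica.apply(raw_cleaned, exclude=bad_components)
    print("✅ ICA applied")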
    
    # Save cleaned data
    ica_dir = project_root / 'data' / 'preprocessed' / 'after_ica'
    subject_ica_dir = ica_dir / manual_ica_subject / raw_file.parent.name
    subject_ica_dir.mkdir(parents=True, exist_ok=True)
    
    cleaned_filename = raw_file.name.replace('after_rereferencing', 'ica_cleaned')
    cleaned_path = subject_ica_dir / cleaned_filename
    
    raw_cleaned.save(str(cleaned_path), overwrite=True, verbose=False)
    print(f"💾 Saved: {cleaned_path}")
    
    # Save ICA object
    ica_filename = cleaned_filename.replace('.fif', '_ica.fif')
    ica_path = subject_ica_dir / ica_filename
    ica.save(str(ica_path), overwrite=True)
    print(f"💾 Saved ICA object: {ica_path}")
    
elif raw is not None:
    print("⚠️ Skipping save: No components selected")
else:
    print("⚠️ Skipping: No data loaded")

⚠️ Skipping: No data loaded


### IMPORTANT: Enable Inline Interactive Plotting

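Run the cell below to switch Matplotlib to the inline backend and MNE's browser to its Matplotlib backend, so that ICA review plots are embedded directly in the notebook output. The contaminated-section reviewer in Section 9 needs a separate Qt (or Tk) window instead; the cell's printed instructions explain how to enable it.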


### 8A. Backend Setup for Visualization

In [None]:
# Configure plotting backends for ICA review and contaminated-section rejection
from mne.viz import set_browser_backend
from IPython import get_ipython

print("🔧 Configuring plotting backends...")

ip = get_ipython()
COMPONENT_BROWSER_BACKEND = 'matplotlib'
ANNOTATION_BROWSER_BACKEND = 'qt'

if ip is not None:
    try:
        ip.run_line_magic('matplotlib', 'inline')
        print("✅ Component browser: matplotlib inline backend")
    except Exception as err:
        print(f"⚠️ Could not switch to inline backend: {type(err).__name__}: {err}")
else:
    print("⚠️ IPython shell not detected; leaving the Matplotlib backend unchanged")

try:
    set_browser_backend('matplotlib')
except Exception as err:
    print(f"⚠️ set_browser_backend('matplotlib') failed: {err}")

print("   Contaminated-section reviewer:")
print("   Before launching raw_cleaned.plot(...), run %matplotlib qt (or tk) in a new cell")
print("   The notebook no longer forces the ipympl widget backend so Qt windows can open")

globals()['COMPONENT_BROWSER_BACKEND'] = COMPONENT_BROWSER_BACKEND
globals()['ANNOTATION_BROWSER_BACKEND'] = ANNOTATION_BROWSER_BACKEND


🔧 Configuring plotting backends...
✅ Component browser: matplotlib inline backend
   Contaminated-section reviewer:
   Before launching raw_cleaned.plot(...), run %matplotlib qt (or tk) in a new cell
   The notebook no longer forces the ipympl widget backend so Qt windows can open


## 9. Mark and Reject Contaminated Sections

After ICA removes artifact components, some contaminated time segments may still remain. This section allows you to:
- Visually inspect the cleaned data
- Mark time periods with remaining artifacts
- Reject contaminated sections before epoching

**Why this step is important:**
- ICA removes systematic artifacts (eye blinks, heartbeat, etc.)
- But transient artifacts (sudden movements, electrode pops) may remain
- Manual inspection ensures highest data quality
- Better to reject contaminated sections than include bad trials in ERP averaging
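
To make the last point concrete, here is a minimal pure-Python sketch of the overlap test behind segment-based rejection. It is an illustration only, not MNE's implementation: the helper names `overlaps` and `keep_clean_epochs` and the example times are hypothetical. In MNE itself, `mne.Epochs` takes `reject_by_annotation=True` (the default), which drops epochs that overlap annotations whose description begins with "bad".

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_s, end_s)


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True when two half-open intervals [start, end) share any time."""
    return a[0] < b[1] and b[0] < a[1]


def keep_clean_epochs(epochs: List[Interval], bad: List[Interval]) -> List[Interval]:
    """Keep only the epochs that do not touch any BAD segment."""
    return [ep for ep in epochs if not any(overlaps(ep, seg) for seg in bad)]


# Hypothetical values: three 1 s epochs and one BAD span from 1.5 s to 2.2 s
epochs = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0)]
bad_segments = [(1.5, 2.2)]
clean = keep_clean_epochs(epochs, bad_segments)
print(clean)
```

A single short BAD span can remove more than one epoch when it straddles an epoch boundary, which is why the amount of marked data is worth reviewing before epoching.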


### 9A. Load ICA-Cleaned Data

In [None]:
from pathlib import Path

print("=== Load ICA-Cleaned Data ===")

cleaned_path = None
if 'current_run_info' in globals() and current_run_info:
    candidate = current_run_info.get('cleaned_path')
    if candidate:
        candidate_path = Path(candidate)
        if candidate_path.exists():
            cleaned_path = candidate_path

if cleaned_path is None:
    cached = globals().get('loaded_cleaned_path')
    if cached:
        cached_path = Path(cached)
        if cached_path.exists():
            cleaned_path = cached_path

if cleaned_path is None:
    print("⚠️ No ICA-cleaned file found for the current run.")
    print("   Run Section 3A to choose a run, then Section 8 to apply ICA if needed.")
    raise SystemExit

cleaned_path = Path(cleaned_path)
previous_path = globals().get('loaded_cleaned_path')
needs_reload = (
    'raw_cleaned' not in globals()
    or raw_cleaned is None
    or (previous_path and Path(previous_path) != cleaned_path)
)

if needs_reload:
    print(f"🔁 Loading {cleaned_path.name}")
    raw_cleaned = mne.io.read_raw_fif(str(cleaned_path), preload=True, verbose=False)
    globals()['raw_cleaned'] = raw_cleaned
else:
    raw_cleaned = globals()['raw_cleaned']
    print(f"✅ Using in-memory data from {cleaned_path.name}")

globals()['loaded_cleaned_path'] = cleaned_path

if 'current_run_label' not in globals() or current_run_label is None:
    session = cleaned_path.parent.name
    run_token = cleaned_path.stem.split('_run-')[-1]
    run_id = run_token.split('_')[0]
    current_run_label = f"{manual_ica_subject}_{session}_run-{run_id}"
    globals()['current_run_label'] = current_run_label

print()
print("Next steps:")
print("1. In a new cell run: %matplotlib qt   (or %matplotlib tk if Qt is unavailable)")
print("2. Then run: raw_cleaned.plot(n_channels=30, duration=10.0, scalings='auto', title='ICA-Cleaned Data - Mark Bad Sections', block=True)")
print("3. Annotate BAD segments in the pop-up window, close it, then run Section 9B.")


=== Load ICA-Cleaned Data ===
⚠️ No ICA-cleaned file found for the current run.
   Run Section 3A to choose a run, then Section 8 to apply ICA if needed.


SystemExit: 
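
The fallback in Section 9A rebuilds `current_run_label` by splitting the file stem on `_run-`. If a file name has no `_run-` token, that split returns the whole stem and the label ends up with a misleading run ID. The sketch below shows one way to make that case explicit with a regular expression. The function name `run_label_from_path` and the example paths are hypothetical, and the real file names in this project may differ.

```python
import re
from pathlib import PurePosixPath
from typing import Optional


def run_label_from_path(path: str, subject: str) -> Optional[str]:
    """Build '<subject>_<session>_run-<id>', or return None if no run token exists."""
    p = PurePosixPath(path)
    match = re.search(r"_run-([^_]+)", p.stem)
    if match is None:
        return None  # a plain split('_run-') would return the whole stem here
    session = p.parent.name  # the parent folder is treated as the session, as in Section 9A
    return f"{subject}_{session}_run-{match.group(1)}"


# Hypothetical paths, for illustration only
example = "after_ica/sub-003/ses-01/sub-003_ses-01_run-02_ica_cleaned.fif"
label = run_label_from_path(example, "sub-003")
missing = run_label_from_path("after_ica/sub-003/ses-01/no_token_here.fif", "sub-003")
print(label, missing)
```

Returning `None` lets the calling cell decide whether to stop or to ask for a label, instead of silently printing a wrong one.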

### 9A-2. Open the Annotation Window

In [None]:
%gui qt
%matplotlib qt
raw_cleaned.plot(n_channels=30, duration=10.0, scalings='auto',
                 title='ICA-Cleaned Data - Mark Bad Sections',
                 block=True)

AttributeError: 'NoneType' object has no attribute 'plot'

### 9B. Save Annotated Run

In [16]:
from pathlib import Path

if 'raw_cleaned' not in globals() or raw_cleaned is None:
    print("⚠️ No cleaned data in memory. Run Section 9 first.")
    raise SystemExit

cleaned_path = globals().get('loaded_cleaned_path')
if not cleaned_path or not Path(cleaned_path).exists():
    print("⚠️ Missing reference to the ICA-cleaned file. Re-run Section 9 before saving.")
    raise SystemExit

cleaned_path = Path(cleaned_path)
annotated_path = cleaned_path.with_name(cleaned_path.name.replace('ica_cleaned', 'ica_cleaned_annotated'))

print("📋 Review Marked Bad Segments")
print("=" * 60)
if 'current_run_label' in globals() and current_run_label:
    print(f"Run: {current_run_label}")

bad_annots = [annot for annot in raw_cleaned.annotations if 'bad' in annot['description'].lower()]

if bad_annots:
    # f-string prefix is required so the segment count is interpolated
    print(f"✅ Found {len(bad_annots)} BAD segments marked:")
    print("-" * 60)
    total_duration = 0.0
    for idx, annot in enumerate(bad_annots, start=1):
        onset = annot['onset']
        duration = annot['duration']
        total_duration += duration
        print(f"  {idx}. Time: {onset:.2f}s - {onset + duration:.2f}s (Duration: {duration:.2f}s)")

    print("-" * 60)
    data_duration = raw_cleaned.times[-1]
    pct_rejected = (total_duration / data_duration) * 100
    print(f"📊 Total contaminated time: {total_duration:.2f} seconds ({pct_rejected:.1f}% of recording)")
    if pct_rejected > 20:
        print("   ⚠️  WARNING: >20% of data marked as bad - evaluate data quality")
    elif pct_rejected > 10:
        print("   ⚠️  Note: >10% of data marked as bad - higher than typical")
    else:
        print("   ✓ Reasonable amount of data rejected")

    print("💾 Saving cleaned data with bad segment annotations…")
    raw_cleaned.save(str(annotated_path), overwrite=True, verbose=False)
    print(f"✅ Saved annotated run: {annotated_path}")
else:
    print("✅ No BAD segments marked")
    print("   Saving an annotated copy to record that this run was reviewed")
    raw_cleaned.save(str(annotated_path), overwrite=True, verbose=False)
    print(f"✅ Saved reviewed run: {annotated_path}")

print("=" * 60)
print("🔁 Re-run the run status cell to refresh the pending runs table.")

for name in ['current_run_info', 'current_run_label', 'raw', 'raw_ica', 'raw_cleaned', 'ica', 'bad_components', 'loaded_cleaned_path']:
    globals().pop(name, None)


⚠️ No cleaned data in memory. Run Section 9 first.


SystemExit: 

  warn("To exit: use 'exit', 'quit', or Ctrl-D.", stacklevel=1)
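
One caveat about the summary in Section 9B: it adds up annotation durations directly, so two BAD spans that overlap would be counted twice and the rejected percentage would be overstated. The following is a minimal sketch of an overlap-aware total, assuming annotations are available as `(onset, duration)` pairs in seconds. The function name `merged_bad_duration`, the example segments, and the 200 s recording length are all hypothetical.

```python
from typing import Iterable, List, Tuple


def merged_bad_duration(segments: Iterable[Tuple[float, float]]) -> float:
    """Total seconds covered by (onset, duration) segments, counting overlaps once."""
    spans: List[Tuple[float, float]] = sorted((on, on + dur) for on, dur in segments)
    total = 0.0
    cur_start = cur_end = None
    for start, end in spans:
        if cur_end is None or start > cur_end:
            # Close the previous merged span before starting a new one
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching span: extend the current one
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


# Hypothetical annotations as (onset_s, duration_s); the first two overlap
segments = [(10.0, 5.0), (12.0, 6.0), (40.0, 2.0)]
naive = sum(dur for _, dur in segments)
merged = merged_bad_duration(segments)
pct = merged / 200.0 * 100  # assuming a hypothetical 200 s recording
print(naive, merged, pct)
```

When no spans overlap, the merged total equals the plain sum, so swapping it in would not change results for cleanly marked runs.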
