# Goal

Publish a GitHub project that presents the following EEG data analysis results:

- Before-and-after preprocessing comparison of
    - channels where there is a significant ERP difference between conditions
    - absolute SNR and Cohen’s d
- ERP plots and data presentation that closely resemble those of the study.
- All of these results clearly showcased in the GitHub project.

# Plan

Select 10 subjects with a normal noise level according to the 'Raw Data Quality Assessment' in the pre-processing stage.

Manual ICA and artifact rejection will be performed on the data of only 1 subject. An automated ICA approach will be used for the remaining 9 subjects.

> **Notebook roles:** `02a_manual_ica_single_session.ipynb` handles the one-off manual ICA pass for the designated subject, while `02b_automated_ica_batch.ipynb` is the official third notebook in the run order and processes the remaining subjects with ICLabel + ARTIST, exporting QC metrics and summaries for GitHub.
>
> **Manual subject data inventory:**
> - **sub-001 (legacy test subject):** `data/preprocessed/after_rereferencing/sub-001/ses-0X/sub-001_ses-0X_run-*_preprocessed_after_rereferencing.fif` → cleaned copies under `data/preprocessed/after_ica/sub-001/.../sub-001_ses-0X_run-*_preprocessed_ica_cleaned.fif`.
> - **sub-003 (current manual ICA subject):** per `02a`, each session lives under its own directory:
>   - Session 1: `data/preprocessed/after_rereferencing/sub-003/ses-01/sub-003_ses-01_run-*_preprocessed_after_rereferencing.fif` and cleaned versions `data/preprocessed/after_ica/sub-003/ses-01/sub-003_ses-01_run-*_preprocessed_ica_cleaned.fif`, plus the annotated aggregate `sub-003_ses-01_preprocessed_ica_cleaned_annotated.fif`.
>   - Session 2: same naming in `.../sub-003/ses-02/...` (e.g., `sub-003_ses-02_run-1_preprocessed_after_rereferencing.fif`, `sub-003_ses-02_run-1_preprocessed_ica_cleaned.fif`).
> - Keep sessions separate when saving; use the notebook helper’s merge step whenever we need a single continuous Raw object for Cohen’s d / SNR QC.

## Project structure

### GitHub Repository organization

EEG-Memory-Recognition-Analysis/
├── README.md (comprehensive showcase)
├── main_analysis.ipynb (main results notebook)
├── src/
│   ├── preprocessing/
│   │   ├── `__init__.py`
│   │   ├── quality_assessment.py
│   │   ├── ica_pipeline.py
│   │   └── artifact_rejection.py
│   ├── analysis/
│   │   ├── `__init__.py`
│   │   ├── erp_analysis.py
│   │   ├── statistical_tests.py
│   │   └── visualization.py
│   └── utils/
│       ├── `__init__.py`
│       ├── data_loader.py
│       └── helpers.py
├── notebooks/
│   ├── 00_setup_and_exploration.ipynb ✅ (COMPLETED)
│   ├── 01_preprocessing_pipeline.ipynb (TO CREATE)
│   ├── 02_manual_ica_review.ipynb ✅ (COMPLETED)
│   └── 03_erp_analysis.ipynb (TO CREATE)
├── results/
│   ├── figures/
│   ├── preprocessed_data/
│   └── statistical_outputs/
├── docs/
│   ├── preprocessing_report.md
│   └── methodology.md
└── requirements.txt

## Local project structure

Project location: C:\Users\mints\Documents\EEG

## Pre-processing

1. **Data Loading and Initial Setup**
2. **Raw Data Quality Assessment:** The study reports that some participants had excessive noise (1 out of 14) and that only 10 participants were used, so noisy participants should not be chosen for manual ICA and artifact rejection. The pre-processing code ranks the participants and admits only the 10 top-ranked ones as candidates for manual ICA and artifact rejection.
    
    **Ranking Process**
    
    1. **Primary Criterion**: Overall Quality Score (descending)
    2. **Tie-Breaking Criterion**: ERP signal power / pre-stimulus baseline SNR, i.e. the ratio of ERP signal power to pre-stimulus baseline noise (descending)
        - The pre-stimulus baseline SNR is averaged across the ROI for each participant
        - This SNR (ERP signal power / pre-stimulus baseline noise) is widely used in cognitive neuroscience EEG research.
        
        **SNR Computation Method**
        
        - **Signal Power**: Event-related signal power (ERP variance)
        - **Noise Power**: Pre-stimulus baseline noise (variance)
        - **Formula**: SNR = 10 × log₁₀(ERP Signal Power / Pre-stimulus Baseline Noise)
            - Units: Decibels (dB)
            - Implementation: Region-of-Interest (ROI) specific computation
        
        **Signal Power Estimation Methods**
        
        - Event-Related Signal Power (Primary)
            - Extract epochs around stimulus events (-200 to 600 ms)
            - Compute ERP by averaging across trials
            - Calculate signal power as variance of ERP signal
        - Total Signal Variance (Alternative)
            - Compute variance of raw EEG signal in ROI channels
            - Includes both ERP and ongoing EEG components
        
        **Noise Power Estimation Methods**
        
        - Pre-stimulus Baseline (Primary)
            - Use -200 to 0 ms pre-stimulus period as noise baseline
            - Assumes this period represents background noise
            - Most accurate for ERP signal quality assessment
        - Inter-Trial Variability (Alternative)
            - Compute variance across trials at each time point
            - Represents trial-to-trial noise in ERP estimation
            - Provides measure of signal consistency
        - High-Frequency Noise (Fallback)
            - Filter to 50-100 Hz where ERP signal is minimal
            - Use variance of high-frequency components as noise
    
    **Scoring Formula for Overall Quality Score**
    
    - **Base Score**: 100 points (perfect quality)
    - **Range**: 0-100 (higher is better)
    - **Formula**: Overall Quality Score = 100 - Σ(Penalties)
        - **Penalties Applied in Overall Quality Score**
            1. **Flat Channels**: -10 points per channel
                - Channels with zero variance (disconnected/malfunctioning electrodes)
            2. **High Variance Channels**: -5 points per channel
                - Channels with excessive variance (95th percentile threshold)
                - May indicate movement artifacts or poor electrode contact
            3. **Excessive Line Noise**: -15 points
                - When mean line noise power > 1e-10 (50 Hz interference)
            4. **Low Channel Correlations**: -10 points
                - When mean correlation across channels < 0.3
                - Indicates poor electrode contact or technical issues
    
    **Selection Threshold**
    
    - **Minimum Quality Score**: >70/100 for inclusion
    - **Target Sample Size**: 10 subjects
    - **Manual ICA Subject**: Median quality among selected subjects
3. **Channel Management and Montage**
4. **Filtering Strategy** (0.2-512 Hz)
5. **Line Noise Removal** (50 Hz notch filter)
6. **Re-referencing**
    - Save the EEG data under a different name after this stage.
7. **Independent Component Analysis (ICA)**
    - **ICA component number**: 50 components
    - Manual work on 1 subject, ICLabel for the other subjects
    - **Interactive GUI for artifact rejection**: 
        - No component preview before interactive GUI
        - Interactive GUI includes all information from comprehensive 6-panel visualization:
          * Scalp topography (with variance explained)
          * Component time series (2.5s preview with event markers)
          * Power spectrum 3-40 Hz (Log Power Spectral Density `10*log10(µV²/Hz)`)
          * Power spectrum 3-80 Hz (Log Power Spectral Density `10*log10(µV²/Hz)`)
          * ERP image heatmap (trial-by-trial activity, RMS µVolts per channel)
          * Average ERP (trial-averaged activity, µV units)
        - **Excludes**: Source localization (dipole fitting)
        - **Same units as reference**: µV for time series, `10*log10(µV²/Hz)` for power spectra, RMS µVolts for ERP image
        - **2025-11-11 follow-up**: Keep the ICA component browser on the inline backend. Before contaminated-section review, run `%matplotlib qt` (or `tk`) manually, then call `raw_cleaned.plot(..., block=True)` to open the windowed GUI.
    - **CRITICAL IMPORT REQUIREMENT**: Must import `plot_component_comprehensive` from `utils.ica_plotting` in Cell 2
    - Apply ICA to remove selected components
    - Save the EEG data under a different name after ICA application
    - Document exact thresholds used and number of components removed per subject
8. **Mark and Reject Contaminated Sections** (NEW STEP - After ICA)
    - **CRITICAL REQUIREMENT**: Run the backend setup cell before Section 9 to configure inline widgets and the Qt window fallback; no manual `%matplotlib` calls needed unless troubleshooting.
    - **Manual visual inspection**: Browse through ICA-cleaned data to identify remaining artifacts
    - **Mark bad segments**: Use interactive plot to annotate contaminated time periods
    - **Artifact types to identify**:
      * Sudden amplitude jumps (movement artifacts)
      * Electrode pops/disconnections
      * Persistent high-frequency noise
      * Sections with unusual patterns not removed by ICA
    - **Interactive controls**:
      * Click and drag to select time range
      * Press 'a' to annotate as 'BAD'
      * Arrow keys to navigate, +/- to zoom
    - **Quality control**: Typically reject <10% of data; >20% indicates poor recording quality
    - **Save with annotations**: Bad segments saved as annotations for automatic exclusion during epoching
    - **Documentation**: Record number and duration of rejected segments per subject
9. **Bad Channel Interpolation**
10. **Epoching and Artifact Rejection**
    - Manual work on 1 subject, ARTIST for the other subjects
    - Save the EEG data under a different name after this stage.
    - Epoching: Divide each trial into epochs from -100 to 600 ms relative to stimulus onset. Each epoch should be baseline-corrected by subtracting the average activity between -100 and 0 ms from each EEG data channel.
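
The scoring and ranking rules from the Raw Data Quality Assessment step can be sketched in plain Python as below. This is a minimal illustration, not the code in `quality_assessment.py`: the `SubjectQC` container, the function names, and the example values are hypothetical, and clipping the score to 0-100 and taking the lower median for an even number of subjects are assumptions.

```python
from dataclasses import dataclass


@dataclass
class SubjectQC:
    """Per-subject QC summary (hypothetical container, not from the notebooks)."""
    subject: str
    n_flat: int               # channels with zero variance
    n_high_var: int           # channels above the 95th-percentile variance threshold
    line_noise_power: float   # mean 50 Hz line-noise power
    mean_channel_corr: float  # mean correlation across channels
    snr_db: float             # ROI-averaged SNR in dB, used only for tie-breaking


def quality_score(qc: SubjectQC) -> float:
    """Overall Quality Score = 100 - sum(penalties), clipped to 0-100 (assumed)."""
    penalty = 10 * qc.n_flat + 5 * qc.n_high_var
    if qc.line_noise_power > 1e-10:
        penalty += 15
    if qc.mean_channel_corr < 0.3:
        penalty += 10
    return max(0.0, min(100.0, 100.0 - penalty))


def select_subjects(all_qc, min_score=70.0, target_n=10):
    """Keep scores > min_score, rank by score then SNR (both descending), take top N."""
    eligible = [qc for qc in all_qc if quality_score(qc) > min_score]
    ranked = sorted(eligible, key=lambda qc: (quality_score(qc), qc.snr_db), reverse=True)
    return ranked[:target_n]


def manual_ica_subject(selected):
    """Pick the subject with the median quality score (lower median if even count)."""
    by_score = sorted(selected, key=quality_score)
    return by_score[(len(by_score) - 1) // 2]


# Made-up example values, for illustration only.
subjects = [
    SubjectQC("sub-A", 0, 2, 5e-11, 0.6, 4.0),  # score 90
    SubjectQC("sub-B", 0, 1, 5e-11, 0.5, 6.0),  # score 95
    SubjectQC("sub-C", 2, 2, 2e-10, 0.2, 9.0),  # score 45, below the threshold
    SubjectQC("sub-D", 0, 0, 5e-11, 0.7, 3.0),  # score 100
    SubjectQC("sub-E", 0, 1, 5e-11, 0.6, 2.0),  # score 95, loses the tie to sub-B on SNR
]
selected = select_subjects(subjects)
print([qc.subject for qc in selected])
print(manual_ica_subject(selected).subject)
```

Note that a high SNR alone does not rescue a subject: `sub-C` has the best SNR in the example but is excluded by the score threshold before tie-breaking is ever applied.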

### ICA & Artifact rejection

- **1 subject**: Manual ICA and artifact rejection through human review
  - Manual component categorization using interactive GUI
  - Manual marking of contaminated time segments after ICA
  - Visual inspection and annotation of bad segments
- **9 subjects:** Fully automated ICA (ICLabel) and artifact rejection (ARTIST)
  - Automated component classification
  - Automated bad segment detection (amplitude thresholds)

## ERP analysis

- **Familiarity effect on all electrodes:** ERP comparison between familiar and new images, on all electrodes and timepoints, with t-test + FDR correction
    - Granularity: Familiar vs. new, pooled across participants and repetitions
- **Repetition effect on ROI** (F3, Fz, F4, PO3, POz, PO4)
    - Granularity: 1st, 2nd, 3rd repetition, pooled across participants
    
    > For the two regions of interest, the familiarity effect was computed for each of the three repetitions. A repeated measures ANOVA was then applied on each time point with
    FDR-correction.

    Saved outputs (03_erp_analysis.ipynb):
    - Repetition-wise Familiar − New figures: `results/figures/group_repetition_familiar_minus_new.png`
    - Familiarity effect stats (per-ROI, BH-FDR):
      - `results/statistical_outputs/erp_familiarity_stats_frontal_roi.csv`
      - `results/statistical_outputs/erp_familiarity_stats_parieto-occipital_roi.csv`
    - Repetition-wise repeated-measures ANOVA (per-ROI, per timepoint):
      - `results/statistical_outputs/erp_rep_anova_frontal_roi.csv`
      - `results/statistical_outputs/erp_rep_anova_parieto-occipital_roi.csv`
    - Group ERP timecourses (long format, by subject/session/ROI/condition):
      - `results/erp_timecourses.csv`
    > 
- **Category effects (animal vs non-animal) on ROI** (F3, Fz, F4, PO3, POz, PO4)
    
    > For each stimulus type (animal or non-animal), an familiarity difference was calculated (ERP familiar animal minus ERP new animal and ERP familiar non animal minus ERP new non-animal) for the two regions of
    interest. Paired t-tests were then performed with FDR correction.
    > 
- **Visualization:** Display ERP plots and data presentations that closely resemble those of the study directly in the main.ipynb notebook.
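
The FDR correction applied to the per-timepoint tests above can be illustrated with the Benjamini-Hochberg step-up procedure (the saved outputs are labelled BH-FDR). The sketch below is plain Python and assumes the p-values have already been computed, for example by a paired t-test at each timepoint; the function name and the example p-values are hypothetical. In the notebooks a library routine such as `mne.stats.fdr_correction` would likely be used instead.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the test survives BH-FDR at level alpha."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k with p_(k) <= (k / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    # Every test ranked at or below max_rank is declared significant.
    significant = [False] * m
    for idx in order[:max_rank]:
        significant[idx] = True
    return significant


# Made-up p-values for six timepoints, for illustration only.
p_per_timepoint = [0.001, 0.008, 0.039, 0.041, 0.27, 0.60]
mask = benjamini_hochberg(p_per_timepoint)
print(mask)
```

Because the rule is step-up, a p-value that misses its own rank threshold can still be declared significant when a larger p-value further down the sorted list meets its threshold.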

![image.png](attachment:155c189a-a5b1-42d9-8898-a20c3ae7ca6c:image.png)

![image.png](attachment:62c83612-d2c2-4882-b687-85ffb9f13bde:image.png)

![image.png](attachment:b56f4c73-5f0c-4250-b2c7-f8697135aad7:image.png)

![image.png](attachment:3c53d1c4-39e1-4073-a630-74ffb49677a8:image.png)

![image.png](attachment:7f7fccc4-1809-4916-928b-8f93c2f3cd38:image.png)

## Guide on Local Run

1. **Install Dependencies**:
    
    ```bash
    cd "C:\Users\mints\Documents\EEG"
    pip install -r requirements.txt
    ```
    
2. **Run Notebooks in Sequence**:
    - `00_setup_and_exploration.ipynb` - Data exploration & subject selection
    - `01_preprocessing_pipeline.ipynb` - Quality assessment and preprocessing
    - `02_manual_ica_review.ipynb` - Manual ICA and artifact rejection
    - Continue with ERP analysis
3. **Review Manual ICA Results**:
    - The code will identify the best subject for manual ICA review
    - Interactive plots will guide component rejection decisions
4. **Generate Final Results**:
    - Publication-quality ERP plots
    - Statistical significance maps
    - Before/after preprocessing comparisons
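
For the before/after preprocessing comparison, the two metrics can be sketched as follows. The SNR follows the formula given in the pre-processing section (ERP variance over pre-stimulus baseline variance, in dB). The plan does not state which Cohen's d variant is used, so the pooled-standard-deviation form below is an assumption, as are the function names, the use of population variance for the SNR, and the example numbers.

```python
import math
from statistics import mean, pvariance, variance


def snr_db(erp, times_ms, baseline=(-200.0, 0.0)):
    """SNR = 10 * log10(ERP signal power / pre-stimulus baseline noise power)."""
    # Signal power: variance of the trial-averaged ERP over the whole epoch.
    signal_power = pvariance(erp)
    # Noise power: variance of the ERP samples inside the pre-stimulus window.
    baseline_samples = [v for v, t in zip(erp, times_ms) if baseline[0] <= t < baseline[1]]
    noise_power = pvariance(baseline_samples)
    return 10.0 * math.log10(signal_power / noise_power)


def cohens_d(a, b):
    """Cohen's d for two samples using the pooled standard deviation (assumed variant)."""
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


# Made-up ROI-averaged ERP (µV) sampled every 100 ms, for illustration only.
times_ms = [-200.0, -100.0, 0.0, 100.0, 200.0, 300.0]
erp = [0.1, -0.1, 0.0, 2.0, 3.0, 1.0]
snr = snr_db(erp, times_ms)

# Made-up mean amplitudes per trial for two conditions (e.g. familiar vs. new).
d = cohens_d([2.0, 4.0, 6.0], [1.0, 3.0, 5.0])
print(round(snr, 2), d)
```

Calling both functions once on the data saved after re-referencing and once on the ICA-cleaned data gives the before/after pair for each subject and ROI.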

## Troubleshooting

### Common Issues and Solutions

#### **NameError: name 'plot_component_comprehensive' is not defined**
- **Cause**: Missing import in Cell 2 of `02_manual_ica_review.ipynb`
- **Solution**: Ensure Cell 2 contains: `from utils.ica_plotting import plot_component_comprehensive`
- **Prevention**: Always verify imports when modifying the notebook

#### **Interactive GUI shows wrong number of components**
- **Cause**: ICA component count not set to 50
- **Solution**: Ensure Cell 10 contains: `n_components = min(50, len(raw_ica.ch_names) - 1)`
- **Note**: Actual component count may be limited by available channels

#### **Component previews appear before Interactive GUI**
- **Cause**: Sections 6 and 6B contain plotting code
- **Solution**: Remove `ica.plot_components()` and `ica.plot_properties()` calls from these sections
- **Correct behavior**: Sections 6 and 6B should only show instructions and variance info

#### **NameError: name 'bad_categories' is not defined**
- **Cause**: Missing `bad_categories` variable definition in Interactive GUI
- **Solution**: Ensure Interactive GUI contains: `bad_categories = ['Muscle', 'Eye', 'Line', 'Channel', 'Other']`
- **Prevention**: Always verify variable definitions when modifying Interactive GUI code

#### **Export Summary doesn't update bad_components in Section 7**
- **Cause**: Missing `global bad_components` declaration in export function
- **Solution**: The export function now uses `global bad_components` to automatically update the variable
- **How it works**: After clicking "Export Summary", Section 7 will automatically use the categorized components
- **Manual override**: You can still manually set `bad_components = [...]` in Section 7 if needed

#### **Section 9 shows inline image instead of interactive GUI window**
- **Cause**: Matplotlib backend is set to inline mode (default in Jupyter)
- **Solution**: Run the cell with `%matplotlib qt` **before** running Section 9
- **Step-by-step**:
  1. There's a dedicated cell before Section 9 with backend switching code
  2. Run that cell first to enable interactive backend
  3. Then run Section 9 - a GUI window should open

#### **ImportError: Failed to import any Qt binding modules**
- **Cause**: PyQt5 (or other Qt bindings) not installed
- **Solution**: Install PyQt5 in your terminal:
  ```bash
  pip3 install PyQt5
  ```
- **Alternative backends**:
  - TkAgg: Usually comes with Python (use `%matplotlib tk`)
  - ipympl: Install with `pip3 install ipympl` (use `%matplotlib widget`)
- **Note**: The backend-switching cell will automatically try Qt first, then Tk as fallback
- **Headless environment**: If running on a server without display, you'll need X11 forwarding or use automated detection instead
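
The Qt-first, Tk-second fallback can be illustrated with a small availability check. This is only a sketch of the idea, not the notebook's actual backend-switching cell: the function name is hypothetical, and the check only confirms that the GUI module is installed, not that a display is available.

```python
import importlib.util


def first_available_backend(candidates):
    """Return the first backend name whose GUI module is installed, else None."""
    for backend, module_name in candidates:
        # find_spec returns None for a missing top-level module without importing it.
        if importlib.util.find_spec(module_name) is not None:
            return backend
    return None


# Qt first, then Tk, matching the fallback order described above.
backend = first_available_backend([("qt", "PyQt5"), ("tk", "tkinter")])
if backend is None:
    print("No GUI toolkit found; stay on the inline backend or use automated detection.")
else:
    print(f"In the notebook, run: %matplotlib {backend}")
```

Inside a notebook cell the same choice can be applied programmatically with `get_ipython().run_line_magic("matplotlib", backend)`.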

#### **Can't drag to mark bad segments in Section 9 GUI**
- **Cause**: MNE's annotation mode is not automatically active
- **Solution - Method 1 (Interactive)**:
  1. In the plot window, look for the **'Annotate' button** or toolbar
  2. Click it to enable annotation mode
  3. Or right-click on the plot → select the annotation option
  4. Then click and drag to select time ranges
  5. Choose 'BAD' from the annotation menu
- **Solution - Method 2 (Automated)**:
  - Use Section 9A (Alternative) for automated bad segment detection
  - Based on amplitude thresholds (default: 150 µV)
  - Automatically marks segments exceeding threshold
  - Can still review in interactive plot afterward
- **Controls in interactive plot**:
  - Arrow keys ← → to navigate through time
  - +/- keys to zoom in/out
  - Press 'h' to see all keyboard shortcuts
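
The amplitude-threshold detection described under Method 2 can be sketched for a single channel as below. This is an illustration under stated assumptions, not the code in Section 9A and not the ARTIST algorithm: the function names and the example signal are hypothetical, and real data would need the check applied across all channels. The 150 µV default and the 10% / 20% rejection guideline come from the text above.

```python
def find_bad_segments(samples_uv, sfreq, threshold_uv=150.0):
    """Return (onset_s, duration_s) for each run of samples with |amplitude| > threshold."""
    segments = []
    start = None
    for i, value in enumerate(samples_uv):
        if abs(value) > threshold_uv:
            if start is None:
                start = i  # a new supra-threshold run begins here
        elif start is not None:
            segments.append((start / sfreq, (i - start) / sfreq))
            start = None
    if start is not None:  # the run extends to the end of the recording
        segments.append((start / sfreq, (len(samples_uv) - start) / sfreq))
    return segments


def rejected_fraction(segments, n_samples, sfreq):
    """Fraction of the recording covered by bad segments."""
    return sum(duration for _, duration in segments) / (n_samples / sfreq)


# Made-up single-channel signal in µV sampled at 10 Hz, for illustration only.
signal = [10, 20, 200, 180, 15, -160, 5, 12, 8, 3]
segments = find_bad_segments(signal, sfreq=10.0)
fraction = rejected_fraction(segments, len(signal), sfreq=10.0)
print(segments, fraction)
if fraction > 0.20:
    print("More than 20% rejected: recording quality is likely poor.")
```

With MNE, the resulting onsets and durations could be passed to `mne.Annotations` with a description starting with `BAD` and attached via `raw.set_annotations`, so that they are excluded during epoching.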

## GitHub presentation

### **Best Strategy: Multi-Approach**

**1. Primary: GitHub Direct Display**

- ✅ Renders the static matplotlib ERP plots stored in the notebook
- ✅ Shows code alongside results
- ✅ Integrated with repository structure

**2. Backup: NBViewer Link**

- Include nbviewer link in your README
- Better rendering quality
- More professional presentation