# 03 EEG Preprocess RANSAC

## Overview
This notebook identifies **bad (noisy) EEG channels** across all recording sessions using the RANSAC algorithm from PyPREP. 

**Purpose:**
- Automatically detect malfunctioning or noisy electrodes that would contaminate the EEG signal
- Identify channels with poor contact, excessive noise, or artifacts
- Generate a summary report showing which channels are problematic across sessions

**What it does:**
1. Loads preprocessed raw EEG data from Notebook 02 (`session_XX-EEG-raw.pkl`)
2. Applies high-pass filtering (1 Hz) to remove slow drifts
3. Uses RANSAC (Random Sample Consensus) to detect bad channels by:
   - Comparing each channel against predictions from neighboring channels
   - Identifying channels that deviate significantly from expected patterns
4. Generates summary statistics showing which electrodes are frequently bad across sessions

**Output:**
- DataFrame showing bad channels per session
- Counter showing how often each electrode is marked as bad
- This information is used in Notebook 04 to exclude bad channels from further analysis

**Next step:** Based on the results, you'll manually remove consistently bad channels before running Notebook 04 (ICA preprocessing).

**Code Attribution:**
- Original EEG preprocessing code adapted from: Chiossi, F., Mayer, S., & Ou, C. (2024). MobileHCI 2024 Papers - Submission 7226.
- OSF Repository: https://osf.io/fncj4/overview (Created: Sep 11, 2023)
- License: GNU General Public License (GPL) 3.0
- Code has been modified for this study's session-based structure and experimental design.

## 1. Import Libraries

In [12]:
import pandas as pd
import numpy as np
import mne
import pyprep
from collections import Counter

## 2. Load Session Mapping

In [13]:
# All 64 EEG channels (excluding Time, TimeLsl, and accelerometer data)
chan_names = ['Fp1', 'Fz', 'F3', 'F7', 'F9', 'FC5', 'FC1', 'C3', 'T7', 'CP5', 'CP1', 'Pz', 'P3', 'P7', 'P9', 'O1', 'Oz', 'O2', 'P10', 'P8', 'P4', 'CP2', 'CP6', 'T8', 'C4', 'Cz', 'FC2', 'FC6', 'F10', 'F8', 'F4', 'Fp2', 'AF7', 'AF3', 'AFz', 'F1', 'F5', 'FT7', 'FC3', 'C1', 'C5', 'TP7', 'CP3', 'P1', 'P5', 'PO7', 'PO3', 'Iz', 'POz', 'PO4', 'PO8', 'P6', 'P2', 'CPz', 'CP4', 'TP8', 'C6', 'C2', 'FC4', 'FT8', 'F6', 'F2', 'AF4', 'AF8']

# Load session mapping
df_sessions = pd.read_csv('./session_mapping.csv')
df_matched = df_sessions[df_sessions['eeg_file'] != 'NO MATCH'].copy()

# Find all available preprocessed pickle files
from pathlib import Path
import re

preprocessed_dir = Path('./preprocessed')
available_sessions = []

for pkl_file in sorted(preprocessed_dir.glob('session_*-EEG-raw.pkl')):
    # Extract session number from filename (e.g., session_00-EEG-raw.pkl -> 0)
    match = re.search(r'session_(\d+)-EEG-raw\.pkl', pkl_file.name)
    if match:
        session_id = int(match.group(1))
        available_sessions.append(session_id)

session_ids = sorted(available_sessions)

print(f"Channel names: {len(chan_names)}")
print(f"Total matched sessions: {len(df_matched)}")
print(f"Available preprocessed sessions: {len(session_ids)}")
print(f"Session IDs: {session_ids}")

Channel names: 64
Total matched sessions: 17
Available preprocessed sessions: 17
Session IDs: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]


# 3. Test Bad Channels after Preprocessing the Data
### Once bad channels have been identified, run the preprocessing above again without them. If the bad channel detection fails for one participant, remove the participant.

In [14]:
def eeg_getbads(session_id, chan_names, n_jobs=10):
    """Identify bad channels for a session."""
    
    dfEEG = pd.read_pickle(f"./preprocessed/session_{session_id:02d}-EEG-raw.pkl")
    
    # Convert to float64 and handle NaN/Inf values
    eeg_data = dfEEG[chan_names].values.T / 1000000
    eeg_data = np.nan_to_num(eeg_data, nan=0.0, posinf=0.0, neginf=0.0)
    eeg_data = eeg_data.astype(np.float64)
    
    info = mne.create_info(ch_names=chan_names, sfreq=500, ch_types='eeg', verbose=False)
    info.set_montage('standard_1020')
    info['subject_info'] = {"id": session_id}
    info['subject_info'] = {"his_id": str(session_id)}
    raw = mne.io.RawArray(eeg_data, info, verbose=False)
    
    raw = raw.filter(l_freq=1, h_freq=None, fir_design='firwin2', verbose=False, n_jobs=n_jobs) 

    # Identify bad channels
    try:
        nc = pyprep.find_noisy_channels.NoisyChannels(raw, random_state=42)
        nc.find_all_bads()
        return nc.get_bads(verbose=False)
    except Exception as e:
        print(f"  ⚠️ Error during RANSAC: {e}")
        return []


In [15]:
lst = []

In [18]:
for i, session_id in enumerate(session_ids):
    # check if file already exists (z.B. als Pickle-Datei)
    result_path = preprocessed_dir / f"session_{session_id:02d}-EEG-raw.pkl"
    if result_path.exists():
        print(f"Skipping {session_id} (already processed)")
        continue

    print(f"Processing {i+1}/{len(session_ids)}: Session {session_id}...")
    x = eeg_getbads(session_id, chan_names)
    lst.append([session_id, x])
    print(f"  Found bad channels: {x}")

    # save result
    pd.to_pickle(x, result_path)


Skipping 0 (already processed)
Skipping 1 (already processed)
Skipping 2 (already processed)
Skipping 3 (already processed)
Skipping 4 (already processed)
Skipping 5 (already processed)
Skipping 6 (already processed)
Skipping 7 (already processed)
Skipping 8 (already processed)
Skipping 9 (already processed)
Skipping 10 (already processed)
Skipping 11 (already processed)
Skipping 12 (already processed)
Skipping 13 (already processed)
Skipping 14 (already processed)
Skipping 15 (already processed)
Skipping 16 (already processed)


In [19]:
dfBad = pd.DataFrame(lst)
dfBad.columns = ['SessionID', 'Electrodes']


dfBadCounter = pd.DataFrame.from_dict(Counter(np.concatenate(dfBad.Electrodes.values).ravel()), orient='index').reset_index()
dfBadCounter = dfBadCounter.rename(columns={'index':'Electrode', 0:'Count'})
dfBadCounter = dfBadCounter.sort_values("Count")
dfBadCounter

Unnamed: 0,Electrode,Count
1,F4,1
3,P9,1
13,Pz,1
27,P5,1
31,F2,1
30,FC4,1
29,TP8,1
28,CP4,1
24,C4,1
25,PO3,1


In [20]:
dfBad

Unnamed: 0,SessionID,Electrodes
0,0,"[F7, F4, POz, P9, FT7, AF8, F3]"
1,1,"[F10, FC5, C3, O1, C1, C5, Pz, F9, CP5]"
2,2,"[FC3, P2, AF4, C6, FT7, C1, Fp1, AF7, FT8, F6,..."
3,3,"[Fp1, F10, FC5, Iz, C3, Fp2, FC6, AF7, AF4, Fz..."
4,4,"[F10, C3, CP6, F9, CP5]"
5,5,"[F10, C3, CP6, F9, CP5]"
6,6,"[Fp1, FC5, C3, FC6, T8, FT8, O1, FC1, AF3]"
7,7,"[Cz, Fp2, P1, Iz]"
8,8,"[Fp1, F10, CP5, Iz, CP3, AF7, P10, TP7, F9, F3]"
9,9,"[Fp1, F10, O2, C3, Fp2, FC6, F1, P10, F3, AF3,..."
