# 03 EEG Preprocess RANSAC

## Overview
This notebook identifies **bad (noisy) EEG channels** across all recording sessions using PyPREP's `NoisyChannels` detectors, which include the RANSAC algorithm.

**Purpose:**
- Automatically detect malfunctioning or noisy electrodes that would contaminate the EEG signal
- Identify channels with poor contact, excessive noise, or artifacts
- Generate a summary report showing which channels are problematic across sessions

**What it does:**
1. Loads preprocessed raw EEG data from Notebook 02 (`session_XX-EEG-raw.pkl`)
2. Applies high-pass filtering (1 Hz) to remove slow drifts
3. Uses RANSAC (Random Sample Consensus), run as part of PyPREP's `find_all_bads()`, to detect bad channels by:
   - Comparing each channel against predictions interpolated from random subsets of the other channels
   - Flagging channels that correlate poorly with their predicted signal
4. Generates summary statistics showing which electrodes are frequently bad across sessions
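
To make step 3 more concrete, here is a toy illustration of the consensus idea on synthetic data. It is not PyPREP's implementation: PyPREP predicts channels by spherical spline interpolation from electrode positions and evaluates correlations in time windows, whereas this sketch substitutes a least-squares fit over the whole signal. The function name `ransac_like_correlations`, the channel count, and the 0.75 threshold are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_channels, n_samples, n_sources = 16, 2000, 3

# Synthetic "good" channels: noisy mixtures of a few shared sources
sources = rng.standard_normal((n_sources, n_samples))
mixing = rng.standard_normal((n_channels, n_sources))
data = mixing @ sources + 0.05 * rng.standard_normal((n_channels, n_samples))

# Simulate one bad channel by replacing it with unrelated noise
bad_idx = 5
data[bad_idx] = rng.standard_normal(n_samples)


def ransac_like_correlations(data, n_draws=20, subset_frac=0.25, rng=rng):
    """Correlate each channel with a consensus prediction from random subsets.

    Assumption: least squares stands in for the spatial interpolation
    that a real RANSAC bad-channel detector would use.
    """
    n_ch = data.shape[0]
    k = max(2, int(subset_frac * n_ch))
    corrs = np.zeros(n_ch)
    for ch in range(n_ch):
        others = np.delete(np.arange(n_ch), ch)
        preds = []
        for _ in range(n_draws):
            # Predict the target channel from a random subset of other channels
            subset = rng.choice(others, size=k, replace=False)
            X = data[subset].T
            weights, *_ = np.linalg.lstsq(X, data[ch], rcond=None)
            preds.append(X @ weights)
        # Consensus prediction: pointwise median across random draws
        consensus = np.median(preds, axis=0)
        corrs[ch] = np.corrcoef(consensus, data[ch])[0, 1]
    return corrs


corrs = ransac_like_correlations(data)
flagged = np.where(corrs < 0.75)[0]  # illustrative threshold
print("Flagged channel indices:", flagged.tolist())
```

A channel that shares signal with the rest of the cap is well predicted from almost any subset, while an unrelated channel is not, which is why low correlation with the consensus prediction serves as the badness criterion.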

**Output:**
- DataFrame showing bad channels per session
- Counter showing how often each electrode is marked as bad
- This information is used in Notebook 04 to exclude bad channels from further analysis

**Next step:** Based on the results, you'll manually remove consistently bad channels before running Notebook 04 (ICA preprocessing).

**Code Attribution:**
- Original EEG preprocessing code adapted from: Chiossi, F., Mayer, S., & Ou, C. (2024). MobileHCI 2024 Papers - Submission 7226.
- OSF Repository: https://osf.io/fncj4/overview (Created: Sep 11, 2023)
- License: GNU General Public License (GPL) 3.0
- Code has been modified for this study's session-based structure and experimental design.

## 1. Import Libraries

In [1]:
import pandas as pd
import numpy as np
import mne
import pyprep
from collections import Counter

## 2. Load Session Mapping

In [2]:
# All 64 EEG channels (excluding Time, TimeLsl, and accelerometer data)
chan_names = ['Fp1', 'Fz', 'F3', 'F7', 'F9', 'FC5', 'FC1', 'C3', 'T7', 'CP5', 'CP1', 'Pz', 'P3', 'P7', 'P9', 'O1', 'Oz', 'O2', 'P10', 'P8', 'P4', 'CP2', 'CP6', 'T8', 'C4', 'Cz', 'FC2', 'FC6', 'F10', 'F8', 'F4', 'Fp2', 'AF7', 'AF3', 'AFz', 'F1', 'F5', 'FT7', 'FC3', 'C1', 'C5', 'TP7', 'CP3', 'P1', 'P5', 'PO7', 'PO3', 'Iz', 'POz', 'PO4', 'PO8', 'P6', 'P2', 'CPz', 'CP4', 'TP8', 'C6', 'C2', 'FC4', 'FT8', 'F6', 'F2', 'AF4', 'AF8']

# Load session mapping
df_sessions = pd.read_csv('./session_mapping.csv')
df_matched = df_sessions[df_sessions['eeg_file'] != 'NO MATCH'].copy()

# Find all available preprocessed pickle files
from pathlib import Path
import re

preprocessed_dir = Path('./preprocessed')
available_sessions = []

for pkl_file in sorted(preprocessed_dir.glob('session_*-EEG-raw.pkl')):
    # Extract session number from filename (e.g., session_00-EEG-raw.pkl -> 0)
    match = re.search(r'session_(\d+)-EEG-raw\.pkl', pkl_file.name)
    if match:
        session_id = int(match.group(1))
        available_sessions.append(session_id)

session_ids = sorted(available_sessions)

print(f"Channel names: {len(chan_names)}")
print(f"Total matched sessions: {len(df_matched)}")
print(f"Available preprocessed sessions: {len(session_ids)}")
print(f"Session IDs: {session_ids}")

Channel names: 64
Total matched sessions: 11
Available preprocessed sessions: 11
Session IDs: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]


## 3. Test Bad Channels after Preprocessing the Data

Once bad channels have been identified, rerun the preprocessing without them. If bad channel detection fails for a session, exclude that session.

In [3]:
def eeg_getbads(session_id, chan_names, n_jobs=10):
    """Identify bad channels for a session."""
    
    dfEEG = pd.read_pickle(f"./preprocessed/session_{session_id:02d}-EEG-raw.pkl")
    
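    # Use all EEG data: build an MNE Raw object with the standard 10-20 montage
    info = mne.create_info(ch_names=chan_names, sfreq=500, ch_types='eeg', verbose=False)
    info.set_montage('standard_1020')
    # Set both subject fields in one dict; two separate assignments overwrite each other
    info['subject_info'] = {"id": session_id, "his_id": str(session_id)}
    # Divide by 1e6 to convert to volts (assumes the pickled data are in microvolts)
    raw = mne.io.RawArray(dfEEG[chan_names].values.T/1000000, info, verbose=False)
    
    # High-pass filter at 1 Hz to remove slow drifts
    raw = raw.filter(l_freq=1, h_freq=None, fir_design='firwin2', verbose=False, n_jobs=n_jobs)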

    # Identify bad channels
    nc = pyprep.find_noisy_channels.NoisyChannels(raw)
    nc.find_all_bads()
    
    return nc.get_bads(verbose=False)


In [4]:
lst = []

In [5]:
for i, session_id in enumerate(session_ids):
    print(f"Processing {i+1}/{len(session_ids)}: Session {session_id}...")
    x = eeg_getbads(session_id, chan_names)
    lst.append([session_id, x])
    print(f"  Found bad channels: {x}")


Processing 1/11: Session 0...
Setting up high-pass filter at 1 Hz

FIR filter parameters
---------------------
Designing a one-pass, zero-phase, non-causal highpass filter:
- Windowed time-domain design (firwin) method
- Hamming window with 0.0194 passband ripple and 53 dB stopband attenuation
- Lower passband edge: 1.00
- Lower transition bandwidth: 1.00 Hz (-6 dB cutoff frequency: 0.50 Hz)
- Filter length: 1651 samples (3.302 s)

Executing RANSAC
This may take a while, so be patient...
100%|██████████|  : 761/761 [03:08<00:00,    4.05it/s]


RANSAC done!





  Found bad channels: [np.str_('AF8'), 'FT7', np.str_('F7'), np.str_('POz'), np.str_('F4'), np.str_('F3')]
Processing 2/11: Session 1...
Setting up high-pass filter at 1 Hz

FIR filter parameters
---------------------
Designing a one-pass, zero-phase, non-causal highpass filter:
- Windowed time-domain design (firwin) method
- Hamming window with 0.0194 passband ripple and 53 dB stopband attenuation
- Lower passband edge: 1.00
- Lower transition bandwidth: 1.00 Hz (-6 dB cutoff frequency: 0.50 Hz)
- Filter length: 1651 samples (3.302 s)

Executing RANSAC
This may take a while, so be patient...

100%|██████████|  : 421/421 [01:26<00:00,    4.86it/s]



RANSAC done!
  Found bad channels: ['FC5', 'CP5', 'F10', np.str_('C5'), 'Pz', 'F9', 'C3', 'C1', np.str_('Fz'), 'O1']
Processing 3/11: Session 2...
Setting up high-pass filter at 1 Hz

FIR filter parameters
---------------------
Designing a one-pass, zero-phase, non-causal highpass filter:
- Windowed time-domain design (firwin) method
- Hamming window with 0.0194 passband ripple and 53 dB stopband attenuation
- Lower passband edge: 1.00
- Lower transition bandwidth: 1.00 Hz (-6 dB cutoff frequency: 0.50 Hz)
- Filter length: 1651 samples (3.302 s)

100%|██████████|  : 412/412 [00:43<00:00,    9.58it/s]



RANSAC done!
  Found bad channels: ['FT8', 'CP3', 'F1', np.str_('FC2'), 'TP8', 'C2', 'PO3', 'C5', 'PO8', 'F2', 'TP7', 'AF8', 'PO4', 'CP4', 'AF3', 'AFz', 'FT7', 'FC3', np.str_('C4'), 'Iz', 'P6', 'P2', np.str_('FC6'), 'FC4', 'F5', 'F6', 'POz', 'C1', 'P5', 'F3', 'C6', 'AF7', np.str_('Cz'), 'PO7', 'P1', np.str_('T8'), 'AF4', 'CPz']
Processing 4/11: Session 3...
Setting up high-pass filter at 1 Hz

FIR filter parameters
---------------------
Designing a one-pass, zero-phase, non-causal highpass filter:
- Windowed time-domain design (firwin) method
- Hamming window with 0.0194 passband ripple and 53 dB stopband attenuation
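- Lower passband edge: 1.00
- Lower transition bandwidth: 1.00 Hz (-6 dB cutoff frequency: 0.50 Hz)
- Filter length: 1651 samples (3.302 s)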

100%|██████████|  : 262/262 [00:44<00:00,    5.87it/s]


RANSAC done!





  Found bad channels: ['Iz', 'FC5', 'AF7', 'F10', 'FC6', 'Cz', 'F9', np.str_('CP6'), np.str_('Fz'), 'Fp1', 'AF4', 'C3', 'Fp2']
Processing 5/11: Session 4...
Setting up high-pass filter at 1 Hz

FIR filter parameters
---------------------
Designing a one-pass, zero-phase, non-causal highpass filter:
- Windowed time-domain design (firwin) method
- Hamming window with 0.0194 passband ripple and 53 dB stopband attenuation
- Lower passband edge: 1.00
- Lower transition bandwidth: 1.00 Hz (-6 dB cutoff frequency: 0.50 Hz)
- Filter length: 1651 samples (3.302 s)

Executing RANSAC
This may take a while, so be patient...

100%|██████████|  : 702/702 [02:16<00:00,    5.16it/s]



RANSAC done!
  Found bad channels: ['F10', 'CP5', 'F9', np.str_('CP6'), np.str_('C2'), 'C3']
Processing 6/11: Session 5...
Setting up high-pass filter at 1 Hz

FIR filter parameters
---------------------
Designing a one-pass, zero-phase, non-causal highpass filter:
- Windowed time-domain design (firwin) method
- Hamming window with 0.0194 passband ripple and 53 dB stopband attenuation
- Lower passband edge: 1.00
- Lower transition bandwidth: 1.00 Hz (-6 dB cutoff frequency: 0.50 Hz)
- Filter length: 1651 samples (3.302 s)

100%|██████████|  : 702/702 [02:22<00:00,    4.92it/s]



RANSAC done!
  Found bad channels: ['F10', 'CP5', 'F9', np.str_('CP6'), 'C3']
Processing 7/11: Session 6...
Setting up high-pass filter at 1 Hz

FIR filter parameters
---------------------
Designing a one-pass, zero-phase, non-causal highpass filter:
- Windowed time-domain design (firwin) method
- Hamming window with 0.0194 passband ripple and 53 dB stopband attenuation
- Lower passband edge: 1.00
- Lower transition bandwidth: 1.00 Hz (-6 dB cutoff frequency: 0.50 Hz)
- Filter length: 1651 samples (3.302 s)

100%|██████████|  : 494/494 [01:38<00:00,    5.00it/s]




RANSAC done!
  Found bad channels: [np.str_('FT8'), 'FC5', 'FC6', np.str_('F9'), np.str_('CP6'), np.str_('C2'), 'C3', 'T8', np.str_('FC1'), 'O1']
Processing 8/11: Session 7...
Setting up high-pass filter at 1 Hz

FIR filter parameters
---------------------
Designing a one-pass, zero-phase, non-causal highpass filter:
- Windowed time-domain design (firwin) method
- Hamming window with 0.0194 passband ripple and 53 dB stopband attenuation
- Lower passband edge: 1.00
- Lower transition bandwidth: 1.00 Hz (-6 dB cutoff frequency: 0.50 Hz)
- Filter length: 1651 samples (3.302 s)

100%|██████████|  : 378/378 [01:19<00:00,    4.74it/s]



RANSAC done!
  Found bad channels: [np.str_('Cz'), 'Fp2']
Processing 9/11: Session 8...
Setting up high-pass filter at 1 Hz

FIR filter parameters
---------------------
Designing a one-pass, zero-phase, non-causal highpass filter:
- Windowed time-domain design (firwin) method
- Hamming window with 0.0194 passband ripple and 53 dB stopband attenuation
- Lower passband edge: 1.00
- Lower transition bandwidth: 1.00 Hz (-6 dB cutoff frequency: 0.50 Hz)
- Filter length: 1651 samples (3.302 s)

Executing RANSAC
This may take a while, so be patient...

100%|██████████|  : 256/256 [00:48<00:00,    5.28it/s]


RANSAC done!





  Found bad channels: ['Iz', 'F10', 'AF7', np.str_('CP5'), np.str_('CP3'), 'F9', 'P10', 'Fp1', np.str_('F3')]
Processing 10/11: Session 9...
Setting up high-pass filter at 1 Hz

FIR filter parameters
---------------------
Designing a one-pass, zero-phase, non-causal highpass filter:
- Windowed time-domain design (firwin) method
- Hamming window with 0.0194 passband ripple and 53 dB stopband attenuation
- Lower passband edge: 1.00
- Lower transition bandwidth: 1.00 Hz (-6 dB cutoff frequency: 0.50 Hz)
- Filter length: 1651 samples (3.302 s)

Executing RANSAC
This may take a while, so be patient...

100%|██████████|  : 450/450 [01:19<00:00,    5.63it/s]



RANSAC done!
  Found bad channels: ['CP5', 'F10', 'FC6', np.str_('F3'), 'F9', np.str_('P1'), np.str_('F1'), 'P10', 'C3', np.str_('AF3'), np.str_('Fz'), 'O2', 'Fp2']
Processing 11/11: Session 10...
Setting up high-pass filter at 1 Hz

FIR filter parameters
---------------------
Designing a one-pass, zero-phase, non-causal highpass filter:
- Windowed time-domain design (firwin) method
- Hamming window with 0.0194 passband ripple and 53 dB stopband attenuation
- Lower passband edge: 1.00
- Lower transition bandwidth: 1.00 Hz (-6 dB cutoff frequency: 0.50 Hz)
- Filter length: 1651 samples (3.302 s)

100%|██████████|  : 265/265 [01:07<00:00,    3.95it/s]


RANSAC done!





  Found bad channels: ['CP5', 'F10', 'CP2', np.str_('F2'), 'FC6', 'P3', np.str_('F9'), 'C3', 'O2']


In [6]:
dfBad = pd.DataFrame(lst)
dfBad.columns = ['SessionID', 'Electrodes']


dfBadCounter = pd.DataFrame.from_dict(Counter(np.concatenate(dfBad.Electrodes.values).ravel()), orient='index').reset_index()
dfBadCounter = dfBadCounter.rename(columns={'index':'Electrode', 0:'Count'})
dfBadCounter = dfBadCounter.sort_values("Count")
dfBadCounter

| | Electrode | Count |
|---|---|---|
| 2 | F7 | 1 |
| 4 | F4 | 1 |
| 10 | Pz | 1 |
| 29 | AFz | 1 |
| 20 | TP8 | 1 |
| 22 | PO3 | 1 |
| 30 | FC3 | 1 |
| 31 | C4 | 1 |
| 25 | TP7 | 1 |
| 26 | PO4 | 1 |
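
The per-electrode counts can be turned into a candidate list of consistently bad channels for the manual review before Notebook 04. The sketch below is one possible way to do that with the standard library; the helper name `consistently_bad` and the `min_fraction` cutoff are hypothetical and should be chosen for the study at hand. The example input reuses the bad channel lists reported above for sessions 4, 5, and 7.

```python
from collections import Counter


def consistently_bad(bads_per_session, min_fraction=0.5):
    """Return channels flagged in at least `min_fraction` of sessions, sorted by name.

    `bads_per_session` maps a session ID to its list of bad channel names.
    The 0.5 default is an illustrative assumption, not a recommended value.
    """
    n_sessions = len(bads_per_session)
    # Count each channel at most once per session
    counts = Counter(ch for bads in bads_per_session.values() for ch in set(bads))
    return sorted(ch for ch, n in counts.items() if n / n_sessions >= min_fraction)


example = {
    4: ['F10', 'CP5', 'F9', 'CP6', 'C2', 'C3'],
    5: ['F10', 'CP5', 'F9', 'CP6', 'C3'],
    7: ['Cz', 'Fp2'],
}
print(consistently_bad(example))
```

With the notebook's own data, the input could be built as `dict(zip(dfBad.SessionID, dfBad.Electrodes))`.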


In [7]:
dfBad

| | SessionID | Electrodes |
|---|---|---|
| 0 | 0 | [AF8, FT7, F7, POz, F4, F3] |
| 1 | 1 | [FC5, CP5, F10, C5, Pz, F9, C3, C1, Fz, O1] |
| 2 | 2 | [FT8, CP3, F1, FC2, TP8, C2, PO3, C5, PO8, F2,... |
| 3 | 3 | [Iz, FC5, AF7, F10, FC6, Cz, F9, CP6, Fz, Fp1,... |
| 4 | 4 | [F10, CP5, F9, CP6, C2, C3] |
| 5 | 5 | [F10, CP5, F9, CP6, C3] |
| 6 | 6 | [FT8, FC5, FC6, F9, CP6, C2, C3, T8, FC1, O1] |
| 7 | 7 | [Cz, Fp2] |
| 8 | 8 | [Iz, F10, AF7, CP5, CP3, F9, P10, Fp1, F3] |
| 9 | 9 | [CP5, F10, FC6, F3, F9, P1, F1, P10, C3, AF3, ... |
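
One way to screen for sessions where bad channel detection may have failed is to check what fraction of the 64 channels was flagged. The following is a minimal sketch under that assumption; the helper name `flag_sessions` and the `max_bad_fraction` cutoff are hypothetical, and session 99 in the example is made-up data included only to trigger the check.

```python
def flag_sessions(bads_per_session, n_channels=64, max_bad_fraction=0.25):
    """Return {session_id: bad_fraction} for sessions exceeding the cutoff.

    The 0.25 default is an illustrative assumption; pick a value that
    suits the montage and analysis.
    """
    flagged = {}
    for session_id, bads in bads_per_session.items():
        fraction = len(set(bads)) / n_channels
        if fraction > max_bad_fraction:
            flagged[session_id] = round(fraction, 3)
    return flagged


example = {
    5: ['F10', 'CP5', 'F9', 'CP6', 'C3'],
    7: ['Cz', 'Fp2'],
    99: [f"ch{i}" for i in range(40)],  # made-up session with 40 flagged channels
}
print(flag_sessions(example))
```

Sessions returned by such a check are candidates for visual inspection of the raw signal before deciding whether to exclude them.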
