# EEG Data Processing for Parkinson's Disease Classification

This notebook loads raw EEG recordings, creates balanced train/validation/test splits at the patient level, and then segments each recording into 5-second windows.

**Dataset Information:**
- 55 subjects total (27 Control + 28 Parkinson's)
- 46 subjects selected during preprocessing
- 59 EEG channels
- 500 Hz sampling rate
- 60 seconds per subject (eyes closed resting state)
- Pre-processed with a 1 Hz high-pass filter

## 1. Import Libraries

In [1]:

import numpy as np
import pandas as pd
from pathlib import Path
import json
import joblib
import sys

# Add modules to path
sys.path.append(str(Path.cwd() / 'modules'))

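# Import only the helpers this notebook uses
from modules.path_module import raw_datapath, processed_datapath
from modules.processing_module import data_load, data_prepare_patient_level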

## 2. Setup Paths and Parameters

In [2]:
raw_path = raw_datapath()
processed_path = processed_datapath()

print(f"Raw data path: {raw_path}")
print(f"Processed data path: {processed_path}")

# Parameters
SAMPLING_RATE = 500  # Hz
SEGMENT_LENGTH = 5  # seconds
DATA_SPLIT = (0.6, 0.2, 0.2)  # Train, Val, Test
RANDOM_SEED = 42


Raw data path: c:\Users\KSWes\MemSNNforPD\LightCNNforPD-master\data\raw
Processed data path: c:\Users\KSWes\MemSNNforPD\MemSNNforPD-master\data\processed


## 3. Load Selected Subject Files

In [3]:
# Get all CSV files
control_files = sorted(raw_path.glob('C*.csv'))
pd_files = sorted(raw_path.glob('P*.csv'))

print(f" {len(control_files)} Control subjects")
print(f" {len(pd_files)} Parkinson's subjects")
print(f"Total: {len(control_files) + len(pd_files)} subjects")

# Combine files and create group labels
all_files = control_files + pd_files
groups = ['C'] * len(control_files) + ['P'] * len(pd_files)

 27 Control subjects
 28 Parkinson's subjects
Total: 55 subjects


## 4. Load and Inspect Sample Data

In [4]:
sample_data = pd.read_csv(control_files[0], header=None)

print(f"\nSample data shape: {sample_data.shape}")
print(f"Channels: {sample_data.shape[0]}")
print(f"Samples: {sample_data.shape[1]}")
print(f"Duration: {sample_data.shape[1] / SAMPLING_RATE} seconds")


Sample data shape: (59, 30000)
Channels: 59
Samples: 30000
Duration: 60.0 seconds
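
Only the first control file is inspected above, so a quick check over every recording can catch a file with a different channel count or length before loading. The sketch below is a hypothetical helper (`check_recording_shape` is not part of the notebook's modules); it assumes each file holds a channels × samples array, and its defaults mirror the dataset description.

```python
import numpy as np

def check_recording_shape(data, n_channels=59, fs=500, duration_s=60):
    """Return a list of problems found in one (channels, samples) recording.

    Hypothetical helper: defaults are taken from the dataset description,
    not from the notebook's own modules.
    """
    problems = []
    expected_samples = fs * duration_s  # 500 Hz * 60 s = 30000 samples
    rows, cols = data.shape
    if rows != n_channels:
        problems.append(f"expected {n_channels} channels, got {rows}")
    if cols != expected_samples:
        problems.append(f"expected {expected_samples} samples, got {cols}")
    # Missing values would silently propagate into every segment
    if np.isnan(np.asarray(data, dtype=float)).any():
        problems.append("contains NaN values")
    return problems
```

In this notebook it could be applied as `check_recording_shape(pd.read_csv(f, header=None))` for each `f` in `control_files + pd_files`; an empty list means the file matches the expected layout.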


## 5. Load All Data

In [5]:
print("Loading all EEG data...")
dataset = data_load(all_files, groups, Fs=SAMPLING_RATE)

print(f"\nLoaded {len(dataset)} subjects")

# Count subjects per group
control_count = sum(1 for eeg in dataset.values() if eeg.label == 0)
pd_count = sum(1 for eeg in dataset.values() if eeg.label == 1)

Loading all EEG data...

Loaded 55 subjects


## 6. Split Patients, Then Segment

In [6]:
train_data, val_data, test_data, patient_metadata = data_prepare_patient_level(
    dataset,
    seg_length=SEGMENT_LENGTH,
    data_split=DATA_SPLIT,
    seed=RANDOM_SEED
)



Patient split:
  Train: 32 patients
  Val:   10 patients
  Test:  13 patients

Segment counts:
  Train: 384
  Val:   120
  Test:  156
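
Splitting by patient before segmenting keeps all windows from one subject in the same set, so no subject contributes to both training and evaluation. The implementation of `data_prepare_patient_level` is not shown in this notebook; the sketch below is one possible scheme, not the module's actual code. It assumes a per-class split with truncated counts and non-overlapping windows, and the function names are hypothetical.

```python
import numpy as np

def stratified_subject_split(labels, data_split=(0.6, 0.2, 0.2), seed=42):
    """Split subject indices into train/val/test, separately per class.

    Assumed scheme: shuffle each class, truncate the train and val counts,
    and put the remainder in test.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train, val, test = [], [], []
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_train = int(len(idx) * data_split[0])
        n_val = int(len(idx) * data_split[1])
        train.extend(idx[:n_train].tolist())
        val.extend(idx[n_train:n_train + n_val].tolist())
        test.extend(idx[n_train + n_val:].tolist())
    return train, val, test

def segment_recording(eeg, fs=500, seg_length=5):
    """Cut a (channels, samples) array into non-overlapping windows."""
    win = fs * seg_length          # samples per window
    n_seg = eeg.shape[1] // win    # trailing partial window is dropped
    return [eeg[:, i * win:(i + 1) * win] for i in range(n_seg)]
```

With 27 control and 28 Parkinson's subjects, this per-class truncation gives 16 + 16 = 32 train, 5 + 5 = 10 validation, and 6 + 7 = 13 test subjects, which agrees with the patient counts printed above. A 60-second recording yields 12 five-second windows, so 32, 10, and 13 subjects correspond to 384, 120, and 156 segments.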


## 7. Save Processed Data

In [7]:
train_path = processed_path / 'train_data.pkl'
val_path   = processed_path / 'val_data.pkl'
test_path  = processed_path / 'test_data.pkl'

print("\nSaving processed data...")
joblib.dump(train_data, train_path)
joblib.dump(val_data, val_path)
joblib.dump(test_data, test_path)

print("Data saved successfully!")
print(f"Train: {train_path}")
print(f"Val:   {val_path}")
print(f"Test:  {test_path}")



Saving processed data...
Data saved successfully!
Train: c:\Users\KSWes\MemSNNforPD\MemSNNforPD-master\data\processed\train_data.pkl
Val:   c:\Users\KSWes\MemSNNforPD\MemSNNforPD-master\data\processed\val_data.pkl
Test:  c:\Users\KSWes\MemSNNforPD\MemSNNforPD-master\data\processed\test_data.pkl
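
A downstream training notebook can read the three files back with `joblib.load`. The sketch below is a minimal, hypothetical loader (`load_splits` is not part of the notebook's modules) that assumes the file names used above. Because the saved objects are instances of a custom class (presumably defined in `modules.processing_module`), that module must be importable under the same name when the files are loaded.

```python
from pathlib import Path

import joblib

def load_splits(processed_dir):
    """Load the train/val/test objects written with joblib.dump.

    Hypothetical helper: expects train_data.pkl, val_data.pkl and
    test_data.pkl inside processed_dir.
    """
    processed_dir = Path(processed_dir)
    return tuple(
        joblib.load(processed_dir / f"{name}_data.pkl")
        for name in ("train", "val", "test")
    )
```

Usage would look like `train_data, val_data, test_data = load_splits(processed_path)`.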


## 8. Create Metadata File

In [8]:
metadata = {
    'num_channels': train_data[0].data.shape[0],
    'sampling_rate': SAMPLING_RATE,
    'segment_length': SEGMENT_LENGTH,
    'segment_samples': train_data[0].data.shape[1],
    'data_split': DATA_SPLIT,
    'random_seed': RANDOM_SEED,
    'split_method': 'patient_level',
    'train_size': len(train_data),
    'val_size': len(val_data),
    'test_size': len(test_data),
    'train_control': sum(1 for x in train_data if x.label == 0),
    'train_pd': sum(1 for x in train_data if x.label == 1),
    'val_control': sum(1 for x in val_data if x.label == 0),
    'val_pd': sum(1 for x in val_data if x.label == 1),
    'test_control': sum(1 for x in test_data if x.label == 0),
    'test_pd': sum(1 for x in test_data if x.label == 1),
    'patient_split': patient_metadata['patient_counts'],
    'patient_ids': patient_metadata['patient_ids']
}

metadata_path = processed_path / 'metadata.json'

with open(metadata_path, 'w') as f:
    json.dump(metadata, f, indent=4)

print("\nMetadata saved!")
print(f"Metadata path: {metadata_path}")



Metadata saved!
Metadata path: c:\Users\KSWes\MemSNNforPD\MemSNNforPD-master\data\processed\metadata.json
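
The metadata file can double as a consistency check when the data is reloaded. The sketch below is a hypothetical validator (`check_metadata` is not part of the notebook's modules) built only on the keys written above. It assumes segments are stored at the original sampling rate, so `segment_samples` should equal `sampling_rate * segment_length`. Note that JSON has no tuple type, so `data_split` comes back as a list.

```python
import json

def check_metadata(meta):
    """Raise ValueError if the counts in a metadata dict are inconsistent."""
    for split in ("train", "val", "test"):
        total = meta[f"{split}_size"]
        by_class = meta[f"{split}_control"] + meta[f"{split}_pd"]
        if total != by_class:
            raise ValueError(
                f"{split}: size {total} != control + pd = {by_class}"
            )
    # Assumption: no resampling, so window length = rate * seconds
    expected = meta["sampling_rate"] * meta["segment_length"]
    if meta["segment_samples"] != expected:
        raise ValueError(
            f"segment_samples {meta['segment_samples']} != {expected}"
        )
    return meta

def load_metadata(path):
    """Read metadata.json and validate it before returning the dict."""
    with open(path) as f:
        return check_metadata(json.load(f))
```

Calling `load_metadata(metadata_path)` returns the dictionary if every check passes.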
