# EEG Motor Movement/Imagery Classification Using Random Forest and Convolutional Neural Networks


## Project Overview

In this project, I explore the EEG Motor Movement/Imagery Dataset from PhysioNet. The goal is to classify executed and imagined motor tasks using machine learning techniques, specifically Random Forest and Convolutional Neural Networks (CNNs).


## Dataset Description

> **[EEG Motor Movement/Imagery Dataset](https://physionet.org/content/eegmmidb/1.0.0/)**
> 
> This data set consists of over 1500 one- and two-minute EEG recordings, obtained from 109 volunteers, as described below.
> 
> Subjects performed different motor/imagery tasks while 64-channel EEG were recorded using the BCI2000 system. 
>
> The experimental runs were:
> 1. Baseline, eyes open
> 2. Baseline, eyes closed
> 3. Task 1 (open and close left or right fist)
> 4. Task 2 (imagine opening and closing left or right fist)
> 5. Task 3 (open and close both fists or both feet)
> 6. Task 4 (imagine opening and closing both fists or both feet)
> 7. Task 1
> 8. Task 2
> 9. Task 3
> 10. Task 4
> 11. Task 1
> 12. Task 2
> 13. Task 3
> 14. Task 4
> 
> Each annotation includes one of three codes (**T0**, **T1**, or **T2**):
> - **T0** corresponds to rest
> - **T1** corresponds to onset of motion (real or imagined) of
>     - the left fist (in runs 3, 4, 7, 8, 11, and 12)
>     - both fists (in runs 5, 6, 9, 10, 13, and 14)
> - **T2** corresponds to onset of motion (real or imagined) of
>     - the right fist (in runs 3, 4, 7, 8, 11, and 12)
>     - both feet (in runs 5, 6, 9, 10, 13, and 14)
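The run schedule above is regular: runs 3 to 14 cycle through Tasks 1 to 4, and Tasks 1 and 3 are the real-movement tasks. The meaning of a T1/T2 annotation can therefore be derived from the run number alone. Below is a minimal sketch, assuming exactly the run layout quoted above; the helper `describe_event` and its label strings are hypothetical and not part of the dataset or of MNE.

```python
def describe_event(run: int, code: str) -> tuple[str, str]:
    """Map a (run, annotation code) pair to (movement, task_type).

    Hypothetical helper based on the PhysioNet run schedule:
    runs 1-2 are baselines, runs 3-14 cycle through Tasks 1-4.
    """
    if not 1 <= run <= 14:
        raise ValueError(f"run must be in 1..14, got {run}")
    if code not in ("T0", "T1", "T2"):
        raise ValueError(f"unknown annotation code: {code!r}")
    if run <= 2:
        return ("rest", "baseline")
    task = (run - 3) % 4 + 1                      # 1, 2, 3, 4, 1, 2, ...
    task_type = "execution" if task in (1, 3) else "imagery"
    if code == "T0":
        return ("rest", task_type)
    if task in (1, 2):                            # left / right fist runs
        movement = "left_fist" if code == "T1" else "right_fist"
    else:                                         # both fists / both feet runs
        movement = "both_fists" if code == "T1" else "both_feet"
    return (movement, task_type)

# Derive the real-movement runs from the schedule instead of hard-coding them
execution_runs = [r for r in range(3, 15) if describe_event(r, "T1")[1] == "execution"]
print(execution_runs)  # [3, 5, 7, 9, 11, 13]
```

Deriving the lists from the schedule keeps the execution/imagery split consistent with the dataset description.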


## Development Report

### Plan changes

The initial intention of the project was to compare the performance of Random Forest and CNN models on the EEG Motor Movement/Imagery Dataset. The assumption was that the CNN would outperform Random Forest thanks to its ability to capture spatial and temporal patterns in the EEG data. However, during the research and implementation phase, I came across models that fit the domain better and gave more reliable results, so I decided to add a third model for comparison: ***.

### Preprocessing

The EEG data consists of multichannel time series with event annotations. The following preprocessing steps were applied:

- **Bandpass filtering**. Scalp EEG shows oscillations at a variety of frequencies. Several of these oscillations have characteristic frequency ranges and spatial distributions, and are associated with different states of brain function. The alpha/mu (8-12 Hz) and beta (13-30 Hz) rhythms over the sensorimotor cortex are particularly relevant for motor tasks, because their power decreases during both executed and imagined movement (event-related desynchronization). A bandpass filter was therefore applied to retain 8-30 Hz, removing slow drifts and higher-frequency noise such as power-line interference.

- **Channel selection**. The dataset was acquired with a 64-channel EEG cap. For motor tasks, the channels over the frontocentral, central, and centroparietal regions (e.g., FC3, FCz, Cz, CPz), which overlie the sensorimotor cortex, are the most informative. The remaining channels were excluded to reduce dimensionality and focus on the relevant signals, leaving 21 channels (FC5-FC6, C5-C6, CP5-CP6).

- **Epoching**. The continuous EEG was segmented into fixed-length 2-second epochs starting at each task onset (T1 or T2); rest (T0) annotations were not epoched.
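The bandpass step above can be illustrated on a synthetic signal. The loading code uses MNE's FIR filter (`raw.filter(..., fir_design="firwin")`); this sketch instead swaps in a zero-phase Butterworth IIR filter from SciPy purely to show what an 8-30 Hz passband does. The filter order of 4 and the 2/10/50 Hz test signal are assumptions chosen for the demonstration.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def bandpass(data: np.ndarray, sfreq: float, l_freq: float = 8.0,
             h_freq: float = 30.0, order: int = 4) -> np.ndarray:
    """Zero-phase Butterworth bandpass applied along the last (time) axis."""
    # Second-order sections are numerically safer than (b, a) coefficients
    sos = butter(order, [l_freq, h_freq], btype="bandpass", fs=sfreq, output="sos")
    # Forward-backward filtering cancels the phase shift
    return sosfiltfilt(sos, data, axis=-1)

def amplitude_at(x: np.ndarray, sfreq: float, freq: float) -> float:
    """Amplitude of the spectral bin closest to `freq`."""
    spectrum = np.abs(np.fft.rfft(x)) * 2 / len(x)
    freqs = np.fft.rfftfreq(len(x), d=1 / sfreq)
    return float(spectrum[np.argmin(np.abs(freqs - freq))])

# Synthetic single-channel signal: 10 s at 160 Hz mixing 2, 10 and 50 Hz sines
sfreq = 160.0
t = np.arange(int(10 * sfreq)) / sfreq
signal = sum(np.sin(2 * np.pi * f * t) for f in (2, 10, 50))
filtered = bandpass(signal[np.newaxis, :], sfreq)[0]

for f in (2, 10, 50):
    print(f"{f:>2} Hz: before {amplitude_at(signal, sfreq, f):.3f}, "
          f"after {amplitude_at(filtered, sfreq, f):.3f}")
```

The 10 Hz component (inside the mu/alpha band) passes almost unchanged, while the 2 Hz drift-like component and the 50 Hz component are strongly attenuated.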
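The epoching step amounts to slicing a fixed number of samples after each onset. The NumPy sketch below shows the idea independently of MNE, using a hypothetical `epoch_from_onsets` helper and synthetic data. One detail worth knowing: MNE treats `tmax` as inclusive, so `tmin=0, tmax=2` at 160 Hz yields 321 samples per epoch, whereas the half-open slice here yields exactly `window_sec * sfreq` = 320.

```python
import numpy as np

def epoch_from_onsets(data: np.ndarray, onsets, sfreq: float,
                      window_sec: float) -> np.ndarray:
    """Cut fixed-length windows that start at each onset sample.

    data   : array of shape (n_channels, n_times)
    onsets : iterable of onset sample indices (e.g. T1/T2 events)
    Returns an array of shape (n_epochs, n_channels, n_samples).
    Windows that would run past the end of the recording are dropped.
    """
    n_samples = int(round(window_sec * sfreq))
    kept = [o for o in onsets if 0 <= o and o + n_samples <= data.shape[1]]
    if not kept:
        return np.empty((0, data.shape[0], n_samples), dtype=data.dtype)
    # Half-open slices [o, o + n_samples) give exactly window_sec * sfreq samples
    return np.stack([data[:, o:o + n_samples] for o in kept])

# Synthetic example: 21 channels, 10 s at 160 Hz, onsets given in samples
rng = np.random.default_rng(0)
data = rng.standard_normal((21, 1600))
epochs = epoch_from_onsets(data, [0, 672, 1200, 1500], sfreq=160.0, window_sec=2)
print(epochs.shape)  # (3, 21, 320): the onset at 1500 is dropped
```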

<!-- ### Random Forest Model-->

<!-- ### CNN -->

# Code

### Setting variables

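```python
import os
from dotenv import load_dotenv

# Read INPUT_DIR / OUTPUT_DIR from a .env file if present, else use defaults
load_dotenv()

INPUT_DIR = os.getenv("INPUT_DIR", "./data/raw")
OUTPUT_DIR = os.getenv("OUTPUT_DIR", "./result")

records_file = os.path.join(INPUT_DIR, "RECORDS")
print(f"Records file: {records_file}")

# Sampling and windowing
SFREQ = 160.0      # target sampling rate (Hz)
WINDOW_SEC = 2     # epoch length (s)
OVERLAP = 0.5      # window overlap fraction

# Bandpass filter: keep the alpha/mu and beta bands
FILTER_BANDWIDTH = True
MIN_FREQ = 8.0
MAX_FREQ = 30.0

FILTER_MOTOR_CHANNELS = True
# Keeping frontocentral, central and centroparietal channels
picks = ['FC5', 'FC3', 'FC1', 'FCZ', 'FC2', 'FC4', 'FC6',
        'C5', 'C3', 'C1', 'CZ', 'C2', 'C4', 'C6',
        'CP5', 'CP3', 'CP1', 'CPZ', 'CP2', 'CP4', 'CP6']

CHANNELS = len(picks) if FILTER_MOTOR_CHANNELS else 64

# Real (executed) movement runs: Task 1 (runs 3, 7, 11) and Task 3 (runs 5, 9, 13).
# Runs 4, 6, 8, 10, 12 and 14 are the imagery runs (Tasks 2 and 4).
EXECUTION_RUNS = [3, 5, 7, 9, 11, 13]

DEBUG = False
```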
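In the raw EDF headers of this dataset the channel labels are dot-padded (for example `Fc5.` or `Cz..`), which is why the loading loop strips `.` and upper-cases them before calling `raw.pick(picks)`. The sketch below mirrors that normalization in plain Python and adds an explicit check that every requested channel exists. The helper names `normalize_label` and `motor_channel_indices` are hypothetical, and the label list is illustrative rather than the full 64-channel montage.

```python
# Same 21 labels as `picks` in the configuration cell
MOTOR_PICKS = ['FC5', 'FC3', 'FC1', 'FCZ', 'FC2', 'FC4', 'FC6',
               'C5', 'C3', 'C1', 'CZ', 'C2', 'C4', 'C6',
               'CP5', 'CP3', 'CP1', 'CPZ', 'CP2', 'CP4', 'CP6']

def normalize_label(label: str) -> str:
    """Strip dot padding and upper-case, e.g. 'Fc5.' -> 'FC5', 'Cz..' -> 'CZ'."""
    return label.strip('.').upper()

def motor_channel_indices(labels, picks=MOTOR_PICKS):
    """Return row indices of `picks` (in `picks` order) within `labels`.

    Raises ValueError if a requested channel is absent, which would otherwise
    surface later as a shape mismatch between recordings.
    """
    normalized = [normalize_label(lbl) for lbl in labels]
    missing = [p for p in picks if p not in normalized]
    if missing:
        raise ValueError(f"channels not found: {missing}")
    return [normalized.index(p) for p in picks]

# Illustrative, dot-padded labels (a subset of a 64-channel montage)
labels = ['Fp1.', 'Fc5.', 'Fc3.', 'Fc1.', 'Fcz.', 'Fc2.', 'Fc4.', 'Fc6.',
          'C5..', 'C3..', 'C1..', 'Cz..', 'C2..', 'C4..', 'C6..',
          'Cp5.', 'Cp3.', 'Cp1.', 'Cpz.', 'Cp2.', 'Cp4.', 'Cp6.', 'O1..']
idx = motor_channel_indices(labels)
print(idx[:3], len(idx))  # [1, 2, 3] 21
```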

## 1. Data Preparation

### 1.1. Loading RECORDS file

```python
with open(records_file, "r") as f:
    records = [line.strip() for line in f if line.strip()]
print(f"Number of RECORDS entries: {len(records)}")

edf_paths = [os.path.join(INPUT_DIR, record) for record in records]
print(f"\nResolved EDF files: {len(edf_paths)}")
```
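Because baseline runs 1 and 2 only carry T0 annotations, they can be filtered out of the RECORDS list by file name before any EDF is opened, instead of being read and then skipped. A minimal sketch, assuming entries of the form `S001/S001R01.edf` (the pattern the loader's regex `S(\d+)R(\d+)` expects); `parse_record` and `task_records` are hypothetical helpers.

```python
import re

# Assumed RECORDS line format: "S001/S001R01.edf"
RECORD_RE = re.compile(r"S(\d+)R(\d+)\.edf$")
BASELINE_RUNS = {1, 2}  # eyes open / eyes closed, annotated with T0 only

def parse_record(record: str):
    """Return (subject_id, run_id), or None if the entry is not an EDF record."""
    match = RECORD_RE.search(record)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

def task_records(records):
    """Keep (record, subject_id, run_id) for non-baseline runs only."""
    selected = []
    for record in records:
        ids = parse_record(record)
        if ids is not None and ids[1] not in BASELINE_RUNS:
            selected.append((record, *ids))
    return selected

example = ["S001/S001R01.edf", "S001/S001R03.edf",
           "S001/S001R04.edf", "S002/S002R02.edf", "README"]
print(task_records(example))
# [('S001/S001R03.edf', 1, 3), ('S001/S001R04.edf', 1, 4)]
```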

### 1.2. EDF file loading and preprocessing

```python
import re

import mne
import pandas as pd

mne.set_log_level('DEBUG' if DEBUG else 'ERROR')

all_dfs = []
for p in edf_paths:
    # Extract subject and run IDs from the file name (e.g. S001R03.edf)
    match = re.search(r'S(\d+)R(\d+)', p)
    if match is None:
        print(f"Skipping file with unexpected name: {p}")
        continue
    subject_id = int(match.group(1))
    run_id = int(match.group(2))

    # Read raw EDF file
    raw = mne.io.read_raw_edf(p, preload=True, verbose='ERROR')

    # Skip files that only contain T0 (baseline runs)
    if not set(raw.annotations.description) - {'T0'}:
        continue

    # Keep only motor-related channels (EDF labels are dot-padded, e.g. 'Fc5.')
    if FILTER_MOTOR_CHANNELS:
        raw.rename_channels(lambda x: x.strip('.').upper())
        raw.pick(picks)

    # Bandpass filter, keeping only the alpha/mu and beta bands
    if FILTER_BANDWIDTH:
        raw.filter(MIN_FREQ, MAX_FREQ, fir_design="firwin", skip_by_annotation="edge")

    # Resample to the target frequency if a recording uses a different rate
    if raw.info['sfreq'] != SFREQ:
        if DEBUG:
            print(f"Resampling from {raw.info['sfreq']} to {SFREQ} Hz: {p}")
        raw.resample(SFREQ)

    # Build events only after resampling so their sample indices match the data
    events, event_id = mne.events_from_annotations(raw)
    active_event_ids = {k: v for k, v in event_id.items() if k != 'T0'}

    # Cut fixed-length epochs after each T1/T2 onset (T0 excluded).
    # MNE includes the sample at tmax, so subtract one sample to get
    # exactly WINDOW_SEC * SFREQ samples per epoch.
    epochs = mne.Epochs(
        raw,
        events,
        event_id=active_event_ids,
        tmin=0,
        tmax=WINDOW_SEC - 1.0 / SFREQ,
        baseline=None,
        preload=True,
        verbose='ERROR'
    )

    # Convert to a long-format DataFrame and add descriptive metadata columns
    df_temp = epochs.to_data_frame()
    df_temp['subject'] = f"S{subject_id:03d}"
    df_temp['run'] = run_id
    df_temp['is_executed'] = run_id in EXECUTION_RUNS
    df_temp['task_type'] = 'execution' if run_id in EXECUTION_RUNS else 'imagery'
    all_dfs.append(df_temp)

    if DEBUG:
        print(f"Processed file: {raw.info}")
        print(f"Event IDs: {active_event_ids}")

    del raw
    del epochs

raws = pd.concat(all_dfs, ignore_index=True)
```
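`Epochs.to_data_frame()` returns a long table with one row per (epoch, time point), while a CNN typically expects an `(n_epochs, n_channels, n_times)` array. The sketch below pivots the long table back, assuming the `time`, `condition` and `epoch` columns that MNE emits by default plus the `subject` and `run` columns added in the loop. Epoch numbering restarts for every file, so the epoch key is combined with subject and run. `long_to_array` is a hypothetical helper, demonstrated on a small synthetic frame.

```python
import numpy as np
import pandas as pd

def long_to_array(df: pd.DataFrame, channels, keys=("subject", "run", "epoch")):
    """Pivot a long-format epochs DataFrame into (X, y).

    X has shape (n_epochs, n_channels, n_times); y holds one label per epoch.
    Assumes every epoch has the same number of time points.
    """
    keys = list(keys)
    ordered = df.sort_values(keys + ["time"])
    groups = ordered.groupby(keys, sort=True)
    # Each group is one epoch: rows are time points, columns are channels
    X = np.stack([g[channels].to_numpy().T for _, g in groups])
    y = groups["condition"].first().to_numpy()
    return X, y

# Synthetic long-format frame: 3 epochs x 5 time points x 3 channels
rows = []
for run, epoch, cond in [(3, 0, "T1"), (3, 1, "T2"), (4, 0, "T2")]:
    for t in range(5):
        base = 100 * run + 10 * epoch + t
        rows.append({"subject": "S001", "run": run, "epoch": epoch, "time": t,
                     "condition": cond, "C3": base, "CZ": -base, "C4": 0.5 * base})
df = pd.DataFrame(rows).sample(frac=1, random_state=0)  # shuffled rows
X, y = long_to_array(df, ["C3", "CZ", "C4"])
print(X.shape, list(y))  # (3, 3, 5) ['T1', 'T2', 'T2']
```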