## EEG Data Preprocessing and EEGNet Model Training
### Introduction

This guide walks through preprocessing EEG data from EDF files, segmenting the signals, and training an EEGNet model with TensorFlow/Keras. The data come from the PhysioNet dataset of EEG recordings taken from subjects before and during mental arithmetic tasks.



In [1]:
import os
import pyedflib
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from scipy.signal import resample
import matplotlib.pyplot as plt


## Step 1: Loading EDF Files and Extracting Signals
### Objective: Load EEG data from EDF files and extract the signals into numpy arrays.

Explanation: EDF (European Data Format) files are commonly used for storing EEG data. This step uses the pyedflib library to read every EDF file in the specified folder, extract the EEG signals channel by channel, zero-pad shorter recordings to the length of the longest one, and stack the results into a single NumPy array of shape (recordings, channels, samples).

In [3]:
# Step 1: Load and Extract Signals from EDF Files
def load_edf_files(folder_path):
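    # Sort the file names so the recording order (and the label order derived from it) is reproducible
    edf_files = sorted(os.path.join(folder_path, f) for f in os.listdir(folder_path) if f.endswith('.edf'))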
    all_signals = []
    max_samples = 0
    
    # Determine the maximum number of samples across all files
    for edf_file in edf_files:
        edf_reader = pyedflib.EdfReader(edf_file)
        max_samples = max(max_samples, edf_reader.getNSamples()[0])
        edf_reader.close()
    
    for edf_file in edf_files:
        edf_reader = pyedflib.EdfReader(edf_file)
        num_channels = edf_reader.signals_in_file
        signals = np.zeros((num_channels, max_samples))
        
        for i in range(num_channels):
            signal = edf_reader.readSignal(i)
            if len(signal) < max_samples:
                # Pad with zeros if the signal is shorter than the max_samples
                signal = np.pad(signal, (0, max_samples - len(signal)), 'constant')
            signals[i, :] = signal
        
        all_signals.append(signals)
        edf_reader.close()
    
    return np.array(all_signals), edf_files

folder_path = "C:/Users/kames/Downloads/eeg-during-mental-arithmetic-tasks-1.0.0/eeg-during-mental-arithmetic-tasks-1.0.0"
X_raw, edf_files = load_edf_files(folder_path)
print(f"X_raw shape: {X_raw.shape}")

X_raw shape: (72, 21, 94000)
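
Zero-padding gives every recording the same length, but it also hides where the real signal ends. A minimal sketch, assuming you want to keep that information: `pad_to_longest` is a hypothetical helper (not part of the notebook) that pads a list of (channels, samples) arrays and also returns each recording's true length, so padded regions can be excluded later.

```python
import numpy as np

def pad_to_longest(recordings):
    """Zero-pad (channels, samples) arrays to a common length.

    Returns the stacked array and the original sample count of each recording.
    """
    lengths = np.array([rec.shape[1] for rec in recordings])
    max_len = lengths.max()
    padded = np.stack([
        np.pad(rec, ((0, 0), (0, max_len - rec.shape[1])), mode="constant")
        for rec in recordings
    ])
    return padded, lengths

# Toy data standing in for two EDF recordings of different duration
recs = [np.ones((3, 5)), np.ones((3, 8))]
X, true_lengths = pad_to_longest(recs)
print(X.shape)        # (2, 3, 8)
print(true_lengths)   # [5 8]
```

With the true lengths at hand, segments that start beyond the end of a recording can be dropped instead of being fed to the model as all-zero inputs.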


In [4]:
import pandas as pd

# Load subject information
subject_info = pd.read_csv("C:/Users/kames/Downloads/subject-info.csv")

# Extract labels (assuming 'Count quality' is the column for labels)
labels_dict = {str(row['Subject']): row['Count quality'] for _, row in subject_info.iterrows()}

# Extract labels from file names
def get_labels(edf_files, labels_dict):
    labels = []
    for file in edf_files:
        subject_id = os.path.basename(file).split('_')[0]
        if subject_id in labels_dict:
            labels.append(labels_dict[subject_id])
        else:
            labels.append(None)  # Handle missing labels if necessary
    return np.array(labels)

y_raw = get_labels(edf_files, labels_dict)
print(f"y_raw shape: {y_raw.shape}")
print(f"y_raw: {y_raw}")


y_raw shape: (72,)
y_raw: [0 0 1 1 1 1 1 1 0 0 1 1 0 0 1 1 1 1 0 0 0 0 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1
 1 0 0 1 1 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 1]
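
The labels above come in pairs because the subject ID is taken from the part of the file name before the underscore, so every file from the same subject receives the same `Count quality` value. The sketch below illustrates that lookup on made-up data and raises an error for unknown subjects instead of silently inserting `None`. The file names, the `toy_labels` dictionary, and both helper functions are assumptions for illustration.

```python
import os

def subject_id_from_path(path):
    """Return the part of the file name before the first underscore."""
    return os.path.basename(path).split("_")[0]

def labels_for_files(paths, labels_by_subject):
    """Look up one label per file; fail loudly if a subject is unknown."""
    labels = []
    for path in paths:
        sid = subject_id_from_path(path)
        if sid not in labels_by_subject:
            raise KeyError(f"No label for subject {sid!r} (file {path})")
        labels.append(labels_by_subject[sid])
    return labels

# Hypothetical file names and labels, following a <subject>_<run>.edf pattern
files = ["data/Subject00_1.edf", "data/Subject00_2.edf", "data/Subject01_1.edf"]
toy_labels = {"Subject00": 0, "Subject01": 1}
print(labels_for_files(files, toy_labels))  # [0, 0, 1]
```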


In [5]:
from sklearn.preprocessing import StandardScaler

def preprocess_signals(signals):
    # Transpose from (recordings, channels, time_points) to (recordings, time_points, channels)
    signals = np.transpose(signals, (0, 2, 1))
    
    # Add a trailing axis for CNN input: (recordings, time_points, channels, 1)
    samples, time_points, channels = signals.shape
    signals = signals.reshape(samples, time_points, channels, 1)
    
    # Standardize each channel (zero mean, unit variance) across all recordings and time points
    scaler = StandardScaler()
    signals_scaled = scaler.fit_transform(signals.reshape(-1, channels)).reshape(signals.shape)
    
    return signals_scaled

X_scaled = preprocess_signals(X_raw)
print(f"X_scaled shape: {X_scaled.shape}")


X_scaled shape: (72, 94000, 21, 1)


## Step 2: Downsampling the Signals
### Objective: Reduce the sampling rate of the EEG signals to make the data more manageable for processing.

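Explanation: EEG is often recorded at a high sampling rate; at 1000 Hz, for example, each channel produces 1000 data points per second. Downsampling reduces the number of data points and therefore the computational cost. The code below assumes the recordings are sampled at 1000 Hz and reduces them by a factor of 4 to a nominal 250 Hz (from 94000 to 23500 samples per channel). The true sampling rate is stored in each EDF header, so confirm it before relying on the 250 Hz figure; if it differs, the factor-of-4 reduction still applies, but the resulting rate and the duration of each later segment change accordingly.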

In [6]:
from scipy.signal import resample

def downsample_signals(signals, target_length):
    downsampled_signals = []
    for signal in signals:
        num_channels = signal.shape[0]
        downsampled_signal = np.zeros((num_channels, target_length))
        for i in range(num_channels):
            downsampled_signal[i, :] = resample(signal[i, :], target_length)
        downsampled_signals.append(downsampled_signal)
    return np.array(downsampled_signals)

# Downsample by a factor of 4 (to 250 Hz, assuming the source rate is 1000 Hz).
# The padded length is 94000 samples, which is 94 seconds at 1000 Hz.
original_length = X_raw.shape[-1]
target_length = int(original_length * 250 / 1000)

X_downsampled = downsample_signals(X_raw, target_length)
print(f"X_downsampled shape: {X_downsampled.shape}")


X_downsampled shape: (72, 21, 23500)
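
`scipy.signal.resample` works in the frequency domain and treats each signal as periodic, which can introduce edge effects on long, non-periodic recordings. As a hedged alternative (not what the notebook uses), the sketch below downsamples by an integer factor with `scipy.signal.resample_poly`, which applies an anti-aliasing FIR filter and can process the whole batch at once along the time axis. The helper name `downsample_by_factor` and the toy data are assumptions.

```python
import numpy as np
from scipy.signal import resample_poly

def downsample_by_factor(signals, factor):
    """Downsample along the last axis by an integer factor.

    resample_poly low-pass filters the signal before decimating it.
    """
    return resample_poly(signals, up=1, down=factor, axis=-1)

# Toy batch: 2 recordings, 3 channels, 1000 samples each
rng = np.random.default_rng(0)
toy = rng.standard_normal((2, 3, 1000))
out = downsample_by_factor(toy, 4)
print(out.shape)  # (2, 3, 250)
```

Applied to an array of shape (72, 21, 94000) with `factor=4`, the same call would yield 23500 samples per channel, matching the target length computed above.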


## Step 3: Segmenting the Signals
### Objective: Split the continuous EEG signals into shorter, fixed-length segments.

Explanation: Segmenting the EEG data into shorter chunks gives the model many more training samples than whole recordings would, and overlapping the segments increases the count further. In this step, we split each signal into 250-sample segments (1 second at 250 Hz) with a 50% overlap, so a new segment starts every 125 samples. Because shorter recordings were zero-padded in Step 1, some of their segments may consist partly or entirely of padding.

In [7]:
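def segment_signals(signals, segment_length, overlap):
    # `overlap` acts as a step divisor: each window starts segment_length // overlap
    # samples after the previous one, so overlap=2 gives 50% overlap.
    segments = []
    for signal in signals:
        num_channels, num_time_points = signal.shape
        step = segment_length // overlap
        for start in range(0, num_time_points - segment_length + 1, step):
            segments.append(signal[:, start:start + segment_length])
    return np.array(segments)

segment_length = 250  # 1-second segments at 250 Hz
overlap = 2  # step of 125 samples, i.e. 50% overlap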

X_segmented = segment_signals(X_downsampled, segment_length, overlap)
print(f"X_segmented shape: {X_segmented.shape}")


X_segmented shape: (13464, 21, 250)
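
The segment count follows from simple window arithmetic: a signal of T samples yields (T - L) // step + 1 full windows of length L. The sketch below checks this against the shape printed above and shows an equivalent vectorized segmentation for a single recording using NumPy's `sliding_window_view`. Both helper names are assumptions, and the toy array stands in for real data.

```python
import numpy as np

def count_segments(num_time_points, segment_length, step):
    """Number of full windows that fit when advancing by `step` samples."""
    if num_time_points < segment_length:
        return 0
    return (num_time_points - segment_length) // step + 1

def segment_one(signal, segment_length, step):
    """Split a (channels, time) array into (segments, channels, segment_length)."""
    windows = np.lib.stride_tricks.sliding_window_view(signal, segment_length, axis=-1)
    # windows has shape (channels, time - segment_length + 1, segment_length)
    return windows[:, ::step, :].transpose(1, 0, 2)

per_recording = count_segments(23500, 250, 125)
print(per_recording)        # 187
print(72 * per_recording)   # 13464

toy = np.arange(2 * 1000).reshape(2, 1000)
print(segment_one(toy, 250, 125).shape)  # (7, 2, 250)
```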


## Step 4: Preprocessing the Signals
### Objective: Normalize and reshape the segmented signals to prepare them for input into the neural network.

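Explanation: Normalization puts the channels on similar scales, which helps training converge faster. Each channel is standardized to zero mean and unit variance, and the segments are reshaped to the input shape that the EEGNet model defined in Step 7 expects: (segments, time points, channels, 1). Note that the scaler here is fit on all segments before the train/test split, so test-set statistics influence the scaling.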

In [8]:
def preprocess_signals(signals):
    # (segments, channels, time_points) -> (segments, time_points, channels, 1)
    samples, channels, time_points = signals.shape
    signals = signals.transpose((0, 2, 1)).reshape(samples, time_points, channels, 1)
    
    # Standardize each channel across all segments and time points
    scaler = StandardScaler()
    signals_scaled = scaler.fit_transform(signals.reshape(-1, channels)).reshape(signals.shape)
    
    return signals_scaled

X_preprocessed = preprocess_signals(X_segmented)
print(f"X_preprocessed shape: {X_preprocessed.shape}")

X_preprocessed shape: (13464, 250, 21, 1)
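
Reshaping to (-1, channels) before calling `StandardScaler` means one mean and one standard deviation are computed per channel. A minimal NumPy sketch of the same idea, assuming you would rather estimate those statistics on the training segments only and reuse them for the test segments: the function names and the random toy arrays are illustrative, not part of the notebook.

```python
import numpy as np

def fit_channel_stats(X):
    """Per-channel mean and std over segments and time for (N, T, C, 1) input."""
    mean = X.mean(axis=(0, 1, 3), keepdims=True)
    std = X.std(axis=(0, 1, 3), keepdims=True)
    std[std == 0] = 1.0  # leave constant channels unscaled
    return mean, std

def apply_channel_stats(X, mean, std):
    """Standardize X with previously fitted per-channel statistics."""
    return (X - mean) / std

rng = np.random.default_rng(0)
X_tr = rng.normal(5.0, 3.0, size=(40, 250, 21, 1))
X_te = rng.normal(5.0, 3.0, size=(10, 250, 21, 1))

mean, std = fit_channel_stats(X_tr)          # statistics from training data only
X_tr_scaled = apply_channel_stats(X_tr, mean, std)
X_te_scaled = apply_channel_stats(X_te, mean, std)
print(X_tr_scaled.shape, X_te_scaled.shape)
```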


## Step 5: Creating Labels
### Objective: Ensure that each segmented signal has a corresponding label.

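Explanation: Each recording has one label, but after segmentation it contributes many segments, so the label must be repeated once per segment. Because every recording was padded to the same length, each one yields the same number of segments, and the segments are stored recording by recording, so `np.repeat` produces labels in the matching order.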



In [9]:
# Repeat labels to match the number of segments per original sample
num_segments_per_sample = X_segmented.shape[0] // len(X_downsampled)
y_repeated = np.repeat(y_raw, num_segments_per_sample)
print(f"y_repeated shape: {y_repeated.shape}")



y_repeated shape: (13464,)
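
`np.repeat` repeats each element consecutively, which matches the order in which `segment_signals` appends segments. The toy sketch below shows this and builds, in the same way, an array recording which source recording each segment came from. The `groups` array is an assumption added for illustration; it becomes useful when splitting the data.

```python
import numpy as np

y_rec = np.array([0, 1, 1])          # one label per recording (toy values)
segments_per_recording = 4           # 187 in the notebook

y_seg = np.repeat(y_rec, segments_per_recording)
groups = np.repeat(np.arange(len(y_rec)), segments_per_recording)

print(y_seg)   # [0 0 0 0 1 1 1 1 1 1 1 1]
print(groups)  # [0 0 0 0 1 1 1 1 2 2 2 2]
```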


## Step 6: Splitting the Dataset
### Objective: Split the dataset into training and testing sets.

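Explanation: To evaluate the model, we separate the data into a training set, used to fit the model, and a testing set, used to measure performance on data the model has not been trained on. Here 20% of the segments are held out at random. Because the split is made over segments rather than recordings, overlapping segments from the same recording can end up in both sets, which tends to make the test accuracy optimistic.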

In [10]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X_preprocessed, y_repeated, test_size=0.2, random_state=42)
print(f"X_train shape: {X_train.shape}")
print(f"X_test shape: {X_test.shape}")
print(f"y_train shape: {y_train.shape}")
print(f"y_test shape: {y_test.shape}")


X_train shape: (10771, 250, 21, 1)
X_test shape: (2693, 250, 21, 1)
y_train shape: (10771,)
y_test shape: (2693,)
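
One way to keep all segments from a subject on the same side of the split is scikit-learn's `GroupShuffleSplit`. The sketch below is an alternative to the notebook's `train_test_split` call, shown on toy arrays; the subject counts and the `subject_of_segment` array are assumptions, and in practice the group IDs would be derived from the EDF file names.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Toy stand-ins: 10 subjects x 2 recordings x 5 segments each
n_subjects, recs_per_subject, segs_per_rec = 10, 2, 5
n_segments = n_subjects * recs_per_subject * segs_per_rec
X = np.zeros((n_segments, 250, 21, 1), dtype=np.float32)
y = np.repeat(np.arange(n_subjects) % 2, recs_per_subject * segs_per_rec)
subject_of_segment = np.repeat(np.arange(n_subjects), recs_per_subject * segs_per_rec)

splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, test_idx = next(splitter.split(X, y, groups=subject_of_segment))

X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]

# No subject appears on both sides of the split
shared = set(subject_of_segment[train_idx]) & set(subject_of_segment[test_idx])
print(len(shared))  # 0
print(X_train.shape[0], X_test.shape[0])  # 80 20
```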


## Step 7: Building the EEGNet Model
### Objective: Define the architecture of the EEGNet model.

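Explanation: EEGNet is a compact convolutional neural network (CNN) architecture designed for EEG signal classification. The model below first applies a temporal convolution along the time axis, then a depthwise convolution spanning all channels that learns spatial filters, then a separable convolution that combines the resulting feature maps. Average pooling and dropout follow each block, and a dense softmax layer produces the class probabilities.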



In [11]:
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Conv2D, BatchNormalization, DepthwiseConv2D, AveragePooling2D, SeparableConv2D, Dropout, Flatten, Dense

def EEGNet(nb_classes, Chans=21, Samples=250, dropoutRate=0.5, kernLength=64, F1=8, D=2, F2=16, norm_rate=0.25):
    input1 = Input(shape=(Samples, Chans, 1))

    block1 = Conv2D(F1, (kernLength, 1), padding='same', use_bias=False)(input1)
    block1 = BatchNormalization()(block1)
    block1 = DepthwiseConv2D((1, Chans), use_bias=False, depth_multiplier=D, depthwise_constraint=tf.keras.constraints.max_norm(1.))(block1)
    block1 = BatchNormalization()(block1)
    block1 = tf.keras.layers.Activation('elu')(block1)
    block1 = AveragePooling2D((4, 1))(block1)
    block1 = Dropout(dropoutRate)(block1)

    block2 = SeparableConv2D(F2, (16, 1), use_bias=False, padding='same')(block1)
    block2 = BatchNormalization()(block2)
    block2 = tf.keras.layers.Activation('elu')(block2)
    block2 = AveragePooling2D((8, 1))(block2)
    block2 = Dropout(dropoutRate)(block2)

    flatten = Flatten(name='flatten')(block2)

    dense = Dense(nb_classes, name='dense', kernel_constraint=tf.keras.constraints.max_norm(norm_rate))(flatten)
    softmax = tf.keras.layers.Activation('softmax', name='softmax')(dense)

    return Model(inputs=input1, outputs=softmax)

# Parameters
nb_classes = len(np.unique(y_train))  # Number of classes

# Create the model
eegnet = EEGNet(nb_classes=nb_classes, Chans=21, Samples=250, dropoutRate=0.5, kernLength=64, F1=8, D=2, F2=16, norm_rate=0.25)

# Compile the model
eegnet.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Print the model summary
eegnet.summary()
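
To make the layer shapes concrete without running TensorFlow, the sketch below traces the output size through the layers as defined above, assuming the Keras defaults of `'valid'` padding for the depthwise convolution and non-overlapping average pooling. The helper `eegnet_flatten_size` is illustrative; compare its result with the `flatten` row of `eegnet.summary()` in your own run.

```python
def eegnet_flatten_size(samples=250, chans=21, F2=16, pool1=4, pool2=8):
    """Trace the (time, space, filters) shape through the layers defined above."""
    time, space = samples, chans       # input: (250, 21, 1)
    # Conv2D with padding='same' keeps time and space unchanged
    space = space - chans + 1          # DepthwiseConv2D (1, chans), 'valid' -> 1
    time = time // pool1               # AveragePooling2D (4, 1) -> 62
    # SeparableConv2D with padding='same' keeps time unchanged, outputs F2 filters
    time = time // pool2               # AveragePooling2D (8, 1) -> 7
    return time * space * F2

flat = eegnet_flatten_size()
print(flat)            # 112
print(flat * 2 + 2)    # 226 weights and biases in a 2-class Dense layer
```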


## Step 8: Training and Evaluating the EEGNet Model
### Objective: Train the compiled EEGNet model and evaluate it on the held-out segments.

Explanation: The model was compiled in Step 7 with the Adam optimizer and sparse categorical cross-entropy loss. This step trains it for 100 epochs with a batch size of 32 and then reports the loss and accuracy on the test set. Note that the test set is also passed as `validation_data`, so the per-epoch `val_accuracy` and the final test accuracy are computed on the same data.

In [12]:
# Train the model
history = eegnet.fit(X_train, y_train, epochs=100, batch_size=32, validation_data=(X_test, y_test))

# Evaluate the model
score = eegnet.evaluate(X_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])


Epoch 1/100
337/337 ━━━━━━━━━━━━━━━━━━━━ 75s 102ms/step - accuracy: 0.6985 - loss: 0.6171 - val_accuracy: 0.7230 - val_loss: 0.5551
Epoch 2/100
337/337 ━━━━━━━━━━━━━━━━━━━━ 20s 59ms/step - accuracy: 0.7541 - loss: 0.5285 - val_accuracy: 0.8099 - val_loss: 0.4558
Epoch 3/100
337/337 ━━━━━━━━━━━━━━━━━━━━ 20s 60ms/step - accuracy: 0.7827 - loss: 0.4911 - val_accuracy: 0.8303 - val_loss: 0.4299
Epoch 4/100
337/337 ━━━━━━━━━━━━━━━━━━━━ 20s 59ms/step - accuracy: 0.8058 - loss: 0.4478 - val_accuracy: 0.8414 - val_loss: 0.3879
Epoch 5/100
337/337 ━━━━━━━━━━━━━━━━━━━━ 22s 64ms/step - accuracy: 0.8080 - loss: 0.4370 - val_accuracy: 0.8485 - val_loss: 0.3632
Epoch 6/100
337/337 ━━━━━━━━━━━━━━━━━━━━ 22s 65ms/step - accuracy: 0.8101 - loss: 0.4311 - val_accuracy: 0.8593 - val_loss: 0.3554
Epoch 7/1

In [19]:
print('Test accuracy:', score[1])

Test accuracy: 0.8915707468986511
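
An accuracy figure is easier to interpret next to the score of a trivial classifier that always predicts the most frequent class. Since every recording contributes the same number of segments, the class balance of the segments equals that of the recordings. The sketch below computes that baseline; the class counts are read off the `y_raw` printout earlier in this guide and should be re-checked against your own run, and `majority_baseline` is a hypothetical helper.

```python
import numpy as np

def majority_baseline(labels):
    """Accuracy of always predicting the most frequent class."""
    _, counts = np.unique(labels, return_counts=True)
    return counts.max() / counts.sum()

# Class counts read off the printed y_raw (20 zeros, 52 ones); re-check on your own run
y_counts = np.array([0] * 20 + [1] * 52)
print(round(majority_baseline(y_counts), 4))  # 0.7222
```

The first-epoch `val_accuracy` of 0.7230 is close to this baseline, so the improvement from training is better judged against roughly 0.72 than against 0.5.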
