### Prerequisite

You should have already run `preprocess.ipynb` on the raw data to generate labelled, cleaned, and segmented data (epochs) in the `processed` directory. This section loads the preprocessed data, extracts features, and saves them as `eeg_features.npy`.

In [12]:
# When running as a notebook, add the parent directory to the import path so the project packages can be found
import sys
sys.path.append('../')

from preprocessors.eeg_loader import EEGDataLoader
import numpy as np

First, load the processed data.

In [None]:
%%capture
FOLDER_PATH = "../processed/eeg_data"
eeg = EEGDataLoader(dir_path=FOLDER_PATH)

> [WARNING]
> Feature extraction uses **a lot of memory** because the dataset is large. Make sure enough RAM is available before running the next cell.
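
A dense NumPy array occupies roughly (number of elements) × (bytes per element), so you can estimate whether the final feature array will fit in memory before starting. The sketch below computes that figure; the shape used is purely hypothetical and not taken from this dataset, and the peak memory of `extract_features()` may well be higher because of intermediate arrays.

```python
import numpy as np

def estimate_array_gib(shape, dtype=np.float64):
    """Return the in-memory size of a dense NumPy array with this shape and dtype, in GiB."""
    n_elements = int(np.prod(shape))
    return n_elements * np.dtype(dtype).itemsize / 1024**3

# Hypothetical dimensions: 50,000 epochs x 64 channels x 100 features per channel
shape = (50_000, 64, 100)
print(f"float64: {estimate_array_gib(shape):.2f} GiB")
print(f"float32: {estimate_array_gib(shape, np.float32):.2f} GiB")
```

Storing features as `float32` halves the footprint compared with `float64`, at the cost of precision; whether that trade-off is acceptable depends on the model.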

Next, extract the features for each epoch and save them to a NumPy array file.

In [4]:
%%capture
eeg.extract_features()

### Training

This section loads the extracted features, which you can then use to train a model of your choice. An example using XGBoost is shown here.

In [6]:
from models.xgb_model import XGBoostClassifier

In [None]:
# If the features were already extracted and saved, load them directly instead of re-extracting
features = np.load("eeg_features.npy")
eeg = EEGDataLoader(dir_path="../processed/eeg_data")
eeg.features = features
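
If the saved feature file is too large to load comfortably, `np.load` can memory-map it instead of reading it into RAM at once. The sketch below writes a small random stand-in for `eeg_features.npy` to a temporary directory and loads it with `mmap_mode="r"`. It is an assumption, not verified here, that `EEGDataLoader` and its `train_test_split()` work correctly with a memory-mapped array.

```python
import os
import tempfile
import numpy as np

# Write a small stand-in for eeg_features.npy (random data, for illustration only)
path = os.path.join(tempfile.mkdtemp(), "eeg_features.npy")
np.save(path, np.random.default_rng(0).standard_normal((10, 4, 3)))

# mmap_mode="r" maps the file read-only instead of reading it all into RAM;
# data is paged in from disk only when the corresponding slices are accessed
features = np.load(path, mmap_mode="r")
print(type(features).__name__, features.shape)

# np.array makes an explicit in-memory copy of the selected epochs
subset = np.array(features[:5])
```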

Split the data into training and test sets, check the array shapes, and remove any samples that contain NaN values.

In [None]:
X_train, X_test, y_train, y_test = eeg.train_test_split()
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)

# Remove NaN samples from X_train and y_train
mask_train = ~np.isnan(X_train).any(axis=(1, 2))
X_train_clean = X_train[mask_train]
y_train_clean = y_train[mask_train]

# Remove NaN samples from X_test and y_test
mask_test = ~np.isnan(X_test).any(axis=(1, 2))
X_test_clean = X_test[mask_test]
y_test_clean = y_test[mask_test]

print(f"Cleaned X_train shape: {X_train_clean.shape}")
print(f"Cleaned y_train shape: {y_train_clean.shape}")
print(f"Cleaned X_test shape: {X_test_clean.shape}")
print(f"Cleaned y_test shape: {y_test_clean.shape}")
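
The masks above reduce over `axis=(1, 2)`, which assumes the features are 3D (samples × channels × features). A variant that reduces over every non-sample axis works for 2D and 3D arrays alike, and checking the label distribution afterwards shows whether NaN removal skewed the classes. The helper name `drop_nan_samples` and the toy data are hypothetical.

```python
import numpy as np

def drop_nan_samples(X, y):
    """Drop samples (entries along axis 0) of X that contain any NaN, plus the matching labels.

    Reduces over every axis except the first, so it works for 2D (samples, features)
    and 3D (samples, channels, features) arrays without changing the axis argument.
    """
    X = np.asarray(X)
    y = np.asarray(y)
    keep = ~np.isnan(X).any(axis=tuple(range(1, X.ndim)))
    return X[keep], y[keep]

# Toy data: sample 1 contains a NaN
X = np.arange(12, dtype=float).reshape(3, 2, 2)
X[1, 0, 1] = np.nan
y = np.array([0, 1, 0])

X_clean, y_clean = drop_nan_samples(X, y)

# Count how many samples of each class remain after cleaning
labels, counts = np.unique(y_clean, return_counts=True)
print(X_clean.shape, dict(zip(labels.tolist(), counts.tolist())))
```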

In [None]:
xgb = XGBoostClassifier()
# Train, test, and save the model
xgb.fit(X_train_clean, y_train_clean, X_test_clean, y_test_clean)
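
Many tabular models, including XGBoost's scikit-learn interface, generally expect a 2D (samples × features) matrix. This notebook does not show whether `XGBoostClassifier` flattens 3D input internally; if you swap in a model that requires 2D input, a reshape like the one below is one option. The helper `flatten_epochs` is hypothetical.

```python
import numpy as np

def flatten_epochs(X):
    """Reshape (n_samples, d1, d2, ...) into (n_samples, d1 * d2 * ...) in C (row-major) order."""
    X = np.asarray(X)
    # Compute the trailing size explicitly so empty arrays (n_samples == 0) also reshape cleanly
    n_flat = int(np.prod(X.shape[1:]))
    return X.reshape(X.shape[0], n_flat)

X_demo = np.zeros((4, 3, 5))  # hypothetical: 4 epochs, 3 channels, 5 features each
print(flatten_epochs(X_demo).shape)  # (4, 15)
```

Because the reshape uses row-major order, each flattened row lists all features of channel 0 first, then channel 1, and so on, so feature importances can be mapped back to (channel, feature) pairs with `np.unravel_index`.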