### Experiment: Pre-processing and frequency sampling

**Question**: Is it possible to train a model on unpreprocessed EEG data and still attain similar performance levels?

**Hypothesis**: The model will perform worse on unpreprocessed data. If performance remains similar, however, skipping (manual) preprocessing would be a substantial benefit and would open up many more applications.

**Result**:

#### Part 1: Preparing data
`hmp.utils.read_mne_data()` expects epoched data in `.fif` format. The cells below write such files, replicating the automated preprocessing steps from https://github.com/GWeindel/hsmm_mvpy/blob/main/tutorials/sample_data/eeg/0022.ipynb, except for the resampling to 100 Hz.

In [1]:
import mne
from pathlib import Path
import hsmm_mvpy as hmp
import pandas as pd
import numpy as np
import xarray as xr
from hmpai.data import add_stage_dimension

In [2]:
# Set up paths and file locations
data_path = Path("/mnt/d/thesis/sat1/")
behavioral_data_path = data_path / "ExperimentData/ExperimentData"
output_path = Path("../data/sat1/unpreprocessed")

subj_ids = [
    subj_id.name.split("-")[1][:4] for subj_id in (data_path / "eeg4").glob("*.vhdr")
]
subj_files = [
    str(output_path / f"unprocessed_{subj_id}_epo.fif") for subj_id in subj_ids
]
behavioral_files = [
    str(behavioral_data_path / f"{subj_id}-cnv-sat3_ET.csv") for subj_id in subj_ids
]
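
The ID parsing in the cell above can be checked on a few made-up file names. This is a minimal sketch: the file names are hypothetical examples that follow the `MD3-<id>.vhdr` pattern, and the comparison at the end only illustrates how two ways of shortening an ID differ when the ID contains a zero that is not a leading zero.

```python
from pathlib import Path

# Hypothetical file names following the "MD3-<id>.vhdr" pattern
example_files = [Path("MD3-0001.vhdr"), Path("MD3-0010.vhdr"), Path("MD3-0025.vhdr")]

# Same parsing as above: text after the first "-", first four characters
parsed_ids = [f.name.split("-")[1][:4] for f in example_files]

# lstrip("0") removes leading zeros only, whereas replace("0", "")
# removes every zero, so "0010" would collapse to "1"
short_ids = [subj_id.lstrip("0") for subj_id in parsed_ids]
replaced_ids = [subj_id.replace("0", "") for subj_id in parsed_ids]

print(parsed_ids)
print(short_ids)
print(replaced_ids)
```

If two participants end up with the same short ID, they would point to the same file, so it is worth printing the short IDs once before running the full loop.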

In [None]:
# Replacing preprocessing done in https://github.com/GWeindel/hsmm_mvpy/blob/main/tutorials/sample_data/eeg/0022.ipynb
# with only the necessary (non-manual) parts, like adding metadata for processing in HMP package, more info in link above
for subject_id in subj_ids:
    print(f"Processing subject: {subject_id}")
    subject_id_short = subject_id.lstrip("0")  # Drop leading zeros only, e.g. "0010" -> "10"
    raw = mne.io.read_raw_brainvision(
        data_path / "eeg4" / f"MD3-{subject_id}.vhdr", preload=False
    )
    raw.set_channel_types(
        {"EOGh": "eog", "EOGv": "eog", "A1": "misc", "A2": "misc"}
    )  # Declare type to avoid confusion with EEG channels
    raw.rename_channels({"FP1": "Fp1", "FP2": "Fp2"})  # Naming convention
    raw.set_montage("standard_1020")  # Standard 10-20 electrode montage
    raw.rename_channels({"Fp1": "FP1", "Fp2": "FP2"})

    behavioral_path = behavioral_data_path / f"{subject_id}-cnv-sat3_ET.csv"
    behavior = pd.read_csv(behavioral_path, sep=";")[
        [
            "stim",
            "resp",
            "RT",
            "cue",
            "movement",
        ]
    ]
    # Recode numeric codes to labels; codes not listed in a mapping become NaN
    behavior["movement"] = behavior["movement"].map({-1: "stim_left", 1: "stim_right"})
    behavior["resp"] = behavior["resp"].map({1: "resp_left", 2: "resp_right"})
    # Merge the experimental condition info into the format condition/stimulus/response
    behavior["trigger"] = (
        behavior["cue"] + "/" + behavior["movement"] + "/" + behavior["resp"]
    )
    # Set RTs below 300 ms or above 3000 ms to 0 and convert the remaining RTs from ms to s
    behavior["RT"] = behavior.apply(
        lambda row: 0
        if row["RT"] < 300
        else (0 if row["RT"] > 3000 else float(row["RT"]) / 1000),
        axis=1,
    )
    epochs = mne.io.read_epochs_fieldtrip(
        data_path / "eeg1" / f"data{subject_id_short}.mat", info=raw.info
    )
    epochs.rename_channels({"FP1": "Fp1", "FP2": "Fp2"})  # Naming convention
    epochs.set_montage("easycap-M1")
    epochs.filter(1, 35)  # Band-pass filter (1-35 Hz) from van Maanen, Portoles & Borst (2021)
    epochs.crop(tmin=-0.250)
    epochs.set_eeg_reference("average")
    epochs.metadata = behavior
    epochs.save(
        output_path / f"unprocessed_{subject_id}_epo.fif", overwrite=True, verbose=False
    )  # Save the epochs in MNE .fif format
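
The reaction-time recoding in the loop above can be written as a small standalone function to make the rule explicit. This is a sketch only: `recode_rt` and the example values are hypothetical, and the final filter assumes that the `lower_limit_RT` and `upper_limit_RT` arguments used in the next cell act as simple inclusive thresholds in seconds, which is not confirmed here.

```python
def recode_rt(rt_ms, lower_ms=300, upper_ms=3000):
    """Return the RT in seconds, or 0 if it lies outside [lower_ms, upper_ms]."""
    if rt_ms < lower_ms or rt_ms > upper_ms:
        return 0
    return float(rt_ms) / 1000


# Hypothetical reaction times in milliseconds
example_rts_ms = [150, 300, 742, 3000, 4200]
recoded = [recode_rt(rt) for rt in example_rts_ms]
print(recoded)

# Assumed threshold filter in seconds: trials recoded to 0 fall below 0.2 s
kept = [rt for rt in recoded if 0.2 <= rt <= 2]
print(kept)
```

Under that assumption, an RT recoded to 0 is dropped later together with any trial slower than the upper limit, which may explain why fewer than the full number of trials are retained for some participants.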

In [4]:
output_path_data = Path("../data/sat1/data_unprocessed_500hz.nc")
# Run if data_unprocessed_500hz.nc does not exist or should be rewritten
data = hmp.utils.read_mne_data(
    subj_files,
    epoched=True,
    lower_limit_RT=0.2,
    upper_limit_RT=2,
    verbose=False,
    subj_idx=subj_ids,
    rt_col="RT",
)
data.to_netcdf(output_path_data)

Processing participant data/sat1/unpreprocessed/unprocessed_0001_epo.fif's epoched eeg
198 trials were retained for participant data/sat1/unpreprocessed/unprocessed_0001_epo.fif
Processing participant data/sat1/unpreprocessed/unprocessed_0002_epo.fif's epoched eeg
200 trials were retained for participant data/sat1/unpreprocessed/unprocessed_0002_epo.fif
Processing participant data/sat1/unpreprocessed/unprocessed_0003_epo.fif's epoched eeg
191 trials were retained for participant data/sat1/unpreprocessed/unprocessed_0003_epo.fif
Processing participant data/sat1/unpreprocessed/unprocessed_0004_epo.fif's epoched eeg
200 trials were retained for participant data/sat1/unpreprocessed/unprocessed_0004_epo.fif
Processing participant data/sat1/unpreprocessed/unprocessed_0005_epo.fif's epoched eeg
190 trials were retained for participant data/sat1/unpreprocessed/unprocessed_0005_epo.fif
Processing participant data/sat1/unpreprocessed/unprocessed_0006_epo.fif's epoched eeg
200 trials were retaine

In [5]:
output_path_data = Path("../data/sat1/data_unprocessed_100hz.nc")
# Run if data_unprocessed_100hz.nc does not exist or should be rewritten
data = hmp.utils.read_mne_data(
    subj_files,
    epoched=True,
    lower_limit_RT=0.2,
    upper_limit_RT=2,
    sfreq=100,
    verbose=False,
    subj_idx=subj_ids,
    rt_col="RT",
)
data.to_netcdf(output_path_data)

Processing participant data/sat1/unpreprocessed/unprocessed_0001_epo.fif's epoched eeg
198 trials were retained for participant data/sat1/unpreprocessed/unprocessed_0001_epo.fif
Processing participant data/sat1/unpreprocessed/unprocessed_0002_epo.fif's epoched eeg
200 trials were retained for participant data/sat1/unpreprocessed/unprocessed_0002_epo.fif
Processing participant data/sat1/unpreprocessed/unprocessed_0003_epo.fif's epoched eeg
191 trials were retained for participant data/sat1/unpreprocessed/unprocessed_0003_epo.fif
Processing participant data/sat1/unpreprocessed/unprocessed_0004_epo.fif's epoched eeg
200 trials were retained for participant data/sat1/unpreprocessed/unprocessed_0004_epo.fif
Processing participant data/sat1/unpreprocessed/unprocessed_0005_epo.fif's epoched eeg
190 trials were retained for participant data/sat1/unpreprocessed/unprocessed_0005_epo.fif
Processing participant data/sat1/unpreprocessed/unprocessed_0006_epo.fif's epoched eeg
200 trials were retaine
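
The two files differ only in sampling frequency, which changes the number of samples per trial. The arithmetic is sketched below; the helper names and the 2-second duration are hypothetical, and how `read_mne_data` resamples internally is not shown here.

```python
def n_samples(duration_s, sfreq_hz):
    """Number of samples covering duration_s seconds at sfreq_hz."""
    return int(round(duration_s * sfreq_hz))


def downsampling_factor(orig_sfreq_hz, target_sfreq_hz):
    """Integer ratio between two sampling frequencies; raises if not an integer."""
    if orig_sfreq_hz % target_sfreq_hz != 0:
        raise ValueError("target frequency does not evenly divide the original")
    return orig_sfreq_hz // target_sfreq_hz


duration_s = 2.0  # Hypothetical trial duration
samples_500 = n_samples(duration_s, 500)
samples_100 = n_samples(duration_s, 100)
factor = downsampling_factor(500, 100)
print(samples_500, samples_100, factor)
```

The 500 Hz version therefore carries five times as many samples per trial, which matters for the `n_samples` input size of the models in Part 2.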

#### Use information from stage_data to split unprocessed data

##### 500Hz

In [7]:
data_path = Path("../data/sat1/stage_data.nc")
merge_dataset = xr.load_dataset(Path("../data/sat1/data_unprocessed_500hz.nc"))
output_data = add_stage_dimension(data_path, merge_dataset)

Finding stage changes
Combining segments


In [None]:
output_path = Path("../data/sat1/split_stage_data_unprocessed_500hz.nc")
output_data.to_netcdf(output_path)

##### 100Hz

In [4]:
data_path = Path("../data/sat1/stage_data.nc")
merge_dataset = xr.load_dataset(Path("../data/sat1/data_unprocessed_100hz.nc"))
output_data = add_stage_dimension(data_path, merge_dataset)

Finding stage changes
Combining segments

In [None]:
output_path = Path("../data/sat1/split_stage_data_unprocessed_100hz.nc")
output_data.to_netcdf(output_path)

#### Part 2: Experiment

In [1]:
%load_ext autoreload
%autoreload 2
from pathlib import Path
import xarray as xr
from hmpai.utilities import print_results
from hmpai.pytorch.models import *
from hmpai.pytorch.training import k_fold_cross_validate
from hmpai.normalization import *

2023-10-22 12:42:41.782883: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


env: TF_FORCE_GPU_ALLOW_GROWTH=true
env: TF_GPU_ALLOCATOR=cuda_malloc_async


In [2]:
logs_path = Path("../logs/exp_preprocessing/")
n_folds = 25
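
For orientation, the sketch below shows how a k-fold split can be formed in plain Python. It is not the implementation of `k_fold_cross_validate`: the function `k_fold_splits`, the participant IDs, and the assumption that folds are formed over participants are all hypothetical.

```python
def k_fold_splits(items, n_folds):
    """Yield (train, test) lists; test fold sizes differ by at most one."""
    items = list(items)
    base, extra = divmod(len(items), n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        test = items[start:start + size]
        train = items[:start] + items[start + size:]
        start += size
        yield train, test


# Hypothetical participant IDs "0001" to "0025"
participants = [f"{i:04d}" for i in range(1, 26)]
splits = list(k_fold_splits(participants, 25))
print(len(splits), splits[0][1], len(splits[0][0]))
```

With as many folds as items, each fold holds out exactly one item, so under the participant assumption this would amount to leave-one-participant-out validation.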

##### 2a: Processed 100Hz (control)

In [3]:
data_path = Path("../data/sat1/split_stage_data.nc")
data = xr.load_dataset(data_path)

In [None]:
model = SAT1Base
model_kwargs = {
    "n_channels": len(data.channels),
    "n_samples": len(data.samples),
    "n_classes": len(data.labels),
}
train_kwargs = {
    "logs_path": logs_path,
    "additional_info": {"preprocessing": "default_100hz"},
    "additional_name": "preprocessing-default_100hz",
}
results = k_fold_cross_validate(
    model, model_kwargs, data, n_folds, train_kwargs=train_kwargs
)
print_results(results)

##### 2b: Unprocessed 100Hz

In [3]:
data_path = Path("../data/sat1/split_stage_data_unprocessed_100hz.nc")
data = xr.load_dataset(data_path)

In [None]:
model = SAT1Base
model_kwargs = {
    "n_channels": len(data.channels),
    "n_samples": len(data.samples),
    "n_classes": len(data.labels),
}
train_kwargs = {
    "logs_path": logs_path,
    "additional_info": {"preprocessing": "unprocessed_100hz"},
    "additional_name": "preprocessing-unprocessed_100hz",
}
results = k_fold_cross_validate(
    model, model_kwargs, data, n_folds, train_kwargs=train_kwargs
)
print_results(results)

##### 2c: Unprocessed 500Hz

In [None]:
data_path = Path("../data/sat1/split_stage_data_unprocessed_500hz.nc")
data = xr.load_dataset(data_path)

In [None]:
model = SAT1Deep
model_kwargs = {
    "n_channels": len(data.channels),
    "n_samples": len(data.samples),
    "n_classes": len(data.labels),
}
train_kwargs = {
    "logs_path": logs_path,
    "additional_info": {"preprocessing": "unprocessed_500hz"},
    "additional_name": "preprocessing-unprocessed_500hz",
}
results = k_fold_cross_validate(
    model, model_kwargs, data, n_folds, train_kwargs=train_kwargs
)
print_results(results)

In [13]:
# View results in Tensorboard
! tensorboard --logdir ../logs/exp_preprocessing/


NOTE: Using experimental fast data loading logic. To disable, pass
    "--load_fast=false" and report issues on GitHub. More details:
    https://github.com/tensorflow/tensorboard/issues/4784

Serving TensorBoard on localhost; to expose to the network, use a proxy or pass --bind_all
TensorBoard 2.13.0 at http://localhost:6006/ (Press CTRL+C to quit)
^C

