# EEG Dataset Pre-Processing for State-based Paradigm Analysis

Author: Konstantinos Patlatzoglou

A simple Python pipeline for:

1) pre-processing EEG data using the MNE library

2) extracting a dataset suitable for training/testing machine learning models


## Packages required:
* numpy 
* scipy 
* natsort 
* matplotlib 
* pandas
* scikit-learn
* mne == 0.19

In [None]:
!pip install numpy
!pip install scipy
!pip install natsort
!pip install matplotlib
!pip install pandas
!pip install scikit-learn
!pip install mne==0.19

In [None]:
import sys
import numpy as np
import json
from pathlib import Path

sys.path.append(str(Path.cwd().parent / 'DL-EEG')) # Add datasetUtils package

from datasetUtils import EEG
from datasetUtils import utils

## Dataset Description
This pipeline has been developed to work with EEG datasets that use state-based experimental paradigms. In this example, we will use an open anesthesia dataset, which includes resting-state recordings during two anesthetic states: *Wakefulness* and *Sedation*. The full dataset can be downloaded [here](https://www.repository.cam.ac.uk/handle/1810/252736).

First, we need to create and load a '*dataset_info.json*' file that describes the main properties of the dataset:

**Dataset Info**:
* *Subjects* (list of subject names)
* *States* (list of state names per subject)
* *EEG Files* (list of EEG files per subject and state)
* *Export* (dict of export information) - (Optional)
* *Montage File* (str of montage pathfile) - (Optional)
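
To make the expected nesting concrete, here is a minimal sketch of such a structure written as a Python dict. The subject names, file paths, montage file name and drug-level values are hypothetical placeholders, not the contents of the actual dataset. The nesting (per-subject lists for *States* and *EEG Files*, and per-subject, per-state lists under each *Export* key) follows how these fields are indexed later in this notebook. The `check_shapes` helper is purely illustrative and is not the implementation of `utils.check_dataset_consistency`.

```python
import json

# Hypothetical example: names, paths and values are placeholders
example_info = {
    'Subjects': ['S1', 'S2'],
    'States': [['Wakefulness', 'Sedation'],
               ['Wakefulness', 'Sedation']],
    'EEG Files': [['S1/wakefulness.set', 'S1/sedation.set'],
                  ['S2/wakefulness.set', 'S2/sedation.set']],
    'Export': {'Drug Levels': [[0.0, 100.0], [0.0, 150.0]]},  # made-up numbers
    'Montage File': 'montage.sfp',
}


def check_shapes(info):
    """Return a list of nesting problems (empty if the structure is consistent)."""
    problems = []
    n_subjects = len(info['Subjects'])
    for key in ('States', 'EEG Files'):
        if len(info[key]) != n_subjects:
            problems.append(key + ': expected one entry per subject')
    if problems:
        return problems  # deeper checks assume one entry per subject
    for i, (subject_states, files) in enumerate(zip(info['States'], info['EEG Files'])):
        if len(files) != len(subject_states):
            problems.append('EEG Files[%d]: expected one file per state' % i)
    for name, values in info.get('Export', {}).items():
        if len(values) != n_subjects:
            problems.append('Export[%r]: expected one list per subject' % name)
            continue
        for i, per_state in enumerate(values):
            if len(per_state) != len(info['States'][i]):
                problems.append('Export[%r][%d]: expected one value per state' % (name, i))
    return problems


problems = check_shapes(example_info)
print(json.dumps(example_info, indent=4))
print('Problems found:', problems)
```

An empty list means every per-subject and per-state list lines up, which is the property the pre-processing loop relies on when it indexes `EEG_files[i][s]` and `export[key][i][s]`.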

In [None]:
dataset_path = Path.cwd().parent / 'data' / 'Anesthesia Dataset'

# Use a context manager so the file handle is closed after reading
with open(str(dataset_path / 'dataset_info.json')) as f:
    dataset_info = json.load(f)

utils.check_dataset_consistency(dataset_info)

In this example, *dataset_info.json* has the following structure:

In [None]:
dataset_info

*'Export'* here includes the drug levels (ng/ml) of propofol per subject and state, which can be used as potential training/testing targets.

In [None]:
subjects = dataset_info['Subjects']
EEG_files = dataset_info['EEG Files']
states = dataset_info['States']
export = dataset_info.get('Export')  # Optional (None if missing)
montage_file = Path(dataset_info['Montage File']) if 'Montage File' in dataset_info else None  # Optional

## Dataset Export Parameters

We can select the parameters we want to extract from our dataset (e.g. *subjects*, *states*, and optionally *export information* and *Other*), as well as the parameters of the EEG pre-processing pipeline (e.g. *channel selection*, *filtering*, *epoching window*).

* *EEG_DATASET*:
 * *Study* - study name
 * *Subjects* - list of subjects
 * *States* - list of states
 * *Export* - list of export keys
 * *Other* - dict (Optional)
* *EEG_PARAMETERS*:
 * *Channels* - name of channel selection
 * *Sfreq* - sampling frequency (Hz)
 * *Epoch Size* - epoch window size (sec)
 * *Reference* - reference montage name
 * *Topomap* - topomap representation (boolean) (Optional)
 * *...* (Optional)

An export directory is set to store these parameters (*EEG_DATASET_PARAMETERS.json*), the preprocessed EEG data, and other subject information.
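
To illustrate how *Sfreq* and *Epoch Size* interact, the sketch below computes the sample boundaries of fixed-length epochs, with an optional percentage overlap between consecutive windows. It is a plain-Python illustration under stated assumptions, not the `datasetUtils` implementation: the function name `epoch_bounds` is hypothetical, and dropping a trailing partial window is an assumption.

```python
def epoch_bounds(n_samples, sfreq, epoch_size, overlap_pct=0):
    """Return (start, stop) sample indices of fixed-length epochs."""
    if not 0 <= overlap_pct < 100:
        raise ValueError('overlap_pct must be in [0, 100)')
    samples_per_epoch = int(round(epoch_size * sfreq))
    # Step between consecutive epoch starts; at least one sample
    step = max(1, int(round(samples_per_epoch * (1 - overlap_pct / 100.0))))
    # A trailing window shorter than samples_per_epoch is dropped (assumption)
    return [(start, start + samples_per_epoch)
            for start in range(0, n_samples - samples_per_epoch + 1, step)]


# 10 s of signal at 100 Hz, cut into 1 s epochs
no_overlap = epoch_bounds(n_samples=1000, sfreq=100, epoch_size=1)
half_overlap = epoch_bounds(n_samples=1000, sfreq=100, epoch_size=1, overlap_pct=50)
print(len(no_overlap), no_overlap[:2])
print(len(half_overlap), half_overlap[:2])
```

With a sampling frequency of 100 Hz and an epoch size of 1 s, each epoch holds 100 samples per channel, so the epoch count scales directly with recording length and overlap.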


In [None]:
EXPORT_NAME = 'Anesthesia Dataset Export' # Name of Export Directory


EEG_DATASET = {'Study': 'Cambridge Anesthesia',
               'Subjects': ['S1', 'S2', 'S3'],
               'States': ['Wakefulness', 'Sedation'],
               'Export': ['Drug Levels'],  # list (export keys) (Optional)
               'Other':  # Dict (Optional)
                   {'Drug': 'Propofol'}
               }

EEG_PARAMETERS = {'Montage': 'GSN-HydroCel-129',  # 'GSN-HydroCel-129' (mne argument)
                  'Channels': '10-20',  # 'All', '10-20'
                  'Crop Time': None,  # [tmin, tmax] (sec)
                  'Filtering': [0.5, 40],  # [Low-cut freq, High-cut freq]
                  'Notch Freq': None,  # 50/60 Hz (Notch filter frequency, including harmonics)
                  'Sfreq': 100,  # Hz
                  'Epoch Size': 1,  # (sec)
                  'Epoch Overlap': 0,  # % overlap (0%-99%)
                  'Epoch Baseline Correction': False,
                  'Epoch Peak-to-Peak Threshold': 800e-6,  # peak-to-peak amplitude threshold (V) or None
                  'Interpolate Bad Channels': True,  # Bad if the signal is flat, or if 20% of the signal exceeds the p-t-p threshold
                  'Epoch Rejection': True,  # Reject epoch if the p-t-p threshold is exceeded in more than 20% of the channels
                  'Reference': 'Average',  # 'Default' (Cz), 'Average', 'Cz', 'Frontal' (Fp1-F3-Fz-F4-Fp2)
                  }
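
The peak-to-peak criteria described in the comments above can be sketched with NumPy on an already epoched array. This is one possible reading of those comments, not the `datasetUtils` implementation: the function name `ptp_flags` is hypothetical, "20% of the signal" is interpreted here as 20% of the epochs, and the synthetic data is invented for the example.

```python
import numpy as np


def ptp_flags(epochs, threshold, max_fraction=0.2):
    """Flag bad channels and epochs in an (n_epochs, n_channels, n_times) array."""
    ptp = epochs.max(axis=-1) - epochs.min(axis=-1)  # (n_epochs, n_channels)
    exceeds = ptp > threshold
    flat = (ptp == 0).all(axis=0)  # channel is flat in every epoch
    # Channel is bad if flat, or if it exceeds the threshold in too many epochs
    bad_channels = flat | (exceeds.mean(axis=0) > max_fraction)
    # Epoch is bad if too many channels exceed the threshold
    bad_epochs = exceeds.mean(axis=1) > max_fraction
    return bad_channels, bad_epochs


# Synthetic data (volts): 10 epochs, 10 channels, 100 samples
rng = np.random.RandomState(0)
data = rng.uniform(-50e-6, 50e-6, size=(10, 10, 100))
data[:, 0, :] = 0.0    # channel 0 is flat
data[:, 1, 0] = 1e-3   # channel 1 has a large spike in every epoch
data[3, 1:, 0] = 1e-3  # epoch 3 has a large spike on channels 1 to 9

bad_channels, bad_epochs = ptp_flags(data, threshold=800e-6)
print('Bad channels:', np.flatnonzero(bad_channels))
print('Bad epochs:', np.flatnonzero(bad_epochs))
```

In this synthetic case, channels 0 and 1 are flagged (flat and persistently noisy, respectively) and only epoch 3 is rejected, since a single noisy channel stays below the 20% limit in the other epochs.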

In [None]:
# Create 'Export Data' Directory
export_path = dataset_path.parent / EXPORT_NAME
utils.create_directory(export_path)

# Create JSON file with EEG Dataset EXPORT PARAMETERS
EEG_DATASET_PARAMETERS = {'EEG_DATASET': EEG_DATASET, 'EEG_PARAMETERS': EEG_PARAMETERS}
with open(str(export_path / 'EEG_DATASET_PARAMETERS.json'), 'w') as f:
    json.dump(EEG_DATASET_PARAMETERS, f, indent=4)

## Dataset Pre-Processing Pipeline

The dataset pre-processing pipeline is given below. For each selected subject and state, we build MNE raw and epochs objects, from which we extract the epoched EEG data and other subject information (*'States'*, *'Channels'*, *'Export'*).

Finally, we export the epoched data (*subject_eeg_data.npy*) and subject info (*subject_info.json*) to the export directory.

In [None]:
# For each Selected Subject
for i, subject in enumerate(subjects):
    if subject not in EEG_DATASET['Subjects']: # Filter Subject Selection
        continue

    # Subject Data and Info
    eeg_data = np.empty((len(EEG_DATASET['States']),), dtype=object)
    selected_states = []
    channels = []
    selected_export = None
    if 'Export' in EEG_DATASET: # Check for Export
        selected_export = {key: [] for key in EEG_DATASET['Export']}


    # For each Selected State
    for s, state in enumerate(states[i]):
        if state not in EEG_DATASET['States']: # Filter State Selection
            continue

        # Get EEG File and Montage File (Optional)
        EEG_file_path = dataset_path / EEG_files[i][s]

        # Resolve the montage path without overwriting montage_file between iterations
        montage_path = dataset_path / montage_file if montage_file is not None else None

        # MNE RAW Object
        original_raw, montage = EEG.load_EEG_data(EEG_file_path, EEG_PARAMETERS['Montage'],
                                                  montage_file=montage_path)

        # MNE Preprocessed RAW and EPOCHS
        preprocessed_raw, epochs = EEG.process_EEG_data(original_raw.copy(), EEG_PARAMETERS)

        # Numpy Array
        epoched_data = epochs.get_data()

        eeg_data[len(selected_states)] = epoched_data  # Index by position among the selected states
        selected_states.append(state)
        channels = epochs.ch_names
        if 'Export' in dataset_info and 'Export' in EEG_DATASET: # Check for Export
            for key in export.keys():
                if key in EEG_DATASET['Export']:
                    selected_export[key].append(export[key][i][s])

    # Subject Info
    subject_info = {'States': selected_states,
                    'Channels': channels,
                    'Export': selected_export
                    }

    # Export EEG Data as numpy array
    np.save(str(export_path / (subject + '_eeg_data.npy')), eeg_data, allow_pickle=True)
    # Export Subject Info as a json file
    with open(str(export_path / (subject + '_info.json')), 'w') as f:
        json.dump(subject_info, f, indent=4)
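
Because each state can yield a different number of epochs, the per-subject data is stored as a NumPy object array, which has to be read back with `allow_pickle=True` (the default is `False` in recent NumPy versions). The sketch below writes a stand-in for one exported subject to a temporary directory and reads it back. The subject name, channel count and epoch counts are hypothetical placeholders, not results from the dataset.

```python
import json
import tempfile
from pathlib import Path

import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    export_dir = Path(tmp)

    # Stand-in for one exported subject: two states with different epoch counts
    eeg_data = np.empty((2,), dtype=object)
    eeg_data[0] = np.zeros((12, 19, 100))  # (epochs, channels, samples)
    eeg_data[1] = np.zeros((8, 19, 100))
    np.save(str(export_dir / 'S1_eeg_data.npy'), eeg_data, allow_pickle=True)
    with open(str(export_dir / 'S1_info.json'), 'w') as f:
        json.dump({'States': ['Wakefulness', 'Sedation']}, f)

    # Read the files back; object arrays require allow_pickle=True
    loaded = np.load(str(export_dir / 'S1_eeg_data.npy'), allow_pickle=True)
    with open(str(export_dir / 'S1_info.json')) as f:
        info = json.load(f)

# Pair each state name with the shape of its epoched data
shapes = {state: loaded[k].shape for k, state in enumerate(info['States'])}
print(shapes)
```

The order of the entries in the array is assumed to match the order of *'States'* in the subject info file, so the two files should always be read together.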