# Model Testing for State-based EEG Decoding Tasks

Author: Konstantinos Patlatzoglou

A simple Python pipeline for:
1) Loading a deep learning model for EEG decoding (TensorFlow/Keras support)

2) Testing the model on EEG datasets selected through a set of dataset parameters (multi-study integration support)


## Packages required:
* numpy
* pathlib (Python 3 standard library)
* natsort
* pandas
* h5py
* scikit-learn
* matplotlib
* tensorflow>=2.6

In [None]:
!pip install numpy
# pathlib ships with the Python 3 standard library, so it needs no installation
!pip install natsort
!pip install pandas
!pip install h5py
!pip install scikit-learn
!pip install matplotlib
!pip install tensorflow==2.10

In [None]:
import os, sys
from pathlib import Path

import numpy as np
import json

sys.path.append(str(Path.cwd().parent / 'DL-EEG')) # Add utils package

from utils import utils
from utils import EEG
from utils import NN

# Model Testing Parameters

This pipeline has been developed to be compatible with the EEG dataset specifications found in *'export_dataset.py'*. 

Briefly, one or more EEG datasets are selected by matching the *EEG_DATASET* parameters against the *'EEG_DATASET_PARAMETERS.json'* specification file of the corresponding dataset directory. Each directory also holds, per subject, the files *subject_eeg_data.npy* and *subject_info.json*. The specification lists the available *subjects*, *states*, and *export* variables; the export variables can be used as training/testing targets.

**Dataset Directory:**

*EEG_DATASET_PARAMETERS.json*:
* *EEG_DATASET*:
 * *Study* - study name
 * *Subjects* - list of subjects
 * *States* - list of states
 * *Export* - list of export keys
 * *Other* - dict (Optional)
* *EEG_PARAMETERS*:
 * *Channels* - name of channel selection
 * *Sfreq* - sampling frequency (Hz)
 * *Epoch Size* - epoch window size (sec)
 * *Reference* - reference montage name
 * *Topomap* - topomap representation (boolean) (Optional)
 * *...* (Optional)

*subject_eeg_data.npy* - Epoched EEG Data (ndarray)

*subject_info.json*:
 * *States* - list of states
 * *Channels* - list of channel names
 * *Export* - dict of export values per key and state
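
A minimal sketch of how a specification with this layout could be validated before use. The helper `missing_spec_keys` is hypothetical and not part of the `utils` package, and the values given for *Channels*, *Sfreq*, *Epoch Size* and *Reference* are made up for illustration; only the key names come from the listing above.

```python
# Required (non-optional) keys, taken from the specification listed above
REQUIRED_KEYS = {
    'EEG_DATASET': ['Study', 'Subjects', 'States', 'Export'],
    'EEG_PARAMETERS': ['Channels', 'Sfreq', 'Epoch Size', 'Reference'],
}


def missing_spec_keys(spec):
    """Return the 'SECTION/Key' entries missing from a specification dict (hypothetical helper)."""
    missing = []
    for section, keys in REQUIRED_KEYS.items():
        if section not in spec:
            missing.append(section)  # The whole section is absent
            continue
        missing.extend(section + '/' + key for key in keys if key not in spec[section])
    return missing


# Illustrative specification (EEG_PARAMETERS values are assumptions)
example_spec = {
    'EEG_DATASET': {'Study': 'Cambridge Anesthesia',
                    'Subjects': ['S3'],
                    'States': ['Wakefulness', 'Sedation'],
                    'Export': ['Drug Levels']},
    'EEG_PARAMETERS': {'Channels': 'Example Montage',
                       'Sfreq': 100,
                       'Epoch Size': 1,
                       'Reference': 'Average'},
}

missing = missing_spec_keys(example_spec)
print('Missing keys:', missing)
```

In practice the dict would be read with `json.load` from the *'EEG_DATASET_PARAMETERS.json'* file of the dataset directory.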

In [None]:
MODEL_DIR_NAME = 'Model Training (Regression)'  # Name of Model Training Directory
RESULTS_DIR_NAME = 'Model Testing (Regression)'  # Name of Model Testing Directory


NO_OF_DATASETS = 1

EEG_DATASET_1 = {'Study': 'Cambridge Anesthesia',
                 'Subjects': ['S3'],
                 'States': ['Wakefulness', 'Sedation'],
                 'Targets': ['Drug Levels'], # '1-hot' (Classification), list [target1, target2, ...] (Regression)
                 'Target Values': 'Read_from_info_file', # None, 'Read_from_info_file', list [[], [], ...]
                 }

SEPARATE_DATASET_RESULTS = False  # Export Results in Separate Directories for each Dataset

# Load Model

First, we load the model found in the given directory name (*MODEL_DIR_NAME*). This directory includes the specified model training parameters in the files *'EEG_DATASET_PARAMETERS.json'*, *'MODEL_PARAMETERS.json'*, *'MODEL_INFO.json'* and *'CLASSES.json'* (Optional).

**Model Training Directory:**

*EEG_DATASET_PARAMETERS.json*:
* *EEG_DATASET*:
 * *Study* - study name
 * *Subjects* - list of subjects
 * *States* - list of states
 * *Targets* - str ('1-hot') or list of targets
 * *Target Values* - str ('Read_from_info_file'), list of target values, or None
* *EEG_PARAMETERS*

*MODEL_PARAMETERS.json*:
 * *Model* - model name
 * *EEG Normalization* - str of epoch normalization method
 * *Sample Weights* - training instance weights (boolean)
 * *Target Weights* - training target weights (boolean)
 * *Shuffle Samples* - (boolean)
 * *Optimizer* - optimizer name (tensorflow arg)
 * *Learning Rate* - learning rate (tensorflow arg)
 * *Loss* - name of loss function (tensorflow arg)
 * *Metrics* - list of metrics (tensorflow arg)
 * *Batch Size* - training batch size (tensorflow arg)
 * *Epochs* - number of training epochs (tensorflow arg)

*MODEL_INFO.json*:
* *Input shape* - EEG data input shape
* *Sfreq* - EEG sampling frequency (Hz)
* *Output shape* - target output shape
* *Targets* - str ('1-hot') or list of targets

In [None]:
current_path = Path.cwd()

model_path = current_path.parent / 'results' / MODEL_DIR_NAME

# Load MODEL_PARAMETERS and MODEL_INFO Structures
MODEL_PARAMETERS = json.load(open(str(model_path / 'MODEL_PARAMETERS.json')))
model_info = json.load(open(str(model_path / 'MODEL_INFO.json')))

CLASSES = None
if model_info['Targets'] == '1-hot':
    CLASSES = json.load(open(str(model_path / 'CLASSES.json')))

# Load Model
model = NN.get_model(str(model_path / 'Model.tf'))
model.summary()

model_memory_usage = NN.get_model_memory_usage(model, MODEL_PARAMETERS['Batch Size'])
print('Model Memory Usage: ' + str(model_memory_usage) + ' GB')
print()

MODEL_CHECKPOINT = False
checkpoint_models = None
training_history = None

# Check for Model Checkpoint (Load Epoch Models + Training history)
if os.path.exists(model_path / 'Model Checkpoint'):
    MODEL_CHECKPOINT = True
    checkpoint_models = NN.get_checkpoint_models(model_path / 'Model Checkpoint')
    training_history = json.load(open(str(model_path / 'Model Checkpoint' / 'training_history.json')))

# Import EEG Datasets

We import the selected EEG datasets and concatenate them into a single *eeg_dataset* dictionary. Datasets are integrated after checking consistency across the EEG sampling frequency, channel names, reference montage, epoch size and targets. 
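
The idea behind the consistency check can be sketched as follows. The actual check is performed by `EEG.check_datasets_consistency`, whose implementation is not shown here; the helper `parameters_are_consistent` and the example values are hypothetical, and the sketch covers only a subset of the criteria.

```python
def parameters_are_consistent(parameter_list, keys=('Sfreq', 'Reference', 'Epoch Size')):
    """Return True if all datasets share the same value for each key (hypothetical helper)."""
    reference = parameter_list[0]
    return all(params[key] == reference[key]
               for params in parameter_list[1:]
               for key in keys)


# Illustrative EEG_PARAMETERS of three datasets (values are assumptions)
study_a = {'Sfreq': 100, 'Reference': 'Average', 'Epoch Size': 1}
study_b = {'Sfreq': 100, 'Reference': 'Average', 'Epoch Size': 1}
study_c = {'Sfreq': 250, 'Reference': 'Average', 'Epoch Size': 1}

print(parameters_are_consistent([study_a, study_b]))  # Same parameters
print(parameters_are_consistent([study_a, study_c]))  # Sampling frequency differs
```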

In [None]:
datasets_path = current_path.parent / 'data'

eeg_datasets = []
for i in range(NO_OF_DATASETS): # For each specified Dataset

    EEG_DATASET = globals()['EEG_DATASET_' + str(i+1)]
    dataset_path = EEG.find_dataset_path(datasets_path, EEG_DATASET)

    if dataset_path is None:
        print('No dataset with the given parameters found')
        exit(1)

    # Get the Dataset from the corresponding directory
    # Dict ('EEG Dataset Parameters', 'Subjects', 'EEG Data', 'Info')
    raw_eeg_dataset = EEG.get_dataset(dataset_path)

    # Select the Data specified in EEG_DATASET with respect to Subjects, States and Targets
    # Dict ('EEG Dataset Parameters', 'Subjects', 'Dataset ID', 'EEG Data', 'States', 'Channels', 'Targets',
    #       'Target Values')
    eeg_dataset = EEG.select_data_from_dataset(raw_eeg_dataset, EEG_DATASET, id=i)
    eeg_datasets.append(eeg_dataset)

# Check Datasets Consistency (Sampling Frequency, Reference Montage, Channel names, Epoch Size, Targets)
if not EEG.check_datasets_consistency(eeg_datasets):
    print('Datasets cannot be integrated')
    exit(2)

# Concatenate All Datasets
eeg_dataset = EEG.concatenate_datasets(eeg_datasets)
del eeg_datasets

*eeg_dataset*:
* *eeg_dataset_parameters* (list): List of EEG_DATASET_PARAMETERS for each dataset
* *subjects* (list): list of subjects (subject,)
* *dataset_id* (list): list of dataset ids (subject,)
* *eeg_data_list* (list): list of eeg data (subject,) (state,) (epoch, channel, sample)
* *states_list* (list): list of states (subject,) (state,)
* *channels_list* (list): list of channel names (subject,)
* *targets* (str or list): target name or list of target names
* *targets_list* (list or None): list of target values (subject,) (state,) (epoch, target)
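
To make the nesting concrete, the following sketch builds toy data with the same (subject,) (state,) (epoch, channel, sample) layout and counts the epochs per state. The array sizes and the second subject are invented for illustration.

```python
import numpy as np

# Toy data: 2 subjects, 2 states each, arrays shaped (epoch, channel, sample)
eeg_data_example = [
    [np.zeros((5, 4, 100)), np.zeros((3, 4, 100))],  # Subject 0
    [np.zeros((6, 4, 100)), np.zeros((2, 4, 100))],  # Subject 1
]
states_example = [['Wakefulness', 'Sedation'], ['Wakefulness', 'Sedation']]

# Number of epochs per subject and state
epochs_per_state = [
    {state: data.shape[0] for state, data in zip(states, subject_data)}
    for states, subject_data in zip(states_example, eeg_data_example)
]
print(epochs_per_state)
```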

In [None]:
eeg_dataset_parameters = eeg_dataset['EEG Dataset Parameters']
subjects = eeg_dataset['Subjects']
dataset_id = eeg_dataset['Dataset ID']
eeg_data_list = eeg_dataset['EEG Data']
states_list = eeg_dataset['States']
channels_list = eeg_dataset['Channels']
targets = eeg_dataset['Targets']
targets_list = eeg_dataset['Target Values']

del eeg_dataset

# EEG Pre-Processing

The EEG data need to be pre-processed before they are fed into the deep learning network. This includes *normalization* of the epoch instances (EEG data are typically in μV) and *reshaping* of the dimensions into the layout that each DL-based EEG model expects (e.g. with respect to channel information).
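
As an illustration of these two steps, the sketch below standardizes each epoch and appends a kernel dimension. It assumes epoch-wise standardization over all channels and samples; the method actually applied is selected by `MODEL_PARAMETERS['EEG Normalization']` and implemented in `EEG.normalize_EEG_data_list`, which may differ. The helper `standardize_epochs` and the random data are hypothetical.

```python
import numpy as np


def standardize_epochs(epochs, eps=1e-8):
    """Z-score each epoch over its channel and sample axes (illustrative helper)."""
    mean = epochs.mean(axis=(1, 2), keepdims=True)
    std = epochs.std(axis=(1, 2), keepdims=True)
    return (epochs - mean) / (std + eps)  # eps guards against division by zero


# Random data on a microvolt-like scale, shaped (epoch, channel, sample)
rng = np.random.default_rng(0)
epochs = rng.normal(loc=0.0, scale=20.0, size=(8, 4, 100))

normalized = standardize_epochs(epochs)
model_input = normalized[..., np.newaxis]  # Add a trailing kernel dimension

print(model_input.shape)
```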

In [None]:
# Check Topomap Model Compatibility
if MODEL_PARAMETERS['Model'] == 'cNN_topomap':
    if not EEG.check_EEG_model_compatibility(eeg_data_list, 'cNN_topomap'):
        print('EEG data shape is incompatible with selected Model')
        exit(3)

# Perform EEG Normalization (e.g. epoch-wise Standardization)
eeg_data_list = EEG.normalize_EEG_data_list(eeg_data_list, MODEL_PARAMETERS['EEG Normalization'])

# Reshape EEG data according to Model + Add Kernel Dimension
# e.g. If Toy Model (1D), reshape EEG data into (epoch, sample)
# e.g. If cNN_3D model (3D), reshape EEG data into (epoch, channel, channel, sample, 1)
eeg_data_list = EEG.reshape_EEG_data_list(eeg_data_list, channels_list, MODEL_PARAMETERS['Model'])

# EEG Data / Model Consistency Check

We can check the consistency of our EEG data with the selected model, with respect to *input shape*, *output shape*, *EEG sampling frequency* and *learning algorithm*.
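
One possible way to report which field disagrees, rather than only that something does, is sketched below. The helper `find_model_mismatches` and the example values are hypothetical; shapes are converted to tuples because JSON stores them as lists.

```python
def find_model_mismatches(input_shape, output_shape, sfreq, targets, model_info):
    """Return the names of the model_info fields that disagree with the data (hypothetical helper)."""
    checks = {
        'Input shape': tuple(input_shape) == tuple(model_info['Input shape']),
        'Output shape': tuple(output_shape) == tuple(model_info['Output shape']),
        'Sfreq': sfreq == model_info['Sfreq'],
        'Targets': targets == model_info['Targets'],
    }
    return [name for name, is_consistent in checks.items() if not is_consistent]


# Illustrative MODEL_INFO content (values are assumptions)
example_model_info = {'Input shape': [4, 100, 1],
                      'Output shape': [1],
                      'Sfreq': 100,
                      'Targets': ['Drug Levels']}

# The data below were sampled at 250 Hz, so only the sampling frequency mismatches
mismatches = find_model_mismatches((4, 100, 1), (1,), 250, ['Drug Levels'], example_model_info)
print('Mismatching fields:', mismatches)
```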

In [None]:
# Model Info
input_shape = eeg_data_list[0][0].shape[1:]  # Model Input Shape
sfreq = eeg_dataset_parameters[0]['EEG_PARAMETERS']['Sfreq']  # EEG Sampling Frequency

# Classification if the targets are 1-hot encoded, otherwise Regression
classification = (targets == '1-hot')

# If no Target Values are available, fall back to the Output Shape stored with the model
if targets_list is None:
    output_shape = tuple(model_info['Output shape'])
else:
    output_shape = targets_list[0][0].shape[1:]  # Model Output Shape

# Check Model Consistency
if input_shape != tuple(model_info['Input shape']) \
    or output_shape != tuple(model_info['Output shape']) \
    or sfreq != model_info['Sfreq'] \
    or targets != model_info['Targets']:

    print('EEG Data are inconsistent with selected model!')
    exit(4)

# Create Results Directory

A results directory is created to store the testing results of the EEG dataset, along with the specified training and testing parameters (*EEG_DATASET_PARAMETERS*, *MODEL_PARAMETERS*, *MODEL_INFO*, *CLASSES* (Optional)).

In [None]:
result_path = current_path.parent / 'results' / RESULTS_DIR_NAME
utils.create_directory(result_path)

# For each Dataset, save EEG_DATASET_PARAMETERS (EEG_DATASET + EEG_PARAMETERS)
for i in range(NO_OF_DATASETS):

    EEG_DATASET = globals()['EEG_DATASET_' + str(i + 1)]
    if 'Other' in eeg_dataset_parameters[i]['EEG_DATASET']:  # Concatenate 'Other' in EEG_DATASET
        EEG_DATASET['Other'] = eeg_dataset_parameters[i]['EEG_DATASET']['Other']
    EEG_PARAMETERS = eeg_dataset_parameters[i]['EEG_PARAMETERS']

    EEG_DATASET_PARAMETERS = {'EEG_DATASET': EEG_DATASET, 'EEG_PARAMETERS':EEG_PARAMETERS}
    json.dump(EEG_DATASET_PARAMETERS, open(str(result_path / ('EEG_DATASET_PARAMETERS_' + str(i + 1) + '.json')),
                                           'w'), indent=4)

# Save MODEL_PARAMETERS
json.dump(MODEL_PARAMETERS, open(str(result_path / 'MODEL_PARAMETERS.json'), 'w'), indent=4)

# Save MODEL_INFO
json.dump(model_info, open(str(result_path / 'MODEL_INFO.json'), 'w'), indent=4)

# If Classification, Save CLASSES
if model_info['Targets'] == '1-hot':
    json.dump(CLASSES, open(str(result_path / 'CLASSES.json'), 'w'), indent=4)

# Model Testing and Results Export

For each subject in our test dataset, we concatenate the states and create our input and output (if available) tensors (*Xte_t*, (*Yte_t*)). The model predictions are then calculated for the given subject.

If *'Target Values'* are provided in EEG_DATASET parameters, a subject score (*loss/metric*) is evaluated and stored in the results.

If *MODEL_CHECKPOINT* is True, a score is calculated for each training epoch of the respective model.
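
The per-epoch scores are collected as one list per epoch and then regrouped into one list per metric. A minimal sketch of that regrouping, with made-up metric names and scores, and assuming each evaluation returns a list of values:

```python
# Hypothetical metric names and per-epoch scores: one [loss, mae] pair per training epoch
metrics_names = ['loss', 'mae']
epoch_scores = [[0.9, 0.7], [0.6, 0.5], [0.4, 0.35]]

# Regroup from (epoch, metric) to one list per metric, prefixed with 'val_'
history = {'val_' + name: [scores[index] for scores in epoch_scores]
           for index, name in enumerate(metrics_names)}
print(history)
```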

Finally, we can save the subject predictions and other subject info (*'States'*, *'Channels'*, *'Target Values'*, *'Score'*, *'History'*) in our results directory for further analysis (*'subject_predictions.npy'*, *'subject_info.json'*).
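
The round trip from per-state arrays to a single input tensor and back to per-state predictions can be sketched with plain NumPy. The implementations of `utils.concatenate_data` and `utils.reshape_predictions` are not shown here, so this is only an illustration of the idea; the toy data and the stand-in for `model.predict` are assumptions.

```python
import numpy as np

# Toy data for one subject: two states shaped (epoch, channel, sample)
state_data = [np.zeros((5, 4, 100)), np.ones((3, 4, 100))]
state_epochs = [data.shape[0] for data in state_data]

# Concatenate the states along the epoch axis
X = np.concatenate(state_data, axis=0)

# Stand-in for model.predict: one value per epoch (here simply the epoch mean)
Y_pred = X.mean(axis=(1, 2)).reshape(-1, 1)

# Split the predictions back into one array per state
boundaries = np.cumsum(state_epochs)[:-1]
per_state_predictions = np.split(Y_pred, boundaries, axis=0)

print([prediction.shape for prediction in per_state_predictions])
```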

In [None]:
print('Subjects: ' + str(subjects))

for test_index, test_subject in enumerate(subjects):

    print('Test Subject: ' + test_subject)

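    # Concatenate the States of the current test subject only
    subject_targets = None if targets_list is None else targets_list[test_index:test_index + 1]
    Xte, Yte, _ = utils.concatenate_data(eeg_data_list[test_index:test_index + 1], targets_list=subject_targets)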

    # Testing Data
    Xte_t, Yte_t = utils.get_tensor_maps(Xte, Y=Yte, classification=classification)

    # Model Prediction
    Ypred = model.predict(Xte_t, batch_size=MODEL_PARAMETERS['Batch Size'])
    if isinstance(Ypred, list): # If Multiple Regression Targets
        Ypred = utils.merge_targets(Ypred)

    score = None
    history = None

    # Model Evaluation (Score)
    if Yte_t is not None:  # If Target Values
        score = model.evaluate(Xte_t, Yte_t, batch_size=MODEL_PARAMETERS['Batch Size'], verbose=0)
        score = dict(zip(model.metrics_names, score))

        # Model Checkpoint (History)
        if MODEL_CHECKPOINT:
            metrics_names = [('val_' + metric_name) for metric_name in model.metrics_names]
            history = []

            for epoch in range(MODEL_PARAMETERS['Epochs']): # For each Epoch
                epoch_score = checkpoint_models[epoch].evaluate(Xte_t, Yte_t,
                                                                batch_size=MODEL_PARAMETERS['Batch Size'],
                                                                verbose=0)
                history.append(epoch_score)
            history = [list(np.array(history)[:, metric]) for metric in range(len(model.metrics_names))]
            history = dict(zip(metrics_names, history))
            history.update(training_history) # Add Training History

    # --------------------- Export Subject Results ------------------------------
    # Select Subject Export Path
    if SEPARATE_DATASET_RESULTS:
        dataset_export_path = result_path / ('EEG DATASET ' + str(dataset_id[test_index]+1))
    else:
        dataset_export_path = result_path / 'EEG DATASET'

    if not dataset_export_path.exists():
        utils.create_directory(dataset_export_path)


    # Calculate per-state Predictions and Target Values
    state_epochs = [state.shape[0] for state in eeg_data_list[test_index]]
    predictions = utils.reshape_predictions(Ypred, state_epochs)
    target_values = None

    if targets_list is not None: # If Target Values
        target_values = [list(target[0]) for target in targets_list[test_index]]

    # Subject Info
    subject_info = {'States': list(states_list[test_index]),
                    'Channels': list(channels_list[test_index]),
                    'Target Values': target_values,
                    'Score': score,
                    'History': history
                    }

    # Export Predictions as numpy array
    np.save(str(dataset_export_path / (test_subject + '_predictions.npy')), predictions, allow_pickle=True)
    # Export Subject Info as a json file
    json.dump(subject_info, open(str(dataset_export_path / (test_subject + '_info.json')), 'w'), indent=4)

# Predictions Visualization and Other Metrics

We can visualize the predictions of the model and other available metrics (e.g. *loss/metric history*, *confusion matrix*) to get a better sense of our model's behavior and performance.

In [None]:
from utils import output

subject = 'S3'
predictions = np.load(str(dataset_export_path / (subject + '_predictions.npy')), allow_pickle=True)
subject_info = json.load(open(str(dataset_export_path / (subject + '_info.json'))))

# Plot Predictions
fig = output.plot_predictions(subject, predictions, subject_info['States'], model_info['Targets'], 
                              target_values=subject_info['Target Values'], CLASSES=CLASSES, 
                              states_colors=['red', 'blue'], class_colors=['red', 'blue'])

fig.savefig(str(dataset_export_path / (subject + '_predictions.png')))

# Plot History
if subject_info['History'] is not None:
    fig = output.plot_history(subject, subject_info['History'], loss=MODEL_PARAMETERS['Loss'], 
                              metrics=MODEL_PARAMETERS['Metrics'])
    
    fig.savefig(str(dataset_export_path / (subject + '_history.png')))
