In [10]:
import matplotlib

matplotlib.use('TkAgg')
from Recordings import AEPFeedbackRecording, Recordings, Recording
import pandas as pd
import numpy as np
import os

The following cell loads all the raw data from the XDF files across all sessions and subjects. All data validation warnings were checked manually, and the underlying issues are fixed at the point where the data is loaded.

In [2]:
# Load the data
basepath = './Recordings'
recordings = Recordings(basepath)

Reading recording: ./Recordings/1/9/1/9_aep_2024-06-10_14-32-42_1.xdf
Reading recording: ./Recordings/1/9/1/9_aep_feedback_2024-06-10_14-40-46_1.xdf
Reading recording: ./Recordings/1/9/10/9_resting_closed_2024-06-10_14-29-49_10.xdf
Reading recording: ./Recordings/1/9/10/9_resting_open_2024-06-10_14-26-10_10.xdf
Reading recording: ./Recordings/1/9/2/9_aep_feedback_2024-06-10_14-56-51_2.xdf
Reading recording: ./Recordings/1/9/2/9_aep_2024-06-10_14-50-06_2.xdf
Reading recording: ./Recordings/1/7/1/7_aep_2024-03-28_14-34-11_1.xdf
Reading recording: ./Recordings/1/7/1/7_aep_feedback_2024-03-28_14-41-11_1.xdf
Reading recording: ./Recordings/1/7/10/7_resting_closed_2024-05-23_16-27-08_10.xdf
Reading recording: ./Recordings/1/7/10/7_resting_open_2024-05-23_16-22-06_10.xdf
Reading recording: ./Recordings/1/7/2/7_aep_feedback_2024-03-28_14-55-34_2.xdf
Reading recording: ./Recordings/1/7/2/7_aep_2024-03-28_14-49-32_2.xdf
Reading recording: ./Recordings/1/6/1/6_aep_feedback_2024-03-27_15-09-15_1.x
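The loading log shows that the XDF file names follow the pattern `{subject}_{experiment}_{date}_{time}_{session}.xdf`, with the session ID also used as the parent folder name. As a minimal sketch, the fields can be recovered with a regular expression. The helper `parse_xdf_name` is hypothetical and is not part of the `Recordings` module.

```python
import re
from datetime import datetime

# Hypothetical parser for names such as '9_aep_feedback_2024-06-10_14-40-46_1.xdf'.
# The experiment ID may itself contain underscores (e.g. 'aep_feedback'), so it is
# matched lazily up to the first date-shaped token.
XDF_NAME = re.compile(
    r'^(?P<subject>\d+)_(?P<experiment>[a-z_]+?)_'
    r'(?P<date>\d{4}-\d{2}-\d{2})_(?P<time>\d{2}-\d{2}-\d{2})_'
    r'(?P<session>\d+)\.xdf$'
)


def parse_xdf_name(filename):
    """Split an XDF file name into subject, experiment, recording time and session."""
    match = XDF_NAME.match(filename)
    if match is None:
        raise ValueError(f'Unexpected file name: {filename}')
    recorded_at = datetime.strptime(f"{match['date']} {match['time']}", '%Y-%m-%d %H-%M-%S')
    return {
        'subject': int(match['subject']),
        'experiment': match['experiment'],
        'recorded_at': recorded_at,
        'session': int(match['session']),
    }


print(parse_xdf_name('9_aep_feedback_2024-06-10_14-40-46_1.xdf'))
```

The timestamp comes right after the experiment ID. As a result, sorting the names of one subject, experiment and session lexicographically also sorts them chronologically.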

First, we export the raw data to CSV files. Each XDF file is converted to a table in which each column holds one channel of the Unicorn Hybrid Black EEG device, plus a time column and an additional column for event markers.

In [3]:
# Create a pandas table for each recording file
output_tables = []
for recording in recordings.recordings:
    data = {
        "Time": recording.eeg_time,
        "EEG_Fz": recording.eeg_data[:, 0],
        "EEG_C3": recording.eeg_data[:, 1],
        "EEG_Cz": recording.eeg_data[:, 2],
        "EEG_C4": recording.eeg_data[:, 3],
        "EEG_Pz": recording.eeg_data[:, 4],
        "EEG_PO7": recording.eeg_data[:, 5],
        "EEG_Oz": recording.eeg_data[:, 6],
        "EEG_PO8": recording.eeg_data[:, 7],
        "Accelerometer_X": recording.accelerometer_data[:, 0],
        "Accelerometer_Y": recording.accelerometer_data[:, 1],
        "Accelerometer_Z": recording.accelerometer_data[:, 2],
        "Gyroscope_X": recording.gyroscope_data[:, 0],
        "Gyroscope_Y": recording.gyroscope_data[:, 1],
        "Gyroscope_Z": recording.gyroscope_data[:, 2],
        "Battery_Level": recording.battery_level_data,
        "Counter": recording.counter_data,
        "Validation": recording.validation_indicator_data,
    }
    df = pd.DataFrame(data=data)
    # Ensure every table has an 'Event' column, even if the recording contains no markers
    df['Event'] = None
    # Map each event marker to the EEG sample with the closest timestamp
    indices_array = [int(np.abs(df['Time'] - value).argmin()) for value in recording.marker_time]
    # Avoid collisions of events: if a marker lands on the same sample as (or before) the
    # previous marker, move it to the sample right after the previous marker
    for i in range(1, len(indices_array)):
        if indices_array[i] <= indices_array[i - 1]:
            indices_array[i] = indices_array[i - 1] + 1
    # Write each event label into the 'Event' column at its sample index
    for index, event in zip(indices_array, recording.marker_data):
        df.at[index, 'Event'] = event
    output_tables.append(df)
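The cell above assigns each event marker to the EEG sample closest to it in time. When a marker would collide with the previous one, it is shifted to a later sample. Below is a vectorized sketch of the same idea that uses `np.searchsorted` instead of scanning all samples for every marker. The function name `align_markers` is hypothetical. It assumes that both timestamp arrays are sorted in ascending order, as LSL timestamps normally are, and that there are at least two samples.

```python
import numpy as np


def align_markers(sample_times, marker_times):
    """Return one sample index per marker, nearest in time and never shared.

    Assumes sample_times (at least two entries) and marker_times are sorted ascending.
    As in the loop above, a pile-up of markers at the very end may push an index
    one past the last sample.
    """
    sample_times = np.asarray(sample_times, dtype=float)
    marker_times = np.asarray(marker_times, dtype=float)
    # Index of the first sample at or after each marker, kept inside [1, n - 1]
    right = np.clip(np.searchsorted(sample_times, marker_times), 1, len(sample_times) - 1)
    left = right - 1
    # Pick whichever neighbouring sample is closer (ties go to the earlier one, like argmin)
    closer_left = (marker_times - sample_times[left]) <= (sample_times[right] - marker_times)
    indices = np.where(closer_left, left, right)
    # Resolve collisions: each marker must land strictly after the previous one
    for i in range(1, len(indices)):
        if indices[i] <= indices[i - 1]:
            indices[i] = indices[i - 1] + 1
    return indices.tolist()


print(align_markers([0.0, 1.0, 2.0, 3.0, 4.0], [0.9, 1.1, 1.2, 3.6]))
```

Here the first three markers are all nearest to sample 1, so they end up on samples 1, 2 and 3.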

## Raw Files Description

The "Raw_Files" folder contains unsegmented, unfiltered, and unprocessed data in CSV format. It has one subfolder for each of the four experiment types, and each subfolder contains the recordings of every subject for that experiment type.

Output folder structure:
- Raw_Files/
    - AEP/
    - AEP_Feedback/
    - Resting_Open/
    - Resting_Closed/

Recording files inside these folders are named as follows: `{Subject ID}_{Experiment Type}_{Session ID}.csv`.

For example, the AEP feedback recording of the first subject from the second session is named `1_aep-feedback_2.csv`.

Subjects are numbered 1 through 10, and the possible experiment types are aep, aep-feedback, resting-open, and resting-closed. Session IDs are either 1 or 2 for 'aep/aep-feedback' experiments, while each resting-state experiment has a single session.

Some recordings have an additional '_UNFINISHED' suffix. This means the recording was interrupted during that session, for example because the EEG device battery ran too low. All '_UNFINISHED' recordings were re-run from start to finish, so every subject still has two full sessions of each 'aep/aep-feedback' experiment. Nevertheless, the epochs in unfinished recordings have no defects and can be used freely in subsequent processing steps.
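When reading the exported files back, the naming scheme can be parsed into its parts. The sketch below relies only on the `{Subject ID}_{Experiment Type}_{Session ID}.csv` pattern and the optional '_UNFINISHED' suffix described above. The helper `parse_raw_filename` is hypothetical.

```python
import re

# Pattern for exported file names, e.g. '1_aep-feedback_2.csv' or '10_aep_1_UNFINISHED.csv'
RAW_NAME = re.compile(
    r'^(?P<subject>\d+)_(?P<experiment>aep|aep-feedback|resting-open|resting-closed)'
    r'_(?P<session>\d+)(?P<unfinished>_UNFINISHED)?\.csv$'
)


def parse_raw_filename(filename):
    """Return (subject_id, experiment_type, session_id, is_unfinished) for a Raw_Files CSV name."""
    match = RAW_NAME.match(filename)
    if match is None:
        raise ValueError(f'Not a raw recording file name: {filename}')
    return (
        int(match['subject']),
        match['experiment'],
        int(match['session']),
        match['unfinished'] is not None,
    )


print(parse_raw_filename('10_aep_1_UNFINISHED.csv'))  # (10, 'aep', 1, True)
```

Because the regex engine backtracks through the alternation, `aep-feedback` is still matched even though the shorter `aep` alternative is tried first.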

In [15]:
# Map experiment IDs to output subfolder names
# (an unknown experiment ID raises a KeyError instead of silently going to Resting_Closed)
subfolders = {
    'aep': 'AEP',
    'aep_feedback': 'AEP_Feedback',
    'resting_open': 'Resting_Open',
    'resting_closed': 'Resting_Closed',
}
for i, recording in enumerate(recordings.recordings):
    # Find out if the current recording is unfinished
    is_current_unfinished = False
    if recording.experiment_id in ('aep', 'aep_feedback'):
        similar_recordings = [
            r for r in recordings.recordings
            if r.subject_id == recording.subject_id
            and r.experiment_id == recording.experiment_id
            and r.session == recording.session
        ]
        if len(similar_recordings) > 1:  # The list always contains at least the current recording itself
            # Names contain the recording timestamp, so sorting by name also sorts by recording time
            similar_recordings = sorted(similar_recordings, key=lambda x: x.recording_name)
            # If the current recording was made before the other (similar) one, it is the unfinished one
            if recording.recording_name == similar_recordings[0].recording_name:
                print(f'Recording named {recording.recording_name} is unfinished, while {similar_recordings[1].recording_name} is not.')
                is_current_unfinished = True
    # Build the output directory and file name
    output_dir = os.path.join('./Raw_Files', subfolders[recording.experiment_id])
    output_filename = f"{recording.subject_id}_{recording.experiment_id.replace('_', '-')}_{recording.session}"
    if is_current_unfinished:
        output_filename += '_UNFINISHED'
    output_path = os.path.join(output_dir, f'{output_filename}.csv')
    # Create the output directory
    os.makedirs(output_dir, exist_ok=True)
    # Export the recording file
    # index=False: do not write the pandas row index as an extra, unnamed column
    output_tables[i].to_csv(output_path, index=False)

Recording named 10_aep_2024-07-07_12-58-52_1.xdf is unfinished, while 10_aep_2024-07-07_13-30-26_1.xdf is not.
Recording named 2_aep_feedback_2024-03-18_15-42-22_1.xdf is unfinished, while 2_aep_feedback_2024-03-27_13-00-43_1.xdf is not.
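The check above compares each AEP recording with the others that share its subject, experiment and session. When a duplicate exists, it marks the earliest one as unfinished. The same rule can be expressed as a single grouping pass. This is a sketch over plain tuples: the function `find_unfinished` and its input format are hypothetical. As a slight generalization, it marks every recording except the latest one in each group, not just the earliest.

```python
from collections import defaultdict


def find_unfinished(recordings):
    """Return the names of AEP recordings that were superseded by a later re-run.

    recordings: iterable of (subject_id, experiment_id, session, recording_name) tuples.
    Assumes names within one subject/experiment/session sort chronologically
    (the recording timestamp is part of the name).
    """
    groups = defaultdict(list)
    for subject_id, experiment_id, session, name in recordings:
        if experiment_id in ('aep', 'aep_feedback'):
            groups[(subject_id, experiment_id, session)].append(name)
    unfinished = set()
    for names in groups.values():
        # Every recording except the most recent one in its group was interrupted
        unfinished.update(sorted(names)[:-1])
    return unfinished
```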


## File Structure

Each CSV recording file has 19 columns. Apart from *Time* and *Event*, each column corresponds to an output channel acquired from the Unicorn Hybrid Black EEG device. The raw recording files contain the following channel types: 1 time channel, 8 EEG channels, 3 accelerometer channels, 3 gyroscope channels, 1 battery level channel, 1 counter channel, 1 validation channel, and 1 event channel.

Specifically, the 19 columns are: Time, EEG_Fz, EEG_C3, EEG_Cz, EEG_C4, EEG_Pz, EEG_PO7, EEG_Oz, EEG_PO8, Accelerometer_X, Accelerometer_Y, Accelerometer_Z, Gyroscope_X, Gyroscope_Y, Gyroscope_Z, Battery_Level, Counter, Validation, Event.

1. *Time* column indicates the number of seconds elapsed since the start of the session.
2. *EEG_\** columns contain unfiltered voltage values (in microvolts) for the 8 EEG channels specified in the Unicorn Hybrid Black EEG device manual. Each column name includes the electrode position (e.g., Cz) according to the international 10-20 system.
3. *Accelerometer_\** columns contain the acceleration (±8 g) of the Unicorn Hybrid Black EEG device along the X/Y/Z axes.
4. *Gyroscope_\** columns contain the angular velocity (±1000 °/s) of the Unicorn Hybrid Black EEG device around the X/Y/Z axes.
5. *Battery_Level* column ranges from 0 to 100 and indicates the remaining battery level.
6. *Counter* column is a running sample count that tracks the order in which samples were received from the Unicorn Hybrid Black on the host PC.
7. *Validation* column is a validation indicator for the samples received from the Unicorn Hybrid Black device.
8. *Event* column contains the event marker registered at that sample, if any. Event descriptions for the different experiment types can be found below.
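Each channel type shares a column-name prefix. Once a file is loaded with pandas, the groups listed above can therefore be selected by prefix. A minimal sketch follows. The helper `split_channels` is hypothetical, and the commented path only illustrates the naming scheme.

```python
import pandas as pd


def split_channels(df):
    """Group the columns of a raw recording table by channel type (column-name prefix)."""
    return {
        'time': df['Time'],
        'eeg': df.filter(regex=r'^EEG_'),
        'accelerometer': df.filter(regex=r'^Accelerometer_'),
        'gyroscope': df.filter(regex=r'^Gyroscope_'),
        # Rows where an event marker was registered (empty cells are read back as NaN)
        'events': df.loc[df['Event'].notna(), ['Time', 'Event']],
    }


# Example usage (illustrative path):
# df = pd.read_csv('./Raw_Files/AEP/1_aep_1.csv')
# eeg = split_channels(df)['eeg']  # EEG_Fz ... EEG_PO8
```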

More information about channels can be found in [Unicorn Hybrid Black User Manual](https://github.com/unicorn-bi/Unicorn-Suite-Hybrid-Black-User-Manual).
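The *Counter* column can be used to check a recording for dropped samples. The sketch below assumes that the counter increases by exactly 1 for every sample the device sends; the manual linked above is the authoritative reference for its exact semantics. `find_counter_gaps` is a hypothetical helper.

```python
import numpy as np


def find_counter_gaps(counter):
    """Return (row_index, samples_missing) for every jump in a per-sample counter.

    Assumes the counter increments by exactly 1 per transmitted sample.
    """
    counter = np.asarray(counter, dtype=np.int64)
    steps = np.diff(counter)
    # A step larger than 1 between rows i and i + 1 means samples were lost in between
    gap_rows = np.flatnonzero(steps > 1)
    return [(int(i) + 1, int(steps[i]) - 1) for i in gap_rows]


print(find_counter_gaps([1, 2, 5, 6, 8]))  # [(2, 2), (4, 1)]
```

Applied to a loaded table, `find_counter_gaps(df['Counter'])` returns an empty list when no samples were lost.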

The data was streamed using the Unicorn LSL interface, which outputs 17 raw channels (all columns except "Time" and "Event"). These channels were exported to the CSV files without any modification. Event markers were streamed (also via the LSL protocol) by the NeuroPype™ Experiment Recorder (ER) software. Both streams were stored in XDF files and then synchronized automatically using the PyXDF library, which is how the "Time" and "Event" columns were added to the CSV files.

## Segmented Data