## Initial summary of event files

This script produces a preliminary summary of the contents of the event files.
The summary includes the column names of each event file, printed so
that they can be manually checked for differences.

The script assumes that the data is in BIDS format and that each BIDS events
file with the suffix `_events.tsv` has a corresponding events file with the
suffix `_events_temp.tsv` that was previously dumped from the `EEG.set` files.

To compare the events in the BIDS events files with those from the
`EEG.set` files, the script creates a dictionary mapping `key` to full path
for each type of file. Here the `key` is of the form `sub-xxx_run-y`, which
uniquely specifies each event file in the dataset. If a dataset contains
multiple sessions for each subject, the `key` should include additional
parts of the file name so that it still uniquely specifies each event file.

Keys are specified by a `name_indices` tuple, which lists the zero-based
positions of the pieces of the file name to include. Here pieces are
separated by the underscore character (`_`).

For a file name `sub-001_ses-3_task-target_run-01_events.tsv`,
the tuple (0, 2) gives a key of `sub-001_task-target`,
while the tuple (0, 3) gives a key of `sub-001_run-01`.
The use of dictionaries of file names with such keys makes it
easier to associate related files in the BIDS naming structure.
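
The key construction described above can be sketched in a few lines. This is only an illustration of the idea, not the implementation used by `TsvFileDictionary`; the helper name `make_key` is hypothetical.

```python
import os


def make_key(file_path, name_indices, separator="_"):
    """Join the selected separator-delimited pieces of a file's base name.

    Hypothetical helper for illustration only; it is not part of hed.tools.
    """
    base_name = os.path.basename(file_path)
    stem = os.path.splitext(base_name)[0]   # drop the .tsv extension
    pieces = stem.split(separator)          # e.g. ['sub-001', 'ses-3', ...]
    return separator.join(pieces[index] for index in name_indices)


file_name = "sub-001_ses-3_task-target_run-01_events.tsv"
print(make_key(file_name, (0, 2)))  # sub-001_task-target
print(make_key(file_name, (0, 3)))  # sub-001_run-01
```

For file names without a session piece, such as `sub-001_task-P300_run-1_events.tsv`, the tuple `(0, 2)` selects the subject and run pieces instead.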

The setup requires the setting of the following variables for your dataset:

| Variable | Purpose |
| -------- | ------- |
| bids_root_path | Full path to root directory of dataset.|
| exclude_dirs | List of directories to exclude when constructing file lists. |
| name_indices | Indices used to construct unique keys representing event filenames. |
| bids_skip_columns | List of column names in the `events.tsv` files to skip in the analysis. |
| eeg_skip_columns | List of column names in the `events_temp.tsv` files from EEG.event to skip in the analysis. |

In [1]:
from hed.tools import TsvFileDictionary
from hed.util import get_file_list

# Variables to set for the specific dataset
bids_root_path = r'G:\AuditoryOddball\AuditoryOddballWorking'  # raw string keeps the backslashes literal
exclude_dirs = ['code', 'stimuli']
name_indices = (0, 2)
bids_skip_columns = ['onset', 'duration', 'sample', 'response_time']
eeg_skip_columns = ['latency', 'duration', 'sample', 'response_time']

# Construct the event file dictionaries for the BIDS and for EEG.event files
event_files_bids = get_file_list(bids_root_path, extensions=[".tsv"], name_suffix="_events",
                                 exclude_dirs=exclude_dirs)
bids_dict = TsvFileDictionary(event_files_bids, name_indices=name_indices)
event_files_eeg = get_file_list(bids_root_path, extensions=[".tsv"], name_suffix="_events_temp",
                                exclude_dirs=exclude_dirs)
eeg_dict = TsvFileDictionary(event_files_eeg, name_indices=name_indices)

In [2]:
print(f"Summarizing {bids_root_path}...")
bids_dict.print_files(title="\nBIDS style event files")
eeg_dict.print_files(title="\nEEG.event style event files")

Summarizing G:\AuditoryOddball\AuditoryOddballWorking...

BIDS style event files (39 files)
sub-001_run-1: sub-001_task-P300_run-1_events.tsv
sub-001_run-2: sub-001_task-P300_run-2_events.tsv
sub-001_run-3: sub-001_task-P300_run-3_events.tsv
sub-002_run-1: sub-002_task-P300_run-1_events.tsv
sub-002_run-2: sub-002_task-P300_run-2_events.tsv
sub-002_run-3: sub-002_task-P300_run-3_events.tsv
sub-003_run-1: sub-003_task-P300_run-1_events.tsv
sub-003_run-2: sub-003_task-P300_run-2_events.tsv
sub-003_run-3: sub-003_task-P300_run-3_events.tsv
sub-004_run-1: sub-004_task-P300_run-1_events.tsv
sub-004_run-2: sub-004_task-P300_run-2_events.tsv
sub-004_run-3: sub-004_task-P300_run-3_events.tsv
sub-005_run-1: sub-005_task-P300_run-1_events.tsv
sub-005_run-2: sub-005_task-P300_run-2_events.tsv
sub-005_run-3: sub-005_task-P300_run-3_events.tsv
sub-006_run-1: sub-006_task-P300_run-1_events.tsv
sub-006_run-2: sub-006_task-P300_run-2_events.tsv
sub-006_run-3: sub-006_task-P300_run-3_events.tsv
sub-007_

In [3]:
key_diff = bids_dict.key_diffs(eeg_dict)
print(f"Key differences between EEG and BIDS events: {str(key_diff)}")

Key differences between EEG and BIDS events: []
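
An empty list here means that both dictionaries contain exactly the same keys. As a rough sketch of that comparison, assuming it reports the keys present in one dictionary but not the other, a symmetric set difference can be used. The function name `key_differences` and the sample keys below are hypothetical and do not come from the dataset or from `hed.tools`.

```python
def key_differences(keys_a, keys_b):
    """Return the sorted keys that appear in exactly one of the two collections."""
    return sorted(set(keys_a).symmetric_difference(keys_b))


# Hypothetical keys chosen so that each side has one unmatched entry
bids_keys = ["sub-001_run-1", "sub-001_run-2", "sub-002_run-1"]
eeg_keys = ["sub-001_run-1", "sub-002_run-1", "sub-002_run-2"]

print(key_differences(bids_keys, eeg_keys))
# ['sub-001_run-2', 'sub-002_run-2']
```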


In [4]:
# List the event count and column names of each file so they can be compared by eye
print("\nBIDS style event file columns:")
for key, file, rowcount, columns in bids_dict.iter_event_info():
    print(f"{key} [{rowcount} events]: {columns}")

print("\nEEG.event style event file columns:")
for key, file, rowcount, columns in eeg_dict.iter_event_info():
    print(f"{key} [{rowcount} events]: {columns}")


BIDS style event file columns:
sub-001_run-1 [863 events]: ['onset', 'duration', 'sample', 'trial_type', 'response_time', 'stim_file', 'value']
sub-001_run-2 [862 events]: ['onset', 'duration', 'sample', 'trial_type', 'response_time', 'stim_file', 'value']
sub-001_run-3 [860 events]: ['onset', 'duration', 'sample', 'trial_type', 'response_time', 'stim_file', 'value']
sub-002_run-1 [812 events]: ['onset', 'duration', 'sample', 'trial_type', 'response_time', 'stim_file', 'value']
sub-002_run-2 [807 events]: ['onset', 'duration', 'sample', 'trial_type', 'response_time', 'stim_file', 'value']
sub-002_run-3 [799 events]: ['onset', 'duration', 'sample', 'trial_type', 'response_time', 'stim_file', 'value']
sub-003_run-1 [861 events]: ['onset', 'duration', 'sample', 'trial_type', 'response_time', 'stim_file', 'value']
sub-003_run-2 [860 events]: ['onset', 'duration', 'sample', 'trial_type', 'response_time', 'stim_file', 'value']
sub-003_run-3 [861 events]: ['onset', 'duration', 'sample', 'tri

In [5]:
count_diffs = bids_dict.event_count_diffs(eeg_dict)
if count_diffs:
    print("The number of BIDS events and EEG.event events differ for the following:")
    for item in count_diffs:
        # Each item is assumed to be a (key, BIDS count, EEG count) tuple
        print(f"{item[0]}: {item[1]} BIDS events and {item[2]} EEG.events")
else:
    print("The BIDS event files and EEG.event structures have the same number of events")

The BIDS event files and EEG.event structures have the same number of events
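
The idea behind the count comparison can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the `hed.tools` implementation: the helpers `count_events` and `count_differences` and the inline sample data are hypothetical, and one event is assumed to correspond to one non-empty data row.

```python
import csv
import io


def count_events(tsv_text):
    """Count the data rows (excluding the header) in tab-separated text."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    next(reader, None)  # skip the header row
    return sum(1 for row in reader if row)


def count_differences(counts_a, counts_b):
    """Return (key, count_a, count_b) for shared keys whose counts differ."""
    shared_keys = sorted(counts_a.keys() & counts_b.keys())
    return [(key, counts_a[key], counts_b[key])
            for key in shared_keys if counts_a[key] != counts_b[key]]


# Hypothetical file contents: two BIDS events but only one EEG.event entry
bids_text = "onset\tduration\ttrial_type\n0.5\t0\tstimulus\n1.2\t0\tresponse\n"
eeg_text = "latency\tduration\ttype\n128\t0\tstandard\n"

bids_counts = {"sub-001_run-1": count_events(bids_text)}
eeg_counts = {"sub-001_run-1": count_events(eeg_text)}

for key, n_bids, n_eeg in count_differences(bids_counts, eeg_counts):
    print(f"{key}: {n_bids} BIDS events and {n_eeg} EEG.events")
```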


In [6]:
from hed.tools import EventValueSummary

# Summarize the values in each column across all files, ignoring the skipped columns
print('\nBIDS events summary:')
bids_dicts_all, bids_dicts = EventValueSummary.make_combined_dicts(bids_dict, skip_cols=bids_skip_columns)
bids_dicts_all.print()

print('\nEEG.event events summary:')
eeg_dicts_all, eeg_dicts = EventValueSummary.make_combined_dicts(eeg_dict, skip_cols=eeg_skip_columns)
eeg_dicts_all.print()


BIDS events summary:
Summary for column dictionary :
  Categorical columns (3):
    stim_file (1 distinct values):
      n/a: 32364
    trial_type (4 distinct values):
      STATUS: 2
      n/a: 139
      response: 3771
      stimulus: 28452
    value (9 distinct values):
      condition 5: 2
      ignore: 139
      noise: 4243
      noise_with_reponse: 21
      oddball: 586
      oddball_with_reponse: 3667
      response: 3771
      standard: 19855
      standard_with_reponse: 80
  Value columns (0):

EEG.event events summary:
Summary for column dictionary :
  Categorical columns (3):
    stim_file (1 distinct values):
      NaN: 32364
    trial_type (4 distinct values):
      STATUS: 2
      n/a: 139
      response: 3771
      stimulus: 28452
    type (9 distinct values):
      condition 5: 2
      ignore: 139
      noise: 4243
      noise_with_reponse: 21
      oddball: 586
      oddball_with_reponse: 3667
      response: 3771
      standard: 19855
      standard_with_reponse: 80
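
The per-column value counts shown above can be approximated with `collections.Counter`. The sketch below is an illustration only, not the `EventValueSummary` implementation: the helper `summarize_columns` and the inline sample rows are hypothetical, and every column not listed in `skip_cols` is treated as categorical.

```python
import csv
import io
from collections import Counter


def summarize_columns(tsv_text, skip_cols=()):
    """Count occurrences of each value in every column not listed in skip_cols."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    summary = {}
    for row in reader:
        for column, value in row.items():
            if column not in skip_cols:
                summary.setdefault(column, Counter())[value] += 1
    return summary


# Hypothetical events: two stimuli followed by one response
sample = (
    "onset\tduration\ttrial_type\tvalue\n"
    "0.5\t0\tstimulus\tstandard\n"
    "1.1\t0\tstimulus\toddball\n"
    "1.4\t0\tresponse\tresponse\n"
)

summary = summarize_columns(sample, skip_cols=["onset", "duration"])
for column in sorted(summary):
    print(f"{column} ({len(summary[column])} distinct values):")
    for value, count in sorted(summary[column].items()):
        print(f"  {value}: {count}")
```

Skipping columns such as `onset` keeps the summary focused on columns with a small number of repeated values, which are the ones worth comparing between the two sets of files.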
 