## Initial summary of event files

**Dataset**: BCIT Mind Wandering (in process)

This script produces a preliminary summary of the contents of the events files.
The summary prints the column names of each events file so
that they can be checked manually for differences.

The script assumes that the data is in BIDS format and that each BIDS events
file of the form `_events.tsv` has a corresponding events file with
suffix `_events_temp.tsv` that was previously dumped from the `EEG.set` files.

Each events file is stored in a dictionary under a key built from its BIDS entities.
The `entities` tuple lists the BIDS entity names to include in the key.
BIDS base file names consist of entity *name*-*value* pairs separated
by underscores and followed by an ending *_suffix*.

For a file name `sub-001_ses-3_task-target_run-01_events.tsv`,
the tuple `('sub', 'task')` gives a key of `sub-001_task-target`,
while the tuple `('sub', 'ses', 'run')` gives a key of `sub-001_ses-3_run-01`.
The use of dictionaries of file names with such keys makes it
easier to associate related files in the BIDS naming structure.
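
The key construction described above can be sketched in a few lines of Python. This is only an illustration, not the HED tools implementation: `make_key` is a hypothetical helper, and it assumes that every entity appears as a `name-value` piece separated by underscores.

```python
import os


def make_key(file_path, entities):
    """Build a key from the requested BIDS entities (hypothetical helper)."""
    base = os.path.basename(file_path)
    name = base.split('.', 1)[0]  # drop the extension
    # Pieces without a hyphen (such as the trailing suffix) are ignored
    pairs = dict(piece.split('-', 1) for piece in name.split('_') if '-' in piece)
    # Keep the order given by the entities tuple, skipping entities that are absent
    return '_'.join(f"{entity}-{pairs[entity]}" for entity in entities if entity in pairs)


file_name = 'sub-001_ses-3_task-target_run-01_events.tsv'
print(make_key(file_name, ('sub', 'task')))        # sub-001_task-target
print(make_key(file_name, ('sub', 'ses', 'run')))  # sub-001_ses-3_run-01
```

Because the key depends only on the selected entities, an `_events.tsv` file and its `_events_temp.tsv` counterpart map to the same key, which is what allows the two dictionaries to be compared entry by entry.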

Before running the script, set the following variables for your dataset:

| Variable | Purpose |
| -------- | ------- |
| bids_root_path | Full path to root directory of dataset.|
| exclude_dirs | List of directories to exclude when constructing file lists. |
| entities  | Tuple of entity names used to construct unique keys representing filenames. <br>(See [Dictionaries of filenames](https://hed-examples.readthedocs.io/en/latest/HedInPython.html#dictionaries-of-filenames-anchor) for examples of how to choose the keys.)|
| bids_skip_columns  |  List of column names in the `events.tsv` files to skip in the analysis. |
| eeg_skip_columns  | List of column names in the `events_temp.tsv` files (dumped from `EEG.event`) to skip in the analysis.|
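
To make the roles of `bids_root_path` and `exclude_dirs` concrete, the following sketch shows one way a list of events files could be gathered with the standard library. It is not the HED `get_file_list` implementation: `list_event_files` and its parameters are hypothetical names chosen for illustration.

```python
import os


def list_event_files(root, name_suffix, extension=".tsv", exclude_dirs=()):
    """Return sorted paths under root whose base name ends with name_suffix (hypothetical helper)."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk does not descend into them
        dirnames[:] = [d for d in dirnames if d not in exclude_dirs]
        for filename in filenames:
            stem, ext = os.path.splitext(filename)
            if ext == extension and stem.endswith(name_suffix):
                found.append(os.path.join(dirpath, filename))
    return sorted(found)
```

In this sketch the suffix is matched against the file name without its extension, so a suffix of `_events` selects only the BIDS events files, while `_events_temp` selects only the files dumped from the `EEG.set` files.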

In [1]:
from hed.tools import BidsTsvDictionary
from hed.util import get_file_list

# Variables to set for the specific dataset
bids_root_path = 'F:/ARLBidsStart/MindWanderingWorking'
exclude_dirs = ['sourcedata', 'stimuli', 'code']
entities = ('sub', 'ses', 'run')
bids_skip_columns = ['onset']
eeg_skip_columns = ['latency', 'urevent', 'usertags']

# Construct the event file dictionaries for the BIDS and for EEG.event files
files_bids = get_file_list(bids_root_path, extensions=[".tsv"], name_suffix="_events", exclude_dirs=exclude_dirs)
bids_dict = BidsTsvDictionary(files_bids, entities=entities)
files_eeg = get_file_list(bids_root_path, extensions=[".tsv"], name_suffix="_events_temp", exclude_dirs=exclude_dirs)
eeg_dict = BidsTsvDictionary(files_eeg, entities=entities)

In [2]:
print(f"Summarizing {bids_root_path}...")
bids_dict.print_files(title="\nBIDS style event files")
eeg_dict.print_files(title="\nEEG.event style event files")

Summarizing F:/ARLBidsStart/MindWanderingWorking...

BIDS style event files (60 files)
sub-01_ses-01_run-1: sub-01_ses-01_task-DriveWithTaskAudio_run-1_events.tsv
sub-01_ses-01_run-2: sub-01_ses-01_task-DriveWithTaskAudio_run-2_events.tsv
sub-01_ses-01_run-3: sub-01_ses-01_task-DriveWithTaskAudio_run-3_events.tsv
sub-02_ses-01_run-1: sub-02_ses-01_task-DriveWithTaskAudio_run-1_events.tsv
sub-02_ses-01_run-2: sub-02_ses-01_task-DriveWithTaskAudio_run-2_events.tsv
sub-02_ses-01_run-3: sub-02_ses-01_task-DriveWithTaskAudio_run-3_events.tsv
sub-03_ses-01_run-1: sub-03_ses-01_task-DriveWithTaskAudio_run-1_events.tsv
sub-03_ses-01_run-2: sub-03_ses-01_task-DriveWithTaskAudio_run-2_events.tsv
sub-03_ses-01_run-3: sub-03_ses-01_task-DriveWithTaskAudio_run-3_events.tsv
sub-04_ses-01_run-1: sub-04_ses-01_task-DriveWithTaskAudio_run-1_events.tsv
sub-04_ses-01_run-2: sub-04_ses-01_task-DriveWithTaskAudio_run-2_events.tsv
sub-04_ses-01_run-3: sub-04_ses-01_task-DriveWithTaskAudio_run-3_events.tsv
...

In [3]:
# Check that the BIDS and EEG.event dictionaries contain the same keys
key_diff = bids_dict.key_diffs(eeg_dict)
print(f"Key differences between EEG and BIDS events: {key_diff}")

Key differences between EEG and BIDS events: []
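
An empty list means that every BIDS events file has an `EEG.event` counterpart and vice versa. As a rough sketch of such a check, assuming that the comparison reports keys present in exactly one of the two dictionaries, a set symmetric difference is enough. The helper `key_differences` and the file names below are hypothetical.

```python
def key_differences(first, second):
    """Return the sorted keys that appear in exactly one of the two dictionaries."""
    return sorted(set(first).symmetric_difference(second))


# Illustrative dictionaries mapping keys to file names (not taken from the dataset)
bids_files = {'sub-01_ses-01_run-1': 'a_events.tsv', 'sub-01_ses-01_run-2': 'b_events.tsv'}
eeg_files = {'sub-01_ses-01_run-1': 'a_events_temp.tsv'}

print(key_differences(bids_files, eeg_files))   # ['sub-01_ses-01_run-2']
print(key_differences(bids_files, bids_files))  # []
```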


In [4]:
# List the event count and column names of each file for manual comparison
print("\nBIDS style event file columns:")
for key, file, rowcount, columns in bids_dict.iter_tsv_info():
    print(f"{key} [{rowcount} events]: {columns}")

print("\nEEG.event style event file columns:")
for key, file, rowcount, columns in eeg_dict.iter_tsv_info():
    print(f"{key} [{rowcount} events]: {columns}")


BIDS style event file columns:
sub-01_ses-01_run-1 [1420 events]: ['onset', 'duration', 'event_code']
sub-01_ses-01_run-2 [1420 events]: ['onset', 'duration', 'event_code']
sub-01_ses-01_run-3 [1420 events]: ['onset', 'duration', 'event_code']
sub-02_ses-01_run-1 [1466 events]: ['onset', 'duration', 'event_code']
sub-02_ses-01_run-2 [1466 events]: ['onset', 'duration', 'event_code']
sub-02_ses-01_run-3 [1466 events]: ['onset', 'duration', 'event_code']
sub-03_ses-01_run-1 [1383 events]: ['onset', 'duration', 'event_code']
sub-03_ses-01_run-2 [1383 events]: ['onset', 'duration', 'event_code']
sub-03_ses-01_run-3 [1383 events]: ['onset', 'duration', 'event_code']
sub-04_ses-01_run-1 [1308 events]: ['onset', 'duration', 'event_code']
sub-04_ses-01_run-2 [1308 events]: ['onset', 'duration', 'event_code']
sub-04_ses-01_run-3 [1308 events]: ['onset', 'duration', 'event_code']
sub-05_ses-01_run-1 [1355 events]: ['onset', 'duration', 'event_code']
sub-05_ses-01_run-2 [1355 events]: ['onset', 

In [5]:
count_diffs = bids_dict.count_diffs(eeg_dict)
if count_diffs:
    print("The number of BIDS events and EEG.event events differ for the following:")
    for item in count_diffs:
        print(f"{item[0]}: {item[1]} BIDS events and {item[2]} EEG.events")
else:
    print("The BIDS event files and EEG.event structures have the same number of events")

The number of BIDS events and EEG.event events differ for the following:
sub-01_ses-01_run-1: 1420 BIDS events and 1458 EEG.events
sub-01_ses-01_run-2: 1420 BIDS events and 1499 EEG.events
sub-02_ses-01_run-1: 1466 BIDS events and 1472 EEG.events
sub-02_ses-01_run-2: 1466 BIDS events and 1438 EEG.events
sub-03_ses-01_run-1: 1383 BIDS events and 1385 EEG.events
sub-03_ses-01_run-2: 1383 BIDS events and 1345 EEG.events
sub-04_ses-01_run-1: 1308 BIDS events and 1243 EEG.events
sub-04_ses-01_run-2: 1308 BIDS events and 1305 EEG.events
sub-05_ses-01_run-1: 1355 BIDS events and 1491 EEG.events
sub-05_ses-01_run-2: 1355 BIDS events and 1345 EEG.events
sub-06_ses-01_run-1: 1364 BIDS events and 1360 EEG.events
sub-06_ses-01_run-2: 1364 BIDS events and 1362 EEG.events
sub-07_ses-01_run-1: 1384 BIDS events and 1469 EEG.events
sub-07_ses-01_run-2: 1384 BIDS events and 1386 EEG.events
sub-08_ses-01_run-1: 930 BIDS events and 1557 EEG.events
sub-09_ses-01_run-1: 1387 BIDS events and 1319 EEG.events
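
The comparison behind this report can be sketched as follows, assuming the row counts have already been collected into dictionaries keyed in the same way as the file dictionaries. `count_differences` is a hypothetical helper rather than the HED method, and the second key is an illustrative placeholder.

```python
def count_differences(first_counts, second_counts):
    """Return (key, first, second) tuples for shared keys whose counts differ."""
    shared = sorted(set(first_counts) & set(second_counts))
    return [(key, first_counts[key], second_counts[key])
            for key in shared if first_counts[key] != second_counts[key]]


# The first entry uses counts from the output above; the second is illustrative only
bids_counts = {'sub-01_ses-01_run-1': 1420, 'example-matching-key': 10}
eeg_counts = {'sub-01_ses-01_run-1': 1458, 'example-matching-key': 10}

for key, n_bids, n_eeg in count_differences(bids_counts, eeg_counts):
    print(f"{key}: {n_bids} BIDS events and {n_eeg} EEG.events")
```

Only keys present in both dictionaries are compared here; keys missing from one side are a separate problem that the key comparison in the earlier cell is meant to catch.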


In [6]:
from hed.tools import BidsTsvSummary

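# Summarize the column values of the BIDS events files, skipping bids_skip_columns
bids_sum_all, bids_sum = BidsTsvSummary.make_combined_dicts(bids_dict, skip_cols=bids_skip_columns)
bids_sum_all.print('\nBIDS events summary')

# Summarize the column values of the EEG.event files, skipping eeg_skip_columns
eeg_sum_all, eeg_sum = BidsTsvSummary.make_combined_dicts(eeg_dict, skip_cols=eeg_skip_columns)
eeg_sum_all.print('\nEEG.event events summary')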


BIDS events summary
  Categorical columns (2):
    duration (1 distinct values):
      n/a: 80792
    event_code (43 distinct values):
      1111: 4836
      1112: 4836
      1121: 5073
      1122: 5073
      1211: 63
      1212: 63
      1331: 23
      1332: 23
      1341: 15
      1342: 15
      1351: 22
      1352: 22
      2221: 673
      2222: 673
      2241: 621
      2242: 621
      2251: 606
      2252: 606
      2621: 349
      2622: 349
      2631: 295
      2632: 295
      2811: 1138
      2812: 1138
      3111: 60
      3112: 60
      3200: 2547
      3310: 60
      4210: 13428
      4220: 6207
      4230: 7251
      4311: 9640
      4312: 8584
      4411: 1852
      4421: 117
      4611: 6
      4612: 6
      4621: 1182
      4622: 1182
      4710: 33
      4720: 65
      4730: 1064
      4740: 20
  Value columns (0):

EEG.event events summary
  Categorical columns (1):
    type (43 distinct values):
      1111: 4902
      1112: 4902
      1121: 5093
      1122: 5093
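
A summary of this kind amounts to counting how often each distinct value occurs in each column that is not skipped. The sketch below shows the idea with the standard library only. It is not the `BidsTsvSummary` implementation: `summarize_columns` is a hypothetical helper and the three-row sample is made up for illustration.

```python
import csv
import io
from collections import Counter, defaultdict


def summarize_columns(tsv_text, skip_cols=()):
    """Count the distinct values in each column of a TSV string (hypothetical helper)."""
    counts = defaultdict(Counter)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter='\t')
    for row in reader:
        for column, value in row.items():
            if column not in skip_cols:
                counts[column][value] += 1
    return counts


# Made-up sample with the same columns as the BIDS events files above
sample = "onset\tduration\tevent_code\n0.5\tn/a\t1111\n1.0\tn/a\t1112\n1.5\tn/a\t1111\n"
summary = summarize_columns(sample, skip_cols=['onset'])
for column, counter in summary.items():
    print(f"{column} ({len(counter)} distinct values):")
    for value, count in sorted(counter.items()):
        print(f"  {value}: {count}")
```

Skipping columns such as `onset` or `latency` keeps the summary readable, since nearly every row has a different value in those columns.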
    