## Initial summary of event files

**Dataset**: BCIT Baseline Driving (in progress)

This script produces a preliminary summary of the contents of the event files.
Among other things, it prints the column names of each event file so that
they can be checked manually for differences.

The script assumes that the data is in BIDS format and that each BIDS event
file ending in `_events.tsv` has a corresponding file ending in
`_events_temp.tsv` that was previously exported from the `EEG.event`
structure of the corresponding `EEG.set` file.

The event files are stored in dictionaries keyed by a subset of the BIDS
entities in each file name. The `entities` tuple lists the BIDS entity names
to include in the key.
BIDS base file names consist of entity *name*-*value* pairs separated
by underscores and followed by a final *_suffix*.

For a file name `sub-001_ses-3_task-target_run-01_events.tsv`,
the tuple ('sub', 'task') gives a key of `sub-001_task-target`,
while the tuple ('sub', 'ses', 'run') gives a key of `sub-001_ses-3_run-01`.
The use of dictionaries of file names with such keys makes it
easier to associate related files in the BIDS naming structure.
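
To make the key rule concrete, here is a minimal sketch in plain Python. The helper `make_key` is hypothetical and only illustrates the rule described above; it is not the code that `BidsTsvDictionary` uses. It assumes that every entity piece of a file name contains a hyphen (so suffix pieces such as `events` or `temp` are ignored) and it silently skips requested entities that the file name does not contain.

```python
def make_key(filename, entities):
    """Build a dictionary key from selected entity name-value pairs.

    Hypothetical helper for illustration only; not part of hed.tools.
    """
    base = filename.split(".", 1)[0]  # drop the extension, e.g. ".tsv"
    # Entity pieces look like "name-value"; suffix pieces ("events", "temp") have no hyphen.
    pairs = dict(piece.split("-", 1) for piece in base.split("_") if "-" in piece)
    # Keep only the requested entities, in the order given by the tuple.
    return "_".join(f"{name}-{pairs[name]}" for name in entities if name in pairs)


name = "sub-001_ses-3_task-target_run-01_events.tsv"
print(make_key(name, ("sub", "task")))        # sub-001_task-target
print(make_key(name, ("sub", "ses", "run")))  # sub-001_ses-3_run-01
```

Because the suffix is ignored, a `_events.tsv` file and its `_events_temp.tsv` companion map to the same key, which is what lets the two dictionaries be compared key by key.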

Set the following variables for your dataset:

| Variable | Purpose |
| -------- | ------- |
| bids_root_path | Full path to root directory of dataset.|
| exclude_dirs | List of directories to exclude when constructing file lists. |
| entities  | Tuple of entity names used to construct unique keys representing file names. <br>(See [Dictionaries of filenames](https://hed-examples.readthedocs.io/en/latest/HedInPython.html#dictionaries-of-filenames-anchor) for examples of how to choose the keys.)|
| bids_skip_columns  |  List of column names in the `events.tsv` files to skip in the analysis. |
| eeg_skip_columns  | List of column names in the `events_temp.tsv` files (exported from EEG.event) to skip in the analysis.|

In [1]:
from hed.tools import BidsTsvDictionary
from hed.util import get_file_list

# Variables to set for the specific dataset
bids_root_path = 'F:/ARLBidsStart/BaselineDrivingWorking'
exclude_dirs = ['sourcedata', 'stimuli', 'code']
entities = ('sub', 'ses', 'run')
bids_skip_columns = ['onset']
eeg_skip_columns = ['latency', 'urevent', 'usertags']

# Construct the event file dictionaries for the BIDS and for EEG.event files
files_bids = get_file_list(bids_root_path, extensions=[".tsv"], name_suffix="_events", exclude_dirs=exclude_dirs)
bids_dict = BidsTsvDictionary(files_bids, entities=entities)
files_eeg = get_file_list(bids_root_path, extensions=[".tsv"], name_suffix="_events_temp", exclude_dirs=exclude_dirs)
eeg_dict = BidsTsvDictionary(files_eeg, entities=entities)
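
The two `get_file_list` calls above differ only in `name_suffix`. The sketch below uses a hypothetical stand-in, `find_files`, built on `os.walk`; it assumes the suffix is matched against the file name with its extension removed. Under that assumption `_events` does not pick up the `_events_temp.tsv` files, so the two lists stay disjoint. The real `get_file_list` may apply different matching rules.

```python
import os
import tempfile


def find_files(root, extensions, name_suffix, exclude_dirs=()):
    """Return sorted paths under root whose stem ends with name_suffix.

    Hypothetical stand-in for get_file_list, for illustration only.
    """
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d not in exclude_dirs]
        for filename in filenames:
            stem, ext = os.path.splitext(filename)
            if ext in extensions and stem.endswith(name_suffix):
                matches.append(os.path.join(dirpath, filename))
    return sorted(matches)


# Usage on a throwaway directory tree (file names are made up for illustration).
with tempfile.TemporaryDirectory() as root:
    for rel in ("sub-01/eeg/sub-01_task-x_events.tsv",
                "sub-01/eeg/sub-01_task-x_events_temp.tsv",
                "code/sub-01_task-x_events.tsv"):
        path = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()
    bids_names = [os.path.basename(p) for p in find_files(root, [".tsv"], "_events", ["code"])]
    temp_names = [os.path.basename(p) for p in find_files(root, [".tsv"], "_events_temp", ["code"])]

print(bids_names)  # ['sub-01_task-x_events.tsv']
print(temp_names)  # ['sub-01_task-x_events_temp.tsv']
```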

In [2]:
print(f"Summarizing {bids_root_path}...")
bids_dict.print_files(title="\nBIDS style event files")
eeg_dict.print_files(title="\nEEG.event style event files")

Summarizing F:/ARLBidsStart/BaselineDrivingWorking...

BIDS style event files (131 files)
sub-01_ses-01_run-1: sub-01_ses-01_task-DriveWithSpeedChange_run-1_events.tsv
sub-02_ses-01_run-1: sub-02_ses-01_task-DriveWithSpeedChange_run-1_events.tsv
sub-03_ses-01_run-1: sub-03_ses-01_task-DriveWithSpeedChange_run-1_events.tsv
sub-04_ses-01_run-1: sub-04_ses-01_task-DriveWithSpeedChange_run-1_events.tsv
sub-05_ses-01_run-1: sub-05_ses-01_task-DriveWithSpeedChange_run-1_events.tsv
sub-06_ses-01_run-1: sub-06_ses-01_task-DriveWithSpeedChange_run-1_events.tsv
sub-07_ses-01_run-1: sub-07_ses-01_task-DriveWithSpeedChange_run-1_events.tsv
sub-08_ses-01_run-1: sub-08_ses-01_task-DriveWithSpeedChange_run-1_events.tsv
sub-09_ses-01_run-1: sub-09_ses-01_task-DriveWithSpeedChange_run-1_events.tsv
sub-10_ses-01_run-1: sub-10_ses-01_task-DriveWithSpeedChange_run-1_events.tsv
sub-100_ses-01_run-1: sub-100_ses-01_task-DriveWithSpeedChange_run-1_events.tsv
sub-101_ses-01_run-1: sub-101_ses-01_task-DriveWit

In [3]:
key_diff = bids_dict.key_diffs(eeg_dict)
print(f"Key differences between EEG and BIDS events: {str(key_diff)}")

Key differences between EEG and BIDS events: []
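
An empty list presumably means that every BIDS event file has an EEG.event counterpart with the same key, and vice versa. Conceptually this is a symmetric difference of the two key sets, as in the hypothetical sketch below; the exact return format of `key_diffs` may differ, and the file names in the example are made up.

```python
def key_differences(dict_a, dict_b):
    """Return the keys that appear in exactly one of the two dictionaries, sorted.

    Hypothetical illustration of the check; not the hed.tools implementation.
    """
    return sorted(set(dict_a) ^ set(dict_b))


bids_files = {"sub-01_ses-01_run-1": "a_events.tsv", "sub-02_ses-01_run-1": "b_events.tsv"}
eeg_files = {"sub-01_ses-01_run-1": "a_events_temp.tsv"}

print(key_differences(bids_files, eeg_files))   # ['sub-02_ses-01_run-1']
print(key_differences(bids_files, bids_files))  # []
```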


In [4]:
print("\nBIDS style event file columns:")
for key, file, rowcount, columns in bids_dict.iter_tsv_info():
    print(f"{key} [{rowcount} events]: {columns}")

print("\nEEG.event style event file columns:")
for key, file, rowcount, columns in eeg_dict.iter_tsv_info():
    print(f"{key} [{rowcount} events]: {columns}")


BIDS style event file columns:
sub-01_ses-01_run-1 [1282 events]: ['onset', 'duration', 'event_code']
sub-02_ses-01_run-1 [1227 events]: ['onset', 'duration', 'event_code']
sub-03_ses-01_run-1 [1189 events]: ['onset', 'duration', 'event_code']
sub-04_ses-01_run-1 [1191 events]: ['onset', 'duration', 'event_code']
sub-05_ses-01_run-1 [1106 events]: ['onset', 'duration', 'event_code']
sub-06_ses-01_run-1 [1134 events]: ['onset', 'duration', 'event_code']
sub-07_ses-01_run-1 [971 events]: ['onset', 'duration', 'event_code']
sub-08_ses-01_run-1 [1127 events]: ['onset', 'duration', 'event_code']
sub-09_ses-01_run-1 [1400 events]: ['onset', 'duration', 'event_code']
sub-10_ses-01_run-1 [1367 events]: ['onset', 'duration', 'event_code']
sub-100_ses-01_run-1 [1611 events]: ['onset', 'duration', 'event_code']
sub-101_ses-01_run-1 [1516 events]: ['onset', 'duration', 'event_code']
sub-102_ses-01_run-1 [1532 events]: ['onset', 'duration', 'event_code']
sub-102_ses-02_run-1 [1550 events]: ['onset
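
Checking these column lists by eye gets tedious for 131 files. The sketch below (the helpers `tsv_info` and `group_keys_by_columns` are hypothetical) computes the same two pieces of information that `iter_tsv_info` reports for a single file, assuming the first line is a header and every later line is one event. It then groups keys by their column tuple, so any file with unusual columns shows up as a separate group. For real files, open each one with `open(path, newline="")` and pass the handle in.

```python
import csv
import io
from collections import defaultdict


def tsv_info(handle):
    """Return (row_count, column_names) for a tab-separated file object."""
    reader = csv.reader(handle, delimiter="\t")
    columns = next(reader, [])           # first line is the header
    row_count = sum(1 for _ in reader)   # each remaining line is one event
    return row_count, columns


def group_keys_by_columns(columns_by_key):
    """Map each distinct column tuple to the keys of the files that have it."""
    groups = defaultdict(list)
    for key, columns in columns_by_key.items():
        groups[tuple(columns)].append(key)
    return dict(groups)


sample = io.StringIO("onset\tduration\tevent_code\n0.5\tn/a\t1111\n0.9\tn/a\t1112\n")
rows, cols = tsv_info(sample)
print(f"[{rows} events]: {cols}")  # [2 events]: ['onset', 'duration', 'event_code']

groups = group_keys_by_columns({
    "sub-01_ses-01_run-1": ["onset", "duration", "event_code"],
    "sub-02_ses-01_run-1": ["onset", "duration", "event_code"],
})
print(len(groups))  # 1, so both files share the same columns
```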

In [5]:
count_diffs = bids_dict.count_diffs(eeg_dict)
if count_diffs:
    print("The number of BIDS events and EEG.event events differ for the following:")
    for item in count_diffs:
        print(f"{item[0]}: {item[1]} BIDS events and {item[2]} EEG.events")
else:
    print("The BIDS event files and EEG.event structures have the same number of events")

The number of BIDS events and EEG.event events differ for the following:
sub-77_ses-01_run-1: 730 BIDS events and 725 EEG.events
sub-77_ses-02_run-1: 963 BIDS events and 497 EEG.events
sub-83_ses-01_run-1: 1123 BIDS events and 316 EEG.events
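
The comparison above pairs files by key and reports only the keys whose event counts disagree. A minimal sketch of that logic, using a hypothetical `count_differences` helper, is shown below. The counts come from the outputs above; the EEG.event count for `sub-01_ses-01_run-1` is taken to equal its BIDS count only because that key is not flagged.

```python
def count_differences(counts_a, counts_b):
    """List (key, count_a, count_b) for shared keys whose counts differ.

    Hypothetical illustration; not the hed.tools implementation of count_diffs.
    """
    return [(key, counts_a[key], counts_b[key])
            for key in sorted(counts_a.keys() & counts_b.keys())
            if counts_a[key] != counts_b[key]]


bids_counts = {"sub-01_ses-01_run-1": 1282, "sub-77_ses-01_run-1": 730, "sub-83_ses-01_run-1": 1123}
eeg_counts = {"sub-01_ses-01_run-1": 1282, "sub-77_ses-01_run-1": 725, "sub-83_ses-01_run-1": 316}

for key, n_bids, n_eeg in count_differences(bids_counts, eeg_counts):
    print(f"{key}: {n_bids} BIDS events and {n_eeg} EEG.events")
```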


In [6]:
from hed.tools import BidsTsvSummary

bids_sum_all, bids_sum = BidsTsvSummary.make_combined_dicts(bids_dict, skip_cols=bids_skip_columns)
bids_sum_all.print('\nBIDS events summary')

eeg_sum_all, eeg_sum = BidsTsvSummary.make_combined_dicts(eeg_dict, skip_cols=eeg_skip_columns)
eeg_sum_all.print('\nEEG.event events summary')


BIDS events summary
  Categorical columns (2):
    duration (1 distinct values):
      n/a: 168288
    event_code (28 distinct values):
      1111: 15771
      1112: 15771
      1121: 15690
      1122: 15690
      1211: 542
      1212: 538
      2611: 109
      2612: 109
      2621: 1755
      2622: 1754
      3111: 125
      3112: 127
      3200: 10192
      3310: 131
      4200: 4
      4210: 14075
      4220: 5361
      4230: 8838
      4311: 31100
      4312: 30562
      4411: 5
      4421: 5
      5221: 1
      5222: 3
      5231: 1
      5232: 7
      5241: 9
      5242: 13
  Value columns (0):

EEG.event events summary
  Categorical columns (1):
    type (28 distinct values):
      1111: 15641
      1112: 15641
      1121: 15585
      1122: 15585
      1211: 537
      1212: 534
      2611: 109
      2612: 109
      2621: 1755
      2622: 1754
      3111: 128
      3112: 125
      3200: 10119
      3310: 131
      4200: 4
      4210: 13924
      4220: 5286
      4230: 8760
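
Comparing the two summaries value by value shows small discrepancies; for example, code 1111 appears 15771 times in the BIDS files but 15641 times in the EEG.event files. The hypothetical sketch below tallies values per column with `collections.Counter`, skipping the chosen columns, and then lists the values whose counts differ. Unlike `BidsTsvSummary`, it treats every column as categorical and does not separate out value columns; the sample data is made up.

```python
import csv
import io
from collections import Counter


def summarize_columns(handles, skip_cols=()):
    """Tally value counts per column across several TSV file objects."""
    summary = {}
    for handle in handles:
        for row in csv.DictReader(handle, delimiter="\t"):
            for column, value in row.items():
                if column in skip_cols:
                    continue
                summary.setdefault(column, Counter())[value] += 1
    return summary


def value_count_diffs(counter_a, counter_b):
    """Return {value: (count_a, count_b)} for values whose counts differ."""
    # A Counter reports 0 for missing values, so values seen in only one file are included.
    return {value: (counter_a[value], counter_b[value])
            for value in sorted(counter_a.keys() | counter_b.keys())
            if counter_a[value] != counter_b[value]}


bids_files = [io.StringIO("onset\tduration\tevent_code\n1.0\tn/a\t1111\n2.0\tn/a\t1112\n2.5\tn/a\t1111\n")]
eeg_files = [io.StringIO("latency\ttype\n250\t1111\n500\t1112\n")]

bids_sum = summarize_columns(bids_files, skip_cols=["onset"])
eeg_sum = summarize_columns(eeg_files, skip_cols=["latency"])
print(bids_sum["event_code"])                                     # Counter({'1111': 2, '1112': 1})
print(value_count_diffs(bids_sum["event_code"], eeg_sum["type"]))  # {'1111': (2, 1)}
```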
     