## Initial summary of event files

**Dataset**: BCIT Auditory Cueing (in process)

This script does a preliminary summary of the contents of the events files.
The summary includes printing out the column names of each event file so
that they can be manually checked for differences.

The script assumes that the data is in BIDS format and that each BIDS events
file of the form `_events.tsv` has a corresponding events file with
suffix `_events_temp.tsv` that was previously dumped from the `EEG.set` files.

The event files are organized into dictionaries whose keys are built from the file names.
The `entities` tuple lists the BIDS entity names to include in each key.
BIDS base file names consist of entity *name*-*value* pairs separated
by underscores and followed by a final *_suffix*.

For a file name `sub-001_ses-3_task-target_run-01_events.tsv`,
the tuple `('sub', 'task')` gives a key of `sub-001_task-target`,
while the tuple `('sub', 'ses', 'run')` gives a key of `sub-001_ses-3_run-01`.
The use of dictionaries of file names with such keys makes it
easier to associate related files in the BIDS naming structure.
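
The key construction described above can be sketched in a few lines of plain Python. The helper `make_key` below is hypothetical and written only for illustration; it is not part of the HED tools, and `BidsTsvDictionary` may build its keys differently.

```python
def make_key(file_name, entities):
    """Build a key from the entity name-value pairs of a BIDS file name.

    Hypothetical helper for illustration; not part of the HED tools API.
    """
    base = file_name.split('.', 1)[0]   # drop the extension
    pieces = base.split('_')            # entity pairs plus the trailing suffix
    # Map entity name -> full "name-value" piece; the suffix has no hyphen and is skipped.
    pairs = {piece.split('-', 1)[0]: piece for piece in pieces if '-' in piece}
    # Keep only the requested entities, in the order given by the tuple.
    return '_'.join(pairs[name] for name in entities if name in pairs)


name = 'sub-001_ses-3_task-target_run-01_events.tsv'
print(make_key(name, ('sub', 'task')))        # sub-001_task-target
print(make_key(name, ('sub', 'ses', 'run')))  # sub-001_ses-3_run-01
```

Because the key depends only on the chosen entities, an `_events.tsv` file and its `_events_temp.tsv` counterpart receive the same key, which is what allows the two dictionaries to be compared entry by entry.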

Set the following variables for your dataset:

| Variable | Purpose |
| -------- | ------- |
| bids_root_path | Full path to root directory of dataset.|
| exclude_dirs | List of directories to exclude when constructing file lists. |
| entities  | Tuple of entity names used to construct a unique key representing each file name. <br>(See [Dictionaries of filenames](https://hed-examples.readthedocs.io/en/latest/HedInPython.html#dictionaries-of-filenames-anchor) for examples of how to choose the keys.)|
| bids_skip_columns  |  List of column names in the `events.tsv` files to skip in the analysis. |
| eeg_skip_columns  | List of column names in the `events_temp.tsv` files from `EEG.event` to skip in the analysis.|

In [1]:
from hed.tools import BidsTsvDictionary
from hed.util import get_file_list

# Variables to set for the specific dataset
bids_root_path = 'F:/ARLBidsStart/AuditoryCueingWorking'
exclude_dirs = ['sourcedata', 'stimuli', 'code']
entities = ('sub', 'ses', 'run')
bids_skip_columns = ['onset']
eeg_skip_columns = ['latency', 'urevent', 'usertags']

# Construct the event file dictionaries for the BIDS and for EEG.event files
files_bids = get_file_list(bids_root_path, extensions=[".tsv"], name_suffix="_events", exclude_dirs=exclude_dirs)
bids_dict = BidsTsvDictionary(files_bids, entities=entities)
files_eeg = get_file_list(bids_root_path, extensions=[".tsv"], name_suffix="_events_temp", exclude_dirs=exclude_dirs)
eeg_dict = BidsTsvDictionary(files_eeg, entities=entities)

In [2]:
print(f"Summarizing {bids_root_path}...")
bids_dict.print_files(title="\nBIDS style event files")
eeg_dict.print_files(title="\nEEG.event style event files")

Summarizing F:/ARLBidsStart/AuditoryCueingWorking...

BIDS style event files (34 files)
sub-01_ses-01_run-1: sub-01_ses-01_task-DriveRandomSound_run-1_events.tsv
sub-01_ses-01_run-2: sub-01_ses-01_task-DriveRandomSound_run-2_events.tsv
sub-02_ses-01_run-1: sub-02_ses-01_task-DriveRandomSound_run-1_events.tsv
sub-02_ses-01_run-2: sub-02_ses-01_task-DriveRandomSound_run-2_events.tsv
sub-03_ses-01_run-1: sub-03_ses-01_task-DriveRandomSound_run-1_events.tsv
sub-03_ses-01_run-2: sub-03_ses-01_task-DriveRandomSound_run-2_events.tsv
sub-04_ses-01_run-1: sub-04_ses-01_task-DriveRandomSound_run-1_events.tsv
sub-04_ses-01_run-2: sub-04_ses-01_task-DriveRandomSound_run-2_events.tsv
sub-05_ses-01_run-1: sub-05_ses-01_task-DriveRandomSound_run-1_events.tsv
sub-05_ses-01_run-2: sub-05_ses-01_task-DriveRandomSound_run-2_events.tsv
sub-06_ses-01_run-1: sub-06_ses-01_task-DriveRandomSound_run-1_events.tsv
sub-06_ses-01_run-2: sub-06_ses-01_task-DriveRandomSound_run-2_events.tsv
sub-07_ses-01_run-1: sub

In [3]:
key_diff = bids_dict.key_diffs(eeg_dict)
print(f"Key differences between EEG and BIDS events: {str(key_diff)}")

Key differences between EEG and BIDS events: []
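
An empty list here presumably means that every key appears in both dictionaries, so each BIDS events file has an `EEG.event` counterpart. Conceptually, such a check is a symmetric difference of the two key sets. The function `key_differences` below is a hypothetical stand-in written for illustration, not the implementation of `key_diffs`, and the second key list is made up to force a mismatch.

```python
def key_differences(keys_a, keys_b):
    """Return a sorted list of the keys found in only one of the two collections."""
    return sorted(set(keys_a).symmetric_difference(keys_b))


# Toy key lists; the second one is deliberately mismatched.
bids_keys = ['sub-01_ses-01_run-1', 'sub-01_ses-01_run-2']
eeg_keys = ['sub-01_ses-01_run-1', 'sub-02_ses-01_run-1']

print(key_differences(bids_keys, eeg_keys))
# ['sub-01_ses-01_run-2', 'sub-02_ses-01_run-1']
print(key_differences(bids_keys, bids_keys))
# []
```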


In [4]:
print("\nBIDS style event file columns:")
for key, file, rowcount, columns in bids_dict.iter_tsv_info():
    print(f"{key} [{rowcount} events]: {str(columns)}")

print("\nEEG.event style event file columns:")
for key, file, rowcount, columns in eeg_dict.iter_tsv_info():
    print(f"{key} [{rowcount} events]: {str(columns)}")


BIDS style event file columns:
sub-01_ses-01_run-1 [2967 events]: ['onset', 'duration', 'event_code']
sub-01_ses-01_run-2 [2967 events]: ['onset', 'duration', 'event_code']
sub-02_ses-01_run-1 [3111 events]: ['onset', 'duration', 'event_code']
sub-02_ses-01_run-2 [3111 events]: ['onset', 'duration', 'event_code']
sub-03_ses-01_run-1 [2956 events]: ['onset', 'duration', 'event_code']
sub-03_ses-01_run-2 [2956 events]: ['onset', 'duration', 'event_code']
sub-04_ses-01_run-1 [3158 events]: ['onset', 'duration', 'event_code']
sub-04_ses-01_run-2 [3158 events]: ['onset', 'duration', 'event_code']
sub-05_ses-01_run-1 [3272 events]: ['onset', 'duration', 'event_code']
sub-05_ses-01_run-2 [3272 events]: ['onset', 'duration', 'event_code']
sub-06_ses-01_run-1 [2963 events]: ['onset', 'duration', 'event_code']
sub-06_ses-01_run-2 [2963 events]: ['onset', 'duration', 'event_code']
sub-07_ses-01_run-1 [2989 events]: ['onset', 'duration', 'event_code']
sub-07_ses-01_run-2 [2989 events]: ['onset', 

In [5]:
count_diffs = bids_dict.count_diffs(eeg_dict)
if count_diffs:
    print("The number of BIDS events and EEG.event events differ for the following:")
    for item in count_diffs:
        print(f"{item[0]}: {item[1]} BIDS events and {item[2]} EEG.events")
else:
    print("The BIDS event files and EEG.event structures have the same number of events")

The number of BIDS events and EEG.event events differ for the following:
sub-01_ses-01_run-1: 2967 BIDS events and 2960 EEG.events
sub-02_ses-01_run-1: 3111 BIDS events and 2851 EEG.events
sub-03_ses-01_run-1: 2956 BIDS events and 3052 EEG.events
sub-04_ses-01_run-1: 3158 BIDS events and 3145 EEG.events
sub-05_ses-01_run-1: 3272 BIDS events and 3069 EEG.events
sub-06_ses-01_run-1: 2963 BIDS events and 2916 EEG.events
sub-07_ses-01_run-1: 2989 BIDS events and 3092 EEG.events
sub-08_ses-01_run-1: 2801 BIDS events and 2630 EEG.events
sub-09_ses-01_run-1: 2805 BIDS events and 2889 EEG.events
sub-10_ses-01_run-1: 2907 BIDS events and 2799 EEG.events
sub-11_ses-01_run-1: 3018 BIDS events and 3109 EEG.events
sub-12_ses-01_run-1: 2837 BIDS events and 2966 EEG.events
sub-13_ses-01_run-1: 3258 BIDS events and 3185 EEG.events
sub-14_ses-01_run-1: 3130 BIDS events and 3129 EEG.events
sub-15_ses-01_run-1: 3180 BIDS events and 3100 EEG.events
sub-16_ses-01_run-1: 2985 BIDS events and 3071 EEG.events
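
The comparison reported above amounts to looking up the row count for each shared key and listing the keys where the two counts disagree. The sketch below illustrates that idea with plain dictionaries. The function `count_differences` is hypothetical and is not the implementation of `count_diffs`; the first two entries reuse counts from the output above, and the third is a made-up entry with equal counts.

```python
def count_differences(counts_a, counts_b):
    """Return [key, count_a, count_b] for each shared key whose counts differ."""
    diffs = []
    for key in sorted(counts_a.keys() & counts_b.keys()):
        if counts_a[key] != counts_b[key]:
            diffs.append([key, counts_a[key], counts_b[key]])
    return diffs


bids_counts = {'sub-01_ses-01_run-1': 2967,
               'sub-14_ses-01_run-1': 3130,
               'example-equal': 100}    # made-up entry with matching counts
eeg_counts = {'sub-01_ses-01_run-1': 2960,
              'sub-14_ses-01_run-1': 3129,
              'example-equal': 100}

for key, n_bids, n_eeg in count_differences(bids_counts, eeg_counts):
    # The signed difference shows which file has more events.
    print(f"{key}: {n_bids} BIDS events and {n_eeg} EEG.events (difference {n_bids - n_eeg})")
# sub-01_ses-01_run-1: 2967 BIDS events and 2960 EEG.events (difference 7)
# sub-14_ses-01_run-1: 3130 BIDS events and 3129 EEG.events (difference 1)
```

Printing the signed difference makes it easier to see at a glance whether the BIDS file or the `EEG.event` dump has the extra events for a given recording.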

In [6]:
from hed.tools import BidsTsvSummary

bids_sum_all, bids_sum = BidsTsvSummary.make_combined_dicts(bids_dict, skip_cols=bids_skip_columns)
bids_sum_all.print('\nBIDS events summary')

eeg_sum_all, eeg_sum = BidsTsvSummary.make_combined_dicts(eeg_dict, skip_cols=eeg_skip_columns)
eeg_sum_all.print('\nEEG.event events summary')


BIDS events summary
  Categorical columns (2):
    duration (1 distinct values):
      n/a: 102524
    event_code (25 distinct values):
      1111: 2664
      1112: 2664
      1121: 2726
      1122: 2726
      1131: 3066
      1132: 3066
      1141: 3070
      1142: 3070
      1211: 34
      1212: 34
      1311: 5070
      1312: 5070
      1321: 6126
      1322: 6126
      2621: 2268
      2622: 2266
      3111: 34
      3112: 34
      3200: 2364
      3310: 34
      4210: 15220
      4220: 6866
      4230: 8370
      4311: 11246
      4312: 8310
  Value columns (0):

EEG.event events summary
  Categorical columns (1):
    type (26 distinct values):
      1111: 2938
      1112: 2938
      1121: 2883
      1122: 2883
      1131: 2904
      1132: 2904
      1141: 2894
      1142: 2894
      1211: 34
      1212: 34
      1311: 5193
      1312: 5193
      1321: 5796
      1322: 5796
      2621: 2239
      2622: 2238
      3111: 34
      3112: 34
      3200: 2342
      3310: 34
      4210: