## Code summary of the Attention Shift dataset

This script produces a preliminary summary of the contents of the event files.
It prints the column names of each event file so that they can be manually
checked for differences, and then counts the values in each categorical column.
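
The manual check of column names could also be partly automated by grouping files that share the same header. The following is a minimal sketch using only the Python standard library, not the HED tools: `read_columns` and `group_by_columns` are hypothetical helpers, and the three small files written to a temporary directory are invented for illustration.

```python
import csv
import tempfile
from collections import defaultdict
from pathlib import Path


def read_columns(path):
    """Return the column names from the header row of a tab-separated file."""
    with open(path, newline="", encoding="utf-8") as stream:
        return tuple(next(csv.reader(stream, delimiter="\t"), ()))


def group_by_columns(file_dict):
    """Group file keys by their exact, ordered column names."""
    groups = defaultdict(list)
    for key, path in file_dict.items():
        groups[read_columns(path)].append(key)
    return dict(groups)


# Hypothetical miniature dataset written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    headers = {
        "sub-001_run-1": "onset\tduration\tvalue",
        "sub-002_run-1": "onset\tduration\tvalue",
        "sub-003_run-1": "onset\tduration\tevent_code",
    }
    file_dict = {}
    for key, header in headers.items():
        path = Path(tmp_dir) / f"{key}_events.tsv"
        path.write_text(header + "\n", encoding="utf-8")
        file_dict[key] = path
    column_groups = group_by_columns(file_dict)

# One line per distinct header; more than one line means the files differ.
for columns, keys in column_groups.items():
    print(f"{list(columns)}: {keys}")
```

If every file has the same columns in the same order, the result contains a single group, so any additional group points directly at the files that need a closer look.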

The script assumes that the data is in BIDS format and that each BIDS events
file of the form `_events.tsv` has a corresponding events file with
suffix `_events_temp.tsv` that was previously dumped from the `EEG.set` files.

To compare the events coming from the BIDS events files with those
from the EEG.set files, the script creates dictionaries of `key` to full path
for each type of file. The `key` is of the form `sub-xxx_run-y`, which
uniquely specifies each event file in the dataset. If a dataset contains
multiple sessions for each subject, the `key` should include additional
parts of the file name so that it still uniquely specifies each event file.
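
As an illustration of the key scheme described above, here is a minimal sketch of building a `key` to full path dictionary from file names. It does not reproduce how `make_file_dict` works internally: `make_key`, `make_key_dict`, the `name_indices` parameter, and the example file names are all hypothetical.

```python
from pathlib import Path


def make_key(file_path, name_indices=(0, 2)):
    """Build a key from selected underscore-separated parts of a file name."""
    parts = Path(file_path).stem.split("_")
    return "_".join(parts[i] for i in name_indices if i < len(parts))


def make_key_dict(file_paths, name_indices=(0, 2)):
    """Map each key to its full path, refusing to overwrite duplicate keys."""
    key_dict = {}
    for file_path in file_paths:
        key = make_key(file_path, name_indices)
        if key in key_dict:
            raise ValueError(f"Key {key} is not unique: {file_path} and {key_dict[key]}")
        key_dict[key] = str(file_path)
    return key_dict


# Hypothetical file names, for illustration only.
example_files = [
    "data/sub-001/eeg/sub-001_task-shift_run-1_events.tsv",
    "data/sub-001/eeg/sub-001_task-shift_run-2_events.tsv",
]
example_dict = make_key_dict(example_files)
print(example_dict)
```

Raising an error on a duplicate key makes the multi-session case visible: if two sessions of the same subject map to the same `sub-xxx_run-y` key, the indices can be extended to include the session part, for example `name_indices=(0, 1, 3)` for names of the form `sub-xxx_ses-zz_task-name_run-y_events.tsv`.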

In [1]:
from hed.tools.io_utils import get_file_list, make_file_dict
bids_root_path = 'G:/AttentionShift/AttentionShiftExperiments'
skip_columns = ['onset', 'duration', 'sample', 'stim_file', 'HED']
#skip_columns = ['latency', 'urevent', 'usertags', 'sample_offset']
name_suffix = "_events_temp2"
files_bids = get_file_list(bids_root_path, extensions=[".tsv"], name_suffix=name_suffix)
dict_bids = make_file_dict(files_bids)
print(f"\n{len(dict_bids)} Attention shift event files")


49 Attention shift event files


In [2]:
from hed.tools.data_util import get_new_dataframe

print("\nBIDS style event file columns:")
for key, file in dict_bids.items():
    df = get_new_dataframe(file)
    print(f"{key}: {list(df.columns)}")


BIDS style event file columns:
sub-001_events: ['onset', 'duration', 'sample', 'trial_type', 'value', 'event_code', 'cond_code']
sub-002_events: ['onset', 'duration', 'sample', 'trial_type', 'value', 'event_code', 'cond_code']
sub-003_events: ['onset', 'duration', 'sample', 'trial_type', 'value', 'event_code', 'cond_code']
sub-004_events: ['onset', 'duration', 'sample', 'trial_type', 'value', 'event_code', 'cond_code']
sub-005_events: ['onset', 'duration', 'sample', 'trial_type', 'value', 'event_code', 'cond_code']
sub-006_events: ['onset', 'duration', 'sample', 'trial_type', 'value', 'event_code', 'cond_code']
sub-007_events: ['onset', 'duration', 'sample', 'trial_type', 'value', 'event_code', 'cond_code']
sub-008_events: ['onset', 'duration', 'sample', 'trial_type', 'value', 'event_code', 'cond_code']
sub-009_events: ['onset', 'duration', 'sample', 'trial_type', 'value', 'event_code', 'cond_code']
sub-010_events: ['onset', 'duration', 'sample', 'trial_type', 'value', 'event_code', '

In [3]:
from hed.tools.map_utils import make_combined_dicts

print('\nBIDS events summary:')
bids_counts_all, bids_counts = make_combined_dicts(dict_bids, skip_cols=skip_columns)
bids_counts_all.print()


BIDS events summary:
Summary for column dictionary :
  Categorical columns (4):
    cond_code (3 distinct values):
      1: 58829
      2: 54629
      3: 168968
    event_code (16 distinct values):
      1: 11511
      10: 4606
      11: 36781
      12: 36760
      13: 18394
      14: 18396
      2: 11509
      201: 28551
      202: 912
      3: 9200
      4: 9205
      5: 36788
      6: 36783
      7: 9214
      8: 9216
      9: 4600
    trial_type (3 distinct values):
      1: 58829
      2: 54629
      3: 168968
    value (51 distinct values):
      1: 240
      10: 96
      11: 3155
      110: 96
      111: 766
      112: 766
      113: 384
      114: 382
      12: 3149
      1201: 5024
      13: 4861
      14: 4859
      15: 17897
      16: 17898
      17: 192
      18: 192
      19: 96
      2: 240
      201: 764
      202: 912
      21: 2218
      212: 3
      22: 2221
      2201: 4496
      23: 4436
      24: 4441
      25: 17736
      26: 17731
      28: 2
      3: 192
      