## Preliminary summary of the Attention Shift dataset

This script produces a preliminary summary of the contents of the events files.
It also prints the column names of each events file so
that they can be manually checked for differences.

The script assumes that the data is in BIDS format and that each BIDS events
file with suffix `_events.tsv` has a corresponding events file with
suffix `_events_temp.tsv` that was previously dumped from the `EEG.set` files.

To compare the events in the BIDS events files with those from the
`EEG.set` files, the script creates, for each type of file, a dictionary
mapping each `key` to the file's full path. The `key` has the form
`sub-xxx_run-y`, which uniquely specifies each events file in the dataset.
If a dataset contains multiple sessions for each subject, the `key` should
include additional parts of the file name to uniquely specify each file.
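
The key construction described above can be sketched as follows. This is a minimal illustration, not the HED tools implementation: `make_key` is a hypothetical helper, and the default `indices=(0, -2)` is an assumption chosen so that a `_events.tsv` name yields `sub-xxx_run-y`. The file name is split on underscores and the selected pieces are joined back together.

```python
import os


def make_key(file_path, indices=(0, -2)):
    """Join selected underscore-separated pieces of the file name (hypothetical helper)."""
    stem = os.path.splitext(os.path.basename(file_path))[0]
    parts = stem.split('_')
    return '_'.join(parts[i] for i in indices)


# Illustrative relative paths modeled on the dataset's naming pattern
bids_name = 'sub-001/eeg/sub-001_task-AuditoryVisualShift_run-01_events.tsv'
eeg_name = 'sub-001/eeg/sub-001_task-AuditoryVisualShift_run-01_events_temp.tsv'

bids_key = make_key(bids_name)                  # pieces 0 and -2
eeg_key = make_key(eeg_name, indices=(0, -3))   # extra "_temp" piece shifts the run index

# A made-up multi-session name needs one more piece to stay unique
session_key = make_key('sub-001_ses-01_task-X_run-01_events.tsv', indices=(0, 1, -2))
print(bids_key, eeg_key, session_key)
```

Under these assumptions both file types map to the same key, which is what allows the two dictionaries to be compared entry by entry.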

In [1]:
bids_root_path = 'G:/AttentionShift/AttentionShiftExperiments'
bids_skip = ['onset', 'duration', 'sample', 'stim_file', 'HED']
eeg_skip = ['latency', 'urevent', 'usertags', 'sample_offset']

In [2]:
from hed.tools.io_utils import get_file_list, make_file_dict

# Construct the dictionary for BIDS events files
files_bids = get_file_list(bids_root_path, extensions=[".tsv"], name_suffix="_events")
dict_bids = make_file_dict(files_bids)
print(f"\n{len(dict_bids)} Attention Shift BIDS style event files")
for key, value in dict_bids.items():
    print(f"{key}: {value}")

# Construct the dictionary for EEG.event files
files_eeg = get_file_list(bids_root_path, extensions=[".tsv"], name_suffix="_events_temp")
dict_eeg = make_file_dict(files_eeg, indices=(0, -3))
print(f"\n{len(dict_eeg)} Attention Shift EEG.event style event files")
for key, value in dict_eeg.items():
    print(f"{key}: {value}")


54 Attention Shift BIDS style event files
sub-001_run-01: G:/AttentionShift/AttentionShiftExperiments\sub-001\eeg\sub-001_task-AuditoryVisualShift_run-01_events.tsv
sub-002_run-01: G:/AttentionShift/AttentionShiftExperiments\sub-002\eeg\sub-002_task-AuditoryVisualShift_run-01_events.tsv
sub-003_run-01: G:/AttentionShift/AttentionShiftExperiments\sub-003\eeg\sub-003_task-AuditoryVisualShift_run-01_events.tsv
sub-004_run-01: G:/AttentionShift/AttentionShiftExperiments\sub-004\eeg\sub-004_task-AuditoryVisualShift_run-01_events.tsv
sub-004_run-02: G:/AttentionShift/AttentionShiftExperiments\sub-004\eeg\sub-004_task-AuditoryVisualShift_run-02_events.tsv
sub-005_run-01: G:/AttentionShift/AttentionShiftExperiments\sub-005\eeg\sub-005_task-AuditoryVisualShift_run-01_events.tsv
sub-006_run-01: G:/AttentionShift/AttentionShiftExperiments\sub-006\eeg\sub-006_task-AuditoryVisualShift_run-01_events.tsv
sub-007_run-01: G:/AttentionShift/AttentionShiftExperiments\sub-007\eeg\sub-007_task-AuditoryVisualShi
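
Because the comparison relies on both dictionaries sharing the same keys, it may help to check for unmatched keys before going further. The following is a minimal sketch: `find_unmatched_keys` is a hypothetical helper (not part of the HED tools), and the toy dictionaries stand in for `dict_bids` and `dict_eeg`.

```python
def find_unmatched_keys(dict_a, dict_b):
    """Return (keys only in dict_a, keys only in dict_b), each as a sorted list."""
    only_a = sorted(set(dict_a) - set(dict_b))
    only_b = sorted(set(dict_b) - set(dict_a))
    return only_a, only_b


# Toy stand-ins for dict_bids and dict_eeg (paths are illustrative only)
toy_bids = {'sub-001_run-01': 'a_events.tsv', 'sub-002_run-01': 'b_events.tsv'}
toy_eeg = {'sub-001_run-01': 'a_events_temp.tsv'}

only_bids, only_eeg = find_unmatched_keys(toy_bids, toy_eeg)
print(f"Keys only in BIDS: {only_bids}")
print(f"Keys only in EEG.event: {only_eeg}")
```

Two empty lists would indicate that every BIDS events file has a matching `_events_temp.tsv` file.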

In [3]:
from hed.tools.data_utils import get_new_dataframe

print("\nBIDS style event file columns:")
for key, file in dict_bids.items():
    df = get_new_dataframe(file)
    print(f"{key}: {list(df.columns)}")

print("\nEEG.event style event file columns:")
for key, file in dict_eeg.items():
    df = get_new_dataframe(file)
    print(f"{key}: {list(df.columns)}")


BIDS style event file columns:
sub-001_run-01: ['onset', 'duration', 'sample', 'trial_type', 'response_time', 'stim_file', 'value', 'HED']
sub-002_run-01: ['onset', 'duration', 'sample', 'trial_type', 'response_time', 'stim_file', 'value', 'HED']
sub-003_run-01: ['onset', 'duration', 'sample', 'trial_type', 'response_time', 'stim_file', 'value', 'HED']
sub-004_run-01: ['onset', 'duration', 'sample', 'trial_type', 'response_time', 'stim_file', 'value', 'HED']
sub-004_run-02: ['onset', 'duration', 'sample', 'trial_type', 'response_time', 'stim_file', 'value', 'HED']
sub-005_run-01: ['onset', 'duration', 'sample', 'trial_type', 'response_time', 'stim_file', 'value', 'HED']
sub-006_run-01: ['onset', 'duration', 'sample', 'trial_type', 'response_time', 'stim_file', 'value', 'HED']
sub-007_run-01: ['onset', 'duration', 'sample', 'trial_type', 'response_time', 'stim_file', 'value', 'HED']
sub-008_run-01: ['onset', 'duration', 'sample', 'trial_type', 'response_time', 'stim_file', 'value', 'HE
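
The manual check of column names could be complemented by grouping the files by their column lists, so that any file with a different layout stands out. This is a sketch under stated assumptions: `group_by_columns` is a hypothetical helper, and the toy input stands in for a mapping from each key to its column names.

```python
from collections import defaultdict


def group_by_columns(columns_by_key):
    """Map each distinct column list (as a tuple) to the keys of the files that have it."""
    groups = defaultdict(list)
    for key, columns in columns_by_key.items():
        groups[tuple(columns)].append(key)
    return dict(groups)


# Toy column lists (illustrative only, not taken from the dataset)
toy_columns = {
    'sub-001_run-01': ['onset', 'duration', 'trial_type'],
    'sub-002_run-01': ['onset', 'duration', 'trial_type'],
    'sub-003_run-01': ['onset', 'duration', 'value'],
}

groups = group_by_columns(toy_columns)
for columns, keys in groups.items():
    print(f"{len(keys)} file(s) with columns {list(columns)}: {keys}")
```

In the notebook, the input could be built as `{key: list(get_new_dataframe(file).columns) for key, file in dict_bids.items()}`. A single group would mean all files share the same columns in the same order.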

In [4]:
from hed.tools.map_utils import make_combined_dicts

print('\nBIDS events summary:')
bids_counts_all, bids_counts = make_combined_dicts(dict_bids, skip_cols=bids_skip)
bids_counts_all.print()

print('\nEEG.event events summary:')
eeg_counts_all, eeg_counts = make_combined_dicts(dict_eeg, skip_cols=eeg_skip)
eeg_counts_all.print()




BIDS events summary:
Summary for column dictionary :
  Categorical columns (3):
    response_time (1 distinct values):
      n/a: 287333
    trial_type (4 distinct values):
      0: 6264
      1: 58184
      2: 54045
      3: 168840
    value (52 distinct values):
      1: 240
      10: 96
      11: 3179
      110: 96
      111: 766
      112: 766
      113: 384
      114: 382
      12: 3173
      1201: 5075
      13: 4909
      14: 4907
      15: 18089
      16: 18090
      17: 192
      18: 192
      19: 96
      199: 197
      2: 240
      201: 764
      202: 928
      21: 2242
      212: 3
      22: 2245
      2201: 4545
      23: 4484
      24: 4489
      25: 17927
      26: 17923
      28: 2
      3: 192
      31: 6810
      310: 4510
      311: 36014
      312: 35989
      313: 18009
      314: 18014
      32: 6809
      3201: 18644
      33: 96
      34: 96
      35: 383
      36: 385
      37: 9022
      38: 9022
      39: 4504
      4: 192
      5: 772
      6: 769
      7: 