## Preliminary summary of the Sternberg dataset

This script produces a preliminary summary of the contents of the events files.
The summary includes the column names of each events file, printed so
that they can be checked manually for differences.
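
When there are many files, scanning the printed column lists by eye is error-prone. One way to make differences stand out is to group the file keys by their exact column list. The following is a minimal sketch, not part of the original script: the helper `group_keys_by_columns` and the example column listings are hypothetical, and the input is assumed to be a plain dictionary of `key` to list of column names.

```python
from collections import defaultdict


def group_keys_by_columns(columns_by_key):
    """Group file keys by their exact (ordered) list of column names."""
    groups = defaultdict(list)
    for key, columns in columns_by_key.items():
        groups[tuple(columns)].append(key)
    return dict(groups)


# Hypothetical column listings for illustration (not taken from the dataset)
example_columns = {
    "sub-001_run-1": ["onset", "duration", "value"],
    "sub-001_run-2": ["onset", "duration", "value"],
    "sub-002_run-1": ["onset", "duration", "value", "event_code"],
}

column_groups = group_keys_by_columns(example_columns)
for columns, keys in column_groups.items():
    print(f"{len(keys)} file(s) with columns {list(columns)}: {keys}")
```

If every file has the same columns, the result contains a single group; any additional group points to files that need a closer look.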

The script assumes that the data is in BIDS format and that each BIDS events
file with suffix `_events.tsv` has a corresponding events file with
suffix `_events_temp.tsv` that was previously dumped from the `EEG.set` files.

To compare the events from the BIDS events files with those
from the `EEG.set` files, the script creates a dictionary mapping `key` to full path
for each type of file. The `key` has the form `sub-xxx_run-y`, which
uniquely specifies each events file in the dataset. If a dataset contains
multiple sessions for each subject, the `key` should include additional
parts of the file name so that each file is still uniquely specified.
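
The key construction described above can be illustrated with a short sketch. It assumes that a key is built by splitting the file name on underscores and joining selected pieces. The function `make_key` is hypothetical and is not the `hed` library implementation; it only mirrors the idea behind the `indices` argument that the script passes to `make_file_dict`.

```python
import os


def make_key(file_name, indices=(0, -2), separator="_"):
    """Build a key from selected underscore-separated pieces of a file name."""
    stem = os.path.splitext(os.path.basename(file_name))[0]
    pieces = stem.split(separator)
    return separator.join(pieces[i] for i in indices)


# BIDS events file: pieces are sub, ses, task, run, events
bids_key = make_key("sub-001_ses-01_task-Experiment_run-1_events.tsv")

# Dumped EEG events file: the extra "temp" piece shifts the run to index -3
eeg_key = make_key("sub-001_ses-01_task-Experiment_run-1_events_temp.tsv",
                   indices=(0, -3))

# Multi-session dataset: also include the session piece in the key
session_key = make_key("sub-001_ses-01_task-Experiment_run-1_events.tsv",
                       indices=(0, 1, -2))

print(bids_key, eeg_key, session_key)
```

Because both kinds of file map to the same key, entries in the two dictionaries can be paired by key when the events are compared.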

### Manual editing

 1. `sub-022_run-1` and `sub-022_run-2` each had an event at the end of the
file with value `empty`. These have been removed manually in `_events.tsv` and
`_events_temp.tsv`.
 2. The following had extra key press events at the beginning of the recording
which were removed:  `sub-003_run-3`(2), `sub-004_run-2`(5), `sub-004_run-4`(3),
`sub-006_run-1`(8), `sub-008_run-1`(4), `sub-009_run-4`(2), `sub-021_run-2`(2),
`sub-021_run-2`(2).
 3. `sub-015_run-3`(4) had extra key presses at the end of the file that did not
belong to a trial. These were removed, along with a boundary event at the beginning of the file.
 4. The EEG versions of `sub-023_run-1`, `sub-023_run-2`, `sub-023_run-3`,
`sub-023_run-4`, and `sub-023_run-5` had three extra 'n/a' columns (`event_code`, `cond_code`,
and `sample_offset`), which were removed.
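
The edits listed above were made by hand. For reference, a column removal like the one in item 4 could also be scripted. The following is a rough sketch using only the standard library. It assumes that an 'n/a' column is one in which every row holds `n/a`; the helper `drop_all_na_columns` and the sample data are hypothetical and do not come from the dataset.

```python
import csv
import io


def drop_all_na_columns(tsv_text, candidates, na_value="n/a"):
    """Remove candidate columns whose every row equals na_value.

    Returns the cleaned TSV text and the list of dropped column names.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = list(reader)
    fieldnames = reader.fieldnames or []
    to_drop = [name for name in candidates
               if name in fieldnames and rows
               and all(row[name] == na_value for row in rows)]
    kept = [name for name in fieldnames if name not in to_drop]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=kept, delimiter="\t",
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue(), to_drop


# Hypothetical sample: event_code is all n/a, cond_code is not
sample = ("latency\ttype\tevent_code\tcond_code\n"
          "100\tWM\tn/a\tn/a\n"
          "200\tcorrect\tn/a\t3\n")
cleaned, dropped = drop_all_na_columns(
    sample, ["event_code", "cond_code", "sample_offset"])
print(dropped)
print(cleaned)
```

A column is dropped only when all of its rows match, so a column that carries even one real value is kept for manual review.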

In [1]:
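# Root directory of the BIDS dataset (local path; adjust as needed)
bids_root_path = 'G:/Sternberg/SternbergWorking/'
# Columns to skip when summarizing the BIDS events files
bids_skip = ['onset', 'duration', 'sample', 'response_time', 'trial_type', 'stim_file']
# Columns to skip when summarizing the EEG.event files
eeg_skip = ['latency', 'urevent', 'ReqTime', 'ReqDur', 'init_index', 'init_time',
            'event_code', 'cond_code', 'sample_offset', 'duration', 'TTime']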

In [2]:
from hed.tools.io_utils import get_file_list, make_file_dict

# Construct the dictionary for BIDS events files
event_files_bids = get_file_list(bids_root_path, extensions=[".tsv"], suffix="_events")
bids_file_dict = make_file_dict(event_files_bids)
print(f"\n{len(bids_file_dict)} Sternberg BIDS style event files")
for key, value in bids_file_dict.items():
    print(f"{key}: {value}")

# Construct the dictionary for EEG.event files
event_files_eeg = get_file_list(bids_root_path, extensions=[".tsv"], suffix="_events_temp")
eeg_file_dict = make_file_dict(event_files_eeg, indices=[0, -3])
print(f"\n{len(eeg_file_dict)} Sternberg EEG.event style event files")
for key, value in eeg_file_dict.items():
    print(f"{key}: {value}")


85 Sternberg BIDS style event files
sub-001_run-1: G:/Sternberg/SternbergWorking/sub-001\ses-01\eeg\sub-001_ses-01_task-Experiment_run-1_events.tsv
sub-001_run-2: G:/Sternberg/SternbergWorking/sub-001\ses-01\eeg\sub-001_ses-01_task-Experiment_run-2_events.tsv
sub-001_run-3: G:/Sternberg/SternbergWorking/sub-001\ses-01\eeg\sub-001_ses-01_task-Experiment_run-3_events.tsv
sub-001_run-4: G:/Sternberg/SternbergWorking/sub-001\ses-01\eeg\sub-001_ses-01_task-Experiment_run-4_events.tsv
sub-002_run-1: G:/Sternberg/SternbergWorking/sub-002\ses-01\eeg\sub-002_ses-01_task-Experiment_run-1_events.tsv
sub-002_run-2: G:/Sternberg/SternbergWorking/sub-002\ses-01\eeg\sub-002_ses-01_task-Experiment_run-2_events.tsv
sub-002_run-3: G:/Sternberg/SternbergWorking/sub-002\ses-01\eeg\sub-002_ses-01_task-Experiment_run-3_events.tsv
sub-002_run-4: G:/Sternberg/SternbergWorking/sub-002\ses-01\eeg\sub-002_ses-01_task-Experiment_run-4_events.tsv
sub-003_run-1: G:/Sternberg/SternbergWorking/sub-003\ses-01\eeg\sub

In [3]:
from hed.tools.io_utils import get_new_dataframe

print("\nBIDS style event file columns:")
for key, file in bids_file_dict.items():
    df = get_new_dataframe(file)
    print(f"{key}: {list(df.columns)}")

print("\nEEG.event style event file columns:")
for key, file in eeg_file_dict.items():
    df = get_new_dataframe(file)
    print(f"{key}: {list(df.columns)}")


Sternberg BIDS style event file columns:
sub-001_run-1: ['onset', 'duration', 'sample', 'trial_type', 'response_time', 'stim_file', 'value']
sub-001_run-2: ['onset', 'duration', 'sample', 'trial_type', 'response_time', 'stim_file', 'value']
sub-001_run-3: ['onset', 'duration', 'sample', 'trial_type', 'response_time', 'stim_file', 'value']
sub-001_run-4: ['onset', 'duration', 'sample', 'trial_type', 'response_time', 'stim_file', 'value']
sub-002_run-1: ['onset', 'duration', 'sample', 'trial_type', 'response_time', 'stim_file', 'value']
sub-002_run-2: ['onset', 'duration', 'sample', 'trial_type', 'response_time', 'stim_file', 'value']
sub-002_run-3: ['onset', 'duration', 'sample', 'trial_type', 'response_time', 'stim_file', 'value']
sub-002_run-4: ['onset', 'duration', 'sample', 'trial_type', 'response_time', 'stim_file', 'value']
sub-003_run-1: ['onset', 'duration', 'sample', 'trial_type', 'response_time', 'stim_file', 'value']
sub-003_run-2: ['onset', 'duration', 'sample', 'trial_type

In [4]:
from hed.tools.map_utils import make_combined_dicts

print('\nBIDS events summary:')
bids_dicts_all, bids_dicts = make_combined_dicts(bids_file_dict, skip_cols=bids_skip)
bids_dicts_all.print()

print('\nEEG.event events summary:')
eeg_dicts_all, eeg_dicts = make_combined_dicts(eeg_file_dict, skip_cols=eeg_skip)
eeg_dicts_all.print()


BIDS events summary:
Summary for column dictionary :
  Categorical columns (1):
    value (70 distinct values):
      1: 2382
      255: 1828
      B: 333
      C: 438
      D: 594
      F: 461
      G: 533
      H: 502
      J: 428
      K: 604
      L: 566
      M: 626
      N: 441
      P: 404
      Q: 418
      R: 630
      S: 582
      T: 445
      V: 528
      W: 464
      WM: 2097
      X: 408
      Y: 484
      Z: 601
      boundary: 63
      correct: 1868
      gB: 314
      gC: 505
      gD: 269
      gF: 293
      gG: 336
      gH: 400
      gJ: 327
      gK: 217
      gL: 140
      gM: 436
      gN: 238
      gP: 333
      gQ: 362
      gR: 293
      gS: 204
      gT: 343
      gV: 214
      gW: 305
      gX: 240
      gY: 186
      gZ: 338
      nonWM: 2106
      rB: 179
      rC: 83
      rD: 55
      rF: 82
      rG: 115
      rH: 62
      rJ: 134
      rK: 116
      rL: 177
      rM: 60
      rN: 37
      rP: 94
      rQ: 133
      rR: 152
      rS: 97
      rT: 45
   