## Preliminary summary of the Sternberg dataset

This script produces a preliminary summary of the contents of the event files.
It prints the column names of each event file so that they can be checked
manually for differences.
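
Reading long lists of column names by eye is error-prone, so one option is to group the files by their column list and inspect only the groups. The following is a minimal sketch and is not part of the original script. The helper `group_by_columns` and the sample column lists are hypothetical.

```python
from collections import defaultdict


def group_by_columns(columns_by_key):
    """Group file keys by their exact (ordered) list of column names."""
    groups = defaultdict(list)
    for key, columns in columns_by_key.items():
        groups[tuple(columns)].append(key)
    return dict(groups)


# Hypothetical column lists keyed by file key (not taken from the dataset).
columns_by_key = {
    "sub-001_run-1": ["onset", "duration", "value"],
    "sub-001_run-2": ["onset", "duration", "value"],
    "sub-023_run-1": ["onset", "duration", "value", "event_code"],
}

groups = group_by_columns(columns_by_key)
for columns, keys in groups.items():
    print(f"{len(keys)} file(s) with columns {list(columns)}: {keys}")
```

If every file has the same columns, the result contains a single group. Any additional group points directly at the files that need a closer look.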

The script assumes that the data is in BIDS format and that each BIDS events
file of the form `_events.tsv` has a corresponding events file with
suffix `_events_temp.tsv` that was previously dumped from the `EEG.set` files.
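
The pairing assumption can be checked before any comparison is attempted. The sketch below is illustrative only: `find_unpaired` is a hypothetical helper, and the file names are made-up examples rather than files from the dataset.

```python
BIDS_SUFFIX = "_events.tsv"
TEMP_SUFFIX = "_events_temp.tsv"


def find_unpaired(bids_names, temp_names):
    """Return (base names lacking a temp file, base names lacking a BIDS file)."""
    bids_bases = {n[: -len(BIDS_SUFFIX)] for n in bids_names if n.endswith(BIDS_SUFFIX)}
    temp_bases = {n[: -len(TEMP_SUFFIX)] for n in temp_names if n.endswith(TEMP_SUFFIX)}
    return sorted(bids_bases - temp_bases), sorted(temp_bases - bids_bases)


# Hypothetical file names: run-2 has no dumped EEG.set counterpart.
bids_names = ["sub-001_run-1_events.tsv", "sub-001_run-2_events.tsv"]
temp_names = ["sub-001_run-1_events_temp.tsv"]

missing_temp, missing_bids = find_unpaired(bids_names, temp_names)
print("BIDS files without a temp file:", missing_temp)
print("Temp files without a BIDS file:", missing_bids)
```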

To compare the events from the BIDS events files with those from the
`EEG.set` files, the script creates a dictionary mapping `key` to full path
for each type of file. The `key` is of the form `sub-xxx_run-y`, which
uniquely specifies each event file in the dataset. If a dataset contains
multiple sessions for each subject, the `key` should include additional
parts of the file name so that it still uniquely specifies each event file.
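
One way to picture the key construction is to split the file name on underscores and join the selected pieces. This is a sketch under that assumption, not necessarily how `make_file_dict` builds its keys. The helper `key_from_name` is hypothetical. The file name follows the pattern visible in the output of the file-listing cell.

```python
def key_from_name(file_name, name_indices, separator="_"):
    """Join the selected separator-delimited pieces of a file name into a key."""
    pieces = file_name.split(separator)
    return separator.join(pieces[i] for i in name_indices)


name = "sub-001_ses-01_task-Experiment_run-4_events.tsv"

run_key = key_from_name(name, (0, 3))         # subject and run
session_key = key_from_name(name, (0, 1, 3))  # subject, session, and run
task_key = key_from_name(name, (0, 2))        # subject and task

print(run_key)      # sub-001_run-4
print(session_key)  # sub-001_ses-01_run-4
print(task_key)     # sub-001_task-Experiment
```

In this sketch, indices `(0, 2)` select the subject and task pieces, which are the same for every run of a subject. Only a selection that includes the run piece gives a distinct key per event file.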

### Manual editing

 1. `sub-022_run-1` and `sub-022_run-2` each had an event at the end of the
file with value `empty`. These have been removed manually in `_events.tsv` and
`_events_temp.tsv`.
 2. The following had extra key press events at the beginning of the recording,
which were removed: `sub-003_run-3`(2), `sub-004_run-2`(5), `sub-004_run-4`(3),
`sub-006_run-1`(8), `sub-008_run-1`(4), `sub-009_run-4`(2), `sub-021_run-2`(2).
 3. `sub-015_run-3`(4) had extra key presses at the end of the file without a trial.
These were removed, as was a boundary event at the beginning of the file.
 4. The EEG versions of `sub-023_run-1`, `sub-023_run-2`, `sub-023_run-3`,
`sub-023_run-4`, and `sub-023_run-5` had the following extra 'n/a' columns,
which were removed: `event_code`, `cond_code`, and `sample_offset`.
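
The edits above were made by hand. A column removal like the one in item 4 could also be scripted. The following minimal sketch uses only the standard library and drops every column whose data values are all `n/a`. The helper `drop_na_columns` and the sample rows, including the `type` column, are hypothetical.

```python
import csv
import io


def drop_na_columns(tsv_text, na_value="n/a"):
    """Return TSV text without columns whose every data value equals na_value."""
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    header, data = rows[0], rows[1:]
    # Keep a column if there are no data rows or if any row has a real value.
    keep = [i for i in range(len(header))
            if not data or any(row[i] != na_value for row in data)]
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row in rows:
        writer.writerow([row[i] for i in keep])
    return out.getvalue()


# Hypothetical EEG.event style content with one all-'n/a' column.
sample = "latency\ttype\tevent_code\n100\tWM\tn/a\n250\tcorrect\tn/a\n"
cleaned = drop_na_columns(sample)
print(cleaned)
```

The sketch assumes every row has as many fields as the header. A column with even one value other than `n/a` is kept.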

In [1]:
bids_root_path = 'G:/Sternberg/SternbergWorking/'
bids_skip = ['onset', 'duration', 'sample', 'response_time', 'trial_type', 'stim_file']
eeg_skip = ['latency', 'urevent', 'ReqTime', 'ReqDur', 'init_index', 'init_time',
            'event_code', 'cond_code', 'sample_offset', 'duration', 'TTime']

In [2]:
from hed.util import get_file_list, make_file_dict

# Construct the dictionary for BIDS events files.
# NOTE: for file names such as sub-001_ses-01_task-Experiment_run-4_events.tsv,
# name_indices=(0, 2) produces keys of the form sub-xxx_task-yyy (see the output),
# so all runs of a subject share one key. Indices (0, 3) would presumably be
# needed to obtain the sub-xxx_run-y keys described above.
event_files_bids = get_file_list(bids_root_path, extensions=[".tsv"], name_suffix="_events")
bids_file_dict = make_file_dict(event_files_bids, name_indices=(0, 2))
print(f"\n{len(bids_file_dict)} Sternberg BIDS style event files")
for key, value in bids_file_dict.items():
    print(f"{key}: {value}")

# Construct the dictionary for EEG.event files (same key convention as above).
event_files_eeg = get_file_list(bids_root_path, extensions=[".tsv"], name_suffix="_events_temp")
eeg_file_dict = make_file_dict(event_files_eeg, name_indices=(0, 2))
print(f"\n{len(eeg_file_dict)} Sternberg EEG.event style event files")
for key, value in eeg_file_dict.items():
    print(f"{key}: {value}")


23 Sternberg BIDS style event files
sub-001_task-Experiment: G:\Sternberg\SternbergWorking\sub-001\ses-01\eeg\sub-001_ses-01_task-Experiment_run-4_events.tsv
sub-002_task-Experiment: G:\Sternberg\SternbergWorking\sub-002\ses-01\eeg\sub-002_ses-01_task-Experiment_run-4_events.tsv
sub-003_task-Experiment: G:\Sternberg\SternbergWorking\sub-003\ses-01\eeg\sub-003_ses-01_task-Experiment_run-4_events.tsv
sub-004_task-Experiment: G:\Sternberg\SternbergWorking\sub-004\ses-01\eeg\sub-004_ses-01_task-Experiment_run-4_events.tsv
sub-005_task-Experiment: G:\Sternberg\SternbergWorking\sub-005\ses-01\eeg\sub-005_ses-01_task-Experiment_run-4_events.tsv
sub-006_task-Experiment: G:\Sternberg\SternbergWorking\sub-006\ses-01\eeg\sub-006_ses-01_task-Experiment_run-4_events.tsv
sub-007_task-Experiment: G:\Sternberg\SternbergWorking\sub-007\ses-01\eeg\sub-007_ses-01_task-Experiment_run-4_events.tsv
sub-008_task-Experiment: G:\Sternberg\SternbergWorking\sub-008\ses-01\eeg\sub-008_ses-01_task-Experiment_run-

In [3]:
from hed.util.data_util import get_new_dataframe

# Print the column names of each BIDS events file.
print("\nBIDS style event file columns:")
for key, file in bids_file_dict.items():
    df = get_new_dataframe(file)
    print(f"{key}: {list(df.columns)}")

# Print the column names of each EEG.event file.
print("\nEEG.event style event file columns:")
for key, file in eeg_file_dict.items():
    df = get_new_dataframe(file)
    print(f"{key}: {list(df.columns)}")


BIDS style event file columns:
sub-001_task-Experiment: ['onset', 'duration', 'sample', 'trial_type', 'response_time', 'stim_file', 'value']
sub-002_task-Experiment: ['onset', 'duration', 'sample', 'trial_type', 'response_time', 'stim_file', 'value']
sub-003_task-Experiment: ['onset', 'duration', 'sample', 'trial_type', 'response_time', 'stim_file', 'value']
sub-004_task-Experiment: ['onset', 'duration', 'sample', 'trial_type', 'response_time', 'stim_file', 'value']
sub-005_task-Experiment: ['onset', 'duration', 'sample', 'trial_type', 'response_time', 'stim_file', 'value']
sub-006_task-Experiment: ['onset', 'duration', 'sample', 'trial_type', 'response_time', 'stim_file', 'value']
sub-007_task-Experiment: ['onset', 'duration', 'sample', 'trial_type', 'response_time', 'stim_file', 'value']
sub-008_task-Experiment: ['onset', 'duration', 'sample', 'trial_type', 'response_time', 'stim_file', 'value']
sub-009_task-Experiment: ['onset', 'duration', 'sample', 'trial_type', 'response_time', 

In [4]:
from hed.tools import ColumnSummary

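# Summarize the values in each column, skipping the columns listed in bids_skip.
print('\nBIDS events summary:')
bids_dicts_all, bids_dicts = ColumnSummary.make_combined_dicts(bids_file_dict, skip_cols=bids_skip)
bids_dicts_all.print()

# Summarize the values in each column, skipping the columns listed in eeg_skip.
print('\nEEG.event events summary:')
eeg_dicts_all, eeg_dicts = ColumnSummary.make_combined_dicts(eeg_file_dict, skip_cols=eeg_skip)
eeg_dicts_all.print()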


BIDS events summary:
Summary for column dictionary :
  Categorical columns (1):
    value (70 distinct values):
      1: 586
      255: 567
      B: 99
      C: 123
      D: 128
      F: 125
      G: 186
      H: 151
      J: 112
      K: 166
      L: 181
      M: 187
      N: 142
      P: 100
      Q: 115
      R: 171
      S: 144
      T: 118
      V: 133
      W: 135
      WM: 570
      X: 74
      Y: 138
      Z: 142
      boundary: 25
      correct: 507
      gB: 74
      gC: 86
      gD: 104
      gF: 96
      gG: 84
      gH: 102
      gJ: 108
      gK: 69
      gL: 22
      gM: 135
      gN: 92
      gP: 84
      gQ: 88
      gR: 82
      gS: 41
      gT: 83
      gV: 55
      gW: 63
      gX: 56
      gY: 43
      gZ: 123
      nonWM: 575
      rB: 57
      rC: 23
      rD: 27
      rF: 31
      rG: 20
      rH: 31
      rJ: 41
      rK: 48
      rL: 17
      rM: 22
      rN: 21
      rP: 19
      rQ: 25
      rR: 33
      rS: 26
      rT: 9
      rV: 36
      rW: 20
      rX