## Initial summary of event files

**Dataset**: Go-nogo categorization and detection task v1.2.0 [OpenNeuro ds002680](https://openneuro.org/datasets/ds002680/versions/1.2.0).

This script produces a preliminary summary of the contents of the events files.
The summary includes the column names of each events file so
that they can be checked manually for differences.

The script assumes that the data is in BIDS format and that each BIDS events
file of the form `_events.tsv` has a corresponding events file with
suffix `_events_temp.tsv` that was previously dumped from the `EEG.set` files.

To compare the events from the BIDS events files with those
from the `EEG.set` files, the script creates a dictionary mapping `key` to full path
for each type of file. Here the `key` has the form `sub-xxx_ses-y_run-z`, which
uniquely specifies each events file in this dataset. In general, the `key`
must include enough parts of the file name (for example, the session when
subjects have multiple sessions) to uniquely specify each file.

Keys are specified by an `entities` tuple that lists the BIDS entity names
to include in the key.
BIDS base file names consist of entity *name*-*value* pairs separated
by underscores and followed by an ending *_suffix*.

For a file name `sub-001_ses-3_task-target_run-01_events.tsv`,
the tuple `('sub', 'task')` gives a key of `sub-001_task-target`,
while the tuple `('sub', 'ses', 'run')` gives a key of `sub-001_ses-3_run-01`.
The use of dictionaries of file names with such keys makes it
easier to associate related files in the BIDS naming structure.
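
The key construction described above can be sketched in plain Python. This is only an illustration of the naming rule, not the code used by `BidsTsvDictionary`; the helper name `make_key` is hypothetical, and the sketch assumes that every entity appears in the file name as a hyphenated *name*-*value* piece.

```python
import os


def make_key(file_path, entities):
    """Build a key from BIDS entity name-value pairs (hypothetical helper)."""
    base = os.path.basename(file_path)
    # Drop the extension, then split the stem into underscore-separated pieces.
    stem = base.split('.', 1)[0]
    pieces = stem.split('_')
    # Keep only name-value pieces; suffix parts such as 'events' have no hyphen.
    pairs = dict(piece.split('-', 1) for piece in pieces if '-' in piece)
    # Join the requested entities in the order given by the tuple.
    return '_'.join(f"{name}-{pairs[name]}" for name in entities if name in pairs)


name = 'sub-001_ses-3_task-target_run-01_events.tsv'
print(make_key(name, ('sub', 'task')))        # sub-001_task-target
print(make_key(name, ('sub', 'ses', 'run')))  # sub-001_ses-3_run-01
```

Because suffix parts are ignored, a file ending in `_events_temp.tsv` yields the same key as its `_events.tsv` counterpart, which is what allows the two kinds of files to be paired.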

The setup requires the setting of the following variables for your dataset:

| Variable | Purpose |
| -------- | ------- |
| bids_root_path | Full path to root directory of dataset.|
| exclude_dirs | List of directories to exclude when constructing file lists. |
| entities  | Tuple of entity names used to construct unique keys representing filenames. <br>(See [Dictionaries of filenames](https://hed-examples.readthedocs.io/en/latest/HedInPython.html#dictionaries-of-filenames-anchor) for examples of how to choose the keys.)|
| bids_skip_columns  |  List of column names in the `events.tsv` files to skip in the analysis. |
| eeg_skip_columns  | List of column names in the `events_temp.tsv` files from `EEG.event` to skip in the analysis.|

In [1]:
from hed.tools import BidsTsvDictionary
from hed.util import get_file_list

# Variables to set for the specific dataset
bids_root_path = 'G:/GoNogo/GoNogoWorking'
exclude_dirs = ['code', 'stimuli']
entities = ('sub', 'ses', 'run')
bids_skip_columns = ['onset', 'response_time', 'stim_file']
eeg_skip_columns = ['latency', 'response_time', 'stim_file']

# Construct the event file dictionaries for the BIDS and for EEG.event files
event_files_bids = get_file_list(bids_root_path, extensions=[".tsv"], name_suffix="_events",
                                 exclude_dirs=exclude_dirs)
bids_dict = BidsTsvDictionary(event_files_bids, entities=entities)
event_files_eeg = get_file_list(bids_root_path, extensions=[".tsv"], name_suffix="_events_temp",
                                exclude_dirs=exclude_dirs)
eeg_dict = BidsTsvDictionary(event_files_eeg, entities=entities)


In [2]:
print(f"Summarizing {bids_root_path}...")
bids_dict.print_files(title="\nBIDS style event files")
eeg_dict.print_files(title="\nEEG.event style event files")

Summarizing G:/GoNogo/GoNogoWorking...

BIDS style event files (350 files)
sub-002_ses-01_run-10: sub-002_ses-01_task-gonogo_run-10_events.tsv
sub-002_ses-01_run-11: sub-002_ses-01_task-gonogo_run-11_events.tsv
sub-002_ses-01_run-12: sub-002_ses-01_task-gonogo_run-12_events.tsv
sub-002_ses-01_run-13: sub-002_ses-01_task-gonogo_run-13_events.tsv
sub-002_ses-01_run-1: sub-002_ses-01_task-gonogo_run-1_events.tsv
sub-002_ses-01_run-2: sub-002_ses-01_task-gonogo_run-2_events.tsv
sub-002_ses-01_run-3: sub-002_ses-01_task-gonogo_run-3_events.tsv
sub-002_ses-01_run-4: sub-002_ses-01_task-gonogo_run-4_events.tsv
sub-002_ses-01_run-5: sub-002_ses-01_task-gonogo_run-5_events.tsv
sub-002_ses-01_run-6: sub-002_ses-01_task-gonogo_run-6_events.tsv
sub-002_ses-01_run-7: sub-002_ses-01_task-gonogo_run-7_events.tsv
sub-002_ses-01_run-8: sub-002_ses-01_task-gonogo_run-8_events.tsv
sub-002_ses-01_run-9: sub-002_ses-01_task-gonogo_run-9_events.tsv
sub-002_ses-02_run-10: sub-002_ses-02_task-gonogo_run-10_ev

In [3]:
key_diff = bids_dict.key_diffs(eeg_dict)
print(f"Key differences between EEG and BIDS events: {str(key_diff)}")

Key differences between EEG and BIDS events: []
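
An empty list here means that every key appears in both dictionaries. As a rough sketch of what such a comparison amounts to, the following uses a symmetric difference over plain Python dictionaries. The helper `key_differences` and the sample file names are hypothetical, and this is not necessarily how `key_diffs` is implemented.

```python
def key_differences(first, second):
    """Return the sorted keys found in exactly one of two dictionaries (hypothetical helper)."""
    return sorted(set(first) ^ set(second))


# Made-up dictionaries of key -> file name, for illustration only.
bids_files = {'sub-002_ses-01_run-1': 'a_events.tsv',
              'sub-002_ses-01_run-2': 'b_events.tsv'}
eeg_files = {'sub-002_ses-01_run-1': 'a_events_temp.tsv'}

print(key_differences(bids_files, eeg_files))   # ['sub-002_ses-01_run-2']
print(key_differences(bids_files, bids_files))  # []
```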


In [4]:
print(f"\nBIDS style event file columns:")
for key, file, rowcount, columns in bids_dict.iter_tsv_info():
    print(f"{key} [{rowcount} events]: {str(columns)}")

print(f"\nEEG.event style event file columns:")
for key, file, rowcount, columns in eeg_dict.iter_tsv_info():
    print(f"{key} [{rowcount} events]: {str(columns)}")


BIDS style event file columns:
sub-002_ses-01_run-10 [152 events]: ['onset', 'duration', 'sample', 'trial_type', 'response_time', 'stim_file', 'value']
sub-002_ses-01_run-11 [153 events]: ['onset', 'duration', 'sample', 'trial_type', 'response_time', 'stim_file', 'value']
sub-002_ses-01_run-12 [152 events]: ['onset', 'duration', 'sample', 'trial_type', 'response_time', 'stim_file', 'value']
sub-002_ses-01_run-13 [156 events]: ['onset', 'duration', 'sample', 'trial_type', 'response_time', 'stim_file', 'value']
sub-002_ses-01_run-1 [151 events]: ['onset', 'duration', 'sample', 'trial_type', 'response_time', 'stim_file', 'value']
sub-002_ses-01_run-2 [150 events]: ['onset', 'duration', 'sample', 'trial_type', 'response_time', 'stim_file', 'value']
sub-002_ses-01_run-3 [151 events]: ['onset', 'duration', 'sample', 'trial_type', 'response_time', 'stim_file', 'value']
sub-002_ses-01_run-4 [154 events]: ['onset', 'duration', 'sample', 'trial_type', 'response_time', 'stim_file', 'value']
sub-

In [5]:
count_diffs = bids_dict.count_diffs(eeg_dict)
if count_diffs:
    print("The number of BIDS events and EEG.event events differ for the following:")
    for item in count_diffs:
        print(f"{item[0]}: {item[1]} BIDS events and {item[2]} EEG.events")
else:
    print("The BIDS event files and EEG.event structures have the same number of events")

The BIDS event files and EEG.event structures have the same number of events
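
The idea behind the count comparison can be sketched with the standard library alone. The helpers `count_rows` and `count_differences` are hypothetical stand-ins, not the HED tools implementation; the sketch assumes one header row per file and returns `(key, first_count, second_count)` tuples, the shape the loop above indexes as `item[0]`, `item[1]`, `item[2]`.

```python
import csv
import io


def count_rows(tsv_text):
    """Count data rows in tab-separated text, excluding the header (hypothetical helper)."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter='\t')
    next(reader, None)  # skip the header row
    return sum(1 for row in reader if row)


def count_differences(first_counts, second_counts):
    """List (key, first, second) for shared keys whose counts differ (hypothetical helper)."""
    shared = sorted(first_counts.keys() & second_counts.keys())
    return [(key, first_counts[key], second_counts[key])
            for key in shared if first_counts[key] != second_counts[key]]


# Made-up file contents, for illustration only.
bids_text = "onset\tduration\n0.5\tn/a\n1.5\tn/a\n"
eeg_text = "latency\tduration\n500\tNaN\n"
bids_counts = {'sub-002_ses-01_run-1': count_rows(bids_text)}
eeg_counts = {'sub-002_ses-01_run-1': count_rows(eeg_text)}

print(count_differences(bids_counts, eeg_counts))  # [('sub-002_ses-01_run-1', 2, 1)]
```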


In [6]:
from hed.tools import BidsTsvSummary
bids_dicts_all, bids_dicts = BidsTsvSummary.make_combined_dicts(bids_dict, skip_cols=bids_skip_columns)
bids_dicts_all.print(title='\nBIDS events summary:')

eeg_dicts_all, eeg_dicts = BidsTsvSummary.make_combined_dicts(eeg_dict, skip_cols=eeg_skip_columns)
eeg_dicts_all.print(title='\nEEG.event events summary')


BIDS events summary:
  Categorical columns (4):
    duration (1 distinct values):
      n/a: 52662
    sample (1 distinct values):
      n/a: 52662
    trial_type (2 distinct values):
      response: 17666
      stimulus: 34996
    value (10 distinct values):
      animal_distractor: 6300
      animal_target: 6298
      correct: 16921
      difficult_distractor: 4200
      difficult_target: 4198
      easy_distractor: 4200
      easy_target: 4200
      incorrect: 745
      nonanimal_distractor: 2800
      nonanimal_target: 2800
  Value columns (0):

EEG.event events summary
  Categorical columns (4):
    duration (1 distinct values):
      NaN: 52662
    sample (1 distinct values):
      NaN: 52662
    trial_type (2 distinct values):
      response: 17666
      stimulus: 34996
    type (10 distinct values):
      animal_distractor: 6300
      animal_target: 6298
      correct: 16921
      difficult_distractor: 4200
      difficult_target: 4198
      easy_distractor: 4200
      easy_ta