## Initial summary of event files

This script produces a preliminary summary of the contents of the event files.
The summary includes the column names of each event file, printed so
that they can be manually checked for differences.
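As an aid to that manual check, the comparison of column names can be sketched with the standard library alone. This is a minimal illustration, not part of the HED tools: the function name `group_files_by_columns` is hypothetical, and the sketch assumes each event file is a tab-separated file whose first row holds the column names.

```python
import csv
from collections import defaultdict
from pathlib import Path


def group_files_by_columns(tsv_paths):
    """Map each distinct tuple of column names to the files that use it.

    Hypothetical helper for illustration; assumes the first row of each
    tab-separated file is the header.
    """
    groups = defaultdict(list)
    for path in tsv_paths:
        with open(path, newline="", encoding="utf-8") as handle:
            # Read only the header row; an empty file yields an empty header.
            header = next(csv.reader(handle, delimiter="\t"), [])
        groups[tuple(header)].append(Path(path).name)
    return dict(groups)
```

If the returned dictionary has a single entry, all files share the same columns in the same order; more than one entry points to the files that need a closer look.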

The script assumes that the data is in BIDS format and that each BIDS events
file of the form `_events.tsv` has a corresponding events file with
suffix `_events_temp.tsv` that was previously dumped from the `EEG.set` files.

To compare the events coming from the BIDS events files with those
from the EEG.set files, the script creates dictionaries of `key` to full path
for each type of file. The `key` is of the form `sub-xxx_run-y`, which
uniquely specifies each event file in the dataset. If a dataset contains
multiple sessions for each subject, the `key` should include additional
parts of the file name to uniquely specify each file.

Keys are specified by a `name_indices` tuple, which lists the positions of
the pieces of the file name to include. Here pieces are separated by the
underscore character.

For a file name `sub-001_ses-3_task-target_run-01_events.tsv`,
the tuple `(0, 2)` gives a key of `sub-001_task-target`,
while the tuple `(0, 3)` gives a key of `sub-001_run-01`.
The use of dictionaries of file names with such keys makes it
easier to associate related files in the BIDS naming structure.
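The key construction described above can be sketched in a few lines of plain Python. The helpers `make_key` and `make_file_dict` are hypothetical names used only for illustration and are not the HED implementation; the sketch assumes pieces are separated by underscores and that everything after the first period is the extension.

```python
import os


def make_key(file_name, name_indices):
    """Build a key from the underscore-separated pieces at name_indices."""
    stem = file_name.split(".", 1)[0]   # drop the extension
    pieces = stem.split("_")            # e.g. ['sub-001', 'ses-3', ...]
    return "_".join(pieces[i] for i in name_indices)


def make_file_dict(file_paths, name_indices):
    """Map each key to its full path, refusing keys that are not unique."""
    file_dict = {}
    for path in file_paths:
        key = make_key(os.path.basename(path), name_indices)
        if key in file_dict:
            raise ValueError(f"Key {key} is not unique: {path}")
        file_dict[key] = path
    return file_dict


name = "sub-001_ses-3_task-target_run-01_events.tsv"
print(make_key(name, (0, 2)))  # sub-001_task-target
print(make_key(name, (0, 3)))  # sub-001_run-01
```

Raising an error on a repeated key makes it obvious when the chosen indices are too coarse, for example when a dataset has several sessions per subject and the session piece was left out.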

The setup requires the setting of the following variables for your dataset:

| Variable | Purpose |
| -------- | ------- |
| bids_root_path | Full path to root directory of dataset.|
| exclude_dirs | List of directories to exclude when constructing file lists. |
| entities  | Tuple of entity names used to construct unique keys representing filenames. <br>(See [Dictionaries of filenames](https://hed-examples.readthedocs.io/en/latest/HedInPython.html#dictionaries-of-filenames-anchor) for examples of how to choose the key.)|
| skip_columns  |  List of column names in the `events.tsv` files to skip in the analysis. |
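To show what skipping columns means for a summary, here is a minimal standard-library sketch that tallies how often each value appears in each column, ignoring the skipped ones. The function name `count_column_values` is hypothetical and this is not how the HED tools are implemented; it assumes tab-separated files with a header row.

```python
import csv
from collections import Counter, defaultdict


def count_column_values(tsv_paths, skip_columns=()):
    """Count occurrences of each value per column across all files.

    Hypothetical helper for illustration; columns named in skip_columns
    (typically numeric ones such as onset) are left out of the tally.
    """
    skip = set(skip_columns)
    counts = defaultdict(Counter)
    for path in tsv_paths:
        with open(path, newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle, delimiter="\t"):
                for column, value in row.items():
                    # DictReader uses None as the key for surplus fields.
                    if column is None or column in skip:
                        continue
                    counts[column][value] += 1
    return {column: dict(counter) for column, counter in counts.items()}
```

Columns such as `onset` and `duration` usually hold a different number on nearly every row, so counting their distinct values adds noise rather than insight; listing them in `skip_columns` keeps the summary focused on categorical columns.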

In [1]:
from hed.tools import BidsTsvDictionary
from hed.util import get_file_list

# Variables to set for the specific dataset
bids_root_path = 'S:/openneuro/ds002790-download'
exclude_dirs = ['derivatives']
entities = ('sub', 'task')
skip_columns = ['onset', 'duration', 'response_time', 'stop_signal_delay']
tasks = ['emomatching', 'restingstate', 'stopsignal', 'workingmemory']

# Construct the dictionary of BIDS event files, keyed by the chosen entities
event_files = get_file_list(bids_root_path, extensions=[".tsv"], name_suffix="_events",
                            exclude_dirs=exclude_dirs)
bids_dict = BidsTsvDictionary(event_files, entities=entities)

In [2]:
task_dicts, leftovers = bids_dict.create_split_dict('task')
print(f"Dataset tasks are [{str(task_dicts.keys())}]")
for task, task_dict in task_dicts.items():
    task_dict.print_files(title=f"\nBIDS-style event files for task {task}")

if leftovers:
    leftovers.print_files(title="\nThese files did not have a task entity")

Dataset tasks are [dict_keys(['stopsignal', 'workingmemory', 'emomatching'])]

BIDS-style event files for task stopsignal (226 files)
sub_0001_task_stopsignal: sub-0001_task-stopsignal_acq-seq_events.tsv
sub_0002_task_stopsignal: sub-0002_task-stopsignal_acq-seq_events.tsv
sub_0003_task_stopsignal: sub-0003_task-stopsignal_acq-seq_events.tsv
sub_0004_task_stopsignal: sub-0004_task-stopsignal_acq-seq_events.tsv
sub_0005_task_stopsignal: sub-0005_task-stopsignal_acq-seq_events.tsv
sub_0006_task_stopsignal: sub-0006_task-stopsignal_acq-seq_events.tsv
sub_0007_task_stopsignal: sub-0007_task-stopsignal_acq-seq_events.tsv
sub_0008_task_stopsignal: sub-0008_task-stopsignal_acq-seq_events.tsv
sub_0009_task_stopsignal: sub-0009_task-stopsignal_acq-seq_events.tsv
sub_0010_task_stopsignal: sub-0010_task-stopsignal_acq-seq_events.tsv
sub_0011_task_stopsignal: sub-0011_task-stopsignal_acq-seq_events.tsv
sub_0012_task_stopsignal: sub-0012_task-stopsignal_acq-seq_events.tsv
sub_0013_task_stopsignal: 

In [3]:
print("\nBIDS-style event file columns:")
for task, task_dict in task_dicts.items():
    print(f"\nTask {task} event file columns:")
    for key, file, rowcount, columns in task_dict.iter_tsv_info():
        print(f"{key} [{rowcount} events]: {str(columns)}")



BIDS-style event file columns:

Task stopsignal event file columns:
sub_0001_task_stopsignal [100 events]: ['onset', 'duration', 'trial_type', 'stop_signal_delay', 'response_time', 'response_accuracy', 'response_hand', 'sex']
sub_0002_task_stopsignal [100 events]: ['onset', 'duration', 'trial_type', 'stop_signal_delay', 'response_time', 'response_accuracy', 'response_hand', 'sex']
sub_0003_task_stopsignal [100 events]: ['onset', 'duration', 'trial_type', 'stop_signal_delay', 'response_time', 'response_accuracy', 'response_hand', 'sex']
sub_0004_task_stopsignal [100 events]: ['onset', 'duration', 'trial_type', 'stop_signal_delay', 'response_time', 'response_accuracy', 'response_hand', 'sex']
sub_0005_task_stopsignal [100 events]: ['onset', 'duration', 'trial_type', 'stop_signal_delay', 'response_time', 'response_accuracy', 'response_hand', 'sex']
sub_0006_task_stopsignal [100 events]: ['onset', 'duration', 'trial_type', 'stop_signal_delay', 'response_time', 'response_accuracy', 'respon

In [4]:
from hed.tools import BidsTsvSummary

print('\nBIDS events summary counts:')
for task, task_dict in task_dicts.items():
    dicts_all, dicts_sep = BidsTsvSummary.make_combined_dicts(task_dict, skip_cols=skip_columns)
    dicts_all.print(title=f"\nBIDS-style event info for task {task}")


BIDS events summary counts:

BIDS-style event info for task stopsignal
  Categorical columns (4):
    response_accuracy (4 distinct values):
      correct: 17061
      incorrect: 923
      miss: 246
      n/a: 4370
    response_hand (2 distinct values):
      left: 9395
      right: 13205
    sex (2 distinct values):
      female: 11360
      male: 11240
    trial_type (3 distinct values):
      go: 15119
      succesful_stop: 4370
      unsuccesful_stop: 3111
  Value columns (0):

BIDS-style event info for task workingmemory
  Categorical columns (3):
    response_accuracy (4 distinct values):
      correct: 3665
      incorrect: 3353
      miss: 631
      n/a: 1311
    response_hand (3 distinct values):
      left: 4235
      n/a: 8
      right: 4717
    trial_type (3 distinct values):
      active_change: 3584
      active_nochange: 3584
      passive: 1792
  Value columns (0):

BIDS-style event info for task emomatching
  Categorical columns (9):
    emo_match (3 distinct values):