## Summarize the contents of the event files in a BIDS dataset.

The first step in annotating a BIDS dataset is to find out what is in the dataset's
event files. This tool traverses the BIDS dataset and gathers the unique values in
each column, along with the number of times each value appears in the dataset.

The steps are:

1. Set the dataset location (`bids_root_path`) to the absolute path to the root of your BIDS dataset.
2. Get a list of the event files in the BIDS dataset.
3. Set the columns to be excluded (`column_to_skip`) from the summary.
Usually, these are columns whose values are specific or unique to each event,
such as `onset`, `duration`, `sample`, and `HED`.
4. Create a dictionary of the unique values in each column.
5. Output the results.
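
As a rough illustration of what steps 2 through 5 involve, the sketch below uses only the Python standard library rather than the HED tools. The helpers `find_event_files` and `count_column_values` and the tiny demo dataset are hypothetical names and data invented for this illustration. They are not part of any library and are not how the HED tools are implemented.

```python
import csv
import os
import tempfile
from collections import Counter, defaultdict


def find_event_files(root):
    """Return sorted paths of all files under root whose names end in _events.tsv."""
    matches = []
    for folder, _, names in os.walk(root):
        for name in names:
            if name.endswith("_events.tsv"):
                matches.append(os.path.join(folder, name))
    return sorted(matches)


def count_column_values(paths, skip_columns):
    """Map each non-skipped column name to a Counter of its values."""
    counts = defaultdict(Counter)
    for path in paths:
        with open(path, newline="", encoding="utf-8") as fp:
            for row in csv.DictReader(fp, delimiter="\t"):
                for column, value in row.items():
                    if column not in skip_columns:
                        counts[column][value] += 1
    return dict(counts)


# Build a tiny throwaway dataset (made-up values) so the sketch runs without real data.
demo_root = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_root, "sub-01", "eeg"))
demo_file = os.path.join(demo_root, "sub-01", "eeg", "sub-01_task-demo_events.tsv")
with open(demo_file, "w", newline="", encoding="utf-8") as fp:
    fp.write("onset\tduration\tevent_type\tface_type\n")
    fp.write("0.0\t0.5\tshow_face\tfamous_face\n")
    fp.write("1.2\t0.5\tshow_face\tscrambled_face\n")
    fp.write("2.0\tn/a\tleft_press\tn/a\n")

# Steps 2-5: find the event files, skip per-event columns, count, and print.
demo_files = find_event_files(demo_root)
demo_counts = count_column_values(demo_files, skip_columns={"onset", "duration"})
for column in sorted(demo_counts):
    print(f"{column} ({len(demo_counts[column])} distinct values):")
    for value in sorted(demo_counts[column]):
        print(f"  {value}: {demo_counts[column][value]}")
```

Skipping `onset` and `duration` here keeps the summary short: those columns hold a different value on nearly every row, so counting them would add little information about the dataset's event structure.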

The example below uses a small version of the Wakeman-Henson face-processing dataset,
available on OpenNeuro as ds003654.

In [19]:
import os
import json
from hed.tools import generate_sidecar_entry
from hed.tools import ColumnDict
from hed.util import get_file_list


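# Locate the dataset relative to the current working directory.
bids_root_path = os.path.abspath(os.path.join(os.path.dirname(os.path.abspath('')),
                                              '../../datasets/eeg_ds003654s_hed'))
print(f"Bids root path: {bids_root_path}")

# Collect every file whose name ends in _events.tsv.
event_files = get_file_list(bids_root_path, extensions=[".tsv"], name_suffix="_events")

# Skip columns whose values are specific to individual events.
column_to_skip = ["onset", "duration", "sample", "stim_file", "trial", "response_time"]

# Accumulate the value counts for the remaining columns across all event files.
col_dict = ColumnDict(skip_cols=column_to_skip, name=bids_root_path)
for file in event_files:
    col_dict.update(file)
col_dict.print()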

# Build a JSON sidecar template with one entry per categorical column.
cat_col = col_dict.categorical_info
side_dict = {}
for column_name, columns in cat_col.items():
    # sorted() returns a new list; list.sort() sorts in place and returns None.
    side_dict[column_name] = generate_sidecar_entry(column_name, sorted(columns.keys()))
str_json = json.dumps(side_dict, indent=4)
print(str_json)

Bids root path: D:\Research\HED\hed-examples\datasets\eeg_ds003654s_hed
Summary for column dictionary D:\Research\HED\hed-examples\datasets\eeg_ds003654s_hed:
  Categorical columns (5):
    event_type (8 distinct values):
      double_press: 1
      left_press: 246
      right_press: 457
      setup_right_sym: 6
      show_circle: 884
      show_cross: 884
      show_face: 878
      show_face_initial: 6
    face_type (4 distinct values):
      famous_face: 294
      n/a: 2478
      scrambled_face: 298
      unfamiliar_face: 292
    rep_lag (12 distinct values):
      1: 216
      10: 49
      11: 57
      12: 40
      13: 17
      14: 15
      15: 3
      6: 1
      7: 5
      8: 10
      9: 21
      n/a: 2928
    rep_status (4 distinct values):
      delayed_repeat: 218
      first_show: 450
      immediate_repeat: 216
      n/a: 2478
    value (15 distinct values):
      0: 884
      1: 884
      13: 150
      14: 64
      15: 78
      17: 150
      18: 82
      19: 66
      256: 246