## Summarize the contents of the event files in a BIDS dataset.

A first step in annotating a BIDS dataset is to find out what its event files
actually contain.
Event files sometimes include a few unexpected or incorrect codes,
so it is usually a good idea to check their contents
before starting the annotation process.

This notebook traverses the BIDS dataset and gathers the unique values in each
column, along with the number of times each value appears in the dataset.
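
The counting described above can be illustrated without any HED tooling. The following is a minimal sketch using only the Python standard library; `count_column_values` and the two in-memory TSV strings are hypothetical and are not how `ColumnSummary` is implemented.

```python
import csv
import io
from collections import Counter, defaultdict


def count_column_values(tsv_texts, skip_cols=()):
    """Count how often each value appears in each column across TSV texts.

    Hypothetical helper for illustration only (not part of hedtools).
    """
    counts = defaultdict(Counter)
    for text in tsv_texts:
        reader = csv.DictReader(io.StringIO(text), delimiter="\t")
        for row in reader:
            for column, value in row.items():
                if column not in skip_cols:
                    counts[column][value] += 1
    return counts


# Two tiny made-up event files, held in memory as tab-separated text.
file_a = "onset\tevent_type\tface_type\n0.0\tshow_face\tfamous_face\n1.5\tleft_press\tn/a\n"
file_b = "onset\tevent_type\tface_type\n0.0\tshow_face\tscrambled_face\n"

summary = count_column_values([file_a, file_b], skip_cols={"onset"})
for column in sorted(summary):
    print(column, dict(sorted(summary[column].items())))
```

Each column maps to a counter of its values, so a value that occurs in several files is tallied once per occurrence across the whole dataset.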

To use the notebook for your own data:

1. Set the dataset location (`bids_root_path`) to the absolute path to
the root of your BIDS dataset.
2. Set `columns_to_skip` to a list of the columns to exclude
from the summary.
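
Before running the summary, it can help to confirm that the path from step 1 is absolute and that event files are actually found under it. The sketch below uses only the standard library; `find_event_files` is a hypothetical helper, and the temporary directory tree exists only for demonstration. It approximates, but is not, the file search that the notebook performs.

```python
import os
import tempfile


def find_event_files(bids_root_path):
    """Return sorted paths of files ending in '_events.tsv' under the root.

    Hypothetical helper for illustration only (not part of hedtools).
    """
    if not os.path.isabs(bids_root_path):
        raise ValueError(f"Expected an absolute path, got: {bids_root_path}")
    found = []
    for dir_path, _dir_names, file_names in os.walk(bids_root_path):
        for file_name in file_names:
            if file_name.endswith("_events.tsv"):
                found.append(os.path.join(dir_path, file_name))
    return sorted(found)


# Build a tiny throwaway directory tree to demonstrate the helper.
demo_root = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_root, "sub-01", "eeg"))
for name in ("sub-01_task-demo_events.tsv", "sub-01_task-demo_channels.tsv"):
    with open(os.path.join(demo_root, "sub-01", "eeg", name), "w") as fp:
        fp.write("onset\tduration\n")

demo_event_files = find_event_files(demo_root)
print([os.path.basename(path) for path in demo_event_files])
```

An empty result usually means the path points somewhere other than the dataset root.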

For large datasets, be sure to exclude columns such as `onset` and `sample`.
The summary counts how many times each unique value appears across all of the
event files, and in these columns nearly every row has a different value.
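
One way to decide which columns to skip is to compare the number of distinct values in a column with the number of rows. The sketch below is one possible heuristic, not part of the notebook or of hedtools; `suggest_columns_to_skip`, the `max_distinct_ratio` threshold, and the sample rows are all hypothetical.

```python
def suggest_columns_to_skip(rows, max_distinct_ratio=0.5):
    """Return columns whose distinct-value count exceeds a fraction of the rows.

    Hypothetical heuristic for illustration only; the 0.5 default is arbitrary.
    """
    if not rows:
        return []
    distinct = {}
    for row in rows:
        for column, value in row.items():
            distinct.setdefault(column, set()).add(value)
    threshold = max_distinct_ratio * len(rows)
    return sorted(col for col, values in distinct.items() if len(values) > threshold)


# Made-up rows: onset and sample differ on every row, event_type repeats.
rows = [
    {"onset": "0.004", "sample": "1", "event_type": "show_face"},
    {"onset": "0.512", "sample": "128", "event_type": "show_circle"},
    {"onset": "1.020", "sample": "255", "event_type": "show_face"},
    {"onset": "1.497", "sample": "374", "event_type": "show_circle"},
    {"onset": "2.003", "sample": "501", "event_type": "show_face"},
    {"onset": "2.511", "sample": "628", "event_type": "show_circle"},
]
suggested = suggest_columns_to_skip(rows)
print(suggested)
```

Columns flagged this way are candidates for `columns_to_skip`; categorical columns with many legitimate levels may still be worth summarizing.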

The notebook uses the `ColumnSummary` object to handle the summarization.

The example below uses a
[small version](https://github.com/hed-standard/hed-examples/tree/main/datasets/eeg_ds003654s_hed)
of the Wakeman-Henson face-processing dataset available on OpenNeuro as
[ds003645](https://openneuro.org/datasets/ds003645/versions/2.0.0).

In [2]:
import os
from hed.tools import ColumnSummary
from hed.util import get_file_list

# Absolute path to the root of the BIDS dataset
bids_root_path = os.path.abspath(os.path.join(os.path.dirname(os.path.abspath('')),
                                              '../../datasets/eeg_ds003654s_hed'))

# Event files under the dataset root (names ending in _events.tsv)
event_files = get_file_list(bids_root_path, extensions=[".tsv"], name_suffix="_events")

# Columns to leave out of the summary
columns_to_skip = ["onset", "duration", "sample", "stim_file", "trial", "response_time"]

# Count the unique values in the remaining columns and print the result
col_sum = ColumnSummary(skip_cols=columns_to_skip, name=bids_root_path)
col_sum.update(event_files)
col_sum.print(title=f"Summary of event file values with counts for {bids_root_path}")

Summary of event file values with counts for D:\Research\HED\hed-examples\datasets\eeg_ds003654s_hed
  Categorical columns (5):
    event_type (8 distinct values):
      double_press: 1
      left_press: 246
      right_press: 457
      setup_right_sym: 6
      show_circle: 884
      show_cross: 884
      show_face: 878
      show_face_initial: 6
    face_type (4 distinct values):
      famous_face: 294
      n/a: 2478
      scrambled_face: 298
      unfamiliar_face: 292
    rep_lag (12 distinct values):
      1: 216
      10: 49
      11: 57
      12: 40
      13: 17
      14: 15
      15: 3
      6: 1
      7: 5
      8: 10
      9: 21
      n/a: 2928
    rep_status (4 distinct values):
      delayed_repeat: 218
      first_show: 450
      immediate_repeat: 216
      n/a: 2478
    value (15 distinct values):
      0: 884
      1: 884
      13: 150
      14: 64
      15: 78
      17: 150
      18: 82
      19: 66
      256: 246
      3: 6
      4096: 457
      4352: 1
      5: 150
   