## Initial analysis of BIDS dataset events

The first step in annotating a BIDS dataset is to find out what is in the dataset's
event files. This tool traverses the BIDS dataset and gathers the unique
values in each column and the number of times each value appears in the dataset.


#### Set dataset location

The example used in this notebook is a reduced version of an auditory attention shift
dataset, which is available at
[https://github.com/hed-standard/hed-examples/data/eeg_ds0028932](https://github.com/hed-standard/hed-examples/data/eeg_ds0028932).

To run this notebook, you will need to download this dataset and set the `bids_root_path`
variable to the local path of the dataset's root directory.

Alternatively, you can set `bids_root_path` to the full path of your own BIDS dataset.

In [4]:
bids_root_path = "D:/eeg_ds002893s"
print(bids_root_path)

D:/eeg_ds002893s


#### Exclude appropriate columns

Usually, you will want to exclude the columns `onset`, `duration`, `sample`,
and `HED`, because their values are not categorical and counting the unique
values in these columns is not meaningful.

In [5]:
column_names_to_skip = ["onset", "duration", "sample", "HED"]

#### Get a list of event files

The next example recursively traverses the directory tree and produces
a list of the full paths of the dataset event files.

Event files have the extension `.tsv`, and their file names end with `_events`.
You may wish to check the returned list to verify that it contains the event files
you expect to find in the dataset.

In [6]:
from hed.tools.io_utils import get_file_list
event_file_list = get_file_list(bids_root_path, extensions=[".tsv"], name_suffix="_events")
print(f"Bids dataset {bids_root_path} has {len(event_file_list)} event files")

Bids dataset D:/eeg_ds002893s has 6 event files
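
As a cross-check that does not depend on the HED tools, the same search can be sketched with the standard library alone. The helper name `find_event_files` is hypothetical and not part of the HED tools. The sketch assumes only the naming rule stated above: the file name ends with `_events` and the extension is `.tsv`.

```python
from pathlib import Path


def find_event_files(root):
    """Return the sorted full paths of files under root named *_events.tsv.

    Hypothetical helper for illustration; not part of the HED tools.
    """
    root_path = Path(root)
    # rglob walks the directory tree recursively from root_path.
    return sorted(str(p.resolve()) for p in root_path.rglob("*_events.tsv") if p.is_file())
```

Calling `find_event_files(bids_root_path)` and comparing its length with `len(event_file_list)` is a quick way to confirm that both approaches see the same files.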


#### Output unique column values with counts

The HED tools provide functions for analyzing the contents of the event files.
To HED tag the dataset, we need to know which categorical values
appear in each column of the event files. The `ColumnDict` class provides
basic facilities for analyzing the contents and for making template files.

In [6]:
from hed.tools.col_dict import ColumnDict

col_dict = ColumnDict(skip_cols=column_names_to_skip, name=bids_root_path)
for file in event_file_list:
    col_dict.update(file)
col_dict.print()

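
To make concrete what "unique values with counts" means, here is a minimal standard-library sketch of the same idea. It is not how `ColumnDict` is implemented. The function name `count_column_values` and its parameters are hypothetical, and the sketch assumes the event files are tab-separated with a header row.

```python
import csv
from collections import Counter, defaultdict


def count_column_values(file_paths, skip_cols=()):
    """Tally how often each value occurs in each column across TSV files.

    Hypothetical helper for illustration; not part of the HED tools.
    Returns a dict mapping column name -> Counter of value -> count.
    """
    counts = defaultdict(Counter)
    skip = set(skip_cols)
    for path in file_paths:
        with open(path, newline="", encoding="utf-8") as handle:
            reader = csv.DictReader(handle, delimiter="\t")
            for row in reader:
                for column, value in row.items():
                    # DictReader stores surplus fields under the key None; ignore them.
                    if column is None or column in skip:
                        continue
                    counts[column][value] += 1
    return counts
```

For example, `count_column_values(event_file_list, skip_cols=column_names_to_skip)` would return one `Counter` per remaining column, which can be inspected with `most_common()`.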

#### Alternative approach (individual file counts)

The previous example collates the information for all the event files
in a single `ColumnDict` object.

If you want the results for the individual files, you can create a separate
`ColumnDict` for each file and use the `update_dict` method to
create and update a summary `ColumnDict`.

In [None]:
from hed.tools.col_dict import ColumnDict
col_dict_all = ColumnDict(skip_cols=column_names_to_skip,
                          name=f"{bids_root_path} from individual dictionaries")
file_dicts = {}
for file in event_file_list:
    col_dict = ColumnDict(skip_cols=column_names_to_skip, name=file)
    col_dict.update(file)
    file_dicts[file] = col_dict
    col_dict_all.update_dict(col_dict)
col_dict_all.print()
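
The merge step can be pictured as adding one file's per-column counts into a running total. The sketch below shows that idea with plain `collections.Counter` objects. It illustrates the concept only and makes no claim about how `update_dict` works internally. The name `merge_counts` is hypothetical.

```python
from collections import Counter


def merge_counts(summary, file_counts):
    """Add one file's per-column counts into a running summary, in place.

    Both arguments map column name -> Counter of value -> count.
    Hypothetical helper for illustration; not part of the HED tools.
    """
    for column, counter in file_counts.items():
        # Create the column in the summary on first sight, then add the counts.
        summary.setdefault(column, Counter()).update(counter)
    return summary
```

Because the per-file counts are kept separately and only added into the summary, the individual results remain available after the merge.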

#### Output the individual dictionaries

In [None]:
print("\n Individual file counts:")
for key, value in file_dicts.items():
    value.print(title=f"\nSummary for file {key}")