## Initial analysis of a BIDS dataset
This notebook works through the process of analyzing the contents of the event files in
a BIDS dataset and, if desired, creating a template for remapping event codes.

The example used in this notebook is a reduced version of an auditory attention shift
dataset, which is available at
[https://github.com/hed-standard/hed-examples/data/eeg_ds0028932](https://github.com/hed-standard/hed-examples/data/eeg_ds0028932).

To run this notebook, you will need to download this dataset and set the `bids_root_path`
variable to its local path.

### Step 1: Assess the events

These tools traverse the BIDS dataset and gather the unique values in each column of the
event files, along with the number of times each value appears in the dataset. Usually, you
will want to exclude the `onset`, `duration`, `sample`, and `HED` columns, because counting
the unique values in these columns is not informative.

The next example traverses the directory tree to produce a list of event file paths.

```python
from hed.tools import get_file_list

# Set this to the local path of the downloaded dataset.
bids_root_path = "D:/eeg_ds002893s"
#bids_root_path = "G:\AttentionShift\AttentionShiftExperiments"
event_file_list = get_file_list(bids_root_path, types=[".tsv"], suffix="_events")
print(f"Bids dataset {bids_root_path} has {len(event_file_list)} event files")
```

```
Bids dataset D:/eeg_ds002893s has 6 event files
```
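If hedtools is not at hand, the file search can be approximated with the standard library. The sketch below is an assumption about what the search does, not the hedtools implementation: the hypothetical `find_event_files` walks the directory tree with `os.walk` and keeps every file whose name ends in `_events.tsv`. It does not exclude any subdirectories (for example, `derivatives`).

```python
import os

def find_event_files(root_path, suffix="_events", extension=".tsv"):
    """Return the sorted paths of all files under root_path ending in suffix + extension."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root_path):
        for name in filenames:
            if name.endswith(suffix + extension):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)
```

Calling `find_event_files(bids_root_path)` should return a list comparable to `event_file_list`, although the ordering of paths may differ.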


Next, `get_key_counts` is called with the dataset root path (not the `event_file_list` from
the previous example) and returns a dictionary. The keys of this top-level dictionary are
the column names of the event files, and each value is a dictionary mapping the unique values
in that column to their counts. The `skip_cols` argument excludes the columns listed above.

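```python
from hed.tools import get_key_counts, print_columns_info

# Count the unique values in each column, skipping columns whose values are not worth counting.
column_names_to_skip = ["onset", "duration", "sample", "HED"]
count_dicts = get_key_counts(bids_root_path, skip_cols=column_names_to_skip)
# Print each column with its unique values and their counts.
print_columns_info(count_dicts, skip_cols=None)
```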


```
trial_type:
	1: 5456
	2: 6059
	3: 17712

response_time:
	n/a: 29227

stim_file:
	n/a: 29227

value:
	11: 228
	12: 228
	13: 456
	14: 456
	15: 1822
	16: 1820
	21: 252
	22: 252
	23: 505
	24: 504
	25: 2010
	26: 2012
	28: 2
	31: 720
	32: 719
	37: 960
	38: 960
	39: 480
	202: 72
	212: 3
	310: 480
	311: 3838
	312: 3832
	313: 1920
	314: 1915
	1201: 433
	2201: 502
	3201: 1846
```
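The per-column counting can be sketched with `csv.DictReader` and `collections.Counter`. This is a minimal, hypothetical stand-in (`count_column_values` is not a hedtools function) that reads every event file in a list and tallies the values of each column, skipping the same columns as above. Values are kept as the strings that appear in the `.tsv` files.

```python
import csv
from collections import Counter, defaultdict

def count_column_values(file_paths, skip_cols=("onset", "duration", "sample", "HED")):
    """Tally the unique values of each column across a list of TSV event files.

    Returns {column_name: Counter({value: count})}. Columns named in skip_cols
    are ignored. Values are counted as the strings that appear in the files.
    """
    counts = defaultdict(Counter)
    for path in file_paths:
        with open(path, newline="") as f:
            for row in csv.DictReader(f, delimiter="\t"):
                for col, value in row.items():
                    # col is None only for surplus fields in a malformed row; skip those too.
                    if col is None or col in skip_cols:
                        continue
                    counts[col][value] += 1
    return dict(counts)
```

For example, `count_column_values(event_file_list)` should give per-column counts comparable to the listing above.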


To find the number of times each unique combination of column values appears, use the
`KeyDict` class. The following example uses the `event_file_list` created earlier, but
counts the number of times each unique combination of values in the `value` and
`trial_type` columns appears.

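```python
from hed.tools import KeyDict

# Count combinations of values in the value and trial_type columns.
key_counts = KeyDict(["value", "trial_type"])
for file in event_file_list:
    key_counts.update(file)  # add the counts from each event file
key_counts.resort()
key_counts.print()
```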

```
Counts for key [['value', 'trial_type']]:
[11, 1]	228
[12, 1]	228
[13, 1]	456
[14, 1]	456
[15, 1]	1822
[16, 1]	1820
[21, 2]	252
[22, 2]	252
[23, 2]	505
[24, 2]	504
[25, 2]	2010
[26, 2]	2012
[28, 2]	2
[31, 3]	720
[32, 3]	719
[37, 3]	960
[38, 3]	960
[39, 3]	480
[202, 1]	13
[202, 2]	17
[202, 3]	42
[212, 2]	3
[310, 3]	480
[311, 3]	3838
[312, 3]	3832
[313, 3]	1920
[314, 3]	1915
[1201, 1]	433
[2201, 2]	502
[3201, 3]	1846
```
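Counting combinations changes only the key: instead of one counter per column, each row contributes one tuple of values taken from the key columns. The hypothetical `count_combinations` below is a standard-library sketch of that idea, not the `KeyDict` implementation. Because the values are strings, the helper `combination_sort_key` converts them to integers for sorting, so that `202` comes before `1201` as in the listing above.

```python
import csv
from collections import Counter

def count_combinations(file_paths, key_cols):
    """Count how often each unique tuple of values in key_cols occurs across TSV files."""
    counts = Counter()
    for path in file_paths:
        with open(path, newline="") as f:
            for row in csv.DictReader(f, delimiter="\t"):
                counts[tuple(row[col] for col in key_cols)] += 1
    return counts

def combination_sort_key(key):
    """Sort all-integer keys numerically; any other keys sort after them as strings."""
    try:
        return (0, tuple(int(v) for v in key))
    except (TypeError, ValueError):
        return (1, key)

def print_combinations(counts, key_cols):
    """Print each combination and its count as a tab-separated listing."""
    print(f"Counts for key {list(key_cols)}:")
    for key in sorted(counts, key=combination_sort_key):
        print(f"[{', '.join(key)}]\t{counts[key]}")
```

With the files from Step 1, `print_combinations(count_combinations(event_file_list, ["value", "trial_type"]), ["value", "trial_type"])` prints a listing in the same spirit as the one above.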


### Step 2: Make a template for remapping existing event values

If you decide to recode your events, you will need to create a template for doing the
remapping. The following code creates a data frame template and writes it as a `.tsv` file.

In this case, we decided to recode the event files so that each unique combination of
[`value`, `trial_type`] is translated into a specific combination of values in the new
columns [`event_type`, `task_role`, `shift_cond`].

The template therefore consists of five columns, [`value`, `trial_type`, `event_type`,
`task_role`, `shift_cond`], which represent this association. The code writes the template
to `event_template.tsv` in the dataset root directory.

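```python
import os

# One row per unique [value, trial_type] combination, plus the new columns to fill in.
df = key_counts.make_template(additional_cols=["event_type", "task_role", "shift_cond"])
template_file = os.path.join(bids_root_path, "event_template.tsv")
df.to_csv(template_file, sep='\t', index=False)
```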
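The template itself is simply a table with one row per unique key combination and extra columns to be filled in by hand. As a hypothetical standard-library sketch (not the `make_template` implementation), `write_template` writes such a table directly. Filling the new columns with the BIDS missing-value marker `n/a` is an assumption made here for illustration.

```python
import csv

def write_template(combinations, key_cols, new_cols, out_path, placeholder="n/a"):
    """Write a TSV with one row per key combination and placeholder values in new_cols.

    combinations is any iterable of value tuples, e.g. the keys of a Counter.
    Rows are sorted as string tuples.
    """
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerow(list(key_cols) + list(new_cols))
        for combo in sorted(combinations):
            writer.writerow(list(combo) + [placeholder] * len(new_cols))
```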

### Step 3: Fill in the template with the mapping

Filling in the remapping template is part of designing the events.
Once the template has been filled in, the HED remapping tools can be used to perform the remap.
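Applying a filled-in template amounts to a lookup on the key columns. The sketch below is not the HED remapping tool; it is a hypothetical standard-library illustration in which `load_mapping` reads the filled-in template into a dictionary keyed by the (`value`, `trial_type`) tuple, and `remap_events` appends the new columns to a single event file. Writing `n/a` for combinations missing from the template is an assumed default.

```python
import csv

def load_mapping(template_path, key_cols, new_cols):
    """Read a filled-in template into {key tuple: {new_col: value}}."""
    mapping = {}
    with open(template_path, newline="") as f:
        for row in csv.DictReader(f, delimiter="\t"):
            key = tuple(row[c] for c in key_cols)
            mapping[key] = {c: row[c] for c in new_cols}
    return mapping

def remap_events(in_path, out_path, mapping, key_cols, new_cols, missing="n/a"):
    """Copy an event file to out_path, adding new_cols looked up from mapping."""
    with open(in_path, newline="") as f:
        reader = csv.DictReader(f, delimiter="\t")
        fieldnames = list(reader.fieldnames) + [c for c in new_cols if c not in reader.fieldnames]
        rows = list(reader)
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, delimiter="\t", lineterminator="\n")
        writer.writeheader()
        for row in rows:
            # Rows whose key combination is not in the template get the `missing` marker.
            new_values = mapping.get(tuple(row[c] for c in key_cols), {})
            for c in new_cols:
                row[c] = new_values.get(c, missing)
            writer.writerow(row)
```

Writing to a separate `out_path` keeps the original event file untouched until the remapped output has been checked.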