## Find event combinations

This notebook traverses a dataset and gathers the unique combinations of values found in the specified columns of its event files.

The setup requires the following variables for your dataset:

| Variable            | Purpose                                                                      |
|---------------------|------------------------------------------------------------------------------|
| `dataset_root_path` | Full path to root directory of dataset.                                      |
| `output_path`       | Output path for the spreadsheet template. If empty or `None`, print instead. |
| `exclude_dirs`      | List of directories to exclude when constructing file lists.                 |
| `key_columns`       | List of column names in the `events.tsv` files to combine.                   |

The result is a tab-separated file whose columns are the `key_columns` in the order given, preceded by a `key_counts` column that records how many events have each combination. Each row holds one unique combination of `key_columns` values, and the rows are sorted by column from left to right.
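The counting idea behind the template can be sketched with the standard library alone. This is a minimal illustration, not the `KeyMap` implementation: the function name `unique_combinations`, the sample events, and the choice to treat a missing column as `n/a` are all assumptions made for the example.

```python
from collections import Counter


def unique_combinations(rows, key_columns):
    """Count each distinct combination of key-column values.

    rows is an iterable of dicts, one per event. A missing column is
    recorded as 'n/a' (an assumption for this sketch).
    Returns (combination, count) pairs sorted by column, left to right.
    """
    counts = Counter(
        tuple(str(row.get(col, "n/a")) for col in key_columns) for row in rows
    )
    # Combinations are unique, so sorting the items sorts by the tuples.
    return sorted(counts.items())


# Hypothetical events, made up for illustration only
events = [
    {"event_code": "1", "cond_code": "2", "event_type": "hear_word"},
    {"event_code": "1", "cond_code": "1", "event_type": "hear_word"},
    {"event_code": "1", "cond_code": "2", "event_type": "hear_word"},
    {"event_code": "2", "cond_code": "1", "event_type": "look_word"},
]

combos = unique_combinations(events, ["event_code", "cond_code"])
for combo, count in combos:
    print(count, *combo, sep="\t")
```

Because the values are compared as strings here, `"10"` sorts before `"2"`; a numeric sort would need the columns converted first.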

The template can serve as the starting point for remapping columns in the event files to a new coding scheme. It is also useful for deciding whether two columns contain redundant information.
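One way to make the redundancy check concrete is to ask whether each value in one column always appears with the same value in the other, in both directions. The sketch below assumes the template rows have been loaded as a list of dicts; the helper names `determines` and `redundant` and the sample rows are hypothetical and are not part of the HED tools.

```python
def determines(rows, source, target):
    """Return True if every value of `source` pairs with exactly one `target` value."""
    seen = {}
    for row in rows:
        # Remember the first target seen for this source value and
        # fail as soon as a different target shows up.
        if seen.setdefault(row[source], row[target]) != row[target]:
            return False
    return True


def redundant(rows, col_a, col_b):
    """Two columns carry the same information if each determines the other."""
    return determines(rows, col_a, col_b) and determines(rows, col_b, col_a)


# Hypothetical template rows, made up for illustration only
template_rows = [
    {"event_code": "1", "cond_code": "1", "event_type": "hear_word"},
    {"event_code": "1", "cond_code": "2", "event_type": "hear_word"},
    {"event_code": "2", "cond_code": "1", "event_type": "look_word"},
    {"event_code": "2", "cond_code": "2", "event_type": "look_word"},
]

print(redundant(template_rows, "event_code", "event_type"))
print(redundant(template_rows, "event_code", "cond_code"))
```

In these sample rows `event_code` and `event_type` always change together, so one of them could be dropped or derived from the other, while `cond_code` varies independently of `event_code`.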

In [1]:
import os
from hed.tools.analysis.key_map import KeyMap
from hed.tools.util.data_util import get_new_dataframe
from hed.tools.util.io_util import get_file_list

# Variables to set for the specific dataset
dataset_root_path = os.path.realpath('../../../datasets/eeg_ds002893s_hed_attention_shift')
output_path = ''  # Leave empty (or None) to print the template instead of saving it
exclude_dirs = ['stimuli']

# Construct the key map
key_columns = ["event_code", "cond_code", "event_type", "focus_modality", "attention_status", "task_role", "condition"]
key_map = KeyMap(key_columns)

# Construct the unique combinations
event_files = get_file_list(dataset_root_path, extensions=[".tsv"], name_suffix="_events", exclude_dirs=exclude_dirs)
for event_file in event_files:
    df = get_new_dataframe(event_file)
    key_map.update(df)

key_map.resort()
template = key_map.make_template()
if output_path:
    template.to_csv(output_path, sep='\t', index=False, header=True)
else:
    print(template)


    key_counts event_code cond_code       event_type focus_modality  \
0           96          1         1        hear_word       auditory   
1           96          1         2        hear_word         visual   
2          288          1         3        hear_word       auditory   
3           96          2         1        look_word       auditory   
4           96          2         2        look_word         visual   
5          287          2         3        look_word         visual   
6          192          3         1        high_tone       auditory   
7          192          3         2        high_tone         visual   
8          192          4         1        light_bar       auditory   
9          192          4         2        light_bar         visual   
10         767          5         1         low_tone       auditory   
11         767          5         2         low_tone         visual   
12         766          6         1         dark_bar       auditory   
13    