## Preliminary column summary of events

This script produces a preliminary summary of the contents of the event files.
It prints the column names and event count of each event file so
that differences between files can be checked manually.

The script assumes that the data is in BIDS format and that each BIDS events
file of the form `_events.tsv` has a corresponding events file with
suffix `_events_temp.tsv` that was previously dumped from the `EEG.set` files.

There are several standard functions used in this summary:
* [get_file_list](https://hed-examples.readthedocs.io/en/latest/HedInPython.html#getting-a-list-of-files-anchor)
* [make_file_dict](https://hed-examples.readthedocs.io/en/latest/HedInPython.html#dictionaries-of-filenames-anchor)

The setup requires setting the following variables for your dataset:

| Variable | Purpose |
| -------- | ------- |
| bids_root_path | Full path to root directory of dataset.|
| exclude_dirs | List of directories to exclude when constructing file lists. |
| name_indices  | Indices used by make_file_dict to construct a unique key. |
| bids_skip  |  List of column names in the `events.tsv` files to skip in the analysis. |
| eeg_skip | List of column names in the `events_temp.tsv` files from `EEG.event` to skip in the analysis. |
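
To illustrate how `name_indices` selects pieces of a file name to form a key, here is a minimal sketch. It is not the `make_file_dict` implementation: the helper `make_key` is hypothetical and assumes that the base file name is split on underscores and that the selected pieces are rejoined with underscores.

```python
import os


def make_key(file_path, name_indices=(0, 1, 3), separator="_"):
    """Hypothetical helper: build a key from selected pieces of a file name."""
    # Drop the directory and the extension, keeping only the name stem.
    stem = os.path.splitext(os.path.basename(file_path))[0]
    # Split the stem into its underscore-separated pieces.
    pieces = stem.split(separator)
    # Keep only the pieces at the requested positions.
    return separator.join(pieces[i] for i in name_indices)


key = make_key("sub-002_ses-01_task-gonogo_run-10_events.tsv")
print(key)
```

With `name_indices = (0, 1, 3)`, the subject, session, and run pieces are kept, while the task piece and the `events` suffix are dropped.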


In [1]:
import os
from hed.util import get_file_list, make_file_dict

bids_root_path = 'G:/GoNogo/GoNogoWorking'
exclude_dirs = ['code', 'stimuli']
name_indices = (0, 1, 3)
# bids_skip = ['onset', 'duration', 'sample', 'response_time', 'stim_file']
# eeg_skip = ['latency', 'duration', 'sample', 'response_time', 'stim_file']
bids_skip = ['onset',  'response_time', 'stim_file']
eeg_skip = ['latency', 'response_time', 'stim_file']

print(f"Summarizing {bids_root_path}...")
bids_files = get_file_list(bids_root_path, extensions=[".tsv"],
                           name_suffix="_events", exclude_dirs=exclude_dirs)
bids_dict = make_file_dict(bids_files, name_indices=name_indices)
print(f"\n{len(bids_dict)} BIDS style event files")
for key, value in bids_dict.items():
    print(f"{key}: {os.path.basename(value)}")

# Construct the dictionary for EEG.event files
event_files_eeg = get_file_list(bids_root_path, extensions=[".tsv"],
                                name_suffix="_events_temp", exclude_dirs=exclude_dirs)
eeg_dict = make_file_dict(event_files_eeg, name_indices=name_indices)
print(f"\n{len(eeg_dict)} EEG.event style event files")
for key, value in eeg_dict.items():
    print(f"{key}: {os.path.basename(value)}")

Summarizing G:/GoNogo/GoNogoWorking...

350 BIDS style event files
sub-002_ses-01_run-10: sub-002_ses-01_task-gonogo_run-10_events.tsv
sub-002_ses-01_run-11: sub-002_ses-01_task-gonogo_run-11_events.tsv
sub-002_ses-01_run-12: sub-002_ses-01_task-gonogo_run-12_events.tsv
sub-002_ses-01_run-13: sub-002_ses-01_task-gonogo_run-13_events.tsv
sub-002_ses-01_run-1: sub-002_ses-01_task-gonogo_run-1_events.tsv
sub-002_ses-01_run-2: sub-002_ses-01_task-gonogo_run-2_events.tsv
sub-002_ses-01_run-3: sub-002_ses-01_task-gonogo_run-3_events.tsv
sub-002_ses-01_run-4: sub-002_ses-01_task-gonogo_run-4_events.tsv
sub-002_ses-01_run-5: sub-002_ses-01_task-gonogo_run-5_events.tsv
sub-002_ses-01_run-6: sub-002_ses-01_task-gonogo_run-6_events.tsv
sub-002_ses-01_run-7: sub-002_ses-01_task-gonogo_run-7_events.tsv
sub-002_ses-01_run-8: sub-002_ses-01_task-gonogo_run-8_events.tsv
sub-002_ses-01_run-9: sub-002_ses-01_task-gonogo_run-9_events.tsv
sub-002_ses-02_run-10: sub-002_ses-02_task-gonogo_run-10_events.tsv

In [2]:
print("Verifying that both dictionaries have the same keys")
keys_bids = set(bids_dict.keys())
keys_eeg = set(eeg_dict.keys())
list_bids = list(keys_bids.difference(keys_eeg))
list_eeg = list(keys_eeg.difference(keys_bids))
print(f"Bids extra keys {str(list_bids)}")
print(f"EEG extra keys {str(list_eeg)}")

Verifying that both dictionaries have the same keys
Bids extra keys []
EEG extra keys []


In [3]:
from hed.util import get_new_dataframe

print("\nBIDS style event file columns:")
bids_count_dict = {}
for key, file in bids_dict.items():
    df = get_new_dataframe(file)
    bids_count_dict[key] = len(df.index)
    print(f"{key} [{len(df.index)} events]: {str(list(df.columns.values))}")

print("\nEEG.event style event file columns:")
eeg_count_dict = {}
for key, file in eeg_dict.items():
    df = get_new_dataframe(file)
    eeg_count_dict[key] = len(df.index)
    print(f"{key} [{len(df.index)} events]: {str(list(df.columns.values))}")


BIDS style event file columns:
sub-002_ses-01_run-10 [152 events]: ['onset', 'duration', 'sample', 'trial_type', 'response_time', 'stim_file', 'value']
sub-002_ses-01_run-11 [153 events]: ['onset', 'duration', 'sample', 'trial_type', 'response_time', 'stim_file', 'value']
sub-002_ses-01_run-12 [152 events]: ['onset', 'duration', 'sample', 'trial_type', 'response_time', 'stim_file', 'value']
sub-002_ses-01_run-13 [156 events]: ['onset', 'duration', 'sample', 'trial_type', 'response_time', 'stim_file', 'value']
sub-002_ses-01_run-1 [151 events]: ['onset', 'duration', 'sample', 'trial_type', 'response_time', 'stim_file', 'value']
sub-002_ses-01_run-2 [150 events]: ['onset', 'duration', 'sample', 'trial_type', 'response_time', 'stim_file', 'value']
sub-002_ses-01_run-3 [151 events]: ['onset', 'duration', 'sample', 'trial_type', 'response_time', 'stim_file', 'value']
sub-002_ses-01_run-4 [154 events]: ['onset', 'duration', 'sample', 'trial_type', 'response_time', 'stim_file', 'value']
sub-
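
The event counts collected in `bids_count_dict` and `eeg_count_dict` could also be compared programmatically rather than by eye. The following is a minimal sketch under that assumption. The helper `find_count_mismatches` is hypothetical, and the toy dictionaries are made up for illustration, not taken from the dataset.

```python
def find_count_mismatches(counts_a, counts_b):
    """Return {key: (count_a, count_b)} for shared keys whose counts differ."""
    # Only keys present in both dictionaries can be compared.
    shared = sorted(set(counts_a) & set(counts_b))
    return {key: (counts_a[key], counts_b[key])
            for key in shared if counts_a[key] != counts_b[key]}


# Toy inputs for illustration only; these are not real dataset counts.
toy_bids_counts = {"run-1": 151, "run-2": 150, "run-3": 151}
toy_eeg_counts = {"run-1": 151, "run-2": 149, "run-3": 151}

mismatches = find_count_mismatches(toy_bids_counts, toy_eeg_counts)
print(mismatches)
```

In the notebook, the call would be `find_count_mismatches(bids_count_dict, eeg_count_dict)`. An empty result means that every pair of files has the same number of events.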

In [4]:
from hed.tools import ColumnSummary
print('\nBIDS events summary:')
bids_sum_all, bids_sum = ColumnSummary.make_combined_dicts(bids_dict, skip_cols=bids_skip)
bids_sum_all.print()

print('\nEEG.event events summary:')
eeg_sum_all, eeg_sum = ColumnSummary.make_combined_dicts(eeg_dict, skip_cols=eeg_skip)
eeg_sum_all.print()


BIDS events summary:
Summary for column dictionary :
  Categorical columns (4):
    duration (1 distinct values):
      n/a: 52662
    sample (1 distinct values):
      n/a: 52662
    trial_type (2 distinct values):
      response: 17666
      stimulus: 34996
    value (10 distinct values):
      animal_distractor: 6300
      animal_target: 6298
      correct: 16921
      difficult_distractor: 4200
      difficult_target: 4198
      easy_distractor: 4200
      easy_target: 4200
      incorrect: 745
      nonanimal_distractor: 2800
      nonanimal_target: 2800
  Value columns (0):

EEG.event events summary:
Summary for column dictionary :
  Categorical columns (4):
    duration (1 distinct values):
      NaN: 52662
    sample (1 distinct values):
      NaN: 52662
    trial_type (2 distinct values):
      response: 17666
      stimulus: 34996
    type (10 distinct values):
      animal_distractor: 6300
      animal_target: 6298
      correct: 16921
      difficult_distractor: 4200
     