## Preliminary verification of the Attention Shift dataset

This script checks specific relationships in the events files.
The checks are specific to this dataset. The script assumes that the data
is in BIDS format and that each BIDS events file with suffix `_events.tsv`
has a corresponding events file with suffix `_events_temp.tsv` that was previously
dumped from the `EEG.set` files.

To compare the events from the BIDS events files with those
from the `EEG.set` files, the script creates a dictionary mapping `key` to full path
for each type of file. The `key` is of the form `sub-xxx_run-y`, which
uniquely specifies each events file in the dataset. If a dataset contains
multiple sessions for each subject, the `key` should include additional
parts of the file name so that it still uniquely specifies each events file.
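
As a rough illustration of the key construction described above, the sketch below derives a `sub-xxx_run-y` key from a file name by keeping only the `sub-` and `run-` parts. The helpers `make_key` and `make_key_dict` and the example file names are hypothetical and are not part of the `hed` tools; the script itself relies on `make_file_dict` for this step.

```python
import os


def make_key(file_path, entities=("sub", "run")):
    """Build a key such as 'sub-001_run-1' from a BIDS-style file name.

    Hypothetical helper, for illustration only.
    """
    name = os.path.basename(file_path)
    parts = name.split("_")
    # Keep only the name-value pieces whose name is one of the requested entities
    kept = [p for p in parts if "-" in p and p.split("-")[0] in entities]
    return "_".join(kept)


def make_key_dict(file_paths, entities=("sub", "run")):
    """Map each key to its full path, refusing to overwrite duplicate keys."""
    key_dict = {}
    for path in file_paths:
        key = make_key(path, entities)
        if key in key_dict:
            raise ValueError(f"Key {key} is not unique: {path} and {key_dict[key]}")
        key_dict[key] = path
    return key_dict


# Made-up file names for one run of one subject
bids_files = ["sub-001/eeg/sub-001_task-AttentionShift_run-1_events.tsv"]
eeg_files = ["sub-001/eeg/sub-001_task-AttentionShift_run-1_events_temp.tsv"]
bids_keys = make_key_dict(bids_files)
eeg_keys = make_key_dict(eeg_files)
```

For a dataset with multiple sessions, passing `entities=("sub", "ses", "run")` would keep the session part in the key as well.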

The following verifications are performed.

### Verification of relationships

These relationships are checked for each events file:
1. EEG `cond_code` == BIDS `trial_type`
2. EEG `type` == BIDS `value`
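
Each of these checks amounts to comparing how often each value occurs in the paired columns. The sketch below shows that idea on in-memory rows using only the standard library. The function `compare_counts` and the sample rows are hypothetical; the script itself appears to work from per-column value counts rather than raw rows.

```python
from collections import Counter


def compare_counts(eeg_rows, bids_rows, eeg_col, bids_col):
    """Return {value: (eeg_count, bids_count)} for values whose counts differ."""
    eeg_counts = Counter(str(row[eeg_col]) for row in eeg_rows)
    bids_counts = Counter(str(row[bids_col]) for row in bids_rows)
    mismatches = {}
    # Look at values from both sides so a value missing from one file is reported
    for value in sorted(set(eeg_counts) | set(bids_counts)):
        if eeg_counts[value] != bids_counts[value]:
            mismatches[value] = (eeg_counts[value], bids_counts[value])
    return mismatches


# Made-up rows standing in for one EEG dump and its BIDS events file
eeg_rows = [
    {"cond_code": 1, "type": 11},
    {"cond_code": 1, "type": 12},
    {"cond_code": 2, "type": 21},
]
bids_rows = [
    {"trial_type": 1, "value": 11},
    {"trial_type": 1, "value": 12},
    {"trial_type": 3, "value": 21},
]

cond_mismatches = compare_counts(eeg_rows, bids_rows, "cond_code", "trial_type")
type_mismatches = compare_counts(eeg_rows, bids_rows, "type", "value")
```

Taking the union of values from both files means a value present in only one of them shows up as a mismatch with a count of zero on the other side.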

In [None]:
bids_root_path = 'G:/AttentionShift/AttentionShiftExperiments'
bids_skip = ['onset', 'duration', 'sample', 'stim_file', 'HED']
eeg_skip = ['latency', 'urevent', 'usertags', 'sample_offset']

In [None]:
from hed.tools.io_utils import get_file_list, make_file_dict

# Construct the key-to-path dictionaries for the BIDS and EEG events files
files_bids = get_file_list(bids_root_path, extensions=[".tsv"], name_suffix="_events")
files_eeg = get_file_list(bids_root_path, extensions=[".tsv"], name_suffix="_events_temp")
bids_file_dict = make_file_dict(files_bids)
eeg_file_dict = make_file_dict(files_eeg)


In [None]:
print("Checking cond_code == trial_type and type == value")
# Note: eeg_dicts and bids_dicts are assumed to map each key to a column
# summary of the corresponding file; they are not constructed in this section.
for key, file_eeg in eeg_file_dict.items():
    # Get the column dictionaries for corresponding files
    eeg_dict = eeg_dicts[key]
    bids_dict = bids_dicts[key]
    eeg_type_dict = eeg_dict.categorical_info['type']
    eeg_cond_dict = eeg_dict.categorical_info['cond_code']
    bids_value_dict = bids_dict.categorical_info['value']
    bids_trial_type_dict = bids_dict.categorical_info['trial_type']

    # Check that the count of each value matches for cond_code == trial_type
    for key1, count_eeg in eeg_cond_dict.items():
        # A value missing from the BIDS file is treated as a count of 0
        count_orig = bids_trial_type_dict.get(key1, 0)
        if count_eeg != count_orig:
            print(f"EEG key {key} cond_code {key1}: count {count_eeg} != orig trial_type count {count_orig}")

    # Check that the count of each value matches for type == value
    for key1, count_eeg in eeg_type_dict.items():
        count_orig = bids_value_dict.get(key1, 0)
        if count_eeg != count_orig:
            print(f"EEG key {key} type {key1}: count {count_eeg} != orig value count {count_orig}")


### Check EEG relationships

1. EEG `cond_code` + `event_code` == EEG `type` (compared as concatenated strings) unless `cond_code` == 0 or
   `event_code` == 202
2. Events with EEG `event_code` == 255 are listed
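
The first relationship can be read as string concatenation: `type` should equal `cond_code` followed by `event_code`, except when `cond_code` is 0 or `event_code` is 202. A minimal sketch of that rule on made-up rows follows. The names `find_type_disagreements` and `find_event_code` are hypothetical, and the rows are not taken from the dataset.

```python
def find_type_disagreements(rows):
    """Return indices of rows where str(cond_code) + str(event_code) != str(type).

    Rows with cond_code 0 or event_code 202 are exempt from the rule.
    """
    bad = []
    for index, row in enumerate(rows):
        cond = str(row["cond_code"])
        event = str(row["event_code"])
        type_ = str(row["type"])
        if cond == "0" or event == "202":
            continue
        if cond + event != type_:
            bad.append(index)
    return bad


def find_event_code(rows, code="255"):
    """Return indices of rows whose event_code equals the given code."""
    return [i for i, row in enumerate(rows) if str(row["event_code"]) == code]


rows = [
    {"cond_code": 1, "event_code": 11, "type": 111},   # "1" + "11" gives "111": consistent
    {"cond_code": 2, "event_code": 12, "type": 312},   # "2" + "12" gives "212": disagrees
    {"cond_code": 0, "event_code": 255, "type": 255},  # exempt because cond_code is 0
    {"cond_code": 3, "event_code": 202, "type": 202},  # exempt because event_code is 202
]
disagreements = find_type_disagreements(rows)
code_255_rows = find_event_code(rows)
```

Comparing the values as strings avoids surprises when a column is read as integers in one file and as text in another.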

In [None]:
# for key, file in eeg_file_dict.items():
#     df_eeg = get_new_dataframe(file)
#     df_eeg.drop(['sample_offset', 'latency', 'urevent', 'usertags'], axis=1, inplace=True)
#     df_eeg['new_col'] = df_eeg['cond_code'].map(str) + df_eeg['event_code'].map(str)
#     code_255_col = df_eeg['event_code'].map(str) == '255'
#     trial_col = df_eeg['cond_code'].map(str) != '0'
#     pause_col = df_eeg['event_code'].map(str) != '202'
#     type_col = df_eeg['type'].map(str) != '202'
#     comp_col = df_eeg['new_col'].map(str) != df_eeg["type"].map(str)
#     x = comp_col & trial_col & pause_col
#     y = (type_col & ~pause_col) | (~type_col & pause_col)
#     print(f"{key}: has {sum(x)} event_code and {sum(y)} 202 type disagreements :")
#     for index, value in x.items():
#         if value:
#             row = df_eeg.loc[index]
#             print(f"Key {key} index {index}: event_code:{row['event_code']} type:{row['type']} cond_code:{row['cond_code']}")
#
#     for index, value in y.items():
#         if value:
#             row = df_eeg.loc[index]
#             print(f"Key {key} index {index}: event_code:{row['event_code']} type:{row['type']}")
#
#     for index, value in code_255_col.items():
#         if value:
#             row = df_eeg.loc[index]
#             print(f"Key {key} index {index}: event_code:{row['event_code']} type:{row['type']}")
#