## Preliminary restructuring of Sternberg
This script assumes that `sternberg_preliminary_summary.ipynb` has been run
and basic manual editing has occurred.

To compare the events from the BIDS events files with those from the
`EEG.set` files, the script creates a dictionary mapping `key` to full path
for each type of file. The `key` has the form `sub-xxx_run-y`, which
uniquely specifies each event file in the dataset. If a dataset contains
multiple sessions for each subject, the `key` should include additional
parts of the file name so that it still uniquely specifies each file.
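
As a rough illustration of how such a key can be derived from a file name, the sketch below splits the base name on underscores and keeps selected entity pieces. The helper `entity_key` and its `prefixes` parameter are hypothetical and are not part of the HED tools; the script itself relies on `make_file_dict` and `make_key`.

```python
import os


def entity_key(file_path, prefixes=("sub-", "run-")):
    """Build a key such as 'sub-001_run-1' from a BIDS-style file name.

    Hypothetical helper for illustration only; not the HED tools implementation.
    """
    base = os.path.basename(file_path)
    name = base.split(".")[0]          # strip the extension
    pieces = name.split("_")           # BIDS entities are underscore-separated
    # Keep only the pieces that start with one of the requested entity prefixes.
    kept = [p for p in pieces if p.startswith(tuple(prefixes))]
    return "_".join(kept)


key = entity_key("sub-001_ses-01_task-Experiment_run-1_events.tsv")
print(key)  # sub-001_run-1
```

For a dataset with several sessions per subject, passing `prefixes=("sub-", "ses-", "run-")` would keep the session entity in the key as well.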

Because durations are converted from samples to seconds, the script
needs the sampling rate of each file.
It assumes that a two-column `_sampling.tsv` file has been created containing
the keys and sampling rates of the files for easy lookup.
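
A minimal sketch of the lookup and the conversion, using plain pandas on an in-memory stand-in for the sampling file. The column names `name` and `srate` follow the code cells in this notebook; the assumption that `name` already holds the key, and the helper `samples_to_seconds`, are for illustration only.

```python
import io
import pandas as pd

# Toy stand-in for the two-column, tab-separated sampling file.
sampling_text = "name\tsrate\nsub-001_run-1\t250.0\nsub-001_run-2\t250.0\n"
sampling_df = pd.read_csv(io.StringIO(sampling_text), sep="\t")

# Map each key to its sampling rate in Hz.
srate_dict = dict(zip(sampling_df["name"], sampling_df["srate"]))


def samples_to_seconds(n_samples, srate):
    """Convert a duration in samples to seconds (hypothetical helper)."""
    return n_samples / srate


seconds = samples_to_seconds(125, srate_dict["sub-001_run-1"])
print(seconds)  # 0.5
```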

### BIDS events restructuring
Create `events_temp2.tsv` files restructured as follows:
 1. Drop columns `response_time`, `trial_type`, and `stim_file`.
 2. Convert the `duration` column from samples to seconds so that it is compliant with BIDS.
 3. Convert the `value` column to have all string values.
 4. Replace empty entries and entries containing `empty` in the `value` column with `n/a`.
 5. Remove `boundary` events from beginning of files.
 6. Add columns `event_type`, `task_role`, `memory_cond`, `trial`, and
`letter` filled with 'n/a'.
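
The steps above can be sketched with plain pandas on a small made-up table. This is not the script's own implementation, which uses helper functions from the HED tools; the toy data and the sampling rate of 250 Hz are assumptions, and this sketch removes every row whose `value` is `boundary`.

```python
import numpy as np
import pandas as pd

srate = 250.0  # assumed sampling rate in Hz for this toy example

df = pd.DataFrame({
    "onset": [0.0, 1.0, 2.0, 3.0],
    "duration": [0, 125, np.nan, 250],          # in samples
    "value": ["boundary", 12, "empty", np.nan],
    "response_time": ["n/a"] * 4,
    "trial_type": ["n/a"] * 4,
    "stim_file": ["n/a"] * 4,
})

# 1. Drop the unneeded columns.
df = df.drop(columns=["response_time", "trial_type", "stim_file"])

# 2. Convert duration from samples to seconds; missing values become 'n/a'.
dur = pd.to_numeric(df["duration"], errors="coerce") / srate
df["duration"] = dur.astype(object).where(dur.notna(), "n/a")

# 3-4. Make every value a string and map missing, '' and 'empty' to 'n/a'.
value = df["value"].astype(object)
value = value.where(value.notna(), "n/a").astype(str)
df["value"] = value.replace({"": "n/a", "empty": "n/a"})

# 5. Remove boundary events (all of them, in this sketch).
df = df[df["value"] != "boundary"].reset_index(drop=True)

# 6. Add the new columns filled with 'n/a'.
for col in ["event_type", "task_role", "memory_cond", "trial", "letter"]:
    df[col] = "n/a"

print(df)
```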

### EEG.event restructuring
Create `events_temp3.tsv` files restructured as follows:
 1. Drop columns `TTime`, `Uncertainty`, `Uncertainty2`, `ReqTime`, `ReqDur`,
`init_index`, `init_time`, and `urevent`.
 2. Convert the `duration` column from samples to seconds so that it is compliant with BIDS.
 3. Convert the `type` column to have all string values.
 4. The `type` column has some empty slots which are replaced with 'n/a'.
 5. Remove boundary events from beginning of files.

This script assumes that the data is in BIDS format and that each BIDS events
file of the form `_events.tsv` has a corresponding events file with
suffix `_events_temp.tsv` that was previously dumped from the `EEG.set` files.
The new files are saved with the suffixes `_events_temp2.tsv` (BIDS) and `_events_temp3.tsv` (EEG),
respectively.
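
The naming convention can be summarized in a small sketch. The function `restructured_name` is hypothetical and only mirrors how the output names are built from the input names; the example path is made up.

```python
def restructured_name(path):
    """Return the output file name for a BIDS or EEG events file.

    Hypothetical helper: '_events.tsv' -> '_events_temp2.tsv' and
    '_events_temp.tsv' -> '_events_temp3.tsv'.
    """
    stem = path[:-len(".tsv")]
    if path.endswith("_events_temp.tsv"):
        return stem + "3.tsv"
    if path.endswith("_events.tsv"):
        return stem + "_temp2.tsv"
    raise ValueError(f"Not an events file: {path}")


print(restructured_name("sub-001_run-1_events.tsv"))       # sub-001_run-1_events_temp2.tsv
print(restructured_name("sub-001_run-1_events_temp.tsv"))  # sub-001_run-1_events_temp3.tsv
```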


In [1]:
from hed.util import get_file_list, make_file_dict
# Root of the working copy of the dataset
bids_root_path = 'G:/Sternberg/SternbergWorking/'

# Columns to drop from or add to the BIDS events files
bids_delete = ['response_time', 'trial_type', 'stim_file']
bids_add = ['event_type', 'task_role', 'memory_cond', 'trial', 'letter']

# Columns to drop from the events dumped from the EEG.set files
eeg_delete = ['urevent', 'ReqTime', 'ReqDur', 'init_index', 'init_time', 'TTime',
              'event_code', 'cond_code', 'sample_offset', 'Uncertainty', 'Uncertainty2']

# Dictionaries of key to full path for the BIDS events files and the EEG event dumps
files_bids = get_file_list(bids_root_path, extensions=[".tsv"], name_suffix="_events")
bids_file_dict = make_file_dict(files_bids)
files_eeg = get_file_list(bids_root_path, extensions=[".tsv"], name_suffix="_events_temp")
eeg_file_dict = make_file_dict(files_eeg, name_indices=(0, 2))

In [2]:
from hed.util.data_util import get_new_dataframe
from hed.util.io_util import make_key

sampling_file = 'G:/Sternberg/Sternberg_sampling.tsv'
sampling_df = get_new_dataframe(sampling_file)

srate_dict = {}
for index, row in sampling_df.iterrows():
    key = make_key(row['name'], indices=(0, 2))
    srate_dict[key] = row['srate']
    print(f"{key}: {srate_dict[key]}")

sub-001_run-1: 250.0
sub-001_run-2: 250.0
sub-001_run-3: 250.0
sub-001_run-4: 250.0
sub-002_run-1: 250.0
sub-002_run-2: 250.0
sub-002_run-3: 250.0
sub-002_run-4: 250.0
sub-003_run-1: 250.0
sub-003_run-2: 250.0
sub-003_run-3: 250.0
sub-003_run-4: 250.0
sub-004_run-1: 250.0
sub-004_run-2: 250.0
sub-004_run-3: 250.0
sub-004_run-4: 250.0
sub-005_run-1: 250.0
sub-005_run-2: 250.0
sub-005_run-3: 250.0
sub-005_run-4: 250.0
sub-006_run-1: 250.0
sub-006_run-2: 250.0
sub-006_run-3: 250.0
sub-006_run-4: 250.0
sub-007_run-1: 250.0
sub-007_run-2: 250.0
sub-007_run-3: 250.0
sub-007_run-4: 250.0
sub-008_run-1: 250.0
sub-008_run-2: 250.0
sub-008_run-3: 250.0
sub-008_run-4: 250.0
sub-009_run-1: 250.0
sub-009_run-2: 250.0
sub-009_run-3: 250.0
sub-009_run-4: 250.0
sub-010_run-1: 250.0
sub-010_run-2: 250.0
sub-010_run-3: 250.0
sub-010_run-4: 250.0
sub-011_run-1: 250.0
sub-011_run-2: 250.0
sub-011_run-3: 250.0
sub-011_run-4: 250.0
sub-012_run-1: 250.0
sub-012_run-2: 250.0
sub-012_run-3: 250.0
sub-014_run-1

In [3]:
# import pandas as pd
# from hed.util.data_util import add_columns, get_new_dataframe, delete_columns, \
#     delete_rows_by_column, replace_values
#
# print(f"\nBIDS form of the events: {len(files_bids)} files")
# for key, file in bids_file_dict.items():
#     df_bids = get_new_dataframe(file)
#
#     # Delete the specified columns
#     delete_columns(df_bids, bids_delete)
#
#     # Convert the duration to seconds from samples, handling empty or nan
#     srate = srate_dict[key]
#     df_bids['duration'] = pd.to_numeric(df_bids['duration'], errors='coerce')
#     df_bids['duration'] = df_bids['duration'].div(srate)
#     df_bids['duration'] = df_bids['duration'].fillna('n/a')
#
#     # Convert empty value columns to 'n/a'
#     replace_values(df_bids, values=['', 'empty'], column_list=['value'])
#
#     # Remove boundary events
#     delete_rows_by_column(df_bids, 'boundary', column_list=['value'])
#
#     # Add columns with 'n/a'
#     add_columns(df_bids, bids_add)
#
#     # Save the file
#     filename = file[:-4] + "_temp2.tsv"
#     df_bids.to_csv(filename, sep='\t', index=False)


BIDS form of the events: 85 files


In [4]:
# import pandas as pd
# from hed.util.data_util import get_new_dataframe, delete_columns, delete_rows_by_column, replace_values
#
# print(f"\nEEG form of the events: {len(files_eeg)} files")
# for key, file in eeg_file_dict.items():
#     print(f"{key} processing {file}")
#     df_eeg = get_new_dataframe(file)
#
#     # Delete the specified list of columns
#     delete_columns(df_eeg, eeg_delete)
#
#     # Convert the duration to seconds from samples, handling empty or nan
#     srate = srate_dict[key]
#     df_eeg['duration'] = pd.to_numeric(df_eeg['duration'], errors='coerce')
#     df_eeg['duration'] = df_eeg['duration'].div(srate)
#     df_eeg['duration'] = df_eeg['duration'].fillna('n/a')
#
#     # Convert '' and 'empty' type columns to 'n/a'
#     replace_values(df_eeg, values=['', 'empty'], column_list=['type'])
#
#     # Remove rows with 'boundary' in 'type' column
#     delete_rows_by_column(df_eeg, 'boundary', column_list=['type'])
#
#     # Save the files
#     filename = file[:-4] + "3.tsv"
#     df_eeg.to_csv(filename, sep='\t', index=False)


EEG form of the events: 85 files
sub-001_run-1 processing G:/Sternberg/SternbergWorking/sub-001\ses-01\eeg\sub-001_ses-01_task-Experiment_run-1_events_temp.tsv
sub-001_run-2 processing G:/Sternberg/SternbergWorking/sub-001\ses-01\eeg\sub-001_ses-01_task-Experiment_run-2_events_temp.tsv
sub-001_run-3 processing G:/Sternberg/SternbergWorking/sub-001\ses-01\eeg\sub-001_ses-01_task-Experiment_run-3_events_temp.tsv
sub-001_run-4 processing G:/Sternberg/SternbergWorking/sub-001\ses-01\eeg\sub-001_ses-01_task-Experiment_run-4_events_temp.tsv
sub-002_run-1 processing G:/Sternberg/SternbergWorking/sub-002\ses-01\eeg\sub-002_ses-01_task-Experiment_run-1_events_temp.tsv
sub-002_run-2 processing G:/Sternberg/SternbergWorking/sub-002\ses-01\eeg\sub-002_ses-01_task-Experiment_run-2_events_temp.tsv
sub-002_run-3 processing G:/Sternberg/SternbergWorking/sub-002\ses-01\eeg\sub-002_ses-01_task-Experiment_run-3_events_temp.tsv
sub-002_run-4 processing G:/Sternberg/SternbergWorking/sub-002\ses-01\eeg\sub