## Performs the initial summary and checking of the Attention Shift

This script cross-checks the initial `_events_temp1.tsv` files for consistency
after they have been produced by the `as_01_initial_combination.ipynb` script.
If the result contains errors or inconsistencies, additional data cleaning must
be performed on the input files and everything rerun.

## The following should be done:

1. `sub-004` is problematic: `sub-004_run-01` is very short and `sub-004_run-02` has
a `.set` file that is truncated and fails to read. If the underlying data cannot be
recovered, `sub-004` should be removed completely.
2. `sub-020_run-01`, `sub-021_run-01`, and `sub-022_run-01` should be deleted and the
corresponding `run-02` files renamed as `run-01`.  The original `run-01` files only contain
pulse codes.
3. `sub-007_run-01` has 3 events with `event_code` value of 255.
These are the first 3 events of the file and should be removed.
4. `sub-008_run-01` has 5920 events, all with `cond_code` equal to 1.
Of these, 2874 have event codes that should only occur when `cond_code` is 3.
The `cond_code` values should probably be adjusted to reflect this.
5. `sub-005_run-01` has 5 shift event codes in a focus condition.
This should be examined. The bad codes are at event numbers in the interval [266, 270].
6. `sub-015_run-01` has 239 focus event codes in a shift condition.
This should be examined. The bad codes are at event numbers in the interval [4106, 4388].
7. `sub-036_run-02` has 721 focus event codes in a shift condition.
The bad codes are at event numbers in the interval [3883, 4750]. Note: `sub-036_run-01`
is very short.
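
Item 3 above (removing the leading events with code 255) could be done along the following lines. This is only a sketch using plain pandas: the helper name `drop_leading_code` and the example data are made up for illustration and are not part of the HED tools or of this dataset.

```python
import pandas as pd


def drop_leading_code(df, code="255", column="event_code"):
    """Return a copy of df without the initial run of rows whose column equals code.

    Hypothetical helper for illustration; only the leading run is removed,
    so later occurrences of the same code are kept.
    """
    is_code = df[column].astype(str) == code
    # The cumulative product stays 1 only while every row so far matched the code.
    leading = is_code.astype(int).cumprod().astype(bool)
    return df.loc[~leading].reset_index(drop=True)


# Made-up events, not taken from any subject in the dataset.
events = pd.DataFrame({"event_code": [255, 255, 255, 1, 3, 255, 2],
                       "cond_code": [0, 0, 0, 1, 1, 1, 1]})
cleaned = drop_leading_code(events)
print(cleaned["event_code"].tolist())  # [1, 3, 255, 2]
```

Because the index is reset, event numbers in the cleaned file are shifted relative to the original, which matters if other fixes refer to rows by event number.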

### Checking for forbidden codes
       Codes 1 and 2 can appear anywhere.
       Codes 3 through 6 should appear only in the focus condition.
       Codes 7 through 14 should appear only in the shift condition.
       Codes 199, 201, 202, and 255 are not related to condition.
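
The rules above can be written as a small standalone check. The sketch below uses only pandas and assumes, as the cells later in this script do, that `cond_code` values 1 and 2 are focus conditions and 3 is the shift condition. The function name `find_forbidden` and the sample rows are illustrative assumptions.

```python
import pandas as pd

FOCUS_CONDS = ["1", "2"]                            # focus conditions (assumed, as in this script)
SHIFT_CONDS = ["3"]                                 # shift condition
FOCUS_ONLY_CODES = ["3", "4", "5", "6"]             # allowed only in focus
SHIFT_ONLY_CODES = [str(c) for c in range(7, 15)]   # 7 through 14, allowed only in shift


def find_forbidden(df):
    """Return (shift-codes-in-focus, focus-codes-in-shift) as lists of row indices."""
    cond = df["cond_code"].astype(str)
    event = df["event_code"].astype(str)
    shift_in_focus = cond.isin(FOCUS_CONDS) & event.isin(SHIFT_ONLY_CODES)
    focus_in_shift = cond.isin(SHIFT_CONDS) & event.isin(FOCUS_ONLY_CODES)
    return df.index[shift_in_focus].tolist(), df.index[focus_in_shift].tolist()


# Made-up rows: row 1 is a shift code in focus, row 2 is a focus code in shift.
sample = pd.DataFrame({"cond_code": [1, 1, 3, 3, 2, 0],
                       "event_code": [3, 7, 5, 12, 1, 199]})
bad_in_focus, bad_in_shift = find_forbidden(sample)
print(bad_in_focus, bad_in_shift)  # [1] [2]
```

Codes 1, 2, 199, 201, 202, and 255 are in neither list, so they are never flagged by this check.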

In [1]:
from hed.tools.hed_logger import HedLogger
from hed.tools.io_util import get_file_list, make_file_dict
from hed.tools.data_util import get_new_dataframe
from hed.tools.map_util import make_combined_dicts

# Set up the logger
status = HedLogger()

# Make the dictionary of the _events_temp1.tsv files
bids_root_path = r'G:\AttentionShift\AttentionShiftWorking'  # raw string keeps the backslashes literal
bids_files = get_file_list(bids_root_path, extensions=[".tsv"], name_suffix="_events_temp1")
bids_dict = make_file_dict(bids_files, indices=(0, 2))
bids_skip_cols = ['onset', 'duration', 'sample', 'latency', 'sample_offset']

print('\nBIDS events summary:')
bids_dicts_all, bids_dicts = make_combined_dicts(bids_dict, skip_cols=bids_skip_cols)
bids_dicts_all.print()



BIDS events summary:
Summary for column dictionary :
  Categorical columns (2):
    cond_code (4 distinct values):
      0: 306
      1: 59410
      2: 55206
      3: 172519
    event_code (17 distinct values):
      1: 11703
      10: 4702
      11: 37548
      12: 37524
      13: 18778
      14: 18779
      199: 306
      2: 11701
      201: 29028
      202: 927
      3: 9296
      4: 9301
      5: 37171
      6: 37167
      7: 9406
      8: 9408
      9: 4696
  Value columns (0):
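
The summary above reports each column separately, so it does not show which `cond_code` and `event_code` values occur together (for example, whether the 306 rows with `cond_code` 0 are the same rows as the 306 with `event_code` 199). A cross-tabulation is one way to look at this. The sketch below uses `pandas.crosstab` on made-up rows; it is not the output of the HED tools.

```python
import pandas as pd

# Made-up events for illustration only.
events = pd.DataFrame({"cond_code": [0, 1, 1, 3, 3, 3],
                       "event_code": [199, 3, 1, 7, 1, 7]})

# Rows are cond_code values, columns are event_code values, cells are counts.
table = pd.crosstab(events["cond_code"], events["event_code"])
print(table)
```

A nonzero cell where a forbidden pair meets, such as a focus `cond_code` row and a shift-only `event_code` column, points to events that need examination.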


In [2]:
print("Isolating the bad codes:")
for key, file in bids_dict.items():
    df_bids = get_new_dataframe(file)

    focus_cond_mask = df_bids['cond_code'].map(str).isin(['1', '2'])
    shift_cond_mask = df_bids['cond_code'].map(str).isin(['3'])
    focus_event_mask = df_bids['event_code'].map(str).isin(['3', '4', '5', '6'])
    shift_event_mask = df_bids['event_code'].map(str).isin(['7', '8', '9', '10', '11', '12', '13', '14'])
    bad_focus = sum(focus_cond_mask & shift_event_mask)
    if bad_focus:
        status.add(key, f"{key} has {bad_focus} shift event codes in a focus condition")

    bad_shift = sum(shift_cond_mask & focus_event_mask)
    if bad_shift:
        status.add(key, f"{key} has {bad_shift} focus event codes in a shift condition")

    bad_cond_mask = df_bids['cond_code'].map(str).isin(['0'])
    if sum(bad_cond_mask):
        status.add(key, f"{key} has {sum(bad_cond_mask)} cond_code values of 0")

    pulse_code_mask = df_bids['event_code'].map(str).isin(['199'])
    if sum(pulse_code_mask):
        status.add(key, f"{key} has {sum(pulse_code_mask)} event_code values of 199")

    pulse_combo_count = sum(pulse_code_mask & bad_cond_mask)
    if pulse_combo_count:
        status.add(key, f"{key} has {pulse_combo_count} event_code values of 199 with cond_code 0")

    unknown_count = sum(df_bids['event_code'].map(str).isin(['255']))
    if unknown_count:
        status.add(key, f"{key} has {unknown_count} event_code values of 255")

    pause_count = sum(df_bids['event_code'].map(str).isin(['202']))
    if pause_count:
        status.add(key, f"{key} has {pause_count} event_code values of 202")



Isolating the bad codes:


In [3]:
key = 'sub-005_run-01'
df = get_new_dataframe(bids_dict[key])
shift_event_mask_005 = df['event_code'].map(str).isin(['7', '8', '9', '10', '11', '12', '13', '14'])
focus_cond_mask_005 = df['cond_code'].map(str).isin(['1', '2'])
bad_focus_005 = focus_cond_mask_005 & shift_event_mask_005
df_cond_index = df.index[bad_focus_005].tolist()
status.add(key, f"{key} has {len(df_cond_index)} bad focus events at {str(df_cond_index)}", also_print=True)


sub-005_run-01 has 5 bad focus events at [266, 267, 268, 269, 270]


In [4]:
key = 'sub-008_run-01'
df = get_new_dataframe(bids_dict[key])
shift_event_mask_008 = df['event_code'].map(str).isin(['7', '8', '9', '10', '11', '12', '13', '14'])
focus_cond_mask_008 = df['cond_code'].map(str).isin(['1', '2'])
bad_focus_008 = focus_cond_mask_008 & shift_event_mask_008
df_cond_index = df.index[bad_focus_008].tolist()
status.add(key, f"{key} has {len(df_cond_index)} bad focus events in " +
           f"[{min(df_cond_index)}, {max(df_cond_index)}]", also_print=True)

sub-008_run-01 has 2874 bad focus events in [663, 4196]


In [5]:
key = 'sub-015_run-01'
df = get_new_dataframe(bids_dict[key])
focus_event_mask_015 = df['event_code'].map(str).isin(['3', '4', '5', '6'])
shift_cond_mask_015 = df['cond_code'].map(str).isin(['3'])
bad_shift_015 = shift_cond_mask_015 & focus_event_mask_015
df_cond_index = df.index[bad_shift_015].tolist()
status.add(key, f"{key} has {len(df_cond_index)} bad shift events in " +
           f"[{min(df_cond_index)}, {max(df_cond_index)}]", also_print=True)

sub-015_run-01 has 239 bad shift events in [4106, 4388]


In [6]:
key = 'sub-036_run-02'
df = get_new_dataframe(bids_dict[key])
focus_event_mask_036 = df['event_code'].map(str).isin(['3', '4', '5', '6'])
shift_cond_mask_036 = df['cond_code'].map(str).isin(['3'])
bad_shift_036 = shift_cond_mask_036 & focus_event_mask_036
df_cond_index = df.index[bad_shift_036].tolist()
status.add(key, f"{key} has {len(df_cond_index)} bad shift events in " +
           f"[{min(df_cond_index)}, {max(df_cond_index)}]", also_print=True)

sub-036_run-02 has 721 bad shift events in [3883, 4750]
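
The four cells above repeat the same pattern with different keys and code lists. A helper along the following lines could replace them. It is a sketch in plain pandas: the name `bad_event_range` is hypothetical and the demonstration data is made up.

```python
import pandas as pd


def bad_event_range(df, cond_codes, event_codes):
    """Return (count, first_index, last_index) of rows matching both code lists.

    Returns (0, None, None) when nothing matches, rather than failing on
    min() and max() of an empty list as the cells above would.
    """
    mask = (df["cond_code"].astype(str).isin(cond_codes)
            & df["event_code"].astype(str).isin(event_codes))
    idx = df.index[mask]
    if idx.empty:
        return 0, None, None
    return len(idx), int(idx.min()), int(idx.max())


# Made-up events: rows 2 and 4 are focus codes inside the shift condition.
demo = pd.DataFrame({"cond_code": [1, 1, 3, 3, 3, 3],
                     "event_code": [3, 4, 5, 7, 6, 8]})
focus_codes = ["3", "4", "5", "6"]
shift_codes = [str(c) for c in range(7, 15)]
print(bad_event_range(demo, ["3"], focus_codes))       # (2, 2, 4)
print(bad_event_range(demo, ["1", "2"], shift_codes))  # (0, None, None)
```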


In [7]:
status.print_log()

sub-001_run-01
	sub-001_run-01 has 6 event_code values of 202
sub-002_run-01
	sub-002_run-01 has 20 event_code values of 202
sub-003_run-01
	sub-003_run-01 has 20 event_code values of 202
sub-004_run-01
	sub-004_run-01 has 1 event_code values of 202
sub-004_run-02
	sub-004_run-02 has 19 event_code values of 202
sub-005_run-01
	sub-005_run-01 has 5 shift event codes in a focus condition
	sub-005_run-01 has 6 event_code values of 202
	sub-005_run-01 has 5 bad focus events at [266, 267, 268, 269, 270]
sub-006_run-01
	sub-006_run-01 has 20 event_code values of 202
sub-007_run-01
	sub-007_run-01 has 18 event_code values of 202
sub-008_run-01
	sub-008_run-01 has 2874 shift event codes in a focus condition
	sub-008_run-01 has 11 event_code values of 202
	sub-008_run-01 has 2874 bad focus events in [663, 4196]
sub-009_run-01
	sub-009_run-01 has 26 event_code values of 202
sub-010_run-01
	sub-010_run-01 has 18 event_code values of 202
sub-011_run-01
	sub-011_run-01 has 21 event_code values of 2