## Check BIDS stim_file consistency
**In development**

BIDS requires that every file in the `stimuli` directory be referenced
in the `stim_file` column of at least one events file.
This script builds a dictionary of the `stim_file` entries used in the dataset and
identifies the files in the `stimuli` directory that are never referenced so they can be removed.

The `remove_extra_files` flag controls whether the extra stimulus files are deleted. The steps are:

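1. Set the dataset location (`bids_root_path`) to the absolute path of the root of your BIDS dataset.
2. Get a list of the events files in the BIDS dataset.
3. Create a dictionary of the unique entries in the `stim_file` column of the events files.
4. List the files in the `stimuli` directory and, if `remove_extra_files` is `True`, delete those that do not appear in the dictionary.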
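
The steps above can be sketched without the HED tools, using only the Python standard library. This is a minimal illustration, not the notebook's implementation: the helper names `collect_referenced_stimuli` and `find_unused_stimuli` are hypothetical, and the sketch assumes that events files end in `_events.tsv`, that stimuli are matched by base name only, and that `n/a` marks a missing `stim_file` value.

```python
import csv
from pathlib import Path


def collect_referenced_stimuli(bids_root):
    """Return the base names found in the stim_file column of all *_events.tsv files."""
    referenced = set()
    for events_path in Path(bids_root).rglob("*_events.tsv"):
        with open(events_path, newline="", encoding="utf-8") as fp:
            for row in csv.DictReader(fp, delimiter="\t"):
                # A missing column or a short row yields None, which is treated as empty
                value = (row.get("stim_file") or "").strip()
                if value and value != "n/a":  # assumed marker for a missing value
                    referenced.add(Path(value).name)
    return referenced


def find_unused_stimuli(bids_root, extensions=(".bmp",)):
    """Return sorted paths under <bids_root>/stimuli whose base name is never referenced."""
    referenced = collect_referenced_stimuli(bids_root)
    stimuli_dir = Path(bids_root) / "stimuli"
    return sorted(
        path for path in stimuli_dir.rglob("*")
        if path.is_file()
        and path.suffix.lower() in extensions
        and path.name not in referenced
    )
```

Because `find_unused_stimuli` only returns paths, it can serve as a dry run: inspect the returned list first, and call `path.unlink()` on each entry only once you are sure the files should be deleted.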

The example below uses a small version of the Wakeman-Henson face-processing dataset
available on OpenNeuro as ds003654.

In [1]:
import os
from hed.tools import HedLogger
from hed.util import get_file_list, get_new_dataframe


remove_extra_files = False
stim_file_exts = [".bmp"]
dataset = "eeg_ds003654s"
bids_root_path =  os.path.join(os.path.dirname(os.path.abspath('')), os.path.join('../../datasets/', dataset))
bids_root_path = os.path.abspath(bids_root_path)
bids_files = get_file_list(bids_root_path, extensions=[".tsv"], name_suffix="_events")
print(f"Bids root path: {bids_root_path}")
print(f"\nBIDS event files: {len(bids_files)}")

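## Make a dictionary of the unique stim files in the event files
total_events = 0
stim_dict = {}
for file in bids_files:
    df = get_new_dataframe(file)
    total_events += len(df.index)
    # Iterate over the column values directly; Series.items() yields (index, value) pairs
    for value in df['stim_file']:
        stim_dict[os.path.basename(str(value))] = True
print(f"Total events: {total_events} unique stimuli: {len(stim_dict)}")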

## Find out what stim files are actually there
stimuli_path = os.path.abspath(os.path.join(bids_root_path, 'stimuli'))
missing_files = []
extra_files = []
stim_files = get_file_list(stimuli_path, extensions=stim_file_exts)
remove_count = 0
status = HedLogger()
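for file in stim_files:
    basename = os.path.basename(file)
    if basename in stim_dict:
        continue

    # The file is not referenced by any events file
    extra_files.append(file)
    if remove_extra_files:
        os.remove(file)
        remove_count += 1

# Log the summary once, after all stimulus files have been checked
status.add(dataset, f"Found {len(extra_files)} unused files; removed {remove_count} out of {len(stim_files)} files", also_print=True)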

status.print_log()

Bids root path: D:\Research\HED\hed-examples\datasets\eeg_ds003654s

BIDS event files: 6
Total events: 3362 unique stimuli: 346
