## Merge a HED tag spreadsheet with an existing JSON sidecar

This notebook merges a 4-column spreadsheet with an existing JSON sidecar.
Although the merge requires only the spreadsheet and the sidecar,
this notebook constructs a test case by first extracting a JSON
sidecar from the dataset event files and transforming
it into a spreadsheet. The merge is then performed by converting this
spreadsheet back to a sidecar and merging it into the target sidecar.

Keys are specified by an `entities` tuple that lists the BIDS entity names
to include in the key.
BIDS base file names consist of entity *name*-*value* pairs separated
by underscores and followed by an ending *_suffix*.

For a file name `sub-001_ses-3_task-target_run-01_events.tsv`,
the tuple `('sub', 'task')` gives a key of `sub-001_task-target`,
while the tuple `('sub', 'ses', 'run')` gives a key of `sub-001_ses-3_run-01`.
The use of dictionaries of file names with such keys makes it
easier to associate related files in the BIDS naming structure.
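
As a rough illustration of how such keys can be derived, the sketch below builds a key from a file name and an `entities` tuple using only the standard library. The helper name `make_key` is hypothetical and is not part of the HED tools; it simply follows the naming rule described above.

```python
import os


def make_key(file_path, entities):
    """Build a key from selected BIDS entities (hypothetical helper)."""
    base = os.path.basename(file_path)
    # Drop the extension(s), then split into underscore-separated pieces.
    stem = base.split('.', 1)[0]
    pieces = stem.split('_')
    # Map each entity name to its full "name-value" piece.
    # The trailing suffix (e.g. "events") has no hyphen and is skipped.
    pairs = {piece.split('-', 1)[0]: piece for piece in pieces if '-' in piece}
    # Keep the requested entities in the order given by the tuple.
    return '_'.join(pairs[name] for name in entities if name in pairs)


name = 'sub-001_ses-3_task-target_run-01_events.tsv'
print(make_key(name, ('sub', 'task')))        # sub-001_task-target
print(make_key(name, ('sub', 'ses', 'run')))  # sub-001_ses-3_run-01
```

Entities that do not appear in the file name are silently omitted from the key in this sketch; a stricter version could raise an error instead.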

To use this notebook, substitute the specifics of your BIDS
dataset for the following variables:

| Variable | Purpose |
| -------- | ------- |
| json_sidecar_path | Full path to the JSON sidecar used in the merge. |
| spreadsheet_path | Full path to the 4-column HED tag spreadsheet to merge. |
| entities  | Tuple of entity names used to construct unique keys representing filenames. <br>(See [Dictionaries of filenames](https://hed-examples.readthedocs.io/en/latest/HedInPython.html#dictionaries-of-filenames-anchor) for examples of how to choose the keys.)|
| skip_columns  | List of columns in the `events.tsv` files to skip in the analysis. |
| value_columns | List of columns in the `events.tsv` files to annotate<br>as a whole rather than by individual column value. |

**Note:** The merge itself requires only the file names of the spreadsheet
and the JSON sidecar.

For large datasets, you will want to be sure to exclude columns such as
`onset` and `sample`, since the summary produces counts of the number of times
each unique value appears somewhere in an event file.
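
To see why skipping such columns matters, here is a minimal sketch of a per-column value count using only the standard library. The function `count_column_values` and the sample event rows are made up for illustration; they are not the summary code used by the HED tools.

```python
import csv
import io
from collections import Counter


def count_column_values(tsv_text, skip_columns=()):
    """Count how often each unique value appears in each column (illustrative)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter='\t')
    counts = {}
    for row in reader:
        for column, value in row.items():
            if column in skip_columns:
                continue  # Columns such as onset have a distinct value on nearly every row.
            counts.setdefault(column, Counter())[value] += 1
    return counts


# Made-up event rows for illustration only.
events = (
    "onset\tduration\tevent_type\n"
    "0.0\t0.5\tshow_face\n"
    "1.2\t0.5\tshow_face\n"
    "2.4\t0.5\tleft_press\n"
)
summary = count_column_values(events, skip_columns=('onset',))
for column, counter in summary.items():
    print(column, dict(counter))
```

Without `skip_columns`, the `onset` column would contribute one entry per event, so the summary would grow with the number of rows rather than the number of distinct event types.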

When run, the script extracts a JSON sidecar and converts it to a spreadsheet
to illustrate the merge. To merge, the spreadsheet is converted back to a sidecar
and then merged into another JSON sidecar.
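
The round trip described above can be pictured with plain dictionaries. The sketch below is not the HED tools implementation: the function names, the assumed four columns (column name, column value, description, HED tags), the `n/a` placeholder, and the sample tags are all assumptions made for illustration, and this simplified version drops descriptions when converting back.

```python
def sidecar_to_rows(sidecar):
    """Flatten a sidecar dict into 4-tuples (illustrative only)."""
    rows = []
    for column, entry in sidecar.items():
        hed = entry.get('HED')
        if isinstance(hed, dict):    # Categorical column: one row per value.
            levels = entry.get('Levels', {})
            for value, tags in hed.items():
                rows.append((column, value, levels.get(value, 'n/a'), tags))
        elif isinstance(hed, str):   # Value column: a single row.
            rows.append((column, 'n/a', entry.get('Description', 'n/a'), hed))
    return rows


def rows_to_sidecar(rows):
    """Rebuild a sidecar dict from 4-tuples (descriptions are dropped here)."""
    sidecar = {}
    for column, value, _description, tags in rows:
        if value == 'n/a':
            sidecar[column] = {'HED': tags}
        else:
            sidecar.setdefault(column, {'HED': {}})['HED'][value] = tags
    return sidecar


def merge_into(target, source):
    """Merge source HED annotations into target in place (illustrative)."""
    for column, entry in source.items():
        if column not in target:
            target[column] = entry
        elif isinstance(entry['HED'], dict) and isinstance(target[column].get('HED'), dict):
            target[column]['HED'].update(entry['HED'])
        else:
            target[column]['HED'] = entry['HED']


# Sample annotations are illustrative, not taken from the dataset.
example = {
    'event_type': {'HED': {'show_face': 'Sensory-event, Visual-presentation',
                           'left_press': 'Agent-action, Press'}},
    'duration': {'HED': 'Duration/# s'},
}
rows = sidecar_to_rows(example)
target = {}
merge_into(target, rows_to_sidecar(rows))
print(target == example)
```

In this sketch, entries from the spreadsheet overwrite matching entries in the target; how the actual `merge_hed_dict` resolves conflicts should be checked against the HED tools documentation.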

The example below uses a
[small version](https://github.com/hed-standard/hed-examples/tree/main/datasets/eeg_ds003645s_hed)
of the Wakeman-Henson face-processing dataset available on OpenNeuro as
[ds003645](https://openneuro.org/datasets/ds003645/versions/2.0.0).

In [None]:
import os
import json
from hed.tools import df_to_hed, hed_to_df, merge_hed_dict

# Create a test spreadsheet for the merge
bids_sidecar_path = os.path.realpath('../../../datasets/eeg_ds003645s_hed/task-FacePerception_events.json')
with open(bids_sidecar_path) as fp:
    sidecar_json = json.load(fp)
test_spreadsheet = hed_to_df(sidecar_json)

# Use an empty sidecar to merge into, but any valid sidecar will work
target_sidecar = {}
# Must convert the spreadsheet to a sidecar before merging
test_sidecar = df_to_hed(test_spreadsheet, description_tag=False)
merge_hed_dict(target_sidecar, test_sidecar)
merged_json = json.dumps(target_sidecar, indent=4)
print(merged_json)