## Validate HED in a BIDS dataset that uses library schemas

Validating HED annotations as you develop them makes the annotation process much easier and
faster to debug. This notebook validates HED in a BIDS dataset.

The tool creates a `BidsDataset` object, which represents the information from a BIDS
dataset that is relevant to HED, including the `dataset_description.json`,
all `events.tsv` files, and all `events.json` sidecar files.
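
To make concrete which files are involved, here is a minimal sketch that gathers them using only the Python standard library. It is not how `BidsDataset` is implemented: the helper name `collect_hed_files` and the glob patterns are assumptions made for illustration.

```python
from pathlib import Path


def collect_hed_files(bids_root):
    """Return the HED-relevant files under a BIDS root, grouped by kind.

    Hypothetical helper for illustration; BidsDataset does its own discovery.
    """
    root = Path(bids_root)
    description = root / "dataset_description.json"
    return {
        "description": description if description.is_file() else None,
        # Event files and sidecars may sit at any level of the dataset tree.
        "events_tsv": sorted(root.rglob("*_events.tsv")),
        "events_json": sorted(root.rglob("*_events.json")),
    }
```

Calling `collect_hed_files(bids_path)` on a dataset root returns the paths that HED validation has to consider, which can be a quick sanity check that the dataset location is set correctly.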

The `validate` method of `BidsDataset` first validates all of the `events.json` sidecars
and then assembles the relevant sidecars for each `events.tsv` file and validates it.
The validation uses the HED schemas specified in the `HEDVersion` field of the
dataset's `dataset_description.json` file.
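
When library schemas are used, `HEDVersion` is typically a list in which each library entry has the form `namespace:library_version`. The sketch below shows one way to split such a value into its parts. The function `parse_hed_version` is a hypothetical helper, and the sample `HEDVersion` value is an assumed example built from the schema versions used later in this notebook, not the contents of the actual dataset file.

```python
import json


def parse_hed_version(hed_version):
    """Split a HEDVersion value into (namespace, library, version) tuples.

    Hypothetical helper for illustration; the HED tools do their own parsing.
    """
    entries = [hed_version] if isinstance(hed_version, str) else list(hed_version)
    parsed = []
    for entry in entries:
        # An optional "prefix:" gives the namespace used in the annotations.
        namespace, _, rest = entry.rpartition(":")
        # An optional "name_" before the version number names a library schema.
        library, _, version = rest.rpartition("_")
        parsed.append((namespace, library, version))
    return parsed


# Assumed example of a dataset description that uses two library schemas.
description = json.loads(
    '{"Name": "demo", "HEDVersion": ["8.2.0", "sc:score_1.0.0", "test:testlib_1.0.2"]}'
)
for namespace, library, version in parse_hed_version(description["HEDVersion"]):
    print(f"namespace={namespace!r} library={library!r} version={version!r}")
```

An entry with an empty namespace and an empty library name is the base (standard) schema; the other entries are library schemas whose tags carry the given namespace prefix in the annotations.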

The script performs the following steps:

1. Set the dataset location (`bids_path`) to the absolute path of the root of your BIDS dataset.
2. Indicate whether to check for warnings during validation (`check_for_warnings`).
3. Create a `BidsDataset` for the dataset.
4. Validate the dataset and output the issues.
5. Replace the dataset's schema with an explicitly loaded `HedSchemaGroup` and validate again.

**Note:** This validation pertains to event files and HED annotations only. It does not perform a full BIDS validation.

The example below uses a
[small version](https://github.com/hed-standard/hed-examples/tree/main/datasets/eeg_ds003645s_hed)
of the Wakeman-Henson face-processing dataset available on OpenNeuro as
[ds003645](https://openneuro.org/datasets/ds003645/versions/2.0.0).

This dataset has no validation errors. Because `check_for_warnings` is set to `False` in the code below,
warnings are not reported. Setting it to `True` also returns warnings, such as the warning that
the `sample` column does not have any metadata.

For validation of a single `events.json` file during annotation development,
users often find the [online sidecar tools](https://hedtools.ucsd.edu/hed/sidecar)
convenient, but the online tool does not provide complete dataset-level validation.

In [1]:
from hed.errors import get_printable_issue_string
from hed.schema import HedSchemaGroup, load_schema, load_schema_version
from hed.tools import BidsDataset

## Set the dataset location and the check_for_warnings flag
check_for_warnings = False
bids_path = '../../../datasets/eeg_ds003645s_hed_library'
bids = BidsDataset(bids_path)

## Validate the dataset using the information from the dataset_description
print("Handling a BIDS data set that uses dataset_description")
issue_list1 = bids.validate(check_for_warnings=check_for_warnings)
if issue_list1:
    issue_str1 = get_printable_issue_string(issue_list1, "HED validation errors: ", skip_filename=False)
else:
    issue_str1 = "No HED validation errors when dataset_description is used"
print(issue_str1)

## Now validate using a schema group built from a base schema and library schemas loaded from URLs
print("\nNow validating with the prerelease schema.")
base_version = '8.2.0'
library1_url = ("https://raw.githubusercontent.com/hed-standard/hed-schemas/main/"
                "library_schemas/score/hedxml/HED_score_1.0.0.xml")
library2_url = ("https://raw.githubusercontent.com/hed-standard/hed-schemas/main/"
                "library_schemas/testlib/hedxml/HED_testlib_1.0.2.xml")
schema_list = [load_schema_version(xml_version=base_version),
               load_schema(library1_url, schema_namespace="sc"),
               load_schema(library2_url, schema_namespace="test")]
bids.schema = HedSchemaGroup(schema_list)


issue_list2 = bids.validate(check_for_warnings=check_for_warnings)
if issue_list2:
    issue_str2 = get_printable_issue_string(issue_list2, "HED validation errors: ", skip_filename=False)
else:
    issue_str2 = "No HED validation errors when schemas are passed"
print(issue_str2)

Handling a BIDS data set that uses dataset_description
No HED validation errors when dataset_description is used

Now validating with the prerelease schema.
No HED validation errors when schemas are passed
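
The two library URLs in the cell above share a common layout, so they can be generated rather than typed by hand. The sketch below is a small convenience based only on the pattern visible in those two URLs. The helper `library_schema_url` is hypothetical, and the layout is an assumption that may not hold for every library or for prerelease schemas.

```python
# Base of the raw-file URLs used in the cell above.
HED_SCHEMAS_RAW = "https://raw.githubusercontent.com/hed-standard/hed-schemas/main"


def library_schema_url(library, version):
    """Build the raw-GitHub URL of a released library schema XML file.

    Hypothetical helper; the path layout is inferred from the URLs above.
    """
    return (f"{HED_SCHEMAS_RAW}/library_schemas/{library}/hedxml/"
            f"HED_{library}_{version}.xml")


print(library_schema_url("score", "1.0.0"))
print(library_schema_url("testlib", "1.0.2"))
```

The resulting strings can be passed to `load_schema` in place of the hard-coded `library1_url` and `library2_url`.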
