# 2. Creating a convention

A convention can also be defined in a YAML (or JSON) file. It consists of three parts:

1. General information (keywords must start and end with double underscores)
2. Definition of *standard attributes*
3. Definition of *standard attribute validators*

1. General information in the header, where all keywords start and end with double underscores.
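To illustrate the header rule, here is a minimal Python sketch. It splits an already loaded convention dictionary into header keywords (enclosed in double underscores) and standard attribute definitions. The helper `split_convention` and the header keys `__name__` and `__contact__` are hypothetical. They are not taken from the h5RDMtoolbox API or from the example file.

```python
import re

# Header keywords start and end with double underscores, e.g. "__name__"
HEADER_KEY = re.compile(r"^__\w+__$")


def split_convention(data: dict) -> tuple[dict, dict]:
    """Hypothetical helper: split a loaded convention into header and body."""
    header = {k: v for k, v in data.items() if HEADER_KEY.match(k)}
    body = {k: v for k, v in data.items() if not HEADER_KEY.match(k)}
    return header, body


# Hypothetical content, as it might look after loading a YAML file
convention_data = {
    "__name__": "h5rdmtoolbox-tutorial-convention",
    "__contact__": "jane.doe@example.org",
    "contact_id": {"target_method": "__init__", "default_value": "$EMPTY"},
}

header, attributes = split_convention(convention_data)
print(sorted(header))      # ['__contact__', '__name__']
print(sorted(attributes))  # ['contact_id']
```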

2. Definition of standard attributes

A `standard_attribute` has several properties: the `description`, the `target_method` (the method to which the standard attribute can be passed), the `validator`, which validates the input, and the `default_value`. The default value can be `$EMPTY`, which indicates that no default value is set and the attribute is therefore obligatory. `$NONE` indicates that the attribute is optional. Finally, a valid value can be given, which is written even if no input is provided for this standard attribute.
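The effect of the three kinds of `default_value` can be sketched as follows. This is a hypothetical helper (`resolve_attributes`), not the h5RDMtoolbox implementation. The attribute definitions, including the `version` attribute and its default, are made up for illustration.

```python
EMPTY = "$EMPTY"  # no default value: the attribute is obligatory
NONE = "$NONE"    # no default value, but the attribute is optional


def resolve_attributes(definitions: dict, provided: dict) -> dict:
    """Hypothetical helper: apply the default_value rules to user input.

    Keys in `provided` that are not standard attributes are ignored here.
    """
    result = {}
    for name, spec in definitions.items():
        default = spec.get("default_value", NONE)
        if name in provided:
            result[name] = provided[name]
        elif default == EMPTY:
            raise ValueError(f'Standard attribute "{name}" is obligatory but was not provided.')
        elif default != NONE:
            result[name] = default  # a real default is written even without input
        # default == NONE and nothing provided: the attribute is simply not written
    return result


# Hypothetical definitions of three standard attributes
definitions = {
    "contact_id": {"default_value": EMPTY},
    "comment": {"default_value": NONE},
    "version": {"default_value": "1.0"},
}

print(resolve_attributes(definitions, {"contact_id": "id1722"}))
# {'contact_id': 'id1722', 'version': '1.0'}
```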

3. Special type definitions

This part defines, for example, the allowed values of the standard attribute `data_type`.
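Conceptually, such a list of allowed values acts as a validator. A minimal sketch, assuming a hypothetical helper `make_choice_validator`; the three values below are placeholders, not the list from the convention file:

```python
def make_choice_validator(allowed):
    """Hypothetical helper: build a validator accepting only `allowed` values."""
    allowed = frozenset(allowed)

    def validate(value):
        if value not in allowed:
            raise ValueError(f"{value!r} is not one of {sorted(allowed)}")
        return value

    return validate


# Placeholder values; the real ones are defined in the convention file
validate_data_type = make_choice_validator(["experimental", "numerical", "analytical"])
print(validate_data_type("numerical"))  # numerical
```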

The heart of standard attributes is the **validator**. A validator becomes effective whenever a standardized attribute is written or read.
The flowchart below illustrates both procedures for the example of the attribute "units":

1. Writing (`ds.units = 'm/s'`): If "units" is defined in the convention, the validator checks the value. "m/s" is a valid unit, so it is written. Invalid values raise an error.
2. Reading (`print(ds.units)`): The validator is also applied when standardized attributes are read. In this case, however, invalid values only raise warnings, so that the user can still work with the file and fix the issue.

<img src="../../_static/h5RDMtoolbox_standard_attribute_concept.png" alt="h5RDMtoolbox_standard_attribute_concept.png" width="800"/>
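The write/read behavior described above can be mimicked with a Python descriptor. This sketch is not how the h5RDMtoolbox implements it. The `StandardAttribute` and `Dataset` classes are hypothetical, attributes are stored in a plain dictionary, and a small set of unit strings stands in for a real unit check.

```python
import warnings


class StandardAttribute:
    """Hypothetical descriptor: error on invalid write, warning on invalid read."""

    def __init__(self, validator):
        self.validator = validator  # callable returning True for valid values

    def __set_name__(self, owner, name):
        self.name = name

    def __set__(self, obj, value):
        if not self.validator(value):
            raise ValueError(f"Invalid value for {self.name!r}: {value!r}")
        obj.attrs[self.name] = value

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        value = obj.attrs[self.name]
        if not self.validator(value):
            # Only warn, so the user can still open the file and fix the value
            warnings.warn(f"Stored value for {self.name!r} is invalid: {value!r}")
        return value


KNOWN_UNITS = {"m", "s", "m/s"}  # toy stand-in for a real unit parser


class Dataset:
    units = StandardAttribute(lambda v: v in KNOWN_UNITS)

    def __init__(self):
        self.attrs = {}  # plain dict standing in for HDF5 attributes


ds = Dataset()
ds.units = "m/s"               # valid: written
ds.attrs["units"] = "meter/s"  # simulate a value written by another tool
print(ds.units)                # prints meter/s and emits a UserWarning
```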

## Reading a Convention from a file

Let's read the above example file into the class `Convention`. The object representation displays the standard attributes which are expected for the root group (`File.__init__()`), group creation (`Group.create_group()`) and dataset creation (`Group.create_dataset()`).

Note that the standard attributes marked in **bold** are obligatory. The others may or may not be provided during object creation:

```python
from h5rdmtoolbox import convention
import h5rdmtoolbox as h5tbx
```

```python
h5tbx.convention.utils.yaml2json('example_convention.yaml')
```

```
WindowsPath('example_convention.json')
```

```python
cv = h5tbx.convention.from_yaml('example_convention.yaml')
# you may also use .from_json('example_convention.json')
cv
```

```
Convention("h5rdmtoolbox-tuturial-convention")
```

To make the convention effective in this session, it must be enabled. We do this by calling `use()`:

```python
h5tbx.use(cv)
```

```
using("h5rdmtoolbox-tuturial-convention")
```

Now we get an error if we create an HDF5 file without providing the attribute `contact_id`. Because it is a required attribute, it must be provided during file initialization:

```python
try:
    with h5tbx.File() as h5:
        pass
except h5tbx.errors.StandardAttributeError as e:
    print(e)
```

```
Convention "h5rdmtoolbox-tuturial-convention" expects standard attribute "contact_id" to be provided as an argument during file creation.
```


Providing an invalid value raises an error, too. Here, the value of `comment` does not match the format required by the convention:

```python
try:
    with h5tbx.File(contact_id='id1722') as h5:
        h5.create_dataset(name='velocity', shape=(3, 4), units='m/s', comment='velocity field')
except h5tbx.errors.StandardAttributeError as e:
    print(e)
```

```
Validation of "velocity field" for standard attribute "comment" failed.
Expected fields: {'comment': FieldInfo(annotation=str, required=True, metadata=[WrapValidator(func=<function regex_0 at 0x000001DC04E029D0>)])}
Pydantic error: 1 validation error for comment
comment
  Value error, Invalid format for pattern [type=value_error, input_value='velocity field', input_type=str]
    For further information visit https://errors.pydantic.dev/2.5/v/value_error
```
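The error message mentions a regex-based validator (`regex_0`) wrapped in a pydantic `WrapValidator`. The actual pattern in `example_convention.yaml` is not shown here, so the following sketch assumes a hypothetical rule (a comment starts with an uppercase letter and ends with a period) and uses plain `re` instead of pydantic:

```python
import re


def make_regex_validator(pattern: str):
    """Hypothetical helper: accept only strings that fully match `pattern`."""
    compiled = re.compile(pattern)

    def validate(value: str) -> str:
        if not compiled.fullmatch(value):
            raise ValueError(f"Invalid format for pattern {pattern!r}: {value!r}")
        return value

    return validate


# Assumed rule, not the one from the tutorial convention
validate_comment = make_regex_validator(r"[A-Z].*\.")

validate_comment("Velocity field.")     # accepted
try:
    validate_comment("velocity field")  # rejected
except ValueError as err:
    print(err)
```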


With a valid `contact_id` and no invalid attributes, the file is created successfully:

```python
with h5tbx.File(contact_id='id1722') as h5:
    h5.dump()
```

Note that when a file is reopened in read-write mode (`r+`) instead of read-only mode (`r`), standard attributes that already exist in the file are not checked again. So if the HDF5 file was written with another package, e.g. h5py, the stored values might be invalid:

```python
# Reopen the file created above in read-write mode
with h5tbx.File(name=h5.hdf_filename, mode='r+') as h5:
    pass  # "contact_id" is not required here because it already exists in the file
```
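The check for obligatory attributes when reopening a file can be sketched as a set difference: attributes already stored in the file count as present, even if their values are invalid. The helper `missing_required` is hypothetical and only models this rule:

```python
def missing_required(required, provided, existing):
    """Hypothetical helper: obligatory attributes that still must be passed.

    Attributes already stored in the file (`existing`) count as present;
    their values are not validated again.
    """
    return set(required) - set(provided) - set(existing)


required = {"contact_id"}
print(missing_required(required, provided={}, existing={}))  # {'contact_id'}
# An invalid stored value still satisfies the check:
print(missing_required(required, provided={}, existing={"contact_id": "not-an-id"}))  # set()
```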

Note that a convention can also be enabled only **temporarily** using the context manager syntax:

```python
# The convention is only active inside the with-block
with h5tbx.use(cv):
    with h5tbx.File(contact_id='id1722') as h5:
        pass
```

## Importing/Loading an online convention

Conventions are intended to be distributed via online repositories, so the YAML file should be uploaded where it is accessible to all users. The `h5RDMtoolbox` currently favors [Zenodo](https://zenodo.org) repositories, whose advantages are long-term storage and the assignment of a DOI. However, files accessible via a URL can also be downloaded.

A tutorial convention is published [here](https://zenodo.org/record/8276817). Calling `from_zenodo()` creates the convention object:

```python
cv = h5tbx.convention.from_zenodo(doi_or_recid='10156750')
cv
```

```
Convention("h5rdmtoolbox-tutorial-convention")
```
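The parameter name `doi_or_recid` suggests that either a DOI or a Zenodo record ID can be passed. The sketch below shows one way to normalize both to a record ID and build the URL of Zenodo's public records API. The helper names are hypothetical, and this is not the toolbox's own download code:

```python
import re

# Zenodo DOIs have the form 10.5281/zenodo.<record id>
_ZENODO_DOI = re.compile(r"(?:https?://doi\.org/)?10\.5281/zenodo\.(\d+)$")


def zenodo_record_id(doi_or_recid: str) -> str:
    """Hypothetical helper: return the record ID for a DOI or record ID."""
    text = str(doi_or_recid).strip()
    if text.isdigit():
        return text
    match = _ZENODO_DOI.match(text)
    if match is None:
        raise ValueError(f"Not a Zenodo DOI or record ID: {doi_or_recid!r}")
    return match.group(1)


def zenodo_api_url(doi_or_recid: str) -> str:
    """Hypothetical helper: URL of the record in Zenodo's REST API."""
    return f"https://zenodo.org/api/records/{zenodo_record_id(doi_or_recid)}"


print(zenodo_api_url("10156750"))
print(zenodo_api_url("10.5281/zenodo.10156750"))
```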

## Effect of enabling a convention

The convention above assigns certain standard attributes to certain methods. For example, "data_type" is to be provided when an HDF5 file is created. When the convention is enabled, the **signatures of the respective methods change**. To demonstrate this, we first look at one of the standard attributes and then implement a small function that prints all parameters of a given method, which lets us inspect the effect of the convention on the `__init__` method:

```python
cv.properties[h5tbx.Dataset]['standard_name']
```

```
<StandardAttribute@create_dataset[keyword/optional]("standard_name"): default_value="None" | "Standard name of the dataset. If not set, the long_name attribute must be given.">
```

```python
import inspect

def print_method_parameters(method):
    """Print the parameters of `method`; standard attributes of File.__init__ are printed in bold."""
    # standard attributes that the current convention assigns to File.__init__
    standard_attrs = h5tbx.convention.get_current_convention().methods[h5tbx.File].get('__init__', {}).keys()
    print(f'\nParameters for "{method.__name__}":')
    for param in inspect.signature(method).parameters.values():
        if param.name == 'self':
            continue
        if param.name in standard_attrs:
            print(f'  - {h5tbx._repr.make_bold(param.name)}')
        else:
            print(f'  - {param.name}')

print('no convention: ')
h5tbx.use(None)
print_method_parameters(h5tbx.File.__init__)

print(f'\n------------\nwith convention {cv.name}: (standard attributes are made bold)')
h5tbx.use(cv)
print_method_parameters(h5tbx.File.__init__)
```

```
no convention: 

Parameters for "__init__":
  - name
  - mode
  - attrs
  - kwargs

------------
with convention h5rdmtoolbox-tutorial-convention: (standard attributes are made bold)

Parameters for "__init__":
  - name
  - mode
  - attrs
  - data_type
  - standard_name_table
  - comment
  - contact
  - references
  - kwargs
```

In the terminal, the parameters `data_type`, `standard_name_table`, `comment`, `contact` and `references`, which were added by the convention, are printed in bold.
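One way to change the signature that `inspect.signature()` reports is to set the `__signature__` attribute of a wrapper function. The following sketch does this for a toy stand-in of `File.__init__`. The helper `with_standard_attributes` is hypothetical, and this is not necessarily the mechanism the h5RDMtoolbox uses:

```python
import inspect


def with_standard_attributes(func, names):
    """Hypothetical helper: wrap `func` and advertise `names` as keyword-only parameters."""
    sig = inspect.signature(func)
    params = list(sig.parameters.values())
    # **kwargs must stay last, so insert the new parameters right before it
    var_kw = [p for p in params if p.kind is inspect.Parameter.VAR_KEYWORD]
    others = [p for p in params if p.kind is not inspect.Parameter.VAR_KEYWORD]
    extra = [inspect.Parameter(n, inspect.Parameter.KEYWORD_ONLY, default=None) for n in names]

    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)  # extra attributes end up in func's **kwargs

    wrapper.__name__ = func.__name__
    wrapper.__doc__ = func.__doc__
    wrapper.__signature__ = sig.replace(parameters=others + extra + var_kw)
    return wrapper


def file_init(self, name=None, mode=None, attrs=None, **kwargs):
    """Toy stand-in for File.__init__ without a convention."""


patched = with_standard_attributes(file_init, ["data_type", "contact"])
print([p for p in inspect.signature(patched).parameters if p != "self"])
# ['name', 'mode', 'attrs', 'data_type', 'contact', 'kwargs']
```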


```python
h5tbx.use(None)  # fall back to the default convention
```

```
using("h5py")
```