# Introduction to Conventions

The standardization of attributes is important in order to reliably find specific datasets in a file. Typos and attribute names based on personal preference must be avoided. Even if humans may still be able to understand such data, automatic exploration and processing of it becomes impossible.

The toolbox provides a solution based on standardized HDF5 attributes, which is explained in this chapter.

<!-- 

The `h5RDmtoolbox` lets you specify rules for those "special attributes". We will call them `standard attributes` and a collection of it a `convention` (More on this [here](conventions.ipynb)).

<!-- If an attribute is addressed by the user, e.g. the attribute `units`, and a standard attribute implementation exists for this name, then the value is processed by the respective rule and the attribute is set or an error is raised in case of a invalid input.

Standard attributes can be made required **during dataset creation,** for instance. This enforces users to pass certain meta information and validates it at the same time. Consequently, data becomes re-usable and explorable.

Additionally, so-called [layouts](./layouts.ipynb) can be defined, too. They are used to specify the content of an HDF5 file after it has been written. This concept applies best during file exchange, as the layout validates if a file is complete and meets the expectation of the project or collaborative user. -->

<!-- 
## Concept
The figure below illustrates the general concept. Standard attributes are defined by the user and added to a convention. A registered convention is activated by calling `.use(<name of convention>)`. By doing so, the signature of the methods `create_dataset`, `create_group` and `__init__` are modified according to the generated standard attributes. Moreover, the docstring will be updated, too, as we will see later.


<img src=concept_of_std_attrs.png width=800px>


Let's see how this is done in practice: -->

In [1]:
import h5rdmtoolbox as h5tbx

## Definition of Standard Attributes in a Convention (YAML file)

Standard attributes are defined in YAML files and can be read by the class `Convention`. Each entry of the YAML file corresponds to an object of class `StandardAttribute`.

The example convention file `example_convention.yaml` first contains some lines of general information (name, contact, etc.), followed by the definitions of the standard attributes (in this case a single one, called "data_type").

Let's read this file into the class `Convention`:

In [2]:
cv = h5tbx.conventions.from_yaml('example_convention.yaml')
cv

Convention("h5rdmtoolbox-tuturial-convention")
contact: https://orcid.org/0000-0001-8729-0482
  File.__init__():
    * data_type:
		Type of data in file. Can be numerical or experimental

In order to make the convention effective in this session, it must be enabled. We do this by calling `use()`:

In [3]:
h5tbx.use(cv)  # or h5tbx.use('h5rdmtoolbox-tuturial-convention') or cv.use()

using("h5rdmtoolbox-tuturial-convention")

Now, we will get an error if we create an HDF5 file without providing the attribute "data_type". Because it is a required attribute, it must be provided during file initialization:

In [4]:
try:
    with h5tbx.File() as h5:
        pass
except h5tbx.errors.StandardAttributeError as e:
    print(e)

The standard attribute "data_type" is required but not provided.


Providing a wrong value raises an error, too:

In [5]:
try:
    with h5tbx.File(data_type='observational') as h5:
        pass
except h5tbx.errors.StandardAttributeError as e:
    print(e)

Setting "observational" for standard attribute "data_type" failed. Original error: The value "observational" is not in ['experimental', 'numerical']. Expecting one of these: ['experimental', 'numerical']


Passing one of the valid values works:

In [6]:
with h5tbx.File(data_type='numerical') as h5:
    h5.dump()

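If a default value is provided for "data_type", it is no longer required by the `__init__` method. Instead of editing the YAML file, we can achieve the same in this session by calling `make_optional()` and re-enabling the convention: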

In [7]:
cv.properties[h5tbx.File]['data_type'].make_optional()
h5tbx.use(None)
h5tbx.use(cv)
cv

Convention("h5rdmtoolbox-tuturial-convention")
contact: https://orcid.org/0000-0001-8729-0482
  File.__init__():
    * data_type (default=None):
		Type of data in file. Can be numerical or experimental

In [8]:
with h5tbx.File() as h5:
    h5.dump()

In [9]:
try:
    with h5tbx.File() as h5:
        h5.data_type='observational'
except h5tbx.errors.StandardAttributeError as e:
    print(e)

Setting "observational" for standard attribute "data_type" failed. Original error: The value "observational" is not in ['experimental', 'numerical']. Expecting one of these: ['experimental', 'numerical']


In [10]:
with h5tbx.File() as h5:
    h5.data_type='experimental'
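
The behavior in the last two cells, where a value is checked at the moment it is assigned, can be pictured with a plain Python property. The following is only a minimal sketch under that assumption: the class `FileSketch` and the constant `ALLOWED_DATA_TYPES` are hypothetical names and do not reflect how the toolbox implements standard attributes internally.

```python
# Hypothetical illustration, not part of h5rdmtoolbox.
ALLOWED_DATA_TYPES = ('experimental', 'numerical')


class FileSketch:
    """Stand-in for a file object whose attribute is validated on assignment."""

    def __init__(self):
        self._attrs = {}  # plays the role of the HDF5 attribute storage

    @property
    def data_type(self):
        # Returns None if the attribute has not been set yet
        return self._attrs.get('data_type')

    @data_type.setter
    def data_type(self, value):
        # Validate first, so that an invalid value is never stored
        if value not in ALLOWED_DATA_TYPES:
            raise ValueError(
                f'The value "{value}" is not in {list(ALLOWED_DATA_TYPES)}'
            )
        self._attrs['data_type'] = value


f = FileSketch()
f.data_type = 'experimental'       # accepted and stored
try:
    f.data_type = 'observational'  # rejected, the stored value is unchanged
except ValueError as e:
    print(e)
print(f.data_type)
```

Because the check runs before the value is stored, a failed assignment leaves the previously written attribute untouched.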

## The Standard Attribute

Suppose we would like users to write the attribute "data_type" to the root group of each file. There are two valid values for the attribute: 'numerical' or 'experimental'.

In [None]:
data_type = h5tbx.conventions.StandardAttribute(
    name='data_type',
    validator={'$in': ['numerical', 'experimental']},
    target_methods='__init__',
    description='Type of data. Can be numerical or experimental'
)
data_type

In [None]:
data_type.to_dict()

## Standard attributes

We now define two standard attributes. The first one is called "units" and becomes relevant when a user creates a new dataset. The second one is called "comment" and can be passed during file, dataset or group creation. This attribute is optional, while "units" is mandatory.

**The comment attribute:**
The module `h5tbx.conventions` provides the class `StandardAttribute`. It requires the `name`, a `validator`, information about where to apply the standard attribute (`target_methods`) and a `description`:

In [None]:
comment = h5tbx.conventions.StandardAttribute(
    name='comment',
    validator={'$regex': "^[A-Z].*$"},
    target_methods=('__init__', 'create_dataset','create_group'),
    description='Additional information about the file'
)
comment

The `validator` used here is a regular expression. This means that the user input is matched against the given pattern ('^[A-Z].*$'), so the comment must start with a capital letter.
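
To get a feeling for what the pattern accepts, it can be tried out with Python's built-in `re` module. This only illustrates the regular expression itself; the helper `is_valid_comment` is a hypothetical name, and whether the toolbox applies the pattern in exactly this way is an assumption.

```python
import re

# The pattern from the "comment" standard attribute:
# the first character must be an upper-case letter from A to Z
pattern = re.compile(r'^[A-Z].*$')


def is_valid_comment(value):
    """Hypothetical helper: True if the string matches the pattern."""
    return pattern.match(value) is not None


for candidate in ('My first file', '123', 'hello'):
    print(f'{candidate!r}: {is_valid_comment(candidate)}')
```

Only the first candidate passes, since the other two start with a digit and a lower-case letter, respectively.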

Another attribute worth standardizing is "contact". It is mandatory for the root group and holds one or multiple researcher IDs (ORCID iDs). To check whether an ORCID iD is valid, the built-in validator "$orcid" is used.

For the "units" attribute, we use another already implemented `validator`, namely "$pintunit":

In [None]:
units = h5tbx.conventions.StandardAttribute(
    name='units',
    validator='$pintunit',
    target_methods='create_dataset',
    description='The physical units of the dataset'
)
units

### Validators

The following `validators` are available:

In [None]:
list(h5tbx.conventions.standard_attributes.av_validators.keys())

Some validators **require reference values**. One example is the `$in`-validator, for which a list of expected values must be provided. To find out how a validator is used, call the help for the respective validator:

In [None]:
help(h5tbx.conventions.standard_attributes.av_validators['$in'])
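
Conceptually, a validator with reference values can be thought of as a function of the user input and the reference. The sketch below mirrors the `{'$in': [...]}` and `{'$regex': ...}` notation used earlier in plain Python. The dictionary `VALIDATORS` and the function `check` are made-up names for illustration and are not the toolbox's implementation.

```python
import re

# Hypothetical lookup table: validator name -> function(value, reference)
VALIDATORS = {
    '$in': lambda value, reference: value in reference,
    '$regex': lambda value, reference: re.match(reference, value) is not None,
}


def check(value, spec):
    """Validate `value` against a one-entry spec such as {'$in': [...]}."""
    name, reference = next(iter(spec.items()))
    return VALIDATORS[name](value, reference)


data_type_spec = {'$in': ['numerical', 'experimental']}
print(check('numerical', data_type_spec))
print(check('observational', data_type_spec))
print(check('Hello', {'$regex': '^[A-Z].*$'}))
```

Keeping the reference values in the specification rather than in the function lets one validator serve many attributes.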

## Conventions: Enable standard attributes

Conventions contain one or multiple standard attributes. Below, we create one with the previously defined attributes:

In [None]:
# provide a name and an ORCID for the creator(s) of the convention:
my_convention = h5tbx.conventions.Convention('my_convention',
                                            contact='https://orcid.org/0000-0001-8729-0482')
my_convention.add(comment)
my_convention.add(units)

my_convention.register()  # only now can we enable it

h5tbx.use('my_convention')  # enable the convention

# print an overview:
my_convention

Let's check whether the signatures of `__init__`, `create_group` and `create_dataset` have changed:

In [None]:
import inspect

methods = (h5tbx.File.__init__, h5tbx.Group.create_group, h5tbx.Group.create_dataset)

for method in methods:
    print(f'\nParameters for "{method.__name__}":')
    for param in inspect.signature(method).parameters.values():
        if param.name != 'self':
            # highlight the parameters added by the convention
            if param.name in ('units', 'comment'):
                print(f'  - {h5tbx._repr.make_bold(param.name)}')
            else:
                print(f'  - {param.name}')

The docstrings of the methods also changed. Call `help()` on them:

In [None]:
help(h5tbx.File.__init__)

In [None]:
help(h5tbx.Group.create_dataset)
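
How a signature and a docstring can be changed at runtime is sketched below with the standard library only. This is a generic Python technique shown for illustration, not the mechanism the toolbox is confirmed to use; the function `add_keyword_parameter` and the stand-in `create_dataset` are hypothetical.

```python
import inspect


def add_keyword_parameter(func, name, description):
    """Advertise an extra keyword-only parameter in signature and docstring."""
    sig = inspect.signature(func)
    params = list(sig.parameters.values())
    new_param = inspect.Parameter(
        name, inspect.Parameter.KEYWORD_ONLY, default=None
    )
    # A keyword-only parameter must be placed before **kwargs
    var_kw = [p for p in params if p.kind is inspect.Parameter.VAR_KEYWORD]
    others = [p for p in params if p.kind is not inspect.Parameter.VAR_KEYWORD]
    func.__signature__ = sig.replace(parameters=others + [new_param] + var_kw)
    func.__doc__ = (func.__doc__ or '') + f'\n{name}: {description}'
    return func


def create_dataset(name, data=None, **kwargs):
    """Stand-in for a dataset creation method."""
    return name, data, kwargs


add_keyword_parameter(create_dataset, 'units', 'The physical units of the dataset')
print(inspect.signature(create_dataset))
# (name, data=None, *, units=None, **kwargs)
```

Tools such as `help()` and `inspect.signature()` read the `__signature__` attribute, so they report the added parameter, while the function body itself is unchanged and still receives `units` through `**kwargs`.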

## Working with the convention 

First we test the comment attribute:

A wrong or missing input will raise an error:

In [None]:
try:
    with h5tbx.File(comment='123') as h5:
        h5.dump()
except Exception as e:
    print(e)

Unexpected parameters passed to the methods will raise an error, too:

In [None]:
try:
    with h5tbx.File(contact='https://orcid.org/0000-0001-8729-0482',
                    comment='123') as h5:
        h5.dump()
except Exception as e:
    print(e)

This is correct:

In [None]:
with h5tbx.File(comment='My first file') as h5:
    h5.dump()

Next we test the units attribute:

In [None]:
with h5tbx.File(comment='My first file') as h5:
    h5.create_dataset('velocity', data=1.3, units='m/s', comment='Hello')
    h5.dump()

## Import a convention

Conventions are defined for a project. Standard attributes can be defined in a single or in multiple YAML files. Those files can be loaded into the current session from local storage or from a remote web resource. We first take a look at loading a local definition of standard attributes.

### Load a local convention

In [None]:
from h5rdmtoolbox import tutorial

In [None]:
convention_filename = tutorial.get_standard_attribute_yaml_filename()

local_cv = h5tbx.conventions.Convention.from_yaml(convention_filename)
local_cv.register()
local_cv

In [None]:
h5tbx.use(local_cv)

In [None]:
with h5tbx.File(contact='https://orcid.org/0000-0001-8729-0482', mode='r') as h5:
    h5.dump()

### Load a remote convention

This is generally done only once, or a few times if the convention is revised. Such a convention therefore needs a version or, even better, a persistent identifier like a DOI.

The toolbox suggests using Zenodo as a repository. The following shows how a convention that was uploaded to Zenodo can be integrated into the user's workflow.

The example convention is registered on Zenodo under the identifier 8276817, which is passed as `doi` in the cell below. It contains multiple \*.yaml-files.

In [None]:
cv = h5tbx.conventions.from_zenodo(doi='8276817')
h5tbx.use(cv)  # enable the downloaded convention

## List of available conventions

Conventions can be registered. Each convention is the collection of standard attributes for the respective HDF5 objects. The registered conventions can be obtained by calling `conventions.get_registered_conventions()`:

In [None]:
h5tbx.conventions.get_registered_conventions()
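
The interplay of registering a convention, enabling it by name and disabling it with `None` can be pictured as a simple name-based lookup. The sketch below is an assumption about the general idea only; `ConventionSketch`, `registered`, `state` and this `use` function are hypothetical and do not reproduce the toolbox's code.

```python
registered = {}            # maps convention name -> convention object
state = {'current': None}  # the convention in use; None means "no convention"


class ConventionSketch:
    """Hypothetical stand-in for a convention that only carries a name."""

    def __init__(self, name):
        self.name = name

    def register(self):
        # Make the convention discoverable by its name
        registered[self.name] = self


def use(name):
    """Enable a registered convention by name, or disable with None."""
    if name is None:
        state['current'] = None
        return
    if name not in registered:
        raise ValueError(f'Convention "{name}" is not registered')
    state['current'] = registered[name]


cv_sketch = ConventionSketch('my_convention')
try:
    use('my_convention')   # fails, the convention is not registered yet
except ValueError as e:
    print(e)
cv_sketch.register()
use('my_convention')       # works after registration
print(sorted(registered))
```

This also illustrates why a convention has to be registered before it can be enabled by its name.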