# Introduction to Conventions

Standardized attributes are important for reliably finding specific datasets in a file. Typos and attribute names based on personal preferences must therefore be avoided. Humans may still be able to understand such data, but automatic exploration and processing of it becomes impossible.

The toolbox provides a solution based on standardized HDF5 attributes, which is explained in this chapter.

<!-- 

The `h5RDMtoolbox` lets you specify rules for those "special attributes". We will call them `standard attributes` and a collection of them a `convention` (more on this [here](conventions.ipynb)).

<!-- If an attribute is addressed by the user, e.g. the attribute `units`, and a standard attribute implementation exists for this name, then the value is processed by the respective rule and the attribute is set, or an error is raised in case of an invalid input.

Standard attributes can, for instance, be made required **during dataset creation**. This forces users to pass certain meta information and validates it at the same time. Consequently, data becomes reusable and explorable.

Additionally, so-called [layouts](./layouts.ipynb) can be defined. They specify the expected content of an HDF5 file after it has been written. This concept is most useful during file exchange, as the layout validates whether a file is complete and meets the expectations of the project or the collaborating users. -->

<!-- 
## Concept
The figure below illustrates the general concept. Standard attributes are defined by the user and added to a convention. A registered convention is activated by calling `.use(<name of convention>)`. By doing so, the signatures of the methods `create_dataset`, `create_group` and `__init__` are modified according to the generated standard attributes. Moreover, the docstrings are updated, too, as we will see later.


<img src=concept_of_std_attrs.png width=800px>


Let's see how this is done in practice: -->

In [1]:
import h5rdmtoolbox as h5tbx

## Definition of Standard Attributes in a Convention (YAML file)

Standard attributes are defined in YAML files and can be read by the class `Convention`. Each entry of the YAML file corresponds to an object of class `StandardAttribute`.

A convention YAML file (here `example_convention.yaml`) first contains some general information (name, contact, etc.), followed by the definitions of the standard attributes (in this case only one, called "data_type").
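To make this structure more concrete, here is a minimal sketch of how such a convention could be modelled in plain Python. It is not the YAML file itself: the classes `ConventionSketch` and `StandardAttributeSketch` and their fields are hypothetical and do not mirror the actual `h5rdmtoolbox` classes or its YAML schema. The values (name, contact, description, allowed values) are taken from the outputs shown in this notebook.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class StandardAttributeSketch:
    """Hypothetical, simplified standard attribute (not the h5rdmtoolbox class)."""
    name: str
    description: str
    target_method: str                   # method the attribute is attached to
    allowed: Optional[List[str]] = None  # reference list of allowed values, if any
    required: bool = True


@dataclass
class ConventionSketch:
    """Hypothetical, simplified convention: general info plus standard attributes."""
    name: str
    contact: str
    standard_attributes: List[StandardAttributeSketch] = field(default_factory=list)

    def for_method(self, target_method: str) -> Dict[str, StandardAttributeSketch]:
        """Return the standard attributes attached to one method, keyed by name."""
        return {sa.name: sa for sa in self.standard_attributes
                if sa.target_method == target_method}


cv_sketch = ConventionSketch(
    name="h5rdmtoolbox-tuturial-convention",
    contact="https://orcid.org/0000-0001-8729-0482",
    standard_attributes=[
        StandardAttributeSketch(
            name="data_type",
            description="Type of data in file. Can be numerical or experimental.",
            target_method="File.__init__",
            allowed=["experimental", "numerical"],
        ),
    ],
)

print(list(cv_sketch.for_method("File.__init__")))  # ['data_type']
```

Grouping the attributes by their target method mirrors how the convention output below lists them under `File.__init__()`.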

Let's read this file into the class `Convention`:

In [2]:
cv = h5tbx.conventions.from_yaml('example_convention.yaml')
cv

Convention("h5rdmtoolbox-tuturial-convention")
contact: https://orcid.org/0000-0001-8729-0482
  File.__init__():
    * data_type:
		Type of data in file. Can be numerical or experimental.

To make the convention effective in this session, it must be enabled. We do this by calling `use()`:

In [3]:
h5tbx.use(cv)  # or h5tbx.use('h5rdmtoolbox-tuturial-convention') or cv.use()

using("h5rdmtoolbox-tuturial-convention")

Now, we get an error if we create an HDF5 file without providing the attribute "data_type". Since it is a required attribute, it must be provided during file initialization:

In [4]:
try:
    with h5tbx.File() as h5:
        pass
except h5tbx.errors.StandardAttributeError as e:
    print(e)

The standard attribute "data_type" is required but not provided.


Providing a wrong value raises an error, too:

In [5]:
try:
    with h5tbx.File(data_type='observational') as h5:
        pass
except h5tbx.errors.StandardAttributeError as e:
    print(e)

Setting "observational" for standard attribute "data_type" failed. Original error: The value "observational" is not in ['experimental', 'numerical']. Expecting one of these: ['experimental', 'numerical']
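The check behind this error compares the value against a reference list. A minimal stand-alone sketch of such a check could look like the following. The function name `validate_in` is hypothetical, and it raises a plain `ValueError`, whereas the toolbox reports the failure via `h5tbx.errors.StandardAttributeError`, as seen above. The message format is copied from the output.

```python
from typing import Sequence


def validate_in(value: str, allowed: Sequence[str]) -> str:
    """Hypothetical "$in"-style check: return the value if allowed, else raise."""
    if value not in allowed:
        raise ValueError(
            f'The value "{value}" is not in {list(allowed)}. '
            f'Expecting one of these: {list(allowed)}'
        )
    return value


allowed = ["experimental", "numerical"]
print(validate_in("numerical", allowed))  # numerical
try:
    validate_in("observational", allowed)
except ValueError as err:
    print(err)
```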


With a valid value, the file is created:

In [6]:
with h5tbx.File(data_type='numerical') as h5:
    h5.dump()

Note that if we reopen the file in read-write mode (`r+`) rather than read-only mode (`r`), standard attributes that already exist are not checked again. Hence, if the HDF5 file was written with another package, e.g. h5py, an existing value might be invalid:

In [7]:
with h5tbx.File(name=h5.hdf_filename, mode='r+') as h5:
    pass  # note that we were not required to pass "data_type", as it was already present!

Note that a convention can also be enabled only **temporarily** using the context manager syntax:

In [8]:
with h5tbx.use(cv):
    with h5tbx.File(data_type='numerical') as h5:
        pass

If a default value is defined for "data_type" (in the YAML file or, as done here, by calling `make_optional()`), the attribute is no longer required by the `__init__` method:

In [9]:
cv.properties[h5tbx.File]['data_type'].make_optional()
cv

Convention("h5rdmtoolbox-tuturial-convention")
contact: https://orcid.org/0000-0001-8729-0482
  File.__init__():
    * data_type (default=None):
		Type of data in file. Can be numerical or experimental.

In [10]:
# no error:
with h5tbx.File() as h5:
    h5.dump()
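The rules seen so far for `__init__` (required attributes must be passed, attributes already stored in a file opened with `r+` are not requested again, optional attributes fall back to a default) can be summarized in one small function. This is a sketch under assumptions: `resolve_init_attrs`, the `REQUIRED` sentinel and the dictionaries standing in for the HDF5 attributes are hypothetical, and value validation is left out.

```python
from typing import Any, Dict, Mapping, Optional

REQUIRED = object()  # hypothetical sentinel: no default, value must be supplied


def resolve_init_attrs(spec: Mapping[str, Any],
                       provided: Mapping[str, Any],
                       existing: Optional[Mapping[str, Any]] = None) -> Dict[str, Any]:
    """Return the attribute values to write when a file is created or reopened.

    spec:     attribute name -> default value, or REQUIRED
    provided: values passed by the caller
    existing: attributes already stored in the file (e.g. opened with "r+")
    """
    existing = existing or {}
    to_write = {}
    for name, default in spec.items():
        if name in provided:
            to_write[name] = provided[name]
        elif name in existing:
            continue  # already stored: neither requested nor re-checked here
        elif default is REQUIRED:
            raise ValueError(f'The standard attribute "{name}" is required but not provided.')
        else:
            to_write[name] = default
    return to_write


required_spec = {"data_type": REQUIRED}
optional_spec = {"data_type": None}

print(resolve_init_attrs(required_spec, {"data_type": "numerical"}))  # {'data_type': 'numerical'}
print(resolve_init_attrs(required_spec, {}, existing={"data_type": "numerical"}))  # {}
print(resolve_init_attrs(optional_spec, {}))  # {'data_type': None}
```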

In [11]:
try:
    with h5tbx.File() as h5:
        h5.data_type='observational'
except h5tbx.errors.StandardAttributeError as e:
    print(e)

Setting "observational" for standard attribute "data_type" failed. Original error: The value "observational" is not in ['experimental', 'numerical']. Expecting one of these: ['experimental', 'numerical']


In [12]:
with h5tbx.File() as h5:
    h5.data_type='experimental'
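Assigning `h5.data_type = ...` is validated just like passing the value to the constructor. One way to get this behaviour in plain Python is a data descriptor whose `__set__` runs the check before storing the value. The sketch below uses a hypothetical `ValidatedAttr` class and a dictionary in place of the HDF5 attribute store; how the toolbox wires this up internally may differ.

```python
from typing import Any, Sequence


class ValidatedAttr:
    """Hypothetical descriptor: checks a value against a reference list on assignment."""

    def __init__(self, allowed: Sequence[str]):
        self.allowed = list(allowed)

    def __set_name__(self, owner, name):
        self.name = name  # attribute name, e.g. "data_type"

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self  # accessed on the class itself
        return obj.attrs[self.name]

    def __set__(self, obj, value: Any):
        if value not in self.allowed:
            raise ValueError(f'The value "{value}" is not in {self.allowed}.')
        obj.attrs[self.name] = value  # only valid values reach the store


class FileSketch:
    """Hypothetical stand-in for a file object with one standard attribute."""
    data_type = ValidatedAttr(["experimental", "numerical"])

    def __init__(self):
        self.attrs = {}  # stands in for the HDF5 attribute manager


f = FileSketch()
f.data_type = "experimental"
print(f.attrs)  # {'data_type': 'experimental'}
try:
    f.data_type = "observational"
except ValueError as err:
    print(err)
```

A rejected assignment leaves the stored attributes untouched, since the store is only written after the check passes.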

## Designing Standard Attributes

The standard attribute above uses the validator `$in` to compare the input against a reference list defined in the YAML file. There are more validators to choose from:

### Validators

Here is a list of the available validators. A more detailed introduction can be found in a [separate chapter](introduction_to_validators.ipynb).

In [13]:
h5tbx.conventions.get_validator().keys()

dict_keys(['$none', '$datetime', '$type', '$in', '$orcid', '$quantity', '$offset', '$units', '$ref', '$url', '$bibtex', '$minlength', '$maxlength', '$regex', '$standard_name', '$standard_name_table'])
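Conceptually, the `$`-prefixed names act as keys into a registry of check functions. A minimal, hypothetical registry covering four of the listed names (`$in`, `$regex`, `$minlength`, `$maxlength`) might look like the sketch below. What each check receives as its reference value is an assumption made for illustration, not the toolbox's validator interface.

```python
import re
from typing import Any, Callable, Dict

# Hypothetical registry: each check receives (value, reference) and returns True if valid.
VALIDATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "$in": lambda value, ref: value in ref,
    "$regex": lambda value, ref: re.fullmatch(ref, value) is not None,
    "$minlength": lambda value, ref: len(value) >= ref,
    "$maxlength": lambda value, ref: len(value) <= ref,
}


def is_valid(value: Any, rule: Dict[str, Any]) -> bool:
    """Apply every validator named in `rule`, e.g. {"$in": [...], "$maxlength": 12}."""
    return all(VALIDATORS[key](value, ref) for key, ref in rule.items())


rule = {"$in": ["experimental", "numerical"], "$maxlength": 12}
print(is_valid("numerical", rule))      # True
print(is_valid("observational", rule))  # False
```

Keeping the checks in a dictionary means a new validator only needs a new entry, while the rule in the YAML file just names the keys it wants applied.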

## Importing/Loading an online convention

Conventions are intended to be distributed via online repositories. Hence, the YAML file should be uploaded such that it is accessible to all users. The `h5RDMtoolbox` currently favors [Zenodo](https://zenodo.org) repositories, as they provide long-term storage and assign a DOI. However, files accessible via any other URL can also be downloaded.

A tutorial convention is published [here](https://zenodo.org/record/8276817). By calling `from_zenodo()` the convention object is created:

In [14]:
cv = h5tbx.conventions.from_zenodo(doi=8301535)
cv

Convention("h5rdmtoolbox-tutorial-convention")
contact: https://orcid.org/0000-0001-8729-0482
  File.__init__():
    * data_type:
		Type of data in file. Can be numerical, analytical or experimental.
    * contact:
		Contact or responsible person for the full file. Contact is represented by an ORCID.
    * standard_name_table (default=<h5rdmtoolbox.conventions.consts.DefaultValue object at 0x0000026C955D23D0>):
		The standard name table of the convention.
    * comment (default=None):
		Comment describes the file content in more detail.
    * references (default=None):
		Web resources servering as references for the full file.
  Group.create_dataset():
    * units:
		The physical unit of the dataset. If dimensionless, the unit is ''.
    * standard_name:
		Standard name of the dataset. If not set, the long_name attribute must be given.
    * long_name:
		An comprehensive description of the dataset. If not set, the 

## Effect of enabling a convention

The convention above defines which attributes are used with which methods. For example, "data_type" is to be provided when an HDF5 file is created. When the convention is enabled, the **signature of the respective methods is changed**. To prove this, let's implement a small function that prints all parameters of a given method and inspect the effect of the convention on the `__init__` method:

In [15]:
cv.properties[h5tbx.Dataset]['standard_name']

<StandardAttribute[positional]("standard_name"): "Standard name of the dataset. If not set, the long_name attribute must be given.">

In [16]:
import inspect

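def print_method_parameters(method):
    """Print the parameters of `method`; standard attributes of File.__init__ are printed in bold."""
    # names of the standard attributes the current convention attaches to File.__init__
    std_attr_names = h5tbx.conventions.get_current_convention().methods[h5tbx.File].get('__init__', {}).keys()
    print(f'\nParameters for "{method.__name__}":')
    for param in inspect.signature(method).parameters.values():
        if param.name == 'self':
            continue  # skip the instance argument
        if param.name in std_attr_names:
            print(f'  - {h5tbx._repr.make_bold(param.name)}')
        else:
            print(f'  - {param.name}')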

methods = (h5tbx.File.__init__, h5tbx.Group.create_group, h5tbx.Group.create_dataset)

print('no convention: ')
h5tbx.use(None)
print_method_parameters(h5tbx.File.__init__)

print(f'\n------------\nwith convention {cv.name}: (standard attributes are made bold)')
h5tbx.use(cv)
print_method_parameters(h5tbx.File.__init__)

no convention: 

Parameters for "__init__":
  - name
  - mode
  - layout
  - attrs
  - kwargs

------------
with convention h5rdmtoolbox-tutorial-convention: (standard attributes are made bold)

Parameters for "__init__":
  - name
  - mode
  - layout
  - attrs
  - [1mdata_type[0m
  - [1mstandard_name_table[0m
  - [1mcomment[0m
  - [1mcontact[0m
  - [1mreferences[0m
  - kwargs
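A changed signature like this can be reproduced with the standard `inspect` module alone, because `inspect.signature()` honours a `__signature__` attribute set on a callable. The sketch below uses a hypothetical stand-in `init` function and a hypothetical helper `add_keyword_params` to insert keyword-only parameters before `**kwargs`. It only changes what introspection reports (extra keywords still end up in `**kwargs`); the toolbox's actual mechanism is not shown here.

```python
import functools
import inspect
from typing import Callable, Iterable


def init(self, name=None, mode=None, layout=None, attrs=None, **kwargs):
    """Hypothetical stand-in for File.__init__."""


def add_keyword_params(func: Callable, names: Iterable[str]) -> Callable:
    """Return a wrapper whose reported signature includes extra keyword-only parameters."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)  # extra keywords are forwarded into **kwargs

    sig = inspect.signature(func)
    params = list(sig.parameters.values())
    # keyword-only parameters must come before the **kwargs (VAR_KEYWORD) parameter
    var_kw = [p for p in params if p.kind is p.VAR_KEYWORD]
    others = [p for p in params if p.kind is not p.VAR_KEYWORD]
    extra = [inspect.Parameter(n, inspect.Parameter.KEYWORD_ONLY, default=None) for n in names]
    wrapper.__signature__ = sig.replace(parameters=others + extra + var_kw)
    return wrapper


patched = add_keyword_params(init, ["data_type", "contact"])
print([p for p in inspect.signature(patched).parameters if p != "self"])
# ['name', 'mode', 'layout', 'attrs', 'data_type', 'contact', 'kwargs']
```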


In [None]:
h5tbx.use(None)  # fall back to the default convention