# Conventions based on Standard Attributes

The most important information in an HDF5 file is often stored in its attributes. Attributes are auxiliary data that describe the data in the datasets; they are commonly called metadata.

A file is therefore only as useful to others as its data is well described. Two goals must be achieved:
1. Users set attributes
2. Attributes and their values are set in a way accepted by a community or by collaborators

The first goal can be met by inheriting from the class `h5py.Group` and adjusting the method `create_dataset` so that positional arguments are added which correspond to the minimal meta information required, e.g. the unit of the data stored in a dataset. The second goal must then be covered inside the method by evaluating the values passed for these arguments. However, this requires individual user code (inheriting from the `h5py` classes each time a new requirement is introduced).
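
The subclassing approach described above can be sketched without any HDF5 dependency. The classes `PlainGroup` and `UnitGroup` below are hypothetical stand-ins (a dict plays the role of the file), so this only illustrates the pattern and is not how `h5py` or the `h5RDMtoolbox` implement it:

```python
class PlainGroup:
    """Hypothetical stand-in for h5py.Group: stores datasets in a dict."""

    def __init__(self):
        self.datasets = {}

    def create_dataset(self, name, shape=None, **kwargs):
        # A dataset is represented here as a plain dict with an attribute mapping.
        dataset = {"shape": shape, "attrs": dict(kwargs.get("attrs", {}))}
        self.datasets[name] = dataset
        return dataset


class UnitGroup(PlainGroup):
    """Subclass that makes `units` a required positional argument."""

    def create_dataset(self, name, units, shape=None, **kwargs):
        # Goal 2: check the value before anything is written.
        if not isinstance(units, str) or not units.strip():
            raise ValueError("units must be a non-empty string")
        dataset = super().create_dataset(name, shape=shape, **kwargs)
        # Goal 1: the attribute is always set, because the caller cannot omit it.
        dataset["attrs"]["units"] = units
        return dataset


group = UnitGroup()
velocity = group.create_dataset("velocity", "m/s", shape=(4, 5))
print(velocity["attrs"])
```

Every further requirement (a second mandatory attribute, a different validation rule) would need another change to such a subclass, which is the maintenance burden mentioned above.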

The `h5RDMtoolbox` makes it possible to flexibly define, add or remove standardized meta information for the core classes and methods in order to achieve the two goals above. Such standardized meta information is defined using `StandardAttribute`. A set of standard attributes forms a `convention`.

In [1]:
import h5rdmtoolbox as h5tbx

## Standard Attributes

A standard attribute is defined by inheriting from the class `StandardAttribute`. To control the processed data, i.e. to evaluate the user input and control the return values, we need to provide the methods `set()` and `get()` and a `name`. To make the attribute available for use, it must be registered (added) to a convention, which is then enabled. We will walk through all of that in the following.

## Definition
Let's say we want to require the user to store the type of data with each dataset, e.g. whether it is from a numerical or experimental source. We call this attribute "source":

In [2]:
from h5rdmtoolbox import conventions
import warnings

class SourceAttribute(conventions.StandardAttribute):
    
    name = 'source'
    
    def set(self, source_type: str):
        if source_type.upper() not in ('NUMERICAL', 'EXPERIMENTAL'):
            raise ValueError('Unknown source type')
        super().set(source_type)
        
    def get(self):
        source_type = super().get()
        if source_type is None:
            warnings.warn('No source available')
            return
        elif source_type.upper() not in ('NUMERICAL', 'EXPERIMENTAL'):
            warnings.warn(f'Unexpected source type: {source_type}')
        return source_type.upper()

We have now regulated what happens when this special (standardized) attribute is written (`set`) and read (`get`).
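
The effect of the two methods can be tried out without an HDF5 file. The following is a minimal sketch that assumes a hypothetical base class `DictBackedAttribute`, which stores the value in a plain dict instead of an HDF5 attribute; it mirrors the validation logic of `SourceAttribute` but is not the toolbox's `StandardAttribute`:

```python
import warnings

ALLOWED_SOURCES = ("NUMERICAL", "EXPERIMENTAL")


class DictBackedAttribute:
    """Hypothetical stand-in for StandardAttribute, backed by a plain dict."""

    name = None

    def __init__(self, attrs):
        self.attrs = attrs

    def set(self, value):
        self.attrs[self.name] = value

    def get(self):
        return self.attrs.get(self.name)


class SourceSketch(DictBackedAttribute):
    name = "source"

    def set(self, source_type):
        # Reject anything outside the agreed vocabulary before storing it.
        if source_type.upper() not in ALLOWED_SOURCES:
            raise ValueError("Unknown source type")
        super().set(source_type)

    def get(self):
        source_type = super().get()
        if source_type is None:
            warnings.warn("No source available")
            return None
        return source_type.upper()


attrs = {}
source = SourceSketch(attrs)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    missing = source.get()      # nothing stored yet: warns and returns None

source.set("numerical")         # accepted and stored as given
stored_raw = attrs["source"]
normalised = source.get()       # upper-cased on read

try:
    source.set("model-based")   # not in the vocabulary
    rejected = False
except ValueError:
    rejected = True
```

Note that the value is stored as the user wrote it and only normalised to upper case when it is read back.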

## Add to a convention
Next, we need to add this attribute to a convention and assign it to the `Dataset` class and the method `create_dataset` in order to make "source" available to the user and enforce its usage.

Let's initialize a new convention and register it (make it available in the package):

In [3]:
cv = conventions.Convention('my_convention')
cv

Convention(my_convention)
> Properties: (Nothing registered)
> Methods:
  init_file: (Nothing registered)
  create_group: (Nothing registered)
  create_dataset: (Nothing registered)

The output shows which attributes are associated with the objects `File`, `Group` and `Dataset` and the methods `__init__`, `create_group` and `create_dataset`. What exactly this means will become clear shortly. Let's add `SourceAttribute` to the class `Dataset`:

In [4]:
cv.add(SourceAttribute,
      target_cls=h5tbx.Dataset,
      add_to_method=True,
      optional=True,
      position={'after': 'data'})

The `SourceAttribute` is now added to the class `Dataset`:

In [5]:
cv

Convention(my_convention)
> Properties:
Dataset:
  * source: SourceAttribute
> Methods:
  init_file: (Nothing registered)
  create_group: (Nothing registered)
  create_dataset:
  * source (opt=None)
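
One way to picture the `create_dataset` entry is as an extra keyword parameter placed directly after `data`, as requested with `position={'after': 'data'}`. The sketch below uses only the standard-library `inspect` module on a hypothetical, simplified `create_dataset` function; the helper `insert_after` is an illustration and makes no claim about how the toolbox modifies the method internally:

```python
import inspect


def create_dataset(name, shape=None, dtype=None, data=None, **kwds):
    """Hypothetical stand-in with a simplified create_dataset signature."""


def insert_after(func, new_name, after, default=None):
    """Return func's signature with a new keyword parameter placed after `after`."""
    signature = inspect.signature(func)
    params = list(signature.parameters.values())
    # Position of the new parameter: right behind the one named `after`.
    index = [p.name for p in params].index(after) + 1
    new_param = inspect.Parameter(
        new_name, inspect.Parameter.POSITIONAL_OR_KEYWORD, default=default
    )
    params.insert(index, new_param)
    return signature.replace(parameters=params)


new_signature = insert_after(create_dataset, "source", after="data")
print(new_signature)
```

The printed signature lists `source` immediately after `data`, with a default of `None` that corresponds to the optional setting.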

For now, it is only registered as a property. This means the user is still responsible for setting the "source".

## Register and enable
We need to register the convention `cv` and enable it (and thus enable the "source" attribute):

In [6]:
cv.register()
h5tbx.use('my_convention')
h5tbx.current_convention

## Example:
Let's create a dataset and get the source. As we do not pass the argument `source` (we made it optional) and do not set it via the attribute manager, we expect a warning:

In [7]:
with h5tbx.File() as h5:
    ds = h5.create_dataset('data', (4, 5))
    print(ds.source)

None




We may pass "source" directly as an argument or via "attrs". Both ways check whether the source is "numerical" or "experimental", because the `set()` method is called in both cases:

In [8]:
with h5tbx.File() as h5:
    ds1 = h5.create_dataset('data1', (4, 5), attrs={'source': 'numerical'})
    ds2 = h5.create_dataset('data2', (4, 5), source='experimental')
    # two examples that fail:
    try:
        h5.create_dataset('data3', (4, 5), attrs={'source': 'model-based'})
    except ValueError as e:
        print(e)
    try:
        h5.create_dataset('data4', (4, 5), source='model-based')
    except ValueError as e:
        print(e)

2023-04-05_10:06:16,338 ERROR    [core.py:621] Could not set attributes {'source': 'model-based'} for dataset data3
2023-04-05_10:06:16,341 ERROR    [core.py:621] Could not set attributes {'source': 'model-based'} for dataset data4


Unknown source type
Unknown source type


Until now, the source attribute was **optional**. We want to enforce its use, so let's change this property of the standard attribute:

In [9]:
cv.make_required('create_dataset', 'source')

In [10]:
cv

Convention(my_convention)
> Properties:
Dataset:
  * source: SourceAttribute
> Methods:
  init_file: (Nothing registered)
  create_group: (Nothing registered)
  create_dataset:
  * source

In [11]:
with h5tbx.File() as h5:
    try:
        ds = h5.create_dataset('data', (4, 5))
    except h5tbx.conventions.StandardAttributeError as e:
        print(e)

The standard attribute "source" is required but not provided.
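
The difference between an optional and a required attribute can be sketched with a small decorator. The names `require` and `MissingStandardAttribute` are hypothetical and chosen for illustration; the stand-in `create_dataset` returns a dict instead of writing to a file, so this shows the idea of the check rather than the toolbox's implementation:

```python
import functools


class MissingStandardAttribute(Exception):
    """Hypothetical error type, standing in for StandardAttributeError."""


def require(*names):
    """Decorator: reject calls that omit any of the named keyword arguments."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attr_name in names:
                if kwargs.get(attr_name) is None:
                    raise MissingStandardAttribute(
                        f'The standard attribute "{attr_name}" is required but not provided.'
                    )
            return func(*args, **kwargs)
        return wrapper
    return decorator


@require("source")
def create_dataset(name, shape, **attrs):
    # Stand-in: return what would be written instead of touching a file.
    return {"name": name, "shape": shape, "attrs": attrs}


try:
    create_dataset("data", (4, 5))      # "source" is missing
    message = None
except MissingStandardAttribute as error:
    message = str(error)

dataset = create_dataset("data", (4, 5), source="Experimental")
```

The check runs before the wrapped function, so nothing is created when a required attribute is missing.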


In [12]:
with h5tbx.File() as h5:
    ds = h5.create_dataset('data', (4, 5), source='Experimental')
    ds.dump()

Dataset "/data"
---------------
*shape:        (4, 5)
*dtype:        float32
*compression:  gzip (5)
source:        EXPERIMENTAL
