# Standard Attributes

Data gets its real value through metadata. In HDF5 files, metadata is stored in attributes. The name of an attribute can generally be chosen freely. Its value is typically an integer or a string (other data types are possible but are outside the package's scope and concept at this stage). An HDF5 dataset or group can carry multiple attributes. We believe that standardizing the use of attributes significantly improves the FAIRness of data. Only when all users agree on the same minimum set of attributes do data become explorable and automatically processable by machines.

Standard attributes can be made required during dataset or group creation. As there might always be a work-around, [layouts](./layouts.ipynb) can be used to validate the file meta-content when files are exchanged. Invalid files would be rejected, and missing (meta)data would need to be added.

Thus, to summarize, the file (content) is only useful to others (and to software!) if the data is described in a way that everybody in the community/project/collaboration has agreed on. Two goals must be achieved:
1. Users are required to use these attributes.
2. Attributes and their values are set in a way accepted by a community or by collaborators (mainly syntax).
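
The two goals can be illustrated without any HDF5 library. The following is a minimal sketch in plain Python, assuming a hypothetical helper `validate_attrs` and a plain dictionary standing in for the attributes of one dataset. Neither the helper nor the `RULES` table is part of `h5rdmtoolbox`.

```python
from typing import Callable, Dict, List

# Hypothetical rule set: attribute name -> function returning True for valid values.
RULES: Dict[str, Callable[[object], bool]] = {
    'source': lambda v: isinstance(v, str) and v.upper() in ('NUMERICAL', 'EXPERIMENTAL'),
}


def validate_attrs(attrs: dict, rules: Dict[str, Callable[[object], bool]] = RULES) -> List[str]:
    """Return a list of problems; an empty list means the attributes conform."""
    problems = []
    for name, is_valid in rules.items():
        if name not in attrs:
            # Goal 1: the agreed attribute must be present
            problems.append(f'missing attribute: {name}')
        elif not is_valid(attrs[name]):
            # Goal 2: its value must follow the agreed syntax
            problems.append(f'invalid value for {name}: {attrs[name]!r}')
    return problems


print(validate_attrs({'source': 'numerical'}))    # prints: []
print(validate_attrs({}))                         # prints: ['missing attribute: source']
print(validate_attrs({'source': 'model-based'}))  # prints: ["invalid value for source: 'model-based'"]
```

Returning a list of problems instead of raising on the first one is a design choice of this sketch; it lets a file check report everything that is wrong in one pass.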

Instead of inheriting from the base class `h5py.Group` and adjusting the method `create_dataset` according to user-defined attributes, we allow specific attributes to be flexibly enabled and disabled at run-time. This only requires implementing the `set` and `get` methods of the attributes, with no further changes to the core implementation.
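
The idea of switching attribute behaviour on and off at run-time, without subclassing, can be sketched in plain Python. This is not how the package is implemented internally; the names `Attrs`, `ENABLED` and `UpperCaseSource` are hypothetical, and a dictionary replaces the HDF5 file.

```python
from typing import Dict, Optional


class UpperCaseSource:
    """Hypothetical handler: validates on write, normalizes on read."""
    name = 'source'

    def set(self, store: dict, value: str) -> None:
        if value.upper() not in ('NUMERICAL', 'EXPERIMENTAL'):
            raise ValueError('Unknown source type')
        store[self.name] = value

    def get(self, store: dict) -> Optional[str]:
        value = store.get(self.name)
        return None if value is None else value.upper()


# Currently enabled handlers, keyed by attribute name. Changing this
# dictionary at run-time switches the special behaviour on or off.
ENABLED: Dict[str, UpperCaseSource] = {}


class Attrs:
    """Stand-in for an attribute manager; a plain dict replaces the file."""

    def __init__(self) -> None:
        self._store: dict = {}

    def __setitem__(self, name: str, value) -> None:
        handler = ENABLED.get(name)
        if handler is None:
            self._store[name] = value  # ordinary attribute, no checks
        else:
            handler.set(self._store, value)

    def __getitem__(self, name: str):
        handler = ENABLED.get(name)
        if handler is None:
            return self._store[name]
        return handler.get(self._store)


attrs = Attrs()
attrs['source'] = 'model-based'        # accepted: no handler enabled yet
ENABLED['source'] = UpperCaseSource()  # enable at run-time
try:
    attrs['source'] = 'model-based'    # now rejected
except ValueError as err:
    print(err)                         # prints: Unknown source type
attrs['source'] = 'numerical'
print(attrs['source'])                 # prints: NUMERICAL
del ENABLED['source']                  # disable again
print(attrs['source'])                 # prints: numerical
```

The core class `Attrs` never changes; only the registry of handlers does, which is the property the paragraph above describes.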

The following explains the usage:

In [1]:
import h5rdmtoolbox as h5tbx

2023-05-07_09:51:57,819 DEBUG    [__init__.py:36] changed logger level for h5rdmtoolbox from 20 to DEBUG


## Standard Attributes

A standard attribute can be defined by inheriting from the class `StandardAttribute`. To control the processed data, which means to evaluate the user input and control return values, we need to provide the methods `set()` and `get()` and a `name`. To make it available for use, it must be registered (added) to a convention, which is then enabled. We will walk through all of that in the following.

## List of available conventions

It is possible to register conventions. A convention is the list of standard attributes for the respective HDF objects. The registered conventions can be obtained from the dictionary `conventions.registered_conventions`:

In [2]:
h5tbx.conventions.registered_conventions.keys()

dict_keys(['h5py', 'tbx'])

## Definition of a standard attribute
Let's say we want to require the user to store the type of data with each dataset, e.g. whether it comes from a numerical or an experimental source. We call this attribute "source":

In [3]:
from h5rdmtoolbox import conventions
import warnings

class SourceAttribute(conventions.StandardAttribute):
    
    name = 'source'
    
    def set(self, source_type: str):
        if source_type.upper() not in ('NUMERICAL', 'EXPERIMENTAL'):
            raise ValueError('Unknown source type')
        super().set(source_type)
        
    def get(self):
        source_type = super().get()
        if source_type is None:
            warnings.warn('No source available')
            return
        elif source_type.upper() not in ('NUMERICAL', 'EXPERIMENTAL'):
            warnings.warn(f'Unexpected source type: {source_type}')
        return source_type.upper()

We have now defined what happens when this special (standard) attribute is written (`set`) and read (`get`).
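
The read-side branches can be tried out without an HDF5 file. The following is a minimal sketch, assuming a hypothetical free function `normalize_source` that mirrors the three branches of `get()` and receives the raw stored value directly. It uses the standard library's `warnings.catch_warnings` to make the emitted warnings visible.

```python
import warnings
from typing import Optional


def normalize_source(raw: Optional[str]) -> Optional[str]:
    """Mirror the three branches of `get()` for a raw stored value."""
    if raw is None:
        warnings.warn('No source available')
        return None
    if raw.upper() not in ('NUMERICAL', 'EXPERIMENTAL'):
        # unexpected values are reported but still returned
        warnings.warn(f'Unexpected source type: {raw}')
    return raw.upper()


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')  # make sure no warning is suppressed
    results = [normalize_source(v) for v in (None, 'numerical', 'model-based')]

print(results)
# prints: [None, 'NUMERICAL', 'MODEL-BASED']
print([str(w.message) for w in caught])
# prints: ['No source available', 'Unexpected source type: model-based']
```

Reading is deliberately more lenient than writing here: an unexpected value that is already stored only triggers a warning, whereas `set()` rejects it with a `ValueError`.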

## Add to a convention
Next, we need to add this attribute to a convention and assign it to the `Group` class and its method `create_dataset` in order to make "source" available to the user and enforce its usage.

Let's initialize a new convention and register it (make it available in the package):

In [4]:
cv = conventions.Convention('my_convention')
cv

Convention(my_convention)
> Properties: (Nothing registered)
> Methods:

The output shows which attributes are associated with the objects `File`, `Group` and `Dataset` and the methods `__init__`, `create_group` and `create_dataset`. What exactly this means will become clear shortly. Let's add `SourceAttribute` to the class `Dataset`:

In [5]:
cv['create_dataset'].add(SourceAttribute,
                         add_to_method=True,
                         optional=True,
                         position={'after': 'data'})

The `SourceAttribute` is now listed as a property of `Dataset` and as an optional parameter of `Group.create_dataset()`:

In [6]:
cv

Convention(my_convention)
> Properties:
Dataset:
    * source: SourceAttribute
> Methods:
  Group.create_dataset():
    * source (optional)

For now, the attribute is registered but not required. This means that the user is still responsible for setting "source".

## Register and enable
We need to register the convention `cv` and enable it (and thus enable the "source" attribute):

In [7]:
cv.register()
h5tbx.use('my_convention')
h5tbx.current_convention

## Example:
Let's create a dataset and get the source. As we do not pass the argument `source` (we made it optional) and do not set it via the attribute manager, we expect a warning and the return value `None`:

In [8]:
with h5tbx.File() as h5:
    ds = h5.create_dataset('data', (4, 5))
    print(ds.source)

None




We may pass "source" directly as an argument or via "attrs". Both routes check whether the source is "numerical" or "experimental", because the `set()` method is called in both cases:

In [20]:
with h5tbx.File() as h5:
    ds1 = h5.create_dataset('data1', (4, 5), attrs={'source': 'numerical'})
    ds2 = h5.create_dataset('data2', (4, 5), source='experimental')
    # two examples that fail:
    try:
        h5.create_dataset('data3', (4, 5), attrs={'source': 'model-based'})
    except ValueError as e:
        print(e)
    try:
        h5.create_dataset('data4', (4, 5), source='model-based')
    except ValueError as e:
        print(e)

2023-05-07_17:56:58,573 ERROR    [core.py:637] Could not set attributes {'source': 'model-based'} for dataset data3
2023-05-07_17:56:58,577 ERROR    [core.py:637] Could not set attributes {'source': 'model-based'} for dataset data4


Unknown source type
Unknown source type
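
Why both routes end in the same error can be pictured with a small stand-in. This is a sketch under assumptions, not the package's implementation: `create_dataset` and `set_source` below are hypothetical plain functions, and the returned dictionary represents the attributes of the new dataset.

```python
from typing import Optional

ALLOWED = ('NUMERICAL', 'EXPERIMENTAL')


def set_source(store: dict, value: str) -> None:
    """Single validation point, comparable in role to `set()`."""
    if value.upper() not in ALLOWED:
        raise ValueError('Unknown source type')
    store['source'] = value


def create_dataset(name: str, attrs: Optional[dict] = None,
                   source: Optional[str] = None) -> dict:
    """Hypothetical stand-in: return the attribute dictionary of dataset `name`."""
    merged = dict(attrs or {})
    if source is not None:
        # the keyword argument is folded into the attribute dictionary ...
        merged['source'] = source
    store: dict = {}
    for key, value in merged.items():
        if key == 'source':
            set_source(store, value)  # ... so both routes reach the same check
        else:
            store[key] = value
    return store


print(create_dataset('data1', attrs={'source': 'numerical'}))  # prints: {'source': 'numerical'}
print(create_dataset('data2', source='experimental'))          # prints: {'source': 'experimental'}
for kwargs in ({'attrs': {'source': 'model-based'}}, {'source': 'model-based'}):
    try:
        create_dataset('data3', **kwargs)
    except ValueError as err:
        print(err)                                             # prints: Unknown source type
```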


Until now, the source attribute was **optional**. We want to enforce its use, so let's change this property of the standard attribute:

In [15]:
cv.make_required('create_dataset', 'source')

In [16]:
cv

Convention(my_convention)
> Properties:
Dataset:
    * source: SourceAttribute
> Methods:
  Group.create_dataset():
    * source

In [17]:
with h5tbx.File() as h5:
    try:
        ds = h5.create_dataset('data', (4, 5))
    except h5tbx.conventions.StandardAttributeError as e:
        print(e)

The standard attribute "source" is required but not provided.
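
The switch from optional to required can be modelled with a small bookkeeping class. The following is a minimal sketch, assuming hypothetical names (`MiniConvention`, `MissingAttributeError`, `check_call`) chosen for illustration; it only mimics the behaviour seen above and does not reproduce the internals of `Convention`.

```python
from typing import Dict


class MissingAttributeError(Exception):
    """Hypothetical error type used only in this sketch."""


class MiniConvention:
    def __init__(self) -> None:
        # method name -> {attribute name: is_optional}
        self._methods: Dict[str, Dict[str, bool]] = {}

    def add(self, method: str, attr_name: str, optional: bool = True) -> None:
        self._methods.setdefault(method, {})[attr_name] = optional

    def make_required(self, method: str, attr_name: str) -> None:
        self._methods[method][attr_name] = False

    def check_call(self, method: str, kwargs: dict) -> None:
        """Raise if a required attribute is missing from the call arguments."""
        for attr_name, optional in self._methods.get(method, {}).items():
            if not optional and kwargs.get(attr_name) is None:
                raise MissingAttributeError(
                    f'The standard attribute "{attr_name}" is required but not provided.'
                )


mini = MiniConvention()
mini.add('create_dataset', 'source', optional=True)
mini.check_call('create_dataset', {})  # passes: still optional
mini.make_required('create_dataset', 'source')
try:
    mini.check_call('create_dataset', {})  # now fails
except MissingAttributeError as err:
    print(err)
mini.check_call('create_dataset', {'source': 'Experimental'})  # passes
```

Keeping the optional flag per method rather than per attribute would allow the same attribute to be required in one method and optional in another; whether the package does this is not stated here.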


In [18]:
with h5tbx.File() as h5:
    ds = h5.create_dataset('data', (4, 5), source='Experimental')
    ds.dump()

Dataset "/data"
---------------
*shape:        (4, 5)
*dtype:        float32
*compression:  gzip (5)
source:        EXPERIMENTAL
