# Standard Attributes

Data alone is meaningless. Only when it is associated with auxiliary data (metadata) does it become interpretable and (re-)usable by others. In HDF5 files, this is realized by **attributes**, which are assigned to groups and datasets. HDF5 attributes are like dictionaries: you provide a name and a value. However, which names and values you use is generally up to you.

The `h5RDMtoolbox` lets you specify rules for specific attributes. These attributes are called **standard attributes**. They can be enabled or disabled and are thereby made available to the user or not.

If the user addresses an attribute, e.g. the attribute `units`, and a standard attribute implementation exists for this name, then the value is processed by the respective rule. Either the attribute is set, or an error is raised in case of an invalid input.
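
The dispatch described above can be pictured with a small, library-independent sketch. The names `RULES`, `validate_units` and `AttributeManager` are hypothetical and not part of the toolbox, and a plain dictionary stands in for the HDF5 attribute storage:

```python
from typing import Any, Callable, Dict


def validate_units(value: Any) -> str:
    """Toy rule: accept only non-empty strings (a real rule would parse the unit)."""
    if not isinstance(value, str) or not value.strip():
        raise ValueError(f'Invalid units: {value!r}')
    return value.strip()


# Hypothetical registry: attribute name -> rule applied when the attribute is set
RULES: Dict[str, Callable[[Any], Any]] = {'units': validate_units}


class AttributeManager(dict):
    """Dictionary stand-in for HDF5 attributes that applies registered rules."""

    def __setitem__(self, name: str, value: Any) -> None:
        rule = RULES.get(name)
        if rule is not None:
            value = rule(value)  # may raise ValueError for invalid input
        super().__setitem__(name, value)


attrs = AttributeManager()
attrs['units'] = ' m/s '        # processed by the rule before it is stored
attrs['comment'] = 'free text'  # no rule registered, stored unchanged

try:
    attrs['units'] = ''         # rejected by the rule
except ValueError as e:
    print(e)
```

Attributes without a registered rule pass through untouched, which mirrors the idea that only names with a standard attribute implementation are validated.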

Standard attributes can be made required, for instance **during dataset creation**. This forces users to pass certain metadata and validates it at the same time. Consequently, data becomes re-usable and explorable.

Additionally, so-called [layouts](./layouts.ipynb) can be defined. They specify the expected content of an HDF5 file after it has been written. This concept is most useful during file exchange, as the layout validates whether a file is complete and meets the expectations of the project or collaborating user.
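
A layout check of this kind can be sketched without any HDF5 library. In the example below, a nested dictionary stands in for a file that has already been written, and the hypothetical names `LAYOUT` and `check_layout` describe and verify which attributes each dataset must carry. This only illustrates the idea and does not use the toolbox's layout API:

```python
from typing import Dict, List

# Stand-in for a written file: dataset name -> its attributes
written_file: Dict[str, Dict[str, str]] = {
    'velocity': {'units': 'm/s', 'source': 'EXPERIMENTAL'},
    'pressure': {'units': 'Pa'},
}

# Hypothetical layout: attributes every dataset is expected to have
LAYOUT: List[str] = ['units', 'source']


def check_layout(file_content: Dict[str, Dict[str, str]],
                 required: List[str]) -> List[str]:
    """Return one message per missing attribute; an empty list means the file is complete."""
    issues = []
    for dataset_name, attributes in file_content.items():
        for attr_name in required:
            if attr_name not in attributes:
                issues.append(f'{dataset_name}: missing attribute "{attr_name}"')
    return issues


issues = check_layout(written_file, LAYOUT)
for issue in issues:
    print(issue)
```

Unlike a standard attribute, which is checked while writing, such a check runs on the finished file, which is why it suits file exchange.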

In [1]:
import h5rdmtoolbox as h5tbx

## Defining a new standard attribute

Analogous to the `units` example above, let's define a standard attribute, here called `source`. To do so, we need to inherit from the class `StandardAttribute` and provide the methods `set()` and `get()`. The attribute name associated with this class is the class name or, if set, the class variable `name` (using the latter is recommended):

In [None]:
from h5rdmtoolbox import conventions
import warnings

class SourceAttribute(conventions.StandardAttribute):
    
    name = 'source'
    
    def set(self, source_type: str):
        if source_type.upper() not in ('NUMERICAL', 'EXPERIMENTAL'):
            raise ValueError('Unknown source type')
        super().set(source_type)
        
    def get(self):
        source_type = super().get()
        if source_type is None:
            warnings.warn('No source available')
            return
        elif source_type.upper() not in ('NUMERICAL', 'EXPERIMENTAL'):
            warnings.warn(f'Unexpected source type: {source_type}')
        return source_type.upper()

## List of available conventions

It is possible to register conventions. A convention is the list of standard attributes for the respective HDF objects. The registered conventions can be obtained from the dictionary `conventions.registered_conventions`:

In [None]:
h5tbx.conventions.registered_conventions.keys()

With the class `SourceAttribute`, we have defined what happens when this special (standard) attribute is written (`set`) and read (`get`).
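
To try this set/get logic without an HDF5 file, the sketch below replaces `conventions.StandardAttribute` with a hypothetical stand-in, `FakeStandardAttribute`, which keeps the value in a plain dictionary. Only the validation and normalization logic mirrors the tutorial class. How the toolbox actually stores the value is not shown here:

```python
import warnings
from typing import Optional

VALID_SOURCES = ('NUMERICAL', 'EXPERIMENTAL')


class FakeStandardAttribute:
    """Hypothetical stand-in for the toolbox base class, backed by a dict."""
    name = 'attribute'

    def __init__(self, storage: dict):
        self._storage = storage

    def set(self, value):
        self._storage[self.name] = value

    def get(self):
        return self._storage.get(self.name)


class SourceAttribute(FakeStandardAttribute):
    name = 'source'

    def set(self, source_type: str):
        if source_type.upper() not in VALID_SOURCES:
            raise ValueError('Unknown source type')
        super().set(source_type)

    def get(self) -> Optional[str]:
        source_type = super().get()
        if source_type is None:
            warnings.warn('No source available')
            return None
        elif source_type.upper() not in VALID_SOURCES:
            warnings.warn(f'Unexpected source type: {source_type}')
        return source_type.upper()


storage = {}
source = SourceAttribute(storage)

# Reading before anything was written issues a warning and returns None
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    missing = source.get()

source.set('numerical')  # stored as given, normalized to upper case on read
print(missing, len(caught), source.get())
```

Note that the value is stored as the user wrote it and only normalized when read, so the original spelling remains in the storage.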

## Add to a convention
Next, we need to add this attribute to a convention and assign it to the `Group` class and the method `create_dataset` in order to make "source" available to the user and enforce its usage.

Let's initialize a new convention and register it (make it available in the package):

In [None]:
cv = conventions.Convention('my_convention')
cv

The output shows which attributes are associated with the objects `File`, `Group` and `Dataset` and the methods `__init__`, `create_group` and `create_dataset`. What exactly this means will become clear shortly. Let's add `SourceAttribute` to the method `create_dataset`:

In [None]:
cv['create_dataset'].add(SourceAttribute,
                         add_to_method=True,
                         optional=True,
                         position={'after': 'data'})

The `SourceAttribute` is now part of the convention:

In [None]:
cv

For now, it is only registered as a property. This means that the user is still responsible for setting the "source".

## Register and enable
We need to register the convention `cv` and enable it (and thus enable the "source" attribute):

In [None]:
cv.register()
h5tbx.use('my_convention')
h5tbx.current_convention

## Example
Let's create a dataset and read its source. Since we neither pass the argument `source` (we declared it optional) nor set it via the attribute manager, we expect a warning:

In [None]:
with h5tbx.File() as h5:
    ds = h5.create_dataset('data', (4, 5))
    print(ds.source)

We may pass "source" directly as an argument or via "attrs". In both cases the `set()` method is called, so both check whether the source is "numerical" or "experimental":

In [None]:
with h5tbx.File() as h5:
    ds1 = h5.create_dataset('data1', (4, 5), attrs={'source': 'numerical'})
    ds2 = h5.create_dataset('data2', (4, 5), source='experimental')
    # two examples that fail:
    try:
        h5.create_dataset('data3', (4, 5), attrs={'source': 'model-based'})
    except ValueError as e:
        print(e)
    try:
        h5.create_dataset('data4', (4, 5), source='model-based')
    except ValueError as e:
        print(e)

Until now, the source attribute was **optional**. We want to enforce its use, so let's change this property of the standard attribute:

In [None]:
cv.make_required('create_dataset', 'source')

In [None]:
cv

In [None]:
with h5tbx.File() as h5:
    try:
        ds = h5.create_dataset('data', (4, 5))
    except h5tbx.conventions.StandardAttributeError as e:
        print(e)
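
How a required attribute can be enforced at creation time is sketched below in plain Python. `REQUIRED`, `MissingAttributeError` and this `create_dataset` function are hypothetical stand-ins, not the toolbox implementation. The point is that a value passed as keyword argument or via `attrs` satisfies the requirement, while omitting it raises an error:

```python
from typing import Dict, Optional, Tuple


class MissingAttributeError(Exception):
    """Hypothetical error raised when a required attribute is not provided."""


# Hypothetical set of attributes that must be given when a dataset is created
REQUIRED = ('source',)


def create_dataset(name: str, shape: Tuple[int, ...],
                   attrs: Optional[Dict[str, str]] = None,
                   **standard_attrs: str) -> Dict[str, object]:
    """Collect attributes from both routes and reject incomplete input."""
    merged = dict(attrs or {})
    merged.update(standard_attrs)  # keyword arguments take precedence
    missing = [key for key in REQUIRED if key not in merged]
    if missing:
        raise MissingAttributeError(
            f'Dataset "{name}" requires the attribute(s): {", ".join(missing)}')
    return {'name': name, 'shape': shape, 'attrs': merged}


ds1 = create_dataset('data1', (4, 5), source='experimental')
ds2 = create_dataset('data2', (4, 5), attrs={'source': 'numerical'})

try:
    create_dataset('data3', (4, 5))  # no source given
except MissingAttributeError as e:
    print(e)
```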

In [None]:
with h5tbx.File() as h5:
    ds = h5.create_dataset('data', (4, 5), source='Experimental')
    ds.dump()