# Layouts

Layouts specify the meta content of HDF5 files, e.g. which attributes are required or which shape certain datasets are expected to have. A layout is published, for instance, by a project management team or a collaboration, and helps during both data generation and data usage: the creator of an HDF5 file can verify that all required data has been written. Likewise, the receiver of an HDF5 file, e.g. an analyst, can check whether the file being inspected is "complete". Standardising the file layout reduces back-and-forth communication, as it minimises errors and missing data, and ultimately saves costs.

## Creating a layout

The `layout.File` object is designed such that the syntax is similar to the `h5py.File` class.

In [1]:
import h5rdmtoolbox as h5tbx
from h5rdmtoolbox.conventions import layout

2023-04-13_18:20:02,889 DEBUG    [__init__.py:35] changed logger level for h5rdmtoolbox from 20 to DEBUG


Initialize a `File` object:

In [2]:
lay = layout.File()

Next, we will specify various attributes of groups and datasets, as well as properties of datasets such as their shape. Using explicit HDF5 paths or wildcards, we define whether a specification applies to one specific HDF5 object or to several.

### Attribute specifications

Let's require the user to set the root attribute "version", which holds the current version of this package:

In [3]:
lay['/'].attrs['version'] = h5tbx.__version__

Further, define that each group must have an attribute called "long_name". We don't specify its value; we only require that the attribute exists. The wildcard (`*`) indicates that the location of the group does not matter, so the specification applies to any group within an HDF5 file:

In [4]:
lay['*'].group().attrs['long_name'] = layout.Any()
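
To illustrate the difference between an explicit path and a wildcard, the following minimal sketch uses Python's standard `fnmatch` module to select group paths by pattern. It is only an analogy: the helper `select_paths` and the example paths are hypothetical, and the matching rules actually used by the toolbox may differ.

```python
from fnmatch import fnmatchcase

# Hypothetical group paths of an HDF5 file (illustration only)
group_paths = ["/device", "/fluid", "/fluid/inlet"]


def select_paths(pattern, paths):
    """Return all paths matching a shell-style wildcard pattern."""
    return [p for p in paths if fnmatchcase(p, pattern)]


# An explicit path addresses exactly one object ...
explicit = select_paths("/device", group_paths)
# ... whereas a wildcard addresses objects at any location
wildcard = select_paths("/*", group_paths)

print(explicit)
print(wildcard)
```

Note that in `fnmatch` the `*` also matches the `/` separator, which is why the nested group `/fluid/inlet` is selected as well.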

Now we specify that a group "device" must exist. We explicitly state that the device group must be located at the top level (root level):

In [5]:
lay['/'].group('device')

LayoutGroup("/device")

Note that if we did not specify the exact parent group, i.e. wrote `lay['*'].group('device')`, this would have no effect, because it would make "device" optional.

Let's see which "validators" we have defined so far (validators are the specifications we set; they will be called in sequence later on):

In [6]:
lay

<Layout File (3 validators):
 [0] AttributeValidation(path="/", key="version", validator=AttributeEqual(0.4.0a1, opt=False))
 [1] AttributeValidation(path="/*", key="long_name", validator=AnyAttribute(None, opt=False))
 [2] GroupValidation(path="/device")>
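
The idea of validators being called in sequence can be sketched in plain Python. This is not the toolbox's implementation: the function `require_attr`, the dictionary `file_attrs` and the issue messages are hypothetical and only show the general pattern of running a list of checks and collecting the issues they report.

```python
# Hypothetical stand-in for the attributes found in a file, keyed by object path
file_attrs = {"/": {}, "/device": {"long_name": "measurement device"}}


def require_attr(path, key):
    """Build a validator that reports an issue if `key` is missing at `path`."""
    def validator(attrs_by_path):
        if key not in attrs_by_path.get(path, {}):
            return [f'attribute "{key}" missing at "{path}"']
        return []
    return validator


# The validators are stored in a list and called one after another
validators = [
    require_attr("/", "version"),
    require_attr("/device", "long_name"),
]

issues = []
for validate in validators:
    issues.extend(validate(file_attrs))

print(len(issues), issues)
```

Here only the first check fails, because the root has no "version" attribute, while the "device" group does carry a "long_name".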

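Now, say that each *dataset* (in any group) shall have an attribute called "standard_name". Again, the wildcard is used and no specific dataset name is set. In the cell below, the attribute assignment is commented out, so only the dataset validator itself is registered: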

In [7]:
lay['*'].dataset()#.attrs['standard_name'] = layout.Any()

DatasetValidation(parent=/*")

Displaying the layout again shows the newly added dataset validator:

In [8]:
lay

<Layout File (4 validators):
 [0] AttributeValidation(path="/", key="version", validator=AttributeEqual(0.4.0a1, opt=False))
 [1] AttributeValidation(path="/*", key="long_name", validator=AnyAttribute(None, opt=False))
 [2] GroupValidation(path="/device")
 [3] DatasetValidation(parent=/*")>

Each *dataset* in the group "fluid" (and below) shall have an attribute called "units":

In [None]:
lay['fluid/*'].dataset().attrs['units'] = layout.Any()

### Dataset property specification

Each dataset (regardless of its location within the hierarchical structure) whose name starts with "x", "y" or "z" followed by "_coordinate" shall be one-dimensional. This can be specified with the dataset property "ndim":

In [None]:
lay['*'].dataset(name=layout.Regex('^[x-z]_coordinate'), ndim=1)
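
To see which dataset names the pattern above selects, it can be tried out with Python's standard `re` module. This is a minimal sketch: the dataset names are hypothetical, and it assumes that `layout.Regex` applies the pattern in the manner of `re.search`, which the text above does not confirm.

```python
import re

# Same pattern as in the layout specification above
pattern = re.compile(r'^[x-z]_coordinate')

# Hypothetical dataset names (illustration only)
names = ['x_coordinate', 'y_coordinate', 'z_coordinate',
         'u_coordinate', 'x_coordinate_raw', 'xvelocity']

# Names that start with x, y or z followed by "_coordinate"
matching = [n for n in names if pattern.search(n)]

# Stricter variant: the whole name must match, nothing may follow
exact = [n for n in names if re.fullmatch(r'[x-z]_coordinate', n)]

print(matching)
print(exact)
```

Because the pattern has no trailing `$`, a name such as `x_coordinate_raw` is also selected. If only the three exact names are intended, a pattern ending in `$` (or a full match) would be the stricter choice.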

## Perform a layout validation

Let's create an empty HDF5 file first:

In [None]:
h5tbx.use(None)

with h5tbx.File() as h5:
    h5.dump()

Running the validation with `lay.validate()` yields a total of three issues (`res.total_issues()`):

In [None]:
res = lay.validate(h5.hdf_filename)
res

To find out what the issues are, `print()` the issue messages. Note that some issues are "hidden".

In [None]:
res.print()

In [None]:
layout.File.Registry()

Adding the version reduces the number of issues by one:

In [None]:
with h5tbx.File() as h5:
    h5.attrs['version'] = h5tbx.__version__
    res2 = lay.validate(h5)
res2

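Adding a dataset in the group "fluid" without a unit:

In [None]:
with h5tbx.File() as h5:
    h5.attrs['version'] = h5tbx.__version__
    # the dataset name "velocity" is only an example; no "units" attribute is set
    fluid = h5.create_group('fluid')
    fluid.create_dataset('velocity', data=[1.0, 2.0, 3.0])
    res2 = lay.validate(h5)
res2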

## User-defined Validators