# Getting Started with "Layouts"

Layout definitions are a means of validating the structure and metadata content of an HDF5 file. The `h5rdmtoolbox` lets you define the expected datasets, groups, attributes and properties. A layout is intended as a validation step before an HDF5 file is processed further.

Unlike conventions, a layout validates an existing HDF5 file. The image below illustrates a typical workflow:
1. The file is created while a convention may be in place. The convention supports the creation process and ensures that certain required data is available. However, a convention cannot capture all requirements, especially structural ones, e.g. that a certain dataset must exist.
2. Structural and conditional tests can be described by a layout definition. A layout validates an already written file. If validation succeeds, follow-up steps such as sharing, storing or further processing can take place.

<img src="../../_static/layout_workflow.svg"
     alt="Workflow: file creation with a convention, followed by validation with a layout"
     style="margin-right: 10px; height: 200px;" />

Let's explore layouts through practical examples:

In [1]:
from h5rdmtoolbox import layout  # import the layout module

Failed to import module h5tbx


## 1. Create a layout

The core class is `Layout`. It holds so-called (layout) specifications:

In [2]:
lay = layout.Layout()

Currently, the layout has no specifications:

In [3]:
lay.specifications

[]

### 1.1 Adding a specification

Let's add a specification by calling `.add()`. A specification stores the information for a query that is performed later, when a file is validated against the layout.

The most important argument of a specification is the query function. It can be any function that returns a list of HDF5 objects found according to the keyword arguments that function accepts.

Such a query function exists in h5rdmtoolbox, see the HDF5 database class: [`h5rdmtoolbox.database.hdfdb.FileDB`](../database/hdfDB.ipynb).

As a first example, we require the following of every file validated with our layout:
- all datasets must be compressed with "gzip"
- the dataset named "/u" must exist

In [4]:
from h5rdmtoolbox.database import hdfdb

In [5]:
# the file must have datasets (this spec makes more sense with the following spec)
spec_all_dataset = lay.add(
    hdfdb.FileDB.find,  # query function
    flt={},
    objfilter='dataset',
    n={'$gte': 1},  # at least one dataset must exist
    description='At least one dataset exists'
)

# all datasets must be compressed with gzip (conditional spec. only called if parent spec is successful)
spec_compression = spec_all_dataset.add(
    hdfdb.FileDB.find,  # query function
    flt={'$compression': 'gzip'},  # query parameter
    n=1,
    description='Compression of any dataset is "gzip"'
)

# the file must have the dataset "/u"
spec_ds_u = lay.add(
    hdfdb.FileDB.find,  # query function
    flt={'$name': '/u'},
    objfilter='dataset',
    n=1,
    description='Dataset "/u" exists'
)

# two specifications were added directly to the layout:
lay.specifications

[LayoutSpecification(description="At least one dataset exists", kwargs={'flt': {}, 'objfilter': 'dataset'}),
 LayoutSpecification(description="Dataset "/u" exists", kwargs={'flt': {'$name': '/u'}, 'objfilter': 'dataset'})]

**Note:** We added three specifications in total. The first (`spec_all_dataset`) and the last (`spec_ds_u`) were added to the layout itself. The second (`spec_compression`) was added to the first specification and is therefore a *conditional specification*: it is only called if the parent specification was successful. Also note that a child specification is called on every result object of its parent. In our case, `spec_compression` is called on every dataset in the file.

In [6]:
lay.specifications[0].specifications

[LayoutSpecification(description="Compression of any dataset is "gzip"", kwargs={'flt': {'$compression': 'gzip'}})]
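
The conditional behaviour described in the note can be modelled in a few lines of plain Python. This is only a toy sketch, not the `h5rdmtoolbox` implementation: `ToySpec`, `toy_file`, `query` and `is_ok` are hypothetical names, and the "file" is just a dictionary mapping dataset names to compression filters.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Toy stand-in for an HDF5 file: dataset name -> compression filter (assumed values)
toy_file: Dict[str, str] = {"/u": "lzf", "/v": "gzip"}


@dataclass
class ToySpec:
    """Hypothetical, simplified specification (not the h5rdmtoolbox class)."""
    description: str
    query: Callable[[List[str]], List[str]]  # returns the matching targets
    is_ok: Callable[[int], bool]             # checks the number of matches
    children: List["ToySpec"] = field(default_factory=list)
    n_calls: int = 0
    n_fails: int = 0

    def add(self, child: "ToySpec") -> "ToySpec":
        """Attach a conditional (child) specification and return it."""
        self.children.append(child)
        return child

    def __call__(self, targets: List[str]) -> None:
        self.n_calls += 1
        found = self.query(targets)
        if not self.is_ok(len(found)):
            self.n_fails += 1
            return  # children are skipped when the parent fails
        for obj in found:  # every child runs once per parent result
            for child in self.children:
                child([obj])


all_datasets = ToySpec(
    "At least one dataset exists",
    query=lambda targets: list(targets),
    is_ok=lambda count: count >= 1,
)
gzip_only = all_datasets.add(ToySpec(
    'Compression is "gzip"',
    query=lambda targets: [t for t in targets if toy_file[t] == "gzip"],
    is_ok=lambda count: count == 1,
))

all_datasets(list(toy_file))
print(gzip_only.n_calls, gzip_only.n_fails)  # 2 1
```

The child specification runs once per dataset returned by the parent, so it is called twice and fails once, for the "lzf"-compressed dataset.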

## 2. Validate a file

**Example data**

To test our layout, we need some example data:

In [7]:
import h5rdmtoolbox as h5tbx
with h5tbx.File() as h5:
    h5.create_dataset('u', shape=(3, 5), compression='lzf')
    h5.create_dataset('v', shape=(3, 5), compression='gzip')
    h5.create_group('instruments', attrs={'description': 'Instrument data'})
    h5.dump()

Let's perform the validation. We expect one failed validation, because dataset "u" has the wrong compression type:

In [8]:
res = lay.validate(h5.hdf_filename)

2024-04-07_18:34:55,177 ERROR    [core.py:320] Applying spec. "LayoutSpecification(description="Compression of any dataset is "gzip"", kwargs={'flt': {'$compression': 'gzip'}})" failed due to not matching the number of results: 1 != 0
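
The message `1 != 0` says that the specification expected `n=1` result, but the query found none when it was called on dataset "/u". The comparison can be sketched as follows. `count_matches` is a hypothetical helper, not part of `h5rdmtoolbox`, and it handles only the two forms of `n` used in this tutorial: an exact integer and `{'$gte': k}`.

```python
from typing import Mapping, Union

CountSpec = Union[int, Mapping[str, int]]


def count_matches(n: CountSpec, n_found: int) -> bool:
    """Hypothetical helper: compare a result count with the expected `n`."""
    if isinstance(n, int):
        return n_found == n  # exact number of results required
    if set(n) == {"$gte"}:
        return n_found >= n["$gte"]  # at least this many results required
    raise ValueError(f"Unsupported count specification: {n!r}")


# "/u" is lzf-compressed, so the gzip query finds nothing for it
print(count_matches(1, 0))            # False
# "/v" is gzip-compressed, so exactly one object is found
print(count_matches(1, 1))            # True
# the parent spec only asks for at least one dataset
print(count_matches({"$gte": 1}, 2))  # True
```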


In [9]:
res.specifications[0].specifications[0].results[0].validation_flag

10

The flag `10` stands for "FAILED, INVALID_NUMBER", as the flag description column of the summary shows. The summary lists every call performed on a target (dataset or group) together with its outcome:

In [14]:
res.print_summary(exclude_keys=['id', 'kwargs', 'func'])


Summary of layout validation
+----------+--------+------------------------+--------------------------------------+---------------+---------------+
| called   |   flag | flag description       | description                          | target_type   | target_name   |
|----------+--------+------------------------+--------------------------------------+---------------+---------------|
| True     |      1 | SUCCESSFUL             | At least one dataset exists          | Group         | tmp0.hdf      |
| True     |     10 | FAILED, INVALID_NUMBER | Compression of any dataset is "gzip" | Dataset       | /u            |
| True     |      1 | SUCCESSFUL             | Compression of any dataset is "gzip" | Dataset       | /v            |
| True     |      1 | SUCCESSFUL             | Dataset "/u" exists                  | Group         | tmp0.hdf      |
+----------+--------+------------------------+--------------------------------------+---------------+---------------+
--> Layout validation foun

The error log message and the summary show that one specification failed. Consequently, `is_valid()` returns `False`:

In [11]:
res.is_valid()

False

Querying the file directly confirms that only dataset "/v" is compressed with "gzip":

In [12]:
hdfdb.FileDB(h5.hdf_filename).find({'$compression': 'gzip'})

[<LDataset "/v" in "C:\Users\da4323\AppData\Local\h5rdmtoolbox\h5rdmtoolbox\tmp\tmp_15\tmp0.hdf" attrs=()>]

The compression specification was called twice and failed once (for dataset "u"):

In [13]:
spec_compression.n_calls, spec_compression.n_fails

(2, 1)