# Getting Started with "Layouts"

The `toolbox` provides a framework for defining the layout (structure) of an HDF5 file. This means we can specify which datasets, groups, attributes and properties are expected.

How do layouts differ from conventions? Conventions come into play during dataset or group creation: they define which attributes can or must be provided at creation time. This is done through "standard attributes", which also validate the value. A layout, on the other hand, is used after a file has been created and checks the full content of the file (except array values). It should be used to check whether a file adheres to a project definition. If it does, the file can be shared with others (other users, repositories, databases, ...).

Let's learn about layouts through practical examples:

In [1]:
from h5rdmtoolbox import layout  # import the layout module

## 1. Create a layout

Create a layout by calling the `Layout` class:

In [2]:
lay = layout.Layout()

A layout consists of so-called "layout specifications". All specifications are stored in a list. So far we have not added any specification, so the list is empty:

In [3]:
lay.specifications

[]

### 1.1 Adding a specification

Let's add a specification by calling `.add()`. A specification holds the information for a query, which is performed later when we validate a file against the layout.

The first argument is the query method. We will use `find` from the database class [`h5rdmtoolbox.database.hdfdb.FileDB`](../database/hdfDB.ipynb). Then we add keyword arguments to be passed to that method.

As a first example, we require the following of all files validated with our layout:
- all datasets must be compressed with "gzip"
- the dataset with name "/u" must exist

In [4]:
from h5rdmtoolbox.database import hdfdb

In [5]:
# the file must have datasets (this spec makes more sense with the following spec)
spec_all_dataset = lay.add(
    hdfdb.FileDB.find,  # query function
    flt={},
    objfilter='dataset'
)

# all datasets must be compressed with gzip (conditional spec. only called if parent spec is successful)
spec_compression = spec_all_dataset.add(
    hdfdb.FileDB.find_one,  # query function
    flt={'$compression': 'gzip'}  # query parameter
)

# the file must have the dataset "/u"
spec_ds_u = lay.add(
    hdfdb.FileDB.find,  # query function
    flt={'$name': '/u'},
    objfilter='dataset'
)

# the layout now holds two top-level specifications:
lay.specifications

[LayoutSpecification (kwargs={'flt': {}, 'objfilter': 'dataset'}),
 LayoutSpecification (kwargs={'flt': {'$name': '/u'}, 'objfilter': 'dataset'})]

**Note:** We added three specifications. The first (`spec_all_dataset`) and the last (`spec_ds_u`) were added to the layout object itself. The second (`spec_compression`) was added to the first specification and is therefore a *conditional specification*: it is only called if the parent specification was successful. Also note that a child specification is called on every result object of the parent specification. In our case, `spec_compression` is called on every dataset object in the file.

We can access `specifications` of the first specification and, indeed, see the specification that defines the compression type:

In [6]:
lay.specifications[0].specifications

[LayoutSpecification (kwargs={'flt': {'$compression': 'gzip'}})]

**Example data**

To test our layout, we need some example data:

In [7]:
import h5rdmtoolbox as h5tbx
with h5tbx.File() as h5:
    h5.create_dataset('u', shape=(3, 5), compression='lzf')
    h5.create_dataset('v', shape=(3, 5), compression='gzip')
    h5.create_group('instruments', attrs={'description': 'Instrument data'})

## 2. Validate a file

Let's perform the validation. We expect one failed validation, because dataset "u" has the wrong compression:

In [8]:
res = lay.validate(h5.hdf_filename)

2023-12-18_14:51:45,123 ERROR    [core.py:117] Applying spec. "LayoutSpecification (kwargs={'flt': {'$compression': 'gzip'}})" on "<HDF5 dataset "u": shape (3, 5), type "<f4", convention "h5py">" failed.


The error log message shows us that one specification was not successful. The `is_valid()` call therefore will return `False`, too:

In [9]:
res.is_valid()

False

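We can confirm this by running the query directly on the file. Only dataset "/v" is compressed with gzip:

In [10]: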
hdfdb.FileDB(h5.hdf_filename).find({'$compression': 'gzip'})

[<LDataset "/v" in "C:\Users\da4323\AppData\Local\h5rdmtoolbox\h5rdmtoolbox\tmp\tmp_2\tmp0.hdf" attrs=()>]

The "compression specification" was called twice and failed once (for dataset "u"):

In [11]:
spec_compression.n_calls, spec_compression.n_fails

(2, 1)
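These counts follow from the rule that a child specification runs once per result object of its parent. The sketch below is a toy model in plain Python that mimics this rule; it is not the toolbox's implementation, and `ToySpec` and `DATASETS` are hypothetical names introduced only for illustration.

```python
# Toy stand-ins for the two datasets of the example file (not real HDF5 objects).
DATASETS = [
    {"name": "/u", "compression": "lzf"},
    {"name": "/v", "compression": "gzip"},
]


class ToySpec:
    """Hypothetical, simplified model of a (conditional) specification."""

    def __init__(self, query):
        self.query = query      # callable: list of objects -> list of matches
        self.children = []      # conditional (child) specifications
        self.n_calls = 0
        self.n_fails = 0

    def add(self, query):
        """Attach a child specification and return it."""
        child = ToySpec(query)
        self.children.append(child)
        return child

    def __call__(self, objects):
        self.n_calls += 1
        results = self.query(objects)
        if not results:
            # A query without results counts as a failure; children are skipped.
            self.n_fails += 1
            return
        # Each child is called once per result object of this specification.
        for obj in results:
            for child in self.children:
                child([obj])


# Parent: "all datasets"; child: "must be gzip-compressed".
all_datasets = ToySpec(lambda objs: objs)
gzip_only = all_datasets.add(
    lambda objs: [o for o in objs if o["compression"] == "gzip"]
)

all_datasets(DATASETS)
print(gzip_only.n_calls, gzip_only.n_fails)  # prints: 2 1
```

The parent finds two datasets, so the child runs twice. It fails for "/u" (compressed with lzf) and succeeds for "/v".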

## 3. Sharing layouts

**This is work in progress!!!**

Currently, the only ways to share a layout are to share the code above or to save the object as a pickle file (see https://docs.python.org/3/library/pickle.html).
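The pickle route could look roughly like the following sketch. It uses a plain dictionary as a stand-in so that it runs without an HDF5 file. Replacing `layout_like` with the `lay` object from above is an assumption based on the statement that layouts can be pickled. The file name `layout.pkl` is arbitrary.

```python
import os
import pickle
import tempfile

# Stand-in for the object to share (hypothetical). With the toolbox, this
# would be the `lay` object created above, assuming it is picklable.
layout_like = {
    "specifications": [{"flt": {"$name": "/u"}, "objfilter": "dataset"}]
}

path = os.path.join(tempfile.mkdtemp(), "layout.pkl")

# Sender: write the object to a pickle file.
with open(path, "wb") as f:
    pickle.dump(layout_like, f)

# Recipient: load the object again.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == layout_like)  # prints: True
```

Two caveats from the pickle documentation apply. Unpickling can execute arbitrary code, so only load pickle files from sources you trust. Pickle also stores classes and functions by reference, so the recipient needs a compatible installation of the same packages.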