# Quick Overview

This chapter gives a quick overview of how to use the package. Detailed explanations can be found in the subchapters on the respective sub-classes.

Import the package and give it an alias, e.g. `h5tbx`.

In [None]:
import h5rdmtoolbox as h5tbx

## Select a convention
The file content is controlled by means of a `convention`, which is a set of standardized attributes. These attributes require the user to provide certain meta-data and at the same time control its value (e.g. its syntax). Either use the pre-defined convention (`tbx`) or [create your own](../conventions/standard_attributes.ipynb). For now, we select the existing one:

In [None]:
h5tbx.use('tbx')
h5tbx.get_current_convention()

From the representation string of the convention object we can read which attributes are *optional* or **required** for file creation (`__init__`), dataset creation (`create_dataset`) or group creation (`create_group`).

## Create an HDF file

We recommend using Python's context manager (`with` ...). Providing a filename is not required. If none is given, a **temporary file** is created and deleted after the session, which makes it well suited for this tutorial session:

In [None]:
with h5tbx.File(title='A test file') as h5:
    print(h5.hdf_filename.name)  # equal to h5.filename but a pathlib.Path and exists also after the file is closed
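
The behaviour described above (a file created without a user-supplied name, whose path remains usable after the handle is closed, and which is cleaned up at the end of the session) can be pictured with the standard library alone. This is only a sketch of the idea, not how `h5rdmtoolbox` implements it; `make_temporary_path` is a hypothetical helper.

```python
import atexit
import os
import pathlib
import tempfile


def make_temporary_path(suffix='.hdf'):
    """Hypothetical helper: create an empty temporary file and return its path."""
    fd, name = tempfile.mkstemp(suffix=suffix)
    os.close(fd)  # only the path is needed, not the low-level handle
    path = pathlib.Path(name)
    # remove the file when the interpreter session ends
    atexit.register(lambda: path.unlink(missing_ok=True))
    return path


tmp_path = make_temporary_path()
with open(tmp_path, 'wb') as f:
    f.write(b'demo')

# the handle is closed here, but the path object still points to an existing file
print(f.closed, tmp_path.exists(), tmp_path.name)
```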

## Create a dataset

To create a dataset we need to call `create_dataset`. We already know that enabling the "tbx" convention means a few additional parameters can or must be passed. Whenever a convention is enabled or disabled, the method signature is updated accordingly:

In [None]:
h5tbx.File.create_dataset

A deep dive into what the various parameters do is given [here](../conventions/tbx.ipynb).

Now let's create a sinusoidal signal $v(t)$, which represents a velocity measurement recorded in `units` of volts. The conversion factor of the measurement into physical units shall be $2.5 \frac{m/s}{V}$ in this example. We choose "vel" as the dataset `name`, and with `long_name` we give a more precise description:

In [None]:
import numpy as np

time = np.linspace(0, np.pi/4, 21) # units [s]
signal = np.sin(2*np.pi*3*time) # units [V], physical: [m/s]

with h5tbx.File(contact='https://orcid.org/0000-0001-8729-0482') as h5:
    vel_hdf_filename = h5.hdf_filename # store for later use
    
    ds_time = h5.create_dataset(name='time',
                                data=time, 
                                units='s',
                                long_name='measurement time',
                                make_scale=True)
    
    ds_signal = h5.create_dataset(name='vel',
                                  data=signal,
                                  units='V',
                                  offset=10.0,
                                  scale='2.5 m/s/V',
                                  long_name='air velocity in pipe',
                                  attach_scale=ds_time)
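
To make the conversion factor concrete, the following sketch converts the raw voltage samples into a velocity by hand, using only the standard library. It assumes a purely multiplicative conversion with the factor $2.5 \frac{m/s}{V}$ from the text; how the toolbox itself combines `offset` and `scale` is not shown in this chapter and is deliberately left out here. The names `SCALE` and `velocity` are illustrative.

```python
import math

# same sampling as above: 21 points between 0 and pi/4 seconds
n_points = 21
t_end = math.pi / 4  # [s]
time = [i * t_end / (n_points - 1) for i in range(n_points)]

# raw signal as recorded by the sensor, in volts
signal_volts = [math.sin(2 * math.pi * 3 * t) for t in time]  # [V]

# assumed multiplicative conversion into physical units
SCALE = 2.5  # [(m/s)/V]
velocity = [SCALE * v for v in signal_volts]  # [m/s]

print(f'peak velocity magnitude: {max(abs(v) for v in velocity):.3f} m/s')
```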

With the `h5rdmtoolbox` you receive an `xr.DataArray` object instead of a `np.ndarray` when an HDF5 dataset is sliced. Thus, meta information (the attributes of the dataset) is still provided with the data, and useful features like plotting are available:

In [None]:
with h5tbx.File(vel_hdf_filename) as h5:
    vel_data = h5['vel'][:]
    vel_data.plot(marker='o')
    
vel_data  # this returns the interactive view of the array and its meta data

## Create a group
Groups don't really differ from the implementation in `h5py`. Besides standard attributes, which may be required, `attrs` can be passed during group creation. This is also possible for dataset creation. Overwriting existing objects is possible, too.

In [None]:
with h5tbx.File(vel_hdf_filename, 'r+') as h5:
    h5.create_group('mygroup',
                    overwrite=True,
                    attrs={'long_name': 'my special group'})

## Natural Naming
So far we have addressed datasets and groups in the conventional dictionary-like style. `h5RDMtoolbox` also supports "natural naming", which means that we can address those objects as if they were attributes. Make sure `h5tbx.config.natural_naming` is set to `True` (the default).

In [None]:
from h5rdmtoolbox import config

Let's first disable `natural_naming`:

In [None]:
config.natural_naming = False
with h5tbx.File(vel_hdf_filename, 'r') as h5:
    try:
        ds = h5.vel[:]
    except AttributeError as e:
        print(e)

Enable it:

In [None]:
config.natural_naming = True
with h5tbx.File(vel_hdf_filename, 'r') as h5:
    ds = h5.vel[:]
    grp = h5.mygroup
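
The mechanism behind attribute-style access can be sketched in plain Python. The class below is hypothetical and not taken from `h5rdmtoolbox`; it only illustrates how `__getattr__` can fall back to a dictionary-like lookup and how a switch can turn that fallback off, so that an `AttributeError` is raised as in the cell above.

```python
class Node:
    """Hypothetical dict-like container with optional attribute-style access."""

    natural_naming = True  # class-wide switch, comparable in spirit to a config flag

    def __init__(self, **children):
        self._children = dict(children)

    def __getitem__(self, name):
        # conventional dictionary-like access, always available
        return self._children[name]

    def __getattr__(self, name):
        # only called when the normal attribute lookup has failed
        if name.startswith('_') or not type(self).natural_naming:
            raise AttributeError(name)
        try:
            return self._children[name]
        except KeyError:
            raise AttributeError(name) from None


root = Node(vel=[0.0, 1.0, 2.0], mygroup=Node())
print(root.vel is root['vel'])  # both styles return the same object

Node.natural_naming = False
try:
    root.vel
except AttributeError as e:
    print(f'AttributeError: {e}')
```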

## Inspect file content
Often it is necessary to inspect the content of a file (structure and meta data, not the raw data). Calling `dump()` on a group renders its content (datasets, groups and attributes) as a pretty and interactive (!) HTML representation. This is adopted from the `xarray` package. All credits for this idea go there. The representation here avoids showing data, though. Outside an IPython environment, call `sdump()` to get a string representation of the file.

In [None]:
with h5tbx.File(vel_hdf_filename) as h5:
    h5.dump()

In [None]:
with h5tbx.File(vel_hdf_filename) as h5:
    h5.sdump()
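
What a string dump of structure and meta data (without raw data) amounts to can be illustrated on a nested dictionary. This is a rough sketch under the assumption that a file can be pictured as nested groups with attributes; `tree_to_string` and the dictionary layout are hypothetical and unrelated to the actual output format of `sdump()`.

```python
def tree_to_string(name, node, indent=0):
    """Hypothetical helper: list names and attributes of a nested dict, not the data."""
    pad = ' ' * indent
    lines = [f'{pad}{name}']
    for key, value in node.get('attrs', {}).items():
        lines.append(f'{pad}  a: {key} = {value!r}')
    for child_name, child in node.get('children', {}).items():
        lines.extend(tree_to_string(child_name, child, indent + 2).splitlines())
    return '\n'.join(lines)


file_like = {
    'attrs': {'title': 'A test file'},
    'children': {
        'vel': {'attrs': {'units': 'V'}, 'data': [0.0, 0.5, 1.0]},
        'mygroup': {'attrs': {'long_name': 'my special group'}},
    },
}

text = tree_to_string('/', file_like)
print(text)
```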

## Conventions

Conventions specify **which attributes are available** and which of them are **required** for an HDF5 file. These specifications are called **standard attributes** and **layouts**:

### Standard Attributes

[Standard Attributes](../conventions/standard_attributes.ipynb) are defined via special Python classes. They have a `get` and a `set` method, which check e.g. the syntax or the value of an attribute. These attributes can be associated with the root group, other groups or datasets. They can also be required during the creation of those objects.

In [None]:
import datetime  # needed by CreationTime.set below

from h5rdmtoolbox.conventions import StandardAttribute

class CreationTime(StandardAttribute):
    name = 'creation_time'
    
    def get(self):
        return f'The creation time is: {super().get()}'
    
    def set(self, value):
        if not isinstance(value, datetime.datetime):
            raise ValueError(f'Not a valid creation time: {value}')
        return super().set(str(value))

In [None]:
from h5rdmtoolbox.conventions import Convention

In [None]:
mycv = Convention('my-cv')
mycv['__init__'].add(CreationTime,
                     add_to_method=True,
                     position={'after': 'mode'},
                     optional=False)
mycv.register()

In [None]:
h5tbx.use('my-cv')

In [None]:
try:
    with h5tbx.File() as h5:
        h5.dump()
except Exception as e:
    print(e)

In [None]:
import datetime
with h5tbx.File(creation_time=datetime.datetime.now()) as h5:
    h5.dump()
    print(h5.creation_time)

### Layouts

Layouts define how a file is expected to be organized: which groups and datasets must exist, which attributes are expected and much more. Layouts define expectations and thus help file exchange where multiple users are involved. For numerical and experimental data, for example, layouts are defined such that a minimum set of data is guaranteed to exist. If a layout validation fails, the exchanged file is rejected.

In [None]:
from h5rdmtoolbox.conventions import Validator

class ValidCreationTime(Validator):
    
    def __init__(self, optional):
        super().__init__(reference=None, optional=optional)
    
    def validate(self, date_string):
        # strptime raises ValueError on a format mismatch and TypeError on non-string input
        try:
            datetime.datetime.strptime(date_string, '%Y-%m-%d %H:%M:%S.%f')
        except (ValueError, TypeError) as e:
            print(e)
            return False
        return True
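
One detail of the format string is worth checking in isolation. `CreationTime.set` above stores `str(value)`, and `str()` of a `datetime` omits the fractional part when the microsecond field is zero, so a strict `'%Y-%m-%d %H:%M:%S.%f'` pattern rejects such a string. The sketch below uses only the standard library; the two helper functions are illustrative and not part of `h5rdmtoolbox`. `datetime.datetime.fromisoformat` is shown as one possible, more tolerant alternative.

```python
import datetime

STRICT_FORMAT = '%Y-%m-%d %H:%M:%S.%f'


def parses_with_strict_format(date_string):
    """Illustrative check: does the string match the strict format with microseconds?"""
    try:
        datetime.datetime.strptime(date_string, STRICT_FORMAT)
    except (ValueError, TypeError):
        return False
    return True


def parses_as_iso(date_string):
    """Illustrative check: accepts the string with or without a fractional part."""
    try:
        datetime.datetime.fromisoformat(date_string)
    except (ValueError, TypeError):
        return False
    return True


with_fraction = str(datetime.datetime(2023, 1, 1, 12, 0, 0, 123456))
without_fraction = str(datetime.datetime(2023, 1, 1, 12, 0, 0))

for s in (with_fraction, without_fraction):
    print(repr(s), parses_with_strict_format(s), parses_as_iso(s))
```

In practice `datetime.datetime.now()` almost always has a non-zero microsecond field, so the strict format usually passes; the edge case matters mainly for timestamps constructed by hand.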

In [None]:
from h5rdmtoolbox.conventions import Layout, validators

lay = h5tbx.conventions.layout.Layout()
lay['/'].attrs['creation_time'] = ValidCreationTime(False)
lay['/'].attrs['title'] = validators.ValidString(False)

In [None]:
with h5tbx.File(creation_time=datetime.datetime.now()) as h5:    
    lay.validate(h5)
lay.report()

In [None]:
lay.get_failed_validations()

In [None]:
with h5tbx.File(creation_time=datetime.datetime.now()) as h5:    
    h5.attrs['title'] = 'Test file'
    lay.validate(h5)
lay.report()
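
The idea behind the two validation runs above (one with a missing `title`, one with it set) can be reduced to a few lines of plain Python. This is a simplified sketch and not the `Layout` implementation: `validate_attrs` and the `rules` mapping are hypothetical, and the validators are deliberately trivial.

```python
def validate_attrs(attrs, rules):
    """Hypothetical check: return the names of attributes that fail their rule.

    `rules` maps an attribute name to a tuple (validator, optional).
    """
    failed = []
    for name, (validator, optional) in rules.items():
        if name not in attrs:
            if not optional:
                failed.append(name)  # a required attribute is missing
            continue
        if not validator(attrs[name]):
            failed.append(name)  # present, but the value is not accepted
    return failed


rules = {
    'creation_time': (lambda v: isinstance(v, str) and len(v) > 0, False),
    'title': (lambda v: isinstance(v, str), False),
}

root_attrs = {'creation_time': '2023-01-01 12:00:00.000001'}
without_title = validate_attrs(root_attrs, rules)

root_attrs['title'] = 'Test file'
with_title = validate_attrs(root_attrs, rules)

print(without_title, with_title)
```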