# Serverless HDF Database

No classic client-server concept is used. Instead, an HDF file is created with external links to the root groups of the HDF files that are included in the database.

In [None]:
from h5rdmtoolbox import h5database as h5db
from h5rdmtoolbox import generate_temporary_directory
from h5rdmtoolbox import tutorial
import h5rdmtoolbox as h5tbx

In [None]:
tocdir = generate_temporary_directory('test_repo')
tutorial.Database.build_test_repo(tocdir)

# Filtering a single file with pymongo-syntax

In [None]:
repo = h5db.H5repo(tocdir)

### Find based on name/basename
The `name` is the full path of an object within the file; the `basename` is only the last part of that path, i.e. the dataset or group name itself:

In [None]:
with h5tbx.H5File(repo[0].filename) as h5:
    print(h5.find({'$dataset': {'$basename': 'ptot'}}))
    print(h5.find({'$dataset': {'$name': '/operation_point/ptot'}}))
    print(h5.find({'$group': {'$basename': 'operation_point'}}))
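The distinction between name and basename can be illustrated without any HDF file. The following is a minimal sketch using only the standard library; the list of object names is made up for illustration (`/measurement/ptot` in particular is hypothetical), and the code does not reproduce how `find` is implemented in the toolbox:

```python
import posixpath

# Hypothetical object names (paths inside one HDF file), for illustration only
names = [
    "/operation_point",
    "/operation_point/ptot",
    "/operation_point/vfr",
    "/measurement/ptot",
]

# Matching on the basename compares only the last path component ...
by_basename = [n for n in names if posixpath.basename(n) == "ptot"]

# ... whereas matching on the name compares the full path inside the file
by_name = [n for n in names if n == "/operation_point/ptot"]

print(by_basename)
print(by_name)
```

A basename query can therefore match objects in several groups, while a name query matches at most one object per file.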

`find_one` returns the object, not a list of objects:

In [None]:
with h5tbx.H5File(repo[0].filename) as h5:
    print(h5.find_one({'$dataset': {'$basename': 'ptot'}}))
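The difference in return type can be sketched with two small helper functions on a plain list. The helpers `find_all` and `find_first` are hypothetical stand-ins, not part of the toolbox, and returning `None` when nothing matches is an assumption of this sketch rather than documented behaviour of `find_one`:

```python
def find_all(items, predicate):
    """Return every matching item as a list (possibly empty)."""
    return [item for item in items if predicate(item)]


def find_first(items, predicate):
    """Return a single matching item, or None if nothing matches (assumption)."""
    return next((item for item in items if predicate(item)), None)


# Hypothetical object names used as stand-ins for datasets
names = ["/operation_point/ptot", "/operation_point/vfr", "/measurement/ptot"]


def is_ptot(name):
    return name.endswith("/ptot")


all_hits = find_all(names, is_ptot)      # a list of names
single_hit = find_first(names, is_ptot)  # one name, not wrapped in a list
no_hit = find_first(names, lambda name: name.endswith("/missing"))
```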

### Find a dataset based on shape or dimension:

In [None]:
with h5tbx.H5File(repo[0].filename) as h5:
    h5.dump()
    print(h5.find({'$dataset': {'$shape': (100,)}}))
    print(h5.find({'$dataset': {'$ndim': 1}}))
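Shape and dimension are closely related: the number of dimensions is the length of the shape tuple, so a dataset of shape `(100,)` is also matched by a query for one dimension. The sketch below evaluates such queries on plain tuples. The helper names and the example datasets are hypothetical, and the matching logic is only an illustration of the idea, not the toolbox's implementation:

```python
def describe(shape):
    """Derive the queryable properties of a dataset from its shape."""
    shape = tuple(shape)
    return {"$shape": shape, "$ndim": len(shape)}


def matches(query, shape):
    """True if every key/value pair of the query equals the dataset property."""
    properties = describe(shape)
    return all(properties[key] == expected for key, expected in query.items())


# Hypothetical datasets: name -> shape
datasets = {
    "/operation_point/ptot": (100,),
    "/image": (20, 30),
    "/scalar": (),
}

by_shape = [n for n, s in datasets.items() if matches({"$shape": (100,)}, s)]
by_ndim = [n for n, s in datasets.items() if matches({"$ndim": 1}, s)]
two_dim = [n for n, s in datasets.items() if matches({"$ndim": 2}, s)]
```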

---
# H5repo - External-link-based repository

**NOTE**: This is an old approach and may be removed from the package. The pymongo-syntax will stay...

Initialize an `H5repo` object and specify the root directory under which the HDF files are located:

In [None]:
repo = h5db.H5repo(tocdir)

The object creates a `toc` file (toc = table of contents), which is an HDF5 file with external links to the HDF files that were found:

In [None]:
repo.toc_filename.name
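To make the idea of a table-of-contents file concrete, the following sketch builds one with plain `h5py`. It assumes `h5py` is installed; the file name `toc.hdf`, the link names `file_0`, `file_1`, the helper `build_toc` and the example attributes are all made up for this illustration and are not necessarily what `H5repo` uses internally:

```python
import tempfile
from pathlib import Path

import h5py


def build_toc(root: Path) -> Path:
    """Create root/toc.hdf holding one external link per HDF file below root."""
    toc_path = root / "toc.hdf"
    hdf_files = sorted(p for p in root.rglob("*.hdf") if p != toc_path)
    with h5py.File(toc_path, "w") as toc:
        for i, path in enumerate(hdf_files):
            # The link stores only the file name and the target path ("/"),
            # no data is copied into the toc file
            toc[f"file_{i}"] = h5py.ExternalLink(str(path), "/")
    return toc_path


# Build two small example files in a temporary directory
root = Path(tempfile.mkdtemp())
for i, operator in enumerate(["alice", "bob"]):
    with h5py.File(root / f"run_{i}.hdf", "w") as h5:
        h5.attrs["operator"] = operator
        h5.create_dataset("operation_point/ptot", data=[1.0, 2.0, 3.0])

toc_path = build_toc(root)

with h5py.File(toc_path, "r") as toc:
    link_names = sorted(toc.keys())
    link = toc.get("file_0", getlink=True)  # the link object itself
    link_target = link.path
    # Accessing a link transparently opens the root group of the linked file
    operators = [toc[name].attrs["operator"] for name in link_names]
```

Because the toc file contains only links, it stays small, and reading through a link always returns the current content of the linked file.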

The content can be dumped to the screen as a (pandas) table:

In [None]:
repo.dump(full_path=False)  # minimizes the output (no full folder path is shown)

The entries can be indexed and the file content is shown:

In [None]:
repo[0]

### Filtering

The repository can be filtered in an HDF5-like syntax. First import all filter classes from the module `filter_classes`:

In [None]:
from h5rdmtoolbox.h5database.filter_classes import *
# repo.list_attribute_values('operator', '/')

The `filter` method requires an `Entry` object. It is the access location within a file, here the group "operation_point" in the root group. In this example, the repository is filtered for files in which the attribute "long_name" of that group is equal to "Operation point data group". A sub-repository is returned, which is again an HDF5 file with external links, but this time only to the HDF files matching the filter request:

In [None]:
%%time
sub_repo = repo.filter(Entry['/operation_point'].attrs['long_name'] == 'Operation point data group')
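An expression such as `Entry[...].attrs[...] == value` can yield a filter object instead of a boolean when `==` is overloaded. The sketch below shows one way this might be done in plain Python. All classes here (`ToyEntry`, `AttrCondition` and the helpers) are hypothetical, files are modelled as nested dictionaries, and nothing in this sketch is taken from the actual `filter_classes` module:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttrCondition:
    """Deferred test: object at `path` has attribute `key` equal to `value`."""
    path: str
    key: str
    value: object

    def matches(self, file_content: dict) -> bool:
        attrs = file_content.get(self.path)
        return attrs is not None and attrs.get(self.key) == self.value


class _AttrRef:
    def __init__(self, path, key):
        self.path, self.key = path, key

    def __eq__(self, value):
        # "==" builds a condition instead of comparing right away
        return AttrCondition(self.path, self.key, value)


class _AttrAccessor:
    def __init__(self, path):
        self.path = path

    def __getitem__(self, key):
        return _AttrRef(self.path, key)


class _Location:
    def __init__(self, path):
        self.path = path
        self.attrs = _AttrAccessor(path)


class ToyEntry:
    def __class_getitem__(cls, path):
        # Allows ToyEntry['/group'] without creating an instance first
        return _Location(path)


# Toy "files": file name -> {object path -> attributes}
files = {
    "run_0.hdf": {"/operation_point": {"long_name": "Operation point data group"}},
    "run_1.hdf": {"/": {"operator": "bob"}},
    "run_2.hdf": {"/operation_point": {"long_name": "something else"}},
}

condition = ToyEntry["/operation_point"].attrs["long_name"] == "Operation point data group"
matching = [name for name, content in files.items() if condition.matches(content)]
```

The condition is created once and then evaluated per file, which is what allows a repository to apply the same request to every linked file.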

In [None]:
sub_repo.dump(False)

The elapsed time for the filter request and for building the new HDF toc file is:

In [None]:
sub_repo.elapsed_time  # [s]

Evaluating the sub-repository is straightforward because we are still working with HDF5 files. Let's plot data from the filter results:

In [None]:
%%time
import matplotlib.pyplot as plt

plt.figure()
for r in sub_repo:
    with r as h5:
        if 'operation_point' in h5:
            plt.scatter(h5['operation_point']['vfr'].attrs['mean'], h5['operation_point']['ptot'].attrs['mean'])
plt.xlabel('vfr')
plt.ylabel('ptot')
plt.show()

## H5Files - Accessing multiple HDF files

This concept assumes that the HDF files are already known, for example as the result of the filter request above:

In [None]:
from h5rdmtoolbox.h5database import H5Files

In [None]:
sub_repo[0:3]

In [None]:
with H5Files(*[sr.filename for sr in sub_repo[0:4]]) as h5files:
    print(h5files.keys())
    h5files[0].dump()
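Opening several files in a single `with` statement can be sketched with `contextlib.ExitStack` from the standard library. The class `MultiFile` below is a hypothetical stand-in that works on plain text files; keying the handles by the file stem and supporting integer indexing are assumptions of this sketch, not a description of how `H5Files` is implemented:

```python
import tempfile
from contextlib import ExitStack
from pathlib import Path


class MultiFile:
    """Open several files together and close all of them on exit."""

    def __init__(self, *filenames):
        self._filenames = [Path(f) for f in filenames]
        self._stack = ExitStack()
        self._handles = {}

    def __enter__(self):
        try:
            for path in self._filenames:
                # Every opened file is registered for automatic closing
                self._handles[path.stem] = self._stack.enter_context(open(path))
        except BaseException:
            self._stack.close()  # close files that were already opened
            raise
        return self

    def __exit__(self, exc_type, exc, tb):
        self._stack.close()
        return False  # do not suppress exceptions

    def keys(self):
        return list(self._handles)

    def __getitem__(self, key):
        if isinstance(key, int):
            return list(self._handles.values())[key]
        return self._handles[key]


# Create three small example files
tmpdir = Path(tempfile.mkdtemp())
paths = []
for i in range(3):
    path = tmpdir / f"run_{i}.txt"
    path.write_text(f"content {i}")
    paths.append(path)

with MultiFile(*paths) as files:
    opened_keys = files.keys()
    first_content = files[0].read()
    first_handle = files[0]

closed_after_exit = first_handle.closed
```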