# HDF File as a Database
A lot of information can be stored in an HDF5 file. With groups, the structure can become deeply nested, so it is often helpful to search for an attribute, a dataset name, or a specific property of an object. In that sense, an HDF5 file is itself a kind of database.

The h5rdmtoolbox provides methods on `H5File` (and its subclasses) to find datasets and groups. The query syntax is kept as close as possible to the one used by pymongo.

## Filtering a single file with pymongo syntax

In [None]:
from h5rdmtoolbox import database as h5db
import h5rdmtoolbox as h5tbx

Let's build an example file:

In [None]:
test_filename = h5tbx.generate_temporary_filename()

with h5tbx.H5File(test_filename, 'r+') as h5:
    h5.attrs['attrvalue'] = 3
    h5.attrs['a float'] = 4.1
    h5.attrs['root_attr3'] = 'a string'
    h5.create_group('a group', attrs=dict(attrvalue=14.3))
    h5.create_dataset('x', shape=(2,), units='', standard_name='x_coordinate', attrs=dict(attrvalue=3))
    h5.create_dataset('y', shape=(2,), units='', standard_name='y_coordinate')

### Query syntax

The method used to find something in an HDF5 file is `find()` (or `find_one()`). Both datasets and groups can be found, and the search can be restricted to either of the two.

The syntax is very similar to that of pymongo. There is a basic search and an advanced search.

### Basic search

The basic search queries attributes only. To find an object, pass a dictionary containing the name of an attribute and its value, e.g.:

In [None]:
with h5tbx.H5File(test_filename) as h5:
    print(h5.find({'attrvalue': 3}))
    r = h5.find_one({'attrvalue': 3})
    print(r)



The above query finds two objects (the root group and the dataset 'x') whose attribute "attrvalue" equals 3. `find()` reports both of them, whereas `find_one()` returns only one.
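
To make the matching rule concrete, here is a minimal pure-Python sketch of the idea. It is not the toolbox's implementation: the `objects` dictionary is a hypothetical in-memory stand-in for the example file, and the helper names `basic_find` and `basic_find_one` are assumptions made for illustration only.

```python
# Hypothetical in-memory model of the example file: object name -> attributes.
objects = {
    '/': {'attrvalue': 3, 'a float': 4.1, 'root_attr3': 'a string'},
    '/a group': {'attrvalue': 14.3},
    '/x': {'attrvalue': 3, 'units': '', 'standard_name': 'x_coordinate'},
    '/y': {'units': '', 'standard_name': 'y_coordinate'},
}

_MISSING = object()  # sentinel so that a stored None is not confused with "absent"


def basic_find(objects, flt):
    """Return the names of all objects whose attributes equal every filter value."""
    return [name for name, attrs in objects.items()
            if all(attrs.get(key, _MISSING) == value for key, value in flt.items())]


def basic_find_one(objects, flt):
    """Return the first matching name, or None if nothing matches (sketch behaviour)."""
    found = basic_find(objects, flt)
    return found[0] if found else None


print(basic_find(objects, {'attrvalue': 3}))      # ['/', '/x']
print(basic_find_one(objects, {'attrvalue': 3}))  # /
```

The group 'a group' is skipped because its "attrvalue" is 14.3, and 'y' is skipped because it has no such attribute at all.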


### Advanced search

The advanced search involves special keywords starting with a dollar sign. When used as a dictionary key, such a keyword is interpreted as a class property: `$<class-property>`. A typical property keyword is `$basename`, which matches against the base name of an object (its name without the parent path). The dictionary value can also be an operator expression, which allows comparisons other than the "is equal to" used by the basic search. Let's perform some advanced searches:


In [None]:
with h5tbx.H5File(test_filename) as h5:
    print("{'attrvalue': {'$gt': 0}}:\n\t", h5.find({'attrvalue': {'$gt': 0}}))
    print("\n{'standard_name':  {'$regex': '_coordinate$'}}:\n\t", h5.find({'standard_name': {'$regex': '_coordinate$'}}))
    print("\n{'$basename': 'x'}:\n\t", h5.find({'$basename': 'x'}))
    print("\n{'$ndim': 1}:\n\t", h5.find({'$ndim': 1}))
    print("\n{'$shape': (2, )}:\n\t", h5.find({'$shape': (2, )}))
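
The difference between plain keys and `$`-prefixed keys can be illustrated with a small stand-alone sketch. Everything in it is hypothetical: `FakeDataset` is a stand-in for an HDF5 dataset object, and `lookup` is an assumed helper showing one possible way to resolve a query key. It does not reproduce how the toolbox does this internally.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass
class FakeDataset:
    """Hypothetical stand-in for an HDF5 dataset (not a toolbox class)."""
    name: str
    shape: tuple
    attrs: dict = field(default_factory=dict)

    @property
    def basename(self):
        # name without the parent path, e.g. '/grp/x' -> 'x'
        return PurePosixPath(self.name).name

    @property
    def ndim(self):
        # number of dimensions follows from the shape
        return len(self.shape)


def lookup(obj, key):
    """Resolve a query key: '$prop' reads an object property, anything else an attribute."""
    if key.startswith('$'):
        return getattr(obj, key[1:], None)
    return obj.attrs.get(key)


x = FakeDataset('/x', (2,), {'attrvalue': 3, 'standard_name': 'x_coordinate'})
print(lookup(x, '$basename'))      # x
print(lookup(x, '$ndim'))          # 1
print(lookup(x, '$shape'))         # (2,)
print(lookup(x, 'standard_name'))  # x_coordinate
```

In this sketch the value returned by `lookup` would then be compared against the dictionary value of the query, either by equality or through an operator.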

All implemented query operators (as they are called in pymongo) are:

- $gt : greater than

- $gte : greater than or equal to

- $lt : less than

- $lte : less than or equal to

- $eq : equal to

- $regex : Filter with regular expression
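
The operators listed above can be thought of as a lookup table from keyword to comparison function. The following is a minimal sketch under that assumption. The names `OPERATORS` and `matches` are hypothetical, and `$regex` is assumed to behave like an unanchored `re.search`, as it does in MongoDB. The toolbox may differ in these details.

```python
import operator
import re

# Hypothetical operator table: query keyword -> function(stored_value, query_value).
OPERATORS = {
    '$gt': operator.gt,
    '$gte': operator.ge,
    '$lt': operator.lt,
    '$lte': operator.le,
    '$eq': operator.eq,
    # only strings can be matched against a pattern
    '$regex': lambda stored, pattern: isinstance(stored, str)
    and re.search(pattern, stored) is not None,
}


def matches(stored, condition):
    """Check one stored value against a plain value or an operator dictionary."""
    if not isinstance(condition, dict):
        return stored == condition  # basic search: plain equality
    try:
        # every operator in the dictionary must be satisfied
        return all(OPERATORS[op](stored, arg) for op, arg in condition.items())
    except TypeError:
        # incomparable types, e.g. a string against a number: treat as "no match"
        return False


print(matches(14.3, {'$gt': 0}))                            # True
print(matches(3, {'$gte': 3, '$lt': 10}))                   # True
print(matches('x_coordinate', {'$regex': '_coordinate$'}))  # True
print(matches('a string', {'$gt': 0}))                      # False
```

Treating incomparable types as a non-match, rather than raising, is a design choice of this sketch: it lets a numeric query run over a file that also contains string attributes of the same name.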



### Return only datasets or groups
Pass `$dataset` or `$group` as the second argument to restrict the results to that object type:

In [None]:
with h5tbx.H5File(test_filename) as h5:
    print(h5.find({'$basename': 'x'}, '$dataset'))

## H5Files - Accessing multiple HDF files

This concept assumes that the HDF files are already known. The `find` and `find_one` methods can be applied here, too:

In [None]:
from h5rdmtoolbox import tutorial
list_of_hdf_filenames = tutorial.Database.generate_test_files(3)
with h5db.H5Files(*list_of_hdf_filenames[0:2]) as h5files:
    print(h5files.keys())
    h5files[0].dump()
    h5files[1].dump()
    print(h5files.find({'$name': '/u'}))
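
Conceptually, a query over several files runs the same filter against each file and collects the hits. The sketch below illustrates this with a hypothetical in-memory model. The file names, object names and attributes are invented and do not reflect the content of the generated test files, and `find_by_name` is an assumed helper rather than part of the toolbox.

```python
# Hypothetical model: file name -> {object name -> attributes}.
files = {
    'run_0.hdf': {'/u': {'units': 'm/s'}, '/v': {'units': 'm/s'}},
    'run_1.hdf': {'/u': {'units': 'm/s'}},
    'run_2.hdf': {'/p': {'units': 'Pa'}},
}


def find_by_name(files, name):
    """Collect (file name, object name) pairs for every file that contains `name`."""
    return [(fname, name) for fname, objects in files.items() if name in objects]


print(find_by_name(files, '/u'))
# [('run_0.hdf', '/u'), ('run_1.hdf', '/u')]
```

Keeping the file name next to each hit makes it possible to trace a result back to its source file.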