# File exploration

To find data quickly or to collect certain objects, the `h5RDMtoolbox` provides helper methods. Their query syntax largely follows that of `pymongo`.

In [1]:
import h5rdmtoolbox as h5tbx

As always, let's build an HDF5 file from scratch:

In [2]:
with h5tbx.H5File() as h5:
    h5.create_group('grp_1')
    h5.create_group('grp_2', long_name='my other group',
                    attrs=dict(one=2, two='a second attr'))
    h5.create_dataset('ds_1', shape=(2, 4), units='', long_name='dataset 1',
                      attrs=dict(mean=1.5))
    h5.create_dataset('ds_2', shape=(2, 4), units='', long_name='dataset 2',
                      attrs=dict(mean=2.0))
    # note: this path implicitly creates the group 'gr_1' (distinct from 'grp_1')
    h5.create_dataset('gr_1/ds_1', shape=(2, 4), units='', long_name='dataset 2',
                      attrs=dict(mean=1.5))
    filename = h5.hdf_filename

## Find
The query syntax follows `pymongo`, but not all operators are implemented. Currently available:<br>
`$gt`, `$gte`, `$lt`, `$lte`, `$regex`
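
To make the meaning of these operators concrete, here is a minimal pure-Python sketch of how such a condition could be evaluated against a single attribute value. It is not the toolbox's implementation: the names `OPERATORS` and `matches` are hypothetical, and the use of `re.search` for `$regex` is an assumption.

```python
import operator
import re

# Hypothetical operator table, modelled on the operator names listed above.
OPERATORS = {
    '$gt': operator.gt,
    '$gte': operator.ge,
    '$lt': operator.lt,
    '$lte': operator.le,
    # Assumption: the pattern may match anywhere in the value (re.search).
    '$regex': lambda value, pattern: re.search(pattern, str(value)) is not None,
}


def matches(value, condition):
    """Return True if `value` satisfies `condition`.

    A plain condition means equality. A dict condition maps operator
    names to operands, and every one of them must hold.
    """
    if isinstance(condition, dict):
        return all(OPERATORS[op](value, operand)
                   for op, operand in condition.items())
    return value == condition


print(matches(1.5, {'$gt': 1.0}))                         # True
print(matches(1.5, {'$gte': 1.0, '$lt': 1.5}))            # False, 1.5 is not < 1.5
print(matches('dataset 1', {'$regex': 'dataset [0-9]'}))  # True
print(matches('dataset 1', 'dataset 2'))                  # False
```

A query such as `{'mean': {'$gt': 1.}}` can then be read as: take the attribute `mean` of each object and keep the object if the condition holds.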

Use `find_one` to return a single match or `find` to return all matches.

### Find all groups/datasets in a level (or recursively through the file):
Use the key `$group` or `$dataset`. If the value is an empty string, all groups or datasets are returned:

In [3]:
with h5tbx.H5File(filename) as h5:
    print(h5.find({'$group': ''}, rec=True))
with h5tbx.H5File(filename) as h5:
    print(h5.find({'$dataset': ''}, rec=True))

[<HDF5 group "/gr_1" (1 members)>, <HDF5 group "/grp_1" (0 members)>, <HDF5 group "/grp_2" (0 members)>]
[<HDF5 dataset "ds_1": shape (2, 4), type "<f4">, <HDF5 dataset "ds_2": shape (2, 4), type "<f4">, <HDF5 dataset "ds_1": shape (2, 4), type "<f4">]
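
The difference between searching one level and searching recursively can be illustrated with a toy model of the file above. This sketch uses nested dictionaries instead of HDF5 objects; `tree` and `find_datasets` are hypothetical names, and the assumption is that a non-recursive search only looks at the direct members of the group it is called on.

```python
# Toy model of the file built above: a group is a dict of its members,
# a dataset is represented by None.
tree = {
    'grp_1': {},
    'grp_2': {},
    'gr_1': {'ds_1': None},
    'ds_1': None,
    'ds_2': None,
}


def find_datasets(group, prefix='/', rec=False):
    """Collect dataset paths in `group`; descend into subgroups if `rec`."""
    found = []
    for name, member in sorted(group.items()):
        path = prefix + name
        if member is None:
            # A dataset at the current level.
            found.append(path)
        elif rec:
            # A subgroup: only visited in recursive mode.
            found.extend(find_datasets(member, path + '/', rec=True))
    return found


print(find_datasets(tree))            # ['/ds_1', '/ds_2']
print(find_datasets(tree, rec=True))  # ['/ds_1', '/ds_2', '/gr_1/ds_1']
```

In the toy model, the recursive call returns three datasets, which mirrors the three entries in the output above.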


### Find **one** specific group/dataset:

In [4]:
with h5tbx.H5File(filename) as h5:
    print(h5.find({'$group': 'grp_1'}, rec=True))
with h5tbx.H5File(filename) as h5:
    print(h5.find({'$dataset': 'ds_1'}, rec=True))

[<HDF5 group "/grp_1" (0 members)>]
[<HDF5 dataset "ds_1": shape (2, 4), type "<f4">]


To use **regex**, the value must itself be a dict with the key `$regex`:

In [5]:
with h5tbx.H5File(filename) as h5:
    print(h5.find({'$group': {'$regex': 'grp_[0-9]'}}, rec=True))
with h5tbx.H5File(filename) as h5:
    print(h5.find({'$dataset': {'$regex': 'ds_[0-9]'}}, rec=True))

[<HDF5 group "/grp_1" (0 members)>, <HDF5 group "/grp_2" (0 members)>]
[<HDF5 dataset "ds_1": shape (2, 4), type "<f4">, <HDF5 dataset "ds_2": shape (2, 4), type "<f4">, <HDF5 dataset "ds_1": shape (2, 4), type "<f4">]
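
Note that the group `/gr_1` is missing from the first result because its name does not fit the pattern `grp_[0-9]`. This can be checked with Python's `re` module alone. The sketch below matches against the plain group names, which is an assumption about how the toolbox applies the pattern.

```python
import re

# Names of the three groups in the example file.
group_names = ['gr_1', 'grp_1', 'grp_2']
pattern = re.compile('grp_[0-9]')

# Keep only the names in which the pattern occurs.
matched = [name for name in group_names if pattern.search(name)]
print(matched)  # ['grp_1', 'grp_2']
```

A looser pattern such as `gr.*_[0-9]` would match all three names.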


### Find based on attributes:

In [6]:
with h5tbx.H5File(filename) as h5:
    print(h5.find({'long_name': 'dataset 1'}, rec=True))

[<HDF5 dataset "ds_1": shape (2, 4), type "<f4">]


In [8]:
with h5tbx.H5File(filename) as h5:
    print(h5.find({'mean': {'$gt': 1.}}, rec=True))

[<HDF5 dataset "ds_1": shape (2, 4), type "<f4">, <HDF5 dataset "ds_2": shape (2, 4), type "<f4">, <HDF5 dataset "ds_1": shape (2, 4), type "<f4">]
