# Import Module

Start by importing the `h5io_browser` module:

In [1]:
import h5io_browser as hb

Next, a `Pointer()` object is created from the `h5io_browser` module to access a new HDF5 file named `new.h5`:

In [2]:
hp = hb.Pointer(file_name="new.h5")

# Write Data 

For demonstration, three different objects are written to the HDF5 file:

* a list with the numbers one and two is stored in the HDF5 path `data/a_list`
* an integer number is stored in the HDF5 path `data/an_integer_number`
* a dictionary is stored in the HDF5 path `data/sub_path/a_dictionary`

This can be done either with the square bracket notation known from Python dictionaries, or with the `write_dict()` function, which stores multiple objects in the HDF5 file while opening it only once.

In [3]:
hp["data/a_list"] = [1, 2]
hp.write_dict(data_dict={
    "data/an_integer_number": 3,
    "data/sub_path/a_dictionary": {"d": 4, "e": 5},
})

# Read Data 

One strength of the `h5io_browser` package is its support for interactive Python environments like Jupyter notebooks. To browse the HDF5 file, execute the `Pointer()` object:

In [4]:
hp

<h5io_browser.pointer.Pointer at 0x105cc9e50>

In comparison, the string representation lists the `file_name` and the `h5_path`, as well as the `nodes` and `groups` at this `h5_path`:

In [5]:
str(hp)

'Pointer(file_name="/Users/jan/notebooks/2023/2023-12-27-hdf5-browser/new.h5", h5_path="/") {\'groups\': [\'data\'], \'nodes\': []}'

List the content of the HDF5 file at the current `h5_path` using the `list_all()` function:

In [6]:
hp.list_all()

['data']

Analogously, the `groups` and `nodes` of any `h5_path`, given either relative to the current `h5_path` or as an absolute `h5_path`, can be listed using the `list_h5_path()` function:

In [7]:
hp.list_h5_path(h5_path="data")

{'groups': ['sub_path'], 'nodes': ['a_list', 'an_integer_number']}

To continue browsing the HDF5 file, the square bracket notation can be used, just as it is commonly used for Python dictionaries:

In [8]:
hp["data"].list_all()

['a_list', 'an_integer_number', 'sub_path']

The returned object is again a `Pointer` with an updated `h5_path`, which changed from `/` to `/data`:

In [9]:
hp.h5_path, hp["data"].h5_path

('/', '/data')
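
How a relative or absolute `h5_path` combines with the current one can be pictured with POSIX path rules. The following is only a sketch based on Python's `posixpath` module; the helper `resolve_h5_path()` is hypothetical and not part of `h5io_browser`, whose internal path handling may differ:

```python
import posixpath


def resolve_h5_path(current, h5_path):
    """Hypothetical helper: combine the current path with a relative or absolute one."""
    # posixpath.join() discards `current` when `h5_path` starts with "/"
    return posixpath.normpath(posixpath.join(current, h5_path))


relative = resolve_h5_path("/", "data")
nested = resolve_h5_path("/data", "sub_path")
absolute = resolve_h5_path("/data", "/data/sub_path")
print(relative, nested, absolute)
```

In this sketch a relative path extends the current location, while an absolute path replaces it.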

Finally, individual nodes of the HDF5 file can be loaded with the same syntax, either using the `/` notation known from the file system or by chaining multiple square brackets:

In [10]:
hp["data/a_list"], hp["data"]["a_list"]

([1, 2], [1, 2])

# Convert to Dictionary 

To browse the contents of an HDF5 file programmatically, the `to_dict()` method extends the interactive browsing capabilities. By default, it returns a flat dictionary whose keys are the `h5_path` of the individual nodes and whose values are the data stored in these nodes. Internally, this loads the whole tree structure starting from the current `h5_path`, so for large HDF5 files this can take quite some time:

In [11]:
hp.to_dict()

{'data/a_list': [1, 2],
 'data/an_integer_number': 3,
 'data/sub_path/a_dictionary': {'d': 4, 'e': 5}}

An alternative is the hierarchical representation, which is enabled by setting the `hierarchical` parameter to `True`. The data is then represented as a nested dictionary:

In [12]:
hp.to_dict(hierarchical=True)

{'data': {'a_list': [1, 2],
  'an_integer_number': 3,
  'sub_path': {'a_dictionary': {'d': 4, 'e': 5}}}}
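
The relation between the flat and the hierarchical representation can be illustrated with plain Python dictionaries. This is a minimal sketch and not the implementation used by `h5io_browser`; the function name `to_hierarchical()` is made up for this example:

```python
def to_hierarchical(flat):
    """Turn {"a/b/c": value} into {"a": {"b": {"c": value}}}."""
    nested = {}
    for path, value in flat.items():
        # the last path component is the node, everything before it is a group
        *groups, node = path.strip("/").split("/")
        level = nested
        for group in groups:
            level = level.setdefault(group, {})
        level[node] = value
    return nested


flat = {
    "data/a_list": [1, 2],
    "data/an_integer_number": 3,
    "data/sub_path/a_dictionary": {"d": 4, "e": 5},
}
nested = to_hierarchical(flat)
print(nested)
```

Applied to the flat dictionary from above, the sketch produces the same nested structure as `hp.to_dict(hierarchical=True)`.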

# With Statement

For compatibility with other file access methods, the `h5io_browser` package also supports the `with` statement notation. Technically, however, this does not change the behavior: even when opened with a `with` statement, the HDF5 file is closed between individual function calls.

In [13]:
with hb.Pointer(file_name="new.h5") as hp:
    print(hp["data/a_list"])

[1, 2]
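
The idea of a `with` statement that keeps no file open can be sketched with a small context manager. The class `PathHandle` below is a hypothetical stand-in written for this illustration and says nothing about how `Pointer()` is implemented; it only shows that `__enter__` and `__exit__` may do nothing while every call opens and closes the resource on its own:

```python
class PathHandle:
    """Hypothetical stand-in that stores a file name but no open file handle."""

    def __init__(self, file_name):
        self.file_name = file_name
        self.open_count = 0  # counts how often the "file" was opened

    def read(self, key):
        # each call would open and close the file on its own
        self.open_count += 1
        return f"{self.file_name}:{key}"

    def __enter__(self):
        # nothing is opened here, the handle itself is returned
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # nothing to close; returning False lets exceptions propagate
        return False


with PathHandle("new.h5") as handle:
    first = handle.read("data/a_list")
    second = handle.read("data/an_integer_number")

print(handle.open_count)
```

Under this design the `with` block is purely a notational convenience, so code written for other file interfaces keeps working without holding the file open.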


# Delete Data

To delete data from an HDF5 file using `h5io_browser`, the standard Python `del` statement can be used, analogous to deleting items from a Python dictionary. To demonstrate the deletion, a new node named `data/new/entry/test` is added:

In [14]:
hp["data/new/entry/test"] = 4

To list the node, the `to_dict()` function is called with `hierarchical=True` to highlight the nested structure:

In [15]:
hp["data/new"].to_dict(hierarchical=True)

{'entry': {'test': 4}}

The node is then deleted using the `del` statement. While this removes the node from the index, the file size remains the same, which is one of the limitations of the HDF5 format. Consequently, it is not recommended to create and remove nodes in HDF5 files frequently:

In [16]:
hp.file_size()

18484

In [17]:
del hp["data/new/entry/test"]

In [18]:
hp.file_size()

18484

Even after the deletion of the last node, the groups are still included in the HDF5 file. They are not listed by the `to_dict()` function, as it recursively iterates over the nodes below the current `h5_path` and an empty group contains none:

In [19]:
hp["data/new"].to_dict(hierarchical=True)

{}

In contrast, the `list_all()` function lists all nodes and groups at the current `h5_path`, including empty groups like the `entry` group in this case:

In [20]:
hp["data/new"].list_all()

['entry']
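
Why an empty group shows up in a listing but not in a recursive collection of nodes can be reproduced with a toy model. In this sketch every nested dictionary stands for a group and every other value for a node, which is an assumption made for the example only; `list_level()` and `collect_nodes()` are hypothetical helpers, not `h5io_browser` functions:

```python
def list_level(tree):
    """List the direct children of `tree`, split into groups and nodes."""
    groups = [name for name, child in tree.items() if isinstance(child, dict)]
    nodes = [name for name, child in tree.items() if not isinstance(child, dict)]
    return {"groups": groups, "nodes": nodes}


def collect_nodes(tree, prefix=""):
    """Recursively gather all nodes; groups without nodes contribute nothing."""
    flat = {}
    for name, child in tree.items():
        path = f"{prefix}/{name}" if prefix else name
        if isinstance(child, dict):
            flat.update(collect_nodes(child, prefix=path))
        else:
            flat[path] = child
    return flat


# toy model of `data/new` after its only node was deleted
new_group = {"entry": {}}
listing = list_level(new_group)
collected = collect_nodes(new_group)
print(listing, collected)
```

The listing still reports the `entry` group, whereas the recursive collection is empty because there is no node left to visit.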

To remove the group from the HDF5 file, the same `del` statement is used:

In [21]:
del hp["data/new"]

After deleting both the newly created groups and their nodes, the original hierarchy of the HDF5 file is restored:

In [22]:
hp.to_dict(hierarchical=True)

{'data': {'a_list': [1, 2],
  'an_integer_number': 3,
  'sub_path': {'a_dictionary': {'d': 4, 'e': 5}}}}

Even after deleting the nodes and groups from the HDF5 file, the file size remains the same:

In [23]:
hp.file_size()

18484

# Loop over Nodes

To simplify iterating recursively over all nodes contained in the selected `h5_path`, the `Pointer()` object can be used as an iterator:

In [24]:
hp_data = hp["data"]
{h5_path: hp_data[h5_path] for h5_path in hp_data}

{'a_list': [1, 2],
 'an_integer_number': 3,
 'sub_path/a_dictionary': {'d': 4, 'e': 5}}
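
The recursive iteration can be pictured as a depth-first walk that yields the path of every node relative to the starting group. The sketch below works on an in-memory tree; the marker class `Group` and the generator `iter_nodes()` are hypothetical names introduced for this example and do not mirror the internals of `h5io_browser`:

```python
class Group(dict):
    """Hypothetical marker type: a Group holds children, anything else is a node."""


def iter_nodes(group, prefix=""):
    """Yield the relative path of every node below `group`, depth first."""
    for name, child in group.items():
        path = f"{prefix}/{name}" if prefix else name
        if isinstance(child, Group):
            # descend into sub-groups and extend the relative path
            yield from iter_nodes(child, prefix=path)
        else:
            yield path


data = Group(
    a_list=[1, 2],
    an_integer_number=3,
    sub_path=Group(a_dictionary={"d": 4, "e": 5}),
)
paths = list(iter_nodes(data))
print(paths)
```

Using a dedicated `Group` type keeps a stored dictionary such as `a_dictionary` distinguishable from a group, so it is reported as a single node.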

# Copy Data

In addition to adding, browsing and removing data from an existing HDF5 file, the `Pointer()` object can also be used to copy data within a given HDF5 file or from one HDF5 file to another. To demonstrate this, a new HDF5 file named `copy.h5` is created:

In [25]:
hp_copy = hb.Pointer(file_name="copy.h5")

The data is transferred from the existing `Pointer()` object to the new HDF5 file using the `copy_to()` function:

In [26]:
hp["data"].copy_to(hp_copy)
hp_copy

<h5io_browser.pointer.Pointer at 0x108f46050>