# Tutorial

The following examples demonstrate how to use the ``high5py`` module to interact with HDF5 files quickly and easily.

## Setup

To set up for the tutorial, we first import all the necessary modules:

In [None]:
import glob
import os
import numpy as np
import high5py as hi5

Then we remove any test files that may have been generated by a previous run of the tutorial:

In [None]:
for filepath in glob.glob('test*.h5'):
    os.remove(filepath)
for filepath in glob.glob('test*.npz'):
    os.remove(filepath)

Finally, we create some data that we will later save to disk:

In [None]:
x = np.random.random((10, 20))
y = 2. * x.T
z = x ** 2.

## Saving data

We can save data using the `save_dataset` function, which **overwrites the file by default** (to avoid this, see the section below on appending data):

In [None]:
hi5.save_dataset('test.h5', x)

Using the syntax above, the data is saved to the file "test.h5" with the default dataset name "data."
We can check this using the `info` function:

In [None]:
hi5.info('test.h5')

We see that as expected, the file ``test.h5`` contains a single dataset called "data."
To save the dataset with a custom name, we use the ``name`` parameter:

In [None]:
hi5.save_dataset('test.h5', x, name='x')
hi5.info('test.h5')

Now the dataset is called "x."
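
To make the overwrite behavior concrete, here is a minimal sketch written directly against `h5py`. It assumes ``high5py`` behaves like a thin wrapper that opens the file in write mode; that is an assumption, since the tutorial does not show the library's internals, and the file name `sketch.h5` is purely illustrative.

```python
import os
import tempfile

import h5py
import numpy as np

x = np.random.random((10, 20))

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'sketch.h5')  # illustrative file name

    # Mode 'w' creates the file, truncating it if it already exists.
    with h5py.File(path, 'w') as f:
        f.create_dataset('data', data=x)

    # Opening with 'w' a second time discards the earlier contents.
    with h5py.File(path, 'w') as f:
        f.create_dataset('x', data=x)

    with h5py.File(path, 'r') as f:
        names = sorted(f.keys())

print(names)
```

Only the dataset written last survives, which matches what `info` reports in the cell above.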

## Appending data (safe saving)

By default, `save_dataset` overwrites files.
To add a dataset to the file, we use `append_dataset`, which is equivalent to calling `save_dataset` with ``overwrite=False``:

In [None]:
hi5.append_dataset('test.h5', y, name='y')
hi5.info('test.h5')

Note that we can use `append_dataset` to save a new file as well:

In [None]:
hi5.append_dataset('test2.h5', y, name='y')
hi5.info('test2.h5')

However, if we attempt to append a dataset "y" to a file that already contains a dataset with that name, we will get an error:

In [None]:
try:
    hi5.append_dataset('test2.h5', y, name='y')
except RuntimeError as err:
    print('RuntimeError: {}'.format(err))

As such, users who wish to avoid overwriting files and/or datasets can use `append_dataset` as a safer alternative to `save_dataset`.
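
The append semantics can be sketched in plain `h5py` as follows. The helper `append_sketch` is hypothetical and is not part of ``high5py``; it only illustrates the idea of opening the file in append mode and refusing to reuse an existing name.

```python
import os
import tempfile

import h5py
import numpy as np


def append_sketch(path, data, name):
    """Hypothetical helper: add a dataset without touching existing ones."""
    # Mode 'a' opens for read/write and creates the file if it is missing.
    with h5py.File(path, 'a') as f:
        if name in f:
            raise RuntimeError('{} already exists in {}'.format(name, path))
        f.create_dataset(name, data=data)


y = np.arange(6.0).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'sketch.h5')  # illustrative file name
    append_sketch(path, y, 'y')         # creates the file
    append_sketch(path, 2.0 * y, 'y2')  # adds a second dataset
    try:
        append_sketch(path, y, 'y')     # name collision
        collided = False
    except RuntimeError as err:
        collided = True
        print('RuntimeError: {}'.format(err))

    with h5py.File(path, 'r') as f:
        names = sorted(f.keys())

print(names)
```

Checking for the name before writing keeps the existing dataset untouched when the call fails.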

## Replacing data

It is sometimes desirable to overwrite a single dataset without overwriting the entire file. 
This can be done using `replace_dataset`, which deletes the existing dataset and replaces it with the specified one.
Here we replace the dataset "x" with the scalar value `0.`:

In [None]:
hi5.replace_dataset('test.h5', 0., name='x')
hi5.info('test.h5', name='x')

Now we will replace it with its original values:

In [None]:
hi5.replace_dataset('test.h5', x, name='x')
hi5.info('test.h5', name='x')
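
As a rough sketch of "delete, then recreate" in plain `h5py` (assuming, without confirmation from this tutorial, that `replace_dataset` does something similar internally):

```python
import os
import tempfile

import h5py
import numpy as np

x = np.random.random((10, 20))

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'sketch.h5')  # illustrative file name

    with h5py.File(path, 'w') as f:
        f.create_dataset('x', data=x)
        f.create_dataset('y', data=2.0 * x.T)

    with h5py.File(path, 'a') as f:
        del f['x']                        # unlink the old dataset
        f.create_dataset('x', data=0.0)   # recreate it as a scalar

    with h5py.File(path, 'r') as f:
        x_shape = f['x'].shape
        y_shape = f['y'].shape

print(x_shape, y_shape)
```

The other dataset is left alone. Note that deleting an object in HDF5 generally does not shrink the file on disk; the freed space is reclaimed only by repacking, for example with the `h5repack` tool.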

## Saving data with descriptions

Since one of the advantages of HDF5 is that it is a self-describing file format, ``high5py`` provides an easy way to add descriptions when saving datasets.
To do so, simply use the ``description`` parameter (available for both `save_dataset` and `append_dataset`):

In [None]:
hi5.save_dataset('test.h5', x, name='x', description='x data')
hi5.append_dataset('test.h5', y, name='y', description='y data')

We can check the value of the dataset descriptions by using the `info` function with the appropriate ``name`` value:

In [None]:
hi5.info('test.h5', name='x')
print()
hi5.info('test.h5', name='y')

## Saving data in groups

We can also save data in groups by using the ``name`` parameter:

In [None]:
hi5.append_dataset('test.h5', x, name='xy_group/x')
hi5.append_dataset('test.h5', y, name='xy_group/y')

Now we see that ``test.h5`` contains two datasets ("x" and "y") and a group ("xy_group") at the root level:

In [None]:
hi5.info('test.h5')

We can get info on the contents of the group using the ``info`` function with the ``name`` parameter:

In [None]:
hi5.info('test.h5', name='xy_group')
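
For readers curious how slash-separated names map onto HDF5 groups, the following `h5py` sketch may help. It relies on the fact that `h5py` creates intermediate groups automatically when a dataset path contains slashes; whether ``high5py`` uses exactly this call is an assumption.

```python
import os
import tempfile

import h5py
import numpy as np

x = np.random.random((10, 20))
y = 2.0 * x.T

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'sketch.h5')  # illustrative file name

    with h5py.File(path, 'w') as f:
        # Intermediate groups in the path are created automatically.
        f.create_dataset('xy_group/x', data=x)
        f.create_dataset('xy_group/y', data=y)

    with h5py.File(path, 'r') as f:
        root_names = sorted(f.keys())
        group_names = sorted(f['xy_group'].keys())
        is_group = isinstance(f['xy_group'], h5py.Group)
        y_load = f['xy_group/y'][()]  # read the whole dataset into memory

print(root_names, group_names)
```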

## Loading data

Loading data is simple using `hi5.load_dataset`:

In [None]:
x_load = hi5.load_dataset('test.h5', name='x')
print('Max diff b/w orig and loaded x: {:.2e}'.format(np.abs(x - x_load).max()))
y_load = hi5.load_dataset('test.h5', name='xy_group/y')
print('Max diff b/w orig and loaded y: {:.2e}'.format(np.abs(y - y_load).max()))

Note that the ``name`` parameter defaults to "data," so that `save_dataset` and `load_dataset` have compatible defaults:

In [None]:
hi5.save_dataset('test_defaults.h5', x)
x_load = hi5.load_dataset('test_defaults.h5')
print('Max diff b/w orig and loaded x: {:.2e}'.format(np.abs(x - x_load).max()))


## Querying files

Sometimes it is useful to query a file and look at its contents.
As we have seen above, we can use `info` to get information on groups and datasets.
If we set ``return_info=True``, `info` also returns a dictionary of the results:

In [None]:
print('FILE/ROOT INFO:')
hi5.info('test.h5')
print('\nGROUP INFO:')
hi5.info('test.h5', name='xy_group')
print('\nDATASET INFO:')
info = hi5.info('test.h5', name='xy_group/x', return_info=True)
print('\nDATASET INFO DICT:', info)

We can also check for the existence of a particular dataset or group using `exists`:

In [None]:
print('Dataset x exists:', hi5.exists('test.h5', 'x'))
print('Dataset z exists:', hi5.exists('test.h5', 'z'))

Finally, we can use `list_all` to recursively list the contents of a file or group. Setting ``return_info=True`` also returns a dictionary of the results:

In [None]:
print('FILE/ROOT INFO:')
hi5.list_all('test.h5')
print('\nGROUP INFO:')
info = hi5.list_all('test.h5', name='xy_group', return_info=True)
print('\nGROUP INFO DICT:')
print(info)
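
A recursive listing like this can be approximated in plain `h5py` with `visititems`, and an existence check with the `in` operator. The helper `list_all_sketch` and the layout of its return value are hypothetical; the dictionary returned by ``high5py`` may be structured differently.

```python
import os
import tempfile

import h5py
import numpy as np


def list_all_sketch(path, name='/'):
    """Hypothetical helper: summarize every object below `name`."""
    contents = {}

    def record(obj_name, obj):
        # Called once per group or dataset; returning None keeps the walk going.
        if isinstance(obj, h5py.Dataset):
            contents[obj_name] = ('dataset', obj.shape, str(obj.dtype))
        else:
            contents[obj_name] = ('group', len(obj))

    with h5py.File(path, 'r') as f:
        f[name].visititems(record)
    return contents


x = np.random.random((10, 20))

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'sketch.h5')  # illustrative file name
    with h5py.File(path, 'w') as f:
        f.create_dataset('x', data=x)
        f.create_dataset('xy_group/x', data=x)
        f.create_dataset('xy_group/y', data=2.0 * x.T)

    root_info = list_all_sketch(path)
    group_info = list_all_sketch(path, 'xy_group')

    with h5py.File(path, 'r') as f:
        has_x = 'x' in f  # membership test works for groups and datasets
        has_z = 'z' in f

print(sorted(root_info))
print(sorted(group_info))
```

Names reported by `visititems` are relative to the group being walked, so the group listing shows `x` and `y` rather than `xy_group/x` and `xy_group/y`.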

## Saving attributes

As alluded to above, part of what makes HDF5 a self-describing file format is that groups and datasets can have associated attributes.
We can use `save_attributes` or `append_attributes` to add attributes to a group or dataset, with the former overwriting any existing attributes and the latter simply adding to them:

In [None]:
hi5.save_dataset('test.h5', x, name='x')
print('DATA W/O ATTRIBUTES')
hi5.info('test.h5', 'x')
hi5.save_attributes('test.h5', {'units': 'm/s', 'num_pts': x.size}, name='x')
print('\nDATA W/ATTRIBUTES')
hi5.info('test.h5', 'x')
hi5.append_attributes('test.h5', {'color': 'red'}, name='x')
print('\nDATA W/ADDED ATTRIBUTES')
hi5.info('test.h5', 'x')
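
The difference between the two functions can be sketched with `h5py` attribute mappings. Both helpers below are hypothetical, and the sketch reads "overwriting any existing attributes" as "remove the old attributes first", which is one plausible interpretation rather than confirmed ``high5py`` behavior.

```python
import os
import tempfile

import h5py
import numpy as np


def save_attributes_sketch(obj, new_attrs):
    """Hypothetical: drop all existing attributes, then write the new ones."""
    for key in list(obj.attrs):
        del obj.attrs[key]
    for key, value in new_attrs.items():
        obj.attrs[key] = value


def append_attributes_sketch(obj, new_attrs):
    """Hypothetical: add attributes, leaving unrelated ones in place."""
    for key, value in new_attrs.items():
        obj.attrs[key] = value


x = np.random.random((10, 20))

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'sketch.h5')  # illustrative file name
    with h5py.File(path, 'w') as f:
        dset = f.create_dataset('x', data=x)
        dset.attrs['stale'] = 'old value'

        save_attributes_sketch(dset, {'units': 'm/s', 'num_pts': x.size})
        after_save = sorted(dset.attrs.keys())

        append_attributes_sketch(dset, {'color': 'red'})
        after_append = sorted(dset.attrs.keys())
        num_pts = int(dset.attrs['num_pts'])

print(after_save)
print(after_append)
```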

## Renaming or deleting objects

We can easily rename a dataset or group using `rename`:

In [None]:
print('\nORIGINAL DATA')
hi5.info('test.h5')
hi5.info('test.h5', 'x')
print('\nRENAMED DATA')
hi5.rename('test.h5', 'x', 'x_new')
hi5.info('test.h5')
hi5.info('test.h5', 'x_new')

Similarly, we can delete a dataset or group using `delete`:

In [None]:
print('\nDELETED DATA')
hi5.delete('test.h5', 'x_new')
hi5.info('test.h5')
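
In plain `h5py`, renaming and deleting correspond to `Group.move` and the `del` statement. The sketch below assumes ``high5py`` exposes the same operations under friendlier names; the tutorial does not confirm how they are implemented.

```python
import os
import tempfile

import h5py
import numpy as np

x = np.random.random((10, 20))

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'sketch.h5')  # illustrative file name

    with h5py.File(path, 'w') as f:
        f.create_dataset('x', data=x)
        f.create_dataset('y', data=2.0 * x.T)

    with h5py.File(path, 'a') as f:
        f.move('x', 'x_new')            # rename within the same file
        after_rename = sorted(f.keys())
        del f['x_new']                  # unlink the renamed dataset
        after_delete = sorted(f.keys())

print(after_rename)
print(after_delete)
```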

## Working with NPZ files

Sometimes when collaborating, it is useful to have code with as few dependencies as possible.
To help with that, ``high5py`` offers functions for converting HDF5 files to and from NPZ (NumPy archive) format.
For instance, the following code saves data to HDF5, then converts the entire contents of that file to NPZ using `to_npz`:

In [None]:
hi5.save_dataset('test.h5', x, name='xy_group/x')
hi5.append_dataset('test.h5', y, name='xy_group/y')
hi5.append_dataset('test.h5', z, name='z1')
hi5.append_dataset('test.h5', 2. * z, name='z2')
hi5.to_npz('test.h5', 'test_all.npz')

We can also save single groups/datasets, or lists of groups/datasets:

In [None]:
hi5.to_npz('test.h5', 'test_z1.npz', name='z1')
hi5.to_npz('test.h5', 'test_z.npz', name=['z1', 'z2'])
hi5.to_npz('test.h5', 'test_xy_group.npz', name='xy_group')

To load data from an NPZ file, we can use the following syntax. Since NPZ files don't support groups, the slashes in group/dataset paths have been replaced with underscores:

In [None]:
# `data.files` is the public list of array names stored in the archive
with np.load('test_all.npz') as data:
    print('NPZ contents:', data.files)
    x = data['xy_group_x']
    y = data['xy_group_y']
    z1 = data['z1']
    z2 = data['z2']
with np.load('test_z1.npz') as data:
    print('NPZ contents:', data.files)
    z1 = data['z1']
with np.load('test_z.npz') as data:
    print('NPZ contents:', data.files)
    z1 = data['z1']
    z2 = data['z2']
# When a single group is converted, names are relative to that group
with np.load('test_xy_group.npz') as data:
    print('NPZ contents:', data.files)
    x = data['x']
    y = data['y']
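
The name flattening described above can be reproduced with NumPy alone. This sketch assumes a plain slash-to-underscore substitution, as the text describes; the dictionary `arrays` is illustrative data standing in for the contents of an HDF5 file.

```python
import os
import tempfile

import numpy as np

# Illustrative stand-in for datasets keyed by their HDF5 paths.
arrays = {
    'xy_group/x': np.ones((2, 3)),
    'xy_group/y': np.zeros(4),
    'z1': np.arange(3.0),
}

# NPZ archives are flat, so slashes are replaced with underscores.
flat = {name.replace('/', '_'): value for name, value in arrays.items()}

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'sketch.npz')  # illustrative file name
    np.savez(path, **flat)
    with np.load(path) as data:
        keys = sorted(data.files)
        x_load = data['xy_group_x']

print(keys)
```

One consequence of a substitution like this is that two distinct paths such as `a/b` and `a_b` would map to the same key, so names that already contain underscores deserve a second look before converting.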

When converting an NPZ file to HDF5, array names are preserved:

In [None]:
np.savez_compressed('test.npz', x_npz=x, y_npz=y)
hi5.from_npz('test.npz', 'test.h5')
hi5.info('test.h5')
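
The reverse conversion can be sketched by copying each array in the archive into a dataset of the same name. This version opens the HDF5 file in write mode and therefore replaces it; whether `from_npz` overwrites or appends is not stated in this tutorial, so treat that choice as an assumption.

```python
import os
import tempfile

import h5py
import numpy as np

x = np.random.random((10, 20))
y = 2.0 * x.T

with tempfile.TemporaryDirectory() as tmpdir:
    npz_path = os.path.join(tmpdir, 'sketch.npz')  # illustrative file names
    h5_path = os.path.join(tmpdir, 'sketch.h5')
    np.savez_compressed(npz_path, x_npz=x, y_npz=y)

    # Copy every array to a dataset with the same name.
    with np.load(npz_path) as data, h5py.File(h5_path, 'w') as f:
        for key in data.files:
            f.create_dataset(key, data=data[key])

    with h5py.File(h5_path, 'r') as f:
        names = sorted(f.keys())
        x_load = f['x_npz'][()]

print(names)
```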

## Cleanup

We finish by removing any generated test files:

In [None]:
for filepath in glob.glob('test*.h5'):
    os.remove(filepath)
for filepath in glob.glob('test*.npz'):
    os.remove(filepath)