hdf5plugin
hdf5plugin allows using additional HDF5 compression filters with h5py for reading and writing compressed datasets.
To read a compressed dataset with h5py, use:
import hdf5plugin
It registers the compression filters supported by hdf5plugin with the HDF5 library used by h5py. Hence, compressed HDF5 datasets can be read like any other dataset (see h5py documentation).
Note
HDF5 datasets compressed with Blosc2 may require additional plugins to enable decompression, such as blosc2-grok or blosc2-openhtj2k. See the list of Blosc2 filters and codecs.
As with reading compressed datasets, import hdf5plugin is required to enable the supported compression filters.
To create a compressed dataset, use h5py.Group.create_dataset and set the compression and compression_opts arguments.
hdf5plugin provides helpers to prepare those compression options: Bitshuffle, Blosc, BZip2, FciDecomp, LZ4, SZ, SZ3, Zfp, Zstd.
Sample code:
import numpy
import h5py
import hdf5plugin
# Compression
f = h5py.File('test.h5', 'w')
f.create_dataset('data', data=numpy.arange(100), **hdf5plugin.LZ4())
f.close()
# Decompression
f = h5py.File('test.h5', 'r')
data = f['data'][()]
f.close()
Relevant h5py documentation: Filter pipeline and Chunked Storage.
Bitshuffle
Blosc
Blosc2
BZip2
FciDecomp
LZ4
SZ
SZ3
Zfp
Zstd
Constants:
Functions:
get_filters
get_config
When imported, hdf5plugin initialises and registers the filters it embeds, unless a filter is already registered for the corresponding filter ID.
h5py gives access to HDF5 functions handling registered filters in h5py.h5z. This module allows checking the filter availability and registering/unregistering filters.
hdf5plugin provides an extra register function to register the filters it provides, e.g., to override already loaded filters. Registering with this function is required to perform additional initialisation and enable writing compressed data with the given filter.
register
Users who do not use h5py or Python can also benefit from the supplied HDF5 compression filters for reading compressed datasets by setting the HDF5_PLUGIN_PATH environment variable to the value of hdf5plugin.PLUGIN_PATH, which can be retrieved from the command line with:
python -c "import hdf5plugin; print(hdf5plugin.PLUGIN_PATH)"
For instance:
export HDF5_PLUGIN_PATH=$(python -c "import hdf5plugin; print(hdf5plugin.PLUGIN_PATH)")
should allow MATLAB or IDL users to read data compressed using the supported plugins.
Setting the HDF5_PLUGIN_PATH environment variable allows already existing programs or Python code to read compressed data without any modification.