Provide an interface for registering filter plugins #928
One way is through setuptools entrypoints. Kind of like pytest registers plugins for it, IIRC. |
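For illustration, a minimal sketch of how entry-point based discovery could look. The `hdf5.filters` group name and the helper function are hypothetical, not an existing h5py convention:

```python
from importlib.metadata import entry_points

FILTER_GROUP = "hdf5.filters"  # hypothetical group name, not an h5py convention

def discover_filter_plugins():
    """Collect HDF5 filter plugins advertised by installed packages
    via setuptools/importlib entry points."""
    try:
        eps = entry_points(group=FILTER_GROUP)      # Python 3.10+ API
    except TypeError:
        eps = entry_points().get(FILTER_GROUP, [])  # pre-3.10 dict API
    return [(ep.name, ep) for ep in eps]
```

A plugin package would then declare its filters under that group in its packaging metadata, and h5py (or a helper package) could iterate and load them at import time.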
@aragilar Any thoughts/progress re: this? |
I am waiting for HDF5 1.10.2 to be out to see if one can properly deal with plugins. In principle that version should allow us to check which plugins are available and to add new ones as needed, thus solving the problems encountered in the hdf5plugin module (it has to be imported prior to importing h5py, it cannot add plugins selectively...) |
HDF5 lib now provides a set of procedures to customize plugin discovery: |
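h5py does not wrap these H5PL* calls at the time of writing, but they can be reached directly. A rough ctypes sketch, assuming HDF5 >= 1.10.1 (where H5PLappend was introduced) and that libhdf5 is discoverable on the system:

```python
import ctypes
import ctypes.util

def append_plugin_path(path):
    """Append a directory to HDF5's internal plugin search path via
    H5PLappend (HDF5 >= 1.10.1). Returns True/False for the HDF5
    call's success, or None if libhdf5/H5PLappend is unavailable."""
    libname = ctypes.util.find_library("hdf5")
    if libname is None:
        return None
    try:
        lib = ctypes.CDLL(libname)
        func = lib.H5PLappend
    except (OSError, AttributeError):
        return None  # library not loadable, or too old for H5PL*
    func.argtypes = [ctypes.c_char_p]
    func.restype = ctypes.c_int  # herr_t: negative on failure
    return func(path.encode()) >= 0
```

A proper h5py wrapper would of course call the function through the already-loaded library rather than loading a second copy, which is exactly the version-mismatch hazard discussed below.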
@tacaswell @takluyver What do you think of providing a set of pre-built popular plugins as a separate pip-installable package, something like https://pypi.org/project/hdf5plugin? With the new H5PL* procedures it seems that the restriction "import hdf5plugin *before* importing h5py" can be lifted. Maybe even better would be to package each plugin as a separate pip package, so the user has more fine-grained control over which plugins to install (note that they generally have different licenses). h5py's LZF implementation could be extracted into a separate package too. |
Are there particular plugins you have in mind? My use cases so far haven't involved any plugins, except experimenting with the lzf filter.
|
At this point, hdf5plugin ships the plugins needed to read data from our detectors. Please consider that, in order to supply filter plugins usable everywhere, we had to patch those plugins' direct calls to the HDF5 library under Linux and macOS. Otherwise, they are bound to a particular version of the HDF5 library. The price to pay is that those filters can only be used for reading under Linux and macOS. If they could be shipped built against the same library as h5py, the limitation would disappear. |
I forgot to add that we would be willing to help. |
@takluyver So far I have had great success with the byte- (or bit-) shuffle + LZ4 combo: my data (mass spectra) typically compresses ~10x (slightly worse than zlib), compression is fast (unlike zlib), and decompression beats uncompressed reads from an SSD. @vasole The inconvenient architecture of HDF5 filter plugins (the HDF5 library dynamically loads plugins, but plugins link against and dynamically load the HDF5 library) is a known problem, unfortunately: |
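The benefit of shuffling before compression is easy to demonstrate with just the standard library. This sketch uses zlib instead of LZ4 (which would need a third-party package), and `byteshuffle` is a plain-Python stand-in for the HDF5 shuffle filter, not real filter code:

```python
import struct
import zlib

def byteshuffle(data: bytes, itemsize: int) -> bytes:
    """Group the i-th byte of every element together, as the HDF5
    shuffle filter does before handing data to the compressor."""
    n = len(data) // itemsize
    return bytes(data[j * itemsize + i]
                 for i in range(itemsize) for j in range(n))

# 50000 little-endian uint32 values: low bytes vary fast, high bytes slowly
raw = struct.pack("<50000I", *range(50000))
plain = zlib.compress(raw)
shuffled = zlib.compress(byteshuffle(raw, 4))
print(len(plain), len(shuffled))  # shuffled byte planes are far more uniform
```

For data like this, the shuffled layout collects the nearly-constant high bytes into long runs, which any byte-oriented compressor handles much better than the interleaved original.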
Well, the LZ4 filter plugin does not suffer from that problem, while the Bitshuffle+LZ4 one does. My understanding is that if a filter follows the directives of the HDF5 Group, it does not need to link to the library. |
Sorry, comment updated. |
@vasole Quite the contrary, see the H5allocate_memory docs: But in practice I found that much more fragile than simply ensuring the same memory manager is used inside the library and the plugin (on POSIX that's a 99.9% reliable assumption anyway). And many plugin authors (incl. your obedient servant) do ignore the guideline and just go ahead with Another aspect is inspecting the data type/data space. Byte-level compressors rarely need this, but it's crucial for e.g. fpzip or MAFISC. For that, linking to HDF5 is the only elegant option (duplicating the data type/data space description in the filter parameters is not considered "elegant"; plus the filter parameter interfaces are already settled). |
Aha, so it's mostly about different compression filters? If you're interested, go ahead. I haven't investigated the H5PL machinery, but it sounds like it gives you a neat way to make this usable. Does this require any change in h5py itself? I'm not interested in working on compression plugins myself at the moment - we want our data files to be readily readable by different tools written in different languages, not just Python, so for now the downsides of requiring a plugin to read the data outweigh the benefits. |
@vasole Thanks for letting us know (and for your filter work). |
Currently, filters either need to be built into HDF5 or packaged as part of h5py, or the filter provider has to modify HDF5_PLUGIN_PATH; none of these is ideal, and they can lead to hard-to-debug problems (see #923). There exists a function which appends to the library's internal search path (rather than relying on an environment variable), which h5py currently does not wrap. We should wrap this, provide fallbacks for older HDF5 versions, and provide a way of getting the current path (if possible).
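To make the proposal concrete, here is a toy pure-Python model of the interface being suggested. All names are hypothetical; the default entry mirrors HDF5's documented default plugin directory on Unix, and the methods mirror the H5PLappend/H5PLprepend/H5PLget semantics:

```python
class PluginSearchPath:
    """Toy model of a possible h5py-level plugin-path interface
    (hypothetical names), mirroring H5PLappend / H5PLprepend /
    H5PLget semantics in pure Python."""

    def __init__(self, initial=("/usr/local/hdf5/lib/plugin",)):
        # HDF5's built-in default plugin directory on Unix systems
        self._paths = list(initial)

    def append(self, path):
        """Add a directory at the end of the search path (H5PLappend)."""
        self._paths.append(path)

    def prepend(self, path):
        """Add a directory at the front of the search path (H5PLprepend)."""
        self._paths.insert(0, path)

    def get(self):
        """Return the current search path as a tuple (H5PLget/H5PLsize)."""
        return tuple(self._paths)
```

A real wrapper would forward each call to the underlying HDF5 library instead of keeping a Python-side list, but the Python-facing API could look much like this.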
See silx-kit/hdf5plugin#14 for further discussion about what the interface should look like.