-
Notifications
You must be signed in to change notification settings - Fork 521
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
regression in h5py 3.4.0: fletcher32 filter on variable length strings dataset #1948
Comments
I think the only relevant change in h5py 3.4 is that the pre-built wheels bundle HDF5 1.12.1, and it's a change in HDF5 itself that's affecting you. The 'not suitable for filters' message does seem to have been added in 1.12.1; the original commit is here: From the code, it looks like it will refuse to apply any filters to vlen strings. I wonder if previously filters were either being ignored, or applied to the pointers to vlen data rather than the data itself, making them mostly pointless. You can get in touch with HDF group via help@hdfgroup.org or the HDF forum. |
Actually this is not related to strings, it fails the same e.g. with ints. IIU the code correctly, hdf5 now rejects fletcher32 for any variable length array: bad_for_filters = (H5S_NULL == space_class || H5S_SCALAR == space_class
|| H5T_VLEN == type_class ||
...; (This also causes pytables tests to fail with hdf5-1.10.7-2.fc35.x86_64.) |
hdf5 doesn't like that: h5py/h5py#1948 HDFGroup/hdf5@16349c5 > The 'not suitable for filters' message does seem to have been added > in 1.12.1 From the code, it looks like it will refuse to apply any > filters to vlen strings. This patch is based on the assumption that the hdf5 code is *correct*, and indeed fletcher32 shouldn't be used in that vlarrays. But if hdf5 is wrong, then the fix should be their side. It might make sense to apply this to get the tests passing again, even if ultimately hdf5 is adjusted too. Fixes PyTables#845.
I created a thread at the HDF forum: https://forum.hdfgroup.org/t/fletcher32-filter-on-variable-length-string-datasets-not-suitable-for-filters/9038 |
…er versions of h5py/HDF5 (h5py/h5py#1948)
It seems like what @takluyver hypothesized is true. fletcher32 and compression are pointless with vlen dtype datasets. So this is not an h5py bug. It would still be nice to have a more elaborate error message in h5py. |
I'm hesitant to try to be clever from the h5py side. If we add a check and raise an error before creating the dataset, and then a future version of HDF5 makes checksumming vlen data valid, then that's a bug in h5py. And automatically diagnosing errors after the fact is hard. There are plenty of errors where the message we get from HDF5 is not especially clear or specific (this example is pretty clear compared to some). I'd rather not set a precedent that h5py should be trying to intercept them and provide better error messages, because a) that's a mammoth task, and b) it sounds like a bug minefield. |
That sounds reasonable. Thanks for your help! |
Yeah, I agree… Trying to improve this in h5py would only complicate things in the long run. |
Thanks for being understanding about it! 🙂 |
hdf5 doesn't like that: h5py/h5py#1948 HDFGroup/hdf5@16349c5 > The 'not suitable for filters' message does seem to have been added > in 1.12.1 From the code, it looks like it will refuse to apply any > filters to vlen strings. This patch is based on the assumption that the hdf5 code is *correct*, and indeed fletcher32 shouldn't be used in that vlarrays. But if hdf5 is wrong, then the fix should be their side. It might make sense to apply this to get the tests passing again, even if ultimately hdf5 is adjusted too. Fixes PyTables#845.
Summary of the h5py configuration:
h5py 3.4.0
HDF5 1.12.1
Python 3.8.10 (default, Jun 2 2021, 10:49:15)
[GCC 9.4.0]
sys.platform linux (I am on Ubuntu 20.04.2 LTS)
sys.maxsize 9223372036854775807
numpy 1.19.5
cython (built with) 0.29.24
numpy (built against) 1.17.5
HDF5 (built against) 1.12.1
The following code works with h5py<3.4.0:
With h5py 3.4.0, I get the error:
The error goes away when I remove
fletcher32=True
. But I would like to have that extra check, so this looks like a regression to me.The text was updated successfully, but these errors were encountered: