
Allow creation of expandable external datasets #2398

Merged
merged 3 commits into h5py:master on Mar 22, 2024

Conversation

@ajelenak (Contributor) commented Mar 21, 2024

External HDF5 datasets can be expanded along their first dimension but cannot have chunked layout. This PR avoids setting chunked layout when creating a dataset if both maxshape and external keyword arguments are present.

Fixes #2396.

codecov bot commented Mar 21, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.53%. Comparing base (d051d24) to head (e506268).
Report is 1 commit behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #2398   +/-   ##
=======================================
  Coverage   89.53%   89.53%           
=======================================
  Files          17       17           
  Lines        2380     2380           
=======================================
  Hits         2131     2131           
  Misses        249      249           


Review thread on h5py/_hl/filters.py (outdated, resolved)
@tacaswell (Member)

Should there also be work upstream in libhdf5 to reject creating these broken data sets?

With this change can the user still explicitly ask for a broken dataset?

@takluyver (Member)

I think HDF5 already rejects the combination of chunked & external storage (from #2394: ValueError: Unable to synchronously create dataset (external storage not supported with chunked layout)). But external & expandable (along axis 0) should work.
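For reference, the rejected chunked + external combination can be reproduced like this (a sketch; file names are illustrative, and the exact message text comes from libhdf5):

```python
import h5py

with h5py.File("reject.h5", "w") as f:
    try:
        f.create_dataset(
            "bad",
            shape=(100,),
            dtype="f8",
            chunks=(10,),                    # explicit chunked layout...
            external=[("bad.bin", 0, 800)],  # ...plus external storage
        )
    except ValueError as exc:
        # libhdf5 refuses: external storage is incompatible with chunking
        print("rejected:", exc)
```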

@tacaswell (Member)

I think HDF5 already rejects the combination of chunked & external storage

But then how did the user end up with an HDF5 header which is invalid? Or am I confused by what is going on, and are we using this code on access?

@ajelenak (Contributor, Author)

Indeed, in the other cases I tried, libhdf5 raised errors.

@ajelenak (Contributor, Author)

That HDF5 file is not invalid. I got educated afterward about expanding an external dataset along its first dimension without the need for chunking. That's why I made this PR: to allow doing so from h5py.

@tacaswell (Member)

That HDF5 file is not invalid.

Will this fix our reading as well then?

@takluyver (Member)

The code involved here is only called when creating datasets, so I don't believe it will fix reading. @ajelenak spotted a separate issue while investigating #2394, which remains mysterious AFAIK.

ajelenak and others added 2 commits March 22, 2024 11:58
Co-authored-by: Thomas Kluyver <takowl@gmail.com>
@ajelenak ajelenak merged commit 0020ab9 into h5py:master Mar 22, 2024
41 checks passed
@ajelenak ajelenak deleted the fix-issue-2396 branch March 22, 2024 19:22
@takluyver takluyver added this to the 3.11 milestone Apr 5, 2024

Successfully merging this pull request may close these issues.

Fix maxshape and external keywords interaction when creating an external dataset
3 participants