Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preserve h5py.Dataset filter settings on export #153

Merged
merged 17 commits into from
Jan 12, 2024
Merged

Conversation

oruebel
Copy link
Contributor

@oruebel oruebel commented Dec 31, 2023

Motivation

Fix #152 Preserve h5py.Dataset filter settings on export

How to test the behavior?

  • Add unit tests
  • Add functions to ZarrDataIO to wrap h5py.Dataset and infer filter settings
  • Update ZarrIO to wrap h5py.Dataset on write to preserve filter settings

Checklist

  • Did you update CHANGELOG.md with your changes?
  • Have you checked our Contributing document?
  • Have you ensured the PR clearly describes the problem and the solution?
  • Is your contribution compliant with our coding style? This can be checked running ruff from the source directory.
  • Have you checked to ensure that there aren't other open Pull Requests for the same change?
  • Have you included the relevant issue number using "Fix #XXX" notation where XXX is the issue number? By including "Fix #XXX" you allow GitHub to close issue #XXX when the PR is merged.

@codecov-commenter
Copy link

codecov-commenter commented Jan 1, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (9692a98) 84.51% compared to head (4bdb47a) 85.07%.

Additional details and impacted files
@@            Coverage Diff             @@
##              dev     #153      +/-   ##
==========================================
+ Coverage   84.51%   85.07%   +0.56%     
==========================================
  Files           5        5              
  Lines        1104     1146      +42     
  Branches      296      311      +15     
==========================================
+ Hits          933      975      +42     
  Misses        115      115              
  Partials       56       56              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@oruebel oruebel marked this pull request as ready for review January 1, 2024 07:51
@oruebel oruebel requested a review from rly January 1, 2024 07:52
@oruebel oruebel added the category: enhancement improvements of code or code behavior label Jan 1, 2024
@oruebel oruebel added this to the Next Release milestone Jan 1, 2024
@oruebel
Copy link
Contributor Author

oruebel commented Jan 1, 2024

@sinha-r @rly this PR is related to the cloud project to allow conversion from HDF5 to Zarr while preserving chunking and compression settings if possible

@oruebel
Copy link
Contributor Author

oruebel commented Jan 10, 2024

@rly @mavaylon1 can you please review

@rly
Copy link
Contributor

rly commented Jan 10, 2024

The code looks good to me. There are a bunch of branches in the code that translates the compression. I know it's tedious, but could you write tests for those branches? That would help make sure those lines work as intended and prevent our code coverage from dropping 1% to 83.4%. We might want to include hdmf-zarr in our core packages for the U24 at some point.

@oruebel
Copy link
Contributor Author

oruebel commented Jan 10, 2024

There are a bunch of branches in the code that translates the compression. I know it's tedious, but could you write tests for those branches?

The main reason I didn't add tests for all the branches is because in order to cover those cases I believe we need to install custom HDF5 compressors, i.e., add hdf5_plugins as a dependency just to run tests and szip may not be possible to test because it requires a license. In principle, we just need a HDF5 dataset that looks like it uses those filters, but I'm not sure if we can "fake" that easily.

@rly
Copy link
Contributor

rly commented Jan 10, 2024

add hdf5_plugins as a dependency just to run tests

I think that is OK. It would be added to https://github.com/hdmf-dev/hdmf-zarr/blob/dev/requirements-dev.txt

szip may not be possible to test because it requires a license

We can just exclude that one from coverage

@oruebel
Copy link
Contributor Author

oruebel commented Jan 11, 2024

@rly Can you please take another look. I added a new unit test module for ZarrDataIO. As mentioned above, I had to add hdf5plugin as a requirement for these tests. I added @unittest.skipIf to the test cases that use hdf5plugin just in case the library is not installed. The tests now cover 100% of the patch.

Screen Shot 2024-01-11 at 1 31 26 AM

tests/unit/test_zarrdataio.py Outdated Show resolved Hide resolved
@rly
Copy link
Contributor

rly commented Jan 11, 2024

Thanks! Looks good to me.

@oruebel oruebel merged commit 71bfa6d into dev Jan 12, 2024
24 checks passed
@oruebel oruebel deleted the enh/pres_h5_filters branch January 12, 2024 00:15
@oruebel oruebel mentioned this pull request Mar 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category: enhancement improvements of code or code behavior
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature]: Preserve HDF5 dataset filter settings on export to Zarr if possible
3 participants