Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

inline_threshold kwarg not changing result of SingleHdf5ToZarr call #445

Closed
TomNicholas opened this issue Apr 5, 2024 · 8 comments · Fixed by #456
Closed

inline_threshold kwarg not changing result of SingleHdf5ToZarr call #445

TomNicholas opened this issue Apr 5, 2024 · 8 comments · Fixed by #456

Comments

@TomNicholas
Copy link

TomNicholas commented Apr 5, 2024

I'm trying to create a reference dict with inlined data, but the inline_threshold argument to SingleHdf5ToZarr doesn't seem to be doing anything. I feel like I'm missing something obvious.

In [1]: import xarray as xr

In [2]: ds = xr.tutorial.open_dataset('air_temperature')

In [3]: ds.to_netcdf('air.nc')
/Users/tom/miniconda3/envs/dev3.11/lib/python3.12/site-packages/IPython/core/interactiveshell.py:3548: SerializationWarning: saving variable air with floating point data as an integer dtype without any _FillValue to use for NaNs
  exec(code_obj, self.user_global_ns, self.user_ns)

In [4]: from kerchunk.hdf import SingleHdf5ToZarr

In [5]: refs = SingleHdf5ToZarr('air.nc', spec=1, inline_threshold=0).translate()

In [6]: refs_inlined = SingleHdf5ToZarr('air.nc', spec=1, inline_threshold=1e9).translate()

In [7]: refs == refs_inlined
Out[7]: True

I was expecting these two dictionaries to be different - refs_inlined to have actual data values in it.

@TomNicholas TomNicholas changed the title inline_threshold not doing anything inline_threshold not doing anything with SingleHdf5ToZarr Apr 5, 2024
@TomNicholas TomNicholas changed the title inline_threshold not doing anything with SingleHdf5ToZarr inline_threshold kwarg not changing result of SingleHdf5ToZarr call Apr 5, 2024
@norlandrhagen
Copy link
Contributor

I just ran your example with kerchunk==0.2.2 and got inlining and refs == refs_inlined to be False. @TomNicholas just to check, were you running kerchunk==0.2.4?

Unless this is behavior is expected, it seems like there might be an issue with in-lining introduced in 0.2.3.

@TomNicholas
Copy link
Author

TomNicholas commented Apr 5, 2024

Good question @norlandrhagen - I should have reported the version

I used

In [9]: kerchunk.__version__
Out[9]: '0.2.2.post55'

What does that post55 even mean?

@norlandrhagen
Copy link
Contributor

No idea! We could try pinning 0.2.2 and running the test suite.

@TomNicholas
Copy link
Author

The CI in https://github.com/TomNicholas/VirtualiZarr/actions/runs/8572561510/job/23495399856?pr=73 looks like it created a references dict with no inlining (i.e. the same behaviour that I see locally). And that's with kerchunk=0.2.4

@norlandrhagen
Copy link
Contributor

norlandrhagen commented Apr 5, 2024

kerchunk<=0.2.2 worked. kerchunk>=0.2.3 failed to create inlining. 🤷

norlandrhagen added a commit to zarr-developers/VirtualiZarr that referenced this issue Apr 5, 2024
Testing PR to pin kerchunk==0.2.2 to see if inlining references works.

fsspec/kerchunk#445
@martindurant
Copy link
Member

Let's make a reproducer in the form of a test. This rings a bell, something I thought I had fixed.

@TomNicholas
Copy link
Author

The code above is a reproducer of the bug.

I would be happy to make a PR, but I would need guidance on how to actually fix the bug.

@jsignell
Copy link
Contributor

I just came to this issue and tried to reproduce the original issue on main. It works as expected there, but I did see the behavior that Tom is describing on 2.3.0 and 2.4.0. Seems likely that it was fixed here: #432 which was just after the 2.4.0 release.

I wrote a test with a little subset of air.nc (ds.sel(time="2014-10", lat=slice(100, 50), lon=slice(250, 300)))

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants