Parametrized tests for netcdf encoding options #274

rabernat · 2022-12-19T14:30:05Z

This PR adds a test to verify that kerchunk can decode all possible flavors of netcdf4.

This reveals that fletcher32 decoding is broken, despite #34 and #35. Attempting to decode data written with zlib=True and fletcher32=True always results in

        zlib.error: Error -5 while decompressing data: incomplete or truncated stream

xref pydata/xarray#7388

martindurant · 2022-12-19T15:15:36Z

Thanks for making sure this got tested.
The documentation misled me: the 4-byte Fletcher checksum is added to the byte buffer before compression, not after, it seems.
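
A minimal sketch of why the order matters, assuming the layout described above (checksum appended to the raw bytes, then the whole buffer deflated). The payload and the all-zero 4-byte trailer are placeholders for illustration, not real HDF5 chunk data or a real Fletcher-32 value:

```python
import zlib

payload = bytes(range(256)) * 4
checksum = b"\x00\x00\x00\x00"  # placeholder trailer, NOT a real Fletcher-32 value

# Assumed layout: checksum is appended first, then everything is compressed.
stored = zlib.compress(payload + checksum)

# Wrong order: strip 4 bytes from the compressed stream, then decompress.
try:
    zlib.decompress(stored[:-4])
    wrong_order_failed = False
except zlib.error:
    wrong_order_failed = True

# Matching order: decompress first, then drop the trailing 4 bytes.
decoded = zlib.decompress(stored)[:-4]
```

Stripping the trailer from the still-compressed bytes cuts off the end of the zlib stream, which is consistent with the truncated-stream error reported above.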

rabernat · 2022-12-19T15:20:21Z

Awesome! Great to hear that you can see a solution. Feel free to push to my branch if you want to just fix it in this PR.

martindurant · 2022-12-19T15:20:55Z

Already did :)

rabernat · 2022-12-19T15:24:09Z

😆 you're too fast Martin!

A great next step would be to actually implement the real checksum filter in numcodecs, rather than a dummy passthrough.

I found implementations here: https://github.com/njaladan/hashpy/blob/master/hashpy/fletcherNbit.py

martindurant · 2022-12-19T15:26:21Z

Agree - but kerchunk is all about fast and functional!

https://en.wikipedia.org/wiki/Fletcher%27s_checksum#Fletcher-32

https://gist.github.com/AJPoulter-Soton/9d0d2505af64f0719bdee59b9a4533ba ?
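
For reference, the textbook Fletcher-32 from the Wikipedia page linked above can be sketched in a few lines. This is only an illustration of the algorithm, assuming little-endian 16-bit words and zero padding for odd lengths; the HDF5 filter may use a different byte order or trailer layout, so this is not claimed to match it byte for byte:

```python
def fletcher32(data: bytes) -> int:
    """Textbook Fletcher-32 over little-endian 16-bit words (illustrative only)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    sum1 = 0
    sum2 = 0
    for i in range(0, len(data), 2):
        word = int.from_bytes(data[i:i + 2], "little")
        sum1 = (sum1 + word) % 65535
        sum2 = (sum2 + sum1) % 65535
    return (sum2 << 16) | sum1


print(hex(fletcher32(b"abcde")))  # 0xf04fc729
```

A pure-Python loop like this is far slower than a compiled implementation, so it says nothing about the cost of a real codec.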

rabernat · 2022-12-19T15:27:22Z

I can't imagine that the computational cost of fletcher32 would be comparable to that of the actual decompression step (not to mention the I/O itself).

rabernat · 2022-12-19T15:31:04Z

kerchunk/codecs.py

+        return buff[:-4]
+
+    def encode(self, buf):
+        pass


Wouldn't you want to append 4 empty bytes here, just in case someone tried to use it in a round-trip context?

martindurant · 2022-12-19

Hm, I'm not sure. Maybe raising NotImplementedError is the correct thing to do, as the output can't be used by anything that really does the check.
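
A rough sketch of that second option, for illustration only. The class name and `codec_id` are hypothetical, and the class is deliberately standalone rather than a subclass of the numcodecs base codec, so it is not the code in this PR:

```python
class PassthroughFletcher32:
    """Hypothetical decode-only stand-in that drops the 4-byte trailer unchecked."""

    codec_id = "fletcher32_passthrough"  # illustrative name, not a registered codec

    def decode(self, buf, out=None):
        # Discard the trailing 4 checksum bytes without verifying them.
        return bytes(buf)[:-4]

    def encode(self, buf):
        # Refuse to write: without a real checksum, the output could not be
        # read by anything that actually verifies it.
        raise NotImplementedError("passthrough codec is decode-only")
```

Appending 4 empty bytes in `encode` would make a round trip through this class work, but the result would fail verification in any reader that computes the checksum, which is the concern raised above.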

martindurant · 2022-12-20T16:49:25Z

Incoming new codec: zarr-developers/numcodecs#412 (so there would be no need for one here once that is in)

martindurant · 2023-01-15T18:15:39Z

I can change this to point to the new fletcher codec in numcodecs. Is the null version, skipping the CPU time to do the comparison, useful? I suppose fletcher ought to be pretty cheap to compute.

rabernat · 2023-01-15T18:41:32Z

I personally consider it quite dangerous to pass through the data without actually checking the checksum. Since we now have a fast fletcher32 codec in numcodecs, I see no reason to keep the passthrough codec.

Profiling would be easy to do if we want some actual numbers.

martindurant · 2023-01-16T01:45:12Z

We'll need a numcodecs release anyway

rabernat and others added 3 commits December 19, 2022 09:28

add parametrized tests for netcdf encoding options

90dbb68

pre-commit

5d8ae21

fix fletcher

6115010

rabernat mentioned this pull request Dec 19, 2022

Fletcher checksum codec zarr-developers/numcodecs#410

Closed

rabernat mentioned this pull request Dec 20, 2022

implement fletcher32 zarr-developers/numcodecs#412

Merged

martindurant mentioned this pull request Oct 24, 2023

Do not handle shuffle twice #383

Merged
