BUG: correctly pad netCDF files with null bytes #8170
Per the netCDF spec, variable names and attributes should be padded with null bytes to the nearest 4-byte boundary, but scipy has been padding with b'0' instead: http://www.unidata.ucar.edu/software/netcdf/docs/file_format_specifications.html This surfaced recently after the netCDF-C library added more stringent checks in version 4.5.0, which rendered it unable to read files written with SciPy: Unidata/netcdf-c#657 This change to netCDF-C is likely going to be rolled back, but it's still a good idea for us to write compliant files. Apparently there are other netCDF implementations that also insist on padding with nulls.
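To make the difference concrete, here is a minimal sketch of the two padding behaviors described above. The helper names `pad_with_nulls` and `pad_with_ascii_zero` are hypothetical and are not taken from the SciPy source; they only illustrate null padding versus padding with b'0'.

```python
def pad_with_nulls(data: bytes) -> bytes:
    """Pad to the next 4-byte boundary with null bytes (spec-compliant)."""
    # -len(data) % 4 is the number of bytes needed to reach a multiple of 4.
    return data + b"\x00" * (-len(data) % 4)


def pad_with_ascii_zero(data: bytes) -> bytes:
    """Illustration of the non-compliant behavior: pad with the character '0'."""
    # b"0" is the ASCII digit zero (byte value 0x30), not a null byte.
    return data + b"0" * (-len(data) % 4)


print(pad_with_nulls(b"lat"))       # b'lat\x00'
print(pad_with_ascii_zero(b"lat"))  # b'lat0'
print(pad_with_nulls(b"time"))      # b'time' (already aligned, no padding)
```

Both helpers produce output of the same length, so such files have the right layout, and only a reader that inspects the padding bytes themselves would reject the second form.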
Note: in general I would always add a unit test for bug fixes, but I'm not really sure how to do that in this case short of manually verifying the bytes for a full serialized file (which seems excessive).
@shoyer Why not install netcdf from pypi and then use that to read the files for the test?
This seems rather heavy-weight -- libnetcdf is a big binary dependency that can be tricky to install. Though I suppose if we make the test dependency optional it could still be a good way to pick up incompatibilities. I think a better testing approach would be to hand-write a minimal netCDF file according to spec, and verify that scipy produces a bit-wise identical result. This shouldn't be too onerous: a minimal example with xarray is only 108 bytes:
Not tricky to install at all: pip install netcdf4 Also, there are no issues with including additional test dependencies, because the test will be skipped if the module cannot be imported.
Okay, I suppose we do have binary wheels these days :). This still isn't the best test for this issue -- netCDF4 will read old files created by SciPy just fine, unless you are using libnetcdf=4.5.0.
Thanks @shoyer. Test that was added LGTM. We have Might warrant backporting if that change to netCDF-C is not rolled back. Will merge in a day or so unless there are further comments.
I might add this in another PR, but I'm not sure it's really worth the trouble. As far as we know, there are no netCDF applications that check trailing bytes on variables, and pre-filling variable data would entail a slight performance hit over just using
We are good here, the netCDF-C change was indeed rolled back.
Test failures are real.
Tests are passing now. As you can probably tell, I have not yet gone to the trouble of setting up a local development environment :). I realized that I could indeed check the For invalid fill values, I decided to use the default fill value for the data type instead. In principle, we might raise an error for this instead, but this should be the least disruptive change for scipy users.
LGTM, thanks Stephan.
I wonder if it is related to this issue, but ncview will no longer open netCDF files created with scipy.io.
@bderembl I think this should only be a problem with netCDF-C version 4.5.0. If you're using either an older or newer release, netCDF should be able to read these files.
indeed.