Preserve chunks in CF Writer #1254
Conversation
Codecov Report
@@            Coverage Diff            @@
##           master    #1254     +/-  ##
=========================================
  Coverage   90.51%   90.52%
=========================================
  Files         228      228
  Lines       33334    33377      +43
=========================================
+ Hits        30173    30214      +41
- Misses       3161     3163       +2
I just noticed that coordinate variables like latitude or longitude can also be chunked.
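Coordinate variables backed by dask arrays carry chunk information just like data variables, which is why a chunk-preserving writer has to inspect them too. A minimal sketch of that situation (the variable names are illustrative, not taken from the writer's code):

```python
import dask.array as da
import xarray as xr

# A non-dimension coordinate (latitude) backed by a chunked dask array
lat = xr.DataArray(da.zeros((100, 200), chunks=(50, 100)), dims=('y', 'x'))
data = xr.DataArray(da.ones((100, 200), chunks=(50, 100)),
                    dims=('y', 'x'), coords={'latitude': lat})

# Both the data variable and its coordinate report dask chunks
print(data.data.chunks)                     # ((50, 50), (100, 100))
print(data.coords['latitude'].data.chunks)  # ((50, 50), (100, 100))
```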
force-pushed from c27beb2 to 1951de4 (Compare)
The logic is sound, so I'm good with this. I have two comments about cleanliness though.
@@ -616,6 +632,7 @@ def save_datasets(self, datasets, filename=None, groups=None, header_attrs=None,
             root.attrs['Conventions'] = CF_VERSION

+        # Remove satpy-specific kwargs
         to_netcdf_kwargs = copy.deepcopy(to_netcdf_kwargs)  # may contain dictionaries (encoding)
         satpy_kwargs = ['overlay', 'decorate', 'config_files']
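The hunk above strips satpy-only keyword arguments before the remainder is forwarded to xarray's `to_netcdf`. A standalone sketch of that idea, using the names from the diff (the wrapper function `remove_satpy_kwargs` itself is hypothetical):

```python
import copy

def remove_satpy_kwargs(to_netcdf_kwargs):
    # Deep-copy first: the kwargs may contain nested dictionaries
    # (e.g. 'encoding') that must not be mutated for the caller.
    to_netcdf_kwargs = copy.deepcopy(to_netcdf_kwargs)
    satpy_kwargs = ['overlay', 'decorate', 'config_files']
    for kwarg in satpy_kwargs:
        to_netcdf_kwargs.pop(kwarg, None)
    return to_netcdf_kwargs

kwargs = {'encoding': {'foo': {'chunksizes': (1024, 1024)}},
          'decorate': True, 'engine': 'h5netcdf'}
clean = remove_satpy_kwargs(kwargs)
print(clean)  # {'encoding': {'foo': {'chunksizes': (1024, 1024)}}, 'engine': 'h5netcdf'}
```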
I know you didn't do this, but it is too bad we have to do this. I'm sure it will come back to bite us in the future.
@sfinkens what is the status on this? Do you think you can have it ready by Friday?
@mraspaud Yes, I think that's possible!
Replace DataArray.drop with .drop_vars
@mraspaud Done!
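For context on the commit above: xarray deprecated `DataArray.drop` in favour of the more explicit `.drop_vars`. A minimal sketch of the replacement (the example data is illustrative):

```python
import numpy as np
import xarray as xr

arr = xr.DataArray(np.arange(4), dims='x',
                   coords={'x': [0, 1, 2, 3], 'aux': 7})

# Previously: arr.drop('aux') -- now deprecated
arr = arr.drop_vars('aux')
print('aux' in arr.coords)  # False
```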
LGTM, two small style comments inline.
This PR updates the encoding applied by the CF writer so that generated netCDF files are chunked like the original dask arrays. Chunks specified by the user via `encoding={'foo': {'chunksizes': (1024, 1024)}}` take precedence.

I tested this with a LEO scene (`avhrr_gaclac_l1b`), a single geostationary scene (`ahi_hsd`) and a timeseries of `ahi_hsd` scenes (via MultiScene). The latter has some problems with time bounds (see #1242), but the chunking works fine. I also tested both the `netCDF4` and the `h5netcdf` backends.

Edit: I also removed the new `_satpy*` attributes before writing datasets to netCDF.
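The precedence rule described above (default the netCDF chunksizes to the dask chunks, but let a user-supplied encoding entry win) can be sketched as follows. This is an illustration, not the writer's actual implementation; `build_encoding` is a hypothetical helper:

```python
def build_encoding(dask_chunks, user_encoding=None):
    """Derive per-variable netCDF encoding from dask chunks.

    dask_chunks: per-axis tuples of block sizes, as reported by
    a dask array's .chunks attribute.
    """
    # Default: one chunksize per dimension, taken from the first
    # dask block along each axis.
    encoding = {'chunksizes': tuple(c[0] for c in dask_chunks)}
    # User-specified encoding entries override the defaults.
    encoding.update(user_encoding or {})
    return encoding

print(build_encoding(((550, 550), (1100,))))
# {'chunksizes': (550, 1100)}
print(build_encoding(((550, 550), (1100,)),
                     {'chunksizes': (1024, 1024)}))
# {'chunksizes': (1024, 1024)}
```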