Make it possible to do eager saving #138
Conversation
Codecov Report

```diff
@@            Coverage Diff             @@
##             main     #138      +/-   ##
==========================================
- Coverage   95.56%   95.48%   -0.08%
==========================================
  Files          11       11
  Lines        2365     2372       +7
==========================================
+ Hits         2260     2265       +5
- Misses        105      107       +2
```
Here's a short script using Satpy directly that demonstrates the random crashes:

```python
#!/usr/bin/env python

import glob

import numpy as np

from satpy import Scene
from satpy.writers import compute_writer_results

COMPUTE = False


def main():
    fnames = glob.glob(
        "/home/lahtinep/data/satellite/polar/ears_pps/*48965.nc")
    dsets = ['cma', 'ct', 'ctth_alti', 'ctth_pres', 'ctth_tempe']
    glbl = Scene(reader='nwcsaf-pps_nc', filenames=fnames)
    glbl.load(dsets)
    dtype_int16 = {'dtype': np.int16}
    encoding = {'cma': dtype_int16, 'ct': dtype_int16}
    res = glbl.save_datasets(writer='cf', filename="/tmp/pps_test.nc",
                             encoding=encoding, include_lonlats=True,
                             compute=COMPUTE)
    if not COMPUTE:
        compute_writer_results([res])


if __name__ == "__main__":
    main()
```

I'm trying to find an example that wouldn't require actual data.
And the corresponding version as product_list:

```yaml
output_dir: /tmp/
fname_pattern: "{start_time:%Y%m%d_%H%M}_{platform_name}_{areaname}_EARS_PPS.nc"
reader: nwcsaf-pps_nc
subscribe_topics:
  - /test/ears/avhrr/pps/gatherer
eager_writing: True
areas:
  null:
    priority: 1
    areaname: swath
    products:
      ("cma", "ct", "ctth_alti", "ctth_pres", "ctth_tempe"):
        formats:
          - format: nc
            writer: cf
            encoding:
              cma:
                dtype: !!python/name:numpy.int16
              ct:
                dtype: !!python/name:numpy.int16
            include_lonlats: True

workers:
  - fun: !!python/name:trollflow2.plugins.create_scene
  - fun: !!python/name:trollflow2.plugins.load_composites
  - fun: !!python/name:trollflow2.plugins.resample
  - fun: !!python/name:trollflow2.plugins.save_datasets
```
Another pure-Satpy example using SEVIRI HRIT data:

```python
#!/usr/bin/env python

import glob

from satpy import Scene
from satpy.writers import compute_writer_results

COMPUTE = False


def main():
    fnames = glob.glob(
        "/home/lahtinep/data/satellite/geo/0deg/*202202031045*")
    dsets = ['VIS006', 'VIS008']
    glbl = Scene(reader='seviri_l1b_hrit', filenames=fnames)
    glbl.load(dsets)
    res = glbl.save_datasets(writer='cf', filename="/tmp/seviri_test.nc",
                             include_lonlats=True, compute=COMPUTE)
    if not COMPUTE:
        compute_writer_results([res])


if __name__ == "__main__":
    main()
```
UPDATED VERSION

And a version that fails without any Pytroll code:

```python
#!/usr/bin/env python

import datetime as dt

import numpy as np
import dask.array as da
import xarray as xr

COMPUTE = False
FNAME = "/tmp/xr_test.nc"


def main():
    y = np.arange(1000, dtype=np.uint16)
    x = np.arange(2000, dtype=np.uint16)
    now = dt.datetime.utcnow()
    times = xr.DataArray(
        np.array([now + dt.timedelta(seconds=i) for i in range(y.size)],
                 dtype=np.datetime64),
        coords={'y': y})
    # Write root
    root = xr.Dataset({}, attrs={'global': 'attribute'})
    written = [root.to_netcdf(FNAME, mode='w')]
    # Write first dataset
    data1 = xr.DataArray(da.random.random((y.size, x.size)),
                         dims=['y', 'x'],
                         coords={'y': y, 'x': x, 'time': times})
    dset1 = xr.Dataset({'data1': data1})
    written.append(dset1.to_netcdf(FNAME, mode='a', compute=COMPUTE))
    # Write second dataset using the same time coordinates
    data2 = xr.DataArray(da.random.random((y.size, x.size)),
                         dims=['y', 'x'],
                         coords={'y': y, 'x': x, 'time': times})
    dset2 = xr.Dataset({'data2': data2})
    written.append(dset2.to_netcdf(FNAME, mode='a', compute=COMPUTE))
    if not COMPUTE:
        da.compute(written)


if __name__ == "__main__":
    main()
```
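The pattern the scripts above share — collect the delayed writes while `COMPUTE` is `False`, then execute them all at the end — can be illustrated without NetCDF, dask, or xarray. This is a hypothetical, stdlib-only sketch of that deferred-then-compute flow; `to_file` and its return values are made up for the example and stand in for `to_netcdf()`:

```python
COMPUTE = False


def to_file(name, compute=True):
    """Stand-in for a (possibly delayed) to_netcdf() call."""
    def task():
        return f"wrote {name}"
    if compute:
        return task()  # eager: perform the write immediately
    return task        # deferred: return the pending work


def main():
    written = []
    written.append(to_file("data1", compute=COMPUTE))
    written.append(to_file("data2", compute=COMPUTE))
    if not COMPUTE:
        # analogous to da.compute(written) / compute_writer_results()
        written = [task() for task in written]
    return written
```

In the deferred mode every pending write is executed inside one final batch; the crash above happens when those batched NetCDF writes race on dimensions that have not been defined yet.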
Created an issue for XArray: pydata/xarray#6300
Thanks for adding this feature. I think it just needs a note in the general documentation that this option will most likely slow down processing significantly.

I added a note to the example configuration file. This will most likely be a temporary solution until pydata/xarray#6300 is fixed.
LGTM
When saving multiple datasets to a single CF/NetCDF4 file using the syntax introduced in #51, I got random crashes resulting in

```
RuntimeError: NetCDF: Not a valid ID
```

within the XArray library. Some internet searching suggested that this is due to trying to use dimensions that have not yet been defined. My solution is to add a config option that forces eager saving instead of delaying the saving and calling `compute_writer_results()` afterwards. With this PR, I haven't seen this happen a single time over 50 consecutive runs.

`flake8 trollflow2`
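As a hedged sketch of the idea (this is not the real `trollflow2.plugins.save_datasets`, and `save_all`/`pending_writes` are illustrative names), the `eager_writing` option boils down to choosing between running each write immediately and handing the pending writes back for one batched compute:

```python
def save_all(pending_writes, eager_writing=False):
    """Toy model of an eager_writing switch over zero-argument write tasks."""
    if eager_writing:
        # Eager: run every write as soon as it is issued; slower, but each
        # file is fully defined before the next write touches it.
        return [write() for write in pending_writes]
    # Delayed: return the pending writes so the caller can batch them,
    # analogous to calling compute_writer_results() afterwards.
    return pending_writes
```

With `eager_writing=True` the results come back already materialized; with the default the caller is responsible for executing the returned tasks in one go, which is exactly where the NetCDF race shows up.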