What happened?
Saving an xr.Dataset lazily to NetCDF4 (dset.to_netcdf(..., compute=False)) fails seemingly at random if an array is used either as a coordinate for multiple variables or saved under different names as a standalone variable. The trace I get is shown below in the log section.
What did you expect to happen?
The saving should work consistently between different runs.
Minimal Complete Verifiable Example
#!/usr/bin/env python

import datetime as dt

import numpy as np
import dask.array as da
import xarray as xr

COMPUTE = False
FNAME = "xr_test.nc"


def main():
    y = np.arange(1000, dtype=np.uint16)
    x = np.arange(2000, dtype=np.uint16)

    # Create a time array that is used as a Y-coordinate for the data
    now = dt.datetime.utcnow()
    time_arr = np.array([now + dt.timedelta(seconds=i) for i in range(y.size)],
                        dtype=np.datetime64)
    times = xr.DataArray(time_arr, coords={'y': y})

    # Write root
    root = xr.Dataset({}, attrs={'global': 'attribute'})
    written = [root.to_netcdf(FNAME, mode='w')]

    # Write first dataset
    data1 = xr.DataArray(da.random.random((y.size, x.size)), dims=['y', 'x'],
                         coords={'y': y, 'x': x, 'time': times})
    dset1 = xr.Dataset({'data1': data1})
    written.append(dset1.to_netcdf(FNAME, mode='a', compute=COMPUTE))

    # Write second dataset using the same time coordinates
    data2 = xr.DataArray(da.random.random((y.size, x.size)), dims=['y', 'x'],
                         coords={'y': y, 'x': x, 'time': times})
    dset2 = xr.Dataset({'data2': data2})
    written.append(dset2.to_netcdf(FNAME, mode='a', compute=COMPUTE))

    if not COMPUTE:
        da.compute(written)


if __name__ == "__main__":
    main()
Relevant log output
Anything else we need to know?
The script above fails at random, so it needs to be run several times. In ten runs I got the trace twice. With COMPUTE = True, the script worked every time (in about 100 tries, at least).
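Because the failure is intermittent, a small harness can make the failure rate easier to measure than running the script by hand. The following is a minimal sketch using only the standard library; the helper name `count_failures` and the script name `xr_test.py` are assumptions for illustration, not part of the original report.

```python
import subprocess
import sys


def count_failures(cmd, runs=10):
    """Run `cmd` `runs` times and return how many runs exited with a non-zero status."""
    failures = 0
    for _ in range(runs):
        # Capture output so that a traceback from one run does not flood the terminal.
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            failures += 1
    return failures


# Example usage, assuming the reproducer was saved as xr_test.py (hypothetical name):
# failed = count_failures([sys.executable, "xr_test.py"], runs=10)
# print(f"{failed} of 10 runs failed")
```

Running each attempt in a fresh interpreter keeps the runs independent of each other, which matters when the suspected cause is task scheduling order.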
The same behaviour is seen if the time coordinates are removed completely and data1 is also used in dset2 in place of data2.
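One possible reading of that variant is sketched below: the time coordinate is dropped and the same dask-backed array is stored under two variable names. The function names and the file name `xr_test_variant.nc` are assumptions for illustration; the calls mirror those in the script above.

```python
import dask.array as da
import numpy as np
import xarray as xr

FNAME = "xr_test_variant.nc"  # hypothetical output file name


def build_datasets():
    """Build two datasets that store the same dask array under different names."""
    y = np.arange(1000, dtype=np.uint16)
    x = np.arange(2000, dtype=np.uint16)
    # No time coordinate here; only the plain y/x coordinates remain.
    data1 = xr.DataArray(da.random.random((y.size, x.size)), dims=["y", "x"],
                         coords={"y": y, "x": x})
    dset1 = xr.Dataset({"data1": data1})
    dset2 = xr.Dataset({"data2": data1})  # data1 reused in place of data2
    return dset1, dset2


def main():
    dset1, dset2 = build_datasets()
    root = xr.Dataset({}, attrs={"global": "attribute"})
    written = [root.to_netcdf(FNAME, mode="w")]
    # Both appends are lazy; the actual writes happen in da.compute().
    written.append(dset1.to_netcdf(FNAME, mode="a", compute=False))
    written.append(dset2.to_netcdf(FNAME, mode="a", compute=False))
    da.compute(written)
```

Calling `main()` repeatedly (for example from a fresh interpreter each time) would be the way to check whether this variant shows the same intermittent failure.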
Environment
INSTALLED VERSIONS
commit: None
python: 3.9.9 | packaged by conda-forge | (main, Dec 20 2021, 02:41:03)
[GCC 9.4.0]
python-bits: 64
OS: Linux
OS-release: 5.13.0-30-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1
xarray: 0.20.2
pandas: 1.3.5
numpy: 1.22.0
scipy: 1.7.3
netCDF4: 1.5.8
pydap: None
h5netcdf: 0.13.0
h5py: 3.6.0
Nio: None
zarr: 2.10.3
cftime: 1.5.1.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.2.10
cfgrib: None
iris: None
bottleneck: None
dask: 2022.01.0
distributed: 2022.01.0
matplotlib: 3.5.1
cartopy: 0.20.2
seaborn: 0.11.2
numbagg: None
fsspec: 2022.01.0
cupy: None
pint: None
sparse: None
setuptools: 59.8.0
pip: 21.3.1
conda: None
pytest: 6.2.5
IPython: 8.0.0
sphinx: 4.3.2