
cannot properly .close() a dataset opened with chunks argument? #2862

Open
lorenzori opened this issue Apr 1, 2019 · 2 comments

Comments

@lorenzori

lorenzori commented Apr 1, 2019

I want to do operations on a copy of a dataset and then overwrite the NetCDF it was read from:

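import xarray as xr

f = xr.open_dataset('dataset.nc')   # open the existing file
n = f.copy(deep=True)               # make a copy to work on
f.close()                           # release the file handle
n.to_netcdf(path='dataset.nc')      # overwrite the original file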

Problem description

The above works. However, if I use the chunks argument when opening the dataset, a KeyError: 'tcw' is raised and the NetCDF file on disk is also corrupted. This happens with both deep=True and deep=False.

Expected Output

I'm not a Dask expert, but it kind of makes sense that close() doesn't really close the file when data is evaluated lazily and I still need things from it afterwards. So maybe that is the correct behaviour and we should just handle the exception? Or should I call compute() after closing? I'm really not sure.
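The hunch that close() cannot really release data that is only loaded lazily can be illustrated with a toy, standard-library-only model. Everything here is hypothetical: LazyVariable, the JSON file, and the values stand in for a dask-backed variable and a NetCDF file, and are not xarray's or dask's actual machinery. The point is only that a "deep" copy of a lazy object duplicates the recipe for reading the data, not the data, so the copy still depends on the open file.

```python
import copy
import json
import os
import tempfile


class LazyVariable:
    """Toy stand-in for a lazily loaded variable (hypothetical; not xarray/dask code)."""

    def __init__(self, handle, name):
        self.handle = handle  # open file handle shared with the parent "dataset"
        self.name = name      # which variable to read, e.g. "tcw"

    def __deepcopy__(self, memo):
        # Copies the recipe for loading the data, not the data itself,
        # so the copy keeps pointing at the same open file handle.
        return LazyVariable(self.handle, self.name)

    def compute(self):
        # Data is only read from disk here, at compute time.
        self.handle.seek(0)
        return json.load(self.handle)[self.name]


# Write a tiny "dataset" to a temporary file.
path = os.path.join(tempfile.mkdtemp(), "dataset.json")
with open(path, "w") as out:
    json.dump({"tcw": [1.0, 2.0, 3.0]}, out)

f = open(path)
n = copy.deepcopy(LazyVariable(f, "tcw"))  # "deep" copy, but still lazy
f.close()                                   # the file the copy depends on is now closed

try:
    n.compute()
    error = None
except ValueError as exc:  # Python raises ValueError when reading a closed file
    error = exc

print("compute after close failed:", error)
```

In this toy the failure is Python's closed-file ValueError; it is not meant to reproduce the exact KeyError reported above, only the dependency that causes it.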

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None

xarray: 0.11.0
pandas: 0.24.1
numpy: 1.15.4
scipy: None
netCDF4: 1.4.2
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.0.3.4
PseudonetCDF: None
rasterio: 1.0.13
iris: None
bottleneck: None
cyordereddict: None
dask: 1.1.4
distributed: 1.26.0
matplotlib: None
cartopy: None
seaborn: None
setuptools: 40.7.3
pip: 19.0.1
conda: None
pytest: 4.2.1
IPython: 7.2.0
sphinx: None

@shoyer
Member

shoyer commented Apr 1, 2019

Something like this should definitely work:

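import xarray as xr

f = xr.open_dataset('dataset.nc')
n = f.compute()                   # load all data into memory as a new Dataset
f.close()                         # safe now: n no longer reads from the file
n.to_netcdf(path='dataset.nc')    # overwrite the original file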

Deep copying keeps dask arrays as dask arrays (it does not load the data into memory), so they are still linked to the original file on disk. If you close that file, dask is definitely going to error when you attempt to compute them. I agree that there is an opportunity for better error messages here, though.
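The ordering that makes the fix work (read everything into memory, close the file, then overwrite the same path) can be sketched with the standard library alone. This is a hypothetical analogue: load_into_memory plays the role that compute() plays in the xarray snippet above, and a JSON file stands in for the NetCDF file. It is not xarray's implementation.

```python
import json
import os
import tempfile


def load_into_memory(path):
    """Read every variable into memory and close the file (a stand-in for compute())."""
    with open(path) as f:    # the file is closed when this block exits
        return json.load(f)  # a plain dict with no remaining link to the file


# Write a tiny "dataset" to a temporary file.
path = os.path.join(tempfile.mkdtemp(), "dataset.json")
with open(path, "w") as out:
    json.dump({"tcw": [1.0, 2.0, 3.0]}, out)

n = load_into_memory(path)              # data now lives in memory
n["tcw"] = [v * 2 for v in n["tcw"]]    # operate on the in-memory copy

with open(path, "w") as out:            # overwriting is safe: nothing reads lazily from it
    json.dump(n, out)

with open(path) as check:
    print(json.load(check))
```

The design choice mirrors the advice in the comment: once the data is materialized, the copy no longer depends on the source file, so closing and overwriting that file cannot break it.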

@lorenzori
Author

Correct, that does the trick. Thanks!


3 participants