to_zarr with region and write_empty_chunks keywords raises ReadOnlyError if using consolidated metadata #8323

Closed
5 tasks done
Metamess opened this issue Oct 17, 2023 · 1 comment · Fixed by #8326
Labels
bug topic-zarr Related to zarr storage library

Comments

@Metamess
Contributor

Metamess commented Oct 17, 2023

What happened?

When Dataset.to_zarr is called with the region keyword and write_empty_chunks is set explicitly (to either False or True), the operation fails with ReadOnlyError: object is read-only. This happens when the consolidated keyword is not provided (and thus defaults to None) or is set to True. It does not happen when consolidated=False is provided.

From what I can find, although the error is raised inside the zarr library, the root cause of this bug is in line 677 of xarray/backends/zarr.py. There, a new zarr Array is created with store=self.zarr_group.store (a read-only zarr.storage.ConsolidatedMetadataStore instance when consolidated metadata is used) instead of store=self.zarr_group.chunk_store (a normal zarr.storage.DirectoryStore instance), which is effectively the store used in all other cases.

Replacing store=self.zarr_group.store with store=self.zarr_group.chunk_store seems to solve the issue.
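
To illustrate why the choice of store matters, here is a toy model in plain Python. It does not import zarr or xarray: ToyConsolidatedStore, ToyGroup, ToyArray and try_write are hypothetical stand-ins that only mimic the behavior described above (a read-only metadata store, a writable chunk store, and an array that writes chunks to whichever store it was constructed with). It is a sketch of the failure mode, not the actual library code.

```python
class ReadOnlyError(Exception):
    """Toy stand-in for the error zarr raises on read-only stores."""


class ToyConsolidatedStore:
    """Hypothetical read-only view over consolidated metadata."""

    def __init__(self, metadata):
        self._metadata = dict(metadata)

    def __getitem__(self, key):
        return self._metadata[key]

    def __setitem__(self, key, value):
        # Any write is rejected, mirroring the last frame of the traceback.
        raise ReadOnlyError("object is read-only")


class ToyGroup:
    """Hypothetical group exposing both stores, like self.zarr_group."""

    def __init__(self, store, chunk_store):
        self.store = store              # read-only metadata store
        self.chunk_store = chunk_store  # normal writable store


class ToyArray:
    """Hypothetical array: chunks go to chunk_store, which falls back to store."""

    def __init__(self, store, chunk_store=None):
        self.store = store
        self.chunk_store = store if chunk_store is None else chunk_store

    def write_chunk(self, key, data):
        self.chunk_store[key] = data


def try_write(array):
    """Attempt a chunk write and report the outcome as a string."""
    try:
        array.write_chunk("foo/0.0", b"\x00" * 4)
    except ReadOnlyError as err:
        return f"failed: {err}"
    return "ok"


group = ToyGroup(
    store=ToyConsolidatedStore({"foo/.zarray": "{}"}),
    chunk_store={},  # a plain dict plays the role of the writable store
)

broken = ToyArray(store=group.store)       # analogous to the current code
fixed = ToyArray(store=group.chunk_store)  # analogous to the proposed change

print(try_write(broken))
print(try_write(fixed))
```

In this model the first write is rejected because the array falls back to the read-only store for its chunks, while the second lands in the writable store.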

What did you expect to happen?

I expected no error, with the data written to the given zarr region while respecting the write_empty_chunks=False behavior.

Minimal Complete Verifiable Example

import xarray as xr
import numpy as np

store_path = "test.zarr"
# First, create the Zarr store using compute=False
ds = xr.Dataset(
    coords={"x": range(1000), "y": range(2000)},
    data_vars={"foo": (("x", "y"), np.zeros((1000, 2000), dtype="float32"))},
)
ds.to_zarr(store_path, compute=False)

# Attempt to write to the Zarr store using 'region' and 'write_empty_chunks=False'
region = {"x": slice(0, 1000), "y": slice(0, 2000)}
# Without consolidated metadata, this works:
ds.to_zarr(store_path, region=region, write_empty_chunks=False, consolidated=False)
# With consolidated metadata (the default), it raises a ReadOnlyError:
ds.to_zarr(store_path, region=region, write_empty_chunks=False)

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

ReadOnlyError                             Traceback (most recent call last)
<ipython-input-2-9da43054efc1> in <module>
----> 1 ds.to_zarr(store_path, region=region, write_empty_chunks=False)

venv/lib/python3.10/site-packages/xarray/core/dataset.py in to_zarr(self, store, chunk_store, mode, synchronizer, group, encoding, compute, consolidated, append_dim, region, safe_chunks, storage_options, zarr_version, write_empty_chunks, chunkmanager_store_kwargs)
   2472         from xarray.backends.api import to_zarr
   2473
-> 2474         return to_zarr(  # type: ignore[call-overload,misc]
   2475             self,
   2476             store=store,

venv/lib/python3.10/site-packages/xarray/backends/api.py in to_zarr(dataset, store, chunk_store, mode, synchronizer, group, encoding, compute, consolidated, append_dim, region, safe_chunks, storage_options, zarr_version, write_empty_chunks, chunkmanager_store_kwargs)
   1704     writer = ArrayWriter()
   1705     # TODO: figure out how to properly handle unlimited_dims
-> 1706     dump_to_store(dataset, zstore, writer, encoding=encoding)
   1707     writes = writer.sync(
   1708         compute=compute, chunkmanager_store_kwargs=chunkmanager_store_kwargs

venv/lib/python3.10/site-packages/xarray/backends/api.py in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims)
   1306         variables, attrs = encoder(variables, attrs)
   1307
-> 1308     store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)
   1309
   1310

venv/lib/python3.10/site-packages/xarray/backends/zarr.py in store(self, variables, attributes, check_encoding_set, writer, unlimited_dims)
    627             self.set_dimensions(variables_encoded, unlimited_dims=unlimited_dims)
    628
--> 629         self.set_variables(
    630             variables_encoded, check_encoding_set, writer, unlimited_dims=unlimited_dims
    631         )

venv/lib/python3.10/site-packages/xarray/backends/zarr.py in set_variables(self, variables, check_encoding_set, writer, unlimited_dims)
    732
    733             region = tuple(write_region[dim] for dim in dims)
--> 734             writer.add(v.data, zarr_array, region)
    735
    736     def close(self):

venv/lib/python3.10/site-packages/xarray/backends/common.py in add(self, source, target, region)
    239         else:
    240             if region:
--> 241                 target[region] = source
    242             else:
    243                 target[...] = source

venv/lib/python3.10/site-packages/zarr/core.py in __setitem__(self, selection, value)
   1493             self.vindex[selection] = value
   1494         elif is_pure_orthogonal_indexing(pure_selection, self.ndim):
-> 1495             self.set_orthogonal_selection(pure_selection, value, fields=fields)
   1496         else:
   1497             self.set_basic_selection(pure_selection, value, fields=fields)

venv/lib/python3.10/site-packages/zarr/core.py in set_orthogonal_selection(self, selection, value, fields)
   1682         indexer = OrthogonalIndexer(selection, self)
   1683
-> 1684         self._set_selection(indexer, value, fields=fields)
   1685
   1686     def set_coordinate_selection(self, selection, value, fields=None):

venv/lib/python3.10/site-packages/zarr/core.py in _set_selection(self, indexer, value, fields)
   2036
   2037                 # put data
-> 2038                 self._chunk_setitem(chunk_coords, chunk_selection, chunk_value, fields=fields)
   2039         else:
   2040             lchunk_coords, lchunk_selection, lout_selection = zip(*indexer)

venv/lib/python3.10/site-packages/zarr/core.py in _chunk_setitem(self, chunk_coords, chunk_selection, value, fields)
   2302
   2303         with lock:
-> 2304             self._chunk_setitem_nosync(chunk_coords, chunk_selection, value, fields=fields)
   2305
   2306     def _chunk_setitem_nosync(self, chunk_coords, chunk_selection, value, fields=None):

venv/lib/python3.10/site-packages/zarr/core.py in _chunk_setitem_nosync(self, chunk_coords, chunk_selection, value, fields)
   2312             self._chunk_delitem(ckey)
   2313         else:
-> 2314             self.chunk_store[ckey] = self._encode_chunk(cdata)
   2315
   2316     def _process_for_setitem(self, ckey, chunk_selection, value, fields=None):

venv/lib/python3.10/site-packages/zarr/storage.py in __setitem__(self, key, value)
   2973
   2974     def __setitem__(self, key, value):
-> 2975         raise ReadOnlyError()
   2976
   2977     def getsize(self, path):

ReadOnlyError: object is read-only

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS

commit: None
python: 3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0]
python-bits: 64
OS: Linux
OS-release: 5.15.90.1-microsoft-standard-WSL2
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None

xarray: 2023.9.0
pandas: 2.1.1
numpy: 1.25.2
scipy: 1.11.2
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.16.1
cftime: None
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: 1.3.7
dask: 2023.9.3
distributed: None
matplotlib: 3.5.1
cartopy: None
seaborn: None
numbagg: None
fsspec: 2023.6.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 59.6.0
pip: 23.3
conda: None
pytest: 7.4.2
mypy: 1.5.1
IPython: 7.31.1
sphinx: None

@Metamess Metamess added bug needs triage Issue that has not been reviewed by xarray team member labels Oct 17, 2023
@dcherian dcherian added topic-zarr Related to zarr storage library and removed needs triage Issue that has not been reviewed by xarray team member labels Oct 17, 2023
@alexamici
Collaborator

@Metamess thanks for the analysis and the work-around! I was hitting the same bug and your high quality issue saved my day!
