Description
What happened?
Writing an array with string coordinate values using the numcodecs.Categorize filter succeeds, but the resulting Zarr group cannot be read back into a DataArray.
What did you expect to happen?
To get back the same array that I wrote.
Minimal Complete Verifiable Example
import numcodecs
import numpy as np
import xarray as xr
da = xr.DataArray(coords={"x": ("x", np.array(["a", "b"], dtype=object))}, dims=("x",))
codec = numcodecs.Categorize(labels=["a", "b"], dtype=object)
encoding = {
    "x": {
        "filters": [codec],
    },
}
da.to_zarr("/tmp/foo.zarr", mode="w", encoding=encoding)
da = xr.open_dataarray("/tmp/foo.zarr")  # crashes
MVCE confirmation
- Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- Complete example — the example is self-contained, including all data and the text of any traceback.
- Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
- New issue — a search of GitHub Issues suggests this is not a duplicate.
- Recent environment — the issue occurs with the latest version of xarray and its dependencies.
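To see which filter chain was actually persisted for the x coordinate, you can read the array's metadata after running the example. This minimal sketch assumes the store was written in Zarr format 2, the format used by zarr 2.18.3. In that format each array keeps its JSON metadata, including a "filters" list, in a .zarray file. The helper name stored_filter_ids is hypothetical.

```python
import json
from pathlib import Path


def stored_filter_ids(array_dir):
    """Hypothetical helper: list the codec ids in a Zarr v2 array's "filters".

    Assumes `array_dir` is a Zarr format 2 array directory containing `.zarray`.
    """
    metadata = json.loads((Path(array_dir) / ".zarray").read_text())
    # "filters" is null in .zarray when no filters are configured.
    return [config["id"] for config in (metadata.get("filters") or [])]
```

Calling stored_filter_ids("/tmp/foo.zarr/x") after running the example lists the filters in stored order. In Zarr format 2 they are decoded in reverse order on read, so the last entry is decoded first.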
Relevant log output
Traceback (most recent call last):
File "/home/yang.yang/foo.py", line 58, in <module>
da = xr.open_dataarray("/tmp/foo.zarr")
File "/home/yang.yang/.venv/lib/python3.13/site-packages/xarray/backends/api.py", line 851, in open_dataarray
dataset = open_dataset(
filename_or_obj,
...<15 lines>...
**kwargs,
)
File "/home/yang.yang/.venv/lib/python3.13/site-packages/xarray/backends/api.py", line 670, in open_dataset
backend_ds = backend.open_dataset(
filename_or_obj,
...<2 lines>...
**kwargs,
)
File "/home/yang.yang/.venv/lib/python3.13/site-packages/xarray/backends/zarr.py", line 1524, in open_dataset
ds = store_entrypoint.open_dataset(
store,
...<6 lines>...
decode_timedelta=decode_timedelta,
)
File "/home/yang.yang/.venv/lib/python3.13/site-packages/xarray/backends/store.py", line 59, in open_dataset
ds = Dataset(vars, attrs=attrs)
File "/home/yang.yang/.venv/lib/python3.13/site-packages/xarray/core/dataset.py", line 746, in __init__
variables, coord_names, dims, indexes, _ = merge_data_and_coords(
~~~~~~~~~~~~~~~~~~~~~^
data_vars, coords
^^^^^^^^^^^^^^^^^
)
^
File "/home/yang.yang/.venv/lib/python3.13/site-packages/xarray/core/dataset.py", line 459, in merge_data_and_coords
return merge_core(
[data_vars, coords],
...<5 lines>...
skip_align_args=[1],
)
File "/home/yang.yang/.venv/lib/python3.13/site-packages/xarray/core/merge.py", line 699, in merge_core
collected = collect_variables_and_indexes(aligned, indexes=indexes)
File "/home/yang.yang/.venv/lib/python3.13/site-packages/xarray/core/merge.py", line 362, in collect_variables_and_indexes
idx, idx_vars = create_default_index_implicit(variable)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^
File "/home/yang.yang/.venv/lib/python3.13/site-packages/xarray/core/indexes.py", line 1425, in create_default_index_implicit
index = PandasIndex.from_variables(dim_var, options={})
File "/home/yang.yang/.venv/lib/python3.13/site-packages/xarray/core/indexes.py", line 654, in from_variables
obj = cls(data, dim, coord_dtype=var.dtype)
File "/home/yang.yang/.venv/lib/python3.13/site-packages/xarray/core/indexes.py", line 589, in __init__
index = safe_cast_to_index(array)
File "/home/yang.yang/.venv/lib/python3.13/site-packages/xarray/core/indexes.py", line 469, in safe_cast_to_index
index = pd.Index(np.asarray(array), **kwargs)
~~~~~~~~~~^^^^^^^
File "/home/yang.yang/.venv/lib/python3.13/site-packages/xarray/core/indexing.py", line 514, in __array__
return np.asarray(self.get_duck_array(), dtype=dtype, copy=copy)
~~~~~~~~~~~~~~~~~~~^^
File "/home/yang.yang/.venv/lib/python3.13/site-packages/xarray/backends/common.py", line 268, in get_duck_array
return self[key] # type: ignore[index]
~~~~^^^^^
File "/home/yang.yang/.venv/lib/python3.13/site-packages/xarray/backends/zarr.py", line 226, in __getitem__
return indexing.explicit_indexing_adapter(
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
key, array.shape, indexing.IndexingSupport.VECTORIZED, method
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "/home/yang.yang/.venv/lib/python3.13/site-packages/xarray/core/indexing.py", line 1018, in explicit_indexing_adapter
result = raw_indexing_method(raw_key.tuple)
File "/home/yang.yang/.venv/lib/python3.13/site-packages/xarray/backends/zarr.py", line 216, in _getitem
return self._array[key]
~~~~~~~~~~~^^^^^
File "/home/yang.yang/.venv/lib/python3.13/site-packages/zarr/core.py", line 797, in __getitem__
result = self.get_basic_selection(pure_selection, fields=fields)
File "/home/yang.yang/.venv/lib/python3.13/site-packages/zarr/core.py", line 923, in get_basic_selection
return self._get_basic_selection_nd(selection=selection, out=out, fields=fields)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/yang.yang/.venv/lib/python3.13/site-packages/zarr/core.py", line 965, in _get_basic_selection_nd
return self._get_selection(indexer=indexer, out=out, fields=fields)
~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/yang.yang/.venv/lib/python3.13/site-packages/zarr/core.py", line 1340, in _get_selection
self._chunk_getitems(
~~~~~~~~~~~~~~~~~~~~^
lchunk_coords,
^^^^^^^^^^^^^^
...<4 lines>...
fields=fields,
^^^^^^^^^^^^^^
)
^
File "/home/yang.yang/.venv/lib/python3.13/site-packages/zarr/core.py", line 2185, in _chunk_getitems
self._process_chunk(
~~~~~~~~~~~~~~~~~~~^
out,
^^^^
...<6 lines>...
partial_read_decode=partial_read_decode,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "/home/yang.yang/.venv/lib/python3.13/site-packages/zarr/core.py", line 2098, in _process_chunk
chunk = self._decode_chunk(cdata)
File "/home/yang.yang/.venv/lib/python3.13/site-packages/zarr/core.py", line 2361, in _decode_chunk
chunk = f.decode(chunk)
File "numcodecs/vlen.pyx", line 141, in numcodecs.vlen.VLenUTF8.decode
File "/home/yang.yang/.venv/lib/python3.13/site-packages/numcodecs/compat.py", line 149, in ensure_contiguous_ndarray
ensure_contiguous_ndarray_like(buf, max_buffer_size=max_buffer_size, flatten=flatten)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/yang.yang/.venv/lib/python3.13/site-packages/numcodecs/compat.py", line 99, in ensure_contiguous_ndarray_like
raise TypeError("object arrays are not supported")
TypeError: object arrays are not supported
Anything else we need to know?
No response
Environment
INSTALLED VERSIONS
commit: None
python: 3.13.0 (main, Oct 16 2024, 03:23:02) [Clang 18.1.8 ]
python-bits: 64
OS: Linux
OS-release: 6.8.0-1017-aws
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('C', 'UTF-8')
libhdf5: None
libnetcdf: None
xarray: 2024.11.0
pandas: 2.2.3
numpy: 2.1.3
scipy: None
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
zarr: 2.18.3
cftime: None
nc_time_axis: None
iris: None
bottleneck: None
dask: 2024.12.0
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: 2024.10.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: None
pip: None
conda: None
pytest: None
mypy: None
IPython: None
sphinx: None