Cannot round-trip DataArray with string coordinates using numcodecs.Categorize filter #9863

@y4n9squared

What happened?

Writing an array with string coordinate values using the numcodecs.Categorize filter succeeds, but the resulting Zarr group cannot be read back into a DataArray.

What did you expect to happen?

To get back the same array that I wrote.

Minimal Complete Verifiable Example

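import numcodecs
import numpy as np
import xarray as xr

# A DataArray whose only coordinate holds object-dtype strings.
da = xr.DataArray(coords={"x": ("x", np.array(["a", "b"], dtype=object))}, dims=("x",))
# Filter intended to store the coordinate as categorical integer codes.
codec = numcodecs.Categorize(labels=["a", "b"], dtype=object)
encoding = {
    "x": {
        "filters": [codec],
    },
}

da.to_zarr("/tmp/foo.zarr", mode="w", encoding=encoding)  # succeeds
da = xr.open_dataarray("/tmp/foo.zarr")  # raises TypeError: object arrays are not supported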

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

Traceback (most recent call last):
  File "/home/yang.yang/foo.py", line 58, in <module>
    da = xr.open_dataarray("/tmp/foo.zarr")
  File "/home/yang.yang/.venv/lib/python3.13/site-packages/xarray/backends/api.py", line 851, in open_dataarray
    dataset = open_dataset(
        filename_or_obj,
    ...<15 lines>...
        **kwargs,
    )
  File "/home/yang.yang/.venv/lib/python3.13/site-packages/xarray/backends/api.py", line 670, in open_dataset
    backend_ds = backend.open_dataset(
        filename_or_obj,
    ...<2 lines>...
        **kwargs,
    )
  File "/home/yang.yang/.venv/lib/python3.13/site-packages/xarray/backends/zarr.py", line 1524, in open_dataset
    ds = store_entrypoint.open_dataset(
        store,
    ...<6 lines>...
        decode_timedelta=decode_timedelta,
    )
  File "/home/yang.yang/.venv/lib/python3.13/site-packages/xarray/backends/store.py", line 59, in open_dataset
    ds = Dataset(vars, attrs=attrs)
  File "/home/yang.yang/.venv/lib/python3.13/site-packages/xarray/core/dataset.py", line 746, in __init__
    variables, coord_names, dims, indexes, _ = merge_data_and_coords(
                                               ~~~~~~~~~~~~~~~~~~~~~^
        data_vars, coords
        ^^^^^^^^^^^^^^^^^
    )
    ^
  File "/home/yang.yang/.venv/lib/python3.13/site-packages/xarray/core/dataset.py", line 459, in merge_data_and_coords
    return merge_core(
        [data_vars, coords],
    ...<5 lines>...
        skip_align_args=[1],
    )
  File "/home/yang.yang/.venv/lib/python3.13/site-packages/xarray/core/merge.py", line 699, in merge_core
    collected = collect_variables_and_indexes(aligned, indexes=indexes)
  File "/home/yang.yang/.venv/lib/python3.13/site-packages/xarray/core/merge.py", line 362, in collect_variables_and_indexes
    idx, idx_vars = create_default_index_implicit(variable)
                    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^
  File "/home/yang.yang/.venv/lib/python3.13/site-packages/xarray/core/indexes.py", line 1425, in create_default_index_implicit
    index = PandasIndex.from_variables(dim_var, options={})
  File "/home/yang.yang/.venv/lib/python3.13/site-packages/xarray/core/indexes.py", line 654, in from_variables
    obj = cls(data, dim, coord_dtype=var.dtype)
  File "/home/yang.yang/.venv/lib/python3.13/site-packages/xarray/core/indexes.py", line 589, in __init__
    index = safe_cast_to_index(array)
  File "/home/yang.yang/.venv/lib/python3.13/site-packages/xarray/core/indexes.py", line 469, in safe_cast_to_index
    index = pd.Index(np.asarray(array), **kwargs)
                     ~~~~~~~~~~^^^^^^^
  File "/home/yang.yang/.venv/lib/python3.13/site-packages/xarray/core/indexing.py", line 514, in __array__
    return np.asarray(self.get_duck_array(), dtype=dtype, copy=copy)
                      ~~~~~~~~~~~~~~~~~~~^^
  File "/home/yang.yang/.venv/lib/python3.13/site-packages/xarray/backends/common.py", line 268, in get_duck_array
    return self[key]  # type: ignore[index]
           ~~~~^^^^^
  File "/home/yang.yang/.venv/lib/python3.13/site-packages/xarray/backends/zarr.py", line 226, in __getitem__
    return indexing.explicit_indexing_adapter(
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        key, array.shape, indexing.IndexingSupport.VECTORIZED, method
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/home/yang.yang/.venv/lib/python3.13/site-packages/xarray/core/indexing.py", line 1018, in explicit_indexing_adapter
    result = raw_indexing_method(raw_key.tuple)
  File "/home/yang.yang/.venv/lib/python3.13/site-packages/xarray/backends/zarr.py", line 216, in _getitem
    return self._array[key]
           ~~~~~~~~~~~^^^^^
  File "/home/yang.yang/.venv/lib/python3.13/site-packages/zarr/core.py", line 797, in __getitem__
    result = self.get_basic_selection(pure_selection, fields=fields)
  File "/home/yang.yang/.venv/lib/python3.13/site-packages/zarr/core.py", line 923, in get_basic_selection
    return self._get_basic_selection_nd(selection=selection, out=out, fields=fields)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yang.yang/.venv/lib/python3.13/site-packages/zarr/core.py", line 965, in _get_basic_selection_nd
    return self._get_selection(indexer=indexer, out=out, fields=fields)
           ~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yang.yang/.venv/lib/python3.13/site-packages/zarr/core.py", line 1340, in _get_selection
    self._chunk_getitems(
    ~~~~~~~~~~~~~~~~~~~~^
        lchunk_coords,
        ^^^^^^^^^^^^^^
    ...<4 lines>...
        fields=fields,
        ^^^^^^^^^^^^^^
    )
    ^
  File "/home/yang.yang/.venv/lib/python3.13/site-packages/zarr/core.py", line 2185, in _chunk_getitems
    self._process_chunk(
    ~~~~~~~~~~~~~~~~~~~^
        out,
        ^^^^
    ...<6 lines>...
        partial_read_decode=partial_read_decode,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/home/yang.yang/.venv/lib/python3.13/site-packages/zarr/core.py", line 2098, in _process_chunk
    chunk = self._decode_chunk(cdata)
  File "/home/yang.yang/.venv/lib/python3.13/site-packages/zarr/core.py", line 2361, in _decode_chunk
    chunk = f.decode(chunk)
  File "numcodecs/vlen.pyx", line 141, in numcodecs.vlen.VLenUTF8.decode
  File "/home/yang.yang/.venv/lib/python3.13/site-packages/numcodecs/compat.py", line 149, in ensure_contiguous_ndarray
    ensure_contiguous_ndarray_like(buf, max_buffer_size=max_buffer_size, flatten=flatten)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yang.yang/.venv/lib/python3.13/site-packages/numcodecs/compat.py", line 99, in ensure_contiguous_ndarray_like
    raise TypeError("object arrays are not supported")
TypeError: object arrays are not supported

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS

commit: None
python: 3.13.0 (main, Oct 16 2024, 03:23:02) [Clang 18.1.8 ]
python-bits: 64
OS: Linux
OS-release: 6.8.0-1017-aws
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('C', 'UTF-8')
libhdf5: None
libnetcdf: None

xarray: 2024.11.0
pandas: 2.2.3
numpy: 2.1.3
scipy: None
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
zarr: 2.18.3
cftime: None
nc_time_axis: None
iris: None
bottleneck: None
dask: 2024.12.0
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: 2024.10.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: None
pip: None
conda: None
pytest: None
mypy: None
IPython: None
sphinx: None

Labels: bug, needs triage
