What happened?
If a dataset is missing an index, it can use the index from further up the datatree.
If a dataset and its ancestors are missing an index, then an index will be created from the dataset's own coordinates, but not from its ancestors' coordinates.
Together, these create an inconsistency. It would be more logical that, if an index cannot be found, a coordinate is searched for using the same logic.
As it is, you can get different behaviour for create_default_indexes=True/False, when I feel the intention was that False just disables eager loading.
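To make the proposed lookup order concrete, here is a minimal pure-Python sketch. It is a toy model, not xarray's implementation: `Node`, `ancestors` and `find_index` are hypothetical names invented for illustration, and an "index" is modelled as a plain label-to-position dict.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy stand-in for a tree node; not an xarray class."""

    coords: dict[str, list[int]] = field(default_factory=dict)
    indexes: dict[str, dict[int, int]] = field(default_factory=dict)
    parent: Node | None = None


def ancestors(node):
    """Yield the node itself, then each ancestor up to the root."""
    while node is not None:
        yield node
        node = node.parent


def find_index(node, dim):
    """Look for an existing index first, then for a coordinate, both up the tree."""
    for n in ancestors(node):
        if dim in n.indexes:
            return n.indexes[dim]
    for n in ancestors(node):
        if dim in n.coords:
            # Build the label -> position mapping lazily and cache it on the owning node.
            n.indexes[dim] = {label: pos for pos, label in enumerate(n.coords[dim])}
            return n.indexes[dim]
    raise KeyError(dim)


root = Node(coords={"x": [10, 20]})  # coordinate stored, but no index built yet
child = Node(parent=root)  # no coordinate and no index of its own
print(find_index(child, "x")[10])
```

In this sketch the child resolves `x` through the root's coordinate even though no index was created eagerly, which is the behaviour I would expect from create_default_indexes=False.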
What did you expect to happen?
I would expect open_datatree(create_default_indexes=False) and open_datatree(create_default_indexes=True) to give similar results, differing only in when an index is created and whether it is persisted.
Minimal Complete Verifiable Example
# /// script
# requires-python = ">=3.11"
# dependencies = [
# "xarray[complete]@git+https://github.com/pydata/xarray.git@main",
# ]
# ///
#
# This script automatically imports the development branch of xarray to check for issues.
# Please delete this header if you have _not_ tested this script with `uv run`!
import xarray as xr
xr.show_versions()

# The child has a data variable on "x" but no "x" coordinate of its own.
child = xr.Dataset(
    data_vars={
        "a": (["x"], [0, 1]),
    },
)
# The root holds the "x" coordinate that the child should inherit.
dt = xr.DataTree(
    dataset=xr.Dataset(
        data_vars={
            "b": (["x"], [5, 6]),
        },
        coords={
            "x": [10, 20],
        },
    ),
    children={"child": xr.DataTree(dataset=child)},
)
dt.to_zarr("test.zarr", mode="w")

dt2 = xr.open_datatree("test.zarr", create_default_indexes=True)
dt2.b.sel(x=10).values  # 5
dt2.child.a.sel(x=10).values  # 0

dt2 = xr.open_datatree("test.zarr", create_default_indexes=False)
dt2.b.sel(x=10).values  # 5
dt2.child.a.sel(x=10).values  # BoundsCheckError

dt2.ds = dt2.ds.assign_coords(xr.Coordinates({"x": [10, 20]}))
dt2.child.a.sel(x=10).values  # 0
Steps to reproduce
No response
MVCE confirmation
Relevant log output
Traceback (most recent call last):
File "/root/test/test_script.py", line 48, in <module>
dt2.child.a.sel(x=10).values # BoundsCheckError
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/.uv/cache/environments-v2/test-script-6aac3b5ed3a4de9a/lib/python3.12/site-packages/xarray/core/dataarray.py", line 803, in values
return self.variable.values
^^^^^^^^^^^^^^^^^^^^
File "/.uv/cache/environments-v2/test-script-6aac3b5ed3a4de9a/lib/python3.12/site-packages/xarray/core/variable.py", line 555, in values
return _as_array_or_item(self._data)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/.uv/cache/environments-v2/test-script-6aac3b5ed3a4de9a/lib/python3.12/site-packages/xarray/core/variable.py", line 335, in _as_array_or_item
data = np.asarray(data)
^^^^^^^^^^^^^^^^
File "/.uv/cache/environments-v2/test-script-6aac3b5ed3a4de9a/lib/python3.12/site-packages/xarray/core/indexing.py", line 604, in __array__
return np.asarray(self.get_duck_array(), dtype=dtype, copy=copy)
^^^^^^^^^^^^^^^^^^^^^
File "/.uv/cache/environments-v2/test-script-6aac3b5ed3a4de9a/lib/python3.12/site-packages/xarray/core/indexing.py", line 970, in get_duck_array
duck_array = self.array.get_duck_array()
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/.uv/cache/environments-v2/test-script-6aac3b5ed3a4de9a/lib/python3.12/site-packages/xarray/core/indexing.py", line 924, in get_duck_array
return self.array.get_duck_array()
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/.uv/cache/environments-v2/test-script-6aac3b5ed3a4de9a/lib/python3.12/site-packages/xarray/core/indexing.py", line 764, in get_duck_array
array = self.array[self.key]
~~~~~~~~~~^^^^^^^^^^
File "/.uv/cache/environments-v2/test-script-6aac3b5ed3a4de9a/lib/python3.12/site-packages/xarray/backends/zarr.py", line 316, in __getitem__
return indexing.explicit_indexing_adapter(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/.uv/cache/environments-v2/test-script-6aac3b5ed3a4de9a/lib/python3.12/site-packages/xarray/core/indexing.py", line 1156, in explicit_indexing_adapter
result = raw_indexing_method(raw_key.tuple)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/.uv/cache/environments-v2/test-script-6aac3b5ed3a4de9a/lib/python3.12/site-packages/xarray/backends/zarr.py", line 279, in _getitem
return self._array[key]
~~~~~~~~~~~^^^^^
File "/.uv/cache/environments-v2/test-script-6aac3b5ed3a4de9a/lib/python3.12/site-packages/zarr/core/array.py", line 2832, in __getitem__
return self.get_orthogonal_selection(pure_selection, fields=fields)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/.uv/cache/environments-v2/test-script-6aac3b5ed3a4de9a/lib/python3.12/site-packages/zarr/core/array.py", line 3302, in get_orthogonal_selection
indexer = OrthogonalIndexer(selection, self.shape, self.metadata.chunk_grid)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/.uv/cache/environments-v2/test-script-6aac3b5ed3a4de9a/lib/python3.12/site-packages/zarr/core/indexing.py", line 920, in __init__
dim_indexer = IntDimIndexer(dim_sel, dim_len, dim_chunk_len)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/.uv/cache/environments-v2/test-script-6aac3b5ed3a4de9a/lib/python3.12/site-packages/zarr/core/indexing.py", line 387, in __init__
object.__setattr__(self, "dim_sel", normalize_integer_selection(dim_sel, dim_len))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/.uv/cache/environments-v2/test-script-6aac3b5ed3a4de9a/lib/python3.12/site-packages/zarr/core/indexing.py", line 355, in normalize_integer_selection
raise BoundsCheckError(msg)
zarr.errors.BoundsCheckError: index out of bounds for dimension with length 2
Anything else we need to know?
No response
Environment
Details
INSTALLED VERSIONS
------------------
commit: None
python: 3.11.11 (main, Mar 17 2025, 21:02:09) [Clang 20.1.0 ]
python-bits: 64
OS: Linux
OS-release: 5.15.0-139-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.6
libnetcdf: 4.9.3
xarray: 2026.4.0
pandas: 3.0.2
numpy: 2.4.4
scipy: 1.17.1
netCDF4: 1.7.4
pydap: None
h5netcdf: None
h5py: None
zarr: 3.1.6
cftime: 1.6.5
nc_time_axis: None
iris: None
bottleneck: None
dask: 2026.3.0
distributed: 2026.3.0
matplotlib: 3.10.9
cartopy: None
seaborn: 0.13.2
numbagg: None
fsspec: 2025.3.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 82.0.1
pip: 26.1
conda: None
pytest: 7.4.3
mypy: None
IPython: 9.10.1
sphinx: 7.3.7