to_zarr with append can break existing dataset #10766

@d70-t

What happened?

When appending an incomplete dataset (one that lacks some of the variables already in the store) to an existing Zarr store, the store can end up in a broken state, because the size along the append dimension is not updated consistently across all existing variables.

What did you expect to happen?

I would have expected that xarray doesn't break my existing dataset, i.e. either:

  • xarray should raise an exception when trying to append (such that the existing dataset doesn't break) or
  • appending ds2 should appropriately fill the missing array cells of the b variable with missing values
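Until xarray does one of these itself, both behaviours can be approximated on the user side before calling `to_zarr`. The following is only a sketch: the helper names `missing_append_variables`, `check_append` and `pad_missing_variables` are hypothetical (not part of xarray), it assumes the existing store can be opened as a dataset, and the sentinel `-1` is an arbitrary choice. Whether appending the padded dataset gives the desired result on a real store has not been verified here.

```python
import numpy as np
import xarray as xr


def missing_append_variables(existing, new, append_dim):
    """Names of variables in `existing` that use `append_dim` but are absent from `new`."""
    return sorted(
        (
            name
            for name, var in existing.variables.items()
            if append_dim in var.dims and name not in new.variables
        ),
        key=str,
    )


def check_append(existing, new, append_dim):
    """Option 1: refuse to append when `new` would leave some variables short."""
    missing = missing_append_variables(existing, new, append_dim)
    if missing:
        raise ValueError(
            f"cannot append along {append_dim!r}: variables {missing} are missing"
        )


def pad_missing_variables(existing, new, append_dim, fill_value):
    """Option 2: return a copy of `new` with the missing variables filled in.

    The dtype of the existing variable is kept, so `fill_value` must be
    representable in that dtype (NaN would not fit an integer variable).
    """
    padded = new.copy()
    n_new = new.sizes[append_dim]
    for name in missing_append_variables(existing, new, append_dim):
        var = existing.variables[name]
        shape = tuple(
            n_new if dim == append_dim else existing.sizes[dim] for dim in var.dims
        )
        padded[name] = (var.dims, np.full(shape, fill_value, dtype=var.dtype))
    return padded


# Same datasets as in the example below.
ds1 = xr.Dataset({"a": (("x",), [1, 2]), "b": (("x",), [10, 20])})
ds2 = xr.Dataset({"a": (("x",), [3, 4])})

print(missing_append_variables(ds1, ds2, "x"))
padded = pad_missing_variables(ds1, ds2, "x", fill_value=-1)
print(padded["b"].values)
```

In practice `existing` would come from opening the store first, and either `check_append` or `pad_missing_variables` would run before the append.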

Minimal Complete Verifiable Example

# /// script
# requires-python = ">=3.13"
# dependencies = [
#     "xarray",
#     "zarr",
# ]
# ///

import xarray as xr
xr.show_versions()

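# In-memory mapping used as the Zarr store
m = {}
# ds1 defines both "a" and "b" along "x"; ds2 only carries "a"
ds1 = xr.Dataset({"a": (("x",), [1, 2]), "b": (("x",), [10, 20])})
ds2 = xr.Dataset({"a": (("x",), [3, 4])})
ds1.to_zarr(m)
# Appending grows "a" to length 4 along "x" but leaves "b" at length 2
ds2.to_zarr(m, append_dim="x")
# Re-opening the store raises the ValueError shown under "Relevant log output"
xr.open_dataset(m, engine="zarr")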

Steps to reproduce

uv run issue.py

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

ValueError: conflicting sizes for dimension 'x': length 2 on 'b' and length 4 on {'x': 'a'}
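The error reflects a store in which two arrays disagree about the length of a shared dimension. As an illustration only, the following plain-Python sketch reproduces that consistency check on a description of the store; `conflicting_dim_sizes` and `store_state` are hypothetical names, and the shapes are taken from the error message above rather than read from an actual store.

```python
from collections import defaultdict


def conflicting_dim_sizes(arrays):
    """Map each dimension with inconsistent lengths to {variable: length}.

    `arrays` maps a variable name to a (dims, shape) pair.
    """
    sizes = defaultdict(dict)
    for name, (dims, shape) in arrays.items():
        for dim, size in zip(dims, shape):
            sizes[dim][name] = size
    return {
        dim: by_var
        for dim, by_var in sizes.items()
        if len(set(by_var.values())) > 1
    }


# State after the append, as reported by the ValueError:
# "a" has length 4 along "x", "b" still has length 2.
store_state = {"a": (("x",), (4,)), "b": (("x",), (2,))}
print(conflicting_dim_sizes(store_state))  # {'x': {'a': 4, 'b': 2}}
```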

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.13.5 (main, Jun 11 2025, 15:36:57) [Clang 17.0.0 (clang-1700.0.13.3)]
python-bits: 64
OS: Darwin
OS-release: 24.6.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: de_DE.UTF-8
LOCALE: ('de_DE', 'UTF-8')
libhdf5: None
libnetcdf: None

xarray: 2025.9.0
pandas: 2.3.2
numpy: 2.3.3
scipy: None
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
zarr: 3.1.2
cftime: None
nc_time_axis: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: None
pip: None
conda: None
pytest: None
mypy: None
IPython: None
sphinx: None

Metadata

Labels

bug, needs triage (issue that has not been reviewed by an xarray team member)
