
Dataset.to_zarr append_dim does not overwrite existing entries #10820

@rbuehler

Description


What happened?

I have several datasets that are continuously updated in time. Appending them to the zarr store works well with Dataset.to_zarr(append_dim="time", mode="a"). However, I often re-analyse time bins that are already stored, and I would like to overwrite the data for those re-analysed times. This does not work for me: the data are always appended, even when the time entry already exists.

What did you expect to happen?

I would have expected the existing entry for that time to be overwritten. The documentation for mode says: "a" means "override all existing variables including dimension coordinates (create if does not exist)".
Perhaps this is intentional, to avoid accidental overwrites, but it seems at odds with the documentation. I also could not find another way to implement my use case.

Minimal Complete Verifiable Example

import xarray as xr

# Create dummy datasets
ds = xr.Dataset()
ds["test"] = xr.DataArray(data=[0, 1], dims=["x"], coords={"x": [1, 2]})
ds = ds.expand_dims(time=[1])
ds2 = ds.copy()

# Write to zarr
test_file = "./test.zarr"
ds.to_zarr(test_file, mode="w")  # mode="w" replaces any store left over from a previous run
ds2.to_zarr(test_file, append_dim="time", mode="a")

# Read back and print: time now has length 2, with the label 1 appearing twice
ds3 = xr.open_zarr(test_file)
print(ds3)

Steps to reproduce

No response

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS

commit: None
python: 3.12.11 | packaged by conda-forge | (main, Jun 4 2025, 14:45:31) [GCC 13.3.0]
python-bits: 64
OS: Linux
OS-release: 6.8.0-85-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: fr_FR.UTF-8
LOCALE: ('fr_FR', 'UTF-8')
libhdf5: 1.14.2
libnetcdf: 4.9.4-development

xarray: 2025.9.1
pandas: 2.3.2
numpy: 2.0.1
scipy: 1.16.1
netCDF4: 1.7.2
pydap: 3.5.6
h5netcdf: 1.6.4
h5py: 3.14.0
zarr: 3.1.2
cftime: 1.6.4.post1
nc_time_axis: 1.4.1
iris: None
bottleneck: 1.6.0
dask: 2025.9.1
distributed: 2025.9.1
matplotlib: 3.9.2
cartopy: 0.25.0
seaborn: 0.13.2
numbagg: 0.9.2
fsspec: 2025.9.0
cupy: None
pint: None
sparse: 0.17.0
flox: 0.10.6
numpy_groupies: 0.11.3
setuptools: 78.1.1
pip: 25.2
conda: None
pytest: None
mypy: None
IPython: 9.5.0
sphinx: None

Metadata


Assignees

No one assigned

    Labels

    bug, needs triage (Issue that has not been reviewed by xarray team member)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
