Description
What happened?
Recent versions of xarray have several regressions when working with Zarr stores backed by a ZipStore. First, ds.to_zarr() no longer transparently creates a ZipStore-based zarr when given a .zip path; it writes a directory instead:
In [1]: import xarray as xr; import zarr; import numpy as np
In [2]: xr.__version__
Out[2]: '2025.10.0'
In [3]: ds = xr.Dataset(data_vars={'foo' : np.arange(3)})
In [4]: out_store = ds.to_zarr('./in_2024_01_is_a.zarr.zip',compute=True,zarr_format=2,consolidated=True)
In [5]: out_store.close()
In [7]: !ls -aR ./in_2024_01_is_a.zarr.zip
./in_2024_01_is_a.zarr.zip:
. .. foo .zattrs .zgroup .zmetadata
./in_2024_01_is_a.zarr.zip/foo:
. .. 0 .zarray .zattrs
Second, when forcing Zip output via zarr.storage.ZipStore, to_zarr writes the metadata files more than once, producing duplicate zip entries even without consolidated metadata (the problem also occurs with consolidated metadata):
In [1]: import xarray as xr; import zarr; import numpy as np
In [2]: ds = xr.Dataset(data_vars={'foo' : np.arange(3)})
In [3]: zipstore = zarr.storage.ZipStore('zipstore.zarr.zip',mode='w',read_only=False)
In [5]: out_store = ds.to_zarr(zipstore,compute=True,zarr_format=2,consolidated=False)
/home/csu001/data/ppp5/conda_env/nnja/lib/python3.12/zipfile/__init__.py:1611: UserWarning: Duplicate name: '.zgroup'
return self._open_to_write(zinfo, force_zip64=force_zip64)
/home/csu001/data/ppp5/conda_env/nnja/lib/python3.12/zipfile/__init__.py:1611: UserWarning: Duplicate name: '.zattrs'
return self._open_to_write(zinfo, force_zip64=force_zip64)
/home/csu001/data/ppp5/conda_env/nnja/lib/python3.12/zipfile/__init__.py:1611: UserWarning: Duplicate name: 'foo/.zarray'
return self._open_to_write(zinfo, force_zip64=force_zip64)
/home/csu001/data/ppp5/conda_env/nnja/lib/python3.12/zipfile/__init__.py:1611: UserWarning: Duplicate name: 'foo/.zattrs'
return self._open_to_write(zinfo, force_zip64=force_zip64)
In [6]: out_store.close()
In [8]: zipstore.close()
In [9]: !unzip -l zipstore.zarr.zip
Archive: zipstore.zarr.zip
Length Date Time Name
--------- ---------- ----- ----
22 10-07-2025 11:28 .zgroup
2 10-07-2025 11:28 .zattrs
22 10-07-2025 11:28 .zgroup
2 10-07-2025 11:28 .zattrs
292 10-07-2025 11:28 foo/.zarray
2 10-07-2025 11:28 foo/.zattrs
292 10-07-2025 11:28 foo/.zarray
42 10-07-2025 11:28 foo/.zattrs
40 10-07-2025 11:28 foo/0
--------- -------
716 9 files
Finally, once created, a ZipStore-based zarr can no longer be read transparently by path with load_dataset or open_zarr; it has to be opened through an intermediate ZipStore:
In [1]: import xarray as xr; import zarr; import numpy as np
In [2]: ds = xr.load_dataset('./zipstore.zarr.zip',engine='zarr',zarr_format=2) # similar error with open_zarr
---------------------------------------------------------------------------
FileExistsError Traceback (most recent call last)
Cell In[2], line 1
----> 1 ds = xr.load_dataset('./zipstore.zarr.zip',engine='zarr',zarr_format=2)
[...]
FileExistsError: [Errno 17] File exists: '/fs/site6/eccc/mrd/rpnatm/csu001/ppp6/nnja/zipstore.zarr.zip'
All of this apparently worked in xarray 2024.01, and I suspect the regressions were introduced around the time Zarr 3 support was adopted.
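Until opening by path works again, the manual ZipStore route can be wrapped in a small helper. This is only a sketch: classify_zarr_path and open_zarr_by_path are hypothetical names, not xarray or zarr APIs. It assumes zarr 3's zarr.storage.ZipStore(path, mode="r") and that xarray.open_zarr accepts a store object, which is the intermediate-ZipStore workaround described above.

```python
import os
import zipfile


def classify_zarr_path(path):
    """Classify a local path as 'directory', 'zip', 'missing' or 'other'.

    A zip archive is detected from its contents with zipfile.is_zipfile,
    not from the .zip suffix.
    """
    if os.path.isdir(path):
        return "directory"
    if not os.path.exists(path):
        return "missing"
    if zipfile.is_zipfile(path):
        return "zip"
    return "other"


def open_zarr_by_path(path, **kwargs):
    """Open a directory- or zip-backed Zarr store by path (hypothetical wrapper).

    Zip archives are wrapped in a read-only zarr.storage.ZipStore before being
    passed to xarray.open_zarr. The ZipStore is deliberately left open because
    the returned Dataset reads from it lazily.
    """
    import xarray as xr  # lazy imports: classify_zarr_path needs only the stdlib
    import zarr

    if classify_zarr_path(path) == "zip":
        store = zarr.storage.ZipStore(path, mode="r")
        return xr.open_zarr(store, **kwargs)
    return xr.open_zarr(path, **kwargs)
```

For example, open_zarr_by_path('zipstore.zarr.zip', zarr_format=2, consolidated=False) would take the ZipStore branch for the archive written above, while a directory path is passed straight through to open_zarr.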
What did you expect to happen?
No response
Minimal Complete Verifiable Example
# /// script
# requires-python = ">=3.11"
# dependencies = [
# "xarray[complete]@git+https://github.com/pydata/xarray.git@main",
# ]
# ///
#
# This script automatically imports the development branch of xarray to check for issues.
# Please delete this header if you have _not_ tested this script with `uv run`!
import xarray as xr
xr.show_versions()
# your reproducer code ...
import numpy as np
import zarr
import os
import warnings
ds = xr.Dataset(data_vars = {'foo' : (('dim1',), np.arange(10))})
# Reproduction 1: .zarr.zip filename creates a directory
out_store = ds.to_zarr('test.zarr.zip',zarr_format=2)
out_store.close()
assert(not os.path.isdir('./test.zarr.zip'))
# Reproduction 2: duplicate metadata entries when writing with ZipStore
zipstore = zarr.storage.ZipStore('zipstore.zarr.zip',mode='w',read_only=False)
with warnings.catch_warnings():
    warnings.simplefilter(action='error',category=UserWarning)
    out_store = ds.to_zarr(zipstore,zarr_format=2,consolidated=False)
out_store.close()
zipstore.close()
Steps to reproduce
No response
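To confirm the duplicate-entry problem without relying on zipfile's UserWarning, the finished archive can be inspected directly. This is a stdlib-only sketch; duplicate_entries and dedupe_keep_last are hypothetical helpers. dedupe_keep_last assumes the last copy of each duplicated member is the final one, which matches the unzip -l listing above (the second foo/.zattrs is 42 bytes, the first only 2), but that is not guaranteed in general.

```python
import collections
import zipfile


def duplicate_entries(zip_path):
    """Return {member name: count} for every name stored more than once."""
    with zipfile.ZipFile(zip_path) as zf:
        counts = collections.Counter(info.filename for info in zf.infolist())
    return {name: n for name, n in counts.items() if n > 1}


def dedupe_keep_last(src_path, dst_path):
    """Copy src_path to dst_path, keeping only the last entry for each name.

    Assumption: a later duplicate supersedes earlier ones. Each kept entry
    stays at the position where its name first appeared in the archive.
    """
    with zipfile.ZipFile(src_path) as src:
        last = {}
        for info in src.infolist():  # infolist() includes every duplicate, in archive order
            last[info.filename] = info
        with zipfile.ZipFile(dst_path, "w") as dst:
            for info in last.values():
                # read() with a ZipInfo fetches that specific entry, not just the name
                dst.writestr(info, src.read(info))
```

Running duplicate_entries('zipstore.zarr.zip') after Reproduction 2 should report the four metadata members listed twice by unzip -l.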
MVCE confirmation
- Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- Complete example — the example is self-contained, including all data and the text of any traceback.
- Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
- New issue — a search of GitHub Issues suggests this is not a duplicate.
- Recent environment — the issue occurs with the latest version of xarray and its dependencies.
Relevant log output
Anything else we need to know?
No response
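A possible stopgap for the first regression is to write to an ordinary directory store and pack it into a zip afterwards. pack_directory_store is a hypothetical helper, not part of xarray or zarr. It assumes a reader such as zarr.storage.ZipStore expects store keys at the archive root with forward-slash separators, and zip_path must lie outside src_dir so the archive does not try to include itself.

```python
import os
import zipfile


def pack_directory_store(src_dir, zip_path):
    """Copy a directory-based Zarr store into a zip archive (hypothetical helper).

    Member names are relative to src_dir and use '/' separators, so keys such
    as '.zgroup' and 'foo/0' sit at the archive root. ZIP_STORED (no extra
    compression) is used on the assumption that chunks are already compressed.
    """
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_STORED) as zf:
        for root, dirs, files in os.walk(src_dir):
            dirs.sort()  # deterministic traversal order
            for name in sorted(files):
                full = os.path.join(root, name)
                arcname = os.path.relpath(full, src_dir).replace(os.sep, "/")
                zf.write(full, arcname)
```

For example, ds.to_zarr('tmp.zarr', zarr_format=2, consolidated=True) followed by pack_directory_store('tmp.zarr', 'tmp.zarr.zip') yields an archive with exactly one entry per key, which could then be opened through zarr.storage.ZipStore. I have not checked that the result is byte-identical to what zarr 2's ZipStore used to produce.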
Environment
INSTALLED VERSIONS
commit: None
python: 3.12.11 | packaged by conda-forge | (main, Jun 4 2025, 14:45:31) [GCC 13.3.0]
python-bits: 64
OS: Linux
OS-release: 4.18.0-240.el8.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None
xarray: 2025.10.0
pandas: 2.3.2
numpy: 2.2.6
scipy: 1.16.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
zarr: 3.1.2
cftime: None
nc_time_axis: None
iris: None
bottleneck: None
dask: 2025.7.0
distributed: None
matplotlib: 3.10.6
cartopy: None
seaborn: None
numbagg: None
fsspec: 2025.9.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 80.9.0
pip: 25.2
conda: None
pytest: None
mypy: None
IPython: 9.5.0
sphinx: None