zarr-ZipStore problems: no filename support, duplicate metadata on ZipStore-write, can't read from ZipStore via filename #10827

@csubich

What happened?

Recent versions of xarray have several regressions when working with zarr data through a ZipStore. First, ds.to_zarr() no longer transparently creates a ZipStore-based zarr when given a path ending in .zarr.zip; it creates a directory instead:

In [1]: import xarray as xr; import zarr; import numpy as np

In [2]: xr.__version__
Out[2]: '2025.10.0'

In [3]: ds = xr.Dataset(data_vars={'foo' : np.arange(3)})

In [4]: out_store = ds.to_zarr('./in_2024_01_is_a.zarr.zip',compute=True,zarr_format=2,consolidated=True)

In [5]: out_store.close()

In [7]: !ls -aR ./in_2024_01_is_a.zarr.zip
./in_2024_01_is_a.zarr.zip:
.  ..  foo  .zattrs  .zgroup  .zmetadata

./in_2024_01_is_a.zarr.zip/foo:
.  ..  0  .zarray  .zattrs

Second, when forcing Zip output via zarr.storage.ZipStore, to_zarr writes the metadata files more than once, leading to duplicate zip entries. This happens without consolidated metadata, as shown here, and also with it:

In [1]: import xarray as xr; import zarr; import numpy as np

In [2]: ds = xr.Dataset(data_vars={'foo' : np.arange(3)})

In [3]: zipstore = zarr.storage.ZipStore('zipstore.zarr.zip',mode='w',read_only=False)

In [5]: out_store = ds.to_zarr(zipstore,compute=True,zarr_format=2,consolidated=False)
/home/csu001/data/ppp5/conda_env/nnja/lib/python3.12/zipfile/__init__.py:1611: UserWarning: Duplicate name: '.zgroup'
  return self._open_to_write(zinfo, force_zip64=force_zip64)
/home/csu001/data/ppp5/conda_env/nnja/lib/python3.12/zipfile/__init__.py:1611: UserWarning: Duplicate name: '.zattrs'
  return self._open_to_write(zinfo, force_zip64=force_zip64)
/home/csu001/data/ppp5/conda_env/nnja/lib/python3.12/zipfile/__init__.py:1611: UserWarning: Duplicate name: 'foo/.zarray'
  return self._open_to_write(zinfo, force_zip64=force_zip64)
/home/csu001/data/ppp5/conda_env/nnja/lib/python3.12/zipfile/__init__.py:1611: UserWarning: Duplicate name: 'foo/.zattrs'
  return self._open_to_write(zinfo, force_zip64=force_zip64)
In [6]: out_store.close()

In [8]: zipstore.close()

In [9]: !unzip -l zipstore.zarr.zip
Archive:  zipstore.zarr.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
       22  10-07-2025 11:28   .zgroup
        2  10-07-2025 11:28   .zattrs
       22  10-07-2025 11:28   .zgroup
        2  10-07-2025 11:28   .zattrs
      292  10-07-2025 11:28   foo/.zarray
        2  10-07-2025 11:28   foo/.zattrs
      292  10-07-2025 11:28   foo/.zarray
       42  10-07-2025 11:28   foo/.zattrs
       40  10-07-2025 11:28   foo/0
---------                     -------
      716                     9 files
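The duplicates can also be checked programmatically rather than by reading the `unzip -l` listing. The following is a minimal sketch using only the standard library; the helper name `duplicate_zip_entries` is made up for illustration and is not part of xarray or zarr.

```python
import collections
import zipfile


def duplicate_zip_entries(path):
    """Return {entry_name: count} for names stored more than once in a zip.

    `path` may be a filename or a binary file-like object. Hypothetical
    helper for illustration only.
    """
    with zipfile.ZipFile(path) as zf:
        # namelist() has one item per archive entry, so a name that was
        # written twice appears twice.
        counts = collections.Counter(zf.namelist())
    return {name: n for name, n in counts.items() if n > 1}
```

For the archive listed above, this should report .zgroup, .zattrs, foo/.zarray and foo/.zattrs with a count of 2 each.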

Finally, once created, a ZipStore-based zarr can no longer be read transparently by path with load_dataset or open_zarr; it has to be opened through an intermediate ZipStore:

In [1]: import xarray as xr; import zarr; import numpy as np

In [2]: ds = xr.load_dataset('./zipstore.zarr.zip',engine='zarr',zarr_format=2) # similar error with open_zarr
---------------------------------------------------------------------------
FileExistsError                           Traceback (most recent call last)
Cell In[2], line 1
----> 1 ds = xr.load_dataset('./zipstore.zarr.zip',engine='zarr',zarr_format=2)
[...]
FileExistsError: [Errno 17] File exists: '/fs/site6/eccc/mrd/rpnatm/csu001/ppp6/nnja/zipstore.zarr.zip'

All of this apparently worked in 2024.0.1, and I suspect the regressions were introduced around the time Zarr 3 was adopted.

What did you expect to happen?

No response

Minimal Complete Verifiable Example

# /// script
# requires-python = ">=3.11"
# dependencies = [
#   "xarray[complete]@git+https://github.com/pydata/xarray.git@main",
# ]
# ///
#
# This script automatically imports the development branch of xarray to check for issues.
# Please delete this header if you have _not_ tested this script with `uv run`!

import xarray as xr
xr.show_versions()
# your reproducer code ...

import numpy as np
import zarr
import os
import warnings

ds = xr.Dataset(data_vars={'foo': (('dim1',), np.arange(10))})

# Reproduction 1: a .zarr.zip filename creates a directory instead of a zip archive

out_store = ds.to_zarr('test.zarr.zip', zarr_format=2)
out_store.close()
assert not os.path.isdir('./test.zarr.zip')  # reported to fail on 2025.10.0

# Reproduction 2: duplicate metadata entries when writing with ZipStore

zipstore = zarr.storage.ZipStore('zipstore.zarr.zip', mode='w', read_only=False)
with warnings.catch_warnings():
    # Turn zipfile's "Duplicate name" UserWarning into an error so the script fails loudly
    warnings.simplefilter(action='error', category=UserWarning)
    out_store = ds.to_zarr(zipstore, zarr_format=2, consolidated=False)
    out_store.close()
    zipstore.close()

Steps to reproduce

No response

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS

commit: None
python: 3.12.11 | packaged by conda-forge | (main, Jun 4 2025, 14:45:31) [GCC 13.3.0]
python-bits: 64
OS: Linux
OS-release: 4.18.0-240.el8.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None

xarray: 2025.10.0
pandas: 2.3.2
numpy: 2.2.6
scipy: 1.16.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
zarr: 3.1.2
cftime: None
nc_time_axis: None
iris: None
bottleneck: None
dask: 2025.7.0
distributed: None
matplotlib: 3.10.6
cartopy: None
seaborn: None
numbagg: None
fsspec: 2025.9.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 80.9.0
pip: 25.2
conda: None
pytest: None
mypy: None
IPython: 9.5.0
sphinx: None

Labels: bug, needs triage (issue that has not been reviewed by an xarray team member)
