to_zarr() ignore chunks with zarr-python3 #10117

@amano-takahisa

What happened?

Saving a Dataset or DataArray with .to_zarr() does not write any chunk files with zarr-python>=3 when the array is initialized with np.zeros(). With np.empty() the result varies from run to run.

To reproduce, run the following.

#!/usr/bin/env python
import dask.array as da  # not used directly; dask must be installed for .chunk()
import numpy as np
import xarray as xr

# Create a chunked DataArray and convert to Dataset
empty_ds = (
    xr.DataArray(
        np.zeros((2, 1000, 1000), dtype=np.float32),               # not chunked in file tree
        # np.ones((2, 1000, 1000), dtype=np.float32),                # chunked in file tree
        # np.random.random(2 * 1000 * 1000).reshape(2, 1000, 1000),  # chunked in file tree
        # np.empty((2, 1000, 1000), dtype=np.float32),               # Depends on the state of memory at runtime
        coords=[[100, 200], np.arange(1000), np.arange(1000)],
        dims=['sensor', 'x', 'y'],
    )
    .chunk({'sensor': 1, 'x': 200, 'y': 200})
    .to_dataset(name='value')
)

# Write to zarr
empty_ds.to_zarr(
    "test.zarr",
    mode='w',
)

The resulting Zarr store has the following structure. Note that value/ contains only zarr.json and no c/ chunk directory.

$ eza --tree --classify=auto -h --long test.zarr/
Permissions Size User     Date Modified Name
drwxr-xr-x     - takahisa 21 Mar 12:28  test.zarr/
drwxr-xr-x     - takahisa 21 Mar 12:28  ├── sensor/
drwxr-xr-x     - takahisa 21 Mar 12:28  │   ├── c/
.rw-r--r--    25 takahisa 21 Mar 12:28  │   │   └── 0
.rw-r--r--   654 takahisa 21 Mar 12:28  │   └── zarr.json
drwxr-xr-x     - takahisa 21 Mar 12:28  ├── value/
.rw-r--r--   758 takahisa 21 Mar 12:28  │   └── zarr.json
drwxr-xr-x     - takahisa 21 Mar 12:28  ├── x/
drwxr-xr-x     - takahisa 21 Mar 12:28  │   ├── c/
.rw-r--r--  1.6k takahisa 21 Mar 12:28  │   │   └── 0
.rw-r--r--   655 takahisa 21 Mar 12:28  │   └── zarr.json
drwxr-xr-x     - takahisa 21 Mar 12:28  ├── y/
drwxr-xr-x     - takahisa 21 Mar 12:28  │   ├── c/
.rw-r--r--  1.6k takahisa 21 Mar 12:28  │   │   └── 0
.rw-r--r--   655 takahisa 21 Mar 12:28  │   └── zarr.json
.rw-r--r--  4.0k takahisa 21 Mar 12:28  └── zarr.json
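
The listing above can also be checked programmatically. The following is a minimal sketch, not part of the original report, that counts the chunk files actually present under a variable's c/ directory and compares that with the number implied by the array shape and chunk sizes. The helper names expected_chunk_count and stored_chunk_count are hypothetical, and the c/ layout is assumed to match the directory listings in this issue.

```python
import math
from pathlib import Path


def expected_chunk_count(shape, chunks):
    """Number of chunks a regular grid needs to cover `shape`."""
    return math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))


def stored_chunk_count(store, variable):
    """Count chunk files under <store>/<variable>/c (0 if the directory is missing)."""
    chunk_dir = Path(store) / variable / "c"
    if not chunk_dir.is_dir():
        return 0
    return sum(1 for p in chunk_dir.rglob("*") if p.is_file())


if __name__ == "__main__":
    # Shape and chunk sizes taken from the reproduction script.
    expected = expected_chunk_count((2, 1000, 1000), (1, 200, 200))
    found = stored_chunk_count("test.zarr", "value")
    print(f"expected {expected} chunk files, found {found}")
```

For the shape and chunking in the reproduction script this gives 2 × 5 × 5 = 50 expected chunk files, whereas the listing above contains none under value/.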

What did you expect to happen?

With np.ones((2, 1000, 1000), dtype=np.float32), the store written with zarr-python 3 looks as follows.
I expected the same file structure from np.zeros() and np.empty().

Permissions Size User     Date Modified Name
drwxr-xr-x     - takahisa 21 Mar 12:30  test.zarr/
drwxr-xr-x     - takahisa 21 Mar 12:30  ├── sensor/
drwxr-xr-x     - takahisa 21 Mar 12:30  │   ├── c/
.rw-r--r--    25 takahisa 21 Mar 12:30  │   │   └── 0
.rw-r--r--   654 takahisa 21 Mar 12:30  │   └── zarr.json
drwxr-xr-x     - takahisa 21 Mar 12:30  ├── value/
drwxr-xr-x     - takahisa 21 Mar 12:30  │   ├── c/
drwxr-xr-x     - takahisa 21 Mar 12:30  │   │   ├── 0/
drwxr-xr-x     - takahisa 21 Mar 12:30  │   │   │   ├── 0/
.rw-r--r--    35 takahisa 21 Mar 12:30  │   │   │   │   ├── 0
.rw-r--r--    35 takahisa 21 Mar 12:30  │   │   │   │   ├── 1
.rw-r--r--    35 takahisa 21 Mar 12:30  │   │   │   │   ├── 2
.rw-r--r--    35 takahisa 21 Mar 12:30  │   │   │   │   ├── 3
.rw-r--r--    35 takahisa 21 Mar 12:30  │   │   │   │   └── 4
drwxr-xr-x     - takahisa 21 Mar 12:30  │   │   │   ├── 1/
.rw-r--r--    35 takahisa 21 Mar 12:30  │   │   │   │   ├── 0
.rw-r--r--    35 takahisa 21 Mar 12:30  │   │   │   │   ├── 1
.rw-r--r--    35 takahisa 21 Mar 12:30  │   │   │   │   ├── 2
.rw-r--r--    35 takahisa 21 Mar 12:30  │   │   │   │   ├── 3
.rw-r--r--    35 takahisa 21 Mar 12:30  │   │   │   │   └── 4
drwxr-xr-x     - takahisa 21 Mar 12:30  │   │   │   ├── 2/
.rw-r--r--    35 takahisa 21 Mar 12:30  │   │   │   │   ├── 0
.rw-r--r--    35 takahisa 21 Mar 12:30  │   │   │   │   ├── 1
.rw-r--r--    35 takahisa 21 Mar 12:30  │   │   │   │   ├── 2
.rw-r--r--    35 takahisa 21 Mar 12:30  │   │   │   │   ├── 3
.rw-r--r--    35 takahisa 21 Mar 12:30  │   │   │   │   └── 4
drwxr-xr-x     - takahisa 21 Mar 12:30  │   │   │   ├── 3/
.rw-r--r--    35 takahisa 21 Mar 12:30  │   │   │   │   ├── 0
.rw-r--r--    35 takahisa 21 Mar 12:30  │   │   │   │   ├── 1
.rw-r--r--    35 takahisa 21 Mar 12:30  │   │   │   │   ├── 2
.rw-r--r--    35 takahisa 21 Mar 12:30  │   │   │   │   ├── 3
.rw-r--r--    35 takahisa 21 Mar 12:30  │   │   │   │   └── 4
drwxr-xr-x     - takahisa 21 Mar 12:30  │   │   │   └── 4/
.rw-r--r--    35 takahisa 21 Mar 12:30  │   │   │       ├── 0
.rw-r--r--    35 takahisa 21 Mar 12:30  │   │   │       ├── 1
.rw-r--r--    35 takahisa 21 Mar 12:30  │   │   │       ├── 2
.rw-r--r--    35 takahisa 21 Mar 12:30  │   │   │       ├── 3
.rw-r--r--    35 takahisa 21 Mar 12:30  │   │   │       └── 4
drwxr-xr-x     - takahisa 21 Mar 12:30  │   │   └── 1/
drwxr-xr-x     - takahisa 21 Mar 12:30  │   │       ├── 0/
.rw-r--r--    35 takahisa 21 Mar 12:30  │   │       │   ├── 0
.rw-r--r--    35 takahisa 21 Mar 12:30  │   │       │   ├── 1
.rw-r--r--    35 takahisa 21 Mar 12:30  │   │       │   ├── 2
.rw-r--r--    35 takahisa 21 Mar 12:30  │   │       │   ├── 3
.rw-r--r--    35 takahisa 21 Mar 12:30  │   │       │   └── 4
drwxr-xr-x     - takahisa 21 Mar 12:30  │   │       ├── 1/
.rw-r--r--    35 takahisa 21 Mar 12:30  │   │       │   ├── 0
.rw-r--r--    35 takahisa 21 Mar 12:30  │   │       │   ├── 1
.rw-r--r--    35 takahisa 21 Mar 12:30  │   │       │   ├── 2
.rw-r--r--    35 takahisa 21 Mar 12:30  │   │       │   ├── 3
.rw-r--r--    35 takahisa 21 Mar 12:30  │   │       │   └── 4
drwxr-xr-x     - takahisa 21 Mar 12:30  │   │       ├── 2/
.rw-r--r--    35 takahisa 21 Mar 12:30  │   │       │   ├── 0
.rw-r--r--    35 takahisa 21 Mar 12:30  │   │       │   ├── 1
.rw-r--r--    35 takahisa 21 Mar 12:30  │   │       │   ├── 2
.rw-r--r--    35 takahisa 21 Mar 12:30  │   │       │   ├── 3
.rw-r--r--    35 takahisa 21 Mar 12:30  │   │       │   └── 4
drwxr-xr-x     - takahisa 21 Mar 12:30  │   │       ├── 3/
.rw-r--r--    35 takahisa 21 Mar 12:30  │   │       │   ├── 0
.rw-r--r--    35 takahisa 21 Mar 12:30  │   │       │   ├── 1
.rw-r--r--    35 takahisa 21 Mar 12:30  │   │       │   ├── 2
.rw-r--r--    35 takahisa 21 Mar 12:30  │   │       │   ├── 3
.rw-r--r--    35 takahisa 21 Mar 12:30  │   │       │   └── 4
drwxr-xr-x     - takahisa 21 Mar 12:30  │   │       └── 4/
.rw-r--r--    35 takahisa 21 Mar 12:30  │   │           ├── 0
.rw-r--r--    35 takahisa 21 Mar 12:30  │   │           ├── 1
.rw-r--r--    35 takahisa 21 Mar 12:30  │   │           ├── 2
.rw-r--r--    35 takahisa 21 Mar 12:30  │   │           ├── 3
.rw-r--r--    35 takahisa 21 Mar 12:30  │   │           └── 4
.rw-r--r--   758 takahisa 21 Mar 12:30  │   └── zarr.json
drwxr-xr-x     - takahisa 21 Mar 12:30  ├── x/
drwxr-xr-x     - takahisa 21 Mar 12:30  │   ├── c/
.rw-r--r--  1.6k takahisa 21 Mar 12:30  │   │   └── 0
.rw-r--r--   655 takahisa 21 Mar 12:30  │   └── zarr.json
drwxr-xr-x     - takahisa 21 Mar 12:30  ├── y/
drwxr-xr-x     - takahisa 21 Mar 12:30  │   ├── c/
.rw-r--r--  1.6k takahisa 21 Mar 12:30  │   │   └── 0
.rw-r--r--   655 takahisa 21 Mar 12:30  │   └── zarr.json
.rw-r--r--  4.0k takahisa 21 Mar 12:30  └── zarr.json
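
Because value/zarr.json is 758 bytes in both listings, it may help to confirm whether the requested chunking was recorded in the array metadata even when no chunk files were written. The sketch below is not from the original report and uses only the standard library to read the Zarr v3 metadata document. It assumes a regular chunk grid as described in the Zarr v3 specification, and describe_array is a hypothetical helper name.

```python
import json
from pathlib import Path


def describe_array(store, variable):
    """Return (chunk_shape, fill_value) from <store>/<variable>/zarr.json."""
    meta = json.loads((Path(store) / variable / "zarr.json").read_text())
    grid = meta["chunk_grid"]
    # Assumption: the array uses the "regular" chunk grid from the v3 spec.
    if grid["name"] != "regular":
        raise ValueError(f"unsupported chunk grid: {grid['name']}")
    return tuple(grid["configuration"]["chunk_shape"]), meta.get("fill_value")


if __name__ == "__main__":
    path = Path("test.zarr") / "value" / "zarr.json"
    if path.exists():
        chunk_shape, fill_value = describe_array("test.zarr", "value")
        print("chunk_shape:", chunk_shape, "fill_value:", fill_value)
```

If chunk_shape comes back as (1, 200, 200), the chunking was recorded and only the chunk files are absent. The Zarr v3 specification treats regions with no stored chunk as holding fill_value, so comparing fill_value with the written data may be a useful next step. Whether that explains the behaviour here is not established in this report.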

Minimal Complete Verifiable Example

Run the code above in an environment with zarr-python 3 installed.

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.12.9 | packaged by conda-forge | (main, Mar 4 2025, 22:48:41) [GCC 13.3.0]
python-bits: 64
OS: Linux
OS-release: 5.15.167.4-microsoft-standard-WSL2
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None

xarray: 2025.3.0
pandas: 2.2.3
numpy: 2.2.4
scipy: None
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
zarr: 3.0.5
cftime: None
nc_time_axis: None
iris: None
bottleneck: None
dask: 2025.2.0
distributed: 2025.2.0
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: 2025.3.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 75.8.2
pip: None
conda: None
pytest: None
mypy: None
IPython: 9.0.2
sphinx: None
