Description
What happened?
With zarr-python>=3, saving a Dataset or DataArray with .to_zarr() does not write chunk files when the array is initialized with np.zeros() or np.empty().
To reproduce, run the following.
#!/usr/bin/env python
import dask.array as da
import numpy as np
import xarray as xr

# Create a chunked DataArray and convert to Dataset
empty_ds = (
    xr.DataArray(
        np.zeros((2, 1000, 1000), dtype=np.float32),  # not chunked in file tree
        # np.ones((2, 1000, 1000), dtype=np.float32),  # chunked in file tree
        # np.random.random(2 * 1000 * 1000).reshape(2, 1000, 1000),  # chunked in file tree
        # np.empty((2, 1000, 1000), dtype=np.float32),  # Depends on the state of memory at runtime
        coords=[[100, 200], np.arange(1000), np.arange(1000)],
        dims=['sensor', 'x', 'y'],
    )
    .chunk({'sensor': 1, 'x': 200, 'y': 200})
    .to_dataset(name='value')
)

# Write to zarr
empty_ds.to_zarr(
    "test.zarr",
    mode='w',
)

Output zarr structure is as follows.
$ eza --tree --classify=auto -h --long test.zarr/
Permissions Size User Date Modified Name
drwxr-xr-x - takahisa 21 Mar 12:28 test.zarr/
drwxr-xr-x - takahisa 21 Mar 12:28 ├── sensor/
drwxr-xr-x - takahisa 21 Mar 12:28 │ ├── c/
.rw-r--r-- 25 takahisa 21 Mar 12:28 │ │ └── 0
.rw-r--r-- 654 takahisa 21 Mar 12:28 │ └── zarr.json
drwxr-xr-x - takahisa 21 Mar 12:28 ├── value/
.rw-r--r-- 758 takahisa 21 Mar 12:28 │ └── zarr.json
drwxr-xr-x - takahisa 21 Mar 12:28 ├── x/
drwxr-xr-x - takahisa 21 Mar 12:28 │ ├── c/
.rw-r--r-- 1.6k takahisa 21 Mar 12:28 │ │ └── 0
.rw-r--r-- 655 takahisa 21 Mar 12:28 │ └── zarr.json
drwxr-xr-x - takahisa 21 Mar 12:28 ├── y/
drwxr-xr-x - takahisa 21 Mar 12:28 │ ├── c/
.rw-r--r-- 1.6k takahisa 21 Mar 12:28 │ │ └── 0
.rw-r--r-- 655 takahisa 21 Mar 12:28 │ └── zarr.json
.rw-r--r-- 4.0k takahisa 21 Mar 12:28 └── zarr.json
What did you expect to happen?
I expected np.zeros() / np.empty() to produce the same file structure as np.ones((2, 1000, 1000), dtype=np.float32), which with zarr-python 3 is written as follows.
Permissions Size User Date Modified Name
drwxr-xr-x - takahisa 21 Mar 12:30 test.zarr/
drwxr-xr-x - takahisa 21 Mar 12:30 ├── sensor/
drwxr-xr-x - takahisa 21 Mar 12:30 │ ├── c/
.rw-r--r-- 25 takahisa 21 Mar 12:30 │ │ └── 0
.rw-r--r-- 654 takahisa 21 Mar 12:30 │ └── zarr.json
drwxr-xr-x - takahisa 21 Mar 12:30 ├── value/
drwxr-xr-x - takahisa 21 Mar 12:30 │ ├── c/
drwxr-xr-x - takahisa 21 Mar 12:30 │ │ ├── 0/
drwxr-xr-x - takahisa 21 Mar 12:30 │ │ │ ├── 0/
.rw-r--r-- 35 takahisa 21 Mar 12:30 │ │ │ │ ├── 0
.rw-r--r-- 35 takahisa 21 Mar 12:30 │ │ │ │ ├── 1
.rw-r--r-- 35 takahisa 21 Mar 12:30 │ │ │ │ ├── 2
.rw-r--r-- 35 takahisa 21 Mar 12:30 │ │ │ │ ├── 3
.rw-r--r-- 35 takahisa 21 Mar 12:30 │ │ │ │ └── 4
drwxr-xr-x - takahisa 21 Mar 12:30 │ │ │ ├── 1/
.rw-r--r-- 35 takahisa 21 Mar 12:30 │ │ │ │ ├── 0
.rw-r--r-- 35 takahisa 21 Mar 12:30 │ │ │ │ ├── 1
.rw-r--r-- 35 takahisa 21 Mar 12:30 │ │ │ │ ├── 2
.rw-r--r-- 35 takahisa 21 Mar 12:30 │ │ │ │ ├── 3
.rw-r--r-- 35 takahisa 21 Mar 12:30 │ │ │ │ └── 4
drwxr-xr-x - takahisa 21 Mar 12:30 │ │ │ ├── 2/
.rw-r--r-- 35 takahisa 21 Mar 12:30 │ │ │ │ ├── 0
.rw-r--r-- 35 takahisa 21 Mar 12:30 │ │ │ │ ├── 1
.rw-r--r-- 35 takahisa 21 Mar 12:30 │ │ │ │ ├── 2
.rw-r--r-- 35 takahisa 21 Mar 12:30 │ │ │ │ ├── 3
.rw-r--r-- 35 takahisa 21 Mar 12:30 │ │ │ │ └── 4
drwxr-xr-x - takahisa 21 Mar 12:30 │ │ │ ├── 3/
.rw-r--r-- 35 takahisa 21 Mar 12:30 │ │ │ │ ├── 0
.rw-r--r-- 35 takahisa 21 Mar 12:30 │ │ │ │ ├── 1
.rw-r--r-- 35 takahisa 21 Mar 12:30 │ │ │ │ ├── 2
.rw-r--r-- 35 takahisa 21 Mar 12:30 │ │ │ │ ├── 3
.rw-r--r-- 35 takahisa 21 Mar 12:30 │ │ │ │ └── 4
drwxr-xr-x - takahisa 21 Mar 12:30 │ │ │ └── 4/
.rw-r--r-- 35 takahisa 21 Mar 12:30 │ │ │ ├── 0
.rw-r--r-- 35 takahisa 21 Mar 12:30 │ │ │ ├── 1
.rw-r--r-- 35 takahisa 21 Mar 12:30 │ │ │ ├── 2
.rw-r--r-- 35 takahisa 21 Mar 12:30 │ │ │ ├── 3
.rw-r--r-- 35 takahisa 21 Mar 12:30 │ │ │ └── 4
drwxr-xr-x - takahisa 21 Mar 12:30 │ │ └── 1/
drwxr-xr-x - takahisa 21 Mar 12:30 │ │ ├── 0/
.rw-r--r-- 35 takahisa 21 Mar 12:30 │ │ │ ├── 0
.rw-r--r-- 35 takahisa 21 Mar 12:30 │ │ │ ├── 1
.rw-r--r-- 35 takahisa 21 Mar 12:30 │ │ │ ├── 2
.rw-r--r-- 35 takahisa 21 Mar 12:30 │ │ │ ├── 3
.rw-r--r-- 35 takahisa 21 Mar 12:30 │ │ │ └── 4
drwxr-xr-x - takahisa 21 Mar 12:30 │ │ ├── 1/
.rw-r--r-- 35 takahisa 21 Mar 12:30 │ │ │ ├── 0
.rw-r--r-- 35 takahisa 21 Mar 12:30 │ │ │ ├── 1
.rw-r--r-- 35 takahisa 21 Mar 12:30 │ │ │ ├── 2
.rw-r--r-- 35 takahisa 21 Mar 12:30 │ │ │ ├── 3
.rw-r--r-- 35 takahisa 21 Mar 12:30 │ │ │ └── 4
drwxr-xr-x - takahisa 21 Mar 12:30 │ │ ├── 2/
.rw-r--r-- 35 takahisa 21 Mar 12:30 │ │ │ ├── 0
.rw-r--r-- 35 takahisa 21 Mar 12:30 │ │ │ ├── 1
.rw-r--r-- 35 takahisa 21 Mar 12:30 │ │ │ ├── 2
.rw-r--r-- 35 takahisa 21 Mar 12:30 │ │ │ ├── 3
.rw-r--r-- 35 takahisa 21 Mar 12:30 │ │ │ └── 4
drwxr-xr-x - takahisa 21 Mar 12:30 │ │ ├── 3/
.rw-r--r-- 35 takahisa 21 Mar 12:30 │ │ │ ├── 0
.rw-r--r-- 35 takahisa 21 Mar 12:30 │ │ │ ├── 1
.rw-r--r-- 35 takahisa 21 Mar 12:30 │ │ │ ├── 2
.rw-r--r-- 35 takahisa 21 Mar 12:30 │ │ │ ├── 3
.rw-r--r-- 35 takahisa 21 Mar 12:30 │ │ │ └── 4
drwxr-xr-x - takahisa 21 Mar 12:30 │ │ └── 4/
.rw-r--r-- 35 takahisa 21 Mar 12:30 │ │ ├── 0
.rw-r--r-- 35 takahisa 21 Mar 12:30 │ │ ├── 1
.rw-r--r-- 35 takahisa 21 Mar 12:30 │ │ ├── 2
.rw-r--r-- 35 takahisa 21 Mar 12:30 │ │ ├── 3
.rw-r--r-- 35 takahisa 21 Mar 12:30 │ │ └── 4
.rw-r--r-- 758 takahisa 21 Mar 12:30 │ └── zarr.json
drwxr-xr-x - takahisa 21 Mar 12:30 ├── x/
drwxr-xr-x - takahisa 21 Mar 12:30 │ ├── c/
.rw-r--r-- 1.6k takahisa 21 Mar 12:30 │ │ └── 0
.rw-r--r-- 655 takahisa 21 Mar 12:30 │ └── zarr.json
drwxr-xr-x - takahisa 21 Mar 12:30 ├── y/
drwxr-xr-x - takahisa 21 Mar 12:30 │ ├── c/
.rw-r--r-- 1.6k takahisa 21 Mar 12:30 │ │ └── 0
.rw-r--r-- 655 takahisa 21 Mar 12:30 │ └── zarr.json
.rw-r--r-- 4.0k takahisa 21 Mar 12:30 └── zarr.json
Minimal Complete Verifiable Example
Run the code above in an environment with zarr-python 3.
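To compare the two listings without reading the tree by eye, a minimal sketch along the following lines may help. It assumes the store was written to a local directory named test.zarr in Zarr v3 layout, with chunk files under a c/ directory as shown in the listings. The helper names (expected_chunk_count, count_chunk_files, declared_chunk_shape) are hypothetical and are not part of xarray or zarr-python; they use only the Python standard library.

```python
import json
import math
from pathlib import Path


def expected_chunk_count(shape, chunks):
    """Number of chunks a regular grid needs to cover `shape`."""
    return math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))


def count_chunk_files(array_dir):
    """Count files under <array_dir>/c, the chunk directory seen in the listings."""
    chunk_root = Path(array_dir) / "c"
    if not chunk_root.is_dir():
        return 0  # no chunk directory at all, as in the np.zeros() listing
    return sum(1 for p in chunk_root.rglob("*") if p.is_file())


def declared_chunk_shape(array_dir):
    """Read the chunk shape recorded in a Zarr v3 array's zarr.json."""
    meta = json.loads((Path(array_dir) / "zarr.json").read_text())
    return meta["chunk_grid"]["configuration"]["chunk_shape"]


if __name__ == "__main__":
    # Shape and chunking taken from the reproduction script above.
    print(expected_chunk_count((2, 1000, 1000), (1, 200, 200)))  # 50

    # Assumption: the script above has already written ./test.zarr
    value_dir = Path("test.zarr") / "value"
    if value_dir.is_dir():
        print(count_chunk_files(value_dir))
        print(declared_chunk_shape(value_dir))
```

For the np.ones() listing, the count of files under value/c matches the expected 50 (2 x 5 x 5). For the np.zeros() listing, value/ contains only zarr.json, so the count is 0; reading the declared chunk shape from that zarr.json shows what chunking the metadata records even though no chunk files are on disk.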
MVCE confirmation
- Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- Complete example — the example is self-contained, including all data and the text of any traceback.
- Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
- New issue — a search of GitHub Issues suggests this is not a duplicate.
- Recent environment — the issue occurs with the latest version of xarray and its dependencies.
Anything else we need to know?
No response
Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.12.9 | packaged by conda-forge | (main, Mar 4 2025, 22:48:41) [GCC 13.3.0]
python-bits: 64
OS: Linux
OS-release: 5.15.167.4-microsoft-standard-WSL2
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None

xarray: 2025.3.0
pandas: 2.2.3
numpy: 2.2.4
scipy: None
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
zarr: 3.0.5
cftime: None
nc_time_axis: None
iris: None
bottleneck: None
dask: 2025.2.0
distributed: 2025.2.0
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: 2025.3.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 75.8.2
pip: None
conda: None
pytest: None
mypy: None
IPython: 9.0.2
sphinx: None