Is your feature request related to a problem?
xarray.backends.zarr.FillValueCoder.decode requires _FillValue attribute values on Zarr arrays to be in HDF5-style form (base64-encoded bytes for floats; specific encoded shapes for ints / strings). Zarr metadata is JSON-native, so the natural shape for any non-xarray Zarr writer is a plain JSON scalar, but that's rejected on read by xarray.
MVCE:
# /// script
# requires-python = ">=3.11"
# dependencies = [
#   "numpy",
#   "zarr>=3.0",
#   "xarray",
# ]
# ///
import shutil
from pathlib import Path

import numpy as np
import xarray as xr
import zarr

# Start from a clean store so the script can be re-run.
path = Path("test.zarr")
if path.exists():
    shutil.rmtree(path)

# Write an array with plain zarr, using a JSON-native _FillValue attribute.
root = zarr.open_group(path, mode="w", zarr_format=3)
arr = root.create_array(
    "data",
    shape=(3,),
    chunks=(3,),
    dtype="float32",
    fill_value=0.0,
    dimension_names=["x"],
    attributes={"_FillValue": 0.0},
)
arr[:] = np.array([1.0, 2.0, 3.0], dtype="float32")

xr.open_zarr(path, zarr_format=3, consolidated=False).load()
# TypeError: Failed to decode fill_value: expected str or bytes for dtype float32, got float
The same failure occurs for _FillValue: NaN, _FillValue: -1, etc. Integer dtypes hit a parallel code path with the same root cause. String dtypes (|S*, StringDType, kind O) raise ValueError: Failed to decode fill_value. Unsupported dtype ....
As a result, it's difficult to produce Xarray-compatible Zarr datasets that use a CF-style _FillValue with libraries other than Xarray.
Describe the solution you'd like
FillValueCoder.decode should accept JSON-native scalars in addition to the existing base64-encoded-bytes form, dispatching on dtype kind:
class FillValueCoder:
    @classmethod
    def decode(cls, value, dtype):
        if value is None:
            return None
        # New: accept JSON-native scalars directly.
        # bool is excluded from the numeric branch because it subclasses int.
        if dtype.kind in "iuf" and isinstance(value, (int, float)) and not isinstance(value, bool):
            return np.asarray(value, dtype=dtype)[()]
        if dtype.kind == "b" and isinstance(value, bool):
            return np.asarray(value, dtype=dtype)[()]
        if dtype.kind in "SU" and isinstance(value, str):
            return np.asarray(value, dtype=dtype)[()]
        # Existing: fall through to the HDF5-style base64-bytes path.
        ...
Decoding is the read path, so relaxing it is strictly additive. Files written by older xarray versions (base64-encoded _FillValue) continue to work unchanged and files written by other Zarr tools (JSON-native _FillValue) start to work. No existing reader behavior breaks.
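To illustrate the intended behavior, here is a minimal standalone sketch. `decode_fill_value` is a hypothetical free function, not xarray's actual implementation, and the legacy float branch assumes the existing form is base64 over a little-endian float64.

```python
import base64
import struct

import numpy as np


def decode_fill_value(value, dtype):
    """Hypothetical stand-in for the proposed FillValueCoder.decode behavior."""
    dtype = np.dtype(dtype)
    if value is None:
        return None
    # JSON-native numbers; bool is excluded because it subclasses int.
    if dtype.kind in "iuf" and isinstance(value, (int, float)) and not isinstance(value, bool):
        return np.asarray(value, dtype=dtype)[()]
    if dtype.kind == "b" and isinstance(value, bool):
        return np.asarray(value, dtype=dtype)[()]
    # Note: for kind "S" a plain string and a base64 string are both str,
    # so check order decides the reading; this follows the order proposed above.
    if dtype.kind in "SU" and isinstance(value, str):
        return np.asarray(value, dtype=dtype)[()]
    # Assumed legacy float form: base64 over a little-endian float64.
    if dtype.kind == "f" and isinstance(value, (str, bytes)):
        raw = base64.standard_b64decode(value)
        return dtype.type(struct.unpack("<d", raw)[0])
    raise TypeError(f"cannot decode fill value {value!r} for dtype {dtype}")


native = decode_fill_value(0.0, "float32")          # JSON-native float
legacy = decode_fill_value("AAAAAAAAAAA=", "float32")  # base64 of packed 0.0
missing = decode_fill_value(-1, "int16")            # JSON-native int
```

Under these assumptions both the JSON-native and the base64 spelling of 0.0 decode to the same float32 scalar, which is the additive behavior described above.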
Describe alternatives you've considered
- Symmetric encode change. Switch FillValueCoder.encode to emit JSON-native scalars on the zarr backend too. Drawback: older xarray versions reading newer xarray-written files would break. Probably worth doing eventually but as a separate, gated change.
- Vendoring FillValueCoder in each non-xarray producer. Documented in places already (e.g. virtualizarr's custom_parsers.md). Drawback: requires every Zarr writer to re-implement xarray's HDF5-style encoding; defeats the point of JSON-native metadata.
- Waiting for Optional[T] data type. The long-term replacement for "missing data" semantics (zarr-extensions#33). Drawback: spec timeline; doesn't help users today.
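To make the vendoring alternative concrete, the sketch below shows roughly what a non-xarray writer has to do today for a float fill value, next to the JSON-native form it would naturally write. `encode_float_fill_value` is a hypothetical helper, and the little-endian float64 packing is an assumption about the HDF5-style form, not a quote of xarray's code.

```python
import base64
import json
import struct


def encode_float_fill_value(value: float) -> str:
    """Hypothetical helper: pack as little-endian float64, then base64-encode."""
    packed = struct.pack("<d", float(value))
    return base64.standard_b64encode(packed).decode("ascii")


# What a writer must emit today for xarray to accept the attribute.
attrs_encoded = json.dumps({"_FillValue": encode_float_fill_value(0.0)})
# What a JSON-native Zarr writer would naturally emit.
attrs_native = json.dumps({"_FillValue": 0.0})

print(attrs_encoded)  # {"_FillValue": "AAAAAAAAAAA="}
print(attrs_native)   # {"_FillValue": 0.0}
```

The encoded attribute is opaque to any tool that does not know the packing convention, whereas the native one is readable by every JSON consumer.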
Additional context
User reports of this exact problem:
- _FillValue: null; xarray rejects on read.
- _FillValue=-1 hits the matching int-side assertion.
This limitation of the current non-Zarr-native solution was flagged by @d-v-b.
Not in scope:
- Changing _FillValue semantics (CF mask-and-scale vs Zarr storage default).
- The encode path (see alternative #1).
- Compound / structured dtypes (separate issue).