# Lack of resilience towards a missing `_ARRAY_DIMENSIONS`, xarray's special Zarr attribute

In [1]:
from pathlib import Path
import json
from typing import Any

import numpy as np
import xarray as xr

## Utilities

_This section only declares utility functions and adds nothing of substance for the reader._


In [2]:
# Set to True to get rich HTML representations in an interactive notebook session
# Set to False to get textual representations ready to be converted to markdown for an issue report

INTERACTIVE = False

# Convert to markdown with
# jupyter nbconvert --to markdown notebooks/datatree-zarr.ipynb

In [3]:
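def show(obj: Any) -> Any:
    """Return `obj` for rich display when interactive, otherwise print it."""
    if not INTERACTIVE:
        print(obj)
        return None
    # Resolve paths so the notebook displays an absolute location
    return obj.resolve() if isinstance(obj, Path) else obj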


def load_json(path: Path) -> dict:
    with open(path, encoding="utf-8") as fp:
        return json.load(fp)

## Data Creation

I create a dummy Dataset containing a single `(label, z)`-dimensional DataArray named `my_xda`.


In [4]:
xda = xr.DataArray(
    np.arange(3 * 18).reshape(3, 18),
    coords={"label": list("abc"), "z": list(range(18))},
)
xda = xda.chunk({"label": 2, "z": 4})
show(xda)

<xarray.DataArray (label: 3, z: 18)>
dask.array<xarray-<this-array>, shape=(3, 18), dtype=int64, chunksize=(2, 4), chunktype=numpy.ndarray>
Coordinates:
  * label    (label) <U1 'a' 'b' 'c'
  * z        (z) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17


In [5]:
xds = xr.Dataset({"my_xda": xda})
show(xds)

<xarray.Dataset>
Dimensions:  (label: 3, z: 18)
Coordinates:
  * label    (label) <U1 'a' 'b' 'c'
  * z        (z) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
Data variables:
    my_xda   (label, z) int64 dask.array<chunksize=(2, 4), meta=np.ndarray>


## Data Writing

I persist the Dataset to Zarr.


In [6]:
zarr_path = Path() / "../generated/zarrounet.zarr"
xds.to_zarr(zarr_path, mode="w")
show(zarr_path)

../generated/zarrounet.zarr


## Data Initial Reading

I successfully read the Dataset back.


In [7]:
show(xr.open_zarr(zarr_path).my_xda)

<xarray.DataArray 'my_xda' (label: 3, z: 18)>
dask.array<open_dataset-my_xda, shape=(3, 18), dtype=int64, chunksize=(2, 4), chunktype=numpy.ndarray>
Coordinates:
  * label    (label) <U1 'a' 'b' 'c'
  * z        (z) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17


## Data Alteration


Then I alter the Zarr store by successively removing `_ARRAY_DIMENSIONS` from the `.zattrs` of every variable (`z`, `label`, `my_xda`) and try to reopen the store. Reopening succeeds in all cases. ✔️


In [8]:
# Corrupt each variable's `.zattrs` by dropping xarray's `_ARRAY_DIMENSIONS` attribute
for varname in ("z/.zattrs", "label/.zattrs", "my_xda/.zattrs"):
    zattrs_path = zarr_path / varname
    assert zattrs_path.is_file()
    zattrs_path.write_text("{}")

# Note: this has no impact; only the root .zmetadata seems to be used

In [9]:
show(xr.open_zarr(zarr_path))

<xarray.Dataset>
Dimensions:  (label: 3, z: 18)
Coordinates:
  * label    (label) <U1 'a' 'b' 'c'
  * z        (z) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
Data variables:
    my_xda   (label, z) int64 dask.array<chunksize=(2, 4), meta=np.ndarray>
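The reopen presumably succeeds because the consolidated `.zmetadata` at the root still carries the original attributes, so the edited per-variable `.zattrs` files are never consulted. The sketch below compares the two sources using only the standard library. `diff_consolidated_attrs` is a hypothetical helper, not part of xarray or zarr, and it assumes a Zarr v2 directory store whose `.zmetadata` is a JSON document with a `"metadata"` mapping keyed by relative paths such as `z/.zattrs`.

```python
import json
from pathlib import Path


def diff_consolidated_attrs(store: Path) -> dict[str, tuple[dict, dict]]:
    """Return {key: (on_disk, consolidated)} for each `.zattrs` entry that differs.

    Hypothetical helper: assumes a Zarr v2 directory store with a root `.zmetadata`.
    """
    consolidated = json.loads((store / ".zmetadata").read_text(encoding="utf-8"))
    mismatches = {}
    for key, attrs in consolidated["metadata"].items():
        if not key.endswith(".zattrs"):
            continue  # skip .zarray and .zgroup entries
        path = store / key
        # A missing file is treated as an empty attribute mapping
        on_disk = json.loads(path.read_text(encoding="utf-8")) if path.is_file() else {}
        if on_disk != attrs:
            mismatches[key] = (on_disk, attrs)
    return mismatches
```

On the store altered above, `diff_consolidated_attrs(zarr_path)` would be expected to report the three edited `.zattrs` entries. Passing `consolidated=False` to `xr.open_zarr` should make xarray read the per-variable files instead of `.zmetadata`.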


However, the last alteration, removing the `_ARRAY_DIMENSIONS` key-value pair from one of the variables in the `.zmetadata` file at the root of the Zarr store, raises an exception on read. The error message names the missing key: `KeyError: '_ARRAY_DIMENSIONS'` ❌

This means xarray cannot open arbitrary Zarr stores, only those that carry xarray's special
private attribute, `_ARRAY_DIMENSIONS`.

> Because of these choices, Xarray cannot read arbitrary array data, but only Zarr data with valid `_ARRAY_DIMENSIONS` 

See https://docs.xarray.dev/en/latest/internals/zarr-encoding-spec.html

As a first step, the error message could probably be more explicit than a low-level `KeyError`, explaining that xarray cannot yet open arbitrary Zarr data.
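One possible shape for such a message is sketched below. It is an illustration only: `MissingDimensionNamesError` and `get_dimension_names` are hypothetical names, not xarray's actual internals. The error subclasses `KeyError` on the assumption that existing `except KeyError` handlers should keep working.

```python
from collections.abc import Mapping

DIMENSION_KEY = "_ARRAY_DIMENSIONS"


class MissingDimensionNamesError(KeyError):
    """Hypothetical error type; not part of xarray."""

    def __str__(self) -> str:
        # KeyError normally shows repr(args[0]); show the plain message instead
        return str(self.args[0])


def get_dimension_names(array_name: str, attrs: Mapping) -> tuple[str, ...]:
    """Return the dimension names stored in `attrs`, or fail with a descriptive error."""
    try:
        dims = attrs[DIMENSION_KEY]
    except KeyError:
        raise MissingDimensionNamesError(
            f"Zarr array {array_name!r} has no {DIMENSION_KEY!r} attribute. "
            "xarray can only read Zarr data written with this attribute; "
            "it cannot yet open arbitrary Zarr arrays."
        ) from None
    return tuple(dims)
```

With this sketch, `get_dimension_names("z", {})` raises an error that names the offending array and states the limitation, instead of surfacing only the missing key.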

In [10]:
zmetadata_path = zarr_path / ".zmetadata"
assert zmetadata_path.is_file()
zmetadata = load_json(zmetadata_path)
zmetadata["metadata"]["z/.zattrs"] = {}
zmetadata_path.write_text(json.dumps(zmetadata, indent=4))

1925

In [11]:
show(xr.open_zarr(zarr_path))

KeyError: '_ARRAY_DIMENSIONS'
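Since the exception does not say which variable is at fault, a small diagnostic can help locate it. The sketch below scans a consolidated `.zmetadata` file for arrays whose attributes lack `_ARRAY_DIMENSIONS`. `arrays_missing_dimensions` is a hypothetical helper, and it assumes the Zarr v2 consolidated layout in which each array contributes a `<name>/.zarray` key and, optionally, a `<name>/.zattrs` key.

```python
import json
from pathlib import Path


def arrays_missing_dimensions(zmetadata_path: Path) -> list[str]:
    """List arrays in a consolidated `.zmetadata` that lack `_ARRAY_DIMENSIONS`.

    Hypothetical helper: assumes the Zarr v2 consolidated metadata layout.
    """
    metadata = json.loads(zmetadata_path.read_text(encoding="utf-8"))["metadata"]
    missing = []
    for key in metadata:
        # Arrays are identified by their `.zarray` entry
        if key != ".zarray" and not key.endswith("/.zarray"):
            continue
        prefix = key[: -len(".zarray")]  # e.g. "z/" for "z/.zarray"
        attrs = metadata.get(prefix + ".zattrs", {})
        if "_ARRAY_DIMENSIONS" not in attrs:
            missing.append(prefix.rstrip("/"))
    return sorted(missing)
```

On the store as altered in the previous cells, `arrays_missing_dimensions(zmetadata_path)` should return only `z`, the single entry emptied in `.zmetadata`.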