Storing np.datetime64 attributes in zarr files #9567

@CarlAndersson

What happened?

I have a dataset with an attribute which is a time, stored as a np.datetime64 value with nanosecond precision. Saving this to a zarr store and loading the dataset again drops the type of this attribute and loads it as an integer.

Example dataset:

<xarray.DataArray (x: 5)> Size: 20B
array([0, 1, 2, 3, 4])
Dimensions without coordinates: x
Attributes:
    time:     2024-10-02T07:39:39.000000000

gets loaded back as

<xarray.DataArray (x: 5)> Size: 20B
[5 values with dtype=int32]
Dimensions without coordinates: x
Attributes:
    time:     1727854779000000000

Using second precision for the datetime64 (instead of nanosecond) raises an error during JSON serialization instead, apparently because the value gets converted into a datetime.datetime at some point before it is written.
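The difference between the two precisions can be reproduced with NumPy alone. The sketch below assumes that the zarr backend turns NumPy scalars into plain Python objects with something like `.item()` before they reach the JSON encoder; that is an assumption about the internals, not something verified in this report. Under that assumption, a nanosecond value becomes an `int` and a second-precision value becomes a `datetime.datetime`:

```python
import datetime

import numpy as np

ns_value = np.datetime64("2024-10-02T07:39:39", "ns")
s_value = np.datetime64("2024-10-02T07:39:39", "s")

# datetime.datetime only has microsecond resolution, so a nanosecond
# value is returned as an integer count of nanoseconds since the epoch.
ns_item = ns_value.item()
print(type(ns_item).__name__, ns_item)  # int 1727854779000000000

# A second-precision value fits into datetime.datetime and is returned as one.
s_item = s_value.item()
print(type(s_item).__name__, s_item)  # datetime 2024-10-02 07:39:39
```

The integer is JSON serializable, which would explain the silent type loss, while the `datetime.datetime` is not, which would explain the error.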

What did you expect to happen?

The time is stored and read back as a np.datetime64 value, regardless of its precision.

Minimal Complete Verifiable Example

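import xarray as xr
import numpy as np

# Case 1: nanosecond precision. Writing succeeds, but the attribute is
# read back as a plain integer (nanoseconds since the epoch).
arr = xr.DataArray(
    np.arange(5),
    dims="x",
    attrs={"time": np.datetime64("now", "ns")},
)
print(arr)
arr.to_zarr("temp.zarr", mode="w")
print(xr.open_dataarray("temp.zarr", engine="zarr"))

# Case 2: second precision. Writing fails with
# "TypeError: Invalid attribute in Dataset.attrs." (see the log output).
arr = xr.DataArray(
    np.arange(5),
    dims="x",
    attrs={"time": np.datetime64("now", "s")},
)
print(arr)
arr.to_zarr("temp.zarr", mode="w")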
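A possible workaround until this is handled by the backend is to store the time as an ISO 8601 string and convert it back after loading. The following is a minimal sketch of that idea, not part of xarray: `encode_datetime_attrs` and `decode_datetime_attrs` are hypothetical helper names, and the caller is assumed to know which attribute keys hold times.

```python
import json

import numpy as np


def encode_datetime_attrs(attrs):
    """Return a copy of attrs with np.datetime64 values replaced by strings."""
    encoded = {}
    for key, value in attrs.items():
        if isinstance(value, np.datetime64):
            # str() gives an ISO 8601 string whose number of fractional
            # digits reflects the precision of the value.
            encoded[key] = str(value)
        else:
            encoded[key] = value
    return encoded


def decode_datetime_attrs(attrs, keys):
    """Return a copy of attrs with the given keys parsed back to np.datetime64."""
    decoded = dict(attrs)
    for key in keys:
        # NumPy infers the unit from the string it is given.
        decoded[key] = np.datetime64(decoded[key])
    return decoded


original = {"time": np.datetime64("2024-10-02T07:39:39", "ns"), "title": "demo"}
encoded = encode_datetime_attrs(original)
as_json = json.dumps(encoded)  # the encoded attributes are JSON serializable
decoded = decode_datetime_attrs(json.loads(as_json), ["time"])
print(encoded["time"], decoded["time"].dtype)
```

With xarray, the encoded attributes could be applied before writing, for example with `arr.assign_attrs(encode_datetime_attrs(arr.attrs))`, and decoded again after `xr.open_dataarray`. This keeps the value human-readable in the store, at the cost of requiring an explicit decode step.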

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

Traceback (most recent call last):
  File "C:\Users\<username>\repos\test\.pixi\envs\default\Lib\site-packages\xarray\backends\zarr.py", line 395, in _put_attrs
    zarr_obj.attrs.put(attrs)
  File "C:\Users\<username>\repos\test\.pixi\envs\default\Lib\site-packages\zarr\attrs.py", line 124, in put
    self._write_op(self._put_nosync, d)
  File "C:\Users\<username>\repos\test\.pixi\envs\default\Lib\site-packages\zarr\attrs.py", line 83, in _write_op
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "C:\Users\<username>\repos\test\.pixi\envs\default\Lib\site-packages\zarr\attrs.py", line 150, in _put_nosync
    self.store[self.key] = json_dumps(d)
                           ^^^^^^^^^^^^^
  File "C:\Users\<username>\repos\test\.pixi\envs\default\Lib\site-packages\zarr\util.py", line 69, in json_dumps
    return json.dumps(
           ^^^^^^^^^^^
  File "C:\Users\<username>\repos\test\.pixi\envs\default\Lib\json\__init__.py", line 238, in dumps
    **kw).encode(obj)
          ^^^^^^^^^^^
  File "C:\Users\<username>\repos\test\.pixi\envs\default\Lib\json\encoder.py", line 202, in encode
    chunks = list(chunks)
             ^^^^^^^^^^^^
  File "C:\Users\<username>\repos\test\.pixi\envs\default\Lib\json\encoder.py", line 432, in _iterencode
    yield from _iterencode_dict(o, _current_indent_level)
  File "C:\Users\<username>\repos\test\.pixi\envs\default\Lib\json\encoder.py", line 406, in _iterencode_dict
    yield from chunks
  File "C:\Users\<username>\repos\test\.pixi\envs\default\Lib\json\encoder.py", line 439, in _iterencode
    o = _default(o)
        ^^^^^^^^^^^
  File "C:\Users\<username>\repos\test\.pixi\envs\default\Lib\site-packages\zarr\util.py", line 64, in default
    return json.JSONEncoder.default(self, o)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\<username>\repos\test\.pixi\envs\default\Lib\json\encoder.py", line 180, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type datetime is not JSON serializable

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\<username>\repos\test\.pixi\envs\default\Lib\site-packages\xarray\core\dataarray.py", line 4355, in to_zarr
    return to_zarr(  # type: ignore[call-overload,misc]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\<username>\repos\test\.pixi\envs\default\Lib\site-packages\xarray\backends\api.py", line 1784, in to_zarr
    dump_to_store(dataset, zstore, writer, encoding=encoding)
  File "C:\Users\<username>\repos\test\.pixi\envs\default\Lib\site-packages\xarray\backends\api.py", line 1467, in dump_to_store
    store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)
  File "C:\Users\<username>\repos\test\.pixi\envs\default\Lib\site-packages\xarray\backends\zarr.py", line 720, in store
    self.set_variables(
  File "C:\Users\<username>\repos\test\.pixi\envs\default\Lib\site-packages\xarray\backends\zarr.py", line 831, in set_variables
    zarr_array = _put_attrs(zarr_array, encoded_attrs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\<username>\repos\test\.pixi\envs\default\Lib\site-packages\xarray\backends\zarr.py", line 397, in _put_attrs
    raise TypeError("Invalid attribute in Dataset.attrs.") from e
TypeError: Invalid attribute in Dataset.attrs.
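The innermost failure in the traceback above can be reproduced with the standard library alone, which suggests that the attribute is already a `datetime.datetime` by the time it reaches `json.dumps`. The sketch below only illustrates that failure mode and one generic way around it; `to_jsonable` is a hypothetical helper, not something xarray or zarr provide.

```python
import datetime
import json

attrs = {"time": datetime.datetime(2024, 10, 2, 7, 39, 39)}

# The default JSON encoder has no representation for datetime objects.
try:
    json.dumps(attrs)
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)


def to_jsonable(obj):
    """Fallback for json.dumps: encode datetimes as ISO 8601 strings."""
    if isinstance(obj, datetime.datetime):
        return obj.isoformat()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


serialized = json.dumps(attrs, default=to_jsonable)
print(serialized)  # {"time": "2024-10-02T07:39:39"}
```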

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS

commit: None
python: 3.12.6 | packaged by conda-forge | (main, Sep 30 2024, 17:48:58) [MSC v.1941 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 142 Stepping 12, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: ('Swedish_Sweden', '1252')
libhdf5: None
libnetcdf: None

xarray: 2024.9.0
pandas: 2.2.3
numpy: 2.1.1
scipy: None
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
zarr: 2.18.3
cftime: None
nc_time_axis: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: None
pip: None
conda: None
pytest: None
mypy: None
IPython: None
sphinx: None
