Description
What happened?
I have a dataset with an attribute which is a time, stored as a np.datetime64 value with nanosecond precision. Saving this to a zarr store and loading the dataset again drops the type of this attribute and loads it as an integer.
Example dataset:
<xarray.DataArray (x: 5)> Size: 20B
array([0, 1, 2, 3, 4])
Dimensions without coordinates: x
Attributes:
time: 2024-10-02T07:39:39.000000000
gets loaded back as
<xarray.DataArray (x: 5)> Size: 20B
[5 values with dtype=int32]
Dimensions without coordinates: x
Attributes:
time: 1727854779000000000
Using second precision for the datetime64 (instead of nanosecond) raises an error during JSON serialization instead, apparently because the value is converted to a datetime.datetime at some point before it is written.
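One possible explanation for the difference between the two precisions, as a sketch only: it assumes the attribute value is turned into a Python object with NumPy's `.item()` before being handed to the JSON encoder, which this report does not confirm. The snippet shows what `.item()` returns for each precision and how `json.dumps` reacts, using the timestamp from the example dataset above.

```python
import datetime
import json

import numpy as np

# Nanosecond precision: .item() yields a plain integer (nanoseconds since
# the epoch), because datetime.datetime only resolves microseconds.
ns_value = np.datetime64("2024-10-02T07:39:39", "ns").item()
print(type(ns_value).__name__, ns_value)

# Second precision: .item() yields a datetime.datetime instead.
s_value = np.datetime64("2024-10-02T07:39:39", "s").item()
print(type(s_value).__name__, s_value)

# The integer serializes without complaint, so the type is lost silently.
print(json.dumps({"time": ns_value}))

# The datetime object is rejected by the standard JSON encoder.
json_error = None
try:
    json.dumps({"time": s_value})
except TypeError as err:
    json_error = str(err)
    print(json_error)
```

If that assumption holds, it would account for both symptoms: the nanosecond value survives as a bare integer, while the second-precision value fails in the encoder.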
What did you expect to happen?
The time gets stored and read back properly.
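Until that works, a workaround sketch is to store the timestamp as an ISO 8601 string and parse it again after loading. The helpers `encode_time_attr` and `decode_time_attr` are hypothetical names introduced here for illustration, not part of xarray or zarr, and the JSON round trip below only stands in for the attribute storage.

```python
import json

import numpy as np


def encode_time_attr(value: np.datetime64) -> str:
    """Hypothetical helper: render a datetime64 as an ISO 8601 string."""
    return str(np.datetime_as_string(value))


def decode_time_attr(text: str, unit: str = "ns") -> np.datetime64:
    """Hypothetical helper: parse the string back into a datetime64."""
    return np.datetime64(text, unit)


original = np.datetime64("2024-10-02T07:39:39", "ns")

# Simulate writing the attribute to JSON metadata and reading it back.
stored = json.loads(json.dumps({"time": encode_time_attr(original)}))
restored = decode_time_attr(stored["time"])

print(stored["time"])
print(restored == original)
```

With this approach the attribute would be set as `attrs={"time": encode_time_attr(t)}` before calling `to_zarr`, and decoded by hand after opening the store.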
Minimal Complete Verifiable Example
import xarray as xr
import numpy as np

# Nanosecond precision: the attribute is read back as an integer.
arr = xr.DataArray(
    np.arange(5),
    dims="x",
    attrs={"time": np.datetime64("now", "ns")},
)
print(arr)
arr.to_zarr("temp.zarr", mode="w")
print(xr.open_dataarray("temp.zarr", engine="zarr"))

# Second precision: to_zarr raises a TypeError (see the log output).
arr = xr.DataArray(
    np.arange(5),
    dims="x",
    attrs={"time": np.datetime64("now", "s")},
)
print(arr)
arr.to_zarr("temp.zarr", mode="w")
MVCE confirmation
- Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- Complete example — the example is self-contained, including all data and the text of any traceback.
- Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
- New issue — a search of GitHub Issues suggests this is not a duplicate.
- Recent environment — the issue occurs with the latest version of xarray and its dependencies.
Relevant log output
Traceback (most recent call last):
File "C:\Users\<username>\repos\test\.pixi\envs\default\Lib\site-packages\xarray\backends\zarr.py", line 395, in _put_attrs
zarr_obj.attrs.put(attrs)
File "C:\Users\<username>\repos\test\.pixi\envs\default\Lib\site-packages\zarr\attrs.py", line 124, in put
self._write_op(self._put_nosync, d)
File "C:\Users\<username>\repos\test\.pixi\envs\default\Lib\site-packages\zarr\attrs.py", line 83, in _write_op
return f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "C:\Users\<username>\repos\test\.pixi\envs\default\Lib\site-packages\zarr\attrs.py", line 150, in _put_nosync
self.store[self.key] = json_dumps(d)
^^^^^^^^^^^^^
File "C:\Users\<username>\repos\test\.pixi\envs\default\Lib\site-packages\zarr\util.py", line 69, in json_dumps
return json.dumps(
^^^^^^^^^^^
File "C:\Users\<username>\repos\test\.pixi\envs\default\Lib\json\__init__.py", line 238, in dumps
**kw).encode(obj)
^^^^^^^^^^^
File "C:\Users\<username>\repos\test\.pixi\envs\default\Lib\json\encoder.py", line 202, in encode
chunks = list(chunks)
^^^^^^^^^^^^
File "C:\Users\<username>\repos\test\.pixi\envs\default\Lib\json\encoder.py", line 432, in _iterencode
yield from _iterencode_dict(o, _current_indent_level)
File "C:\Users\<username>\repos\test\.pixi\envs\default\Lib\json\encoder.py", line 406, in _iterencode_dict
yield from chunks
File "C:\Users\<username>\repos\test\.pixi\envs\default\Lib\json\encoder.py", line 439, in _iterencode
o = _default(o)
^^^^^^^^^^^
File "C:\Users\<username>\repos\test\.pixi\envs\default\Lib\site-packages\zarr\util.py", line 64, in default
return json.JSONEncoder.default(self, o)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\<username>\repos\test\.pixi\envs\default\Lib\json\encoder.py", line 180, in default
raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type datetime is not JSON serializable
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\<username>\repos\test\.pixi\envs\default\Lib\site-packages\xarray\core\dataarray.py", line 4355, in to_zarr
return to_zarr( # type: ignore[call-overload,misc]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\<username>\repos\test\.pixi\envs\default\Lib\site-packages\xarray\backends\api.py", line 1784, in to_zarr
dump_to_store(dataset, zstore, writer, encoding=encoding)
File "C:\Users\<username>\repos\test\.pixi\envs\default\Lib\site-packages\xarray\backends\api.py", line 1467, in dump_to_store
store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)
File "C:\Users\<username>\repos\test\.pixi\envs\default\Lib\site-packages\xarray\backends\zarr.py", line 720, in store
self.set_variables(
File "C:\Users\<username>\repos\test\.pixi\envs\default\Lib\site-packages\xarray\backends\zarr.py", line 831, in set_variables
zarr_array = _put_attrs(zarr_array, encoded_attrs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\<username>\repos\test\.pixi\envs\default\Lib\site-packages\xarray\backends\zarr.py", line 397, in _put_attrs
raise TypeError("Invalid attribute in Dataset.attrs.") from e
TypeError: Invalid attribute in Dataset.attrs.
Anything else we need to know?
No response
Environment
INSTALLED VERSIONS
commit: None
python: 3.12.6 | packaged by conda-forge | (main, Sep 30 2024, 17:48:58) [MSC v.1941 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 142 Stepping 12, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: ('Swedish_Sweden', '1252')
libhdf5: None
libnetcdf: None
xarray: 2024.9.0
pandas: 2.2.3
numpy: 2.1.1
scipy: None
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
zarr: 2.18.3
cftime: None
nc_time_axis: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: None
pip: None
conda: None
pytest: None
mypy: None
IPython: None
sphinx: None