
to_netcdf() fails because of datetime encoding #2512

Closed
nick-weber opened this issue Oct 25, 2018 · 2 comments · Fixed by #2513

@nick-weber

Simple example:

import numpy as np
from datetime import datetime, timedelta
import xarray

# "time" coordinate
dt = datetime(1999, 1, 1)
dts = np.array([dt + timedelta(days=x) for x in range(10)])
coords = {'time': dts}

# simple float data
data = np.arange(10)
vrbls = {'foo': (('time',), data)}

# create the Dataset
ds = xarray.Dataset(vrbls, coords)

# encode the time coordinate
units = 'days since 1900-01-01'
ds.time.encoding['units'] = units

# write to netcdf
ds.to_netcdf('test.nc')

Problem description

When I run the above, I get the following error when executing the last line:
ValueError: unsupported dtype for netCDF4 variable: datetime64[ns]

The documentation indicates that datetime and datetime64 objects are both supported by xarray and should write to netCDF just fine when "units" is supplied for encoding (this code fails with or without the encoding lines). Any idea what is going wrong here?

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.0-8-amd64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

xarray: 0.10.9
pandas: 0.20.3
numpy: 1.13.1
scipy: 0.19.1
netCDF4: 1.4.2
h5netcdf: 0.5.0
h5py: 2.8.0
Nio: None
zarr: None
cftime: 1.0.1
PseudonetCDF: None
rasterio: None
iris: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.16.0
distributed: 1.20.1
matplotlib: 2.1.0
cartopy: None
seaborn: 0.8.0
setuptools: 27.2.0
pip: 9.0.1
conda: 4.5.11
pytest: 3.1.3
IPython: 6.1.0
sphinx: 1.6.2

@spencerkclark
Member

I think this is a bug. The error message in particular makes things confusing. The problem stems from this line, which results in an array of dtype object (see the explanation below):

dts = np.array([dt + timedelta(days=x) for x in range(10)])

As a temporary workaround, if you keep dts as a list before passing it to the Dataset constructor, things will work as you expect:

dts = [dt + timedelta(days=x) for x in range(10)]
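Alternatively, if you already have an object-dtype array in hand, you can convert it explicitly before constructing the Dataset. A minimal sketch using pandas (pd.to_datetime is one standard way to do this):

```python
import numpy as np
import pandas as pd
from datetime import datetime, timedelta

dt = datetime(1999, 1, 1)
dts = np.array([dt + timedelta(days=x) for x in range(10)])  # dtype=object

# Convert explicitly to a datetime64[ns] array before building the Dataset
dts64 = pd.to_datetime(dts).values
print(dts64.dtype)  # datetime64[ns]
```

Passing dts64 as the 'time' coordinate should then take the normal datetime64 path through xarray's encoding machinery.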

By default NumPy will cast a list of datetime.datetime objects to an array with dtype object:

In [4]: dts = np.array([dt + timedelta(days=x) for x in range(10)])

In [5]: dts
Out[5]:
array([datetime.datetime(1999, 1, 1, 0, 0),
       datetime.datetime(1999, 1, 2, 0, 0),
       datetime.datetime(1999, 1, 3, 0, 0),
       datetime.datetime(1999, 1, 4, 0, 0),
       datetime.datetime(1999, 1, 5, 0, 0),
       datetime.datetime(1999, 1, 6, 0, 0),
       datetime.datetime(1999, 1, 7, 0, 0),
       datetime.datetime(1999, 1, 8, 0, 0),
       datetime.datetime(1999, 1, 9, 0, 0),
       datetime.datetime(1999, 1, 10, 0, 0)], dtype=object)

If you specify an array of dtype object as a coordinate, xarray currently has some logic that requires that it be cast to a generic pandas.Index with dtype object (and will preserve the datetime.datetime elements of the array).

In [14]: da = xarray.DataArray(range(10), coords=[dts], dims=['time'])

In [15]: da
Out[15]:
<xarray.DataArray (time: 10)>
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
Coordinates:
  * time     (time) object 1999-01-01 1999-01-02 1999-01-03 1999-01-04 ...

In [16]: da.indexes['time']
Out[16]:
Index([1999-01-01 00:00:00, 1999-01-02 00:00:00, 1999-01-03 00:00:00,
       1999-01-04 00:00:00, 1999-01-05 00:00:00, 1999-01-06 00:00:00,
       1999-01-07 00:00:00, 1999-01-08 00:00:00, 1999-01-09 00:00:00,
       1999-01-10 00:00:00],
      dtype='object', name='time')

The code where this happens is here:

else:
    kwargs = {}
    if hasattr(array, 'dtype') and array.dtype.kind == 'O':
        kwargs['dtype'] = object
    index = pd.Index(np.asarray(array), **kwargs)
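For illustration (this is plain pandas, not the xarray code itself), the difference between that generic-Index branch and a datetime-aware conversion is easy to see directly:

```python
import numpy as np
import pandas as pd
from datetime import datetime, timedelta

dt = datetime(1999, 1, 1)
arr = np.array([dt + timedelta(days=x) for x in range(10)])  # dtype=object

# What the branch above produces: a generic Index of datetime.datetime objects
idx = pd.Index(np.asarray(arr), dtype=object)
print(idx.dtype)  # object

# What a datetime-aware path would produce instead
dt_idx = pd.DatetimeIndex(arr)
print(dt_idx.dtype)  # datetime64[ns]
```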

In practice, in the event that an object-dtype array contains datetime.datetime objects, this should probably convert them to np.datetime64 instead and return a pandas.DatetimeIndex. That would be more consistent with how xarray currently handles passing an object array of datetime.datetime objects as the data argument to a DataArray; there the conversion happens automatically:

In [18]: xarray.DataArray(dts)
Out[18]:
<xarray.DataArray (dim_0: 10)>
array(['1999-01-01T00:00:00.000000000', '1999-01-02T00:00:00.000000000',
       '1999-01-03T00:00:00.000000000', '1999-01-04T00:00:00.000000000',
       '1999-01-05T00:00:00.000000000', '1999-01-06T00:00:00.000000000',
       '1999-01-07T00:00:00.000000000', '1999-01-08T00:00:00.000000000',
       '1999-01-09T00:00:00.000000000', '1999-01-10T00:00:00.000000000'],
      dtype='datetime64[ns]')
Dimensions without coordinates: dim_0

Xarray's logic to encode dates (i.e. what is used when saving datetime data to files) requires that dates are either of type np.datetime64 or cftime.datetime (datetime.datetime objects are not supported there). If datetime.datetime arrays were automatically converted to np.datetime64 arrays in all cases then this would not be an issue.

The error message itself arises later: after the datetime encoding logic is skipped, the dates are converted to np.datetime64 in the Variable constructor (it happens here), and netCDF4 doesn't know how to deal with that data type. But I think the fundamental issue relates to the lines of code I referenced above.

@shoyer
Member

shoyer commented Oct 25, 2018

I agree, this is definitely a bug. We do have logic that is supposed to automatically convert datetime object arrays into datetime64, but for some reason it isn't being triggered here:

if isinstance(data, np.ndarray):
    if data.dtype.kind == 'O':
        data = _possibly_convert_objects(data)
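Roughly speaking, a conversion helper like that can lean on pandas' dtype inference. A simplified stand-in (not necessarily xarray's actual implementation) behaves like this:

```python
import numpy as np
import pandas as pd
from datetime import datetime, timedelta

def possibly_convert_objects(values):
    """Simplified stand-in: let pandas infer a better dtype for an object array."""
    return np.asarray(pd.Series(values.ravel())).reshape(values.shape)

dt = datetime(1999, 1, 1)
obj_arr = np.array([dt + timedelta(days=x) for x in range(10)])  # dtype=object
converted = possibly_convert_objects(obj_arr)
print(converted.dtype)  # datetime64[ns]
```

If this conversion ran for coordinate arrays as well as data arrays, the object-dtype path above would never be reached for datetime.datetime input.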
