Selection with datetime64[ns] fails with Pandas 1.1.0 #4283

Closed
dopplershift opened this issue Jul 29, 2020 · 2 comments · Fixed by #4292

@dopplershift
Contributor

I ran into this issue with a netCDF file with the following time variable:

	double time1(time1) ;
		time1:_FillValue = NaN ;
		time1:standard_name = "time" ;
		time1:long_name = "time" ;
		time1:udunits = "Hour since 2017-09-05T12:00:00Z" ;
		time1:units = "Hour since 2017-09-05T12:00:00+00:00" ;
		time1:calendar = "proleptic_gregorian" ;

 time1 = 0, 3, 6, 9, 12, 15, 18, 21, 24 ;

but we can reproduce the problem with something as simple as:

import numpy as np
import xarray as xr

t = np.array(['2017-09-05T12:00:00.000000000', '2017-09-05T15:00:00.000000000'], dtype='datetime64[ns]')
da = xr.DataArray(np.ones(t.shape), dims=('time',), coords=(t,))

da.loc[{'time':t[0]}]  # Works on pandas 1.0.5

this produces:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-11-3e0afa0bd195> in <module>
----> 1 da.loc[{'time':t[0]}]

~/miniconda3/envs/py38/lib/python3.8/site-packages/xarray/core/dataarray.py in __getitem__(self, key)
    196             labels = indexing.expanded_indexer(key, self.data_array.ndim)
    197             key = dict(zip(self.data_array.dims, labels))
--> 198         return self.data_array.sel(**key)
    199 
    200     def __setitem__(self, key, value) -> None:

~/miniconda3/envs/py38/lib/python3.8/site-packages/xarray/core/dataarray.py in sel(self, indexers, method, tolerance, drop, **indexers_kwargs)
   1147 
   1148         """
-> 1149         ds = self._to_temp_dataset().sel(
   1150             indexers=indexers,
   1151             drop=drop,

~/miniconda3/envs/py38/lib/python3.8/site-packages/xarray/core/dataset.py in sel(self, indexers, method, tolerance, drop, **indexers_kwargs)
   2099         """
   2100         indexers = either_dict_or_kwargs(indexers, indexers_kwargs, "sel")
-> 2101         pos_indexers, new_indexes = remap_label_indexers(
   2102             self, indexers=indexers, method=method, tolerance=tolerance
   2103         )

~/miniconda3/envs/py38/lib/python3.8/site-packages/xarray/core/coordinates.py in remap_label_indexers(obj, indexers, method, tolerance, **indexers_kwargs)
    394     }
    395 
--> 396     pos_indexers, new_indexes = indexing.remap_label_indexers(
    397         obj, v_indexers, method=method, tolerance=tolerance
    398     )

~/miniconda3/envs/py38/lib/python3.8/site-packages/xarray/core/indexing.py in remap_label_indexers(data_obj, indexers, method, tolerance)
    268             coords_dtype = data_obj.coords[dim].dtype
    269             label = maybe_cast_to_coords_dtype(label, coords_dtype)
--> 270             idxr, new_idx = convert_label_indexer(index, label, dim, method, tolerance)
    271             pos_indexers[dim] = idxr
    272             if new_idx is not None:

~/miniconda3/envs/py38/lib/python3.8/site-packages/xarray/core/indexing.py in convert_label_indexer(index, label, index_name, method, tolerance)
    187                 indexer = index.get_loc(label.item())
    188             else:
--> 189                 indexer = index.get_loc(
    190                     label.item(), method=method, tolerance=tolerance
    191                 )

~/miniconda3/envs/py38/lib/python3.8/site-packages/pandas/core/indexes/datetimes.py in get_loc(self, key, method, tolerance)
    620         else:
    621             # unrecognized type
--> 622             raise KeyError(key)
    623 
    624         try:

KeyError: 1504612800000000000

Interestingly, changing the datetime64 unit to [s] works:

import numpy as np
import xarray as xr

t = np.array(['2017-09-05T12:00:00.000000000', '2017-09-05T15:00:00.000000000'], dtype='datetime64[s]')
da = xr.DataArray(np.ones(t.shape), dims=('time',), coords=(t,))
da.loc[{'time':t[0]}]  # Works
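
One plausible explanation for the unit dependence, consistent with the commit message later in this thread, is that `.item()` returns a different Python type depending on the datetime64 unit. A minimal sketch using only NumPy (the variable names are illustrative and not taken from xarray):

```python
import datetime
import numpy as np

stamp = '2017-09-05T12:00:00'
as_ns = np.array(stamp, dtype='datetime64[ns]')  # 0-d array, nanosecond unit
as_s = np.array(stamp, dtype='datetime64[s]')    # 0-d array, second unit

# datetime.datetime cannot hold nanoseconds, so NumPy falls back to an int
ns_item = as_ns.item()
# second precision fits in datetime.datetime, so that type is returned
s_item = as_s.item()

print(type(ns_item).__name__, ns_item)  # int 1504612800000000000
print(type(s_item).__name__, s_item)    # datetime 2017-09-05 12:00:00
```

The integer printed for the nanosecond case matches the key reported in the `KeyError` above.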

Environment:
Python 3.8 from conda-forge on macOS 10.15.4

Output of xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.8.5 | packaged by conda-forge | (default, Jul 24 2020, 01:06:20)
[Clang 10.0.1 ]
python-bits: 64
OS: Darwin
OS-release: 19.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.6
libnetcdf: 4.7.4

xarray: 0.16.0
pandas: 1.1.0
numpy: 1.19.1
scipy: 1.5.2
netCDF4: 1.5.4
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: None
cftime: 1.2.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: 0.9.8.3
iris: None
bottleneck: None
dask: 2.21.0
distributed: 2.21.0
matplotlib: 3.3.0
cartopy: 0.18.0
seaborn: None
numbagg: None
pint: 0.14
setuptools: 49.2.0.post20200712
pip: 20.1.1
conda: None
pytest: 6.0.0
IPython: 7.16.1
sphinx: 2.4.4

@dopplershift
Contributor Author

Looks like (to my eye anyway) it stems from:

import numpy as np
import pandas as pd
t = np.array(['2017-09-05T12:00:00.000000000', '2017-09-05T15:00:00.000000000'], dtype='datetime64[ns]')
index = pd.DatetimeIndex(t)

index.get_loc(t[0].item()) # Fails with KeyError
index.get_loc(t[0])  # Works

This fails on pandas 1.1.0. I have no idea whether the .item() call is supposed to work.
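
As a hedged illustration of what the failing key actually is: the value returned by `.item()` is a bare nanosecond count, and wrapping it in `pd.Timestamp` (which interprets an integer as nanoseconds since the epoch) gives a key that the index recognizes. This is only a sketch of the behavior, not a statement about what pandas intends to support, and the variable names are illustrative:

```python
import numpy as np
import pandas as pd

t = np.array(['2017-09-05T12:00:00', '2017-09-05T15:00:00'], dtype='datetime64[ns]')
index = pd.DatetimeIndex(t)

raw = t[0].item()          # plain Python int: nanoseconds since the Unix epoch
stamp = pd.Timestamp(raw)  # a bare int is interpreted as nanoseconds

pos_from_scalar = index.get_loc(t[0])   # np.datetime64 scalar as the key
pos_from_stamp = index.get_loc(stamp)   # pd.Timestamp as the key

print(raw, stamp, pos_from_scalar, pos_from_stamp)
```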

@SpacemanPaul

The OpenDataCube is being affected by this also.

shoyer added the bug label Jul 30, 2020
shoyer added a commit to shoyer/xarray that referenced this issue Jul 31, 2020
Fixes pydata#4283

The underlying issue is that calling `.item()` on a NumPy array with
`dtype=datetime64[ns]` returns an _integer_, rather than an `np.datetime64`
scalar. This is somewhat baffling but works this way because `.item()`
returns native Python types, but `datetime.datetime` doesn't support
nanosecond precision.

`pandas.Index.get_loc` used to support these integers, but now is more strict.
Hence we get errors.

We can fix this by using `array[()]` to convert 0d arrays into NumPy scalars
instead of calling `array.item()`.

I've added a crude regression test. There may well be a better way to test this
but I haven't figured it out yet.
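
The difference between `array.item()` and `array[()]` described in the commit message can be sketched with NumPy alone. This is an illustration of the two calls on a 0-d array, not the code from the actual fix; the variable names are assumptions:

```python
import numpy as np

# A 0-d array holding a single nanosecond-precision timestamp
label = np.asarray(np.datetime64('2017-09-05T12:00:00', 'ns'))

via_item = label.item()  # Python int: the nanosecond count, dtype information lost
via_index = label[()]    # np.datetime64 scalar: unit and dtype preserved

print(type(via_item).__name__, via_item)
print(type(via_index).__name__, via_index, via_index.dtype)
```

Indexing with an empty tuple keeps the value as a NumPy scalar, so a consumer such as an index lookup still sees a datetime rather than an integer.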
leifdenby added a commit to leifdenby/lagtraj that referenced this issue Aug 4, 2020
There is an issue with pandas and xarray with respect to datetime indexing
(pydata/xarray#4283); until this is fixed in
xarray we'll pin the pandas version.
Chilipp added a commit to psyplot/psyplot that referenced this issue Aug 27, 2020
shoyer added a commit that referenced this issue Sep 16, 2020
* Fix indexing with datetime64[ns] with pandas=1.1

Fixes #4283

The underlying issue is that calling `.item()` on a NumPy array with
`dtype=datetime64[ns]` returns an _integer_, rather than an `np.datetime64`
scalar. This is somewhat baffling but works this way because `.item()`
returns native Python types, but `datetime.datetime` doesn't support
nanosecond precision.

`pandas.Index.get_loc` used to support these integers, but now is more strict.
Hence we get errors.

We can fix this by using `array[()]` to convert 0d arrays into NumPy scalars
instead of calling `array.item()`.

I've added a crude regression test. There may well be a better way to test this
but I haven't figured it out yet.

* lint fix

* add a test checking the datetime indexer

* use label.item() for non-datetime / timedelta labels

* unpin pandas in the docs

* ignore the future warning about deprecated arguments to pandas.Grouper

* Update xarray/core/indexing.py

Co-authored-by: keewis <keewis@users.noreply.github.com>

* Add whatsnew note

Co-authored-by: Keewis <keewis@posteo.de>
Co-authored-by: Maximilian Roos <5635139+max-sixty@users.noreply.github.com>
Co-authored-by: keewis <keewis@users.noreply.github.com>