Cannot use xarrays own times for indexing #1240

Closed
gerritholl opened this issue Jan 30, 2017 · 9 comments · Fixed by #1998

@gerritholl
Contributor

I need to get the first Δt from the start of my dataset, i.e. ds.sel(time=slice(start_time, start_time + timedelta)). However, because pandas uses M8[ns] and datetime.datetime cannot represent nanosecond precision, the index label gets converted to an int and indexing fails. Inspection tells me that by the time the label reaches pandas it is already an int. This is ultimately due to the numpy behaviour that timedelta64(0, 'ns').item() is an int, but it would be very nice if xarray had a workaround so that we can use indexing such as shown below.

In [282]: time = pd.date_range('2000-01-01', freq='H', periods=365 * 24)

In [283]: ds = xarray.Dataset({'foo': ('time', np.arange(365 * 24)), 'time': time})

In [284]: ds.sel(time=slice(ds["time"][0], ds["time"][10]))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-284-a101e126e1b0> in <module>()
----> 1 ds.sel(time=slice(ds["time"][0], ds["time"][10]))

/dev/shm/gerrit/venv/stable-3.5/lib/python3.5/site-packages/xarray/core/dataset.py in sel(self, method, tolerance, drop, **indexers)
   1180         """
   1181         pos_indexers, new_indexes = indexing.remap_label_indexers(
-> 1182             self, indexers, method=method, tolerance=tolerance
   1183         )
   1184         result = self.isel(drop=drop, **pos_indexers)

/dev/shm/gerrit/venv/stable-3.5/lib/python3.5/site-packages/xarray/core/indexing.py in remap_label_indexers(data_obj, indexers, method, tolerance)
    286         else:
    287             idxr, new_idx = convert_label_indexer(index, label,
--> 288                                                   dim, method, tolerance)
    289             pos_indexers[dim] = idxr
    290             if new_idx is not None:

/dev/shm/gerrit/venv/stable-3.5/lib/python3.5/site-packages/xarray/core/indexing.py in convert_label_indexer(index, label, index_name, method, tolerance)
    183         indexer = index.slice_indexer(_try_get_item(label.start),
    184                                       _try_get_item(label.stop),
--> 185                                       _try_get_item(label.step))
    186         if not isinstance(indexer, slice):
    187             # unlike pandas, in xarray we never want to silently convert a slice

/dev/shm/gerrit/venv/stable-3.5/lib/python3.5/site-packages/pandas/tseries/index.py in slice_indexer(self, start, end, step, kind)
   1496 
   1497         try:
-> 1498             return Index.slice_indexer(self, start, end, step, kind=kind)
   1499         except KeyError:
   1500             # For historical reasons DatetimeIndex by default supports

/dev/shm/gerrit/venv/stable-3.5/lib/python3.5/site-packages/pandas/indexes/base.py in slice_indexer(self, start, end, step, kind)
   2995         """
   2996         start_slice, end_slice = self.slice_locs(start, end, step=step,
-> 2997                                                  kind=kind)
   2998 
   2999         # return a slice

/dev/shm/gerrit/venv/stable-3.5/lib/python3.5/site-packages/pandas/indexes/base.py in slice_locs(self, start, end, step, kind)
   3174         start_slice = None
   3175         if start is not None:
-> 3176             start_slice = self.get_slice_bound(start, 'left', kind)
   3177         if start_slice is None:
   3178             start_slice = 0

/dev/shm/gerrit/venv/stable-3.5/lib/python3.5/site-packages/pandas/indexes/base.py in get_slice_bound(self, label, side, kind)
   3113         # For datetime indices label may be a string that has to be converted
   3114         # to datetime boundary according to its resolution.
-> 3115         label = self._maybe_cast_slice_bound(label, side, kind)
   3116 
   3117         # we need to look up the label

/dev/shm/gerrit/venv/stable-3.5/lib/python3.5/site-packages/pandas/tseries/index.py in _maybe_cast_slice_bound(self, label, side, kind)
   1444 
   1445         if is_float(label) or isinstance(label, time) or is_integer(label):
-> 1446             self._invalid_indexer('slice', label)
   1447 
   1448         if isinstance(label, compat.string_types):

/dev/shm/gerrit/venv/stable-3.5/lib/python3.5/site-packages/pandas/indexes/base.py in _invalid_indexer(self, form, key)
   1282                         "indexers [{key}] of {kind}".format(
   1283                             form=form, klass=type(self), key=key,
-> 1284                             kind=type(key)))
   1285 
   1286     def get_duplicates(self):

TypeError: cannot do slice indexing on <class 'pandas.tseries.index.DatetimeIndex'> with these indexers [946684800000000000] of <class 'int'>
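
The numpy behaviour behind this error can be reproduced in isolation. The following is a minimal sketch using only numpy; the variable names are illustrative and not taken from xarray. It shows that calling .item() on a nanosecond-precision datetime64 yields a plain int (the same value that appears in the TypeError above), whereas a microsecond-precision value converts to datetime.datetime.

```python
import datetime

import numpy as np

# Nanosecond precision cannot be represented by datetime.datetime,
# so numpy falls back to the raw integer count of nanoseconds.
ns_value = np.datetime64("2000-01-01", "ns").item()

# Microsecond precision fits into datetime.datetime, so the conversion works.
us_value = np.datetime64("2000-01-01", "us").item()

# The same fallback applies to timedelta64 with nanosecond precision.
ns_delta = np.timedelta64(0, "ns").item()

print(type(ns_value), ns_value)
print(type(us_value), us_value)
print(type(ns_delta), ns_delta)
```

An int label is what pandas then rejects when asked to slice a DatetimeIndex.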
@shoyer
Member

shoyer commented Jan 30, 2017

Yes, this is annoying. We already have some workarounds for this in xarray -- see _as_array_or_item in xarray/core/variable.py. I would be happy to add a similar workaround to fix slicing.
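
To illustrate what such a workaround could look like, here is a rough sketch. It is not xarray's actual _as_array_or_item code; to_pandas_label is a hypothetical helper written for this example. The idea is to avoid .item() for datetime-like labels and hand pandas a Timestamp or Timedelta instead.

```python
import numpy as np
import pandas as pd


def to_pandas_label(label):
    """Hypothetical helper: turn a 0-d label into a scalar pandas can index with."""
    scalar = np.asarray(label)[()]  # unwrap a 0-d array into a numpy scalar
    if isinstance(scalar, np.datetime64):
        return pd.Timestamp(scalar)
    if isinstance(scalar, np.timedelta64):
        return pd.Timedelta(scalar)
    # Other dtypes: fall back to a plain Python scalar where possible.
    return scalar.item() if isinstance(scalar, np.generic) else scalar


index = pd.date_range("2000-01-01", periods=10)
start = to_pandas_label(np.array("2000-01-03", dtype="M8[ns]"))
stop = to_pandas_label(np.array("2000-01-06", dtype="M8[ns]"))

# pandas accepts Timestamp bounds for label-based slicing (end inclusive).
indexer = index.slice_indexer(start, stop)
selected = index[indexer]
print(selected)
```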

@jfburkhart

I get the same error when trying to do:
depo[depo.sel(time=slice(tmp.time[0],tmp.time[-1]))] = tmp
Where both depo and tmp are DataArrays. Do you have an example solution/workaround?

@ulijh
Contributor

ulijh commented Nov 21, 2017

Hi, this is still the case for version 0.10.0.

arr = xr.DataArray(np.random.rand(10, 3),
                   [('time', pd.date_range('2000-01-01', periods=10)),
                    ('space', ['IA', 'IL', 'IN'])])
arr.loc[arr.time[2]:arr.time[5]]

fails, but doing the same thing on a pandas dataframe works just fine:

dfr = arr.to_dataframe(name='dfr')
dfr.loc[arr.time[2]:arr.time[5]]

I'd really appreciate seeing this work on a DataArray.

@jhamman
Member

jhamman commented Nov 21, 2017

@ulijh - you should be able to use the time index:

In [11]: arr.loc[arr.indexes['time'][2]:arr.indexes['time'][5]]
Out[11]:
<xarray.DataArray (time: 4, space: 3)>
array([[ 0.422991,  0.513276,  0.762432],
       [ 0.111123,  0.371109,  0.697921],
       [ 0.029415,  0.215116,  0.451697],
       [ 0.670181,  0.855551,  0.319134]])
Coordinates:
  * time     (time) datetime64[ns] 2000-01-03 2000-01-04 2000-01-05 2000-01-06
  * space    (space) <U2 'IA' 'IL' 'IN'
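
The same approach can be applied to a Dataset like the one in the original report. This is only a sketch under a few assumptions: a daily frequency is used instead of the hourly one from the report to keep the example small, and the variable names time_index and first_eleven are made up for illustration. Taking the labels from ds.indexes gives pandas Timestamp objects rather than 0-d DataArrays, which sidesteps the int conversion.

```python
import numpy as np
import pandas as pd
import xarray as xr

time = pd.date_range("2000-01-01", periods=48, freq="D")
ds = xr.Dataset({"foo": ("time", np.arange(48)), "time": time})

# ds.indexes["time"] is a pandas DatetimeIndex, so its elements are
# pandas Timestamps that can be passed straight to label-based slicing.
time_index = ds.indexes["time"]
first_eleven = ds.sel(time=slice(time_index[0], time_index[10]))
print(first_eleven)
```

Label-based slices include both endpoints, so the selection above covers eleven time steps.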

@ulijh
Contributor

ulijh commented Nov 21, 2017

@jhamman - thanks, this should be useful...

@shoyer shoyer added the bug label Nov 21, 2017
@shoyer
Member

shoyer commented Nov 21, 2017

Sorry for letting this one linger for so long... I added the "bug" tag so we don't forget about it for the next release.

@gerritholl
Contributor Author

This was closed and was solved for slicing, but not for element indexing:

import xarray as xr
import numpy as np
da = xr.DataArray([0, 1], dims=("time",), coords={"time": np.array([0, 1], dtype="M8[s]")})
da.sel(time=da.coords["time"][0])

results in

Traceback (most recent call last):
  File "mwe83.py", line 4, in <module>
    da.sel(time=da.coords["time"][0])
  File "/data/gholl/miniconda3/envs/py38/lib/python3.8/site-packages/xarray/core/dataarray.py", line 1142, in sel
    ds = self._to_temp_dataset().sel(
  File "/data/gholl/miniconda3/envs/py38/lib/python3.8/site-packages/xarray/core/dataset.py", line 2096, in sel
    pos_indexers, new_indexes = remap_label_indexers(
  File "/data/gholl/miniconda3/envs/py38/lib/python3.8/site-packages/xarray/core/coordinates.py", line 395, in remap_label_indexers
    pos_indexers, new_indexes = indexing.remap_label_indexers(
  File "/data/gholl/miniconda3/envs/py38/lib/python3.8/site-packages/xarray/core/indexing.py", line 270, in remap_label_indexers
    idxr, new_idx = convert_label_indexer(index, label, dim, method, tolerance)
  File "/data/gholl/miniconda3/envs/py38/lib/python3.8/site-packages/xarray/core/indexing.py", line 189, in convert_label_indexer
    indexer = index.get_loc(
  File "/data/gholl/miniconda3/envs/py38/lib/python3.8/site-packages/pandas/core/indexes/datetimes.py", line 622, in get_loc
    raise KeyError(key)
KeyError: 0

using xarray 0.15.2.dev64+g2542a63f (latest master). I think it would be desirable for it to work in both cases. Should we reopen this issue or should I open a new one?
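
Until element indexing is handled, one possible way around the KeyError is to take the label from the underlying pandas index, in the same spirit as the slicing workaround suggested earlier in this thread. This is a sketch, not a confirmed fix; first_label and picked are names chosen for this example.

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    [0, 1],
    dims=("time",),
    coords={"time": np.array([0, 1], dtype="M8[s]")},
)

# Elements of da.indexes["time"] are pandas Timestamps, which pandas
# can look up directly, unlike a 0-d DataArray converted via .item().
first_label = da.indexes["time"][0]
picked = da.sel(time=first_label)
print(picked)
```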

@keewis
Collaborator

keewis commented Aug 28, 2020

What's your pandas version? If it's pandas>=1.1.0, we already have #4283. Otherwise you might want to update your repository: we're at 13caf96 right now (the latest release is 0.16.0).

@gerritholl
Contributor Author

gerritholl commented Aug 28, 2020

I fixed my conda environment now (something was wrong as I appeared to have two xarray installations in parallel). I still get the KeyError with latest xarray master and latest pandas master:

$ conda list | egrep -w '(pandas|xarray)'
pandas                    1.2.0.dev0+167.g1f35b0621          pypi_0    pypi
xarray                    0.16.1.dev65+g13caf96e          pypi_0    pypi
$ python mwe83.py
Traceback (most recent call last):
  File "mwe83.py", line 5, in <module>
    da.sel(time=da.coords["time"][0])
  File "/data/gholl/miniconda3/envs/py38/lib/python3.8/site-packages/xarray/core/dataarray.py", line 1142, in sel
    ds = self._to_temp_dataset().sel(
  File "/data/gholl/miniconda3/envs/py38/lib/python3.8/site-packages/xarray/core/dataset.py", line 2096, in sel
    pos_indexers, new_indexes = remap_label_indexers(
  File "/data/gholl/miniconda3/envs/py38/lib/python3.8/site-packages/xarray/core/coordinates.py", line 395, in remap_label_indexers
    pos_indexers, new_indexes = indexing.remap_label_indexers(
  File "/data/gholl/miniconda3/envs/py38/lib/python3.8/site-packages/xarray/core/indexing.py", line 270, in remap_label_indexers
    idxr, new_idx = convert_label_indexer(index, label, dim, method, tolerance)
  File "/data/gholl/miniconda3/envs/py38/lib/python3.8/site-packages/xarray/core/indexing.py", line 189, in convert_label_indexer
    indexer = index.get_loc(
  File "/data/gholl/miniconda3/envs/py38/lib/python3.8/site-packages/pandas/core/indexes/datetimes.py", line 622, in get_loc
    raise KeyError(key)
KeyError: 0
$ cat mwe83.py 
import xarray as xr
import numpy as np
da = xr.DataArray([0, 1], dims=("time",), coords={"time": np.array([0, 1], dtype="M8[s]")})
da.sel(time=slice(da.coords["time"][0], da.coords["time"][1]))
da.sel(time=da.coords["time"][0])

Oops, by "already have" you meant it's already been reported, I thought you meant it had already been fixed. All clear then.
