Indexing a RangeIndexed DataArray with a RangeIndex returns a deprecated Int64Index #6256

hrzn · 2022-02-09T09:55:06Z

What happened?

First, apologies if this is not actually a bug. I'm not sure what the intended behaviour should be, but I find the current one counter-intuitive.

When indexing a DataArray that is indexed using a RangeIndex, the resulting index is an Int64Index:

>>> my_da.get_index('time')
RangeIndex(start=0, stop=100, step=1, name='time')

>>> a = my_da.sel({'time': pd.RangeIndex(0, 2)})
>>> a.get_index('time')
Int64Index([0, 1], dtype='int64', name='time')

Setting the index back to the desired RangeIndex with assign_coords() works. Still, I find it problematic that sel() returns an Int64Index even when it is given a RangeIndex, particularly since Int64Index was deprecated in pandas 1.4.
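For readers who hit the same behaviour, here is a minimal sketch of the assign_coords() workaround mentioned above. The variable names (wanted, selected, fixed) are only illustrative, and the index types printed at the end depend on the installed xarray and pandas versions, so no particular output is claimed here.

```python
import numpy as np
import pandas as pd
import xarray as xr

my_da = xr.DataArray(
    np.random.rand(100),
    dims=("time",),
    coords={"time": pd.RangeIndex(0, 100)},
)

wanted = pd.RangeIndex(0, 2)

# Label-based selection; in the versions reported here this yields an Int64Index.
selected = my_da.sel(time=wanted)

# Workaround from the report: put the desired RangeIndex back explicitly.
fixed = selected.assign_coords(time=wanted)

# Inspect the index types before and after the workaround.
print(type(selected.get_index("time")).__name__)
print(type(fixed.get_index("time")).__name__)
```

The data values are untouched by assign_coords(); only the coordinate attached to the time dimension is replaced.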

What did you expect to happen?

I would have expected the resulting DataArray to be indexed with the same RangeIndex used in sel().

Minimal Complete Verifiable Example

import xarray as xr
import numpy as np
import pandas as pd

my_da = xr.DataArray(np.random.rand(100),
                     dims=('time',),  # trailing comma makes this a real 1-tuple
                     coords={'time': pd.RangeIndex(0, 100)})

print(my_da.get_index('time'))
a = my_da.sel({'time': pd.RangeIndex(0,2)})
print(a.get_index('time'))

Relevant log output

RangeIndex(start=0, stop=100, step=1, name='time')
Int64Index([0, 1], dtype='int64', name='time')

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS

commit: None
python: 3.8.5 (default, Sep 4 2020, 02:22:02)
[Clang 10.0.0 ]
python-bits: 64
OS: Darwin
OS-release: 20.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
LOCALE: (None, 'UTF-8')
libhdf5: None
libnetcdf: None

xarray: 0.20.2
pandas: 1.4.0
numpy: 1.22.1
scipy: 1.7.3
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.5.1
cartopy: None
seaborn: None
numbagg: None
fsspec: 2021.11.1
cupy: None
pint: None
sparse: None
setuptools: 59.5.0
pip: 21.3.1
conda: None
pytest: 6.2.5
IPython: 8.0.1
sphinx: 4.3.2

mathause · 2022-02-21T14:56:19Z

Thanks for the report. I guess RangeIndex was never very thoroughly tested. This may or may not change with #5692 (which will hopefully be merged in the near future), so I suggest waiting for that.

benbovy · 2022-02-21T21:28:56Z

This is still the same behavior with #5692.

We would need to handle pd.RangeIndex (and perhaps range?) label indexers similarly to slice label indexers, i.e., use pd.Index.slice_indexer internally to return integer indexers as slices (*).

b = my_da.sel(time=slice(0, 2))
b.get_index('time')
# RangeIndex(start=0, stop=3, step=1, name='time')
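The idea of treating a pd.RangeIndex label indexer like a slice can be sketched with pandas alone. This is only an illustration, not xarray's implementation: range_labels_to_slice is a hypothetical helper, and it assumes a non-empty or empty indexer with a positive step, applied to a target index whose own step is 1 (so that the label step equals the positional step).

```python
import pandas as pd


def range_labels_to_slice(index: pd.Index, labels: pd.RangeIndex) -> slice:
    """Hypothetical helper: turn RangeIndex labels into a positional slice."""
    if len(labels) == 0:
        return slice(0, 0)
    # slice_indexer is label-based and includes the end label, so pass the
    # last label actually contained in `labels` rather than labels.stop.
    return index.slice_indexer(labels[0], labels[-1], labels.step)


idx = pd.RangeIndex(0, 100, name="time")
positional = range_labels_to_slice(idx, pd.RangeIndex(0, 2))

# Indexing a RangeIndex with a slice keeps it a RangeIndex.
result = idx[positional]
print(positional)
print(result)
```

Passing labels[-1] instead of labels.stop is what reconciles the exclusive upper bound of RangeIndex with the inclusive upper bound of label slices.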

Otherwise, label indexers get internally converted to arrays. Note that the conversion to an Int64Index is done in pandas (nothing specific is done on the Xarray side), so I expect that this will eventually be addressed in pandas. This conversion may not be too problematic if we consider it an implementation detail (although I might be missing some important aspect).

idx = pd.RangeIndex(0, 100)

idx[slice(0, 3)]
# RangeIndex(start=0, stop=3, step=1)

idx[[0, 1, 2]]
# Int64Index([0, 1, 2], dtype='int64')

(*) One major difference is that in Xarray slice label indexers are upper-bound inclusive, while pd.RangeIndex and range are not!
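The difference in upper-bound handling can be seen with pandas alone. The sketch below uses pd.Index.slice_indexer to stand in for a label slice; the variable names are illustrative.

```python
import pandas as pd

idx = pd.RangeIndex(0, 100)

# Label-based slice from 0 to 2: the end label 2 is included.
label_slice = idx[idx.slice_indexer(0, 2)]

# RangeIndex(0, 2) follows range() semantics: the stop value 2 is excluded.
range_labels = pd.RangeIndex(0, 2)

print(list(label_slice))   # [0, 1, 2]
print(list(range_labels))  # [0, 1]
```

So the same pair of bounds (0, 2) selects three elements as a label slice but only two as a RangeIndex, which any slice-based handling of RangeIndex indexers would need to account for.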

hrzn added the bug and needs triage labels on Feb 9, 2022

hrzn mentioned this issue Feb 9, 2022

Fix/int index warnings unit8co/darts#777

Merged

mathause added the topic-indexing and topic-internals labels and removed the needs triage label on Feb 21, 2022
