RecursionError when selecting single column from IntervalIndex #26490

sbitzer · 2019-05-22T14:11:24Z

Code Sample, a copy-pastable example if possible

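import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.ones((3, 4)),
    columns=pd.IntervalIndex.from_breaks(np.arange(5)),
)
df[0.5]         # raises RecursionError on pandas 0.24.2
df.loc[:, 0.5]  # raises RecursionError as well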

Problem description

Instead of returning the selected column, both df[0.5] and df.loc[:, 0.5] raise RecursionError.

The issue is in frame.__getitem__, on the path taken when key in self.columns evaluates to False. The code correctly identifies the desired column by integer position using self.columns.get_loc(key), but then goes on to call

data = self._take(indexer, axis=1)
data = data[key]

Because self._take returns a DataFrame, data[key] enters frame.__getitem__ again with the same scalar key, and the recursion never terminates.
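The loop described above can be reproduced without pandas. The following is only a toy model of the control flow, not the pandas source: ToyFrame, take and the tuple-based interval labels are hypothetical names made up for illustration. It shows why a key that fails the membership test but succeeds in get_loc keeps re-entering __getitem__.

```python
class ToyFrame:
    """Toy stand-in for a DataFrame whose column labels are (left, right] intervals."""

    def __init__(self, columns, data):
        self.columns = list(columns)  # list of (left, right) tuples
        self.data = list(data)        # one list of values per column

    def get_loc(self, key):
        # A scalar "matches" the interval that contains it.
        for i, (left, right) in enumerate(self.columns):
            if left < key <= right:
                return i
        raise KeyError(key)

    def take(self, i):
        # Like the _take call in the report: returns another frame, not a column.
        return ToyFrame([self.columns[i]], [self.data[i]])

    def __getitem__(self, key):
        if key in self.columns:  # True only for an exact interval label
            return self.data[self.columns.index(key)]
        data = self.take(self.get_loc(key))
        return data[key]  # same scalar key, so __getitem__ is entered again


frame = ToyFrame([(0, 1), (1, 2)], [[1.0, 1.0], [1.0, 1.0]])
exact = frame[(0, 1)]  # exact label: membership test succeeds, no recursion

try:
    frame[0.5]
    outcome = "returned"
except RecursionError:
    outcome = "RecursionError"
```

Looking up the exact label (0, 1) returns the column, while the scalar 0.5 never satisfies the membership test on the one-column frame either, so each call creates a new frame and recurses until Python's recursion limit is hit.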

Expected Output

The selected column as Series.

Output of `pd.show_versions()`

commit: None
python: 3.6.8.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 42 Stepping 7, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en
LOCALE: None.None

pandas: 0.24.2
pytest: 4.3.1
pip: 19.0.3
setuptools: 40.8.0
Cython: 0.29.7
numpy: 1.16.2
scipy: 1.2.1
pyarrow: None
xarray: None
IPython: 7.4.0
sphinx: 1.8.5
patsy: 0.5.1
dateutil: 2.8.0
pytz: 2018.9
blosc: None
bottleneck: None
tables: 3.5.1
numexpr: 2.6.9
feather: None
matplotlib: 3.0.3
openpyxl: 2.6.1
xlrd: 1.2.0
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: 1.3.1
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None


jschendel · 2019-05-22T15:15:56Z

Thanks, I can confirm this issue on master. Using an IntervalIndex as columns isn't well tested (xref #17130), so any bug reports or fixes are appreciated.

sbitzer · 2019-05-24T15:11:08Z

This worked in pandas 0.23.4. My impression is that we can at least get back to the old behaviour without breaking anything by changing

pandas/pandas/core/frame.py

Line 2840 in ae40904

if self.columns.is_unique and key in self.columns:

to

if self.columns.is_unique and self.columns.contains(key):

but I'm unsure whether I am missing side effects of this change for other index types. This change will prevent the RecursionError, because self.columns.contains(scalar) evaluates to True while scalar in self.columns evaluates to False for IntervalIndex.
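The mismatch between the membership test and the positional lookup can be checked directly. This is a minimal sketch, assuming a pandas version in which the in operator on an IntervalIndex only matches Interval objects, as described in this thread. It deliberately does not call the contains method, since its behaviour differs between pandas versions.

```python
import numpy as np
import pandas as pd

# Same columns as in the report: (0, 1], (1, 2], (2, 3], (3, 4]
columns = pd.IntervalIndex.from_breaks(np.arange(5))

scalar_is_member = 0.5 in columns                  # a point is not a label
interval_is_member = pd.Interval(0, 1) in columns  # an exact label is
position = columns.get_loc(0.5)                    # the point still resolves to a column
```

So get_loc finds the column that contains the point even though the membership test rejects it, which is the combination that sends __getitem__ down the recursive path.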

jschendel · 2019-05-24T21:25:58Z

Using the contains method is probably not the right choice for a few reasons. The is_unique condition isn't actually sufficient for scalar lookups in an IntervalIndex, as you can get duplicates from a unique but overlapping IntervalIndex, which would cause this codepath to fail.
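A small example of a unique but overlapping IntervalIndex, as a sketch of the point above. The containment mask is computed by hand here (assuming the default closed="right") rather than taken from any pandas internals, so that the count of matching columns does not depend on what get_loc returns in a given version.

```python
import pandas as pd

# Two distinct intervals that overlap on (1, 2]; closed="right" is the default.
columns = pd.IntervalIndex.from_tuples([(0, 2), (1, 3)])

point = 1.5
# Hand-written containment check for right-closed intervals.
matches = (columns.left < point) & (point <= columns.right)
n_matches = int(matches.sum())
```

The index reports is_unique as True because the two labels differ, yet the single point 1.5 falls inside both intervals, so a scalar lookup selects two columns rather than one.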

Additionally, IIRC there were talks of deprecating the contains method for indexes, as I believe it's identical to __contains__ for every index other than IntervalIndex, and instead make IntervalIndex.contains operate elementwise.

A simple fix looks to be converting data[key] --> data.squeeze() here:

pandas/pandas/core/frame.py

Lines 2889 to 2890 in b730ab3

if data.shape[1] == 1 and not isinstance(self.columns, MultiIndex):
    data = data[key]
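To illustrate what squeeze does on a one-column frame, here is a short sketch. It only demonstrates documented DataFrame.squeeze behaviour and is not the change that was eventually merged. The variable names are made up for the example.

```python
import numpy as np
import pandas as pd

one_interval = pd.IntervalIndex.from_breaks([0, 1])  # single column (0, 1]

# Three rows, one column: squeeze drops the column axis and yields a Series.
one_col = pd.DataFrame(np.ones((3, 1)), columns=one_interval)
as_series = one_col.squeeze()

# One row, one column: squeeze with no axis drops both axes and yields a scalar.
single_cell = pd.DataFrame(np.ones((1, 1)), columns=one_interval)
as_scalar = single_cell.squeeze()

# Restricting squeeze to the column axis keeps the result a Series.
still_series = single_cell.squeeze(axis=1)
```

Because squeeze selects positionally, it never re-enters the label-based lookup. One caveat worth noting is the single-row case: without axis=1 the result collapses to a scalar rather than a one-element Series.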


sbitzer changed the title ~~RecursionError when selecting single column of IntervalIndex~~ RecursionError when selecting single column from IntervalIndex May 22, 2019

jschendel added Indexing Related to indexing on series/frames, not to indexes themselves Interval Interval data type labels May 22, 2019

jschendel added this to the Contributions Welcome milestone May 22, 2019

jschendel mentioned this issue May 30, 2019

BUG: Fix RecursionError when using a scalar point to select IntervalIndex columns #26570

Closed

4 tasks

jreback modified the milestones: Contributions Welcome, 0.25.0 May 30, 2019

jorisvandenbossche modified the milestones: 0.25.0, Contributions Welcome Jun 30, 2019

mroeschke added the Bug label Apr 27, 2020

jbrockmendel added a commit to jbrockmendel/pandas that referenced this issue Oct 15, 2020

BUG: DataFrame.__getitem__(number) with IntervalIndex columns pandas-…

388ec49

…dev#26490

jbrockmendel mentioned this issue Oct 15, 2020

BUG: indexing bugs #26490, #13691 #37150

Closed

6 tasks

jreback modified the milestones: Contributions Welcome, 1.2 Oct 16, 2020

jbrockmendel mentioned this issue Oct 16, 2020

BUG: RecursionError when selecting single column from IntervalIndex #26490 #37152

Merged

5 tasks

jbrockmendel closed this as completed in #37152 Oct 17, 2020

jbrockmendel added a commit that referenced this issue Oct 17, 2020

BUG: RecursionError when selecting single column from IntervalIndex #…

dcba608

…26490 (#37152) * split off of 37150 * Troubleshoot Co-authored-by: Jeff Reback <jeff@reback.net>

JulianWgs pushed a commit to JulianWgs/pandas that referenced this issue Oct 26, 2020

BUG: RecursionError when selecting single column from IntervalIndex p…

f22e55d

…andas-dev#26490 (pandas-dev#37152) * split off of 37150 * Troubleshoot Co-authored-by: Jeff Reback <jeff@reback.net>

kesmit13 pushed a commit to kesmit13/pandas that referenced this issue Nov 2, 2020

BUG: RecursionError when selecting single column from IntervalIndex p…

a809286

…andas-dev#26490 (pandas-dev#37152) * split off of 37150 * Troubleshoot Co-authored-by: Jeff Reback <jeff@reback.net>
