RecursionError when selecting single column from IntervalIndex #26490

Closed
sbitzer opened this issue May 22, 2019 · 3 comments · Fixed by #37152
Labels
Bug; Indexing (Related to indexing on series/frames, not to indexes themselves); Interval (Interval data type)
Milestone

Comments

@sbitzer

sbitzer commented May 22, 2019

Code Sample, a copy-pastable example if possible

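import numpy as np
import pandas as pd

# 3x4 frame whose columns are the intervals (0, 1], (1, 2], (2, 3], (3, 4]
df = pd.DataFrame(
    np.ones((3, 4)),
    columns=pd.IntervalIndex.from_breaks(np.arange(5)),
)
df[0.5]         # raises RecursionError on pandas 0.24.2
df.loc[:, 0.5]  # raises RecursionError on pandas 0.24.2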

Problem description

Instead of returning the selected column, both df[0.5] and df.loc[:, 0.5] raise RecursionError.

The issue is in frame.__getitem__, on the path taken when key in self.columns is False. The code correctly identifies the desired column by integer position using self.columns.get_loc(key), but then goes on to call

data = self._take(indexer, axis=1)
data = data[key]

Because self._take returns a DataFrame, data[key] enters frame.__getitem__ again with the same key, and the call recurses without end.
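The recursion described above can be reproduced without pandas. The following is a minimal sketch, not pandas code: ToyIntervalColumns and ToyFrame are hypothetical stand-ins that only mimic the two behaviours named in the report (membership fails for a scalar, get_loc succeeds for it).

```python
# Toy model only: ToyIntervalColumns and ToyFrame are hypothetical names,
# not pandas classes. Intervals are (left, right] pairs stored as tuples.


class ToyIntervalColumns:
    def __init__(self, intervals):
        self.intervals = list(intervals)

    def __contains__(self, key):
        # Membership matches an exact interval label, never a scalar inside it.
        return key in self.intervals

    def get_loc(self, key):
        # Positional lookup also accepts a scalar that falls inside an interval.
        for pos, (left, right) in enumerate(self.intervals):
            if key == (left, right):
                return pos
            if not isinstance(key, tuple) and left < key <= right:
                return pos
        raise KeyError(key)


class ToyFrame:
    def __init__(self, columns, data):
        self.columns = columns
        self.data = data  # one list of values per column

    def take(self, pos):
        # Stand-in for the _take call: the result is still a frame.
        cols = ToyIntervalColumns([self.columns.intervals[pos]])
        return ToyFrame(cols, [self.data[pos]])

    def __getitem__(self, key):
        if key in self.columns:  # exact label: returns the column values
            return self.data[self.columns.get_loc(key)]
        pos = self.columns.get_loc(key)  # scalar inside an interval
        narrowed = self.take(pos)        # one-column frame, same kind of labels
        return narrowed[key]             # takes the same branch again, forever


frame = ToyFrame(ToyIntervalColumns([(0, 1), (1, 2)]), [[1.0, 1.0], [2.0, 2.0]])
try:
    frame[0.5]
except RecursionError:
    print("scalar lookup recursed until the interpreter gave up")
```

Narrowing to one column does not change the fact that the scalar is not a member of the column labels, so every nested call repeats the same steps.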

Expected Output

The selected column as a Series.

Output of pd.show_versions()

commit: None
python: 3.6.8.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 42 Stepping 7, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en
LOCALE: None.None

pandas: 0.24.2
pytest: 4.3.1
pip: 19.0.3
setuptools: 40.8.0
Cython: 0.29.7
numpy: 1.16.2
scipy: 1.2.1
pyarrow: None
xarray: None
IPython: 7.4.0
sphinx: 1.8.5
patsy: 0.5.1
dateutil: 2.8.0
pytz: 2018.9
blosc: None
bottleneck: None
tables: 3.5.1
numexpr: 2.6.9
feather: None
matplotlib: 3.0.3
openpyxl: 2.6.1
xlrd: 1.2.0
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: 1.3.1
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

@sbitzer sbitzer changed the title from "RecursionError when selecting single column of IntervalIndex" to "RecursionError when selecting single column from IntervalIndex" May 22, 2019
@jschendel
Member

jschendel commented May 22, 2019

Thanks, I can confirm this issue on master. Using an IntervalIndex as columns isn't well tested (xref #17130), so any bug reports or fixes are appreciated.

@jschendel jschendel added the Indexing (Related to indexing on series/frames, not to indexes themselves) and Interval (Interval data type) labels May 22, 2019
@jschendel jschendel added this to the Contributions Welcome milestone May 22, 2019
@sbitzer
Author

sbitzer commented May 24, 2019

This worked in pandas 0.23.4. My impression is that we can at least get back to the old behaviour without breaking anything by changing

if self.columns.is_unique and key in self.columns:

to

if self.columns.is_unique and self.columns.contains(key):

but I'm unsure whether I am missing side effects of this change for other index types. This change will prevent the RecursionError, because self.columns.contains(scalar) evaluates to True while scalar in self.columns evaluates to False for IntervalIndex.
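The gap between membership and positional lookup that this suggestion relies on can be checked directly. The sketch below assumes a reasonably recent pandas and uses get_loc rather than contains, since contains on an IntervalIndex is elementwise in recent releases and no longer returns a single boolean. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Columns from the report: (0, 1], (1, 2], (2, 3], (3, 4]
cols = pd.IntervalIndex.from_breaks(np.arange(5))

scalar_is_member = 0.5 in cols                 # membership test on the scalar
interval_is_member = pd.Interval(0, 1) in cols  # membership test on an exact label
position = cols.get_loc(0.5)                   # position of the interval covering 0.5

print(scalar_is_member, interval_is_member, position)
```

A scalar that lies inside an interval is locatable but is not a member, which is why the `key in self.columns` check sends the lookup down the fallback path.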

@jschendel
Member

Using the contains method is probably not the right choice, for a few reasons. The is_unique condition isn't actually sufficient for scalar lookups in an IntervalIndex: a scalar can match more than one interval in a unique but overlapping IntervalIndex, which would cause this codepath to fail.

Additionally, IIRC there was talk of deprecating the contains method for indexes, as I believe it's identical to __contains__ for every index other than IntervalIndex, and of making IntervalIndex.contains operate elementwise instead.
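Both points can be illustrated with a small overlapping index. This is a sketch under the assumption that the installed pandas has the elementwise IntervalIndex.contains (the behaviour discussed above as a proposal, which recent releases provide). The variable names are illustrative.

```python
import pandas as pd

# Two distinct intervals, (0, 2] and (1, 3], that overlap on (1, 2]
overlapping = pd.IntervalIndex.from_tuples([(0, 2), (1, 3)])

unique = overlapping.is_unique         # no interval label is repeated
overlaps = overlapping.is_overlapping  # but some intervals share points

# Elementwise check: one boolean per interval
hits = overlapping.contains(1.5)
n_hits = int(hits.sum())

print(unique, overlaps, n_hits)
```

Because the scalar 1.5 falls in both intervals, a lookup for it selects two columns even though is_unique holds, so is_unique alone cannot guarantee a single-column result.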

A simple fix looks to be replacing data[key] with data.squeeze() here:

pandas/pandas/core/frame.py

Lines 2889 to 2890 in b730ab3

if data.shape[1] == 1 and not isinstance(self.columns, MultiIndex):
    data = data[key]
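The effect of the squeeze suggestion can be sketched with public API only. This illustrates the idea and is not necessarily what the merged fix (#37152) does. It uses iloc in place of the internal _take call, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.ones((3, 4)),
    columns=pd.IntervalIndex.from_breaks(np.arange(5)),
)

pos = df.columns.get_loc(0.5)  # position of the interval covering 0.5
one_col = df.iloc[:, [pos]]    # still a DataFrame, with a single column

# Collapse the column axis instead of indexing by the scalar again
col = one_col.squeeze(axis=1)

print(type(col).__name__, col.name)
```

Passing axis=1 is a deliberate choice in this sketch: a bare squeeze() would also collapse the row axis when the frame has exactly one row, returning a scalar rather than a Series.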

@jreback jreback modified the milestones: Contributions Welcome, 0.25.0 May 30, 2019
@jorisvandenbossche jorisvandenbossche modified the milestones: 0.25.0, Contributions Welcome Jun 30, 2019
@mroeschke mroeschke added the Bug label Apr 27, 2020
@jreback jreback modified the milestones: Contributions Welcome, 1.2 Oct 16, 2020
jbrockmendel added a commit that referenced this issue Oct 17, 2020
…26490 (#37152)

* split off of 37150

* Troubleshoot

Co-authored-by: Jeff Reback <jeff@reback.net>
JulianWgs pushed a commit to JulianWgs/pandas that referenced this issue Oct 26, 2020
…andas-dev#26490 (pandas-dev#37152)

* split off of 37150

* Troubleshoot

Co-authored-by: Jeff Reback <jeff@reback.net>
kesmit13 pushed a commit to kesmit13/pandas that referenced this issue Nov 2, 2020
…andas-dev#26490 (pandas-dev#37152)

* split off of 37150

* Troubleshoot

Co-authored-by: Jeff Reback <jeff@reback.net>