When .loc returns IndexError rather than KeyError #12527

Closed
itcarroll opened this issue Mar 4, 2016 · 2 comments
Labels: Bug, Indexing (related to indexing on series/frames, not to indexes themselves)

Comments

@itcarroll

I have a MultiIndexed DataFrame that raises a KeyError for one integer and an IndexError for a different integer, even though neither integer is in the first level of the index. This only happens when accessing a scalar value; a slice always raises a KeyError. The behavior does not occur on a cut-down version of the DataFrame (the full one is over 200 MB when pickled), otherwise I would attach a working example. I can send the file if needed, though.

>>> isinstance(n, int)
True
>>> df.loc[(n, 0), 'dest']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/site-packages/pandas/core/indexing.py", line 1196, in __getitem__
    return self._getitem_tuple(key)
  File "/usr/local/lib/python3.5/site-packages/pandas/core/indexing.py", line 709, in _getitem_tuple
    return self._getitem_lowerdim(tup)
  File "/usr/local/lib/python3.5/site-packages/pandas/core/indexing.py", line 817, in _getitem_lowerdim
    return self._getitem_nested_tuple(tup)
  File "/usr/local/lib/python3.5/site-packages/pandas/core/indexing.py", line 889, in _getitem_nested_tuple
    obj = getattr(obj, self.name)._getitem_axis(key, axis=axis)
  File "/usr/local/lib/python3.5/site-packages/pandas/core/indexing.py", line 1343, in _getitem_axis
    return self._get_label(key, axis=axis)
  File "/usr/local/lib/python3.5/site-packages/pandas/core/indexing.py", line 86, in _get_label
    return self.obj._xs(label, axis=axis)
  File "/usr/local/lib/python3.5/site-packages/pandas/core/generic.py", line 1483, in xs
    drop_level=drop_level)
  File "/usr/local/lib/python3.5/site-packages/pandas/core/index.py", line 5432, in get_loc_level
    return (self._engine.get_loc(_values_from_object(key)),
  File "pandas/index.pyx", line 137, in pandas.index.IndexEngine.get_loc (pandas/index.c:3979)
  File "pandas/index.pyx", line 146, in pandas.index.IndexEngine.get_loc (pandas/index.c:3693)
  File "pandas/src/util.pxd", line 41, in util.get_value_at (pandas/index.c:13199)
IndexError: index out of bounds

Expected Output

>>> df.loc[(m, 0), 'dest']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/site-packages/pandas/core/indexing.py", line 1196, in __getitem__
    return self._getitem_tuple(key)
  File "/usr/local/lib/python3.5/site-packages/pandas/core/indexing.py", line 709, in _getitem_tuple
    return self._getitem_lowerdim(tup)
  File "/usr/local/lib/python3.5/site-packages/pandas/core/indexing.py", line 817, in _getitem_lowerdim
    return self._getitem_nested_tuple(tup)
  File "/usr/local/lib/python3.5/site-packages/pandas/core/indexing.py", line 889, in _getitem_nested_tuple
    obj = getattr(obj, self.name)._getitem_axis(key, axis=axis)
  File "/usr/local/lib/python3.5/site-packages/pandas/core/indexing.py", line 1343, in _getitem_axis
    return self._get_label(key, axis=axis)
  File "/usr/local/lib/python3.5/site-packages/pandas/core/indexing.py", line 86, in _get_label
    return self.obj._xs(label, axis=axis)
  File "/usr/local/lib/python3.5/site-packages/pandas/core/generic.py", line 1483, in xs
    drop_level=drop_level)
  File "/usr/local/lib/python3.5/site-packages/pandas/core/index.py", line 5432, in get_loc_level
    return (self._engine.get_loc(_values_from_object(key)),
  File "pandas/index.pyx", line 137, in pandas.index.IndexEngine.get_loc (pandas/index.c:3979)
  File "pandas/index.pyx", line 147, in pandas.index.IndexEngine.get_loc (pandas/index.c:3719)
KeyError: (300067502, 0)

The expected KeyError is raised for nearly every integer I have tried that is not in the first level of the index. Why would a few particular integers raise an IndexError instead?
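One plausible mechanism can be shown with a simplified pure-Python model. This is an assumption, not pandas' actual code, and the name `sorted_get_loc` is hypothetical. For a large sorted index, the lookup binary-searches for the key and then reads the entry at the returned position without a bounds check. A missing key that falls between existing entries lands on a different entry, so the equality test fails and KeyError is raised. A key larger than every entry lands one past the end, so the read itself raises IndexError. That would match the tracebacks above: the IndexError comes from `util.get_value_at` (index.pyx line 146), while the KeyError is raised one line later (line 147). If this is what happens, the "special" integers would be those larger than every value in the first level.

```python
from bisect import bisect_left


def sorted_get_loc(values, key):
    """Hypothetical model of a binary-search lookup that lacks a bounds check."""
    loc = bisect_left(values, key)  # position where key would be inserted
    # If key is greater than every entry, loc == len(values) and this
    # indexing raises IndexError before the equality test can run.
    if values[loc] != key:
        raise KeyError(key)
    return loc


index = [(1, 0), (1, 1), (2, 0), (2, 1)]  # sorted, like a product MultiIndex
```

With this model, `sorted_get_loc(index, (1, 1))` returns 1, `sorted_get_loc(index, (0, 0))` raises KeyError, and `sorted_get_loc(index, (3, 0))` raises IndexError.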

output of pd.show_versions()

commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Darwin
OS-release: 15.3.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.17.0
nose: 1.3.7
pip: 8.0.2
setuptools: 19.4
Cython: 0.23.4
numpy: 1.10.4
scipy: 0.16.1
statsmodels: None
IPython: 4.0.0
sphinx: None
patsy: 0.4.0
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: None
tables: None
numexpr: 2.4.6
matplotlib: 1.5.0
openpyxl: None
xlrd: 0.9.4
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.11
pymysql: None
psycopg2: None

itcarroll changed the title from "When .at returns IndexError rather than KeyError" to "When .loc returns IndexError rather than KeyError" on Mar 4, 2016
TomAugspurger added the Bug and Indexing labels on Mar 4, 2016
TomAugspurger added this to the 0.18.1 milestone on Mar 4, 2016
@TomAugspurger
Contributor

@itcarroll thanks for the report! It helps to include fully reproducible examples. Sometimes something unique about the file is causing the bug, but more often it isn't. Fortunately, this time it looks like only the length of the DataFrame matters:

import pandas as pd

# KeyError: 2 x 499,999 = 999,998 rows
df = pd.DataFrame(1, index=pd.MultiIndex.from_product([[1, 2], range(499999)]), columns=['dest'])
df.loc[(3, 0), 'dest']

# IndexError: 2 x 500,000 = 1,000,000 rows
df = pd.DataFrame(1, index=pd.MultiIndex.from_product([[1, 2], range(500000)]), columns=['dest'])
df.loc[(3, 0), 'dest']
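On affected versions, one possible workaround, sketched under the assumption that the caller only needs "found or not found" semantics, is to catch LookupError, the built-in base class of both KeyError and IndexError. The helper name `safe_lookup` is hypothetical and not a pandas API.

```python
from typing import Any, Callable, Hashable


def safe_lookup(lookup: Callable[[Hashable], Any], key: Hashable, default: Any = None) -> Any:
    """Hypothetical helper: call lookup(key), treating KeyError and IndexError alike.

    LookupError is the common base class of KeyError and IndexError, so a
    missing label is handled the same way whichever exception is raised.
    """
    try:
        return lookup(key)
    except LookupError:
        return default
```

For example, `safe_lookup(lambda k: df.loc[k, 'dest'], (3, 0))` should return None for either frame above. The trade-off is that an unrelated IndexError raised inside the lookup is swallowed too, so the wrapped call is best kept narrow.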

@gliptak
Contributor

gliptak commented Apr 18, 2016

This seems to be hitting the following threshold in pandas/index.pyx:

# Don't populate hash tables in monotonic indexes larger than this
_SIZE_CUTOFF = 1000000
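A simplified model of how such a cutoff could produce the two behaviors is sketched below. Only the threshold value comes from pandas/index.pyx; the function `get_loc`, its `size_cutoff` parameter, and the use of `>=` for the comparison are assumptions (the `>=` is consistent with the reproducer, where 1,000,000 rows fail and 999,998 do not). Small or unsorted inputs use a hash table; large sorted inputs use binary search. The explicit bounds check is the step the buggy path presumably lacked.

```python
from bisect import bisect_left

# Threshold value quoted from pandas/index.pyx; everything else is a simplified model.
SIZE_CUTOFF = 1_000_000


def get_loc(values, key, size_cutoff=SIZE_CUTOFF):
    """Return the position of key in a list of unique values, or raise KeyError."""
    is_sorted = all(a < b for a, b in zip(values, values[1:]))
    if len(values) >= size_cutoff and is_sorted:
        # Large sorted input: binary search instead of building a hash table.
        loc = bisect_left(values, key)
        # Bounds check: a key past the last entry yields loc == len(values).
        if loc >= len(values) or values[loc] != key:
            raise KeyError(key)
        return loc
    # Small (or unsorted) input: map each value to its position.
    table = {v: i for i, v in enumerate(values)}
    if key not in table:
        raise KeyError(key)
    return table[key]
```

Under this model, the two reproducer frames fall on opposite sides of the threshold: 2 x 499,999 = 999,998 entries stay on the hash-table path, while 2 x 500,000 = 1,000,000 entries reach the binary-search path, where a key such as (3, 0) now raises KeyError instead of IndexError.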
