BUG: MultiIndex.get_loc errors if get_loc on level returns a slice #24263

Closed
shoyer opened this issue Dec 13, 2018 · 2 comments · Fixed by #37707
Labels: Bug, Indexing (related to indexing on series/frames, not to indexes themselves), MultiIndex


shoyer commented Dec 13, 2018

Code Sample, a copy-pastable example if possible

With the dev version of pandas:

In [1]: import pandas as pd

In [2]: index = pd.date_range('2001-01-01', periods=100)

In [3]: mindex = pd.MultiIndex.from_arrays([index])

In [4]: mindex.get_loc('2001-01')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-1914bb512715> in <module>
----> 1 mindex.get_loc('2001-01')

~/dev/pandas/pandas/core/indexes/multi.py in get_loc(self, key, method)
   2257
   2258         if not isinstance(key, tuple):
-> 2259             loc = self._get_level_indexer(key, level=0)
   2260             return _maybe_to_slice(loc)
   2261

~/dev/pandas/pandas/core/indexes/multi.py in _get_level_indexer(self, key, level, indexer)
   2525                 return locs
   2526
-> 2527             i = labels.searchsorted(code, side='left')
   2528             j = labels.searchsorted(code, side='right')
   2529             if i == j:

~/dev/pandas/pandas/util/_decorators.py in wrapper(*args, **kwargs)
    175                 else:
    176                     kwargs[new_arg_name] = new_arg_value
--> 177             return func(*args, **kwargs)
    178         return wrapper
    179     return _deprecate_kwarg

~/dev/pandas/pandas/core/indexes/frozen.py in searchsorted(self, value, side, sorter)
    181         # xref: https://github.com/numpy/numpy/issues/5370
    182         try:
--> 183             value = self.dtype.type(value)
    184         except ValueError:
    185             pass

TypeError: int() argument must be a string, a bytes-like object or a number, not 'slice'
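
The last frame is worth spelling out: the `try` in `searchsorted` only catches `ValueError`, while coercing a slice to an integer type raises `TypeError`, so the error escapes. A minimal sketch in plain Python, assuming `int` as a stand-in for `self.dtype.type` and using a hypothetical helper name (neither is pandas code):

```python
def coerce_like_searchsorted(value, scalar_type=int):
    """Hypothetical stand-in for the coercion step shown in the traceback."""
    try:
        return scalar_type(value)
    except ValueError:
        # Unparseable strings and similar values are passed through unchanged.
        return value


# A numeric string is coerced; a non-numeric string raises ValueError and is kept.
coerced = coerce_like_searchsorted("3")
kept = coerce_like_searchsorted("abc")

# A slice raises TypeError, which the `except ValueError` clause does not catch.
try:
    coerce_like_searchsorted(slice(0, 31))
    slice_error = None
except TypeError as exc:
    slice_error = exc
```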

Problem description

This appears to have been introduced by #22230 by @toobaz, which deleted a check for isinstance(loc, slice). I'm not quite sure why, though it looks like this line didn't have test coverage.
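
For illustration only, here is roughly what handling the slice case separately might look like. This is a plain-Python sketch with a hypothetical `level_indexer` helper that uses `bisect` in place of `searchsorted`. It is neither the check removed in #22230 nor the eventual fix, and it assumes sorted integer codes and a slice with a concrete `stop` and no step.

```python
from bisect import bisect_left, bisect_right


def level_indexer(level_loc, codes):
    """Map a location in a level's values to a slice over sorted codes.

    level_loc: an int (one code) or a slice (a contiguous range of codes).
    codes: sorted list of integer codes, one per row of the index.
    """
    if isinstance(level_loc, slice):
        # The key matched several codes: find where that code range begins and ends.
        start = level_loc.start or 0
        i = bisect_left(codes, start)
        j = bisect_left(codes, level_loc.stop)
    else:
        # The key matched a single code: find its run of equal entries.
        i = bisect_left(codes, level_loc)
        j = bisect_right(codes, level_loc)
    if i == j:
        raise KeyError(level_loc)
    return slice(i, j)


codes = list(range(100))  # one distinct code per row, as in the date example
january = level_indexer(slice(0, 31), codes)
single_day = level_indexer(5, codes)
```

The point of the sketch is only that an integer and a slice need different lookups; passing the slice into the integer path is what produces the `TypeError` above.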

I could not figure out how to trigger this directly on a pandas.Series. Indexing like series.loc['2001-01'] appears to go through a different code path not involving get_loc.

Expected Output

With pandas 0.23.4, this returns slice(0, 31, None), which is the correct result.
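
As a sanity check on that expected value, the same positions can be worked out with the standard library alone. This sketch does not call pandas; it just rebuilds the 100 daily dates and finds which positions fall in January 2001:

```python
from datetime import date, timedelta

# 100 consecutive days starting 2001-01-01, mirroring the date_range above.
dates = [date(2001, 1, 1) + timedelta(days=n) for n in range(100)]

# Positions whose date falls in January 2001, i.e. what the key '2001-01' selects.
matches = [pos for pos, d in enumerate(dates) if (d.year, d.month) == (2001, 1)]

# The matches are contiguous, so they can be expressed as a single slice.
expected = slice(matches[0], matches[-1] + 1)
```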

Output of pd.show_versions()

INSTALLED VERSIONS

commit: 2f6d682
python: 3.7.1.final.0
python-bits: 64
OS: Darwin
OS-release: 18.2.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.0.dev0+2160.g2f6d682a0
pytest: 4.0.1
pip: 18.1
setuptools: 40.6.2
Cython: 0.29.1
numpy: 1.15.4
scipy: None
pyarrow: None
xarray: 0.11.0
IPython: 7.1.1
sphinx: None
patsy: None
dateutil: 2.7.5
pytz: 2018.7
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 3.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

shoyer added a commit to shoyer/xarray that referenced this issue Jan 13, 2019
It was a nice idea to support CFTimeIndex in a pandas.MultiIndex, but pandas
seems to have inadvertently broken this, see
pandas-dev/pandas#24263
shoyer added a commit to pydata/xarray that referenced this issue Jan 13, 2019
mroeschke added the Bug, Indexing, and MultiIndex labels Jan 13, 2019
shoyer added a commit to pydata/xarray that referenced this issue Jan 24, 2019

toobaz commented Jul 1, 2019

Indeed I made a mistake in not considering the possibility that single levels might return more than one code for a single key.

@shoyer were you able to reproduce this with anything other than 1-level MultiIndexes where the first level has dates? (Partial date indexing is the only case I can think of in which a single key corresponds to multiple locations having different labels.)


toobaz commented Jul 1, 2019

As suggested by @jorisvandenbossche, IntervalIndex can have the same behavior... but yes, I think this is specific to 1-level MultiIndexes. I will write a PR making such cases follow the "list of labels" code path (depending on the codes identified).
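
To make the interval case concrete: with overlapping intervals, one scalar key can fall inside several distinct labels, just as one partial date string covers several days. A plain-Python sketch, assuming right-closed intervals and a hypothetical `matching_positions` helper (this models the idea only and does not use pandas' IntervalIndex):

```python
# Each label is a (left, right] interval; the first two overlap.
intervals = [(0, 2), (1, 3), (4, 5)]


def matching_positions(key, intervals):
    """Return the positions of all right-closed intervals that contain key."""
    return [pos for pos, (left, right) in enumerate(intervals) if left < key <= right]


overlapping = matching_positions(1.5, intervals)  # falls inside two different labels
unique = matching_positions(4.5, intervals)       # falls inside exactly one label
```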

jbrockmendel added this to Index Methods (e.g. get_loc) in Indexing Feb 18, 2020
jreback added this to the 1.2 milestone Nov 13, 2020