.loc with (scalar, slice(None)) on MultiIndex does not drop first level #18631

Open
toobaz opened this issue Dec 4, 2017 · 4 comments
Labels
Bug Deprecate Functionality to remove in pandas Indexing Related to indexing on series/frames, not to indexes themselves MultiIndex

Comments

@toobaz
Member

toobaz commented Dec 4, 2017

Code Sample, a copy-pastable example if possible

In [1]: import pandas as pd

In [2]: df = pd.DataFrame(index=pd.MultiIndex.from_product([[1,2], [3, 4], [5, 6]]), columns=['a', 'b'])

In [3]: df.loc[(1,), :]
Out[3]: 
       a    b
3 5  NaN  NaN
  6  NaN  NaN
4 5  NaN  NaN
  6  NaN  NaN

In [4]: df.loc[(1,3), :]
Out[4]: 
     a    b
5  NaN  NaN
6  NaN  NaN

In [5]: df.loc[(1,[3,4]), :]
Out[5]: 
         a    b
1 3 5  NaN  NaN
    6  NaN  NaN
  4 5  NaN  NaN
    6  NaN  NaN

Problem description

Pandas should drop a level when it is indexed with a scalar. This happens in the first two examples but not in the third, where the first level is kept even though it is indexed with the scalar 1.

This is vaguely related to #10521 and closely related to #12827 (which should be reopened, by the way).
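
A quick way to see the inconsistency is to count the index levels each lookup leaves behind. The sketch below reuses the DataFrame from the code sample; `keys` and `levels_left` are names chosen for this illustration only. Going by the outputs above, the first two lookups leave 2 and 1 levels, while the third keeps all 3 instead of the expected 2.

```python
import pandas as pd

df = pd.DataFrame(
    index=pd.MultiIndex.from_product([[1, 2], [3, 4], [5, 6]]),
    columns=["a", "b"],
)

# The three row keys used in the code sample above.
keys = [(1,), (1, 3), (1, [3, 4])]

# Count how many index levels remain after each .loc lookup.
levels_left = {repr(key): df.loc[key, :].index.nlevels for key in keys}
print(levels_left)
```

If the scalar level were dropped consistently, the last entry would match the first one.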

Expected Output

Output of pd.show_versions()

INSTALLED VERSIONS

commit: 6e56195
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.0-4-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: it_IT.UTF-8
LOCALE: it_IT.UTF-8

pandas: 0.22.0.dev0+275.g6e56195fc
pytest: 3.2.3
pip: 9.0.1
setuptools: 36.7.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.0dev
tables: 3.3.0
numexpr: 2.6.1
feather: 0.3.1
matplotlib: 2.0.0
openpyxl: 2.3.0
xlrd: 1.0.0
xlwt: 1.3.0
xlsxwriter: 0.9.6
lxml: 4.1.1
bs4: 4.5.3
html5lib: 0.999999999
sqlalchemy: 1.0.15
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.2.1

@jorisvandenbossche
Member

The same is true for series (which makes the example a little bit simpler):

In [114]: s = pd.Series(index=pd.MultiIndex.from_product([[1,2], [3, 4], [5, 6]]))

In [115]: s
Out[115]: 
1  3  5   NaN
      6   NaN
   4  5   NaN
      6   NaN
2  3  5   NaN
      6   NaN
   4  5   NaN
      6   NaN
dtype: float64

In [116]: s.loc[1,]
Out[116]: 
3  5   NaN
   6   NaN
4  5   NaN
   6   NaN
dtype: float64

In [117]: s.loc[1,3]
Out[117]: 
5   NaN
6   NaN
dtype: float64

In [118]: s.loc[1,[3, 4]]
Out[118]: 
1  3  5   NaN
      6   NaN
   4  5   NaN
      6   NaN
dtype: float64

@toobaz
Member Author

toobaz commented Jan 1, 2018

This is also related to #10552, which I had missed.

@jbrockmendel jbrockmendel added the Indexing Related to indexing on series/frames, not to indexes themselves label Sep 21, 2020
@mroeschke mroeschke added the Bug label Jun 12, 2021
@rhshadrach
Member

Would this require a deprecation? I think it would be good to use the future option if so.

@jorisvandenbossche
Member

I suppose it does? I don't know how many people rely on that, but it does change the index of the result, so subsequent indexing might become invalid. It is of course an inconsistency/bug, but if it is so longstanding ...
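
For code that needs the scalar level dropped regardless of how `.loc` treats the mixed key, one option might be to select in two explicit steps. This is only a sketch, not a proposed fix: it assumes integer data in place of the NaN values used above, and `explicit` is an illustrative name.

```python
import pandas as pd

# Assumption: integer values instead of NaN, so the selected rows are easy to identify.
s = pd.Series(
    range(8),
    index=pd.MultiIndex.from_product([[1, 2], [3, 4], [5, 6]]),
)

# Step 1: xs() with a scalar drops the selected level.
# Step 2: the list then selects labels on what is now the first level.
explicit = s.xs(1, level=0).loc[[3, 4]]

print(explicit.index.nlevels)
print(explicit.loc[(3, 5)])
```

Because the level is dropped explicitly, follow-up lookups such as `explicit.loc[(3, 5)]` use two-element keys whether or not the behaviour of `s.loc[1, [3, 4]]` changes.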

@rhshadrach rhshadrach added the Deprecate Functionality to remove in pandas label Jan 19, 2024