mi.drop(x).get_loc_level(x) returns empty slice (rather than raising KeyError) #22221

Closed
HuntJSparra opened this issue Aug 6, 2018 · 11 comments

@HuntJSparra

commented Aug 6, 2018

Code Sample

import pandas as pd

df = pd.DataFrame(dict(value=[0, 1], group=['filled','empty']))
groups = df.groupby('group')

def remove1(group):
    # Keep only the rows whose value is not 1; this leaves the 'empty' group with no rows.
    return group[group.value != 1]['value']

trimmed = groups.apply(remove1)


print("Trimmed.loc['empty']:\n",trimmed.loc['empty'])

Problem description

Since pandas 0.23, if a group (e.g. 'empty') is emptied by GroupBy.apply(), accessing the result (e.g. trimmed) through trimmed.loc['empty'] raises KeyError: 'the label [empty] is not in the [index]' (full output below). This is possibly related to #21624.

Despite what the error says, printing trimmed.index shows 'empty' among the index levels:

MultiIndex(levels=[['empty', 'filled'], [0, 2]],
           labels=[[1, 1], [0, 1]],
           names=['group', None])
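
One way to see why the lookup can fail even though 'empty' is printed is to compare the values a MultiIndex stores in its levels with the labels its rows actually use. The sketch below rebuilds the index shown above by hand. It assumes a pandas version whose MultiIndex constructor takes `codes` (older releases such as 0.23 called this argument `labels`), and the variable names are illustrative only.

```python
import pandas as pd

# Rebuild the printed index by hand. `codes` is the newer name for `labels`.
mi = pd.MultiIndex(
    levels=[['empty', 'filled'], [0, 2]],
    codes=[[1, 1], [0, 1]],
    names=['group', None],
)

# 'empty' is stored as a level value...
in_levels = 'empty' in mi.levels[0]

# ...but no row refers to it, so it is not among the labels in use.
in_use = 'empty' in mi.get_level_values('group')

# remove_unused_levels() drops level values that no row references.
pruned_levels = list(mi.remove_unused_levels().levels[0])

print(in_levels, in_use, pruned_levels)
```

Both rows carry code 1 on the first level, which points at 'filled'; 'empty' sits in the level list without any row referencing it.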

Full Output:

Traceback (most recent call last):
  File ".../python3.6/site-packages/pandas/core/indexing.py", line 1790, in _validate_key
    error()
  File ".../python3.6/site-packages/pandas/core/indexing.py", line 1785, in error
    axis=self.obj._get_axis_name(axis)))
KeyError: 'the label [empty] is not in the [index]'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "pandasExample.py", line 14, in <module>
    print("Trimmed.loc['empty']:\n",trimmed.loc['empty'])
  File ".../python3.6/site-packages/pandas/core/indexing.py", line 1478, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
  File ".../python3.6/site-packages/pandas/core/indexing.py", line 1911, in _getitem_axis
    self._validate_key(key, axis)
  File ".../python3.6/site-packages/pandas/core/indexing.py", line 1798, in _validate_key
    error()
  File ".../python3.6/site-packages/pandas/core/indexing.py", line 1785, in error
    axis=self.obj._get_axis_name(axis)))
KeyError: 'the label [empty] is not in the [index]'

Expected Output

Prior to 0.23, an empty Series would be returned.

0.22.0 output:

Trimmed.loc['empty']:
 Series([], Name: value, dtype: int64)

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.0.final.0
python-bits: 64
OS: Darwin
OS-release: 17.4.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.3
pytest: 3.6.4
pip: 10.0.1
setuptools: 39.2.0
Cython: None
numpy: 1.15.0
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.5.0
sphinx: 1.7.6
patsy: None
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.6.1
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@WillAyd

Member

commented Aug 6, 2018

Can you try on master? This seemed to be working for me.

@HuntJSparra

Author

commented Aug 6, 2018

Thanks for the quick response! Will I need to clone and build locally, or is master available through conda or pip?

@WillAyd

Member

commented Aug 6, 2018

0.23.4 was just released, so you can try upgrading to that and see whether the issue is still there (I have not tried it myself). If it is, you would need to clone and build.

@HuntJSparra

Author

commented Aug 6, 2018

I upgraded to 0.23.4 and had the same issue. However, it is fixed on master.

@WillAyd WillAyd added Needs Discussion and removed Needs Info labels Aug 6, 2018

@WillAyd

Member

commented Aug 6, 2018

Thanks for confirming. Good to know it's "fixed," though I'm actually not sure whether 'empty' really should be part of the resulting index, given that nothing is returned for it by the apply operation. If it should, we could close this out with a test case to ensure this behavior remains in place; if not, we may want to rethink what we allow here.

@jreback any thoughts?

@HuntJSparra

Author

commented Aug 6, 2018

In the original program where I encountered the error, GroupBy.apply() was being used to reduce a dataset to its outliers, which might or might not exist for a given group. Reverting to the 0.23.0-0.23.4 behavior would mean that I would have to implement a try-catch when going back through the now-trimmed outlier dataset since its MultiIndex still shows the indexes for emptied groups. If the MultiIndex returned by GroupBy.apply() was changed to reflect the behavior, I would then be required to check against the MultiIndex, still adding additional code/complexity.

@WillAyd

Member

commented Aug 6, 2018

Understood on the specific problem you are trying to solve, but I'm not aware of any other places in the code base where this type of behavior is encouraged, i.e. it may be non-idiomatic.

@toobaz may have some thoughts on this as well

@toobaz

Member

commented Aug 6, 2018

> I upgraded to 0.23.4 and had the same issue. However, it is fixed on master.

I'm afraid the fact that this works on master is the actual bug. This is related to #19027.

@HuntJSparra I can understand what you have in mind, but you are simply looking in an index for a value which is not there. This only worked in the past, and works again on master, because of an implementation glitch. To be honest, I don't even see the try-catch as the right way to go: since some groups just don't have outliers, you might want to loop over trimmed.index.unique(level='group') rather than over all the original groups.
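
The loop suggested above could look roughly like the following sketch. It is a variation on the original code sample, not the reporter's actual program: it selects the 'value' column before calling apply, and the `outliers` dictionary is a hypothetical container added for illustration.

```python
import pandas as pd

df = pd.DataFrame(dict(value=[0, 1], group=['filled', 'empty']))

# Drop every row whose value is 1; the 'empty' group ends up with no rows.
trimmed = df.groupby('group')['value'].apply(lambda s: s[s != 1])

# Iterate only over the group labels that still have rows in the result,
# instead of over every group of the original frame.
outliers = {}
for name in trimmed.index.unique(level='group'):
    outliers[name] = trimmed.loc[name].tolist()

print(outliers)
```

Because the loop only visits labels that are actually in use, neither a try/except nor a separate membership check is needed.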

In any case, we must understand what happened in master that made this work again.

@toobaz toobaz added Regression and removed Groupby labels Aug 7, 2018

@toobaz toobaz changed the title 0.23 GroupBy.apply().loc() Empty Group Crash mi.drop(x).get_loc_level(x) returns empty slice (rather than raising KeyError) Aug 7, 2018

@toobaz

Member

commented Aug 7, 2018

Simpler way to reproduce:

In [2]: pd.MultiIndex.from_product([[0, 1], [2, 3]]).drop(1).get_loc_level(1)
Out[2]: (slice(2, 2, None), Int64Index([], dtype='int64'))
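
For comparison, here is a sketch of how the same reproduction can be checked on a pandas version that includes the fix referenced in this issue (#22230, milestone 0.24.0), whose commit title says the lookup raises KeyError for an unused label. The variable names are illustrative.

```python
import pandas as pd

mi = pd.MultiIndex.from_product([[0, 1], [2, 3]]).drop(1)

# drop() removes the rows but leaves 1 behind as an unused level value.
still_in_levels = 1 in mi.levels[0]
still_in_use = 1 in mi.get_level_values(0)

# With the fix, asking for the unused label raises instead of
# returning an empty slice.
try:
    mi.get_loc_level(1)
    raised = False
except KeyError:
    raised = True

print(still_in_levels, still_in_use, raised)
```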

toobaz added a commit to toobaz/pandas that referenced this issue Aug 7, 2018

toobaz added a commit to toobaz/pandas that referenced this issue Aug 7, 2018

@jreback jreback added this to the 0.24.0 milestone Aug 7, 2018

jreback added a commit that referenced this issue Aug 9, 2018

BUG: raise KeyError if MultiIndex.get_loc_level is asked unused label (#22230)

* BUG: raise KeyError if MultiIndex.get_loc_level is asked unused label

closes #22221

* TST: test groupby.apply() with user-defined function returning an empty chunk

* CLN: remove named lambda

alimcmaster1 added a commit to alimcmaster1/pandas that referenced this issue Aug 12, 2018

@choma1


commented Aug 13, 2018

Why was returning a KeyError here determined to be preferable to returning an empty DataFrame?

@toobaz

Member

commented Aug 13, 2018

> Why was returning a KeyError here determined to be preferable to returning an empty DataFrame?

Because the label is not in the index/level, and .loc raises KeyError when the items are not found.
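
For code that prefers an empty result over an exception, the check can be made explicit. The helper below is a hypothetical sketch, not part of pandas; the name `loc_or_empty` and the sample data are assumptions made for illustration.

```python
import pandas as pd


def loc_or_empty(series, label, level=0):
    """Hypothetical helper: rows under `label`, or an empty Series if unused."""
    if label in series.index.get_level_values(level):
        # xs() selects the rows for `label` and drops that level from the index.
        return series.xs(label, level=level)
    # Empty slice with the same dtype and name, with the level dropped to match.
    return series.iloc[:0].droplevel(level)


index = pd.MultiIndex.from_tuples(
    [('filled', 0), ('filled', 2)], names=['group', None]
)
trimmed = pd.Series([0, 5], index=index, name='value')

found = loc_or_empty(trimmed, 'filled')
missing = loc_or_empty(trimmed, 'empty')
print(found.tolist(), missing.empty)
```

This keeps the "missing means empty" decision in the caller's code rather than relying on how the index handles labels it does not contain.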

Sup3rGeo added a commit to Sup3rGeo/pandas that referenced this issue Oct 1, 2018
