Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: GroupBy.apply() throws erroneous ValueError with duplicate axes #35441

Merged
merged 13 commits into from
Aug 6, 2020

Conversation

smithto1
Copy link
Member

@smithto1 smithto1 commented Jul 28, 2020

Behavioural Change
GroupBy.apply would sometimes throw an erroneous ValueError: cannot reindex from duplicate axis because when it checked if the output was mutated from its original shape it would only check the index, when sometimes it should check the columns.

Tests
One new test added on test_apply.py which fails on master because of the bug; passes with this PR. One existing test was marked xfail but the xfail is now removed. One existing test previously had to manipulate the expected index, but that is no longer necessary.

All of the copy-pasteable examples from #16646 are fixed with this PR.

@smithto1 smithto1 changed the title BUG: DataFrame.GroupBy.apply throws erroneous ValueError with duplicate axes BUG: GroupBy.apply() throws erroneous ValueError with duplicate axes Jul 29, 2020
Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! Looks good - just some minor comments.

pandas/tests/groupby/test_apply.py Outdated Show resolved Hide resolved
doc/source/whatsnew/v1.2.0.rst Outdated Show resolved Hide resolved
@rhshadrach rhshadrach added Bug Groupby Apply Apply, Aggregate, Transform labels Jul 30, 2020
Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, one small detail I missed previously.

pandas/tests/groupby/test_apply.py Outdated Show resolved Hide resolved
@jreback jreback added this to the 1.2 milestone Aug 3, 2020
@jreback
Copy link
Contributor

jreback commented Aug 3, 2020

lgtm. pls merge master and ping on green.

@smithto1
Copy link
Member Author

smithto1 commented Aug 5, 2020

@jreback I think this is ready to merge.

@jreback jreback merged commit 522f855 into pandas-dev:master Aug 6, 2020
@jreback
Copy link
Contributor

jreback commented Aug 6, 2020

thanks @smithto1 very nice

@@ -940,10 +940,6 @@ def test_frame_describe_multikey(tsframe):
groupedT = tsframe.groupby({"A": 0, "B": 0, "C": 1, "D": 1}, axis=1)
result = groupedT.describe()
expected = tsframe.describe().T
expected.index = pd.MultiIndex(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One existing test previously had to manipulate the expected index, but that is no longer necessary.

@smithto1 I don't understand the change to the expected here, can you expand on why the new value is correct?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the bugfix in this PR, I think .describe() on axis=1 is now encountering the bug reported in #26805, #22848 #22546 where group_keys=True is ignored.

The behaviour of the new code with group_keys=True/False is the same as the old code with group_keys=False.

I'll try to take a look at the 'group_keys' bug; if that is fixed I think this test will probably be reverted.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this may be addressed by #34998 but for the axis=1 case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Apply Apply, Aggregate, Transform Bug Groupby
Projects
None yet
Development

Successfully merging this pull request may close these issues.

"cannot reindex from a duplicate axis" when groupby().apply() on MultiIndex columns
4 participants