BUG: GroupBy.value_counts sorting order #56016

Merged: 5 commits into pandas-dev:main on Nov 17, 2023


rhshadrach
Member

@rhshadrach added the Bug, Groupby, and Algos (non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff) labels on Nov 17, 2023
@pytest.mark.parametrize("normalize", [True, False])
def test_value_counts_sort(sort, vc_sort, normalize):
    # GH#55951
    df = DataFrame({"a": [2, 1, 1, 1], 0: [3, 4, 3, 3]})
Member

Could you also test with Categorical?

Member Author

Thanks - good call. I added a test and found a bug calling .sortlevel() in the categorical section. But the categorical behavior is still buggy here; namely

multi_index, _ = MultiIndex.from_product(
    levels_list, names=[ping.name for ping in groupings]
).sortlevel()

is constructing the index by taking the cartesian product across all groupings; it should only be doing this for the categorical variables. That's not an easy fix, but will be handled in #55738 where we can just remove that whole block of code.
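
To illustrate the cartesian-product point above, here is a minimal sketch using only public pandas APIs. The `levels_list` values and the column names `"a"` and `"b"` are hypothetical stand-ins, not the objects used inside the pandas groupby code. It compares the index built by `MultiIndex.from_product(...).sortlevel()` with the combinations that actually occur in the data.

```python
import pandas as pd

# Hypothetical stand-ins for the per-grouping levels (assumption: in pandas
# these come from the groupings; here they are written out by hand).
levels_list = [pd.Index([2, 1]), pd.Index([4, 3])]
names = ["a", "b"]

# Cartesian product across all levels, then sorted lexicographically.
multi_index, _ = pd.MultiIndex.from_product(levels_list, names=names).sortlevel()

# Combinations that actually appear in the example data from the test.
observed = pd.MultiIndex.from_arrays(
    [[2, 1, 1, 1], [3, 4, 3, 3]], names=names
).unique()

# Entries present in the product but never observed in the data.
extra = multi_index.difference(observed)
print(list(multi_index))
print(list(observed))
print(list(extra))
```

The product contains every pairing of the levels, including pairs that never occur together in the data, which is only the desired behavior for categorical groupings.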

@mroeschke added this to the 2.2 milestone on Nov 17, 2023
@mroeschke merged commit dbf8aaf into pandas-dev:main on Nov 17, 2023
40 checks passed
@mroeschke
Member

Thanks @rhshadrach

Successfully merging this pull request may close these issues.

DEPR: DataFrame.value_counts(sort=False) sorts