ENH: implement nlargest and nsmallest for DataFrameGroupBy like SeriesGroupBy #46924

Open
yoch opened this issue May 2, 2022 · 3 comments
Labels
API - Consistency (Internal Consistency of API/Behavior), Enhancement, Groupby, Series (Series data structure)

Comments


yoch commented May 2, 2022

The DataFrameGroupBy should get a new method nlargest to allow selecting the N largest rows for each group.

Currently this is doable with df.sort_values(...).groupby(...).head(n), but I want to use the keep='all' parameter of nlargest, which head cannot replicate.
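To illustrate the difference, here is a minimal sketch on a made-up frame (the column names `group` and `score` are assumptions for the example only). It compares the sort-then-head workaround with the existing SeriesGroupBy.nlargest when the top value is tied:

```python
import pandas as pd

# Hypothetical data: group "a" has two rows tied for the top score.
df = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b"],
    "score": [3, 3, 1, 5, 2],
})

# Workaround: sort, then take the first n rows of each group.
# head(1) always returns exactly one row per group, so one tied row is dropped.
top1_head = df.sort_values("score", ascending=False).groupby("group").head(1)

# SeriesGroupBy.nlargest accepts keep="all", so both tied rows survive,
# but only the "score" column is returned, not the full rows.
top1_all = df.groupby("group")["score"].nlargest(1, keep="all")

print(len(top1_head))  # one row per group
print(len(top1_all))   # both tied rows of "a" plus the top row of "b"
```

The Series version keeps the ties but loses the other columns, which is why a DataFrameGroupBy counterpart would be useful.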

@yoch yoch added Enhancement Needs Triage Issue that has not been reviewed by a pandas team member labels May 2, 2022
@yoch yoch changed the title ENH: implement nlargest for DataFrameGroupBy ENH: implement nlargest and nsmallest for DataFrameGroupBy like SeriesGroupBy May 2, 2022

yoch commented May 2, 2022

In fact, this can be done in exactly the same way as SeriesGroupBy.nlargest and SeriesGroupBy.nsmallest:

    @doc(Series.nlargest)
    def nlargest(self, n: int = 5, keep: str = "first"):
        f = partial(Series.nlargest, n=n, keep=keep)
        data = self._obj_with_exclusions
        # Don't change behavior if result index happens to be the same, i.e.
        # already ordered and n >= all group sizes.
        result = self._python_apply_general(f, data, not_indexed_same=True)
        return result

    @doc(Series.nsmallest)
    def nsmallest(self, n: int = 5, keep: str = "first"):
        f = partial(Series.nsmallest, n=n, keep=keep)
        data = self._obj_with_exclusions
        # Don't change behavior if result index happens to be the same, i.e.
        # already ordered and n >= all group sizes.
        result = self._python_apply_general(f, data, not_indexed_same=True)
        return result
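Until something like this exists on DataFrameGroupBy, the requested behavior can be approximated in user code. The sketch below is not the pandas implementation: `groupby_nlargest` is a hypothetical helper, and the sample columns are made up. It simply calls DataFrame.nlargest on each group and concatenates the pieces:

```python
import pandas as pd


def groupby_nlargest(df, by, n, columns, keep="first"):
    """Hypothetical helper: apply DataFrame.nlargest to each group of `df`.

    `by` is the grouping column, `columns` the column(s) to rank on.
    """
    # Iterating over a groupby yields (key, sub-DataFrame) pairs.
    parts = [group.nlargest(n, columns, keep=keep) for _, group in df.groupby(by)]
    return pd.concat(parts)


# Made-up data: group "a" has two rows tied for the top score.
df = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b"],
    "score": [3, 3, 1, 5, 2],
    "label": ["p", "q", "r", "s", "t"],
})

# keep="all" retains both tied rows of group "a", with all columns intact.
result = groupby_nlargest(df, "group", n=1, columns="score", keep="all")
print(result)
```

Unlike the SeriesGroupBy version, this keeps the original flat index rather than adding the group key as an index level; a real implementation would need to decide which is preferable.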

@simonjayhawkins simonjayhawkins added Groupby Numeric Operations Arithmetic, Comparison, and Logical operations API - Consistency Internal Consistency of API/Behavior labels May 4, 2022
@lorentzbao
Contributor

If still open, I would like to work on this issue.

@lorentzbao
Contributor

take

@lithomas1 lithomas1 removed the Needs Triage Issue that has not been reviewed by a pandas team member label Aug 18, 2022
@lithomas1 lithomas1 added this to the Contributions Welcome milestone Aug 18, 2022
@lithomas1 lithomas1 added the Series Series data structure label Aug 18, 2022
@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
@jbrockmendel jbrockmendel removed the Numeric Operations Arithmetic, Comparison, and Logical operations label Mar 28, 2023

6 participants