Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: GroupBy.idxmax/idxmin with EA dtypes #38733

Merged
merged 11 commits into from
Jan 19, 2021

Conversation

jbrockmendel
Copy link
Member

  • closes #xxxx
  • tests added / passed
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

@jorisvandenbossche thoughts on how/where to handle the skipna kwarg?

@rhshadrach rhshadrach added Groupby ExtensionArray Extending pandas with custom dtypes or arrays. labels Dec 28, 2020
pandas/core/arrays/base.py Show resolved Hide resolved
@jorisvandenbossche
Copy link
Member

thoughts on how/where to handle the skipna kwarg?

We don't yet handle that -> #35178, #33941, #33942.
Also #37924 fixing idxmin/max for Series with EAs is not yet tackling it.

So I think this is something we first need to decide upon / fix for the Series case, before doing this for groupby.

@jorisvandenbossche
Copy link
Member

To be clear, the open issues are mainly about the behaviour for argmin/max with skipna=False, not for idxmin/max (but since idxmin/max is typically implemented based on argmin/max, it's still impacted)

@jbrockmendel
Copy link
Member Author

@jorisvandenbossche do the blockers you mentioned have some momentum to them? i.e. if I want to get this fixed, do i need to go work on those directly or can i just be patient for a bit?

@jbrockmendel
Copy link
Member Author

gentle ping @jorisvandenbossche thoughts on how to make this actionable?

@jorisvandenbossche
Copy link
Member

By looking into those linked issues, and trying to move them forward ..

Now, I think this PR can already fix the non-skipna case in the meantime?

@@ -596,7 +596,7 @@ def argsort(
mask=np.asarray(self.isna()),
)

def argmin(self):
def argmin(self, axis=None, skipna: bool = True):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The axis keyword shouldn't be added, since it's being ignored. It can be accepted through args/kwargs, and then validated with our numpy compat (+ tests for the numpy compat)

@@ -611,9 +611,12 @@ def argmin(self):
--------
ExtensionArray.argmax
"""
validate_bool_kwarg(skipna, "skipna")
if not skipna and self.isna().any():
return -1
Copy link
Member

@jorisvandenbossche jorisvandenbossche Jan 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

those should not return -1, IMO. I would prefer to not put this "wrong" behaviour in the EAs (but keep it in Series were it already has this behaviour), as long as we didn't decide on the preferred behaviour.

Short term, could eg raise an error saying that skipna=False is not yet implemented, but by accepting skipna, it can already fix the default groupby case (see my comment at #38733 (comment))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could eg raise an error saying that skipna=False is not yet implemented

so... like i had in the previous commit?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so... like i had in the previous commit?

Yes, that would be my preference.

(sorry for not being explicit about it, but my earlier comment of "this PR can already fix the non-skipna case in the meantime?" quite well matched the state of the PR, it was mainly to mean: we don't have to solve the skipna=False discussion to merge a PR like this to fix the skipna=True case)

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a whatsnew note?

@jbrockmendel
Copy link
Member Author

Add a whatsnew note?

updated

Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
@jbrockmendel
Copy link
Member Author

updated + green

return nargminmax(self, "argmin")

def argmax(self):
def argmax(self, skipna: bool = True) -> int:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you update the Parameters section here (and for argmin)

@@ -628,6 +631,9 @@ def argmax(self):
--------
ExtensionArray.argmin
"""
validate_bool_kwarg(skipna, "skipna")
if not skipna and self.isna().any():
raise NotImplementedError
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a test that hits this (and argmin)

@jbrockmendel
Copy link
Member Author

updated docstrings, not sure about testing for the NotImplementedError cases since that isnt actually desired behavior

@jreback
Copy link
Contributor

jreback commented Jan 19, 2021

updated docstrings, not sure about testing for the NotImplementedError cases since that isnt actually desired behavior

sure but we want to lock it down to make sure that it is not actually changed w/o doing anything about it.

@jbrockmendel
Copy link
Member Author

test added + greenish

@jreback jreback added this to the 1.3 milestone Jan 19, 2021
@jreback jreback merged commit a2db6fd into pandas-dev:master Jan 19, 2021
@jbrockmendel jbrockmendel deleted the bug-gb-idxmax branch January 19, 2021 23:28
nofarm3 pushed a commit to nofarm3/pandas that referenced this pull request Jan 21, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ExtensionArray Extending pandas with custom dtypes or arrays. Groupby
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants