Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

REGR: fix groupby std() with nullable dtypes #37433

Merged

Conversation

jorisvandenbossche
Copy link
Member

Fixes #37415

@jorisvandenbossche jorisvandenbossche added this to the 1.1.4 milestone Oct 26, 2020
@jorisvandenbossche jorisvandenbossche added Groupby NA - MaskedArrays Related to pd.NA and nullable extension arrays labels Oct 26, 2020
@jorisvandenbossche
Copy link
Member Author

At some point we need to properly support nullable dtypes in the cython groupby operations (by passing+using the mask), but for now this ensure it at least works (although performance will not be ideal)

@@ -277,3 +277,38 @@ def test_read_only_buffer_source_agg(agg):
expected = df.copy().groupby(["species"]).agg({"sepal_length": agg})

tm.assert_equal(result, expected)


@pytest.mark.parametrize(
Copy link
Member

@simonjayhawkins simonjayhawkins Oct 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could use all_numeric_reductions fixture (although slightly different) doesn't include sem or count but does include kurt and skew

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kurt is not implemented for groupby, it seems. I mainly copied this from another test in this file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok for now I guess.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah can you create an issue for folks to replace these with fixtures (good first)

Copy link
Member

@simonjayhawkins simonjayhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jorisvandenbossche lgtm pending green.

(Did try to bisect to be sure of changes that caused regression but 1.0.0.rc raised TypeError: cannot safely cast non-equivalent float64 to int32)

@simonjayhawkins
Copy link
Member

(Did try to bisect to be sure of changes that caused regression but 1.0.0.rc raised TypeError: cannot safely cast non-equivalent float64 to int32)

rerun bisect with updated code sample for runner see #37415 (comment), so fix seems reasonable

@jorisvandenbossche
Copy link
Member Author

Ah, yes, sorry I should have mentioned, but I already narrowed it down to that PR. Thanks anyway!

Ci failure is unrelated I think

df2 = df.assign(B=df["B"].astype("float64"))
expected = getattr(df2.groupby("A")["B"], op_name)()

if op_name != "count":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah also an issue to fix this up (I think we have another issue about this) as this is slightly tricky, e.g. are we returning a nullable for the count itself (I don't know if we ever decided that)

@jreback jreback merged commit c642fda into pandas-dev:master Oct 27, 2020
@jreback
Copy link
Contributor

jreback commented Oct 27, 2020

thanks @jorisvandenbossche

@simonjayhawkins
Copy link
Member

@meeseeksdev backport 1.1.x

meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Oct 27, 2020
simonjayhawkins pushed a commit that referenced this pull request Oct 27, 2020
…able dtypes) (#37444)

Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
@jorisvandenbossche jorisvandenbossche deleted the fix-groupby-std-nullable branch October 29, 2020 19:08
@jorisvandenbossche
Copy link
Member Author

For follow-ups, I opened #37493 for general "native" support for masked arrays in groupby algos, and #37494 for the specific issue to use the correct dtype for the result.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Groupby NA - MaskedArrays Related to pd.NA and nullable extension arrays
Projects
None yet
3 participants