Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG/API: SeriesGroupBy reduction with numeric_only=True #41342

Merged
merged 18 commits into from May 26, 2021

Conversation

jbrockmendel
Copy link
Member

@jbrockmendel jbrockmendel commented May 5, 2021

  • closes #xxxx
  • tests added / passed
  • Ensure all linting tests pass, see here for how to run them
  • whatsnew entry

This addresses the SeriesGroupBy (easier) half of #41291:

  1. When we have numeric_only=True and non-numeric data in a SeriesGroupBy reduction, raise NotImplementedError (matching Series behavior) instead of falling back to an often-wrong fallback (xref TST: xfail incorrect test_empty_groupby #41341)
  2. When the user doesn't explicitly pass numeric_only=True, change the default to False for SeriesGroupBy, leaving DataFrameGroupBy unaffected.

@jreback
Copy link
Contributor

jreback commented May 6, 2021

if you can rebase this one

@jreback jreback added the Groupby label May 6, 2021
@jreback
Copy link
Contributor

jreback commented May 6, 2021

haven't looked yet

@jbrockmendel
Copy link
Member Author

Rebased+green

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you rebase and a couple of comments

For DataFrameGroupBy we want it to be True (for backwards-compat).
"""
# GH#41291
if numeric_only is lib.no_default:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we deprecating the dataframe groupby behavior here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, this is just a way of having the effective default be different for SeriesGroupBy vs DataFrameGroupBy


def get_result():
if method == "attr":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a super complicated test. can you try to split in followons.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed

@@ -906,6 +906,7 @@ Groupby/resample/rolling
- Bug in :meth:`DataFrameGroupBy.__getitem__` with non-unique columns incorrectly returning a malformed :class:`SeriesGroupBy` instead of :class:`DataFrameGroupBy` (:issue:`41427`)
- Bug in :meth:`DataFrameGroupBy.transform` with non-unique columns incorrectly raising ``AttributeError`` (:issue:`41427`)
- Bug in :meth:`Resampler.apply` with non-unique columns incorrectly dropping duplicated columns (:issue:`41445`)
- Bug in :meth:`SeriesGroupBy` aggregations incorrectly returning empty :class:`Series` instead of raising ``TypeError`` on aggregations that are invalid for its dtype (:issue:`41342`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you give an example here that a user would understand

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated + green

@jreback jreback added this to the 1.3 milestone May 17, 2021
@@ -91,7 +91,10 @@ def test_cython_agg_nothing_to_agg():
frame = DataFrame({"a": np.random.randint(0, 5, 50), "b": ["foo", "bar"] * 25})
msg = "No numeric types to aggregate"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be moved to the relevant section below

pandas/core/groupby/groupby.py Show resolved Hide resolved
Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@jbrockmendel
Copy link
Member Author

@jreback i think comments have been addressed; this is a blocker for fixing the analogous DataFrameGroupBy behavior

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. can merge on green.

@simonjayhawkins simonjayhawkins merged commit fa8f68e into pandas-dev:master May 26, 2021
@simonjayhawkins
Copy link
Member

Thanks @jbrockmendel

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants