Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DEPR: make Series.agg aggregate when possible #53325

Merged
merged 11 commits into from
Jun 7, 2023

Conversation

topper-123
Copy link
Contributor

@topper-123 topper-123 commented May 21, 2023

The doc string for Series.agg says: "A passed user-defined-function will be passed a Series for evaluation." which isn't true currently. In addition, Series.agg currently will not always aggregate, when given a aggregation function. This PR fixes those issues.

It can be noted that both the behavior and that doc string are correct for DataFrame.agg, so this PR aligns the behavior of Series.agg and DataFrame.agg.

I'm not sure if this is too drastic as a bug fix, or we want it as part of a deprecation process, but just putting this up there, will see where it goes.

EDIT: In the new version the old behavior is deprecated rather than just removed.

@mroeschke
Copy link
Member

I stumbled upon this in #53208. I think changing this behavior is good but should be deprecated IMO to ensure f actually produces a reduced result

@mroeschke mroeschke added the Apply Apply, Aggregate, Transform label May 22, 2023
@topper-123 topper-123 changed the title BUG: make Series.agg aggregate when possible DEPR: make Series.agg aggregate when possible May 22, 2023
@topper-123
Copy link
Contributor Author

I've made a new version with deprecation.

@topper-123
Copy link
Contributor Author

In the new deprecation version I BTW found out that while DataFrame.agg isn't affected by this PR when given a single callable or a dict of callables, it was actually affected when given a list of callables...

pandas/core/apply.py Outdated Show resolved Hide resolved
pandas/tests/apply/test_frame_apply.py Outdated Show resolved Hide resolved
pandas/tests/apply/test_frame_apply.py Outdated Show resolved Hide resolved
pandas/tests/apply/test_invalid_arg.py Show resolved Hide resolved
pandas/tests/apply/test_series_apply.py Outdated Show resolved Hide resolved
)
tm.assert_frame_equal(result, expected)

# TODO: the result below is wrong, should be fixed (GH53325)
with tm.assert_produces_warning(FutureWarning, match=msg):
result = df.agg({"x": foo1}, 0, 3, c=4)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This actually gave a wrong result in main and in v1.5, which was quite surprising.

Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@rhshadrach
Copy link
Member

@mroeschke - good here?

# operation is actually defined on the Series, e.g. str
# GH53325: The setup below is just to keep current behavior while emitting a
# deprecation message. In the future this will all be replaced with a simple
# `result = f(self.obj, *self.args, **self.kwargs)`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a question about enforcing this deprecation; will we also check that result is a scalar-like? In theory f could non-reducing lambda *args, **kwargs: obj?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we shouldn't. Rather, we treat whatever the UDF returns as if it were a scalar. An example where this can be useful:

df = DataFrame({"a": [1, 1, 2, 2, 2], "b": range(5)})
gb = df.groupby("a")
result = gb.agg(list)
print(result)
#            b
# a           
# 1     [0, 1]
# 2  [2, 3, 4]

While I think this is only particularly useful in groupby, for consistency I think we should apply this approach to all agg methods. It has the added benefit of not having to infer what is a scalar and what is not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, This just deprecates a path that will never aggregate, but there are still other paths that won't aggregate, like you mention. I agree with @rhshadrach that a sensible path may be to just be very permissive, but OTOH if we want to restrict what we accept for the results fro aggregations, that can always be done later.

@mroeschke mroeschke added this to the 2.1 milestone Jun 7, 2023
@mroeschke mroeschke merged commit 3fbf49d into pandas-dev:main Jun 7, 2023
30 checks passed
@mroeschke
Copy link
Member

Thanks @topper-123

@topper-123 topper-123 deleted the series_agg branch June 7, 2023 03:51
Daquisu pushed a commit to Daquisu/pandas that referenced this pull request Jul 8, 2023
* BUG: make Series.agg aggregate when possible

* fix doc build

* deprecate instead of treating as a bug

* CLN: Apply.agg_list_like

* some cleanups

* REF/CLN: func in core.apply (pandas-dev#53437)

* REF/CLN: func in core.apply

* Remove type-hint

* REF: Decouple Series.apply from Series.agg (pandas-dev#53400)

* update test

* fix issues

* fix issues

* fix issues

---------

Co-authored-by: Richard Shadrach <45562402+rhshadrach@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Apply Apply, Aggregate, Transform Deprecate Functionality to remove in pandas
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants