Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: new .agg for list-likes #43736

Closed
wants to merge 47 commits into from

Conversation

rhshadrach
Copy link
Member

@rhshadrach rhshadrach commented Sep 24, 2021

Part of #43678

  • tests added / passed
  • Ensure all linting tests pass, see here for how to run them

Replaces the list-like implementation for .agg used by Series, DataFrame, SeriesGroupBy, DataFrameGroupBy, Window, and Resample. This will transpose agg results, and swap multiindex levels as well as the order from within the levels. Adds the new implementation behind the option "new_udf_methods".

Docs: future_udf_behavior.zip

@attack68
Copy link
Contributor

just a comment - I have always tried to avoid names like "new_routine" because at some point, it won't be, and you wished you never called it that in the first place! :)

@rhshadrach
Copy link
Member Author

rhshadrach commented Sep 25, 2021

@attack68 - The plan is to put all usage of these methods behind the option "new_udf_methods"; when the old are deprecated and then removed, they will be renamed by dropping all the "new_" prefixes. If you have any suggestions for different names, please do make them!

@attack68
Copy link
Contributor

no, no suggestions. But, and I kid you not, in my line of work I have seen swap_pricer(), new_swap_pricer(), new_swap_pricer2() and new_new_swap_pricer() - imagine explaining that to new recruits!

@mroeschke
Copy link
Member

Agreed, with @attack68. Helpful to have a slightly clearer naming even during the transition period. Maybe shared_list_like if the plan is for method to be agg, transform, etc?

@rhshadrach
Copy link
Member Author

rhshadrach commented Sep 25, 2021

@mroeschke and @attack68 - is there opposition to the "new_udf_methods" option as well, or is it just on the method names themselves? I am not a particular fan of "new" myself, but I could not think of anything concise and better. Regardless of the name that is picked, I would like the relationship between the option name and the methods that are behind that option name to be clear.

Edit: could go with "experimental" and prefix the methods with that?

Edit: "experimental" is a bad choice because there will be a period where the default for this option is True and thus it is no longer experimental.

@mroeschke
Copy link
Member

I would like the relationship between the option name and the methods that are behind that option name to be clear

+1 I would hope these are consistent.

Maybe future_udf_behavior if the idea is that this will be the default some day?

@rhshadrach
Copy link
Member Author

Thanks @mroeschke - I like "future".

…w_udfs_list_agg

� Conflicts:
�	pandas/tests/resample/test_resample_api.py
@rhshadrach rhshadrach mentioned this pull request Sep 25, 2021
3 tasks
@@ -408,6 +414,77 @@ def agg_list_like(self) -> DataFrame | Series:
)
return concatenated.reindex(full_ordered_index, copy=False)

def future_list_single_arg(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same request from the other PR re naming; "future" won't be very helpful for a reader a year from now

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intention is to have "future_" methods alongside the current methods; all with the same prefix so they are easy to identify. Any such method is behind the option "future_udf_behavior" meaning they will only be called when set to True. Assuming we do end up going forward with this new (experimental) behavior, once it is in a good place we deprecate the option and then remove the option.

A year from now, we will still have the option "future_udf_behavior", and in my opinion, the "future_" prefix is meaningful and helpful - namely in its connection to this option. It is also the (intended) future behavior of the methods. When the option is removed, the "old" methods are removed and the "future_" methods are renamed by removing the prefix (none are public).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you've clearly given this more thought than i have so im going to stop complaining about this, will instead grumble to myself

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name is still up for improvements and suggestions are most welcome, but wanted to explain why I felt "future" was appropriate.

rhshadrach and others added 16 commits October 9, 2021 16:04
…w_udfs_list_agg

� Conflicts:
�	pandas/core/groupby/generic.py
�	pandas/core/groupby/groupby.py
�	pandas/tests/apply/test_frame_apply.py
�	pandas/tests/groupby/test_groupby.py
* Fix dtypes for read_json

* Address comments

* Add whatsnew entry

* Update doc/source/whatsnew/v1.4.0.rst

Co-authored-by: Matthew Zeitlin <37011898+mzeitlin11@users.noreply.github.com>

* Linting

Co-authored-by: Matthew Zeitlin <37011898+mzeitlin11@users.noreply.github.com>
@rhshadrach
Copy link
Member Author

rhshadrach commented Oct 18, 2021

I've added docs (built attachment in OP) summarizing the changes here.

…w_udfs_list_agg

� Conflicts:
�	pandas/core/groupby/generic.py
�	pandas/core/groupby/groupby.py
�	pandas/tests/apply/test_frame_apply.py
�	pandas/tests/groupby/test_groupby.py
 into new_udfs_list_agg

� Conflicts:
�	pandas/tests/apply/test_frame_apply.py
�	pandas/tests/groupby/aggregate/test_aggregate.py
�	pandas/tests/groupby/aggregate/test_other.py
�	pandas/tests/groupby/test_groupby.py
�	pandas/tests/resample/test_resample_api.py
@github-actions
Copy link
Contributor

github-actions bot commented Dec 8, 2021

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Dec 8, 2021
@rhshadrach rhshadrach closed this Dec 11, 2021
@rhshadrach rhshadrach deleted the new_udfs_list_agg branch September 10, 2022 16:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet