Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DEPR: Deprecate as_index in groupby #35860

Open
rhshadrach opened this issue Aug 22, 2020 · 8 comments
Open

DEPR: Deprecate as_index in groupby #35860

rhshadrach opened this issue Aug 22, 2020 · 8 comments
Labels
Deprecate Functionality to remove in pandas Groupby Needs Discussion Requires discussion from core team before further action

Comments

@rhshadrach
Copy link
Member

rhshadrach commented Aug 22, 2020

Should we deprecate using as_index in groupby? As the docs say, using as_index=False should only impact reductions. Here, my expectation is that a line such as

df.groupby(keys, as_index=False).op(...)

should always be equivalent to

df.groupby(keys, as_index=True).op(...).reset_index()

It seems quite easy for the user to do so if they desire, and this would simplify the groupby internals.

@rhshadrach rhshadrach added Bug Groupby Deprecate Functionality to remove in pandas Needs Discussion Requires discussion from core team before further action labels Aug 22, 2020
@rhshadrach
Copy link
Member Author

cc @jbrockmendel, @WillAyd, @jreback Any thoughts here?

@jbrockmendel
Copy link
Member

this would simplify the groupby internals

Do you have a qualitative estimate of how much simplification we'd get? I'm definitely open to the idea in principal.

@jreback
Copy link
Contributor

jreback commented Aug 27, 2020

in principal this makes groupby simpler but then again it makes a slightly more verbose syntax and to be honest i wish we wouldn't actually set the index and just make it a column

so either way this is a rather big change

so -0 on deprecating

@rhshadrach
Copy link
Member Author

@jbrockmendel Not sure what kind of qualitative estimate one can give here; here are some that I think are on the more quantitative side. I count 13 instances of branching created within core.groupby that depend solely on as_index; not much code itself (27 lines). Also, the method _insert_inaxis_grouper_inplace could be removed. Of course, it doubles the testing when it comes to reduction methods, plus the additional testing that it does not impact non-reductions. There is also the ambiguity on what the behavior should be when it comes to a UDF and a Series/DataFrame with unique groups (is it a reduction or a transform?), but this will continue to impact other behaviors of groupby and so I don't think it should be really considered.

@jreback If we were to leave the groups as columns, I think that would necessitate the index being a RangeIndex, is that right? I personally utilize meaningful indexes in my workflows and lean heavily on pandas's alignment when combining results from various computations, which is why I find as_index=True behavior most useful. But of course I understand if others' workflows/uses are different.

@jorisvandenbossche
Copy link
Member

I agree that as_index=False and .reset_index() are basically the same, and thus it's not really needed to provide both ways.

But on the other hand, I think that as_index=False is also quite massively used. So I am personally not sure it is worth disrupting their workflow and having all those users need to update their code for something relatively minor.
I mean, from a user point of view, it's not that we consider it "harmful" or bad practice code, I think, so for me that gives less reason to actually deprecate.

@rhshadrach
Copy link
Member Author

But on the other hand, I think that as_index=False is also quite massively used.

Thanks @jorisvandenbossche. Working on some of the as_index=False bugs, I got the sense that perhaps it isn't used all that often, and this was especially the case when I saw some others express that it should be removed in an issue. @jreback's and your responses, along with some searching on SO, have me convinced otherwise.

Going to close this issue in a week if there is no futher discussion.

@rhshadrach rhshadrach removed the Needs Discussion Requires discussion from core team before further action label Sep 5, 2020
@WillAyd
Copy link
Member

WillAyd commented Oct 19, 2020

Reopening as I think this is worthwhile to do.

@WillAyd WillAyd reopened this Oct 19, 2020
@mroeschke mroeschke removed the Bug label Aug 10, 2021
@rhshadrach rhshadrach added the Needs Discussion Requires discussion from core team before further action label Aug 15, 2022
@jbrockmendel
Copy link
Member

+1. The groupby tests are made really cumbersome by the combination of options.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Deprecate Functionality to remove in pandas Groupby Needs Discussion Requires discussion from core team before further action
Projects
None yet
Development

No branches or pull requests

6 participants