pd.Series.groupby is said to have only optional parameters #8015
Comments
But we could probably provide a better error message here. Something like "You have to at least specify one of 'by' or 'level' " Current error: |
Thinking about this again, I am a little bit confused: The signature is:
So either this should actually group by the index, or the documentation should be updated. |
Hmmmm. To me this looks much less readable than (something like): df.groupby(df.index).sum() I don't think this being the default is sensible (I think it would lead to confusion/bugs as well as not usually being a regular use case) ... ? Saying that, definitely the error message should be fixed! Perhaps say, groupby a column(s), df.index or ...just (perish the thought) see the groupby docs! |
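For readers comparing the two explicit spellings mentioned here, a minimal sketch (assuming a reasonably recent pandas; the frame `df` and its column `x` are made up for illustration) suggests that grouping on `df.index` and grouping on `level=0` produce the same sums:

```python
import pandas as pd

# Hypothetical frame with repeated index labels (not from the thread).
df = pd.DataFrame({"x": [1, 2, 3, 4]}, index=["a", "b", "a", "b"])

# Explicitly group on the index object itself...
by_index = df.groupby(df.index).sum()

# ...or name the index level to group on.
by_level = df.groupby(level=0).sum()

print(by_index)
print(by_level)
```

Both calls state what they group on, which is the readability point being argued.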
@hayd yep, I can agree with that, being more explicit is better. But then, I would argue that A use case is to remove duplicates in your index, but indeed, not the most standard thing to use groupby for. |
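The duplicate-index use case mentioned above could look roughly like the following sketch. The Series `s` is invented for illustration, and `Index.duplicated` is shown only as one alternative that may be available in your pandas version:

```python
import pandas as pd

# Hypothetical Series whose index contains the label "a" twice.
s = pd.Series([10, 20, 30], index=["a", "b", "a"])

# Keep the first value seen for each index label via groupby on level 0.
first_per_label = s.groupby(level=0).first()

# An alternative without groupby: mask out repeated labels.
dedup = s[~s.index.duplicated(keep="first")]

print(first_per_label)
print(dedup)
```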
I would argue that this minor inconsistency (requiring level=0 to be passed, though I'm not sure I even agree it's inconsistent!) is a good compromise rather than allowing drop_duplicates is a bad example (there's a function for that), but I know I've done this before for something and it can be useful; it's just infrequently required (which I think is a good reason not to use it as the "default": a. you rarely use it, b. you'd have to look up what it's doing when you do see it). There's a reason to have axis=0 by default: we are grouping row-wise (be it on the index or a column), but it doesn't mean group by the index! That is to say, I like the current behaviour (both are explicit in what they are doing):
and leaving groupby blank should definitely raise
|
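As a rough illustration of "leaving groupby blank should raise": in recent pandas versions (an assumption; the behaviour at the time of this thread differed, and the error quoted earlier is not reproduced here), calling `groupby()` with neither `by` nor `level` raises a `TypeError`. The Series `s` is hypothetical:

```python
import pandas as pd

s = pd.Series([1, 2, 3])

try:
    # Neither `by` nor `level` is given, so there is nothing to group on.
    s.groupby()
    raised = None
except TypeError as exc:
    raised = exc

# Print the exception type and whatever message this pandas version uses.
print(type(raised).__name__, raised)
```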
@hayd Ah yes, forgot about that meaning of So let's just say then: the docstring of And to further nitpick a bit :-) |
So in reality, you have to specify:
would not allow just a bare, |
This all feels like turtles all the way down :) |
@jreback I think @hayd is right, that
|
@jorisvandenbossche that looks right, so |
yep, indeed, the docstring should be clarified a bit on that account. |
closed by #8950 |
Hello! Sorry for reviving this old issue, but I find very ugly the current syntax of the
is not straightforward at all. A I'd like to propose this little enhancement; would you accept a PR of this kind? Or, at least, can you explain to me the design choice behind this syntax, please? Thank you and have a nice day! |
so which shall be the answer? explicit is much better than implicit
|
I agree totally on the fact that explicit is better than implicit but, from my point of view, it should not have a different meaning from
if not differently specified. After all
does exactly that; I would simply propose to make this the default behavior. But I'm just speaking from a user's point of view. I know almost nothing about the details of the implementation; I just wanted to highlight the problem :) If you think it may be a reasonable point of view, I can work on a small PR. |
@ferdas but you are missing the point: we can groupby on index or values (in the Series case) by default, so which shall it be? And if you pick one, why? It makes no sense to have a default here when it's not very clear which is the preferred operation. So -1 on any change to this. Furthermore for what you are actually using this,
|
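To make the "index or values" ambiguity concrete, here is a small sketch (the Series `s` is invented, and `to_numpy()` is used so the values are passed as a plain array of group keys) showing that the two candidate defaults give different groupings of the same data:

```python
import pandas as pd

# Hypothetical Series: repeated values and repeated index labels.
s = pd.Series([5, 5, 7], index=["x", "y", "x"])

# Group on the values themselves: keys are 5 and 7.
by_values = s.groupby(s.to_numpy()).sum()

# Group on the index labels: keys are "x" and "y".
by_index = s.groupby(level=0).sum()

print(by_values)
print(by_index)
```

Neither result is obviously "the" default, which is the argument for requiring an explicit `by` or `level`.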
Since the actual behavior ( I see your point but I don't get how it's possible to think that by calling Due to the values-oriented nature of the Well, if I'm the only one who sees the good in this proposal, I'll leave it here. Thank you for the suggestion of Good evening! |
In https://github.com/pydata/pandas/blob/master/pandas/core/generic.py, by=None should be corrected to just by, as None always leads to an error. See this: http://stackoverflow.com/questions/17929426/groupby-for-pandas-series-not-working