'mode' not recognized by df.groupby().agg(), but pd.Series.mode works #11562

Closed
patricksurry opened this issue Nov 9, 2015 · 7 comments
Labels: Apply, Duplicate Report, Enhancement, Groupby

Comments

@patricksurry

This works:

df = pd.DataFrame({'A': [1, 2, 1, 2, 1, 2, 3], 'B': [1, 1, 1, 2, 2, 2, 2]})
df.groupby('B').agg(pd.Series.mode)

but this doesn't:

df.groupby('B').agg('mode')

...
AttributeError: Cannot access callable attribute 'mode' of 'DataFrameGroupBy' objects, try using the 'apply' method

I thought all the series aggregate methods propagated automatically to groupby, but I've probably misunderstood?

@patricksurry
Author

Hmm, I guess this might be because pd.Series.mode() returns a series, not a scalar. So maybe I need my own mode that decides how to handle the multi-modal case, e.g. pd.Series.mode().mean() or whatever?
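A minimal sketch of that idea, assuming the goal is one scalar per group. The helper names `first_mode` and `mean_of_modes` are hypothetical and not part of pandas; they just pick one way of collapsing several modes into a single value.

```python
import pandas as pd

# Group with a tie: both 1 and 2 appear twice in column A.
tied = pd.DataFrame({'A': [1, 2, 1, 2], 'B': [1, 1, 1, 1]})

def first_mode(s):
    # Hypothetical helper: Series.mode() returns the modes sorted,
    # so this keeps the smallest one. Raises IndexError if the group
    # has no non-null values.
    return s.mode().iloc[0]

def mean_of_modes(s):
    # Hypothetical helper: average the modes when there are several.
    return s.mode().mean()

# Each reducer returns a scalar, so agg() accepts it.
result = tied.groupby('B')['A'].agg([first_mode, mean_of_modes])
print(result)
```

Which reduction is right depends on the data; averaging only makes sense for numeric columns, while taking the first mode also works for strings.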

@TomAugspurger
Contributor

> might be because pd.Series.mode() returns a series, not a scalar

Correct. IIRC there's an older issue about this, where we decided to keep our behavior of always returning a series, and not adding a flag to reduce if possible. I could be misremembering though.

In these cases I'll usually just use scipy.stats.mode:

df.groupby('B').agg(lambda x: scipy.stats.mode(x)[0])

scipy.stats.mode returns a tuple of (mode, count) and we just want the mode.

@jreback
Contributor

jreback commented Jul 26, 2016

Here's a mini-example; could be like .value_counts()

In [6]: df = DataFrame({'A' : [1,2,1,2], 'B' : [1,1,1,1]})

In [7]: df
Out[7]: 
   A  B
0  1  1
1  2  1
2  1  1
3  2  1

In [8]: df.groupby('A').B.value_counts()
Out[8]: 
A  B
1  1    2
2  1    2
Name: B, dtype: int64

In [9]: df.groupby('A').B.apply(lambda x: x.mode())
Out[9]: 
A   
1  0    1
2  0    1
Name: B, dtype: int64

@jreback jreback added this to the Next Major Release milestone Jul 26, 2016
@kernc
Contributor

kernc commented Apr 7, 2017

What about when grouping Series?

I have no issue with .agg('mode') returning the first mode, if any, while issuing a warning if there are multiple modes.
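Until something like that exists, the proposed behavior can be approximated with a user-defined reducer. This is only a sketch: `mode_with_warning` is a hypothetical name, and the choice to return NaN for a group with no non-null values is an assumption, not anything pandas does.

```python
import warnings
import pandas as pd

def mode_with_warning(s):
    # Hypothetical helper, not a pandas API.
    modes = s.mode()
    if modes.empty:
        # Assumption: a group with no non-null values maps to NaN.
        return float('nan')
    if len(modes) > 1:
        warnings.warn(
            f"{len(modes)} modes found; returning the first",
            stacklevel=2,
        )
    # Series.mode() is sorted, so "first" means the smallest mode.
    return modes.iloc[0]

# Both 1 and 2 appear twice, so this group triggers the warning.
tied = pd.DataFrame({'A': [1, 2, 1, 2], 'B': [1, 1, 1, 1]})
result = tied.groupby('B')['A'].agg(mode_with_warning)
print(result)
```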

@gosuto-inzasheru

gosuto-inzasheru commented Oct 9, 2019

I encountered this problem and ended up settling for this:
.agg({'column': lambda x: pd.Series.mode(x).iloc[0]})

But yes, I agree with @kernc, I would not mind .agg('mode') returning the first mode if multiple modes are returned.

@mroeschke
Member

xref #19254

@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
@jbrockmendel jbrockmendel added the Apply Apply, Aggregate, Transform label Feb 11, 2023
@rhshadrach
Member

Closing as a duplicate of #19254

@rhshadrach rhshadrach added the Duplicate Report Duplicate issue or pull request label Apr 23, 2023