groupby().max() operation removes string columns #2700
Comments
I'll have a look. It's not obvious that this should work by default on non-numeric data.
aleyan commented Jan 16, 2013
Strings have __lt__() defined, so the built-in min() and max() work on them. If a non-numeric object supports the proper comparison methods, the min() and max() aggregate functions should be unambiguous.
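The point about comparison methods can be illustrated in plain Python, independent of pandas. This is only a minimal sketch: the `Tag` class is a hypothetical stand-in for any non-numeric type that defines `__lt__`, and it is not part of pandas or of the original report.

```python
# Built-in str defines __lt__, so min() and max() work on strings directly.
words = ['mama', 'papa', 'baby']
smallest = min(words)
largest = max(words)


class Tag:
    """Hypothetical non-numeric type that defines only __lt__."""

    def __init__(self, name):
        self.name = name

    def __lt__(self, other):
        # Order tags by their name.
        return self.name < other.name


tags = [Tag('b'), Tag('c'), Tag('a')]
# min() calls __lt__ directly; max() reaches it through the reflected comparison.
bottom = min(tags)
top = max(tags)
print(smallest, largest, bottom.name, top.name)
```

Because ordering is all that min() and max() need, the result is well defined for any column whose values are mutually comparable.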
wesm closed this in dda2363 Jan 20, 2013
wesm was assigned Jan 20, 2013
lselector commented Jan 15, 2013
While upgrading pandas from 0.7.2 to 0.9.1, we found that the groupby().max() operation now drops non-numeric columns. This broke our code in several places. The workaround is to use groupby().aggregate(np.max).
Here is an example demonstrating the problem:
from pandas import DataFrame
aa = DataFrame({'nn': [11, 11, 22, 22], 'ii': [1, 2, 3, 4], 'ss': 4 * ['mama']})
aa.groupby('nn').max()
Output on pandas 0.7.2:
    ii  nn    ss
nn
11   2  11  mama
22   4  22  mama
Output on pandas 0.9.1:
    ii
nn
11   2
22   4
As you can see, the object column 'ss' is dropped in the new version.
This was very unintuitive.
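For reference, the workaround mentioned above can be sketched as follows. This is a variant that passes a lambda to aggregate() rather than np.max, so that each column's own max() is applied per group. It is an illustration under the assumption of a reasonably recent pandas installation, and the exact behavior of groupby().max() itself depends on the pandas version in use.

```python
import pandas as pd

# Same frame as in the report: one numeric column and one string column.
aa = pd.DataFrame({'nn': [11, 11, 22, 22], 'ii': [1, 2, 3, 4], 'ss': 4 * ['mama']})

# Apply max() column by column within each group via aggregate().
# The lambda is an illustrative stand-in for the np.max workaround.
result = aa.groupby('nn').aggregate(lambda col: col.max())
print(result)
```

The string column 'ss' is kept in the result because max() is evaluated on each column separately, using the ordinary string comparison.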