groupby().max() operation removes string columns #2700

Closed
lselector opened this Issue Jan 15, 2013 · 2 comments


While upgrading pandas from 0.7.2 to 0.9.1, we found that groupby().max() now drops non-numeric columns. This broke our code in several places. A workaround is to use groupby().aggregate(np.max) instead.

Here is an example demonstrating the problem:

from pandas import DataFrame
aa = DataFrame({'nn': [11, 11, 22, 22], 'ii': [1, 2, 3, 4], 'ss': 4 * ['mama']})
aa.groupby('nn').max()

Output on pandas 0.7.2:
    ii  nn    ss
nn
11   2  11  mama
22   4  22  mama

Output on pandas 0.9.1:
    ii
nn
11   2
22   4

As you can see, the object column 'ss' is dropped in the new version.
This is very unintuitive.
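For reference, the workaround mentioned above can be sketched as follows. This is a minimal example that reuses the frame from the report; the variable name `result` is only illustrative, and the behavior of a plain groupby().max() on string columns may differ between pandas versions, so only the aggregate(np.max) form is shown here.

```python
import numpy as np
from pandas import DataFrame

# Same frame as in the report above.
aa = DataFrame({'nn': [11, 11, 22, 22], 'ii': [1, 2, 3, 4], 'ss': 4 * ['mama']})

# Workaround from the report: pass np.max to aggregate() explicitly
# instead of calling the max() shortcut on the groupby object.
result = aa.groupby('nn').aggregate(np.max)

# The string column 'ss' should be kept alongside the numeric column 'ii'.
print(result)
```

Newer pandas releases may emit a warning when a NumPy function is passed to aggregate(); passing the string 'max' is the usual alternative there.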

Owner

wesm commented Jan 16, 2013

I'll have a look. It's not obvious that this should work by default on non-numeric data.

aleyan commented Jan 16, 2013

Strings have __lt__() defined, so the built-in min() and max() work on them. If a non-numeric object supports the proper comparison methods, the min() and max() aggregate functions should be unambiguous.
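A short plain-Python sketch of this point: the built-in min() and max() work on strings and on any object that defines the rich comparison methods. The `Version` class below is hypothetical, made up purely for illustration; it is not part of pandas or of the original report.

```python
from functools import total_ordering

# str defines __lt__, so the built-ins order strings lexicographically.
words = ['mama', 'papa', 'baby']
largest_word = max(words)
smallest_word = min(words)

# Hypothetical non-numeric type that supports comparison.
@total_ordering
class Version:
    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def _key(self):
        return (self.major, self.minor)

    def __eq__(self, other):
        return self._key() == other._key()

    def __lt__(self, other):
        return self._key() < other._key()

# max() and min() only need the comparison methods, not numeric data.
versions = [Version(0, 7), Version(0, 9), Version(0, 8)]
newest = max(versions)
oldest = min(versions)

print(largest_word, smallest_word)
print((newest.major, newest.minor), (oldest.major, oldest.minor))
```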

wesm closed this in dda2363 Jan 20, 2013

wesm was assigned Jan 20, 2013
