groupby().max() operation removes string columns #2700

Closed
lselector opened this Issue Jan 15, 2013 · 2 comments


While upgrading pandas from 0.7.2 to 0.9.1, we found that the groupby().max() operation now drops non-numeric columns. This broke our code in several places. The workaround is to use groupby().aggregate(np.max).
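A minimal sketch of why the workaround presumably keeps the string column: pandas stores strings like those in 'ss' in object-dtype arrays, and np.max over an object array falls back to the elements' own Python comparisons instead of rejecting them as non-numeric. The array names mirror the example columns; the extra string values are made up for illustration.

```python
import numpy as np

# Strings in a pandas column such as 'ss' live in an object-dtype array.
ss = np.array(['mama', 'papa', 'baba'], dtype=object)
ii = np.array([1, 2, 4])

# np.max reduces the object array using Python's own string comparison,
# so it returns the lexicographically largest string.
print(np.max(ii), np.max(ss))
```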

Here is an example demonstrating the problem:

from pandas import DataFrame
aa = DataFrame({'nn': [11, 11, 22, 22], 'ii': [1, 2, 3, 4], 'ss': 4 * ['mama']})
aa.groupby('nn').max()

Output on pandas 0.7.2:
    ii  nn    ss
nn
11   2  11  mama
22   4  22  mama

Output on pandas 0.9.1:
    ii
nn
11   2
22   4

As you can see, the object column 'ss' is dropped in the new version.
This is very unintuitive.
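For reference, the 0.7.2 behavior the report relies on can be sketched in plain Python, assuming rows are dicts that share the same keys. The helper groupby_max is hypothetical, not a pandas API; unlike the 0.7.2 output, it leaves the grouping column out of each result row because that value already serves as the group key.

```python
from collections import defaultdict


def groupby_max(rows, key):
    """Hypothetical helper: per-group max of every column, strings included.

    rows is a list of dicts with identical keys; key names the grouping column.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    result = {}
    for k in sorted(groups):
        members = groups[k]
        # Built-in max() works on any column whose values are mutually
        # comparable, so the string column 'ss' survives next to 'ii'.
        result[k] = {
            col: max(r[col] for r in members) for col in members[0] if col != key
        }
    return result


rows = [
    {'nn': 11, 'ii': 1, 'ss': 'mama'},
    {'nn': 11, 'ii': 2, 'ss': 'mama'},
    {'nn': 22, 'ii': 3, 'ss': 'mama'},
    {'nn': 22, 'ii': 4, 'ss': 'mama'},
]
print(groupby_max(rows, 'nn'))
# {11: {'ii': 2, 'ss': 'mama'}, 22: {'ii': 4, 'ss': 'mama'}}
```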

wesm (Owner) commented Jan 16, 2013

I'll have a look. It's not obvious that this should work by default on non-numeric data.

aleyan commented Jan 16, 2013

Strings define __lt__(), so the built-in min() and max() work on them. If a non-numeric object supports the proper comparison methods, the min() and max() aggregate functions should be unambiguous.
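The comment's point can be checked in plain Python: min() compares with <, and max() compares with >, which Python resolves through the reflected __lt__ when __gt__ is not defined. The Version class is a hypothetical example of a non-numeric type; in Python 3, values that cannot be compared with each other (such as a string and an int) raise TypeError instead.

```python
class Version:
    """Hypothetical non-numeric type that defines only __lt__."""

    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def __lt__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return (self.major, self.minor) < (other.major, other.minor)

    def __repr__(self):
        return f"Version({self.major}, {self.minor})"


versions = [Version(1, 2), Version(0, 9), Version(1, 10)]
# a > b falls back to b.__lt__(a), so __lt__ alone is enough for max().
print(min(versions), max(versions))  # Version(0, 9) Version(1, 10)
print(max(['mama', 'papa']))         # papa

try:
    max(['mama', 1])  # str and int have no ordering in Python 3
except TypeError as exc:
    print('not comparable:', exc)
```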

wesm closed this in dda2363 on Jan 20, 2013

ghost assigned wesm on Jan 20, 2013
