
groupby().max() operation removes string columns #2700

Closed
lselector opened this Issue · 2 comments

3 participants

@lselector

While upgrading pandas from 0.7.2 to 0.9.1, we found that groupby().max() now drops non-numeric columns. This broke our code in several places. A workaround is to use groupby().aggregate(np.max).

Here is an example demonstrating the problem:

from pandas import DataFrame
aa = DataFrame({'nn': [11, 11, 22, 22], 'ii': [1, 2, 3, 4], 'ss': 4 * ['mama']})
aa.groupby('nn').max()

output on pandas 0.7.2:

    ii  nn    ss
nn
11   2  11  mama
22   4  22  mama

output on pandas 0.9.1:

    ii
nn
11   2
22   4

As you can see, the object column 'ss' is dropped in the new version.
This was very unintuitive.
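For reference, the workaround mentioned at the top can be sketched as follows. This is a minimal sketch that assumes pandas and NumPy are installed; the exact output layout differs between pandas versions, and newer releases may emit a warning when `np.max` is passed and suggest the string `"max"` instead. The variable name `result` is only for illustration.

```python
import numpy as np
import pandas as pd

# Same frame as in the report: one numeric column and one string column.
aa = pd.DataFrame({'nn': [11, 11, 22, 22],
                   'ii': [1, 2, 3, 4],
                   'ss': 4 * ['mama']})

# Workaround from the report: pass the reduction explicitly to aggregate()
# instead of calling .max() on the groupby object.
result = aa.groupby('nn').aggregate(np.max)

# The string column should still be present after aggregation.
print(result.columns.tolist())
print(result)
```

If `'ss'` shows up in the printed column list, the installed version keeps string columns for this aggregation.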

@wesm
Owner

I'll have a look. It's not obvious that this should work by default on non-numeric data.

@aleyan

Strings have __lt__() defined, so the built-in min() and max() work on them. If a non-numeric object supports the proper comparison methods, the min() and max() aggregate functions should be unambiguous.
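To illustrate the point about comparison methods, here is a small pure-Python sketch of a group-wise max over the same data as in the report. The helper `groupwise_max` and the tuple layout `(nn, ii, ss)` are made up for this example and are not part of pandas; the sketch only shows that the built-in max() handles the string column without special treatment.

```python
from collections import defaultdict

# Rows mirror the DataFrame from the report: (nn, ii, ss).
rows = [(11, 1, "mama"), (11, 2, "mama"), (22, 3, "mama"), (22, 4, "mama")]

# Strings define __lt__(), so the built-in max() compares them lexicographically.
largest = max("mama", "papa")

def groupwise_max(rows):
    """Column-wise max per group key (first field), using only built-in max()."""
    groups = defaultdict(list)
    for key, *values in rows:
        groups[key].append(values)
    # zip(*members) transposes a group's rows into columns.
    return {key: tuple(max(col) for col in zip(*members))
            for key, members in groups.items()}

result = groupwise_max(rows)
print(largest)
print(result)
```

The same reduction applies to any column whose values are mutually comparable, which is the behavior being argued for here.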

@wesm closed this in dda2363
@wesm was assigned