groupby().max() operation removes string columns #2700

Closed
lselector opened this issue Jan 15, 2013 · 2 comments

@lselector

While upgrading pandas from 0.7.2 to 0.9.1, we found that groupby().max() now drops non-numeric columns. This broke our code in several places. The workaround is to use groupby().aggregate(np.max).

Here is an example demonstrating the problem:

from pandas import DataFrame
aa = DataFrame({'nn': [11, 11, 22, 22], 'ii': [1, 2, 3, 4], 'ss': 4 * ['mama']})
aa.groupby('nn').max()

Output on pandas 0.7.2:
    ii  nn    ss
nn
11   2  11  mama
22   4  22  mama

Output on pandas 0.9.1:
    ii
nn
11   2
22   4

As you can see, the object column 'ss' is dropped in the new version.
This was very unintuitive.
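For reference, here is a minimal sketch of the workaround applied to the example frame. It passes a lambda that calls each column's own max() instead of np.max itself; that substitution is an assumption made here so the sketch behaves the same across pandas versions, and the variable names are illustrative only.

```python
import pandas as pd

# Same data as the example in this report.
aa = pd.DataFrame({'nn': [11, 11, 22, 22],
                   'ii': [1, 2, 3, 4],
                   'ss': 4 * ['mama']})

# Aggregate with an explicit callable so every column, including the
# object column 'ss', is reduced group by group.
result = aa.groupby('nn').aggregate(lambda col: col.max())

print(result)
```

The point of the explicit callable is that it does not rely on whatever column selection the built-in max() shortcut applies, so the string column should be kept in the result.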

@wesm
Member

wesm commented Jan 16, 2013

I'll have a look. It's not obvious that this should work by default on non-numeric data.

@aleyan

aleyan commented Jan 16, 2013

Strings have __lt__() defined, so the built-in min() and max() work on them. If a non-numeric object supports the proper comparison methods, the min() and max() aggregate functions should be unambiguous.
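To illustrate the point, here is a small pure-Python sketch (not pandas code) of a per-group, column-wise max that relies only on the built-in max(). The helper name group_max and the sample rows are hypothetical, chosen just for this example.

```python
# Strings define __lt__, so they are ordered like any other comparable value.
assert "apple".__lt__("banana") is True

# (key, integer column, string column)
rows = [(11, 1, "alpha"), (11, 2, "beta"), (22, 3, "gamma"), (22, 4, "delta")]

def group_max(rows):
    """Column-wise max per key, using only the built-in max()."""
    groups = {}
    for key, *values in rows:
        groups.setdefault(key, []).append(values)
    # zip(*vals) turns the list of rows into columns; max() reduces each one.
    return {key: tuple(max(col) for col in zip(*vals))
            for key, vals in groups.items()}

result = group_max(rows)
print(result)  # {11: (2, 'beta'), 22: (4, 'gamma')}
```

Nothing here treats the string column specially: it goes through the same max() call as the integer column.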

@wesm closed this as completed in dda2363 on Jan 20, 2013
@ghost assigned wesm on Jan 20, 2013