Add filtering capability to GroupBy #919

wesm opened this Issue Mar 15, 2012 · 7 comments



wesm commented Mar 15, 2012

This can be accomplished in a hackish way using apply, but a more structured approach would be nice.

7/12/2012: Not sure what I was intending with this one
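The apply hack mentioned above can be sketched roughly as follows; this is a hedged illustration with made-up data, and the column names and threshold are purely illustrative, not anything from the issue itself:

```python
import pandas as pd

# Hypothetical data: three cities, one of which has only a single entry.
df = pd.DataFrame({
    "city": ["NYC", "NYC", "NYC", "SF", "SF", "LA"],
    "entries": [1, 2, 3, 4, 5, 6],
})

# The "hackish" apply-based filter: return each group unchanged if it
# passes the predicate, or an empty frame if it does not; the empty
# frames contribute no rows to the concatenated result.
filtered = (
    df.groupby("city")
      .apply(lambda g: g if len(g) >= 2 else g.iloc[0:0])
      .reset_index(drop=True)
)
```

The awkward part is exactly what this issue is about: you have to express "drop this group" as "return an empty object", and then clean up the resulting index afterwards.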

sanand0 commented Oct 13, 2012

This would be quite useful.

For example, if data.groupby('city') on an address book lists 1000 cities and we want only those with over 100 entries, it would be useful to be able to say something like:

grouped = data.groupby('city')
grouped.filter(grouped.size() > 100)

... and then compute on just that subset.
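In the meantime, the same subset can be obtained without any new API by combining size() with boolean indexing on the original frame. A minimal sketch, with a hypothetical address book and a threshold of 2 standing in for 100:

```python
import pandas as pd

# Hypothetical address book: one row per entry, with a 'city' column.
data = pd.DataFrame({
    "city": ["NYC", "NYC", "NYC", "SF"],
    "name": ["a", "b", "c", "d"],
})

# Count entries per city, keep cities above the threshold, then select
# the matching rows by membership in that set of cities.
counts = data.groupby("city").size()
big_cities = counts[counts > 2].index
subset = data[data["city"].isin(big_cities)]
```

This works, but it forces you to re-derive the row selection from the group keys instead of filtering the groups directly, which is what the proposed grouped.filter() would express.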

apratap commented Oct 31, 2012

FYI: without knowing about this open issue, I stumbled upon the same cleaning requirement. It would be nice to have this in pandas, but for now I was able to move forward.


apratap commented Nov 1, 2012

Wesley: Can you please help me with the apply hack? I still can't seem to filter grouped data. More details in the Stack Overflow question above. Thanks! -Abhi


Probably bad issue etiquette, but I just wanted to add my +1 for this enhancement. I grouped my data by year (hydrologic water year, actually) and then wanted to remove years with fewer than 365 days of data. I used the Stack Overflow answer based on pandas.concat() to work around it, but that is pretty ugly.

I agree with sanand0 that grouped.filter() would be easiest. Another possibility would be to add a drop() method to the DataFrameGroupBy object; that would allow looping over len(group.groups[name]).
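The pandas.concat() workaround mentioned above can be sketched like this; the data is made up, and a threshold of 4 rows stands in for 365 days:

```python
import pandas as pd

# Hypothetical daily record with a 'year' column; 2011 is incomplete.
df = pd.DataFrame({
    "year": [2010, 2010, 2010, 2010, 2011, 2011],
    "flow": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

# Iterate over the groups and re-concatenate only those meeting the
# length requirement.
kept = pd.concat(g for _, g in df.groupby("year") if len(g) >= 4)
```

It gets the job done, but materializing and re-concatenating groups by hand is exactly the ugliness being complained about here.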


Another +1. I can't quite figure out the best generic way to work around this. Which of the SO answers do you recommend, Wesley?

@jreback jreback closed this in #3680 Jun 6, 2013
jreback commented Jun 6, 2013

closed via #3680
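For readers landing here later: #3680 added a filter method on GroupBy objects, which keeps the rows of every group for which the predicate returns True. A small example (data invented for illustration):

```python
import pandas as pd

data = pd.DataFrame({
    "city": ["NYC", "NYC", "NYC", "SF"],
    "name": ["a", "b", "c", "d"],
})

# Keep only the rows belonging to groups with more than 2 entries.
result = data.groupby("city").filter(lambda g: len(g) > 2)
```

Unlike the apply and concat workarounds discussed above, filter returns the surviving rows with their original index, with no group keys or empty frames to clean up.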
