Can be accomplished in a hackish way using apply, but a more structured approach would be nice
7/12/2012: Not sure what I was intending with this one
This would be quite useful.
For example, suppose data.groupby('city') on an address book lists 1000 cities, and we want only the cities with over 100 entries. It would be useful to be able to say something like:
grouped = data.groupby('city')
grouped.filter(grouped.size() > 100)
... and then compute on just that subset.
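In the meantime, the same result can be had with the size/isin workaround. This is a minimal sketch with made-up example data (the `data`, `city`, and `name` values are hypothetical, as is the threshold of 2 standing in for 100):

```python
import pandas as pd

# Hypothetical address-book data: city 'A' has 3 entries, city 'B' has 1.
data = pd.DataFrame({
    'city': ['A', 'A', 'A', 'B'],
    'name': ['n1', 'n2', 'n3', 'n4'],
})

# Compute per-group sizes, then keep rows whose city clears the cutoff.
sizes = data.groupby('city').size()
big_cities = sizes[sizes > 2].index
subset = data[data['city'].isin(big_cities)]
```

After this, `subset` contains only the rows for cities above the cutoff, and any further computation can run on that subset.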
FYI: without knowing about this open issue, I stumbled upon the same cleaning requirement. It would be nice to have this in pandas, but for now I was able to move forward.
Wesley: Can you please help me with the apply hack? I still can't seem to filter grouped data. More details are in the Stack Overflow post linked above. Thanks! -Abhi
Probably bad issue etiquette, but I just wanted to add my +1 for this enhancement. I grouped my data by year (hydrologic water year, actually) and then wanted to remove years with fewer than 365 days of data. I used the Stack Overflow answer based on pandas.concat() to work around it, but that is pretty ugly.
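For anyone landing here, the concat() workaround mentioned above looks roughly like this. The data is invented for illustration (a `year` column with daily records; the size cutoff of 3 stands in for 365):

```python
import pandas as pd

# Hypothetical daily records: year 2000 has 3 rows, year 2001 has 1.
df = pd.DataFrame({
    'year': [2000, 2000, 2000, 2001],
    'flow': [1.0, 2.0, 3.0, 4.0],
})

# Rebuild the frame from only the groups that meet the size cutoff.
grouped = df.groupby('year')
kept = pd.concat([g for _, g in grouped if len(g) >= 3])
```

It works, but it materializes every surviving group and glues them back together, which is why a first-class grouped.filter() would be cleaner.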
I agree with sanand0 that grouped.filter() would be easiest. Another possibility would be to add a drop() method to the DataFrameGroupBy object, which would allow looping over the groups and dropping those based on len(group.groups[name]).
another somewhat related reference: http://stackoverflow.com/questions/13446480/python-pandas-remove-entries-based-on-the-number-of-occurrences
Another +1. I can't quite figure out the best way to work around this in a generic way. Which of the SO answers do you recommend, Wesley?
closed via #3680
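For reference, with a pandas version that includes this change, the originally proposed usage can be written directly with GroupBy.filter, which keeps all rows of groups where the predicate returns True. A minimal sketch on made-up data (threshold 2 standing in for 100):

```python
import pandas as pd

df = pd.DataFrame({
    'city': ['A', 'A', 'A', 'B'],
    'name': ['n1', 'n2', 'n3', 'n4'],
})

# Keep only the rows belonging to groups with more than 2 entries.
result = df.groupby('city').filter(lambda g: len(g) > 2)
```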