Filter of grouped dataframe raises ValueError #4447

Closed
mattemathias opened this Issue Aug 2, 2013 · 15 comments

see also #4527

Filtering a grouped DataFrame with more than 2 columns raises a ValueError:

In [90]: dff = pd.DataFrame({'A': np.arange(8), 'B': list('aabbbbcc'), 'C': np.arange(8)})

In [91]: dff.groupby('B').filter(lambda x: len(x) > 2)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-91-89d79df28299> in <module>()
----> 1 dff.groupby('B').filter(lambda x: len(x) > 2)

C:\Anaconda\lib\site-packages\pandas\core\groupby.pyc in filter(self, func, dropna, *args, **kwargs)
   2092                 res = path(group)
   2093
-> 2094             if res:
   2095                 indexers.append(self.obj.index.get_indexer(group.index))
   2096

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()


In [93]: pd.__version__
Out[93]: '0.12.0'
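
For context, the message in the traceback is the one NumPy raises whenever an array with more than one element is used in a boolean context, so `res` at `if res:` was presumably array-like rather than a single bool. The sketch below reproduces only that message with plain NumPy. The name `res` mirrors the traceback; nothing here reflects how pandas computes it internally.

```python
import numpy as np

# Stand-in for a filter result that is an array instead of a scalar bool.
res = np.array([True, False, True])

message = None
try:
    if res:  # truth value of a multi-element array is ambiguous
        pass
except ValueError as exc:
    message = str(exc)

print(message)

# Reducing the array explicitly gives a plain bool that `if` accepts.
print(bool(res.all()), bool(res.any()))
```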
Contributor

jtratner commented Aug 2, 2013

good catch. Looking into this.

Contributor

jtratner commented Aug 3, 2013

So far, I've found that the proximate problem is with the slow_path here, but fixing it for this case causes issues for other tests. Working on it...

Contributor

danielballan commented Aug 21, 2013

The problem arises with len specifically. For example, count is fine.

In [1]: dff = pd.DataFrame({'A': np.arange(8), 'B': list('aabbbbcc'), 'C': np.arange(8)})

In [2]: dff.groupby('B').filter(lambda x: x.count() > 2)
Out[2]: 
   A  B  C
0  0  a  0
1  1  a  1
2  2  b  2
3  3  b  3
4  4  b  4
5  5  b  5
6  6  c  6
7  7  c  7
Contributor

jreback commented Aug 21, 2013

@danielballan you prob just need to deal with different return types/shapes from the filtered function;

this is BTD (bug test driven!)

Contributor

jreback commented Aug 23, 2013

@danielballan fyi... #4657 will partially fix this (sort of) as it looks at the return value from the filter in a better way

Contributor

jreback commented Sep 20, 2013

@danielballan would be nice to fix for 0.13

Contributor

danielballan commented Sep 20, 2013

I cannot reproduce the error. I think #4657 changed the path taken, so now the shape of the len result is interpreted correctly. On the current master:

In [19]: pd.DataFrame({'A': np.arange(8), 'B': list('aabbbbcc'), 'C': np.arange(8)}).groupby('B').filter(lambda x: len(x) > 2)
Out[19]: 
   A  B  C
2  2  b  2
3  3  b  3
4  4  b  4
5  5  b  5

If you can point me toward an extant problem, I will follow up.

If not, close?

Contributor

jreback commented Sep 20, 2013

hmm....maybe add some test cases that return different things in the function to try to break it:

e.g. scalar, series, frame, raise various exceptions, etc
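
A rough sketch of what a few such cases might look like, assuming a pandas version where the `len` case already works. It covers only a Python bool, a NumPy bool, and a function that raises; the helper names (`py_bool`, `np_bool`, `raises`) are made up for illustration and are not taken from the pandas test suite.

```python
import numpy as np
import pandas as pd

dff = pd.DataFrame({'A': np.arange(8), 'B': list('aabbbbcc'), 'C': np.arange(8)})
grouped = dff.groupby('B')

def py_bool(group):
    # Returns a plain Python bool.
    return len(group) > 2

def np_bool(group):
    # Returns a NumPy scalar bool instead of a Python bool.
    return np.bool_(len(group) > 2)

def raises(group):
    # Hypothetical failing filter function.
    raise RuntimeError("boom")

kept_py = grouped.filter(py_bool)
kept_np = grouped.filter(np_bool)

# An exception raised inside the filter function is expected to propagate.
propagated = False
try:
    grouped.filter(raises)
except RuntimeError:
    propagated = True

print(kept_py)
print(kept_np.equals(kept_py), propagated)
```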

Contributor

jreback commented Sep 20, 2013

and then can close this issue

Contributor

danielballan commented Sep 20, 2013

OK, sounds good.

Contributor

jreback commented Sep 26, 2013

@danielballan tests for this?

Contributor

danielballan commented Sep 26, 2013

Within a week should be no problem. (Not exactly related, but is there a target for 0.13?)

Contributor

jreback commented Sep 26, 2013

gr8!

not exactly sure of target date...still lots of issues to get thru...few weeks maybe

Contributor

jreback commented Oct 2, 2013

Contributor

danielballan commented Oct 3, 2013

Added four tests in PR #5096 using len on a DataFrame and a Series. Will continue to explore various shapes. I have not discovered any other filter bugs so far. I will switch focus to #4621....
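
The tests in PR #5096 are not reproduced in this thread. As a hypothetical illustration of the two shapes mentioned (`len` on a DataFrame groupby and on a Series groupby), checks along these lines could be written; the variable names are assumptions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': np.arange(8), 'B': list('aabbbbcc'), 'C': np.arange(8)})

# len on each DataFrame group: only group 'b' has more than 2 rows.
frame_result = df.groupby('B').filter(lambda x: len(x) > 2)

# len on each Series group, grouping column A by the values of B.
series_result = df['A'].groupby(df['B']).filter(lambda x: len(x) > 2)

# The complementary condition keeps the two small groups instead.
frame_small = df.groupby('B').filter(lambda x: len(x) <= 2)

print(frame_result)
print(series_result)
print(frame_small)
```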

@jreback jreback closed this in #5096 Oct 13, 2013

jreback added a commit that referenced this issue Oct 13, 2013

Merge pull request #5096 from danielballan/filter-len-test
TST: Groupby filter tests involved len, closing #4447