Filter of grouped dataframe raises ValueError #4447

Closed
mattemathias opened this issue Aug 2, 2013 · 15 comments · Fixed by #5096
Labels
Bug Groupby Testing pandas testing functions or related to the test suite

Comments

@mattemathias

see also #4527

Filtering a grouped DataFrame with more than 2 columns raises a ValueError:

In [90]: dff = pd.DataFrame({'A': np.arange(8), 'B': list('aabbbbcc'), 'C': np.arange(8)})

In [91]: dff.groupby('B').filter(lambda x: len(x) > 2)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-91-89d79df28299> in <module>()
----> 1 dff.groupby('B').filter(lambda x: len(x) > 2)

C:\Anaconda\lib\site-packages\pandas\core\groupby.pyc in filter(self, func, dropna, *args, **kwargs)
   2092                 res = path(group)
   2093
-> 2094             if res:
   2095                 indexers.append(self.obj.index.get_indexer(group.index))
   2096

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()


In [93]: pd.__version__
Out[93]: '0.12.0'
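
One way to read the traceback above: the `if res:` check on line 2094 appears to receive an array-like value rather than a single boolean. The sketch below reproduces the same error message with plain NumPy. It does not use the pandas internals, and the helper name `truth_value` is hypothetical, introduced only for illustration.

```python
import numpy as np


def truth_value(res):
    """Mimic `if res:` and report the outcome instead of raising."""
    try:
        return bool(res)
    except ValueError as exc:
        # NumPy refuses to collapse a multi-element array to one boolean.
        return f"ValueError: {exc}"


# A single boolean has an unambiguous truth value.
scalar_result = truth_value(np.bool_(True))

# A multi-element array does not, so the implicit bool() call fails.
array_result = truth_value(np.array([True, False, True]))

print(scalar_result)
print(array_result)
```

If this reading is right, the filter function's result needs to be reduced to a single boolean before it is tested.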
@jtratner
Contributor

jtratner commented Aug 2, 2013

good catch. Looking into this.

@jtratner
Contributor

jtratner commented Aug 3, 2013

So far, I've found that the proximate problem is with the slow_path here, but fixing it for this case causes issues for other tests. Working on it...

@danielballan
Contributor

The problem arises with len specifically. For example, count is fine.

In [1]: dff = pd.DataFrame({'A': np.arange(8), 'B': list('aabbbbcc'), 'C': np.arange(8)})

In [2]: dff.groupby('B').filter(lambda x: x.count() > 2)
Out[2]: 
   A  B  C
0  0  a  0
1  1  a  1
2  2  b  2
3  3  b  3
4  4  b  4
5  5  b  5
6  6  c  6
7  7  c  7
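
The two predicates return differently shaped results, which may be why they take different paths. The sketch below evaluates both on a single group selected by hand (the 'b' group), without calling `filter`; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

dff = pd.DataFrame({'A': np.arange(8), 'B': list('aabbbbcc'), 'C': np.arange(8)})

# Select one group manually so each predicate's return type can be inspected.
group_b = dff[dff['B'] == 'b']

# len() returns one integer for the whole group, so the comparison is a plain bool.
len_result = len(group_b) > 2

# count() returns one value per column, so the comparison is a boolean Series.
count_result = group_b.count() > 2

print(type(len_result))
print(type(count_result))
print(count_result)
```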

@jreback
Contributor

jreback commented Aug 21, 2013

@danielballan you prob just need to deal with different return types/shapes from the filtered function;

this is BTD (bug test driven!)

@jreback
Contributor

jreback commented Aug 23, 2013

@danielballan fyi... #4657 will partially fix this (sort of) as it looks at the return value from the filter in a better way

@jreback
Contributor

jreback commented Sep 20, 2013

@danielballan would be nice to fix for 0.13

@danielballan
Contributor

I cannot reproduce the error. I think #4657 changed the path taken, so now the shape of the len result is interpreted correctly. On the current master:

In [19]: pd.DataFrame({'A': np.arange(8), 'B': list('aabbbbcc'), 'C': np.arange(8)}).groupby('B').filter(lambda x: len(x) > 2)
Out[19]: 
   A  B  C
2  2  b  2
3  3  b  3
4  4  b  4
5  5  b  5

If you can point me toward an extant problem, I will follow up.

If not, close?

@jreback
Contributor

jreback commented Sep 20, 2013

hmm....maybe add some test cases that return different things in the function to try to break it:

e.g. scalar, series, frame, raise various exceptions, etc

@jreback
Contributor

jreback commented Sep 20, 2013

and then can close this issue

@danielballan
Contributor

OK, sounds good.

@jreback
Contributor

jreback commented Sep 26, 2013

@danielballan tests for this?

@danielballan
Contributor

Within a week should be no problem. (Not exactly related, but is there a target for 0.13?)

@jreback
Contributor

jreback commented Sep 26, 2013

gr8!

not exactly sure of target date...still lots of issues to get thru...few weeks maybe

@jreback
Contributor

jreback commented Oct 2, 2013

@danielballan ?

@danielballan
Contributor

Added four tests in PR #5096 using len on a DataFrame and a Series. Will continue to explore various shapes. I have not discovered any other filter bugs so far. I will switch focus to #4621....
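
For readers who want to check the behaviour themselves, here is a rough sketch of what a `len`-based filter check on a DataFrame and on a Series could look like. These are not the tests from the PR; the expected values are built by plain boolean indexing, and `pd.testing` is assumed to be available (it is in recent pandas releases).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': np.arange(8), 'B': list('aabbbbcc'), 'C': np.arange(8)})

# DataFrame groupby: keep only the groups that have more than 2 rows.
filtered_df = df.groupby('B').filter(lambda x: len(x) > 2)
expected_df = df[df['B'] == 'b']

# Series groupby: the same predicate applied to column A grouped by B.
filtered_s = df['A'].groupby(df['B']).filter(lambda x: len(x) > 2)
expected_s = df.loc[df['B'] == 'b', 'A']

# Both results should match plain boolean indexing on the original frame.
pd.testing.assert_frame_equal(filtered_df, expected_df)
pd.testing.assert_series_equal(filtered_s, expected_s)
```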

jreback added a commit that referenced this issue Oct 13, 2013
TST: Groupby filter tests involved len, closing #4447
4 participants