groupby filter can't access all columns #6512

Closed
hayd opened this issue Feb 28, 2014 · 5 comments · Fixed by #6593

@hayd
Contributor

hayd commented Feb 28, 2014

Unexpectedly(?), the function passed to filter doesn't have access to the grouped-by columns.

It also seems to allow the function to return a boolean Series (and takes the first item as the condition).

In [11]: df = pd.DataFrame([[1, 2], [1, 3], [5, 6]], columns=['A', 'B'])

In [12]: g = df.groupby('A')  # same with as_index=False (which *correctly* has no effect)

In [13]: g.filter(lambda x: x['A'].sum() == 2)
KeyError: u'no item named A'

In [14]: g.filter(lambda x: x['B'].sum() == 5)  # works
Out[14]:
   A  B
0  1  2
1  1  3

In [15]: g.filter(lambda x: x.sum() == 5)  # weird that this works (expected it to raise)
Out[15]:
   A  B
0  1  2
1  1  3

In [16]: g = df.groupby(df['A'])  # hack/workaround

In [16]: g.filter(lambda x: x.sum() == 5)  # seems to look at first col
Out[16]:
   A  B
2  5  6

In [17]: g.filter(lambda x: x['A'].sum() == 5)  # works
Out[17]:
   A  B
2  5  6

In [18]: g.filter(lambda x: x['B'].sum() == 5)  # works
Out[18]:
   A  B
0  1  2
1  1  3

cc @danielballan
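
For reference, the behavior I'd expect here can be written as a small check. This is only a sketch: it assumes a pandas version that includes the fix (0.14.0 or later, going by the milestone), and the names by_a and by_b are just illustrative.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [1, 3], [5, 6]], columns=["A", "B"])
g = df.groupby("A")

# The filter function should see the grouping column 'A' as well as 'B'.
by_a = g.filter(lambda x: x["A"].sum() == 2)  # group A == 1: 1 + 1 == 2
by_b = g.filter(lambda x: x["B"].sum() == 5)  # group A == 1: 2 + 3 == 5

# Both conditions select the same group, so both results should be rows 0 and 1.
print(by_a)
print(by_b)
```

Both calls should return the two rows where A == 1, rather than the first one raising a KeyError.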

@hayd hayd added the Groupby label Feb 28, 2014
@jreback jreback added the Bug label Feb 28, 2014
@jreback jreback added this to the 0.14.0 milestone Feb 28, 2014
@danielballan
Contributor

Whoa, this is bad. I'm very surprised that the huge number of tests doesn't cover this. I'm too busy to dig into it this week, but I can get into it next week. Let me know if you make any progress in the meantime.

@naught101

Not entirely sure, but this might be related: http://stackoverflow.com/questions/22139053/grouped-function-between-2-columns-in-a-pandas-dataframe

df = pandas.DataFrame({"Dummy":[1,2]*6, "X":[1,3,7]*4, 
                       "Y":[2,3,4]*4, "group":["A","B"]*6})
df.groupby('group')[:,['X', 'Y']].head(1)

         Dummy  X  Y group
group                     
A     0      1  1  2     A
B     1      2  3  3     B

[2 rows x 4 columns]

@hayd
Contributor Author

hayd commented Mar 3, 2014

@naught101 I posted your find as #6524.

I suspect that it's because this syntax is not supported:

g[:,['X', 'Y']]
g[['X', 'Y']]
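
As a rough illustration of the list-based column selection (g[['X', 'Y']]) on the frame from the comment above, here is a minimal sketch. It assumes a reasonably recent pandas, where head on a groupby keeps the original row index; first_rows is just an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({"Dummy": [1, 2] * 6, "X": [1, 3, 7] * 4,
                   "Y": [2, 3, 4] * 4, "group": ["A", "B"] * 6})

# Select columns with a list of names, then take the first row of each group.
first_rows = df.groupby("group")[["X", "Y"]].head(1)
print(first_rows)
```

With this form only the X and Y columns should come back, one row per group.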

@hayd
Contributor Author

hayd commented Mar 7, 2014

Trivial fix after #6570: use _selected_obj rather than _obj_with_exclusions on the first line.

@hayd
Contributor Author

hayd commented Mar 11, 2014

fixed!
