groupby(group_keys=True) ignored when apply returns unsliced data #8467

Closed
kay1793 opened this Issue Oct 5, 2014 · 5 comments

kay1793 commented Oct 5, 2014

I ran into this unexplained behaviour with groupby when using group_keys=True (the default):
it's not clear why returning x.key instead of x[:].key causes the group_keys argument to be ignored.

In [86]: df = DataFrame({'key': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    ...:                 'value': range(9)})
    ...: df
Out[86]: 
   key  value
0    1      0
1    1      1
2    1      2
3    2      3
4    2      4
5    2      5
6    3      6
7    3      7
8    3      8

In [87]: df.groupby('key', group_keys=True).apply(lambda x: x[:].key)
Out[87]: 
key   
1    0    1
     1    1
     2    1
2    3    2
     4    2
     5    2
3    6    3
     7    3
     8    3
Name: key, dtype: int64

In [88]: df.groupby('key', group_keys=True).apply(lambda x: x.key)
Out[88]: 
0    1
1    1
2    1
3    2
4    2
5    2
6    3
7    3
8    3
Name: key, dtype: int64

Contributor

jreback commented Oct 5, 2014

This has nothing to do with group_keys (which is a very odd option anyhow).

It has to do with whether or not you are returning something that is exactly identical.

x[:].key is NOT identical to x.key. They are equal, but the 2nd is the exact object, while the first is a copy. These aggregate differently because that's how groupby determines mutation (i.e. even though you didn't actually mutate anything, it looks like you did).

Impossible to disambiguate. Nor am I even sure why you would want to.

What are you actually trying to do?
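
For readers following along, the difference between "equal" and "identical" can be sketched as below. This is only an illustration on a whole DataFrame, assuming a current pandas install; it does not reproduce the internal check that groupby performs, which is not shown in this thread.

```python
import pandas as pd

df = pd.DataFrame({"key": [1, 1, 1, 2, 2, 2, 3, 3, 3], "value": range(9)})

# A full slice builds a new DataFrame object holding the same contents.
sliced = df[:]

same_values = sliced.equals(df)  # equality: compares the contents
same_object = sliced is df       # identity: compares the object itself

print(same_values, same_object)
```

The two frames compare equal, yet `is` reports that they are different objects, which is the distinction drawn above.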

kay1793 commented Oct 5, 2014

Actually, I asked why the x case doesn't behave like what you call the mutated case, not the other way round.
Also, I'm returning a Series when I get a DataFrame in the apply function. How is x[:2] "mutation"
while x.foo is not?
If it's impossible to "disambiguate" when the return types are different, well, I don't know what you mean.

The docstring for group_keys is:

group_keys : boolean, default True
    When calling apply, add group keys to index to identify pieces

What I'm trying to do is to get the group keys included in the index, when I call apply.

I already have a workaround; it just seemed like a bug. If you're sure this is all just fine, feel free to close.
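
The workaround itself is not given in the thread. One possible way to get the group keys into the index without relying on apply, sketched here as an assumption rather than as the approach actually used, is to iterate over the groups and let `pd.concat` build the outer index level from the dictionary keys. The names `pieces` and `result` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"key": [1, 1, 1, 2, 2, 2, 3, 3, 3], "value": range(9)})

# Map each group key to the piece we want to keep for that group.
pieces = {k: g["key"] for k, g in df.groupby("key")}

# Concatenating a dict uses its keys as the outer index level,
# so the group key always ends up in the index.
result = pd.concat(pieces, names=["key"])

print(result)
```

The result has a two-level index (group key, original row label), the same shape as Out[87] above.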

jreback added this to the 0.15.0 milestone Oct 6, 2014

Contributor

jreback commented Oct 6, 2014

@kay1793 ok this is a bug, fixed by #8484

the mutation issue was a red herring (and shouldn't have affected the results)

jreback closed this in #8484 Oct 6, 2014

kay1793 commented Oct 6, 2014

👍 @jreback

This went from impossible and useless to bug fixed in record time. My whiplash says to tell you: Thanks!

Contributor

jreback commented Oct 6, 2014

hahah

np

I may argue about whether there is a bug,
but when detected they get squashed :)
