columns selection after groupby reset group_keys to True #9959

Closed
ruoyu0088 opened this issue Apr 21, 2015 · 3 comments · Fixed by #35444
Labels
Apply Apply, Aggregate, Transform Bug Groupby

Comments

@ruoyu0088

import pandas as pd

df = pd.DataFrame({"g": [1, 2, 1, 2, 1], "a": range(5), "b": range(1, 6), "c": range(2, 7)})
g = df.groupby("g", group_keys=False)
print(g.group_keys)  # False
print(g[["a", "b", "c"]].group_keys)  # True

So the apply() results differ between these two cases:

print(g.apply(lambda x: x[:2]))

output:

   a  b  c  g
0  0  1  2  1
2  2  3  4  1
1  1  2  3  2
3  3  4  5  2

but

print(g[["a", "b", "c"]].apply(lambda x: x[:2]))

output:

     a  b  c
g           
1 0  0  1  2
  2  2  3  4
2 1  1  2  3
  3  3  4  5
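
One possible workaround, sketched here as an assumption rather than an official recommendation, is to select the columns before grouping and pass the grouping column as a separate Series. That way no column selection happens on the GroupBy object, so group_keys=False is never replaced by the default. The variable names selected and result are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"g": [1, 2, 1, 2, 1], "a": range(5), "b": range(1, 6), "c": range(2, 7)})

# Select the columns first, then group by the "g" column passed as a Series.
# No __getitem__ call is made on the GroupBy object, so group_keys stays False.
selected = df[["a", "b", "c"]]
result = selected.groupby(df["g"], group_keys=False).apply(lambda x: x[:2])

print(result.index.nlevels)  # a single index level means no group keys were added
print(result)
```

If the index of the result has one level, the group keys were not prepended, which matches the behavior of the first apply() call above.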
@HereticSK

I came across the same problem today. I am surprised that it dates back two years; it still occurs in the current pandas version.

After poking into the code, I believe the problem is in DataFrameGroupBy._gotitem. It constructs a new GroupBy object and returns it. However, it does not pass the original group_keys to the constructor, so the default group_keys=True is used.

def _gotitem(self, key, ndim, subset=None):
        if ndim == 2:
            if subset is None:
                subset = self.obj
            return DataFrameGroupBy(subset, self.grouper, selection=key,
                                    grouper=self.grouper,
                                    exclusions=self.exclusions,
                                    as_index=self.as_index)
        elif ndim == 1:
            if subset is None:
                subset = self.obj[key]
            return SeriesGroupBy(subset, selection=key,
                                 grouper=self.grouper)

        raise AssertionError("invalid ndim for _gotitem")

Any plan to fix this? I think it would be kind of annoying if the behavior of group_keys=False is not consistent in column selection.
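
The pattern described above can be illustrated with a toy model. This is not pandas code: ToyGroupBy, select_buggy, and select_fixed are hypothetical names made up for this sketch. It only shows how a constructor call that omits one option falls back to the default, and how forwarding that option keeps the selection consistent with its parent.

```python
class ToyGroupBy:
    """Hypothetical stand-in for a GroupBy object; not the pandas implementation."""

    def __init__(self, columns, group_keys=True, as_index=True):
        self.columns = list(columns)
        self.group_keys = group_keys
        self.as_index = as_index

    def select_buggy(self, key):
        # Mirrors the pattern described above: group_keys is not forwarded,
        # so the constructor default (True) silently takes over.
        return ToyGroupBy(key, as_index=self.as_index)

    def select_fixed(self, key):
        # Forward every option from the parent so the selection inherits it.
        return ToyGroupBy(key, group_keys=self.group_keys, as_index=self.as_index)


g = ToyGroupBy(["a", "b", "c", "g"], group_keys=False)
buggy = g.select_buggy(["a", "b", "c"])
fixed = g.select_fixed(["a", "b", "c"])

print(buggy.group_keys)  # True: the parent's setting was lost
print(fixed.group_keys)  # False: the parent's setting was kept
```

By analogy, a fix in _gotitem would presumably pass group_keys=self.group_keys to both constructors, though the actual change is whatever the linked pull request implements.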

@ron819

ron819 commented Aug 8, 2018

@jbrockmendel @mroeschke can you please label this issue? I think it was missed because it wasn't tagged.
This problem is indeed annoying.

@mroeschke mroeschke added Groupby Apply Apply, Aggregate, Transform labels Aug 8, 2018
@mroeschke
Member

Possibly xref #14927
