columns selection after groupby reset group_keys to True #9959

ruoyu0088 · 2015-04-21T00:56:18Z

import pandas as pd

df = pd.DataFrame({"g": [1, 2, 1, 2, 1], "a": range(5), "b": range(1, 6), "c": range(2, 7)})
g = df.groupby("g", group_keys=False)
print(g.group_keys)  # False
print(g[["a", "b", "c"]].group_keys)  # True

So the apply() results differ between these two cases:

print(g.apply(lambda x: x[:2]))

output:

   a  b  c  g
0  0  1  2  1
2  2  3  4  1
1  1  2  3  2
3  3  4  5  2

but

print(g[["a", "b", "c"]].apply(lambda x: x[:2]))

output:

     a  b  c
g           
1 0  0  1  2
  2  2  3  4
2 1  1  2  3
  3  3  4  5
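The difference can also be checked without reading the group_keys attribute, by counting the index levels of the apply() result. The following is a minimal sketch, assuming a Python 3 environment with pandas installed; the variable names sub_result and sub_levels are illustrative only. A value of 2 corresponds to the behavior reported here (the group key is prepended to the index), and a value of 1 means group_keys=False survived the column selection.

```python
import pandas as pd

df = pd.DataFrame({"g": [1, 2, 1, 2, 1], "a": range(5),
                   "b": range(1, 6), "c": range(2, 7)})
g = df.groupby("g", group_keys=False)

# Take the first two rows of each group after selecting columns.
sub_result = g[["a", "b", "c"]].apply(lambda x: x[:2])

# 1 level: the original row index only (group_keys=False was honored).
# 2 levels: the group key "g" was added as an outer index level.
sub_levels = sub_result.index.nlevels
print(sub_levels)
```

Which value is printed depends on the installed pandas version, so no expected output is given here.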


HereticSK · 2017-09-28T16:50:17Z

Came across the same problem today. I am surprised that it was reported more than two years ago and still occurs in the current pandas version.

After poking into the code, I believe the problem is in DataFrameGroupBy._gotitem. It constructs a new GroupBy object and returns it. However, it does not pass the original group_keys to the constructor, so the default group_keys=True is used.

def _gotitem(self, key, ndim, subset=None):
    if ndim == 2:
        if subset is None:
            subset = self.obj
        return DataFrameGroupBy(subset, self.grouper, selection=key,
                                grouper=self.grouper,
                                exclusions=self.exclusions,
                                as_index=self.as_index)
    elif ndim == 1:
        if subset is None:
            subset = self.obj[key]
        return SeriesGroupBy(subset, selection=key,
                             grouper=self.grouper)

    raise AssertionError("invalid ndim for _gotitem")

Any plan to fix this? I think it would be kind of annoying if the behavior of group_keys=False is not consistent in column selection.
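
The failure mode described in this comment (a new object is built during selection and an option is not forwarded, so the constructor default takes over) can be illustrated with a toy model. This is a sketch only: ToyGroupBy, select_buggy and select_fixed are hypothetical names invented for illustration and are not pandas code or the actual fix.

```python
class ToyGroupBy:
    """Toy stand-in for a groupby object; not pandas code."""

    def __init__(self, columns, group_keys=True):
        self.columns = list(columns)
        self.group_keys = group_keys

    def select_buggy(self, key):
        # Builds a new object but does not forward group_keys,
        # so the constructor default (True) silently takes over.
        return ToyGroupBy(key)

    def select_fixed(self, key):
        # Forwards the caller's setting to the new object.
        return ToyGroupBy(key, group_keys=self.group_keys)


g = ToyGroupBy(["a", "b", "c", "g"], group_keys=False)
print(g.select_buggy(["a", "b"]).group_keys)  # True: the setting is lost
print(g.select_fixed(["a", "b"]).group_keys)  # False: the setting is kept
```

Forwarding each constructor option explicitly works, but every option added later has to be remembered in the same place, which is how this kind of omission tends to creep in.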

ron819 · 2018-08-08T06:46:17Z

@jbrockmendel @mroeschke can you please label this issue? I think it was missed because it wasn't tagged.
This problem is indeed annoying.

mroeschke · 2018-08-08T16:04:52Z

Possibly xref #14927

mroeschke added the Groupby and Apply (Apply, Aggregate, Transform) labels Aug 8, 2018

mroeschke added the Bug label Jun 28, 2020

This was referenced Jul 29, 2020

BUG: Attributes of DataFrameGroupBy when subset to a series #35443

Closed

BUG: Attributes are lost when subsetting columns in groupby #35444

Merged

jreback added this to the 1.2 milestone Aug 6, 2020

arw2019 mentioned this issue Aug 15, 2020

BUG: slicing DataFrameGroupBy to SeriesGroupBy doesn't propagate dropna #35745

Closed

jreback closed this as completed in #35444 Aug 31, 2020
