BUG: DataFrameGroupBy.__getitem__ should warn on tuple of length 1 #36302

rhshadrach · 2020-09-12T03:29:39Z

Currently, a warning is emitted only when the tuple has more than one element. Judging from a comment in the PR where the deprecation was implemented, there was some confusion about the behavior of df.groupby('a')['b'] versus df.groupby('a')[('b',)]. The former, where the key is a string, is the SeriesGroupBy case that the comment refers to; the latter should still be deprecated.
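
To make the difference concrete, here is a minimal pure-Python sketch of the two checks. It is not the pandas source: the function names `check_key_current`, `check_key_proposed`, and `warns`, as well as the warning message, are hypothetical and exist only for illustration. It assumes the current check is "tuple with more than one element" and the proposed check is "any tuple".

```python
import warnings

# Illustrative message only; not the text pandas emits.
MESSAGE = "selecting with a tuple is deprecated, use a list instead"


def check_key_current(key):
    """Hypothetical stand-in for the current check: warn only if len(key) > 1."""
    if isinstance(key, tuple) and len(key) > 1:
        warnings.warn(MESSAGE, FutureWarning, stacklevel=2)


def check_key_proposed(key):
    """Hypothetical stand-in for the proposed check: warn on any tuple."""
    if isinstance(key, tuple):
        warnings.warn(MESSAGE, FutureWarning, stacklevel=2)


def warns(check, key):
    """Return True if calling check(key) emits a FutureWarning."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        check(key)
    return any(issubclass(w.category, FutureWarning) for w in caught)


print(warns(check_key_current, ("b",)))   # length-1 tuple slips through today
print(warns(check_key_proposed, ("b",)))  # would warn under the proposal
```

A plain string key such as 'b' triggers neither check, so the SeriesGroupBy case is unaffected in this sketch.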

rhshadrach · 2020-09-12T03:54:28Z

I didn't notice that in the whatsnew of PR #30546 there are the lines:

    # single key, returns SeriesGroupBy
    g['B']

    # tuple of single key, returns SeriesGroupBy
    g[('B',)]

This makes me think that perhaps it wasn't due to any confusion. However, I don't see any mention of tuples of length one in the original issue #23566 and allowing only tuples of length one seems odd to me.

@jorisvandenbossche Any thoughts?

rhshadrach · 2020-09-26T14:12:22Z

This is also inconsistent with DataFrame, e.g.

    import pandas as pd
    df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
    df[('b',)]

raises KeyError: ('b', )

tehunter · 2024-04-16T15:35:14Z

Is there any interest in making DataFrameGroupBy.__getitem__ selection mirror DataFrame.__getitem__ selection? In my mind, there should be some parity, like so:

    df = pd.DataFrame(...)
    df_gb = df.groupby("A")

    # Passing a tuple returns the Series for the column matching that MultiIndex tuple representation
    s: pd.Series = df[("B", "1")]
    # Passing a tuple *should* return the SeriesGroupBy for the column matching that MultiIndex tuple representation
    s_gb: pd.SeriesGroupBy = df_gb[("B", "1")]
    # This should be equivalent
    s_gb_equiv: pd.SeriesGroupBy = s.groupby(df["A"])

Currently, there doesn't seem to be any way to reduce a DataFrameGroupBy to a SeriesGroupBy when the DataFrame has MultiIndex columns, besides applying squeeze() to the result.
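
For reference, the "select the Series first, then group" route mentioned above can be sketched as follows. The frame, its column labels, and its values are made up for illustration, and the key column is renamed to a plain string before grouping only to keep the result's index name simple.

```python
import pandas as pd

# Hypothetical frame with two-level columns; labels and values are illustrative only.
columns = pd.MultiIndex.from_tuples([("A", ""), ("B", "1"), ("B", "2")])
df = pd.DataFrame([[1, 10, 100], [1, 20, 200], [2, 30, 300]], columns=columns)

# A full-depth tuple selects a single column as a Series.
s = df[("B", "1")]

# Use the key column as a Series, renamed to a plain string label.
keys = df[("A", "")].rename("A")

# Grouping the Series by the key column gives a SeriesGroupBy directly,
# without going through DataFrameGroupBy.__getitem__.
s_gb = s.groupby(keys)
totals = s_gb.sum()
print(totals)
```

This sidesteps the tuple-key question entirely, at the cost of repeating the grouping key outside the groupby call.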

rhshadrach · 2024-04-16T21:13:22Z

I think so @tehunter. E.g. this works just fine:

    df = pd.DataFrame({"a": [1, 1, 2], "b": [3, 4, 5]}).set_index("a")
    df.columns = pd.MultiIndex.from_tuples([("b",)])
    df.groupby("a")[("b",)].sum()

and will continue to work fine under your proposed behavior. We still need to deprecate the case where you are passing a tuple and the columns are not a MultiIndex nor tuples, so this issue needs to remain, but we can enable your desired behavior without waiting for that deprecation. Could you open up a separate issue?

tehunter · 2024-04-16T21:51:14Z

Thanks, just opened #58282

rhshadrach added Bug Groupby labels Sep 12, 2020

rhshadrach self-assigned this Sep 12, 2020

rhshadrach added the Needs Discussion label Sep 26, 2020

rhshadrach removed their assignment Dec 12, 2020

tehunter mentioned this issue Apr 16, 2024

BUG: DataFrameGroupBy.__getitem__ fails with tuples on multi-level column objects #58282

