Skip to content

BUG: GroupBy.apply with as_index=False still produces a MultiIndex #58291

@WillAyd

Description

@WillAyd

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

Assume we have this data:

In [88]:   df = pd.DataFrame([
    ...:       ["group_a", 0],
    ...:       ["group_a", 2],
    ...:       ["group_b", 1],
    ...:       ["group_b", 3],
    ...:       ["group_b", 5],
    ...:   ], columns=["group", "value"])
    ...: 
    ...:   df
In [88]:   def up_to_two_rows(df: pd.DataFrame) -> pd.DataFrame:
    ...:       return df.head(2)

Calling .apply seems to always create a MultiIndex, even when you don't ask for the groups:

In [89]: df.groupby("group").apply(up_to_two_rows)
<ipython-input-89-e4954502d06d>:1: DeprecationWarning: DataFrameGroupBy.apply operated on the grouping columns. This behavior is deprecated, and in a future version of pandas the grouping columns will be excluded from the operation. Either pass `include_groups=False` to exclude the groupings or explicitly select the grouping columns after groupby to silence this warning.
  df.groupby("group").apply(up_to_two_rows)
Out[89]: 
             group  value
group                    
group_a 0  group_a      0
        1  group_a      2
group_b 2  group_b      1
        3  group_b      3
In [92]: df.groupby("group").apply(up_to_two_rows, include_groups=False)
Out[92]: 
           value
group           
group_a 0      0
        1      2
group_b 2      1
        3      3
In [93]: df.groupby("group", as_index=False).apply(up_to_two_rows, include_groups=False)
Out[93]: 
     value
0 0      0
  1      2
1 2      1
  3      3

Issue Description

I am ultimately trying to get output that matches:

In [97]: df.groupby("group").apply(up_to_two_rows, include_groups=False).reset_index(level=0)
Out[97]: 
     group  value
0  group_a      0
1  group_a      2
2  group_b      1
3  group_b      3

But am not clear what combination of keywords is supposed to do that, if any

Expected Behavior

In [97]: df.groupby("group").apply(up_to_two_rows, include_groups=False).reset_index(level=0)
Out[97]: 
     group  value
0  group_a      0
1  group_a      2
2  group_b      1
3  group_b      3

Installed Versions

main

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions