BUG: groupby.apply() drops _metadata from subclassed DataFrame #62134

@JBGreisman

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas._testing as tm
import numpy as np

subdf = tm.SubclassedDataFrame(
    {"X": [1, 1, 2, 2, 3], "Y": np.arange(0, 5), "Z": np.arange(10, 15)}
)
subdf.testattr = "test"

# Calculate groupby-sum in two ways: one preserves metadata, one does not
expected = subdf.groupby("X").sum()
result = subdf.groupby("X").apply(np.sum, axis=0, include_groups=False)

# Both dataframes have equivalent content
tm.assert_frame_equal(result, expected)

print(expected.testattr)    # prints "test"
print(result.testattr)      # raises AttributeError

Issue Description

When pandas.DataFrame is extended by subclassing, most operations preserve the attributes listed in _metadata. groupby.apply() does not: the _metadata fields are missing from its result. I think this is unintended behavior, because an equivalent call using a built-in groupby method (groupby.sum() in the example above, or groupby.mean()) does preserve the fields.
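For anyone who wants to reproduce this without importing the private pandas._testing module, here is a minimal sketch of a subclass that declares _metadata. The class name MetaFrame is hypothetical and only meant to mirror what tm.SubclassedDataFrame provides. It shows one operation (column selection) that carries the attribute over.

```python
import numpy as np
import pandas as pd


class MetaFrame(pd.DataFrame):
    """Hypothetical stand-in for tm.SubclassedDataFrame (name is an assumption)."""

    # Attribute names listed here are the ones pandas propagates to results
    _metadata = ["testattr"]

    @property
    def _constructor(self):
        # Make pandas operations return MetaFrame instead of a plain DataFrame
        return MetaFrame


df = MetaFrame({"X": [1, 1, 2, 2, 3], "Y": np.arange(0, 5), "Z": np.arange(10, 15)})
df.testattr = "test"

# Column selection keeps both the subclass type and the _metadata attribute
subset = df[["Y", "Z"]]
print(type(subset).__name__, subset.testattr)
```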

Expected Behavior

import pandas._testing as tm
import numpy as np

subdf = tm.SubclassedDataFrame(
    {"X": [1, 1, 2, 2, 3], "Y": np.arange(0, 5), "Z": np.arange(10, 15)}
)
subdf.testattr = "test"
result = subdf.groupby("X").apply(np.sum, axis=0, include_groups=False)

assert result.testattr == "test"   # attribute should be preserved after groupby-apply
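Until this is fixed, a possible workaround is to re-attach the metadata by calling __finalize__ on the result, with the original frame as the source. This is only a sketch under a few assumptions: MetaFrame is a hypothetical stand-in for tm.SubclassedDataFrame, the grouped columns are selected explicitly rather than via include_groups=False so the snippet does not depend on that keyword, and the result of apply() is assumed to still be an instance of the subclass (as it is in the report above).

```python
import numpy as np
import pandas as pd


class MetaFrame(pd.DataFrame):
    """Hypothetical stand-in for tm.SubclassedDataFrame (name is an assumption)."""

    _metadata = ["testattr"]

    @property
    def _constructor(self):
        return MetaFrame


subdf = MetaFrame({"X": [1, 1, 2, 2, 3], "Y": np.arange(0, 5), "Z": np.arange(10, 15)})
subdf.testattr = "test"

# Same group-wise sum as in the report, with the value columns selected explicitly
raw = subdf.groupby("X")[["Y", "Z"]].apply(lambda g: g.sum())

# On affected versions the attribute is missing here, so use a default
print(getattr(raw, "testattr", None))

# Workaround: copy the _metadata attributes from the source frame onto the result
result = raw.__finalize__(subdf)
assert result.testattr == "test"
```

__finalize__ only copies names that appear in _metadata on both objects, so this does nothing if apply() hands back a plain DataFrame.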

Installed Versions

INSTALLED VERSIONS

commit : c888af6
python : 3.12.11
python-bits : 64
OS : Darwin
OS-release : 24.6.0
Version : Darwin Kernel Version 24.6.0: Mon Jul 14 11:30:51 PDT 2025; root:xnu-11417.140.69~1/RELEASE_ARM64_T8112
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 2.3.1
numpy : 2.2.6
pytz : 2025.2
dateutil : 2.9.0.post0
pip : 25.1
Cython : 3.1.3
sphinx : 8.1.3
IPython : 9.4.0
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.13.4
blosc : None
bottleneck : 1.5.0
dataframe-api-compat : None
fastparquet : 2024.11.0
fsspec : 2025.7.0
html5lib : 1.1
hypothesis : 6.138.2
gcsfs : 2025.7.0
jinja2 : 3.1.6
lxml.etree : 6.0.0
matplotlib : 3.10.5
numba : 0.61.2
numexpr : 2.11.0
odfpy : None
openpyxl : 3.1.5
pandas_gbq : None
psycopg2 : 2.9.10
pymysql : 1.4.6
pyarrow : 21.0.0
pyreadstat : 1.3.1
pytest : 8.4.1
python-calamine : None
pyxlsb : 1.0.10
s3fs : 2025.7.0
scipy : 1.16.1
sqlalchemy : 2.0.43
tables : 3.10.2
tabulate : 0.9.0
xarray : 2025.8.0
xlrd : 2.0.2
xlsxwriter : 3.2.5
zstandard : 0.23.0
tzdata : 2025.2
qtpy : None
pyqt5 : None

Labels: Bug, Needs Triage (issue that has not been reviewed by a pandas team member), metadata (_metadata, .attrs)