DataFrame.groupby doesn't preserve _metadata #29442

Closed
pandichef opened this issue Nov 6, 2019 · 9 comments · Fixed by #35688
Labels: Bug, Groupby, metadata (_metadata, .attrs)

@pandichef

pandichef commented Nov 6, 2019

I'm following the 0.25.3 documentation on using _metadata.

sdf = SubclassedDataFrame2(my_data_as_dictionary)
sdf.added_property = 'hello pandas'
df = sdf.groupby('mycategorical')[['myfloat1', 'myfloat2']].sum()
print(df.added_property)

The code above raises:
AttributeError: 'DataFrame' object has no attribute 'added_property'

added_property is not being "passed to manipulation results" as described in the documentation.
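
For anyone who wants to run the report, a self-contained version might look like the sketch below. The class follows the `SubclassedDataFrame2` example in the pandas subclassing documentation. The column values are made up, because the original `my_data_as_dictionary` was not posted. Whether the final lookup succeeds depends on the pandas version (the report was against 0.25.3), so the sketch uses `getattr` with a default rather than assuming either outcome.

```python
import pandas as pd


class SubclassedDataFrame2(pd.DataFrame):
    # Names listed in _metadata are meant to be carried over to results.
    _metadata = ["added_property"]

    @property
    def _constructor(self):
        # Tell pandas to build results with this subclass.
        return SubclassedDataFrame2


# Hypothetical data; the original dictionary was not included in the report.
my_data_as_dictionary = {
    "mycategorical": ["a", "b", "a", "b"],
    "myfloat1": [1.0, 2.0, 3.0, 4.0],
    "myfloat2": [0.5, 1.5, 2.5, 3.5],
}

sdf = SubclassedDataFrame2(my_data_as_dictionary)
sdf.added_property = "hello pandas"

df = sdf.groupby("mycategorical")[["myfloat1", "myfloat2"]].sum()

# On affected versions the attribute is missing, so fall back to None
# instead of raising AttributeError.
print(type(df).__name__, getattr(df, "added_property", None))
```

If the last line prints `None` for the property, the installed version shows the behaviour described in this issue.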

@gfyoung gfyoung added the Groupby label Nov 6, 2019
@mroeschke mroeschke added the metadata _metadata, .attrs label Nov 6, 2019
@TomAugspurger
Contributor

FYI, this is likely fixed as part of #28394, where I'm calling __finalize__ in more places.

@pandichef
Author

pandichef commented Nov 6, 2019 via email

@mroeschke mroeschke added the Bug label Apr 3, 2020
@pandichef
Author

pandichef commented Apr 16, 2020

FYI, this is still an issue on pandas 1.0.3. @TomAugspurger, I think there is a relatively quick fix, which I'm using in a personal project. DataFrameGroupBy has access to self.obj, so you can write a class decorator that applies the following logic at the end of every method in DataFrameGroupBy:

if isinstance(result, DataFrame) and isinstance(self.obj, DataFrame):
    return type(self.obj)(result)
elif isinstance(result, Series) and isinstance(self.obj, Series):
    return type(self.obj)(result)

etc.
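
A minimal sketch of the same idea as a user-side workaround, without patching `DataFrameGroupBy` itself. `TaggedFrame`, `rewrap_like_input` and `grouped_sum` are hypothetical names invented for this example. Unlike the snippet above, it also calls `__finalize__` after re-wrapping, on the assumption that restoring the class alone would not copy the `_metadata` fields.

```python
import functools

import pandas as pd


class TaggedFrame(pd.DataFrame):
    # Hypothetical subclass used only for this sketch.
    _metadata = ["added_property"]

    @property
    def _constructor(self):
        return TaggedFrame


def rewrap_like_input(func):
    """Re-wrap a DataFrame result in the class of the first argument."""

    @functools.wraps(func)
    def wrapper(source, *args, **kwargs):
        result = func(source, *args, **kwargs)
        if isinstance(result, pd.DataFrame) and isinstance(source, pd.DataFrame):
            # Rebuild with the input's class, then copy its _metadata fields.
            return type(source)(result).__finalize__(source)
        return result

    return wrapper


@rewrap_like_input
def grouped_sum(frame, by, columns):
    return frame.groupby(by)[columns].sum()


sdf = TaggedFrame({"key": ["a", "b", "a"], "x": [1.0, 2.0, 3.0]})
sdf.added_property = "hello pandas"

out = grouped_sum(sdf, "key", ["x"])
print(type(out).__name__, out.added_property)
```

Wrapping a plain function keeps the workaround out of pandas internals, at the cost of having to route each groupby call through a decorated helper.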

@TomAugspurger
Contributor

TomAugspurger commented Apr 16, 2020 via email

@pandichef
Author

pandichef commented Apr 16, 2020 via email

@TomAugspurger
Contributor

TomAugspurger commented Apr 16, 2020 via email

@Japanuspus
Contributor

It appears that the change responsible for the groupby regression is the fix for #34214, where objects are created with type(self)(...). That call does not invoke __finalize__ (or check _constructor), which the docs say it should.

I will try to implement this internally and submit a pull request.
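
To make the difference between the two construction paths concrete, here is a small sketch on a hypothetical subclass (`TaggedFrame` is not from the pandas code base). It only illustrates the mechanism described above. It does not reproduce the `*Splitter` code.

```python
import pandas as pd


class TaggedFrame(pd.DataFrame):
    # Hypothetical subclass used only for this sketch.
    _metadata = ["added_property"]

    @property
    def _constructor(self):
        return TaggedFrame


sdf = TaggedFrame({"x": [1.0, 2.0]})
sdf.added_property = "hello pandas"

data = {"x": [3.0, 4.0]}

# Path 1: call the class directly. The type is right, but nothing
# copies the _metadata fields from sdf.
direct = type(sdf)(data)

# Path 2: go through _constructor and then __finalize__, which copies
# the fields named in _metadata from sdf onto the new object.
finalized = sdf._constructor(data).__finalize__(sdf)

print(hasattr(direct, "added_property"))
print(finalized.added_property)
```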

Japanuspus added a commit to Japanuspus/pandas that referenced this issue Aug 12, 2020
This bug is a regression in v1.1.0 and was introduced by the fix for GH-34214 in commit [6f065b].

The underlying cause is that the `*Splitter` classes do not use the `._constructor` property and do not call `__finalize__`.

Please note that the `method` name passed to the `__finalize__` calls was my best guess, since documentation for the value has been hard to find.

[6f065b]: pandas-dev@6f065b6
@jreback jreback added this to the 1.1.1 milestone Aug 13, 2020
@simonjayhawkins simonjayhawkins modified the milestones: 1.1.1, 1.1.2 Aug 20, 2020
@simonjayhawkins
Member

heads up: moved off 1.1.1 milestone

Japanuspus added a commit to Japanuspus/pandas that referenced this issue Aug 27, 2020
Added test case `test_groupby_sum_with_custom_metadata` for the functionality exercised in GH-29442.
The test case fails on the current code.
Japanuspus added a commit to Japanuspus/pandas that referenced this issue Aug 27, 2020
To propagate metadata fields, the `__finalize__` method must be
called on the resulting DataFrame with a reference to the input. By
implementing this in `_GroupBy._agg_general`, the call happens as late
as possible for the `.sum()` (and similar) code paths.

Fixes GH-29442
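
The placement argument in this commit message can be sketched with a toy model in plain Python. `ToyFrame` and `ToyGroupBy` are invented stand-ins, not pandas classes. Only the idea of finalizing once inside a shared aggregation helper is taken from the message above.

```python
class ToyFrame:
    """Stand-in for a DataFrame subclass; not pandas code."""

    _metadata = ["added_property"]

    def __init__(self, rows):
        self.rows = list(rows)  # list of (key, value) pairs

    def __finalize__(self, other):
        # Copy every field named in _metadata from the input object.
        for name in self._metadata:
            setattr(self, name, getattr(other, name, None))
        return self


class ToyGroupBy:
    def __init__(self, obj):
        self.obj = obj

    def _agg_general(self, func):
        groups = {}
        for key, value in self.obj.rows:
            groups.setdefault(key, []).append(value)
        result = ToyFrame((key, func(values)) for key, values in groups.items())
        # Finalize once, at the very end, so every reduction built on this
        # helper propagates metadata without repeating the call.
        return result.__finalize__(self.obj)

    def sum(self):
        return self._agg_general(sum)

    def max(self):
        return self._agg_general(max)


frame = ToyFrame([("a", 1.0), ("b", 2.0), ("a", 3.0)])
frame.added_property = "hello pandas"

total = ToyGroupBy(frame).sum()
largest = ToyGroupBy(frame).max()
print(total.rows, total.added_property)
print(largest.rows, largest.added_property)
```

Because both reductions go through `_agg_general`, neither needs its own `__finalize__` call.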
@simonjayhawkins simonjayhawkins modified the milestones: 1.1.2, 1.1.3 Sep 8, 2020
@simonjayhawkins
Member

moved off 1.1.2 milestone (imminent release) as associated PR #35688 not yet ready.

@jreback jreback modified the milestones: 1.1.3, 1.1.4 Oct 2, 2020
simonjayhawkins pushed a commit to simonjayhawkins/pandas that referenced this issue Oct 14, 2020
…DataFrame.groupby doesn't preserve _metadata
simonjayhawkins added a commit that referenced this issue Oct 15, 2020
…esn't preserve _metadata (#37122)

Co-authored-by: Janus <janus@insignificancegalore.net>