Improve coverage of `NDFrame.__finalize__`

Pandas uses `NDFrame.__finalize__` to propagate metadata from one NDFrame to
another. This ensures that things like `self.attrs` and `self.flags` are not
lost. In general, any operation that accepts one or more NDFrames and returns
an NDFrame should propagate metadata by calling `__finalize__`.

The test file at
https://github.com/pandas-dev/pandas/blob/master/pandas/tests/generic/test_finalize.py
attempts to be an exhaustive suite of tests for all these cases. However, many
tests are currently xfailing, and there are likely many APIs not covered.
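
To make the property under test concrete, here is a minimal sketch in plain Python. `ToyFrame`, `doubled`, `doubled_lossy`, and `propagates_attrs` are hypothetical names invented for illustration. This is not pandas code, and the real `__finalize__` does more than copy `attrs`. The sketch only shows the shape of the check: set metadata on the input, call a method, and see whether the result still carries it.

```python
from copy import deepcopy


class ToyFrame:
    """Hypothetical stand-in for an NDFrame; not pandas code."""

    def __init__(self, values, attrs=None):
        self.values = list(values)
        self.attrs = dict(attrs or {})

    def __finalize__(self, other):
        # Copy metadata from the input object onto the freshly built result.
        if isinstance(other, ToyFrame):
            self.attrs = deepcopy(other.attrs)
        return self

    def doubled(self):
        # Propagates metadata: the result is finalized against self.
        return ToyFrame(v * 2 for v in self.values).__finalize__(self)

    def doubled_lossy(self):
        # Forgets to finalize, so attrs are silently dropped.
        return ToyFrame(v * 2 for v in self.values)


def propagates_attrs(method_name):
    """Return True if attrs survive a call to the named method."""
    frame = ToyFrame([1, 2, 3], attrs={"source": "sensor-a"})
    result = getattr(frame, method_name)()
    return result.attrs == frame.attrs


print(propagates_attrs("doubled"))        # True
print(propagates_attrs("doubled_lossy"))  # False
```

A method in the position of `doubled_lossy` is the kind of case an xfailing test would flag.
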

This is a meta-issue to improve the use of `__finalize__`. Here's a hopefully
accurate list of methods that don't currently call `__finalize__`.

Some general comments around finalize:

- We don't have a good sense for what should happen to `attrs` when there are
  multiple NDFrames involved with differing attrs (e.g. in concat). The safest
  approach is probably to drop the attrs when they don't match, but this will
  need some thought.
- We need to be mindful of performance. `__finalize__` can be somewhat expensive,
  so we'd like to call it exactly once per user-facing method. This can be tricky
  for things like `DataFrame.apply`, which is sometimes used internally. We may
  need to refactor some methods to have a user-facing `DataFrame.apply` that calls
  an internal `DataFrame._apply`. The internal method would not call
  `__finalize__`; only the user-facing `DataFrame.apply` would.

If you're interested in working on this, please post a comment indicating which method
you're working on. Un-xfail the test, then update the method to pass the test. Some of these
will be much more difficult to work on than others (e.g. groupby is going to be difficult). If you're
unsure whether a particular method is likely to be difficult, ask first.
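
The `apply` / `_apply` split suggested in the performance comment above could look roughly like the following. This is a toy sketch in plain Python, not pandas internals: `ToyFrame`, `add_one_then_double`, and the `finalize_calls` counter are hypothetical and exist only to show that finalization happens once per user-facing call, even when the internal helper runs several times.

```python
class ToyFrame:
    """Hypothetical stand-in for a DataFrame; not pandas code."""

    finalize_calls = 0  # class-level counter, for demonstration only

    def __init__(self, values, attrs=None):
        self.values = list(values)
        self.attrs = dict(attrs or {})

    def __finalize__(self, other):
        # Count every call so the "exactly once" goal can be checked.
        ToyFrame.finalize_calls += 1
        self.attrs = dict(other.attrs)
        return self

    def _apply(self, func):
        # Internal helper: does the work, never finalizes.
        return ToyFrame(func(v) for v in self.values)

    def apply(self, func):
        # User-facing method: finalizes exactly once.
        return self._apply(func).__finalize__(self)

    def add_one_then_double(self):
        # Another user-facing method that reuses the internal helper twice
        # but still finalizes only once, at the end.
        step = self._apply(lambda v: v + 1)
        return step._apply(lambda v: v * 2).__finalize__(self)


frame = ToyFrame([1, 2], attrs={"unit": "m"})
result = frame.add_one_then_double()
print(result.values, result.attrs, ToyFrame.finalize_calls)
# [4, 6] {'unit': 'm'} 1
```

Had `add_one_then_double` called the public `apply` twice instead, the counter would read 2 for a single user-facing call.
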

- `DataFrame.__getitem__` with a scalar
- `DataFrame.eval` with `engine="numexpr"`
- `DataFrame.duplicated`
- `DataFrame.add`, `mul`, etc. (at least for most things; some work to do on conflicts / overlapping attrs in binops)
- `DataFrame.combine`, `DataFrame.combine_first`
- `DataFrame.update`
- `DataFrame.pivot`, `pivot_table`
- `DataFrame.stack`
- `DataFrame.unstack`
- `DataFrame.explode` (BUG: added finalize to explode, GH28283 #46629)
- `DataFrame.melt` (BUG/TST/DOC: added finalize to melt, GH28283 #46648)
- `DataFrame.diff`
- `DataFrame.applymap`
- `DataFrame.append`
- `DataFrame.merge`
- `DataFrame.cov`
- `DataFrame.corrwith`
- `DataFrame.count`
- `DataFrame.nunique`
- `DataFrame.idxmax`, `idxmin`
- `DataFrame.mode`
- `DataFrame.quantile` (scalar and list of quantiles)
- `DataFrame.isin`
- `DataFrame.pop`
- `DataFrame.squeeze`
- `Series.abs`
- `DataFrame.get`
- `DataFrame.round`
- `DataFrame.convert_dtypes`
- `DataFrame.pct_change`
- `DataFrame.transform`
- `DataFrame.apply`
- `DataFrame.any`, `sum`, `std`, `mean`, etc.
- `Series.str.` operations returning a Series / DataFrame
- `Series.dt.` operations returning a Series / DataFrame
- `Series.cat.` operations returning a Series / DataFrame
- `.iloc` / `.loc` (Add missing `__finalize__` calls in indexers/iterators #46101)
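
For the multi-input entries in this list (binops, `merge`, and similar), the "drop the attrs when they don't match" idea from the general comments could be sketched as below. `combine_attrs` is a hypothetical helper written for illustration. It is one possible policy, not a description of what pandas does, and the issue itself notes the question still needs thought.

```python
def combine_attrs(attrs_list):
    """Return the shared attrs if every input agrees, else an empty dict.

    Hypothetical policy helper: keep metadata only when all inputs carry
    exactly the same attrs, and drop it otherwise.
    """
    attrs_list = list(attrs_list)
    if not attrs_list:
        return {}
    first = attrs_list[0]
    if all(attrs == first for attrs in attrs_list[1:]):
        return dict(first)  # copy so the result does not alias an input
    return {}


print(combine_attrs([{"unit": "m"}, {"unit": "m"}]))   # {'unit': 'm'}
print(combine_attrs([{"unit": "m"}, {"unit": "ft"}]))  # {}
```

The trade-off is that dropping on any mismatch never picks one input's metadata over another's, but it also discards attrs that only one input defines.
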