CLN: Simplify gathering of results in aggregate #37227

rhshadrach · 2020-10-18T21:14:48Z

Reduces the number of paths for collecting results in aggregation.aggregate to two: one that gathers NDFrames using concat, and one that gathers scalars using Series.
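A minimal sketch of the two gathering paths, for illustration only. The helper name `gather_results` and its signature are hypothetical, it uses the public `Series`/`DataFrame` types rather than pandas internals, and raising `ValueError` on mixed results is a choice made for this sketch, not necessarily what the PR does.

```python
from pandas import DataFrame, Series, concat


def gather_results(results: dict, obj):
    """Hypothetical helper: combine per-key aggregation results into one object."""
    is_ndframe = [isinstance(r, (Series, DataFrame)) for r in results.values()]
    if all(is_ndframe):
        # Path 1: every result is a Series/DataFrame, so combine with concat.
        keys = list(results)
        # Skip empty results, but fall back to all keys if every result is empty.
        keys_to_use = [k for k in keys if not results[k].empty]
        keys_to_use = keys_to_use if keys_to_use else keys
        axis = 0 if isinstance(obj, Series) else 1
        return concat({k: results[k] for k in keys_to_use}, axis=axis)
    if any(is_ndframe):
        # Assumption for this sketch: a mix of scalars and NDFrames is rejected.
        raise ValueError("results mix scalars with Series/DataFrame values")
    # Path 2: every result is a scalar, so a Series keyed by name suffices.
    return Series(results)
```

For example, `gather_results({"a": 6, "b": 1}, Series([1, 2, 3]))` takes the scalar path and returns a Series indexed by `"a"` and `"b"`.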

The reason for the one test change is as follows. In one case, we previously gathered results using DataFrame. When the indexes are not all equal, DataFrame sorts the index, whereas concat keeps the index in order of appearance. For example,

df = DataFrame(
    {
        'A': pd.Series([1, 2], index=['b', 'a']),
        'B': pd.Series([3, 4], index=['c', 'a'])
    }
)

gives

     A    B
a  2.0  4.0
b  1.0  NaN
c  NaN  3.0

whereas calling concat with axis=1 instead of DataFrame gives:

     A    B
b  1.0  NaN
a  2.0  4.0
c  NaN  3.0

If in this example you replace the second index with ['b', 'a'] (so that the two indexes are equal), then both DataFrame and concat produce the same result, with index ['b', 'a']. If, on the other hand, you replace the second index with ['a', 'b'], then DataFrame produces index ['a', 'b'], whereas concat produces index ['b', 'a'].
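The ordering difference can be checked with a short script. This is a sketch assuming a pandas version in which concat defaults to sort=False; the variable names (`a`, `b`, `via_dataframe`, `via_concat`, `b_equal`) are illustrative and do not come from the PR.

```python
import pandas as pd

a = pd.Series([1, 2], index=["b", "a"])
b = pd.Series([3, 4], index=["c", "a"])

# DataFrame constructor: aligns on the union of the indexes. At the time of
# this PR it sorted that union when the indexes were not all equal.
via_dataframe = pd.DataFrame({"A": a, "B": b})

# concat along axis=1: with the default sort=False, labels keep their
# order of first appearance.
via_concat = pd.concat({"A": a, "B": b}, axis=1)

print(list(via_dataframe.index))
print(list(via_concat.index))

# Equal indexes: there is nothing to reorder, so concat keeps ['b', 'a'].
b_equal = pd.Series([3, 4], index=["b", "a"])
via_concat_equal = pd.concat({"A": a, "B": b_equal}, axis=1)
print(list(via_concat_equal.index))
```

Passing sort=True to concat is one way to get a sorted index on the concat path when the indexes differ, if the old ordering is needed.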

jreback

wow a lot simpler. pls merge master and ping on green.

jreback · 2020-10-20T00:31:10Z

pandas/core/aggregation.py

-
-            # we have a dict of Series
-            # return a MI Series
+        if any(isinstance(r, NDFrame) for r in results.values()):


can you add an ABCNDFrame in pandas.core.dtypes.generic?

jreback · 2020-10-20T00:31:39Z

pandas/core/aggregation.py

+                keys_to_use = keys_to_use if keys_to_use != [] else keys
+                axis = 0 if isinstance(obj, ABCSeries) else 1
+                result = concat({k: results[k] for k in keys_to_use}, axis=axis)
+            # Raised if some value of results is not a NDFrame


can you move this comment below the AttributeError (it's confusing where it is now)

jreback · 2020-10-20T00:32:26Z

pandas/core/aggregation.py

            try:
-                result = concat(results)
-            except TypeError as err:
+                keys_to_use = [k for k in keys if not results[k].empty]


where does the AttributeError come from?

can you move things outside the try/except (or use an else)?
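One way to read the suggestion, sketched here under assumptions: keep only the attribute access that can raise inside the `try`, and put the concat in an `else` clause so that errors from concat itself are not swallowed. The function name `combine` is hypothetical, and the premise that the AttributeError comes from calling `.empty` on a scalar is inferred from the diff comment "Raised if some value of results is not a NDFrame".

```python
from pandas import Series, concat


def combine(results: dict):
    """Hypothetical sketch of narrowing a try block with an else clause."""
    try:
        # Only this line can raise AttributeError: scalars have no .empty.
        keys_to_use = [k for k, r in results.items() if not r.empty]
    except AttributeError:
        # Some value of results is not a Series/DataFrame.
        return Series(results)
    else:
        # Runs only if the try body succeeded; exceptions raised here
        # propagate instead of being mistaken for the scalar case.
        keys_to_use = keys_to_use or list(results)
        return concat({k: results[k] for k in keys_to_use})
```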

pandas/tests/frame/apply/test_frame_apply.py

jreback · 2020-10-22T00:19:22Z

lgtm merge on green

jreback · 2020-10-22T23:49:46Z

thanks @rhshadrach

CLN: Simplify gathering of results in aggregate

a90098f

jreback requested changes Oct 20, 2020

jreback added the Refactor (Internal refactoring of code) and Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) labels Oct 20, 2020

jreback reviewed Oct 20, 2020

pandas/tests/frame/apply/test_frame_apply.py (resolved)

Added ABCNDFrame, removed try-except.

3cf4db5

jreback added this to the 1.2 milestone Oct 22, 2020

jreback approved these changes Oct 22, 2020

Added ABCNDFrame, removed try-except.

db01e1b

rhshadrach mentioned this pull request Oct 22, 2020

TST: eval test unreliable #37328

Closed

jreback merged commit affc4d5 into pandas-dev:master Oct 22, 2020

rhshadrach deleted the agg_cleanup branch October 22, 2020 23:54

JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Oct 26, 2020

CLN: Simplify gathering of results in aggregate (pandas-dev#37227)

daff98c

kesmit13 pushed a commit to kesmit13/pandas that referenced this pull request Nov 2, 2020

CLN: Simplify gathering of results in aggregate (pandas-dev#37227)

1fe6eb4

dchigarev mentioned this pull request Feb 5, 2021

QST: Is it intended that empty dictionary aggregation raises exception since 1.2.0? #39609

Open

2 tasks
