Allow Keyword Aggregation in DataFrame.agg and Series.agg #26513

Closed
TomAugspurger opened this issue May 24, 2019 · 6 comments · Fixed by #29116

@TomAugspurger
Contributor

Followup to #26399

In [2]: df = pd.DataFrame({"A": [1, 2, 1, 2], "B": [1, 2, 3, 4]})

In [3]: df
Out[3]:
   A  B
0  1  1
1  2  2
2  1  3
3  2  4

In [4]: df.agg(foo=("B", "sum"))

Expected Output

In [13]: df.agg({"B": {"foo": "sum"}})
/Users/taugspurger/Envs/dask-dev/lib/python3.7/site-packages/pandas/core/frame.py:6284: FutureWarning: using a dict with renaming is deprecated and will be removed in a future version
  result, how = self._aggregate(func, axis=axis, *args, **kwargs)
Out[13]:
      B
foo  10

without the warning. Similar for Series.agg

In [16]: df.B.agg({"foo": "sum"})  # allow foo="sum"
Out[16]:
foo    10
Name: B, dtype: int64
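
For reference, the two requested spellings can be sketched as below. This is only an illustration and assumes a pandas release that already includes the linked fix (#29116); older versions do not accept the keyword form. The names `frame_result` and `series_result` are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 1, 2], "B": [1, 2, 3, 4]})

# Keyword ("named") aggregation on a DataFrame:
# the keyword is the output label, the tuple is (column, aggregation).
# Assumes a pandas version that contains the fix for this issue.
frame_result = df.agg(foo=("B", "sum"))

# Keyword aggregation on a Series: keyword is the output label,
# value is the aggregation to apply.
series_result = df["B"].agg(foo="sum")

print(frame_result)
print(series_result)
```

Both calls are meant to replace the deprecated nested-dict renaming shown above, so neither should emit the `FutureWarning`.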
TomAugspurger added the "API Design" and "Blocker" labels May 24, 2019
TomAugspurger added this to the 0.25.0 milestone May 24, 2019
TomAugspurger changed the title from "All Keyword Aggregation in DataFrame.agg and Series.agg" to "Alllow Keyword Aggregation in DataFrame.agg and Series.agg" May 24, 2019
TomAugspurger changed the title from "Alllow Keyword Aggregation in DataFrame.agg and Series.agg" to "Allow Keyword Aggregation in DataFrame.agg and Series.agg" May 24, 2019
@TomAugspurger
Contributor Author

Is this a blocker for 0.25.0? Is anyone interested in working on it? I'm not.

cc @zertrin, @smcateer, @pirsquared, who seemed interested in this for the GroupBy.agg context (this is for DataFrame/Series.agg).

@zertrin
Contributor

zertrin commented Jun 20, 2019

I am not sure what this follow-up corresponds to.

As for blockers, I remember the multiple-lambda problem, for which I believe you said you had a WIP.

Which of the remaining blocker issues does this one correspond to? The description is unclear to me.

@TomAugspurger
Contributor Author

Multiple lambdas is at #26905 (tagged for 0.25, but needs review).

This issue is about supporting the new named aggregation syntax in DataFrame.agg, rather than just DataFrameGroupBy.agg (likewise for Series). Ideally, these two wouldn't diverge in behavior.

In [2]: df = pd.DataFrame({"A": [1, 2, 1, 2], "B": [1, 2, 3, 4]})

In [3]: df
Out[3]:
   A  B
0  1  1
1  2  2
2  1  3
3  2  4

In [4]: df.agg(foo=("B", "sum"))
      B
foo  10
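
To make the comparison concrete, here is a minimal sketch of the existing GroupBy spelling that the frame-level call is supposed to mirror. It assumes pandas 0.25 or later, where named aggregation is available on `DataFrameGroupBy.agg`; the variable name `grouped` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 1, 2], "B": [1, 2, 3, 4]})

# Named aggregation on a GroupBy: keyword = output column name,
# tuple = (input column, aggregation). Groups are A == 1 and A == 2.
grouped = df.groupby("A").agg(foo=("B", "sum"))

print(grouped)
```

In the grouped case the keyword becomes a column label, while in the ungrouped proposal above it becomes a row label; the call signature is the part that is intended to stay the same.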

TomAugspurger removed the "Blocker" label Jun 20, 2019
@TomAugspurger
Contributor Author

TomAugspurger commented Jul 2, 2019

Would be nice for 0.25, but not a blocker. Pushing.

TomAugspurger modified the milestones: 0.25.0, Contributions Welcome Jul 2, 2019
@christopherzimmerman
Contributor

christopherzimmerman commented Sep 10, 2019

@TomAugspurger I'm going to look into this, just had a question about desired behavior.

Currently, if you aggregate different functions on different series with DataFrame.agg, you still get a Series back:

In [16]: df = pd.DataFrame({"A": [1, 2, 1, 2], "B": [1, 2, 3, 4]})

In [17]: df
Out[17]:
   A  B
0  1  1
1  2  2
2  1  3
3  2  4

In [18]: df.agg({"A": "sum", "B": "mean"})
Out[18]:
A    6.0
B    2.5
dtype: float64

For the example you provided above, it would seem the result would be something like:

In [18]: df.agg(foo=("A", "sum"), bar=("B", "mean"))
Out[18]:
       A    B
foo    6  NaN
bar  NaN  2.5

Whereas something like:

foo    6.0
bar    2.5
dtype: float64

would be the closest to current behavior. Any thoughts?
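
The two candidate shapes can be built by hand from the dict-based call that already works, which may make the trade-off easier to see. This is only a sketch of the question: it does not show what any pandas version actually returns for the keyword form, and the names `current`, `series_candidate`, and `frame_candidate` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 1, 2], "B": [1, 2, 3, 4]})

# Existing behavior: one function per column gives a Series indexed by column.
current = df.agg({"A": "sum", "B": "mean"})

# Candidate 1: same Series, relabeled with the keyword names.
series_candidate = current.rename({"A": "foo", "B": "bar"})

# Candidate 2: a frame with one row per keyword and one column per
# source column, leaving NaN where a keyword does not touch a column.
frame_candidate = pd.DataFrame(
    {"A": {"foo": current["A"]}, "B": {"bar": current["B"]}}
).reindex(["foo", "bar"])

print(series_candidate)
print(frame_candidate)
```

The Series form loses the source column of each result, and the frame form keeps it at the cost of a sparse result.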

@TomAugspurger
Contributor Author

You're probably right about the expected output being a Series there.
