Deprecation of relabeling dicts in groupby.agg brings many issues

This issue is created based on the discussion from #15931 following the deprecation of relabeling dicts in `groupby.agg`. A lot of what is summarized below was already discussed in the previous discussion. I would recommend in particular https://github.com/pandas-dev/pandas/pull/15931#issuecomment-336139085 where the problems are also clearly stated.

The motivation behind the deprecation of #15931 was mostly related to bringing a consistent interface for `agg()` between Series and Dataframe (see also #14668 for context).

The relabeling functionality with a nested dict has been described by some as being too complex and/or inconsistent and thus deprecated.

However, this comes at a price: the impossibility to aggregate and rename at the same time leads to very annoying issues and some backward incompatibility where no sensible workaround is available:
- _[annoying]_ no more control over the names of the resulting columns
- _[annoying]_ you need to find a way to rename the MultiIndex _after_ performing the aggregation, requiring to keep track of the order of columns at two places in the code.... not practical at all and sometimes downright impossible (cases below).
- ⚠️  _**[breaking]**_ cannot apply more than one callable with the same internal name on the same input column. This results in two sub-cases:
    -  _**[breaking]**_ you can't apply anymore two or more lambda aggregators on the same column
    -  _**[breaking]**_ you can't apply anymore two or more aggregators from partial functions unless you alter their hidden `__name__` attribute

## Example

_(please note, this is a crafted example for the purpose of demonstrating the problem in as short a code as possible, but all of the demonstrated issues here did bite me in real life since the change, and in situations not as simple as here)_

### Input Dataframe

```python
mydf = pd.DataFrame(
    {
        'cat': ['A', 'A', 'A', 'B', 'B', 'C'],
        'energy': [1.8, 1.95, 2.04, 1.25, 1.6, 1.01],
        'distance': [1.2, 1.5, 1.74, 0.82, 1.01, 0.6]
    },
    index=range(6)
)
```

```
  cat  distance  energy
0   A      1.20    1.80
1   A      1.50    1.95
2   A      1.74    2.04
3   B      0.82    1.25
4   B      1.01    1.60
5   C      0.60    1.01
```

### Before:

easy to write and read, and works as expected

```python
import numpy as np
import statsmodels.robust as smrb
from functools import partial
 
# median absolute deviation as a partial function
# in order to demonstrate the issue with partial functions as aggregators
mad_c1 = partial(smrb.mad, c=1)

# renaming and specifying the aggregators at the same time
# note that I want to choose the resulting column names myself
# for example "total_xxxx" instead of just "sum"
mydf_agg = mydf.groupby('cat').agg({
    'energy': {
        'total_energy': 'sum',
        'energy_p98': lambda x: np.percentile(x, 98),  # lambda
        'energy_p17': lambda x: np.percentile(x, 17),  # lambda
    },
    'distance': {
        'total_distance': 'sum',
        'average_distance': 'mean',
        'distance_mad': smrb.mad,   # original function
        'distance_mad_c1': mad_c1,  # partial function wrapping the original function
    },
})
```

results in

```
          energy                             distance
    total_energy energy_p98 energy_p17 total_distance average_distance distance_mad distance_mad_c1
cat
A           5.79     2.0364     1.8510           4.44            1.480     0.355825           0.240
B           2.85     1.5930     1.3095           1.83            0.915     0.140847           0.095
C           1.01     1.0100     1.0100           0.60            0.600     0.000000           0.000
```

and all is left is:

```python
# get rid of the first MultiIndex level in a pretty straightforward way
mydf_agg.columns = mydf_agg.columns.droplevel(level=0)
```

Happy dance praising pandas 💃 🕺 !

### After

```python
import numpy as np
import statsmodels.robust as smrb
from functools import partial

# median absolute deviation as a partial function
# in order to demonstrate the issue with partial functions as aggregators
mad_c1 = partial(smrb.mad, c=1)

# no way of choosing the destination's column names...
mydf_agg = mydf.groupby('cat').agg({
    'energy': [
    	'sum',
    	lambda x: np.percentile(x, 98), # lambda
    	lambda x: np.percentile(x, 17), # lambda
    ],
    'distance': [
    	'sum',
    	'mean',
    	smrb.mad, # original function
    	mad_c1,   # partial function wrapping the original function
    ],
})
```

The above breaks because the lambda functions will all result in columns named `<lambda>` which results in

    SpecificationError: Function names must be unique, found multiple named <lambda>

**Backward incompatible regression: one cannot apply two different lambdas to the same original column anymore.**

If one removes the `lambda x: np.percentile(x, 98)` from above, we get the same issue with the partial function which inherits the function name from the original function:

    SpecificationError: Function names must be unique, found multiple named mad

Finally, after overwriting the `__name__` attribute of the partial (for example with `mad_c1.__name__ = 'mad_c1'`) we get:

```
    energy          distance
       sum <lambda>      sum   mean       mad mad_c1
cat
A     5.79   1.8510     4.44  1.480  0.355825  0.240
B     2.85   1.3095     1.83  0.915  0.140847  0.095
C     1.01   1.0100     0.60  0.600  0.000000  0.000
```

with still 

- one column missing (98th percentile)
- the handling of the MultiIndex columns 
- and the renaming of the columns

to deal with in separate step.

There is no control possible for the column names after aggregation, the best we can get in an automated way is some combination of original column name and the _aggregate function's name_ like this:

```python
mydf_agg.columns = ['_'.join(col) for col in mydf_agg.columns]
```

which results in:

```
     energy_sum  energy_<lambda>  distance_sum  distance_mean  distance_mad distance_mad_c1
cat
A          5.79           1.8510          4.44          1.480      0.355825           0.240
B          2.85           1.3095          1.83          0.915      0.140847           0.095
C          1.01           1.0100          0.60          0.600      0.000000           0.000
```

and if you really need to have different names, you can do it like this:

```python
mydf_agg.rename({
    "energy_sum": "total_energy",
    "energy_<lambda>": "energy_p17",
    "distance_sum": "total_distance",
    "distance_mean": "average_distance"
    }, inplace=True)
```

but that means that you need to be careful to keep the renaming code (which must now be located at another place in the code) in sync with the code where the aggregation is defined...

Sad pandas user 😢  (which still loves pandas of course)

---

I am all in for consistency, and at the same time I deeply regret the deprecation of the _aggregate and rename_ functionality. I hope the examples above make the pain points clear.

---

### Possible solutions

- Un-deprecate the dict-of-dict relabeling functionality
- Provide another API to be able to do it (but why should there be two methods for the same main purpose, namely aggregation?)
- ??? (open to suggestions)

---

_Optional read:_

With respect to the aforementioned discussion in the pull request which has been going on already for a few months, I only recently realized one of the reasons why I am so bothered by this deprecation: "aggregate and rename"  is a natural thing to do with GROUP BY aggregations in SQL since in SQL you usually provide the destination column name directly next to the aggregation expression, e.g. `SELECT col1, avg(col2) AS col2_mean, stddev(col2) AS col2_var FROM mytable GROUP BY col1`.

I'm **_not_** saying that Pandas should necessarily provide the same functionalities as SQL of course. But the examples provided above demonstrate why the dict-of-dict API was in my opinion a clean and simple solution to many use-cases.

(* I don't personally agree that the dict-of-dict approach is complex.)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Deprecation of relabeling dicts in groupby.agg brings many issues #18366

Example

Input Dataframe

Before:

After

Possible solutions

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

Deprecation of relabeling dicts in groupby.agg brings many issues #18366

Description

Example

Input Dataframe

Before:

After

Possible solutions

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions