Feature request: aggregation without MultiIndex columns #14581
Comments
You simply need to specify the aggregation directly on the Series. PySpark in general is NOT an inspiration; to be honest, things there are much clunkier.
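For context, a minimal sketch of what "aggregating directly on the Series" looks like (the toy frame here is illustrative, not from the thread):

```python
import pandas as pd

df = pd.DataFrame({'Key': ['A', 'A', 'B'], 'Value': [1, 2, 3]})

# Selecting the column first gives a SeriesGroupBy; a single
# aggregation then returns a flat Series, with no MultiIndex columns.
out = df.groupby('Key')['Value'].sum()
print(out)
# Key
# A    3
# B    3
# Name: Value, dtype: int64
```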
Thanks @jreback for the quick response. My previous example was oversimplified; what about the following one?

```python
import pandas as pd

df = pd.DataFrame([
    ['A', 1, 85],
    ['A', 2, 45],
    ['A', 3, 23],
    ['B', 4, 76],
    ['B', 5, 43],
    ['B', 6, 56],
], columns=['Key', 'Value', 'Age'])

df1 = df.groupby('Key').agg({
    'Value': {'V1': 'sum', 'V2': 'count'},
    'Age': {'AvgAge': 'mean', 'StdAge': 'std'},
})
```

Is there a convenient way to handle multiple aggregations over multiple columns?
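One common workaround is to run the aggregation and then flatten the resulting column MultiIndex by hand. A sketch, using plain function lists rather than the nested renaming dict (the author's names V1/V2 etc. are lost in this form):

```python
import pandas as pd

df = pd.DataFrame([
    ['A', 1, 85], ['A', 2, 45], ['A', 3, 23],
    ['B', 4, 76], ['B', 5, 43], ['B', 6, 56],
], columns=['Key', 'Value', 'Age'])

df1 = df.groupby('Key').agg({'Value': ['sum', 'count'],
                             'Age': ['mean', 'std']})

# agg() produces (column, aggfunc) pairs as MultiIndex columns;
# join each pair into a single flat name.
df1.columns = ['_'.join(pair) for pair in df1.columns]
print(df1.columns.tolist())
# ['Value_sum', 'Value_count', 'Age_mean', 'Age_std']
```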
FWIW, I would also like to see a new groupby API entry point (not requiring keyword args) that does not yield a row index.
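The "does not yield a row index" part is at least partly covered by the existing `as_index=False` option; a sketch:

```python
import pandas as pd

df = pd.DataFrame({'Key': ['A', 'A', 'B'], 'Value': [1, 2, 3]})

# as_index=False keeps the group keys as an ordinary column
# instead of promoting them to the row index.
out = df.groupby('Key', as_index=False)['Value'].sum()
print(out)
#   Key  Value
# 0   A      3
# 1   B      3
```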
Currently, groupby-aggregation in pandas creates MultiIndex columns when there are multiple operations on the same column. This introduces friction: the column names must be reset before fast filtering and joining. (If all operations could be chained together, analytics would be smoother.)
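To make the friction concrete, a small sketch of what downstream code has to do with the MultiIndex result (illustrative frame, not from the issue):

```python
import pandas as pd

df = pd.DataFrame({'Key': ['A', 'A', 'B'], 'Value': [1, 2, 3]})
df1 = df.groupby('Key').agg({'Value': ['sum', 'count']})

# Columns are now ('Value', 'sum') and ('Value', 'count'), so
# selection needs tuple keys...
big = df1[df1[('Value', 'sum')] > 2]

# ...and joins or further chaining usually force a manual rename first.
df1.columns = ['V1', 'V2']
```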
Expected Output
It would be great if there were a simple alias function for columns (like PySpark's implementation), such as the pattern sketched below.
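The kind of API being referenced is presumably PySpark's inline `.alias()`, which keeps the output schema flat. A sketch of that pattern (standard PySpark usage, not the author's original snippet):

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [('A', 1, 85), ('A', 2, 45), ('A', 3, 23),
     ('B', 4, 76), ('B', 5, 43), ('B', 6, 56)],
    ['Key', 'Value', 'Age'])

# Each aggregate is aliased inline, so the result has flat columns
# (Key, V1, V2, AvgAge, StdAge) with no second level to clean up.
out = df.groupBy('Key').agg(
    F.sum('Value').alias('V1'),
    F.count('Value').alias('V2'),
    F.avg('Age').alias('AvgAge'),
    F.stddev('Age').alias('StdAge'),
)
```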