[FEA] Groupby with as_index=False doesn't support multiple aggregation functions #3737

Closed
jangorecki opened this issue Jan 9, 2020 · 2 comments · Fixed by #4346
Labels: bug (Something isn't working), Python (Affects Python cuDF API.)

Comments

@jangorecki

I would like to compute sum(c) and mean(c) grouped by a. I am on cuDF 0.11.0:

import cudf as cu
import pandas as pd
pdf = pd.DataFrame({'a': [1.,3,4,1.], 'c': [6., 7., 5., 10.]})
cdf = cu.DataFrame({'a': [1.,3,4,1.], 'c': [6., 7., 5., 10.]})
ans1 = pdf.groupby(['a'],as_index=False).agg({'c': ['sum','mean']})
ans1.head()
#     a     c     
#         sum mean
#0  1.0  16.0  8.0
#1  3.0   7.0  7.0
#2  4.0   5.0  5.0
ans2 = cdf.groupby(['a'],as_index=False).agg({'c': ['sum','mean']})
#Traceback (most recent call last):
#  File "<stdin>", line 1, in <module>
#  File "/home/jan/anaconda3/envs/cudf/lib/python3.6/site-packages/cudf/core/groupby/groupby.py", line 46, in agg
#    return self._apply_aggregation(func)
#  File "/home/jan/anaconda3/envs/cudf/lib/python3.6/site-packages/cudf/core/groupby/groupby.py", line 132, in _apply_aggregation
#    result = self._groupby.compute_result(agg)
#  File "/home/jan/anaconda3/envs/cudf/lib/python3.6/site-packages/cudf/core/groupby/groupby.py", line 373, in compute_result
#    return self.construct_result(out_key_columns, out_value_columns)
#  File "/home/jan/anaconda3/envs/cudf/lib/python3.6/site-packages/cudf/core/groupby/groupby.py", line 460, in construct_result
#    out_value_columns, columns=self.value_names
#  File "/home/jan/anaconda3/envs/cudf/lib/python3.6/site-packages/cudf/core/groupby/groupby.py", line 25, in dataframe_from_columns
#    df.columns = columns
#  File "/home/jan/anaconda3/envs/cudf/lib/python3.6/site-packages/cudf/core/dataframe.py", line 310, in __setattr__
#    object.__setattr__(self, key, col)
#  File "/home/jan/anaconda3/envs/cudf/lib/python3.6/site-packages/cudf/core/dataframe.py", line 1134, in columns
#    self._rename_columns(columns)
#  File "/home/jan/anaconda3/envs/cudf/lib/python3.6/site-packages/cudf/core/dataframe.py", line 1155, in _rename_columns
#    self.rename(mapper=mapper, inplace=True)
#  File "/home/jan/anaconda3/envs/cudf/lib/python3.6/site-packages/cudf/core/dataframe.py", line 1881, in rename
#    out_column = mapper[key] + ("cudf_" + str(postfix),)
#TypeError: must be str, not tuple
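
The last frame of the traceback concatenates `mapper[key]` with a one-element tuple. A minimal standard-library sketch of why that fails is below. It assumes that one of the column names reaching `rename` is a plain string (for example the key column 'a') while the aggregated columns are named by tuples such as ('c', 'sum'); that is an inference from the error message, not something confirmed against the cuDF source, and the variable names are hypothetical.

```python
# Hypothetical stand-ins for the names involved; not taken from cuDF itself.
key_name = "a"              # assumed: a plain string column name
value_name = ("c", "sum")   # assumed: a tuple name from a multi-function agg
postfix = 0

# Same shape as the expression in the traceback: a one-element tuple suffix.
suffix = ("cudf_" + str(postfix),)

# tuple + tuple concatenates without error.
renamed_value = value_name + suffix

# str + tuple raises TypeError, which matches the class of error reported.
error = None
try:
    key_name + suffix
except TypeError as exc:
    error = exc

print(renamed_value)
print(type(error).__name__)
```

The exact wording of the TypeError message differs between Python versions, so only the exception type is checked here.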
@jangorecki added the Needs Triage and feature request labels Jan 9, 2020
@kkraus14 added the bug and Python labels and removed the Needs Triage and feature request labels Jan 10, 2020
@kkraus14
Collaborator

Defer this to the groupby refactor that's desperately needed.

@beckernick
Member

As discussed in #3429, this error is caused by as_index=False, which we don't currently support. If you use the default as_index=True, you can do cdf.groupby(['a']).agg({'c': ['sum','mean']}).

Updating this feature request to explicitly be about supporting as_index=False.
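
For anyone who needs the flat layout that as_index=False would give, here is a rough sketch of the workaround above followed by a manual flatten. It is written against pandas so it runs without a GPU; whether the same column assignment and reset_index steps behave identically on a cuDF 0.11 DataFrame is an assumption and has not been checked. The `c_sum` / `c_mean` naming is just one possible choice.

```python
import pandas as pd

pdf = pd.DataFrame({"a": [1.0, 3.0, 4.0, 1.0], "c": [6.0, 7.0, 5.0, 10.0]})

# Default as_index=True: the group key becomes the index and the
# result columns form a MultiIndex of (column, function) pairs.
grouped = pdf.groupby(["a"]).agg({"c": ["sum", "mean"]})

# Join each (column, function) pair into a single flat name,
# then move the group key back into a regular column.
flat = grouped.copy()
flat.columns = ["_".join(col) for col in grouped.columns]
flat = flat.reset_index()

print(flat)
```

Unlike the pandas as_index=False output shown in the issue, this produces single-level column names rather than a two-level header.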

@beckernick changed the title from "[FEA] groupby multiple functions to single variable" to "[FEA] Groupby with as_index=False doesn't support multiple aggregation functions" Jan 14, 2020