[FEA] SeriesGroupBy doesn't support std aggregation #3429
Comments
Shouldn't this issue be resolved by #2791 in 0.11.0?
ans = x.groupby(['id4','id5'],as_index=False).agg({'v3':'std'})
#Traceback (most recent call last):
# File "<stdin>", line 1, in <module>
# File "/home/jan/anaconda3/envs/cudf/lib/python3.6/site-packages/cudf/core/groupby/groupby.py", line 46, in agg
# return self._apply_aggregation(func)
# File "/home/jan/anaconda3/envs/cudf/lib/python3.6/site-packages/cudf/core/groupby/groupby.py", line 132, in _apply_aggregation
# result = self._groupby.compute_result(agg)
# File "/home/jan/anaconda3/envs/cudf/lib/python3.6/site-packages/cudf/core/groupby/groupby.py", line 370, in compute_result
# self.dropna,
# File "/home/jan/anaconda3/envs/cudf/lib/python3.6/site-packages/cudf/core/groupby/groupby.py", line 551, in _groupby_engine
# key_columns, value_columns, aggs, dropna=dropna
# File "cudf/_lib/groupby.pyx", line 81, in cudf._lib.groupby.groupby
#KeyError: 'std'
|
We'll need more than just the above notation. We would explicitly want to be able to do something like this, too:
import cudf
df = cudf.DataFrame({'a': [1.,3,4,1.],'b': [4.,5,6,-10], 'c': [6., 7., 5., 10.]})
df.groupby('a').agg({'b':['sum', 'std']})
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-9-7c4e9424589f> in <module>
2
3 df = cudf.DataFrame({'a': [1.,3,4,1.],'b': [4.,5,6,-10], 'c': [6., 7., 5., 10.]})
----> 4 df.groupby('a').agg({'b':['sum', 'std']})
/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/groupby/groupby.py in agg(self, func)
44
45 def agg(self, func):
---> 46 return self._apply_aggregation(func)
47
48 def size(self):
/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/groupby/groupby.py in _apply_aggregation(self, agg)
130 Applies the aggregation function(s) ``agg`` on all columns
131 """
--> 132 result = self._groupby.compute_result(agg)
133 libcudf.nvtx.nvtx_range_pop()
134 return result
/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/groupby/groupby.py in compute_result(self, agg)
368 aggs_as_list,
369 self.sort,
--> 370 self.dropna,
371 )
372
/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/groupby/groupby.py in _groupby_engine(key_columns, value_columns, aggs, sort, dropna)
549 """
550 out_key_columns, out_value_columns = libcudf.groupby.groupby(
--> 551 key_columns, value_columns, aggs, dropna=dropna
552 )
553
cudf/_lib/groupby.pyx in cudf._lib.groupby.groupby()
KeyError: 'std'
That way, we could do multiple groupby-aggregations in a single call to
|
@beckernick your comment seems to describe a different feature request, which was already made in #3737. |
Thanks for linking that, @jangorecki. The error in that issue is caused by
What you can't currently do is
However, now that I think about it more, I wonder if we wouldn't avoid two libcudf calls unless all aggs used are either hash-based or sort-based? |
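The hash-based versus sort-based question above can be sketched in plain Python. This is only an illustration of the idea, not how cudf or libcudf dispatch aggregations: the names `HASH_BASED`, `SORT_BASED`, and `plan_groupby_calls` are hypothetical, and which aggregation belongs to which set is an assumption made for the example.

```python
# Hypothetical classification, for illustration only. The real split is
# decided inside libcudf and may differ from these sets.
HASH_BASED = {"sum", "min", "max", "count", "mean"}
SORT_BASED = {"std", "var", "median", "quantile"}


def plan_groupby_calls(aggs):
    """Group requested aggregation names by the strategy assumed to serve them.

    Returns a dict mapping a strategy name ("hash" or "sort") to the list of
    aggregations it would handle. One dict entry stands for one backend call.
    """
    plan = {}
    for agg in aggs:
        if agg in HASH_BASED:
            plan.setdefault("hash", []).append(agg)
        elif agg in SORT_BASED:
            plan.setdefault("sort", []).append(agg)
        else:
            # Mirrors the KeyError seen in the tracebacks for unknown names.
            raise KeyError(agg)
    return plan


# Mixing strategies needs two calls; a single strategy needs only one.
mixed_plan = plan_groupby_calls(["sum", "std"])
single_plan = plan_groupby_calls(["sum", "mean"])
```

Under these assumptions, `['sum', 'std']` splits into two batches, while `['sum', 'mean']` stays in one, which is the case where a second call would be avoided.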
Thanks @harrism. This will be tackled in the upcoming groupby libcudf++ port. |
Is your feature request related to a problem? Please describe.
SeriesGroupBy does not implement std() yet.
With mean aggregation, we can get the following
But it is not yet implemented with standard deviation.
Describe the solution you'd like
It would be useful to call std() directly on a selected Series instead of selecting a subset of columns as a DataFrame each time. With larger datasets and several Series to aggregate, it would be better for SeriesGroupBy to have this method.
Describe alternatives you've considered
For now, I'm selecting a subset of the dataframe to get the results for a specific series.
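As a reference for what a per-group std on a single Series would return, here is a minimal pure-Python sketch. It does not use cudf. It assumes the pandas convention (sample standard deviation with ddof=1, and NaN for groups with a single value), and the helper name `grouped_std` is hypothetical. The data is the 'a' and 'b' columns from the example DataFrame in the comments.

```python
import math
import statistics
from collections import defaultdict


def grouped_std(keys, values):
    """Sample standard deviation (ddof=1) of `values` for each distinct key.

    Groups with fewer than two values yield NaN, following the pandas
    convention (assumed here, not taken from cudf itself).
    """
    groups = defaultdict(list)
    for key, value in zip(keys, values):
        groups[key].append(value)

    result = {}
    for key in sorted(groups):
        vals = groups[key]
        result[key] = statistics.stdev(vals) if len(vals) > 1 else math.nan
    return result


a = [1.0, 3.0, 4.0, 1.0]
b = [4.0, 5.0, 6.0, -10.0]

# Conceptually what df.groupby('a')['b'].std() is being asked to compute.
std_b = grouped_std(a, b)
```

Only the group with key 1.0 has two values (4.0 and -10.0), so it is the only group with a finite result; the other two groups come back as NaN.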