-
Notifications
You must be signed in to change notification settings - Fork 647
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FEAT-#6191: Implement groupby.rolling
API
#6292
Conversation
Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>
Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>
@@ -39,6 +39,7 @@ | |||
from modin.core.dataframe.algebra.default2pandas.groupby import GroupBy | |||
from modin.config import IsExperimental | |||
from .series import Series | |||
from .window import RollingGroupby |
Check notice
Code scanning / CodeQL
Cyclic import Note
modin.pandas.window
super().__init__(self._groupby_obj._df, *args, **kwargs) | ||
|
||
def sem(self, *args, **kwargs): | ||
ErrorMessage.missmatch_with_pandas( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
double 's', really?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a misspelling in the original method's signature, raised an issue to fix this allover the project: #6293
@@ -3021,6 +3021,68 @@ def groupby_size( | |||
result.columns = result.columns[:-1].append(pandas.Index(["size"])) | |||
return result | |||
|
|||
@doc_utils.add_refer_to("GroupBy.rolling") | |||
def groupby_rolling( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to reflect this anyhow in docs, change supported api spreadsheet maybe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that there's no mentioning of any inner methods of groupby
(and so no groupby.rolling
) in the supported api spreadsheet. There's just a line saying that pandas.DataFrame.groupby
is supported and that's it (src).
Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>
@@ -135,7 +136,20 @@ def __init__( | |||
} | |||
self._kwargs.update(kwargs) | |||
|
|||
def __override(self, **kwargs): | |||
def _override(self, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made this method accessible out of this class so we can call it from RollingGroupby
func, | ||
*args, | ||
**kwargs, | ||
) | ||
) | ||
if isinstance(self._dataframe, DataFrame): | ||
return dataframe | ||
elif is_list_like(func): | ||
elif is_list_like(func) and dataframe.columns.nlevels > 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added so we wouldn't try to drop one single level
@@ -37,9 +37,9 @@ def _build_rolling(cls, func): | |||
Function that takes pandas DataFrame and applies `func` on a rolling window. | |||
""" | |||
|
|||
def fn(df, rolling_args, *args, **kwargs): | |||
def fn(df, rolling_kwargs, *args, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reworked rolling_args
into rolling_kwargs
everywhere for interoperability between Rolling
and RollingGroupby
constructors
def agg_func(grp, *args, **kwargs): | ||
return agg_method(grp, original_agg_func, *args, **kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was changed in order to pass the original_agg_func
as a second positional arg rather than a keyword arg so we can avoid the following error:
# when we call 'df.groupby("a").fillna(0)' the keyword binding would resolve into something like:
>>> pandas.core.groupby.DataFrameGroupBy.aggregate(df.groupby("a"), 0, func="fillna")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: aggregate() got multiple values for argument 'func'
# instead we want the 'func' to be passed as a positional arg as well, so the fillna's arg
# would come after the func
>>> pandas.core.groupby.DataFrameGroupBy.aggregate(df.groupby("a"), "fillna", 0)
b c
0 3 a
1 2 b
2 5 c
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Left a comment. Also, there are other unresolved comments from @anmyachev.
Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>
Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>
What do these changes do?
flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
git commit -s
groupby.rolling
API #6191docs/development/architecture.rst
is up-to-date