-
Notifications
You must be signed in to change notification settings - Fork 645
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FEAT-#6803: Enable range-partitioning impl for 'groupby.apply()' by default #6804
Conversation
…apply()' by default Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>
@@ -3735,14 +3735,6 @@ def groupby( | |||
skip_on_aligning_flag = "__skip_me_on_aligning__" | |||
|
|||
def apply_func(df): # pragma: no cover | |||
if any( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
categorical are now always being caught at the query compiler level
by_dtypes = self._modin_frame._dtypes.lazy_get(by).get() | ||
else: | ||
by_dtypes = self.dtypes[by] | ||
if any(isinstance(dtype, pandas.CategoricalDtype) for dtype in by_dtypes): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're now materializing 'by' dtypes to consistently catch unsupported cases and fallback to an older implementation. Previously, if the dtypes weren't materialized we were raising an exception in the kernel which caused groupby to fail. Since we're now moving this implementation out of experimental mode, we want more stability here in terms of falling back to an implementation that has more coverage
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
# 'group_wise' means 'groupby.apply()'. We're certain that range-partitioning groupby | ||
# always works better for '.apply()', so we're using it regardless of the 'ExperimentalGroupbyImpl' | ||
# value | ||
if how == "group_wise" or ExperimentalGroupbyImpl.get(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's rename ExperimentalGroupbyImpl
to RangePartitioningGroupby
in a separate PR as we discussed offline.
What do these changes do?
It's believed that range-partitioning implementation is always better for
groupby.apply()
, so this PR makes the new implementation to be a default one.flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
git commit -s
groupby.apply()
automatically #6803docs/development/architecture.rst
is up-to-date