BUG: Fix reduction UDFs over ungrouped, bounded windows on Pandas backend #2395
lgtm
Reviewed one round
LGTM. +1
Sorry, revoked my approval. There is a small typo; otherwise LGTM.
I updated the comments to be more accurate and also a little more concise. I also updated when an I also fixed the typo in
CI is failing. It looks like there is a use case of creating a custom aggcontext (link). Let me look into how to handle this.
OK, to handle custom
Note: For the most part, this is exactly how the execution rules handled custom
The only difference is that before this PR, when handling a grouped aggregation, the execution rule would pre-process the data. Below is what the execution rule did before this PR in the case that the
Lines 242 to 252 in 899804c
In this PR, I'm changing this so that custom
Lines 283 to 286 in 7239cd6
I believe this is more proper. I don't think we should pre-process any data passed to custom
@icexelloss if you'd have a look / approve when ready. Ignore the 1 failing part of the CI for now.
Changes:
@timothydijamco Can you also add #2395 (comment) to the overview section?
LGTM. +1
Thanks @timothydijamco
Overview
Currently, applying a reduction UDF over an ungrouped, bounded window on the Pandas backend raises an error.
This is because the execution rule for reduction UDFs (over an ungrouped aggregation context) does not handle the case where the aggregation context is a bounded window; it assumes the window is unbounded.
This PR adds logic to handle this case.
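To make the bounded/unbounded distinction concrete, here is a minimal sketch in plain pandas. It is not ibis code and not the change made in this PR. It assumes that "unbounded" means every row sees the whole column and that "bounded" means each row sees itself plus a fixed number of preceding rows. The helper names `my_mean`, `reduce_unbounded`, and `reduce_bounded` are hypothetical.

```python
import pandas as pd


def my_mean(values: pd.Series) -> float:
    """Stand-in for a user-defined reduction: many values in, one value out."""
    return float(values.sum() / len(values))


def reduce_unbounded(series: pd.Series, func) -> pd.Series:
    # Unbounded, ungrouped window: every row sees the whole column,
    # so the reduction runs once and the result is broadcast to all rows.
    return pd.Series(func(series), index=series.index)


def reduce_bounded(series: pd.Series, func, preceding: int) -> pd.Series:
    # Bounded, ungrouped window: each row sees itself and up to
    # `preceding` earlier rows, so the reduction runs once per row.
    window = series.rolling(window=preceding + 1, min_periods=1)
    return window.apply(func, raw=False)


data = pd.Series([1.0, 2.0, 3.0, 4.0])
unbounded = reduce_unbounded(data, my_mean)
bounded = reduce_bounded(data, my_mean, preceding=1)
```

In this sketch the unbounded result repeats a single value, while the bounded result differs per row. A rule that assumes the unbounded case therefore cannot be reused unchanged for a bounded window.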
Edit: Incidental improvements
This PR also ended up making some improvements to aggregation contexts (ibis/pandas/aggcontext.py) and how they are used to execute reduction/analytic UDFs (ibis/pandas/udf.py) to simplify things and increase consistency:
execute_udaf_node_no_groupby (now simply calling aggcontext.agg regardless of the specific type of AggregationContext)
aggcontext.agg in more cases. This surfaced an error, which I fixed by no longer passing kwargs to aggcontext.agg (in execute_udaf_node_no_groupby and execute_udaf_node_groupby). This is because reduction/analytic UDFs already have kwargs applied when they were created. Trying to pass kwargs to the functions again leads to an error (~unexpected keyword argument) in some cases.
aggcontext.agg. In other words, always do something like this:
    aggcontext.agg(args[0], func, *args[1:], **kwargs)
rather than this:
    aggcontext.agg(args[0], func, *args, **kwargs)
Example
Code
Output
Before
After
No error, and the result is correct
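The two calling-convention points from the incidental improvements (keyword arguments already bound at UDF creation, and data passed as the first argument with only the remaining arguments forwarded) can be illustrated with a toy sketch in plain Python. This is an assumption-laden illustration, not the ibis implementation: `make_udf`, `agg`, and `weighted_sum` are hypothetical stand-ins.

```python
def make_udf(func, **bound_kwargs):
    """Hypothetical UDF factory: keyword arguments are bound at creation time."""
    def wrapped(*args):
        return func(*args, **bound_kwargs)
    return wrapped


def agg(data, function, *args):
    """Toy stand-in for an aggregation context's agg: data comes first."""
    return function(data, *args)


def weighted_sum(values, weights, *, scale=1.0):
    return scale * sum(v * w for v, w in zip(values, weights))


udf = make_udf(weighted_sum, scale=2.0)
args = ([1.0, 2.0], [10.0, 20.0])

# Data goes in the first slot; only the remaining arguments are forwarded.
good = agg(args[0], udf, *args[1:])

# Forwarding all of args repeats the data, so the function receives
# an extra positional argument and raises TypeError.
try:
    agg(args[0], udf, *args)
    repeated_data_error = None
except TypeError as exc:
    repeated_data_error = str(exc)

# Passing kwargs again fails because the wrapper already bound them
# and accepts only positional arguments.
try:
    udf(*args, scale=2.0)
    rebound_kwargs_error = None
except TypeError as exc:
    rebound_kwargs_error = str(exc)
```

Under these assumptions, both failure modes surface as `TypeError`, and the second one carries an "unexpected keyword argument" message like the one mentioned above.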
Testing
I added a new test for ungrouped, bounded windows in test_window.py.
Readability changes
I also renamed the tests in test_window.py to make it clearer which variant of windows each test is handling. This is what the list of tests in test_window.py now looks like:
test_grouped_bounded_expanding_window
test_ungrouped_bounded_expanding_window (new in this PR)
test_grouped_bounded_following_window
test_grouped_bounded_preceding_windows
test_grouped_unbounded_window
test_ungrouped_unbounded_window
I'm open to any suggestions to try to simplify this, since test_window.py keeps growing. I've found this difficult to simplify further because there are so many combinations of windows and operations over windows (several dimensions: grouped/ungrouped, bounded/unbounded, ordered/unordered, reduction/analytic, udf/non-udf). To make things worse, specific permutations may need to be xfailed on certain backends, so there's a minimum amount of explicitness required for test parameters (to let us xfail very specific permutations).
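As a rough illustration of why the parameter space is hard to compress, the five dimensions listed above can be enumerated with the standard library. This is only a sketch: the backend name and the entry in `EXPECTED_FAILURES` are made up for illustration and do not describe any real backend's behavior.

```python
from itertools import product

# The five dimensions named in the description, two values each.
DIMENSIONS = {
    "grouping": ("grouped", "ungrouped"),
    "bounds": ("bounded", "unbounded"),
    "ordering": ("ordered", "unordered"),
    "kind": ("reduction", "analytic"),
    "function": ("udf", "non-udf"),
}

# Every permutation as a dict, e.g. {"grouping": "grouped", "bounds": "bounded", ...}.
combinations = [
    dict(zip(DIMENSIONS, values)) for values in product(*DIMENSIONS.values())
]

# Hypothetical: one specific permutation that one backend cannot run yet.
EXPECTED_FAILURES = {
    ("some_backend", "ungrouped", "bounded", "ordered", "reduction", "udf"),
}


def is_expected_failure(backend: str, combo: dict) -> bool:
    # A permutation is only identifiable if every dimension is explicit.
    return (backend, *combo.values()) in EXPECTED_FAILURES
```

Five binary dimensions already give 32 permutations per backend, and marking a single one as an expected failure requires naming all five values. In a pytest suite this would typically map to explicit `pytest.param(..., marks=pytest.mark.xfail(...))` entries.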