FEAT: Add PySpark support for ReductionVectorizedUDF #2366
It would be ideal if #2364 were merged before this PR, but it's not strictly necessary:
- Merging #2364 first would formally ensure these changes work with PySpark 3 (although I have already verified this in #2348).
- Not merging #2364 first is OK, as we'd still be testing under the old paradigm of testing only on PySpark 2 (and #2364 would later ensure the changes in this PR work with PySpark 3).
Some comments. Might or might not need more changes.
LGTM. I don't think getting #2364 merged is a prerequisite, because this PR already makes sure this works for Spark 2. Getting Spark 3 to work is orthogonal and can be done in #2364.
LGTM
This is a re-make of #2348.
Change
Add a compilation function for ReductionVectorizedUDF for the PySpark backend
Testing
- test_vectorized_udf.py: New basic test for reduction UDF
- test_aggregation.py: New test parameter (that uses reduction UDFs) for test_aggregate and test_aggregate_grouped
- test_window.py: New test for unbounded window, including a test parameter that uses reduction UDFs
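
For readers unfamiliar with the three contexts the tests cover, here is a rough, backend-free sketch of what a reduction UDF (many values in, one scalar out) is expected to produce in each. It is not the code from this PR and does not use PySpark; `my_mean`, `rows`, and the three helper functions are hypothetical names invented purely for illustration.

```python
from collections import defaultdict
from statistics import mean


# Hypothetical stand-in for a reduction UDF: a column of values in, one scalar out.
def my_mean(values):
    return mean(values)


# Made-up (key, value) rows, for illustration only.
rows = [("a", 1.0), ("a", 3.0), ("b", 10.0)]


def aggregate(rows, udf):
    """Whole-table aggregate: one scalar for the entire value column."""
    return udf([value for _, value in rows])


def aggregate_grouped(rows, udf):
    """Grouped aggregate: one scalar per key."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return {key: udf(values) for key, values in groups.items()}


def unbounded_window(rows, udf):
    """Unbounded window partitioned by key: every row receives its group's scalar."""
    per_group = aggregate_grouped(rows, udf)
    return [per_group[key] for key, _ in rows]
```

With these inputs, `aggregate_grouped(rows, my_mean)` gives `{"a": 2.0, "b": 10.0}` and `unbounded_window(rows, my_mean)` gives `[2.0, 2.0, 10.0]`. The difference between the last two contexts is only the output shape: the grouped aggregate collapses each group to one row, while the unbounded window keeps one result per input row.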