FEAT: Add PySpark support for ReductionVectorizedUDF #2366

Merged
merged 3 commits into ibis-project:master on Sep 8, 2020

Conversation

timothydijamco (Contributor)

This is a re-make of #2348.

Change

Add a compilation function for ReductionVectorizedUDF for the PySpark backend
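For context, a reduction vectorized UDF in ibis is declared with the ibis.udf.vectorized.reduction decorator, and the new compilation function is what lets expressions built from it run on the PySpark backend. The following is a minimal sketch, not code from this PR; the my_mean function, the column names, and the commented-out connection are illustrative assumptions.

```python
import ibis
import ibis.expr.datatypes as dt
from ibis.udf.vectorized import reduction


# A reduction UDF: receives a pandas Series and returns a scalar.
# (my_mean is a hypothetical example name, not part of this PR.)
@reduction(input_type=[dt.double], output_type=dt.double)
def my_mean(series):
    return series.mean()


# Assumed usage against a PySpark-backed table `t` with a double column 'v':
# con = ibis.pyspark.connect(spark_session)
# t = con.table('t')
# expr = t.aggregate(my_mean(t['v']).name('mean_v'))
# expr.execute()  # compiles via the new ReductionVectorizedUDF rule
```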

Testing

  • test_vectorized_udf.py: New basic test for reduction UDF
  • test_aggregation.py: New test parameter (that uses reduction UDFs) for test_aggregate and test_aggregate_grouped
  • test_window.py: New test for unbounded window, including a test parameter that uses reduction UDFs (see the sketch after this list)
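To illustrate roughly what those test parameters exercise, a reduction UDF can be used both in aggregations and over an unbounded window. This is a hedged sketch, reusing the hypothetical my_mean UDF and table t from the snippet above; the names are illustrative, not taken from the tests themselves.

```python
import ibis

# Assumes the hypothetical my_mean reduction UDF from the sketch above and a
# table expression t with columns 'key' and 'v'.

# Ungrouped and grouped aggregation (test_aggregation.py-style usage).
agg = t.aggregate(my_mean(t['v']).name('mean_v'))
grouped = t.group_by('key').aggregate(my_mean(t['v']).name('mean_v'))

# Unbounded window (test_window.py-style usage): the reduction is applied
# over all rows of each partition.
w = ibis.window(group_by=t['key'], preceding=None, following=None)
windowed = t.mutate(mean_v=my_mean(t['v']).over(w))
```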

timothydijamco (Contributor, Author) commented Sep 8, 2020

It would be ideal if #2364 gets merged before this does, but it's not strictly necessary:

Merging #2364 first would formally ensure these changes work with PySpark 3 (although I have already verified this in #2348).

Not merging #2364 first is also OK: we'd still be testing under the old paradigm of testing only on PySpark 2, and #2364 would later ensure the changes in this PR work with PySpark 3.

jreback added the pyspark (The Apache PySpark backend), backends - spark, feature (Features or general enhancements), and udf (Issues related to user-defined functions) labels on Sep 8, 2020
jreback added this to the Next Feature Release milestone on Sep 8, 2020
icexelloss (Contributor) left a comment

Some comments. Might or might not need more changes.

icexelloss (Contributor) commented

LGTM. I don't think getting #2364 merged is a prerequisite, because this PR already makes sure this works for Spark 2. Getting Spark 3 to work is orthogonal and can be done in #2364.

icexelloss (Contributor) left a comment

LGTM

jreback merged commit a70d443 into ibis-project:master on Sep 8, 2020