Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FEAT: Support Spark filter with window operation #2687

Merged
merged 15 commits into from
Mar 23, 2021

Conversation

emilyreff7
Copy link
Contributor

@emilyreff7 emilyreff7 commented Mar 16, 2021

Overview

Currently, if you write an expression that does a filter on a window operation and execute this in the pyspark backend, we get a failure from upstream Spark. For example:
table.filter(lambda t: t['id'].mean().over(window) > 3)
would raise pyspark.sql.utils.AnalysisException: It is not allowed to use window functions inside WHERE clause;. There is an upstream Spark issue about this: https://issues.apache.org/jira/browse/SPARK-33057.

Proposed Change

A simple workaround can be implemented to avoid passing a window operation directly to filter in the pyspark backend. We can achieve the same filter by first assigning the filter predicate to a temporary, internal column, performing the filter using that column, and then dropping the column.

Tests

Added a test in test_basic.py that does a window operation in a filter.

@jreback
Copy link
Contributor

jreback commented Mar 22, 2021

@emilyreff7 can you merge master one more time

@jreback jreback added this to the Next release milestone Mar 22, 2021
@jreback
Copy link
Contributor

jreback commented Mar 23, 2021

@emilyreff can you rebase

@icexelloss icexelloss changed the title Fix Spark filter with window operation FEAT: Support Spark filter with window operation Mar 23, 2021
@emilyreff7 emilyreff7 force-pushed the fix-filter-with-window-spark branch 2 times, most recently from 6eb130c to 89b9000 Compare March 23, 2021 15:42
@jreback
Copy link
Contributor

jreback commented Mar 23, 2021

lgtm can you add a whatsnew note :-> ping on green.

@jreback
Copy link
Contributor

jreback commented Mar 23, 2021

@icexelloss merge when ready

Copy link
Contributor

@icexelloss icexelloss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@icexelloss
Copy link
Contributor

@jreback Looks good to me.

@jreback jreback merged commit b28536d into ibis-project:master Mar 23, 2021
@emilyreff7 emilyreff7 deleted the fix-filter-with-window-spark branch June 3, 2021 18:55
@cpcloud cpcloud removed this from the Next release milestone Jan 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants