ENH: window operations for pyspark backend #1945
Codecov Report
@@ Coverage Diff @@
## master #1945 +/- ##
==========================================
- Coverage 87.68% 86.13% -1.55%
==========================================
Files 93 93
Lines 16971 17068 +97
Branches 2145 2157 +12
==========================================
- Hits 14881 14702 -179
- Misses 1681 1961 +280
+ Partials 409 405 -4
Some comments/questions, but overall looks good
ibis/pyspark/compiler.py
@compiles(ops.Lead)
def compile_lead(t, expr, scope, *, window, **kwargs):
Since this and `Lag` are so similar, can we factor them into a single function that takes the PySpark function as a parameter?
Yep, done.
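The refactor requested in this thread can be sketched as follows. This is a pure-Python illustration, not the PR's code: `compile_shift` and the list-based `lead`/`lag` are hypothetical stand-ins so the sketch runs without Spark. In the PySpark backend the parameter would presumably be `pyspark.sql.functions.lead` or `pyspark.sql.functions.lag`, which share the signature `(col, offset=1, default=None)` and are applied with `.over(window)`.

```python
from typing import Callable, List, Optional, Sequence


# Hypothetical list-based stand-ins for pyspark.sql.functions.lead / lag.
# Both share one signature: (values, offset, default) -> shifted values.
# Assumes offset >= 0.
def lead(values: Sequence, offset: int = 1, default=None) -> List:
    # Value `offset` rows ahead; `default` once we run past the last row.
    return [values[i + offset] if i + offset < len(values) else default
            for i in range(len(values))]


def lag(values: Sequence, offset: int = 1, default=None) -> List:
    # Value `offset` rows behind; `default` before the first row.
    return [values[i - offset] if i - offset >= 0 else default
            for i in range(len(values))]


ShiftFn = Callable[[Sequence, int, Optional[object]], List]


def compile_shift(shift_fn: ShiftFn, values: Sequence,
                  offset: int = 1, default=None) -> List:
    # Shared body: Lead and Lag differ only in which shift function runs.
    return shift_fn(values, offset, default)


def compile_lead(values, offset=1, default=None):
    return compile_shift(lead, values, offset, default)


def compile_lag(values, offset=1, default=None):
    return compile_shift(lag, values, offset, default)
```

For example, `compile_lag([1, 2, 3], 2, 0)` gives `[0, 0, 1]`. Passing the function in, rather than branching on the op type, keeps a single code path for argument handling and default values.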
ibis/tests/all/test_window.py
@@ -3,7 +3,8 @@

import ibis
import ibis.common.exceptions as com
from ibis.tests.backends import Csv, OmniSciDB, Pandas, Parquet
from ibis.tests.backends import Csv, Impala, OmniSciDB, Pandas, Parquet, \
I think we generally prefer wrapping these in parentheses rather than using `\`.
Got it
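The reviewer's preference matches PEP 8, which recommends Python's implied line continuation inside parentheses over a trailing backslash. The full list of names on the `ibis.tests.backends` line is cut off in the diff, so a standard-library module stands in here purely to show the layout.

```python
# Implied line continuation inside parentheses: no trailing backslash is
# needed, and the trailing comma means adding a name later touches one line.
from os.path import (
    basename,
    dirname,
    join,
)
```

Unlike a backslash continuation, this form cannot be broken by stray trailing whitespace after the `\`.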
Rebased on top of master.
ibis/pyspark/compiler.py
return compile_aggregator(t, expr, scope, F.max, context)

def fn(col):
if "window" in kwargs:
I think the default string style in ibis is single quotes, so we should probably keep all strings in single quotes.
Done
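The `fn(col)` closure in the `compiler.py` snippet in this thread decides at call time whether a window was passed in through `kwargs`. A minimal pure-Python sketch of that pattern, written with single quotes as requested, is below. The `frames` helper, this `compile_max` signature, and modelling a window as an integer count of preceding rows are all assumptions for illustration; in PySpark the windowed branch would instead apply `Column.over(window)` to the aggregate expression.

```python
from typing import List, Sequence, Union


def frames(values: Sequence[float], preceding: int) -> List[Sequence[float]]:
    # Hypothetical stand-in for a Spark window frame: each row sees itself
    # plus up to `preceding` earlier rows (like rowsBetween(-preceding, 0)).
    return [values[max(0, i - preceding): i + 1] for i in range(len(values))]


def compile_max(values: Sequence[float], **kwargs) -> Union[float, List[float]]:
    def fn(col):
        if 'window' in kwargs:
            # Windowed: one result per row, computed over that row's frame.
            return [max(frame) for frame in frames(col, kwargs['window'])]
        # No window: a single scalar for the whole column.
        return max(col)

    return fn(values)
```

Here `compile_max([3, 1, 4, 1, 5])` is a single scalar, while passing `window=1` yields one value per row. Keeping the check inside the closure lets one compile function serve both the plain and the windowed call sites.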
This LGTM, thanks @hjoo
Implemented window operations for the PySpark backend to pass all tests in `ibis/tests/all/test_window.py`:

- `WindowOp`
- `Lag`
- `Lead`
- `MinRank`
- `DenseRank`
- `NTile`
- `FirstValue`
- `LastValue`
- `RowNumber`
- `CumulativeSum`
- `CumulativeMean`
- `CumulativeMin`
- `CumulativeMax`

Also enhanced select aggregation operations (e.g. `Any`, `NotAny`, `All`, `NotAll`, `Count`, `Max`, `Min`, `Mean`, `Sum`) to be interoperable with windows.

Author: Hyonjee <hyonjee.joo@twosigma.com>

Closes ibis-project#1945 from hjoo/pyspark-window and squashes the following commits:

54a3405 [Hyonjee] change double quotes to single quotes in pyspark compiler.py
7cb2291 [Hyonjee] add helper methods for pyspark shift and cumulative window ops, refactor import line
5c82b19 [Hyonjee] remove extra test_window lead/lag tests that were added in previous commit but don't pass with non pyspark backends
7ac2dd8 [Hyonjee] skip unsupported window tests for OmniSciDB
419c309 [Hyonjee] fix test_window() in ibis/pyspark/tests/test_basic.py and remove xfail
7bd6585 [Hyonjee] window operations for pyspark backend
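`RowNumber`, `MinRank`, `DenseRank` and `NTile` differ mainly in how they number rows that tie on the ordering key. The sketch below models them on a single unpartitioned column ordered ascending, using SQL's 1-based conventions (`ROW_NUMBER`, `RANK`, `DENSE_RANK`, `NTILE`). The function names are hypothetical, and whether ibis shifts these results (for example to start at 0) is not something this page confirms. PySpark exposes the corresponding `row_number()`, `rank()`, `dense_rank()` and `ntile(n)` functions, each applied with `.over(window)`.

```python
from typing import List, Sequence


def row_number(values: Sequence[float]) -> List[int]:
    # ROW_NUMBER: 1-based position in ascending order; ties keep input order
    # because sorted() is stable.
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def min_rank(values: Sequence[float]) -> List[int]:
    # RANK: ties share the lowest position, leaving a gap after them.
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def dense_rank(values: Sequence[float]) -> List[int]:
    # DENSE_RANK: ties share a rank and the next distinct value follows
    # without a gap.
    distinct = sorted(set(values))
    return [distinct.index(v) + 1 for v in values]


def ntile(values: Sequence[float], n: int) -> List[int]:
    # NTILE(n): deal rows (in ascending order) into n buckets whose sizes
    # differ by at most one, with the larger buckets first.
    base, extra = divmod(len(values), n)
    bucket_by_position: List[int] = []
    for bucket in range(1, n + 1):
        size = base + (1 if bucket <= extra else 0)
        bucket_by_position.extend([bucket] * size)
    return [bucket_by_position[p - 1] for p in row_number(values)]
```

For `[10, 20, 20, 30]` the three ranking functions give `[1, 2, 3, 4]`, `[1, 2, 2, 4]` and `[1, 2, 2, 3]`, which makes the tie handling easy to compare. The `list.index` lookups are quadratic and only meant for illustration.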
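The cumulative operations listed above are aggregations over an expanding frame: each row aggregates everything from the start of its partition up to and including itself. A minimal single-partition sketch with hypothetical function names follows. In PySpark an equivalent frame can be built with `Window.orderBy(...).rowsBetween(Window.unboundedPreceding, Window.currentRow)` and applied via `.over(...)`, though the exact window construction used in this PR is not shown on this page.

```python
from itertools import accumulate
from operator import add
from typing import List, Sequence


def cumulative_sum(values: Sequence[float]) -> List[float]:
    # Running total: the frame runs from the first row to the current row.
    return list(accumulate(values, add))


def cumulative_min(values: Sequence[float]) -> List[float]:
    # Smallest value seen so far.
    return list(accumulate(values, min))


def cumulative_max(values: Sequence[float]) -> List[float]:
    # Largest value seen so far.
    return list(accumulate(values, max))


def cumulative_mean(values: Sequence[float]) -> List[float]:
    # Running total divided by the number of rows seen so far.
    return [total / count
            for count, total in enumerate(accumulate(values, add), start=1)]
```

Carrying the running state forward with `itertools.accumulate` makes each operation a single linear pass, instead of re-aggregating every prefix from scratch.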