-
Notifications
You must be signed in to change notification settings - Fork 590
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unify implementation of fillna and isna in Pyspark backend #2882
Conversation
ibis/backends/pyspark/compiler.py
Outdated
| @@ -1742,7 +1742,9 @@ def compile_if_null(t, expr, scope, timecontext, **kwargs): | |||
| op = expr.op() | |||
| col = t.translate(op.arg, scope, timecontext) | |||
| ifnull_col = t.translate(op.ifnull_expr, scope, timecontext) | |||
| return F.when(col.isNull(), ifnull_col).otherwise(col) | |||
| return F.when(col.isNull(), ifnull_col).otherwise( | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
F.when(col.isNull() | F.isnan(col), ifnull_col).otherwise(col)
Does this work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, updated
| drop=True | ||
| ) | ||
| tm.assert_frame_equal(result, expected) | ||
|
|
||
|
|
||
| def test_fillna(client): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you move this test to the shared test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved to test_generic.py
| tm.assert_frame_equal(result, expected) | ||
|
|
||
|
|
||
| def test_notnull(client): | ||
| table = client.table('nan_table') | ||
| def test_isna(client): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you move this to the shared test to ensure same behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved to test_generic.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Small comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a release note
| @@ -30,6 +30,43 @@ def test_fillna_nullif(backend, con, expr, expected): | |||
| assert con.execute(expr) == expected | |||
|
|
|||
|
|
|||
| @pytest.mark.only_on_backends(['pandas', 'dask', 'pyspark']) | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do these work on other backends?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They error out in SQL, impala, etc. https://github.com/ibis-project/ibis/runs/3196422295
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
thanks @emilyreff7 |
Proposed Change
Unify the Pyspark backend's treatment of
fillnaandisnato match the Pandas backend.Before this PR:
fillnanp.nanfillnaNoneisnanp.nanisnaNoneAfter this PR, the Pyspark behavior for
fillnawithnp.nan, andisnawith nulls, will match the Pandas behavior.Testing
Added tests for Pyspark in
test_null.pyto assert this new behavior.