ENH: Implement join for PySpark backend #1967
force-pushed from d549289 to 4560a38
ibis/pyspark/compiler.py
Outdated
@@ -76,7 +76,8 @@ def compile_selection(t, expr, scope, **kwargs):
     # TODO: Support sort_keys (see issue #1957)
     if op.sort_keys:
         raise NotImplementedError(
-            "predicates and sort_keys are not supported with Selection")
+            "predicates and sort_keys are not supported with Selection"
These are automated changes made by the pre-commit hook. It appears that the previous commit was not formatted with the pre-commit hook.
cc @hjoo @toryhaavik
ibis/tests/all/test_join.py
Outdated
@@ -10,7 +10,7 @@ def left(batting):


 @pytest.fixture(scope='module')
 def right(awards_players):
-    return awards_players[awards_players.lgID == 'NL']
+    return awards_players[awards_players.lgID == 'NL'].drop(['yearID', 'lgID'])
Why are 'yearID' and 'lgID' dropped here?
Having duplicate column names in the result confuses certain backends (for example PySpark). For the purpose of this test I think avoiding duplicate column names makes our lives easier.
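To make the collision concrete, here is a minimal pure-Python sketch (not the ibis or PySpark API) that lists which non-key column names would appear on both sides of the join. The helper `colliding_columns`, the column subsets, and the choice of `playerID` as the join key are assumptions for illustration only; the real tables have more columns.

```python
def colliding_columns(left_cols, right_cols, join_keys):
    """Return non-key column names that appear on both sides of a join."""
    keys = set(join_keys)
    right = set(right_cols)
    return [c for c in left_cols if c in right and c not in keys]


# Hypothetical column subsets for the two fixtures (assumed for illustration).
batting_cols = ['playerID', 'yearID', 'lgID', 'G']
awards_cols = ['playerID', 'awardID', 'yearID', 'lgID']

# Assumed join key; the actual predicate lives in the test module.
dupes = colliding_columns(batting_cols, awards_cols, join_keys=['playerID'])
print(dupes)

# Dropping the colliding columns from the right side removes the ambiguity.
right_cols_after_drop = [c for c in awards_cols if c not in dupes]
print(colliding_columns(batting_cols, right_cols_after_drop, ['playerID']))
```

With these assumed columns, the first call reports `yearID` and `lgID`, which are the two columns the fixture drops, and the second call reports nothing.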
@icexelloss can you please rebase? That will pick up the formatting changes done separately.
force-pushed from 4560a38 to c8cefa1
@toryhaavik I've cleaned up the formatting changes. I am now choosing which backends to include in all/test_join.py to make CI green. Will ping you again when it's green.

@toryhaavik I think the tests pass now (with the exception of one database connection/startup failure), so I think this PR is ready for another review. Can you please take a look? Thanks!
this LGTM, i'll merge once the tests pass
force-pushed from db00d90 to 15b7bf5
LGTM, thanks @icexelloss
Thanks @toryhaavik!
Implement join functionality for PySpark backend. This PR only enables tests/all/test_join.py

Author: Li Jin <ice.xelloss@gmail.com>

Closes ibis-project#1967 from icexelloss/pyspark-backend-join and squashes the following commits:

15b7bf5 [Li Jin] Fix impala tests and fix csv test for python 3.5
782db34 [Li Jin] Clean up
54225c8 [Li Jin] Disable csv tests
afbf374 [Li Jin] Only support Pandas and PySpark backend in test_join
75550cd [Li Jin] Clean up tests
c8cefa1 [Li Jin] ENH: Implement join for PySpark backend