
ENH: Implement join for PySpark backend #1967

Merged
merged 6 commits into ibis-project:master from icexelloss/pyspark-backend-join on Sep 18, 2019


icexelloss
Contributor

Implement join functionality for the PySpark backend. This PR only enables tests/all/test_join.py.

@@ -76,7 +76,8 @@ def compile_selection(t, expr, scope, **kwargs):
     # TODO: Support sort_keys (see issue #1957)
     if op.sort_keys:
         raise NotImplementedError(
-            "predicates and sort_keys are not supported with Selection")
+            "predicates and sort_keys are not supported with Selection"
Contributor Author

These are automated changes made by the pre-commit hook. It appears that the previous commit was not formatted with the pre-commit hook.

@icexelloss
Contributor Author

cc @hjoo @toryhaavik

@@ -10,7 +10,7 @@ def left(batting):
 
 @pytest.fixture(scope='module')
 def right(awards_players):
-    return awards_players[awards_players.lgID == 'NL']
+    return awards_players[awards_players.lgID == 'NL'].drop(['yearID', 'lgID'])
Contributor

Why are 'yearID' and 'lgID' dropped here?

Contributor Author

@icexelloss, Sep 16, 2019

Having duplicate column names in the result confuses certain backends (for example PySpark). For the purpose of this test I think avoiding duplicate column names makes our lives easier.
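
To illustrate the point above, here is a minimal pure-Python sketch of why overlapping non-key columns make a join result ambiguous, and how dropping them from the right side first avoids the problem. It is not the Ibis or PySpark API: `drop_columns`, `inner_join`, the sample rows, and the choice of `playerID` as the join key are all hypothetical and exist only for illustration.

```python
from typing import Dict, Iterable, List

Row = Dict[str, object]


def drop_columns(rows: Iterable[Row], columns: Iterable[str]) -> List[Row]:
    """Return copies of the rows without the named columns (hypothetical helper)."""
    unwanted = set(columns)
    return [{k: v for k, v in row.items() if k not in unwanted} for row in rows]


def inner_join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Naive inner join on one key column (hypothetical helper).

    Raises ValueError if a matching pair shares any column other than the key,
    because the merged row could not tell the two columns apart.
    """
    joined = []
    for l_row in left:
        for r_row in right:
            if l_row[key] != r_row[key]:
                continue
            overlap = (set(l_row) & set(r_row)) - {key}
            if overlap:
                raise ValueError(f"ambiguous columns after join: {sorted(overlap)}")
            joined.append({**l_row, **r_row})
    return joined


# Made-up rows, loosely modelled on the batting / awards_players fixtures.
batting = [{"playerID": "p1", "yearID": 2000, "lgID": "NL", "G": 150}]
awards = [{"playerID": "p1", "yearID": 2000, "lgID": "NL", "awardID": "MVP"}]

# Mirror the fixture change: filter to 'NL', then drop the overlapping columns.
right = drop_columns([r for r in awards if r["lgID"] == "NL"], ["yearID", "lgID"])
result = inner_join(batting, right, "playerID")
```

Joining `batting` against the unmodified `awards` rows raises `ValueError` in this sketch, because `yearID` and `lgID` appear on both sides. With the columns dropped, each name in the result maps to exactly one source column.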

@toryhaavik
Contributor

@icexelloss can you please rebase? That will pick up the formatting changes that were done separately.

@icexelloss
Contributor Author

@toryhaavik I've cleaned up the formatting changes. I am now picking which backends to include in all/test_join.py to make CI green. Will ping you again when it's green.

@icexelloss
Contributor Author

@toryhaavik I think the tests pass now (with the exception of one database connection/startup failure), so I think this PR is ready for another review. Can you please take a look? Thanks!

Contributor

@toryhaavik left a comment

this LGTM, i'll merge once the tests pass

Contributor

@toryhaavik left a comment

LGTM, thanks @icexelloss

@toryhaavik merged commit 44961f6 into ibis-project:master on Sep 18, 2019
@icexelloss
Contributor Author

Thanks @toryhaavik !

costrouc pushed a commit to costrouc/ibis that referenced this pull request Oct 10, 2019
Implement join functionality for PySpark backend. This PR only enables
tests/all/test_join.py
Author: Li Jin <ice.xelloss@gmail.com>

Closes ibis-project#1967 from icexelloss/pyspark-backend-join and squashes the following commits:

15b7bf5 [Li Jin] Fix impala tests and fix csv test for python 3.5
782db34 [Li Jin] Clean up
54225c8 [Li Jin] Disable csv tests
afbf374 [Li Jin] Only support Pandas and PySpark backend in test_join
75550cd [Li Jin] Clean up tests
c8cefa1 [Li Jin] ENH: Implement join for PySpark backend