
FEAT: use Scope class for scope in pyspark backend #2402

Merged
11 commits merged into ibis-project:master on Sep 25, 2020

Conversation

LeeTZ
Contributor

@LeeTZ LeeTZ commented Sep 22, 2020

What is the change

This PR proposes a change to the scope data structure used in pyspark execution. Following the implementation in #2306, this PR replaces the dict-based scope with the Scope class in the pyspark backend. In addition, timecontext is added as a parameter in the pyspark backend.

Notable changes

In this PR a new API, set_value, is added to the Scope class, which makes the Scope class mutable. This is needed for the pyspark backend because the current logic caches all results in a global scope variable: all pyspark translations read from the same global scope, which is different from pandas execution.

As per our discussion in #2386, we may not want __getitem__ / __setitem__ implemented, to avoid confusion when using the class. Therefore this PR proposes a set_value API, analogous to our existing get_value API. For now this API is used only in the pyspark backend.
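A minimal sketch of how such a set_value API could sit next to get_value, using hypothetical simplified TimeContext and ScopeItem stand-ins (the real class lives in ibis/expr/scope.py; names and details here are illustrative only):

```python
from typing import Any, Dict, NamedTuple, Optional, Tuple

# Hypothetical stand-in: a time context as a (begin, end) pair of strings.
TimeContext = Tuple[str, str]


class ScopeItem(NamedTuple):
    value: Any
    timecontext: Optional[TimeContext]


class Scope:
    """Minimal sketch of a cache keyed by operation node."""

    def __init__(self, items: Optional[Dict[Any, ScopeItem]] = None):
        self._items = dict(items or {})

    def get_value(self, op: Any, timecontext: Optional[TimeContext] = None) -> Any:
        item = self._items.get(op)
        if item is None:
            return None
        # Only reuse a cached result whose time context matches.
        if timecontext is not None and item.timecontext != timecontext:
            return None
        return item.value

    def set_value(self, op: Any, timecontext: Optional[TimeContext], value: Any) -> None:
        # Mutates the scope in place, so a single shared scope can cache
        # results across translations.
        self._items[op] = ScopeItem(value, timecontext)

    def __contains__(self, op: Any) -> bool:
        return op in self._items


scope = Scope()
scope.set_value("table_op", ("20200101", "20200201"), "dataframe")
print(scope.get_value("table_op", ("20200101", "20200201")))  # dataframe
print(scope.get_value("table_op", ("20200301", "20200401")))  # None
```

Because set_value mutates the scope in place, every pyspark translation can read from and write to one shared scope object, which is the usage pattern described above.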

How is this change tested

Tests that save, retrieve, and modify data in scope are covered in pyspark/tests/test_basic.py. This PR passes all tests.

Follow-ups

  • Fix spark windowing for time indexed table
  • Add time context adjustment logic for pyspark backend

To keep things clear, I will keep this PR simple and do one thing at a time; I will address these in follow-up PRs.

@LeeTZ
Contributor Author

LeeTZ commented Sep 23, 2020

As per an offline discussion with @icexelloss, I am including the work for the first follow-up (enabling time context in the pyspark backend) in this PR as well.

Contributor

@icexelloss icexelloss left a comment


Overall looks good. Left some comments.

Files with review comments:
  • ibis/expr/scope.py
  • ibis/pyspark/client.py
  • ibis/pyspark/compiler.py
  • ibis/pyspark/timecontext.py
  • ibis/pyspark/tests/test_timecontext.py
@icexelloss icexelloss self-requested a review September 23, 2020 21:47
Contributor

@icexelloss icexelloss left a comment


LGTM. +1

@LeeTZ
Contributor Author

LeeTZ commented Sep 23, 2020

Thanks @icexelloss ! @jreback would you mind taking a look when you have time?

@jreback jreback added the pyspark (The Apache PySpark backend) and backends - spark labels Sep 23, 2020
@jreback jreback added this to the Next Bugfix Release milestone Sep 23, 2020
@LeeTZ LeeTZ closed this Sep 23, 2020
@LeeTZ LeeTZ reopened this Sep 23, 2020
@LeeTZ LeeTZ closed this Sep 23, 2020
@LeeTZ LeeTZ reopened this Sep 23, 2020
Contributor

@jreback jreback left a comment


looks reasonable. question about __contains__. pls add a new release note (or you can just add this issue to the original time context one)

Files with review comments:
  • ibis/client.py
  • ibis/expr/scope.py (3 comments)
timecontext: Optional[TimeContext]
time context associated with the result.
value : Object
Contributor


Any

Contributor Author


+1


Returns
-------
Scope
a new Scope instance with op in it.
"""
return Scope({op: ScopeItem(result, timecontext)})
scope = Scope()
Contributor


why don't you do this in the Scope({op: ....})?

Contributor Author


Yes I should do that, fixed.

@LeeTZ
Contributor Author

LeeTZ commented Sep 25, 2020

@jreback I addressed the comments and added this PR to the release note. CI is green now.

return op in self._items

def __iter__(self):
return iter(self._items.keys())

def set_value(
self, op: Node, timecontext: Optional[TimeContext], value: Any
):
Contributor


can you type -> None as the return? (I assume it doesn't return anything)

Contributor Author


Yes, I added a type annotation to the return value.

@@ -159,7 +212,7 @@ def merge_scopes(


def make_scope(
op: Node, result: Any, timecontext: Optional[TimeContext] = None
op: Node, timecontext: Optional[TimeContext], value: Any
Contributor


i would advise making timecontext and value keyword-only here, e.g. add a * in the signature; then you don't accidentally swap them
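The keyword-only suggestion can be illustrated with a hypothetical, simplified signature (the names and return type here are placeholders, not the actual ibis code):

```python
from typing import Any, Optional


def make_scope(
    op: Any, *, timecontext: Optional[tuple] = None, value: Any = None
) -> dict:
    # Everything after the bare * must be passed by keyword, so a caller
    # cannot accidentally pass value where timecontext is expected.
    return {op: (value, timecontext)}


print(make_scope("op", value=42))  # {'op': (42, None)}
# make_scope("op", ("t0", "t1"), 42)  # would raise TypeError: too many positional args
```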

@@ -159,7 +212,7 @@ def merge_scopes(


def make_scope(
op: Node, result: Any, timecontext: Optional[TimeContext] = None
op: Node, timecontext: Optional[TimeContext], value: Any
) -> 'Scope':
"""make a Scope instance, adding (op, result, timecontext) into the
Contributor


i find this slightly weird that make_scope takes a Node but Scope takes a dict. can we unify?

Contributor Author


Yes, now with the new constructor of Scope it makes sense to kill this make_scope entirely and use the Scope constructor only. This will simplify the usage of this class.
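Retiring make_scope in favor of the constructor might look like this sketch (a hypothetical, simplified Scope, not the actual ibis implementation):

```python
from typing import Any, Dict, Optional, Tuple

# Hypothetical stand-in for the real TimeContext type.
TimeContext = Tuple[str, str]


class Scope:
    # Sketch: the constructor itself accepts an op -> value mapping plus a
    # time context, so a separate make_scope() factory becomes unnecessary.
    def __init__(
        self,
        param: Optional[Dict[Any, Any]] = None,
        timecontext: Optional[TimeContext] = None,
    ):
        self._items = {
            op: (value, timecontext) for op, value in (param or {}).items()
        }

    def get_value(self, op: Any) -> Any:
        return self._items[op][0]


# make_scope(op, timecontext, value) collapses into a plain constructor call:
scope = Scope({"some_op": "result"}, ("20200101", "20200201"))
print(scope.get_value("some_op"))  # result
```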

@jreback
Contributor

jreback commented Sep 25, 2020

lgtm. ping on green.

@LeeTZ
Contributor Author

LeeTZ commented Sep 25, 2020

Thanks @jreback, CI is green now.

@jreback jreback merged commit 8b01fa1 into ibis-project:master Sep 25, 2020
@jreback
Contributor

jreback commented Sep 25, 2020

thanks @LeeTZ
