FEAT: use Scope class for scope in pyspark backend #2402
As per offline discussion with @icexelloss, I have included the work for the first follow-up (enabling time context in the pyspark backend) in this PR as well.
Overall looks good. Left some comments.
LGTM. +1
Thanks @icexelloss! @jreback would you mind taking a look when you have time?
Looks reasonable. Question about __contains__. Please add a new release note (or just add this issue to the original time context one).
ibis/expr/scope.py
Outdated
| timecontext: Optional[TimeContext] | ||
| time context associate with the result. | ||
| value : Object |
Any
+1
ibis/expr/scope.py
Outdated
| Returns | ||
| ------- | ||
| Scope | ||
| a new Scope instance with op in it. | ||
| """ | ||
| return Scope({op: ScopeItem(result, timecontext)}) | ||
| scope = Scope() |
Why don't you do this in the Scope({op: ....})?
Yes I should do that, fixed.
@jreback I addressed the comments and added this PR to the release notes. CI is green now.
ibis/expr/scope.py
Outdated
| return op in self._items | ||
| def __iter__(self): | ||
| return iter(self._items.keys()) | ||
| def set_value( | ||
| self, op: Node, timecontext: Optional[TimeContext], value: Any | ||
| ): |
Can you annotate the return type as -> None? (I assume it doesn't return anything.)
Yes, I added the return type annotation.
ibis/expr/scope.py
Outdated
| @@ -159,7 +212,7 @@ def merge_scopes( | |||
| def make_scope( | |||
| op: Node, result: Any, timecontext: Optional[TimeContext] = None | |||
| op: Node, timecontext: Optional[TimeContext], value: Any | |||
I would not be above making timecontext and value keyword-only here, e.g. add a * in the signature, so that you don't accidentally set the wrong one.
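A minimal sketch of the keyword-only suggestion above. The function name `record` and the `TimeContext` alias are hypothetical stand-ins chosen for illustration, not the signature used in the PR; the point is only that a bare `*` forces callers to name `timecontext` and `value`.

```python
from typing import Any, Dict, Optional, Tuple

# Hypothetical stand-in for a time context: a (begin, end) pair.
TimeContext = Tuple[int, int]


def record(
    op: str, *, timecontext: Optional[TimeContext], value: Any
) -> Dict[str, Tuple[Optional[TimeContext], Any]]:
    """Hypothetical helper: everything after the bare * is keyword-only."""
    return {op: (timecontext, value)}


# Naming the arguments works.
ok = record("op", timecontext=None, value=42)

# Passing them positionally raises TypeError, so the two cannot be swapped by accident.
try:
    record("op", None, 42)  # type: ignore[misc]
except TypeError:
    positional_rejected = True
else:
    positional_rejected = False
```

With this shape, a call site that mixes up the order of timecontext and value fails immediately instead of silently storing the wrong thing.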
ibis/expr/scope.py
Outdated
| @@ -159,7 +212,7 @@ def merge_scopes( | |||
| def make_scope( | |||
| op: Node, result: Any, timecontext: Optional[TimeContext] = None | |||
| op: Node, timecontext: Optional[TimeContext], value: Any | |||
| ) -> 'Scope': | |||
| """make a Scope instance, adding (op, result, timecontext) into the | |||
I find it slightly weird that make_scope takes a Node but Scope takes a dict. Can we unify?
Yes, now with the new constructor of Scope it makes sense to kill this make_scope entirely and use the Scope constructor only. This will simplify the usage of this class.
lgtm. ping on green.
Thanks @jreback, CI is green now.
thanks @LeeTZ
What is the change
This PR proposes a change to the scope data structure in pyspark execution. Building on the implementation in #2306, this PR enables the Scope class to replace the dict implementation of scope in the pyspark backend. In addition, timecontext is added as a parameter in the pyspark backend.
Notable changes
In this PR a new API, set_value, is added to the Scope class, which makes the Scope class mutable. This is needed for the pyspark backend since the current logic caches all results in a global variable scope. All pyspark translations read from the same global scope, which is different from pandas execution.
As per our discussion in #2386, we may not want __getitem__/__setitem__ implemented, since they would add confusion in using the class. Therefore, in this PR, a set_value API, similar to our get_value API, is proposed. This API is only used in the pyspark backend for now.
How is this change tested
Tests that save, retrieve, and modify data in scope are covered in pyspark/tests/test_basic.py. This PR passes all tests.
Follow-ups
To make things clear, I will keep this PR simple and do one thing at a time. Will address these in followup PRs.
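The set_value / get_value pattern described under Notable changes can be sketched roughly as follows. This is a simplified illustration, not the ibis implementation: the `TimeContext` alias, the `ScopeItem` field layout, the use of plain strings as ops, and the exact-match rule for time contexts in `get_value` are all assumptions made for the example.

```python
from typing import Any, Dict, Hashable, Iterator, NamedTuple, Optional, Tuple

# Assumption: a time context is modelled as a (begin, end) pair.
TimeContext = Tuple[int, int]


class ScopeItem(NamedTuple):
    # Hypothetical layout: the cached value plus the time context it was computed for.
    timecontext: Optional[TimeContext]
    value: Any


class Scope:
    def __init__(self, items: Optional[Dict[Hashable, ScopeItem]] = None) -> None:
        self._items: Dict[Hashable, ScopeItem] = dict(items) if items else {}

    def __contains__(self, op: Hashable) -> bool:
        return op in self._items

    def __iter__(self) -> Iterator[Hashable]:
        return iter(self._items.keys())

    def get_value(
        self, op: Hashable, timecontext: Optional[TimeContext] = None
    ) -> Any:
        item = self._items.get(op)
        # Simplification for this sketch: only an exact time context match is a hit.
        if item is None or item.timecontext != timecontext:
            return None
        return item.value

    def set_value(
        self, op: Hashable, timecontext: Optional[TimeContext], value: Any
    ) -> None:
        # Updates the scope in place, so every reader of the shared scope sees it.
        self._items[op] = ScopeItem(timecontext=timecontext, value=value)


# One shared scope object acts as the cache for all lookups.
shared_scope = Scope()
shared_scope.set_value("table_op", (0, 10), "cached result")
hit = shared_scope.get_value("table_op", (0, 10))
miss = shared_scope.get_value("table_op", (5, 20))
```

Using explicit set_value / get_value methods instead of subscript access keeps the time context visible at every call site, which matches the reasoning given in the description.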