-
Notifications
You must be signed in to change notification settings - Fork 590
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: Fix scope get to use hashmap lookup instead of list lookup #2386
Conversation
|
I found this class to be a little confusing. It looks like we are emulating dict interface by implementing
I think clarifying these questions would be helpful to make it more clear how to use this class. |
|
Thanks, @icexelloss , So given this For @icexelloss's question about |
|
I refactored a little bit.
To make this class clearer, these are the exposed API for
|
ibis/expr/scope.py
Outdated
| @@ -122,17 +116,17 @@ def merge_scope(self, other_scope: 'Scope', overwrite=False) -> 'Scope': | |||
| """ | |||
| result = Scope() | |||
|
|
|||
| for op in self.items(): | |||
| result[op] = self[op] | |||
| for op in self._items.keys(): | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this basically what you want?
https://stackoverflow.com/questions/6354436/python-dictionary-merge-by-updating-but-not-overwriting-if-value-exists
new_items = {}
if overwrite:
new_items = dict(itertools.chain(self._items.items(), other._items.items()))
else:
new_items = dict(itertools.chain(other._items.items(), self._items.items()))
return new Scope(new_items)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is nice but to update we cannot simply do these. We should think about time context and the logic is covered in result.get_value(op, v.timecontext) is None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Ok what you have is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why don't we define
def __iter__(self):
return iter(self._items.keys())
then
for op in self:
...
will just work
or is this more confusing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine, scope is indeed iterable, I will do that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some comments
|
I think it makes sense. Emulating dict interface is nice but not necessarily IMO. |
ibis/expr/scope.py
Outdated
| def __setitem__(self, op: Node, value: Any) -> None: | ||
| self._items[op] = value | ||
| def __contains__(self, op): | ||
| return op in self._items.keys() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not just
return op in self._items
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see Jeff's comments. Ok this is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, no i agree this should be in self._items, only iteration should use the keys
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For my knowledge, is k in self._items.keys O(n) or O(1)? The type of self._items.keys is dict_keys so it's not clear to me if the __contains__ method of dict_key is linear or constant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked around and seems that in Python 3.x, k in self._items and k in self._items.keys() are equivalent and they are all O(1).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested with a Python 3.6 kernel
>>> dic = dict.fromkeys(range(10**5))
>>> %timeit 10000 in dic
35.8 ns ± 0.757 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
>>> %timeit 10000 in dic.keys()
74.7 ns ± 0.385 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
seems keys() is still 2x slower than testing in for dict directly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
ibis/expr/scope.py
Outdated
| def __setitem__(self, op: Node, value: Any) -> None: | ||
| self._items[op] = value | ||
| def __contains__(self, op): | ||
| return op in self._items.keys() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, no i agree this should be in self._items, only iteration should use the keys
ibis/expr/scope.py
Outdated
| @@ -122,17 +116,17 @@ def merge_scope(self, other_scope: 'Scope', overwrite=False) -> 'Scope': | |||
| """ | |||
| result = Scope() | |||
|
|
|||
| for op in self.items(): | |||
| result[op] = self[op] | |||
| for op in self._items.keys(): | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why don't we define
def __iter__(self):
return iter(self._items.keys())
then
for op in self:
...
will just work
or is this more confusing?
ibis/expr/scope.py
Outdated
| result._items[op] = self._items[op] | ||
|
|
||
| for op in other_scope._items.keys(): | ||
| for op in other_scope._items: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be
for op in other_scope
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is, thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add this PR number to the original scope update in the release notes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. let's merge on green.
|
Thank you all for reviewing! Green now @jreback . |
Overview
This PR aims to fix a bug in looking up an
opinscope. The current implementation is using a list lookup to look up items initer(_items). This is slower than a hashmap lookup, also it may cause bugs for objects if their equality is ill-defined.Example
In this case,
Literalis ill-defined for equality. And if we have onlyain scope.b in iter(scope._items)will beTruewhileb in scope._items` return false.Tests
This is tested in
ibis/pandas/tests/tes_core.pywith a new test casetest_scope_look_up.