Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH/DEV: Make ibis Node instances hashable #1611

Closed
wants to merge 34 commits into from

Conversation

cpcloud
Copy link
Member

@cpcloud cpcloud commented Sep 5, 2018

Closes #890

@cpcloud cpcloud added this to the Next Release milestone Sep 5, 2018
@cpcloud cpcloud added feature Features or general enhancements expressions Issues or PRs related to the expression API labels Sep 5, 2018
return False

this_exprs = self._all_exprs()
other_exprs = other._all_exprs()

if self.limit != other.limit:
cache[(self, other)] = False
cache[self, other] = False
return False

for x, y in zip(this_exprs, other_exprs):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to check the length of this_exprs and other_exprs otherwise we'll hit the same bug we have for equality.

return False

if len(self.args) != len(other.args):
cache[(self, other)] = False
cache[self, other] = False
return False

for left, right in zip(self.args, other.args):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could use simply:

all_equal(self.args, other.args, cache=cache)

But the length of the arguments needs to be checked here as well, because map zips its arguments.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The equality PR will address this length issue. I will refactor this and the code near your other comment to use all_equal once that is PR is merged.

ibis/expr/visualize.py Outdated Show resolved Hide resolved

cache[(self, other)] = True
return True
cache[self, other] = len(this_exprs) == len(other_exprs) and all(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use all_equals instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will address after #1600.

Copy link
Member

@kszucs kszucs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code got much much cleaner and better!

We were incorrectly using weakref.WeakValueDictionary to hold temp table
instances that mapped to themselves.

This defeats the purpose of using a weakref container because the
WeakValueDictionary will hold a strong reference to the key and weak
reference to the value. In this scenario we will forever hold at least
one strong reference per key and leak memory.

By using weakref.WeakSet we now only weak references on the table
instance.
@cpcloud
Copy link
Member Author

cpcloud commented Sep 7, 2018

@kszucs arg it looks like there was a backwards incompatible change in nbconvert from 5.3.1 to 5.4.0 :(

@cpcloud
Copy link
Member Author

cpcloud commented Sep 10, 2018

@kszucs Can you review this again?

@kszucs
Copy link
Member

kszucs commented Sep 10, 2018

Sure, reviewing now

substitutor = Substitutor()
return substitutor.substitute(expr, mapping)
"""Substitute subexpressions in `expr` with expression to expression
mapping `substitutions`.
Copy link
Member

@kszucs kszucs Sep 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got sweaty. Surprisingly rhythmical and symmetric sentence :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, now that you've pointed it out it's making my head hurt.

@cpcloud
Copy link
Member Author

cpcloud commented Sep 10, 2018

There's a large performance regression introduced by this PR that I'm looking into.

ibis/util.py Outdated
@@ -291,3 +249,36 @@ def get_logger(name, level=None, format=None, propagate=False):
logging, os.environ.get('LOGLEVEL', 'WARNING').upper()))
logger.addHandler(handler)
return logger


def safe_get_name(expr):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO these would be nicer as Expr methods.

----------
expr : ibis.expr.types.Expr
An Ibis expression
substitutions : Mapping[ibis.expr.types.Expr, ibis.expr.types.Expr]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the dict comprehension below, substitutions seems like a list of 2-tuples.

@@ -61,38 +60,38 @@ def _substitute(self, expr, mapping):
Parameters
----------
expr : ibis.expr.types.Expr
mapping : Dict, OrderedDict
mapping : Mapping[ibis.expr.types.Expr, ibis.expr.types.Expr]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mapping[ibis.expr.types.Node, ibis.expr.types.Expr]?

unchanged = True
for i, arg in enumerate(new_args):
if isinstance(arg, ir.Expr):
new_arg = self.substitute(arg, mapping)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This recursion seems like a task for lin.traverse. Issue?

self._hash = hash(
(type(self),) + tuple(
element.op() if isinstance(element, ir.Expr) else element
for element in self.flat_args()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should get rid of flat_args and prevent having any non-expressions in the hierarchy, except the literal leaves.

@cpcloud
Copy link
Member Author

cpcloud commented Sep 10, 2018

Ok, figured out what perf problem is: I wasn't short circuiting with self is other in Node.equals.

@cpcloud
Copy link
Member Author

cpcloud commented Sep 10, 2018

@kszucs Fixed the docs, and added those functions as private methods.

@cpcloud cpcloud closed this in 2cfd385 Sep 10, 2018
@cpcloud cpcloud deleted the hashable-ops branch September 10, 2018 21:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
expressions Issues or PRs related to the expression API feature Features or general enhancements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants