New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Hail] Python common subexpression elimination #7009
Conversation
Doesn’t handle agg environments correctly.
commit 07e37a3 Author: patrick-schultz <pschultz@broadinstitute.org> Date: Thu Sep 5 15:10:30 2019 -0400 fixes commit f6b74c5 Merge: 6320d30 baa5251 Author: patrick-schultz <pschultz@broadinstitute.org> Date: Wed Sep 4 15:28:04 2019 -0400 Merge branch 'tim-agglet-fix' into python-cse commit 6320d30 Author: patrick-schultz <pschultz@broadinstitute.org> Date: Wed Sep 4 15:24:51 2019 -0400 put back old renderer commit 367819d Author: patrick-schultz <pschultz@broadinstitute.org> Date: Wed Sep 4 11:00:55 2019 -0400 enable AggLets, other refactoring commit baa5251 Author: Tim Poterba <tpoterba@gmail.com> Date: Fri Aug 30 07:20:42 2019 -0400 bump! commit e3be663 Author: Tim Poterba <tpoterba@gmail.com> Date: Fri Aug 30 06:34:45 2019 -0400 fix let lifting bug commit 0e54c39 Author: Tim Poterba <tpoterba@gmail.com> Date: Thu Aug 29 10:24:15 2019 -0400 [hail] Fix performance problems in aggregators by binding arguments commit 2786aec Author: patrick-schultz <pschultz@broadinstitute.org> Date: Wed Aug 28 14:10:39 2019 -0400 use namedtuple and __slots__ commit 92c1f6b Author: patrick-schultz <pschultz@broadinstitute.org> Date: Wed Aug 28 13:34:20 2019 -0400 refactoring commit 3842f38 Author: patrick-schultz <pschultz@broadinstitute.org> Date: Wed Aug 28 11:53:02 2019 -0400 refactoring commit f64ac3e Author: patrick-schultz <pschultz@broadinstitute.org> Date: Mon Aug 26 14:08:21 2019 -0400 refactoring commit 636aa5e Author: patrick-schultz <pschultz@broadinstitute.org> Date: Mon Aug 26 10:51:52 2019 -0400 refactoring, documenting commit 3d3671f Author: patrick-schultz <pschultz@broadinstitute.org> Date: Thu Aug 22 15:35:42 2019 -0400 refactoring commit 6aa0ba1 Author: patrick-schultz <pschultz@broadinstitute.org> Date: Thu Aug 22 10:10:54 2019 -0400 refactoring commit 427d8c7 Author: patrick-schultz <pschultz@broadinstitute.org> Date: Wed Aug 14 12:50:30 2019 -0400 wip commit fb6e986 Author: patrick-schultz <pschultz@broadinstitute.org> Date: Wed Aug 14 10:43:19 2019 -0400 bugfixes commit 1cccb05 Author: patrick-schultz <pschultz@broadinstitute.org> Date: Tue Aug 13 10:05:40 2019 -0400 refactoring, bugfixes commit d2df592 Author: patrick-schultz <pschultz@broadinstitute.org> Date: Mon Aug 12 11:46:21 2019 -0400 refactoring commit 6ca6b23 Author: patrick-schultz <pschultz@broadinstitute.org> Date: Mon Aug 12 10:48:31 2019 -0400 linkedlist -> stack commit 50ff007 Author: patrick-schultz <pschultz@broadinstitute.org> Date: Fri Aug 9 16:18:55 2019 -0400 recursion -> loop commit 55c32ba Author: patrick-schultz <pschultz@broadinstitute.org> Date: Fri Aug 9 15:34:17 2019 -0400 defunctionalize commit 3db71db Author: patrick-schultz <pschultz@broadinstitute.org> Date: Thu Aug 8 15:05:37 2019 -0400 wip commit d196b2e Author: patrick-schultz <pschultz@broadinstitute.org> Date: Thu Aug 8 14:34:15 2019 -0400 inline print commit 16eb524 Author: patrick-schultz <pschultz@broadinstitute.org> Date: Thu Aug 8 11:36:28 2019 -0400 inline print_renderable commit 8460050 Author: patrick-schultz <pschultz@broadinstitute.org> Date: Thu Aug 8 11:23:40 2019 -0400 factor out args into state object commit f10306e Author: patrick-schultz <pschultz@broadinstitute.org> Date: Thu Aug 8 10:50:38 2019 -0400 CPS commit 654f219 Author: patrick-schultz <pschultz@broadinstitute.org> Date: Wed Aug 7 13:04:47 2019 -0400 second pass: loop -> recursion commit ed6ce81 Author: patrick-schultz <pschultz@broadinstitute.org> Date: Wed Aug 7 09:56:43 2019 -0400 more refactoring commit e12b7e5 Author: patrick-schultz <pschultz@broadinstitute.org> Date: Wed Aug 7 08:50:49 2019 -0400 some refactoring commit e0c76ad Author: patrick-schultz <pschultz@broadinstitute.org> Date: Fri Aug 2 15:11:21 2019 -0400 cleaning up commit 144a1bd Author: patrick-schultz <pschultz@broadinstitute.org> Date: Thu Aug 1 16:40:53 2019 -0400 recursion -> loop commit 28f23e3 Author: patrick-schultz <pschultz@broadinstitute.org> Date: Thu Aug 1 16:28:47 2019 -0400 defunctionalization commit 6619b17 Author: patrick-schultz <pschultz@broadinstitute.org> Date: Thu Aug 1 15:12:36 2019 -0400 inline recur commit c2e9f92 Author: patrick-schultz <pschultz@broadinstitute.org> Date: Thu Aug 1 15:02:48 2019 -0400 CPS transform commit 9e788d3 Author: patrick-schultz <pschultz@broadinstitute.org> Date: Thu Aug 1 14:29:51 2019 -0400 loop -> recursion commit 3c392e6 Author: patrick-schultz <pschultz@broadinstitute.org> Date: Thu Aug 1 14:04:06 2019 -0400 finally fully working? commit ed133da Author: patrick-schultz <pschultz@broadinstitute.org> Date: Wed Jul 31 14:40:18 2019 -0400 some refactoring commit 689ca40 Author: patrick-schultz <pschultz@broadinstitute.org> Date: Mon Jul 29 14:57:16 2019 -0400 works with agg bindings commit 62e5680 Author: patrick-schultz <pschultz@broadinstitute.org> Date: Wed Jul 24 11:51:06 2019 -0400 New CSE IR renrerer in Python Doesn’t handle agg environments correctly.
# Conflicts: # hail/python/hail/ir/base_ir.py # hail/python/hail/ir/ir.py
# Conflicts: # hail/python/hail/ir/base_ir.py # hail/python/hail/ir/ir.py # hail/python/hail/ir/matrix_ir.py # hail/python/hail/ir/table_ir.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is quite nice! I have one question about the lets
while I digest for the rest of the morning. I want understand the control flow in the loop better, and that will take a little time.
else: | ||
lets = None | ||
|
||
if lets is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems possible for lets
to not be defined here, is that true?
If we hit the else
on 221, and neither of the if
s on 222 or 226, lets
will be a name error here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The lets = None
is the else clause of the for loop on 214, which is executed if we get through the for loop without triggering a break. All of the breaks follow setting lets
, and if none of those happen, we set lets = None
in the else clause.
hail/python/test/hail/test_ir.py
Outdated
' (ApplyBinaryPrimOp `+`' | ||
' (Ref __cse_1)' | ||
' (Ref __cse_1)))') | ||
self.assertEqual(expected, CSERenderer()(x)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a slight preference for pytest assertions: assert expected == CSERenderer()(x)
They're visually cleaner and I think they generate slightly better errors
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I didn't realize there were different assertions!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few more things
hail/python/hail/ir/renderer.py
Outdated
self._child_bindings = None | ||
|
||
def has_lifted_lets(self) -> bool: | ||
return self.lifted_lets or self.agg_lifted_lets or self.scan_lifted_lets |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this doesn't return a boolean - it returns a dictionary, which does actually work using python truthiness in the only place it's actually used.
This code is equivalent to:
def has_lifted_lets(self) -> bool:
if self.lifted_lets:
return self.lifted_lets
elif self.agg_lifted_lets:
return self.agg_lifted_lets
return self.scan_lifted_lets
hail/python/hail/ir/renderer.py
Outdated
let_bodies=[]) | ||
|
||
|
||
class CSERenderer: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renderer should be an abstract base class, with [Stupid]Renderer and CSERenderer extending it. Currently the design works because you define the right methods, but the types don't work out.
hail/python/hail/ir/renderer.py
Outdated
def __init__(self, | ||
node: Renderable, | ||
children: Sequence[Renderable], | ||
builder: Sequence[str], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be a List, not Sequence. You use append
all over the place, and that's a List function, not part of the Sequence interfae.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah right. Though it looks like the right abc is MutableSequence.
hail/python/hail/ir/renderer.py
Outdated
# parent's 'builder', to save copying. | ||
self.builder = builder | ||
|
||
def add_lets(self, let_bodies: Sequence[str], out_builder: Sequence[str]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
builder should be List (as above)
hail/python/hail/ir/renderer.py
Outdated
def make(node: Renderable, | ||
renderer: 'CSERenderer', | ||
binding_sites: Dict[int, BindingSite], | ||
builder: Sequence[str], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also lists here
will look again with high prio when tests are passing |
Ugh, I fixed that and thought I had pushed it. |
OK, thanks! |
I made |
Looks great! ship it! |
Stacked on #7000.
Adds a new IR renderer in Python which integrates a CSE pass.
It would be easy to argue that a CSE pass should be separate from the renderer. But we can't easily make the Python IR mutable, because a given IR tree might be used in multiple larger IR (which is exactly what this pass is taking advantage of!) so mutation which depends on the larger context won't work.
So rather than rebuild the entire IR every time we print, I decided for now this is best integrated into the renderer.
I think longer term this should be ported to scala as a full CSE pass (which first does hash-consing/value-numbering to find all repeated subexpressions).
This is not a simple algorithm, but I did my best to make it understandable. If anything feels harder to follow than it should be, I'd like to try to improve it.