[Hail] Python common subexpression elimination #7009

Merged
merged 50 commits into hail-is:master from patrick-schultz:python-cse Sep 17, 2019

@patrick-schultz
Collaborator

commented Sep 6, 2019

Stacked on #7000.

Adds a new IR renderer in Python which integrates a CSE pass.

It would be easy to argue that a CSE pass should be separate from the renderer. But we can't easily make the Python IR mutable, because a given IR tree might be used in multiple larger IRs (which is exactly what this pass takes advantage of!), so mutation that depends on the larger context won't work.

So rather than rebuild the entire IR every time we print, I decided for now this is best integrated into the renderer.

I think longer term this should be ported to Scala as a full CSE pass (which first does hash-consing/value-numbering to find all repeated subexpressions).

This is not a simple algorithm, but I did my best to make it understandable. If anything feels harder to follow than it should be, I'd like to try to improve it.
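
To illustrate the general idea, here is a minimal sketch of CSE-while-rendering on a toy expression tree. It is not the renderer from this PR: `Node`, `render_with_cse`, and the output grammar are assumptions made for illustration (the `__cse_N` names loosely mirror the `__cse_1` seen in the test expectations further down). It finds repeats by structural equality and binds every repeated subtree at the root, rather than computing proper binding sites or handling agg/scan environments.

```python
from collections import Counter
from typing import Dict, List, Tuple


class Node:
    """A toy immutable IR node: an operator name plus child nodes."""

    def __init__(self, op: str, *children: 'Node'):
        self.op = op
        self.children = children

    def key(self) -> Tuple:
        # Structural key, so that equal subtrees map to the same entry.
        return (self.op,) + tuple(c.key() for c in self.children)


def render_with_cse(root: Node) -> str:
    # Pass 1: count how often each structurally distinct subtree occurs.
    counts: Counter = Counter()

    def count(n: Node) -> None:
        counts[n.key()] += 1
        for c in n.children:
            count(c)

    count(root)

    # Pass 2: render, replacing repeated non-leaf subtrees with references.
    names: Dict[Tuple, str] = {}
    bindings: List[Tuple[str, str]] = []

    def render(n: Node) -> str:
        k = n.key()
        if k in names:
            return f'(Ref {names[k]})'
        if n.children:
            body = '(' + n.op + ' ' + ' '.join(render(c) for c in n.children) + ')'
        else:
            body = '(' + n.op + ')'
        if n.children and counts[k] > 1:
            name = f'__cse_{len(names) + 1}'
            names[k] = name
            bindings.append((name, body))
            return f'(Ref {name})'
        return body

    out = render(root)
    # Wrap the result in Lets. Bindings were recorded innermost-first, so
    # wrapping in reverse order puts earlier names in scope for later ones.
    for name, value in reversed(bindings):
        out = f'(Let {name} {value} {out})'
    return out
```

For example, rendering a tree in which the same `Add` subtree appears twice under a `Mul` produces a single `Let` that binds the `Add` once and two `(Ref __cse_1)` uses in the body.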

commit 07e37a3
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Thu Sep 5 15:10:30 2019 -0400

    fixes

commit f6b74c5
Merge: 6320d30 baa5251
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Wed Sep 4 15:28:04 2019 -0400

    Merge branch 'tim-agglet-fix' into python-cse

commit 6320d30
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Wed Sep 4 15:24:51 2019 -0400

    put back old renderer

commit 367819d
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Wed Sep 4 11:00:55 2019 -0400

    enable AggLets, other refactoring

commit baa5251
Author: Tim Poterba <tpoterba@gmail.com>
Date:   Fri Aug 30 07:20:42 2019 -0400

    bump!

commit e3be663
Author: Tim Poterba <tpoterba@gmail.com>
Date:   Fri Aug 30 06:34:45 2019 -0400

    fix let lifting bug

commit 0e54c39
Author: Tim Poterba <tpoterba@gmail.com>
Date:   Thu Aug 29 10:24:15 2019 -0400

    [hail] Fix performance problems in aggregators by binding arguments

commit 2786aec
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Wed Aug 28 14:10:39 2019 -0400

    use namedtuple and __slots__

commit 92c1f6b
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Wed Aug 28 13:34:20 2019 -0400

    refactoring

commit 3842f38
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Wed Aug 28 11:53:02 2019 -0400

    refactoring

commit f64ac3e
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Mon Aug 26 14:08:21 2019 -0400

    refactoring

commit 636aa5e
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Mon Aug 26 10:51:52 2019 -0400

    refactoring, documenting

commit 3d3671f
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Thu Aug 22 15:35:42 2019 -0400

    refactoring

commit 6aa0ba1
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Thu Aug 22 10:10:54 2019 -0400

    refactoring

commit 427d8c7
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Wed Aug 14 12:50:30 2019 -0400

    wip

commit fb6e986
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Wed Aug 14 10:43:19 2019 -0400

    bugfixes

commit 1cccb05
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Tue Aug 13 10:05:40 2019 -0400

    refactoring, bugfixes

commit d2df592
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Mon Aug 12 11:46:21 2019 -0400

    refactoring

commit 6ca6b23
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Mon Aug 12 10:48:31 2019 -0400

    linkedlist -> stack

commit 50ff007
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Fri Aug 9 16:18:55 2019 -0400

    recursion -> loop

commit 55c32ba
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Fri Aug 9 15:34:17 2019 -0400

    defunctionalize

commit 3db71db
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Thu Aug 8 15:05:37 2019 -0400

    wip

commit d196b2e
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Thu Aug 8 14:34:15 2019 -0400

    inline print

commit 16eb524
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Thu Aug 8 11:36:28 2019 -0400

    inline print_renderable

commit 8460050
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Thu Aug 8 11:23:40 2019 -0400

    factor out args into state object

commit f10306e
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Thu Aug 8 10:50:38 2019 -0400

    CPS

commit 654f219
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Wed Aug 7 13:04:47 2019 -0400

    second pass: loop -> recursion

commit ed6ce81
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Wed Aug 7 09:56:43 2019 -0400

    more refactoring

commit e12b7e5
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Wed Aug 7 08:50:49 2019 -0400

    some refactoring

commit e0c76ad
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Fri Aug 2 15:11:21 2019 -0400

    cleaning up

commit 144a1bd
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Thu Aug 1 16:40:53 2019 -0400

    recursion -> loop

commit 28f23e3
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Thu Aug 1 16:28:47 2019 -0400

    defunctionalization

commit 6619b17
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Thu Aug 1 15:12:36 2019 -0400

    inline recur

commit c2e9f92
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Thu Aug 1 15:02:48 2019 -0400

    CPS transform

commit 9e788d3
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Thu Aug 1 14:29:51 2019 -0400

    loop -> recursion

commit 3c392e6
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Thu Aug 1 14:04:06 2019 -0400

    finally fully working?

commit ed133da
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Wed Jul 31 14:40:18 2019 -0400

    some refactoring

commit 689ca40
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Mon Jul 29 14:57:16 2019 -0400

    works with agg bindings

commit 62e5680
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Wed Jul 24 11:51:06 2019 -0400

    New CSE IR renderer in Python

    Doesn’t handle agg environments correctly.
# Conflicts:
#	hail/python/hail/ir/base_ir.py
#	hail/python/hail/ir/ir.py
@patrick-schultz patrick-schultz changed the title Python common subexpression elimination [Hail] Python common subexpression elimination Sep 6, 2019
@tpoterba tpoterba added stacked PR and removed stacked PR labels Sep 9, 2019
# Conflicts:
#	hail/python/hail/ir/base_ir.py
#	hail/python/hail/ir/ir.py
#	hail/python/hail/ir/matrix_ir.py
#	hail/python/hail/ir/table_ir.py
fix
Collaborator

left a comment

This is quite nice! I have one question about the lets while I digest the rest this morning. I want to understand the control flow in the loop better, and that will take a little time.

else:
lets = None

if lets is not None:

@tpoterba

tpoterba Sep 11, 2019

Collaborator

it seems possible for lets to not be defined here, is that true?

If we hit the else on 221, and neither of the ifs on 222 or 226, lets will be a name error here.

@patrick-schultz

patrick-schultz Sep 11, 2019

Author Collaborator

The lets = None is the else clause of the for loop on 214, which is executed if we get through the for loop without triggering a break. All of the breaks follow setting lets, and if none of those happen, we set lets = None in the else clause.
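
The `for ... else` behaviour described in this reply can be shown with a small standalone sketch. The function and variable names are hypothetical and unrelated to the PR's code; only the control flow is the point.

```python
from typing import List, Optional


def first_negative(values: List[int]) -> Optional[int]:
    for v in values:
        if v < 0:
            # Every `break` is preceded by an assignment to `found`.
            found = v
            break
    else:
        # Runs only when the loop finished without hitting `break`,
        # including when `values` is empty.
        found = None
    # `found` is therefore always bound by the time we get here.
    return found
```

Because the `else` belongs to the `for` and not to an `if`, the name is assigned on every path, which is why no `NameError` can occur after the loop.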

' (ApplyBinaryPrimOp `+`'
' (Ref __cse_1)'
' (Ref __cse_1)))')
self.assertEqual(expected, CSERenderer()(x))

@tpoterba

tpoterba Sep 11, 2019

Collaborator

I have a slight preference for pytest assertions: assert expected == CSERenderer()(x)

They're visually cleaner and I think they generate slightly better errors

@patrick-schultz

patrick-schultz Sep 11, 2019

Author Collaborator

Ah, I didn't realize there were different assertions!
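
As a rough side-by-side of the two assertion styles discussed above, here is a small sketch. `render_stub` is a hypothetical stand-in for `CSERenderer()(x)`, and the test names are made up.

```python
import unittest


def render_stub() -> str:
    # Hypothetical stand-in for `CSERenderer()(x)`.
    return '(Ref __cse_1)'


class UnittestStyle(unittest.TestCase):
    def test_render(self):
        # unittest style: a method on TestCase does the comparison.
        self.assertEqual('(Ref __cse_1)', render_stub())


def test_render_pytest_style():
    # pytest style: a bare assert. pytest rewrites assert statements so
    # that a failure report shows the values of both operands.
    assert '(Ref __cse_1)' == render_stub()
```

pytest collects and runs both forms, so the two styles can coexist in one test module during a gradual switch.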

Collaborator

left a comment

A few more things

self._child_bindings = None

def has_lifted_lets(self) -> bool:
    return self.lifted_lets or self.agg_lifted_lets or self.scan_lifted_lets

@tpoterba

tpoterba Sep 11, 2019

Collaborator

this doesn't return a boolean - it returns a dictionary, which does actually work using python truthiness in the only place it's actually used.

This code is equivalent to:

def has_lifted_lets(self) -> bool:
    if self.lifted_lets:
        return self.lifted_lets
    elif self.agg_lifted_lets:
        return self.agg_lifted_lets
    return self.scan_lifted_lets
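
One possible way to make the return value match the annotation is to wrap the expression in `bool(...)`. The sketch below is an assumption, not necessarily the fix that was applied in this PR: `BindingSiteSketch` is a hypothetical stand-in class, and only the three attribute names come from the snippet under review.

```python
from typing import Dict


class BindingSiteSketch:
    """Hypothetical stand-in holding the three dictionaries from the review."""

    def __init__(self) -> None:
        self.lifted_lets: Dict[int, str] = {}
        self.agg_lifted_lets: Dict[int, str] = {}
        self.scan_lifted_lets: Dict[int, str] = {}

    def has_lifted_lets_original(self):
        # `or` returns one of its operands, so this yields a dict.
        return self.lifted_lets or self.agg_lifted_lets or self.scan_lifted_lets

    def has_lifted_lets(self) -> bool:
        # bool(...) collapses the chosen operand to True or False.
        return bool(self.lifted_lets or self.agg_lifted_lets or self.scan_lifted_lets)
```

Callers that only test the result for truthiness behave the same with either version; the difference matters for type checking and for any caller that compares the result with `is True` or `is False`.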
let_bodies=[])


class CSERenderer:

@tpoterba

tpoterba Sep 11, 2019

Collaborator

Renderer should be an abstract base class, with [Stupid]Renderer and CSERenderer extending it. Currently the design works because you define the right methods, but the types don't work out.
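
A minimal sketch of the suggested hierarchy, under assumptions: the `Renderable` stand-in, the `PlainRenderer` name, and both method bodies are placeholders. The `__call__` entry point is inferred from the `CSERenderer()(x)` usage in the tests; the real classes may expose a different interface.

```python
import abc


class Renderable:
    """Hypothetical stand-in for the IR node type being rendered."""

    def __init__(self, text: str):
        self.text = text


class Renderer(abc.ABC):
    @abc.abstractmethod
    def __call__(self, x: Renderable) -> str:
        """Render `x` to its string form."""


class PlainRenderer(Renderer):
    def __call__(self, x: Renderable) -> str:
        # Placeholder body: a real renderer would walk the IR.
        return x.text


class CSERendererSketch(Renderer):
    def __call__(self, x: Renderable) -> str:
        # Placeholder body: a real implementation would insert lets here.
        return x.text
```

With a shared base class, code that accepts any renderer can be annotated as taking a `Renderer`, and a subclass that forgets to define `__call__` fails at instantiation with a `TypeError`.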

def __init__(self,
node: Renderable,
children: Sequence[Renderable],
builder: Sequence[str],

@tpoterba

tpoterba Sep 11, 2019

Collaborator

Should be a List, not Sequence. You use append all over the place, and that's a List function, not part of the Sequence interface.

@patrick-schultz

patrick-schultz Sep 11, 2019

Author Collaborator

Ah right. Though it looks like the right abc is MutableSequence.
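
The distinction between the three annotations can be checked directly. This sketch uses a hypothetical `emit` helper; `Sequence` and `MutableSequence` are the standard `typing` / `collections.abc` names.

```python
import collections.abc
from typing import MutableSequence


def emit(builder: MutableSequence[str], token: str) -> None:
    # `append` is part of the MutableSequence interface, not Sequence.
    builder.append(token)


# A tuple is a Sequence but not a MutableSequence: it has no `append`.
tuple_is_sequence = isinstance((), collections.abc.Sequence)
tuple_is_mutable = isinstance((), collections.abc.MutableSequence)

# A list satisfies both, so it can be passed wherever either is expected.
list_is_mutable = isinstance([], collections.abc.MutableSequence)
```

Annotating with `MutableSequence[str]` accepts lists and other append-capable sequences, while `List[str]` restricts callers to `list` itself; either one makes the `append` calls type-check.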

# parent's 'builder', to save copying.
self.builder = builder

def add_lets(self, let_bodies: Sequence[str], out_builder: Sequence[str]):

@tpoterba

tpoterba Sep 11, 2019

Collaborator

builder should be List (as above)

def make(node: Renderable,
renderer: 'CSERenderer',
binding_sites: Dict[int, BindingSite],
builder: Sequence[str],

@tpoterba

tpoterba Sep 11, 2019

Collaborator

also lists here

@tpoterba tpoterba dismissed their stale review Sep 11, 2019

err, got test failures

@tpoterba

Collaborator

commented Sep 13, 2019

will look again with high prio when tests are passing

fix
@patrick-schultz

Collaborator Author

commented Sep 13, 2019

Ugh, I fixed that and thought I had pushed it.

@tpoterba

Collaborator

commented Sep 13, 2019

OK, thanks!

@patrick-schultz

Collaborator Author

commented Sep 13, 2019

I made define_function go through the CSE path, and it broke because CSE was assuming an IR with no free variables. I fixed CSE to take a list of free variables.

@tpoterba

Collaborator

commented Sep 17, 2019

Looks great! ship it!

@danking danking merged commit 5932898 into hail-is:master Sep 17, 2019
1 check passed
ci-test success
@patrick-schultz patrick-schultz deleted the patrick-schultz:python-cse branch Sep 17, 2019