Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Hail] Python common subexpression elimination #7009

Merged
merged 50 commits into from Sep 17, 2019

Conversation

patrick-schultz
Copy link
Collaborator

Stacked on #7000.

Adds a new IR renderer in Python which integrates a CSE pass.

It would be easy to argue that a CSE pass should be separate from the renderer. But we can't easily make the Python IR mutable, because a given IR tree might be used in multiple larger IR (which is exactly what this pass is taking advantage of!) so mutation which depends on the larger context won't work.

So rather than rebuild the entire IR every time we print, I decided for now this is best integrated into the renderer.

I think longer term this should be ported to scala as a full CSE pass (which first does hash-consing/value-numbering to find all repeated subexpressions).

This is not a simple algorithm, but I did my best to make it understandable. If anything feels harder to follow than it should be, I'd like to try to improve it.

commit 07e37a3
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Thu Sep 5 15:10:30 2019 -0400

    fixes

commit f6b74c5
Merge: 6320d30 baa5251
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Wed Sep 4 15:28:04 2019 -0400

    Merge branch 'tim-agglet-fix' into python-cse

commit 6320d30
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Wed Sep 4 15:24:51 2019 -0400

    put back old renderer

commit 367819d
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Wed Sep 4 11:00:55 2019 -0400

    enable AggLets, other refactoring

commit baa5251
Author: Tim Poterba <tpoterba@gmail.com>
Date:   Fri Aug 30 07:20:42 2019 -0400

    bump!

commit e3be663
Author: Tim Poterba <tpoterba@gmail.com>
Date:   Fri Aug 30 06:34:45 2019 -0400

    fix let lifting bug

commit 0e54c39
Author: Tim Poterba <tpoterba@gmail.com>
Date:   Thu Aug 29 10:24:15 2019 -0400

    [hail] Fix performance problems in aggregators by binding arguments

commit 2786aec
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Wed Aug 28 14:10:39 2019 -0400

    use namedtuple and __slots__

commit 92c1f6b
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Wed Aug 28 13:34:20 2019 -0400

    refactoring

commit 3842f38
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Wed Aug 28 11:53:02 2019 -0400

    refactoring

commit f64ac3e
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Mon Aug 26 14:08:21 2019 -0400

    refactoring

commit 636aa5e
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Mon Aug 26 10:51:52 2019 -0400

    refactoring, documenting

commit 3d3671f
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Thu Aug 22 15:35:42 2019 -0400

    refactoring

commit 6aa0ba1
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Thu Aug 22 10:10:54 2019 -0400

    refactoring

commit 427d8c7
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Wed Aug 14 12:50:30 2019 -0400

    wip

commit fb6e986
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Wed Aug 14 10:43:19 2019 -0400

    bugfixes

commit 1cccb05
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Tue Aug 13 10:05:40 2019 -0400

    refactoring, bugfixes

commit d2df592
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Mon Aug 12 11:46:21 2019 -0400

    refactoring

commit 6ca6b23
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Mon Aug 12 10:48:31 2019 -0400

    linkedlist -> stack

commit 50ff007
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Fri Aug 9 16:18:55 2019 -0400

    recursion -> loop

commit 55c32ba
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Fri Aug 9 15:34:17 2019 -0400

    defunctionalize

commit 3db71db
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Thu Aug 8 15:05:37 2019 -0400

    wip

commit d196b2e
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Thu Aug 8 14:34:15 2019 -0400

    inline print

commit 16eb524
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Thu Aug 8 11:36:28 2019 -0400

    inline print_renderable

commit 8460050
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Thu Aug 8 11:23:40 2019 -0400

    factor out args into state object

commit f10306e
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Thu Aug 8 10:50:38 2019 -0400

    CPS

commit 654f219
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Wed Aug 7 13:04:47 2019 -0400

    second pass: loop -> recursion

commit ed6ce81
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Wed Aug 7 09:56:43 2019 -0400

    more refactoring

commit e12b7e5
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Wed Aug 7 08:50:49 2019 -0400

    some refactoring

commit e0c76ad
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Fri Aug 2 15:11:21 2019 -0400

    cleaning up

commit 144a1bd
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Thu Aug 1 16:40:53 2019 -0400

    recursion -> loop

commit 28f23e3
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Thu Aug 1 16:28:47 2019 -0400

    defunctionalization

commit 6619b17
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Thu Aug 1 15:12:36 2019 -0400

    inline recur

commit c2e9f92
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Thu Aug 1 15:02:48 2019 -0400

    CPS transform

commit 9e788d3
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Thu Aug 1 14:29:51 2019 -0400

    loop -> recursion

commit 3c392e6
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Thu Aug 1 14:04:06 2019 -0400

    finally fully working?

commit ed133da
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Wed Jul 31 14:40:18 2019 -0400

    some refactoring

commit 689ca40
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Mon Jul 29 14:57:16 2019 -0400

    works with agg bindings

commit 62e5680
Author: patrick-schultz <pschultz@broadinstitute.org>
Date:   Wed Jul 24 11:51:06 2019 -0400

    New CSE IR renrerer in Python

    Doesn’t handle agg environments correctly.
# Conflicts:
#	hail/python/hail/ir/base_ir.py
#	hail/python/hail/ir/ir.py
@patrick-schultz patrick-schultz changed the title Python common subexpression elimination [Hail] Python common subexpression elimination Sep 6, 2019
# Conflicts:
#	hail/python/hail/ir/base_ir.py
#	hail/python/hail/ir/ir.py
#	hail/python/hail/ir/matrix_ir.py
#	hail/python/hail/ir/table_ir.py
Copy link
Contributor

@tpoterba tpoterba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite nice! I have one question about the lets while I digest for the rest of the morning. I want understand the control flow in the loop better, and that will take a little time.

else:
lets = None

if lets is not None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems possible for lets to not be defined here, is that true?

If we hit the else on 221, and neither of the ifs on 222 or 226, lets will be a name error here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lets = None is the else clause of the for loop on 214, which is executed if we get through the for loop without triggering a break. All of the breaks follow setting lets, and if none of those happen, we set lets = None in the else clause.

' (ApplyBinaryPrimOp `+`'
' (Ref __cse_1)'
' (Ref __cse_1)))')
self.assertEqual(expected, CSERenderer()(x))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a slight preference for pytest assertions: assert expected == CSERenderer()(x)

They're visually cleaner and I think they generate slightly better errors

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I didn't realize there were different assertions!

Copy link
Contributor

@tpoterba tpoterba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more things

self._child_bindings = None

def has_lifted_lets(self) -> bool:
return self.lifted_lets or self.agg_lifted_lets or self.scan_lifted_lets
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this doesn't return a boolean - it returns a dictionary, which does actually work using python truthiness in the only place it's actually used.

This code is equivalent to:

def has_lifted_lets(self) -> bool:
	if self.lifted_lets:
       return self.lifted_lets
    elif self.agg_lifted_lets:
        return self.agg_lifted_lets
    return self.scan_lifted_lets

let_bodies=[])


class CSERenderer:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renderer should be an abstract base class, with [Stupid]Renderer and CSERenderer extending it. Currently the design works because you define the right methods, but the types don't work out.

def __init__(self,
node: Renderable,
children: Sequence[Renderable],
builder: Sequence[str],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be a List, not Sequence. You use append all over the place, and that's a List function, not part of the Sequence interfae.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right. Though it looks like the right abc is MutableSequence.

# parent's 'builder', to save copying.
self.builder = builder

def add_lets(self, let_bodies: Sequence[str], out_builder: Sequence[str]):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

builder should be List (as above)

def make(node: Renderable,
renderer: 'CSERenderer',
binding_sites: Dict[int, BindingSite],
builder: Sequence[str],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also lists here

tpoterba
tpoterba previously approved these changes Sep 11, 2019
@tpoterba tpoterba dismissed their stale review September 11, 2019 20:51

err, got test failures

@tpoterba
Copy link
Contributor

will look again with high prio when tests are passing

@patrick-schultz
Copy link
Collaborator Author

Ugh, I fixed that and thought I had pushed it.

@tpoterba
Copy link
Contributor

OK, thanks!

@patrick-schultz
Copy link
Collaborator Author

I made define_function go through the CSE path, and it broke because CSE was assuming an IR with no free variables. I fixed CSE to take a list of free variables.

@tpoterba
Copy link
Contributor

Looks great! ship it!

@danking danking merged commit 5932898 into hail-is:master Sep 17, 2019
@patrick-schultz patrick-schultz deleted the python-cse branch September 17, 2019 13:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants