Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Hail] Fix several CSE bugs #7479

Merged
merged 12 commits into from Nov 14, 2019

Conversation

@patrick-schultz
Copy link
Collaborator

patrick-schultz commented Nov 7, 2019

fixes #7418

The main change here is fixing #7418 by making nodes which perform aggregation aware of the nearest containing node which defines what is being aggregated over. This is done by making a new variable name---here called "agg_capability"---which is added to the child context by any node which (re)defines the meaning of aggregation (TableMapRows, AggFilter, etc.), and which is implicitly referenced by any node which performs an aggregation (ApplyAggOp, AggFilter, etc.).

This requires nodes to be able to bind variables which shadow variables already bound by parents, which it turns out wasn't handled correctly by the CSE algorithm. Fixing this required several changes:

  • I moved free variable computation to a lazy value on BaseIR. This way, each time we see a subtree x, we can recompute the max depth of the binding sites of all of x's free variables, since those binding sites may be different than last time we saw x. This also required splitting the free variables into value, agg, and scan sets, so they can be looked up in the correct context (previously context lookup was always done at the Ref node, at which point the variable was in the value context).
  • Now, in the CSEPrintPass, we have to recompute the same binding depth calculation that was done in the analysis pass, so we know which binding site to look at (previously I just searched all binding sites in scope, but with shadowing handled correctly we don't have sufficient information to decide which binding site is valid). This requires maintaining contexts in the print pass, which is annoying because we are now traversing the tree of Renderable children, which is not exactly the same as the IR tree.
  • To fix this, I duplicated all methods involving binding structure on BaseIR. To avoid having to write twice as many methods on concrete IR classes, I made the methods taking the index of the Renderable child (e.g. renderable_bindings) be the primary methods which are overridden. For all IR nodes, there is a map from IR child index to Renderable child index, defined in renderable_idx_of_child, which is used to define the bindings and similar methods in terms of the renderable_bindings and similar. This is messier than I would have liked, but I couldn't think of a better way.
Copy link
Collaborator

tpoterba left a comment

preliminary comments. will take a bit longer to digest fully.

@@ -1509,6 +1546,7 @@ def _compute_type(self, env, agg_env):
class InsertFields(IR):
class IFRenderField(Renderable):
def __init__(self, field, child):
super().__init__()

This comment has been minimized.

Copy link
@tpoterba

tpoterba Nov 12, 2019

Collaborator

Python 🙄

child_min_value_binding_depth = self.min_value_binding_depth
child_scan_scope = self.scan_scope

# TODO: can this be moved into make_child_frame?

This comment has been minimized.

Copy link
@tpoterba

tpoterba Nov 12, 2019

Collaborator

this is in make_child_frame, yes?

This comment has been minimized.

Copy link
@patrick-schultz

patrick-schultz Nov 14, 2019

Author Collaborator

🤦‍♂

ht = hl.utils.range_table(10)
x = hl.agg.count()
self.assertEqual(ht.aggregate((x, hl.agg.filter(ht.idx % 2 == 0, x))), (10, 5))

This comment has been minimized.

Copy link
@tpoterba

tpoterba Nov 12, 2019

Collaborator

can we add a test that binds a agg.count() variable and uses it in both aggs and scans, and a test that binds a value and uses it in all three contexts?

def free_vars(self):
def vars_from_child(i):
if self.uses_agg_context(i):
return self.children[i].free_agg_vars

This comment has been minimized.

Copy link
@tpoterba

tpoterba Nov 12, 2019

Collaborator

(let's dig into what's going on here, as we talked about)

@tpoterba

This comment has been minimized.

Copy link
Collaborator

tpoterba commented Nov 14, 2019

just a note that we're waiting for this PR to go in for the 0.2.27 release.

Copy link
Collaborator

tpoterba left a comment

great! I'll PR a release as soon as this goes in.

@tpoterba

This comment has been minimized.

Copy link
Collaborator

tpoterba commented Nov 14, 2019

lots of test failues

@danking danking merged commit f3ebafd into hail-is:master Nov 14, 2019
1 check passed
1 check passed
ci-test success
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.