[Hail] Fix several CSE bugs #7479
preliminary comments. will take a bit longer to digest fully.
@@ -1509,6 +1546,7 @@ def _compute_type(self, env, agg_env):
class InsertFields(IR):
    class IFRenderField(Renderable):
        def __init__(self, field, child):
            super().__init__()
Python 🙄
hail/python/hail/ir/renderer.py
Outdated
child_min_value_binding_depth = self.min_value_binding_depth
child_scan_scope = self.scan_scope

# TODO: can this be moved into make_child_frame?
this is in make_child_frame, yes?
🤦‍♂️
ht = hl.utils.range_table(10)
x = hl.agg.count()
self.assertEqual(ht.aggregate((x, hl.agg.filter(ht.idx % 2 == 0, x))), (10, 5))
can we add a test that binds an `agg.count()` variable and uses it in both aggs and scans, and a test that binds a value and uses it in all three contexts?
hail/python/hail/ir/base_ir.py
Outdated
def free_vars(self):
    def vars_from_child(i):
        if self.uses_agg_context(i):
            return self.children[i].free_agg_vars
(let's dig into what's going on here, as we talked about)
just a note that we're waiting for this PR to go in for the 0.2.27 release.
great! I'll PR a release as soon as this goes in.
lots of test failures
fixes #7418
The main change here is fixing #7418 by making nodes which perform aggregation aware of the nearest containing node which defines what is being aggregated over. This is done by making a new variable name, here called "agg_capability", which is added to the child context by any node which (re)defines the meaning of aggregation (`TableMapRows`, `AggFilter`, etc.), and which is implicitly referenced by any node which performs an aggregation (`ApplyAggOp`, `AggFilter`, etc.).

This requires nodes to be able to bind variables which shadow variables already bound by parents, which it turns out wasn't handled correctly by the CSE algorithm. Fixing this required several changes:

- [...] `BaseIR`. This way, each time we see a subtree `x`, we can recompute the max depth of the binding sites of all of `x`'s free variables, since those binding sites may be different than last time we saw `x`. This also required splitting the free variables into value, agg, and scan sets, so they can be looked up in the correct context (previously context lookup was always done at the `Ref` node, at which point the variable was in the value context).
- [...] `CSEPrintPass`, we have to recompute the same binding depth calculation that was done in the analysis pass, so we know which binding site to look at (previously I just searched all binding sites in scope, but with shadowing handled correctly we don't have sufficient information to decide which binding site is valid). This requires maintaining contexts in the print pass, which is annoying because we are now traversing the tree of `Renderable` children, which is not exactly the same as the IR tree.
- [...] `BaseIR`. To avoid having to write twice as many methods on concrete IR classes, I made the methods taking the index of the `Renderable` child (e.g. `renderable_bindings`) be the primary methods which are overridden. For all IR nodes, there is a map from IR child index to Renderable child index, defined in `renderable_idx_of_child`, which is used to define the `bindings` and similar methods in terms of the `renderable_bindings` and similar. This is messier than I would have liked, but I couldn't think of a better way.