New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[query] Enable use of aggregators on array expressions #12988
Conversation
hail/python/hail/ir/ir.py
Outdated
tuple, uid, elt = unpack_uid(a.typ) | ||
body = AggLet(value_name, elt, body) | ||
body = with_split_rng_state(body, uid) | ||
value_name = tuple |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't it possible for the body to use randomness in a value context? E.g. see TableAggregate.__init__
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check if how I just updated this is sane.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Chris. That's almost right, but we still need to bind the elt
(body
expects value_name
to be an element without uid). And lets keep using with_split_rng_state
to keep that logic in one place:
tup, uid, elt = unpack_uid(a.typ)
body = AggLet(value_name, elt, body)
if body.uses_value_randomness:
body = Let('__rng_state', RNGStateLiteral(), body)
if body.uses_agg_randomness(is_scan=False):
body = with_split_rng_state(body, is_scan=False)
value_name = tup
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you, should be fixed now.
@@ -238,6 +238,36 @@ def context(self): | |||
return 'agg' | |||
|
|||
|
|||
def aggregate_local_array(array, f): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We had discussed adding a CollectionExpression.aggregate
method, similar to fold
. Did you change your mind?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nope, forgot over the weekend amid the presentation rush. Thanks for the reminder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, realized this still needs a test for the randomness handling, I can add that.
hail/python/hail/ir/ir.py
Outdated
a = a.handle_randomness(body.uses_randomness) | ||
if body.uses_randomness: | ||
a = a.handle_randomness(body.uses_agg_randomness) | ||
if body.uses_agg_randomness: | ||
tup, uid, elt = unpack_uid(a.typ) | ||
body = AggLet(value_name, elt, body) | ||
if body.uses_value_randomness: | ||
body = Let('__rng_state', RNGStateLiteral(), body) | ||
if body.uses_agg_randomness(is_scan=False): | ||
body = with_split_rng_state(body, is_scan=False) | ||
body = AggLet(value_name, elt, body, is_scan=False) | ||
body = with_split_rng_state(body, uid, is_scan=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test caught some problems with this. Most substantially, we shouldn't be making a new root RNGStateLiteral
for any value randomness. That means if this occurs inside some other looping context, value randomness will be identical on each outer iteration. Instead, we should just use the rng state bound in the outer context, which means we don't need to do anything here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for explaining. Clears up why this was necessary.
This method is useful for accessing functionality that exists in the aggregator library but not the basic expression library, for instance,
call_stats
.