Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[query] Enable use of aggregators on array expressions #12988

Merged
merged 6 commits into from Jun 22, 2023

Conversation

tpoterba
Copy link
Contributor

@tpoterba tpoterba commented May 4, 2023

This method is useful for accessing functionality that exists in the aggregator library but not the basic expression library, for instance, call_stats.

@tpoterba tpoterba added the WIP label May 4, 2023
Comment on lines 2082 to 2085
tuple, uid, elt = unpack_uid(a.typ)
body = AggLet(value_name, elt, body)
body = with_split_rng_state(body, uid)
value_name = tuple
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it possible for the body to use randomness in a value context? E.g. see TableAggregate.__init__.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check if how I just updated this is sane.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Chris. That's almost right, but we still need to bind the elt (body expects value_name to be an element without uid). And lets keep using with_split_rng_state to keep that logic in one place:

             tup, uid, elt = unpack_uid(a.typ)
             body = AggLet(value_name, elt, body)
             if body.uses_value_randomness:
                 body = Let('__rng_state', RNGStateLiteral(), body)
             if body.uses_agg_randomness(is_scan=False):
                 body = with_split_rng_state(body, is_scan=False)
             value_name = tup

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you, should be fixed now.

@@ -238,6 +238,36 @@ def context(self):
return 'agg'


def aggregate_local_array(array, f):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had discussed adding a CollectionExpression.aggregate method, similar to fold. Did you change your mind?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nope, forgot over the weekend amid the presentation rush. Thanks for the reminder.

Copy link
Collaborator

@patrick-schultz patrick-schultz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, realized this still needs a test for the randomness handling, I can add that.

Comment on lines 2080 to 2084
a = a.handle_randomness(body.uses_randomness)
if body.uses_randomness:
a = a.handle_randomness(body.uses_agg_randomness)
if body.uses_agg_randomness:
tup, uid, elt = unpack_uid(a.typ)
body = AggLet(value_name, elt, body)
if body.uses_value_randomness:
body = Let('__rng_state', RNGStateLiteral(), body)
if body.uses_agg_randomness(is_scan=False):
body = with_split_rng_state(body, is_scan=False)
body = AggLet(value_name, elt, body, is_scan=False)
body = with_split_rng_state(body, uid, is_scan=False)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test caught some problems with this. Most substantially, we shouldn't be making a new root RNGStateLiteral for any value randomness. That means if this occurs inside some other looping context, value randomness will be identical on each outer iteration. Instead, we should just use the rng state bound in the outer context, which means we don't need to do anything here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for explaining. Clears up why this was necessary.

@chrisvittal chrisvittal changed the title [query] Expose hl.agg.aggregate_local_array [query] Enable use of aggregators on array expressions Jun 22, 2023
@danking danking merged commit 38202f3 into hail-is:main Jun 22, 2023
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants