Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[hail][performance] Fix literals lazy decoding #6605

Merged
merged 8 commits into from
Jul 11, 2019

Conversation

tpoterba
Copy link
Contributor

Cloud providers don't want you to know this one weird trick to save
thousands on your compute bill

This PR:

Running matrix_table_array_arithmetic...
    run 1: 18.28
    run 2: 15.26
    run 3: 14.76

Master:

Running matrix_table_array_arithmetic...
    (I waited 30 minutes and the first partition still hadn't finished...)

Lower bound: 900x faster

Cloud providers don't want you to know this one weird trick to save
thousands on your compute bill
@tpoterba
Copy link
Contributor Author

This change fixes a huge problem caused by these lines of code and context:

https://github.com/hail-is/hail/pull/6605/files#diff-1278c1788239002cc63ccb82cbef8d76L190

The problem is that in our generated code, every literal is decoded each time any literal is referenced. This is extremely expensive!

In this change, we instead decode the literals once with the function is constructed from the partition index (used with randomness), by adding a new region argument which the literals are decoded into. This region must live as long as the RegionValues returned by any invocation of the function.

The primary error mode I might expect is that we use the wrong region to generate the function, causing use-after-free errors. These are well-covered by tests, since I had a few of these bugs and fixed them due to test failures. The region we shouldn't be using is ctx.region, which refers to RVDContext.region, the per-row region that is cleared after each record.

ctx.r (the global execution context region) and ctx.freshRegion (a partition-owned global region, generally named globalRegion or partRegion) are safe.

@danking danking merged commit 02f4e10 into hail-is:master Jul 11, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants