
[query/lir] use lir method splitting instead of method wrapping #8963

Merged
merged 17 commits on Jul 31, 2020

Conversation

@cseed cseed commented Jun 15, 2020

Summary of changes:

  • rip out method wrapping logic in ir.Emit
  • add lir.{Blocks, Locals} for enumerating and indexing blocks and locals
  • add SplitLargeBlocks to break up large blocks (the minimum unit of splitting in SplitMethod is the block)
  • add PST to compute our non-standard variant of the program structure tree (see the comment in PST)
  • add SplitMethod to split large methods based on the PST (a rough sketch of the resulting shape follows below)
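
To make the end state concrete, here is a minimal, hypothetical sketch of the shape SplitMethod aims for; none of these names come from the lir code, it only illustrates cutting a too-large method into smaller methods that the original entry point then calls in order:

    // Illustrative Scala sketch only; not the lir implementation.
    object SplitShapeExample {
      // before: one method whose body is too large to JIT well
      def bigBodyUnsplit(n: Int): Int = {
        val a = n * 2    // imagine hundreds of instructions here
        val b = a + 1    // ... and here
        b * b            // ... and here
      }

      // after: the same computation as small private methods plus a short driver
      private def part1(n: Int): Int = n * 2
      private def part2(a: Int): Int = a + 1
      private def part3(b: Int): Int = b * b

      def bigBodySplit(n: Int): Int = part3(part2(part1(n)))
    }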

@cseed cseed removed the stacked PR label Jun 18, 2020
@cseed cseed force-pushed the lir-split2-no-wrap branch 2 times, most recently from fc8e6ec to e4d2281 on June 26, 2020 00:01
cseed commented Jun 26, 2020

This is still 2-5% slower than the current mainline branch. I have a few more things to try. Benchmarks where the change is >20%:

$ hail-bench compare ./0.2.47-3f8cff262dcf.json ./0.2.47-b2acae4f478f.json 
                                    Benchmark Name      Ratio     Time 1     Time 2
                                    --------------      -----     ------     ------
                 matrix_table_entries_table_no_key     580.3%     56.955    330.503
                         table_aggregate_array_sum     528.0%      9.103     48.066
           table_big_aggregate_compile_and_execute     185.0%     13.325     24.648
                           per_row_stats_star_star     154.0%      8.474     13.046
                            table_aggregate_linreg     148.6%     48.593     72.212
              matrix_table_filter_entries_unfilter     140.8%      8.581     12.085
                     matrix_table_array_arithmetic     135.8%     10.208     13.862
                   matrix_table_many_aggs_col_wise     133.7%     34.847     46.576
...
                               full_combiner_chr22      33.4%   1644.907    548.608
----------------------
Geometric mean: 104.8%
Median:  101.9%

FYI @tpoterba @chrisvittal The combiner improvement persists across multiple runs and is real. That's quite nice!

cseed commented Jun 27, 2020

Phew! I finally got to parity. I'm impressed how well the old method wrapping logic worked. Benchmarks with >20% change:

$ hail-bench compare ./0.2.47-3f8cff262dcf-1.json 0.2.47-63dccdda2a44.json 
Failed benchmarks in run 1:
    pc_relate_big
    large_range_matrix_table_sum
Failed benchmarks in run 2:
    pc_relate_big
    large_range_matrix_table_sum
                                    Benchmark Name      Ratio     Time 1     Time 2
                                    --------------      -----     ------     ------
           table_big_aggregate_compile_and_execute     334.8%     13.325     44.608
                     matrix_table_array_arithmetic     133.9%     10.208     13.665
                   matrix_table_many_aggs_col_wise     131.4%     34.847     45.781
                           table_aggregate_counter     126.4%     13.136     16.606
                           per_row_stats_star_star     124.5%      8.474     10.553
              matrix_table_filter_entries_unfilter     123.7%      8.581     10.616
...
                            shuffle_key_rows_by_mt      82.8%     35.932     29.742
                               full_combiner_chr22      23.6%   1644.907    388.414
----------------------
Geometric mean: 100.0%
Median:  99.2%

@tpoterba tpoterba commented
oh, man, this is super exciting. 3x on the combiner? yes please!

We can probably make incremental performance improvements to the LIR method splitting code to bring the compile and execute back down, and that one I consider a little less critical anyway.

cseed commented Jun 29, 2020

FYI @patrick-schultz Tim came up in scorecard, but I know you had some interest in this and thought you might like to look at my solution.

@@ -13,7 +13,7 @@ object InitializeLocals {
   var i = entryUsedIn.nextSetBit(0)
   while (i >= 0) {
     val l = locals(i)
-    if (!l.isInstanceOf[Parameter]) {
+    if (!l.isInstanceOf[Parameter] && l.name != "spills") {
Contributor:
🤔

Collaborator Author:
Oops, that wasn't supposed to make it in. Deleted!

@tpoterba tpoterba left a comment
This was a fun read. Mostly cosmetic comments

@@ -950,10 +950,10 @@ object EmitStream {

Contributor:
as I mentioned in chat, these need to be fields because we need persistent state in compiled iterators used in TableMapPartitions.
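
To make that concrete, here is a toy illustration (not the generated EmitStream code): state that must survive between calls on a compiled iterator has to live in a field, because a local disappears as soon as the method returns.

    // Toy example: why per-element iterator state cannot be a local.
    class CompiledPartitionIterator(n: Int) extends Iterator[Int] {
      private var k = 0            // field: persists across next() calls

      def hasNext: Boolean = k < n

      def next(): Int = {
        // If k were a local declared here, every call would start from 0
        // and the iterator would never advance.
        val result = k
        k += 1
        result
      }
    }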

var k = 0

// recursion will blow out the stack
val stack = mutable.Stack[(Int, Iterator[Int])]()
Contributor:
can we use our ArrayStack here? this one is a List. This might be why our big compile benchmark got slower 🤷
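
For context, the pattern under discussion is a depth-first traversal that keeps its own stack of (node, child-iterator) pairs instead of recursing; a rough sketch (not the actual PST code) looks like this:

    import scala.collection.mutable

    // Recursion-free DFS: each stack entry is a node plus an iterator over its
    // not-yet-explored successors, so traversal depth never touches the JVM
    // call stack. scala.collection.mutable.Stack is list-backed in Scala 2.12,
    // which is what the ArrayStack suggestion above is about.
    def dfsPreorder(nNodes: Int, succ: Int => Iterator[Int], root: Int): Array[Int] = {
      val visited = new Array[Boolean](nNodes)
      val order = mutable.ArrayBuffer[Int]()
      val stack = mutable.Stack[(Int, Iterator[Int])]()

      visited(root) = true
      order += root
      stack.push((root, succ(root)))

      while (stack.nonEmpty) {
        val (_, children) = stack.top
        if (children.hasNext) {
          val c = children.next()
          if (!visited(c)) {
            visited(c) = true
            order += c
            stack.push((c, succ(c)))
          }
        } else
          stack.pop()
      }
      order.toArray
    }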

// although that is only non-trivial when the CFG is irreducible, which
// is relatively rare.

class Region(
Contributor:
I have a lot of brain baggage attached to this name. Can we call it CRegion or some similar mangling?

Collaborator Author:
I renamed it PSTRegion.

}
}

class PST(
Contributor:
space

@@ -0,0 +1,581 @@
package is.hail.lir
Contributor:
this file was a bit hard to follow using github's viewer, but in an editor with type inference, it's quite straightforward. Nice!

object SplitMethod {
  val TargetMethodSize: Int = 500

  def apply(c: Classx[_], m: Method): Unit = {
    new SplitMethod(c, m).split()
Contributor:
have we done experiments tweaking this? I'm happy to benchmark a bunch of values when this goes in.

Collaborator Author:
I ran some experiments, but also while I was changing the code, so further tuning might be valuable. Spilling can increase method size, and I've seen methods nearly 2x the target size. The JVM max is 8K. I started with 2K, but found 500 works better. I think I tried 250, but it was not better (I don't remember how much worse).

private val paramFields = m.parameterTypeInfo.zipWithIndex.map { case (ti, i) =>
c.newField(genName("f", s"arg$i"), ti)
private val spillsClass = new Classx(genName("C", s"${ m.name }Spills"), "java/lang/Object")
private val spillsCtor = {
Contributor:
this is for empirical performance reasons, right?
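
For readers following along, my understanding of this hunk (a simplified sketch, not the generated code, which is built at the lir/bytecode level) is that values live across a split boundary cannot stay JVM locals, since locals do not survive a call into another method, so they are spilled into fields of a small per-invocation object that the split-out pieces share:

    // Illustrative names only.
    final class ExampleSpills {
      var x: Long = 0L
      var cond: Boolean = false
    }

    object SpilledPieces {
      // first piece: computes values and leaves them in the spills object
      def piece1(spills: ExampleSpills, arg0: Long): Unit = {
        spills.x = arg0 * 2
        spills.cond = spills.x > 10
      }

      // second piece: picks those values back up
      def piece2(spills: ExampleSpills): Long =
        if (spills.cond) spills.x else -1L

      // the original method shrinks to an allocation plus calls
      def original(arg0: Long): Long = {
        val spills = new ExampleSpills
        piece1(spills, arg0)
        piece2(spills)
      }
    }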

case _: GotoX => UnitInfo
case _: IfX => BooleanInfo
case x: IfX =>
if (!regionBlocks(x.Ltrue) && !regionBlocks(x.Lfalse))
Contributor:
could you explain this bit?

Contributor:
ah, I see.
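
For other readers: my reading of this (an inference from the hunk, not an authoritative description) is that when a split-out region ends in a conditional branch whose targets both lie outside the region, the new method cannot perform the jump itself, so it gets a Boolean return type and the caller does the branching. Roughly:

    // Illustrative only: the region's trailing IfX condition becomes the
    // return value of the split-out method...
    def splitRegion(a: Long): Boolean = {
      // ... body of the region ...
      a > 0        // was: if (a > 0) goto Ltrue else goto Lfalse
    }

    // ...and the caller, back in the original method, performs the branch:
    def caller(a: Long): Long =
      if (splitRegion(a)) 1L   // control continues at what was Ltrue
      else 0L                  // control continues at what was Lfalse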

x.setLfalse(newLfalse)
}
}
case x: SwitchX => IntInfo
Contributor:
stray dead code?

i = sortedsubr.length - 1
while (i >= 0) {
val ri = sortedsubr(i)
if (ri.size > 20 &&
Contributor:
should this be a parameter to include in benchmarking?

Collaborator Author:
I don't think so.

This is just to make sure that we don't split something that had just been split, which would go into an infinite loop. It is paired with this:

          splitSlice(ri.start, ri.end)
          val s = blockSize(blockPartitions.find(ri.start))
          assert(s < 20)

to make sure split things are sufficiently small that we won't be tempted to resplit them.

I don't think this is a performance issue, but there is potentially a pathological case where splitting could fail to reduce the size of the method: you have a massive region, but it is made up of size-15 child regions which are all independent with no substructure. Then nothing would get split, and we might blow the method size limit. It's actually hard to imagine what such a function might look like: a bunch of switches that all jump to each other. I'm not quite sure how to split such a function, actually.

@cseed cseed dismissed tpoterba’s stale review July 31, 2020 21:22

addressed comments

@tpoterba tpoterba left a comment
splitting is a go!

@danking danking merged commit 1deb7cf into hail-is:master Jul 31, 2020
annamiraotoole pushed a commit to annamiraotoole/hail that referenced this pull request Aug 3, 2020
…-is#8963)

* [query] use lir method splitting instead of method wrapping

use locals instead of fields

* add PST

* wip split uses pst

* compiles

* passes IRSuite

* cleaned up splitting, passing IRSuite

* fix up rebase

* minimal local spilling, IRSuite passes

* cleanup

* move some more fields => locals
target method size 500

* compute loop regions in PST
split out loop regions
unconditionally run SplitMethod for loops

* don't use exceptions for return handling

* don't split loops by default, for benchmarking

* cleanup, fixes

* fix rebase

* back off Field => Local

* address comments