New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up registerizeHarder for huge numbers of locals #3368

Merged
merged 6 commits into from Apr 20, 2015

Conversation

Projects
None yet
3 participants
@aidanhs
Contributor

aidanhs commented Apr 17, 2015

Stage 2 of being fed up with long compile times. There's some cleanup (as promised in #3364), but the important commit for performance is "Pre-compute possible links and conflicts when considering a junction". The motivation for changes around here was profiling and observing that the highlighted two lines

          Block* block = blocks[b];
          Junction& jSucc = junctions[block->exit];
          for (auto otherName : jSucc.live) {      <<<<<<<<<<<<
            if (junc.live.has(otherName)) continue;    <<<<<<<<<<<
            if (block->lastKillLoc[otherName] < block->firstDeadLoc[name]) {

were really hot.
By doing preprocessing we can do map+vector iteration rather than map+map iteration. Given that the inner loop is run O(n^2) times, I'm guessing making it a vec loop is a Good Idea.
The actual motivating idea for the rearrangement is that, when checking for conflicts, we actually want to check for each variable against other variables. Having to loop over blocks and follow pointers to actually find the variables is an implementation detail of how the data is structured.

Before:

tests $ python runner.py asm3.test_sqlite
[...]
Ran 1 test in 73.380s
[...]
tests $ EMCC_DEBUG=1 emcc -O3 --llvm-lto 1 -s EMULATE_FUNCTION_POINTER_CASTS=1 -o out.js python/python.bc 2>&1 | grep 'js opts'
DEBUG    root: emcc step "js opts" took 6.09 seconds

After:

tests $ python runner.py asm3.test_sqlite
[...]
Ran 1 test in 60.924s
[...]
$ EMCC_DEBUG=1 emcc -O3 --llvm-lto 1 -s EMULATE_FUNCTION_POINTER_CASTS=1 -o out.js python/python.bc 2>&1 | grep 'js opts'
DEBUG    root: emcc step "js opts" took 4.48 seconds
@aidanhs

This comment has been minimized.

Show comment
Hide comment
@aidanhs

aidanhs Apr 17, 2015

Contributor

test_other and the first 50 asm3 tests passed (+ sqlite, python). I'm going to run a full asm3 test.

Contributor

aidanhs commented Apr 17, 2015

test_other and the first 50 asm3 tests passed (+ sqlite, python). I'm going to run a full asm3 test.

@kripken

This comment has been minimized.

Show comment
Hide comment
@kripken

kripken Apr 17, 2015

Owner

Tests look good. I can confirm a 20% speedup (tested sqlite, openjpeg).

Owner

kripken commented Apr 17, 2015

Tests look good. I can confirm a 20% speedup (tested sqlite, openjpeg).

@kripken

This comment has been minimized.

Show comment
Hide comment
@kripken

kripken Apr 18, 2015

Owner

Ok, looks like everything is good here. I think there were some tiny tweaks from the comments, ready to merge when those are in.

Great stuff!

Owner

kripken commented Apr 18, 2015

Ok, looks like everything is good here. I think there were some tiny tweaks from the comments, ready to merge when those are in.

Great stuff!

@aidanhs

This comment has been minimized.

Show comment
Hide comment
@aidanhs

aidanhs Apr 18, 2015

Contributor

I have:

  • used insert to detect whether to reserve space in the set (which helped me remember why I hate C++...please check this code style for constructing objects etc is right)
  • used name rather than otherName
  • skipped entry junction for jWorkSet addition
  • put the (int) cast back in

I have not:

  • reserved space in jWorkSet - apparently can't do that in an ordered set. There are alternatives that I'll revisit if I see this loop getting hot.

Let me know if you have any more comments.

Contributor

aidanhs commented Apr 18, 2015

I have:

  • used insert to detect whether to reserve space in the set (which helped me remember why I hate C++...please check this code style for constructing objects etc is right)
  • used name rather than otherName
  • skipped entry junction for jWorkSet addition
  • put the (int) cast back in

I have not:

  • reserved space in jWorkSet - apparently can't do that in an ordered set. There are alternatives that I'll revisit if I see this loop getting hot.

Let me know if you have any more comments.

@aidanhs

This comment has been minimized.

Show comment
Hide comment
@aidanhs

aidanhs Apr 18, 2015

Contributor

I'm annoyed github has swallowed a couple of helpful comments for future people looking back in history, so pasting here (I guess this is a lesson to comment on the PR diff/in comments, rather than on commits):


I wouldn't bother looking at the diff unless you're putting them side-by-side and considering them in isolation - this change actually alters the algorithm a bit.

To explain, let's consider 3 bocks (A, B and C) with their exit junctions (aJ, bJ, cJ). B and C are outblocks of aJ, i.e.:

   A
  / \
 B   C

To run through the two versions:

Previously:

For each live variable in aJ
  - mark conflicts against all live variables in aJ
  - for each successor block x
    - find xJ
    - for each live variable v in xJ
      - (optimisation: skip if it's live in aJ, it's already marked as a conflict)
      - if the variable lifetimes overlap in the block, mark them as conflicting

Note that we've got three nested loops - this is why the innermost stuff was so hot, because it gets executed a lot. Even the optimisation is touch and go - turn it into a slightly more expensive check (like checking if it's currently conflicting, rather than is just live in the current junction) and it slows down!
For example, imagine bJ and cJ have the same live variables - we're running the optimisation check twice as many times as necessary!

Now:

For each successor block x
  - find xJ
  - for each live variable v in xJ
    - add x as a possibly conflicting block for variable v
For each live variable v in aJ
  - remove v as a possibly conflicting variable
For each live variable v in aJ
  - mark conflicts against all live variables in aJ
  - for each potentially conflicting variable:
    - for each block the variable is present in
      - if the variable lifetimes overlap in the block, mark them as conflicting

We've effectively flipped the loop inside out, which has the advantage of being clearer - "for variable X, does Y conflict? let's check the blocks Y is present in".
On paper this kinda looks like more work, but I suspect the speedup comes from a) not having to follow pointers inside the doubly nested loop, b) iterating over a vector in the innermost loop but mainly c) moving the 'optimisation' checking whether variables are live up to a top-level loop rather than inside a thrice nested loop.

There are a few other tweaks I thought of while writing this, but they can wait.


The break is a 'free' optimisation as there's effectively no overhead in any situation.

I did consider adding a 'check if conflict exists' condition, but it had no noticeable effect I could detect and may cause slower running in some situations. We don't need it because we're inserting into a set - duplicates aren't an issue.


We must always enter this[do...while] loop in order to do a first pass consideration of any blocks added to bWorkSet above. The reason the condition moved to the end was exactly because of the exit junction case - even when we have an empty junction work set (i.e. it was just the exit junction), we need to look at the blocks going into the exit junction.

Contributor

aidanhs commented Apr 18, 2015

I'm annoyed github has swallowed a couple of helpful comments for future people looking back in history, so pasting here (I guess this is a lesson to comment on the PR diff/in comments, rather than on commits):


I wouldn't bother looking at the diff unless you're putting them side-by-side and considering them in isolation - this change actually alters the algorithm a bit.

To explain, let's consider 3 bocks (A, B and C) with their exit junctions (aJ, bJ, cJ). B and C are outblocks of aJ, i.e.:

   A
  / \
 B   C

To run through the two versions:

Previously:

For each live variable in aJ
  - mark conflicts against all live variables in aJ
  - for each successor block x
    - find xJ
    - for each live variable v in xJ
      - (optimisation: skip if it's live in aJ, it's already marked as a conflict)
      - if the variable lifetimes overlap in the block, mark them as conflicting

Note that we've got three nested loops - this is why the innermost stuff was so hot, because it gets executed a lot. Even the optimisation is touch and go - turn it into a slightly more expensive check (like checking if it's currently conflicting, rather than is just live in the current junction) and it slows down!
For example, imagine bJ and cJ have the same live variables - we're running the optimisation check twice as many times as necessary!

Now:

For each successor block x
  - find xJ
  - for each live variable v in xJ
    - add x as a possibly conflicting block for variable v
For each live variable v in aJ
  - remove v as a possibly conflicting variable
For each live variable v in aJ
  - mark conflicts against all live variables in aJ
  - for each potentially conflicting variable:
    - for each block the variable is present in
      - if the variable lifetimes overlap in the block, mark them as conflicting

We've effectively flipped the loop inside out, which has the advantage of being clearer - "for variable X, does Y conflict? let's check the blocks Y is present in".
On paper this kinda looks like more work, but I suspect the speedup comes from a) not having to follow pointers inside the doubly nested loop, b) iterating over a vector in the innermost loop but mainly c) moving the 'optimisation' checking whether variables are live up to a top-level loop rather than inside a thrice nested loop.

There are a few other tweaks I thought of while writing this, but they can wait.


The break is a 'free' optimisation as there's effectively no overhead in any situation.

I did consider adding a 'check if conflict exists' condition, but it had no noticeable effect I could detect and may cause slower running in some situations. We don't need it because we're inserting into a set - duplicates aren't an issue.


We must always enter this[do...while] loop in order to do a first pass consideration of any blocks added to bWorkSet above. The reason the condition moved to the end was exactly because of the exit junction case - even when we have an empty junction work set (i.e. it was just the exit junction), we need to look at the blocks going into the exit junction.

@kripken

This comment has been minimized.

Show comment
Hide comment
@kripken

kripken Apr 20, 2015

Owner

Note that they are hidden by default, but people can still see them, they just need to click on "expand outdated diff".

Owner

kripken commented Apr 20, 2015

Note that they are hidden by default, but people can still see them, they just need to click on "expand outdated diff".

@kripken kripken merged commit 093d3d5 into kripken:incoming Apr 20, 2015

@kripken

This comment has been minimized.

Show comment
Hide comment
@kripken

kripken Apr 20, 2015

Owner

Merged, thanks!

Owner

kripken commented Apr 20, 2015

Merged, thanks!

@aidanhs

This comment has been minimized.

Show comment
Hide comment
@aidanhs

aidanhs Apr 20, 2015

Contributor

Hmm, the only "outdated diff" link I can see on this page is from rfk - I can't see your comments.
Did you click through to the individual commits and comment there?

Contributor

aidanhs commented Apr 20, 2015

Hmm, the only "outdated diff" link I can see on this page is from rfk - I can't see your comments.
Did you click through to the individual commits and comment there?

@aidanhs aidanhs deleted the aidanhs:aphs-optimize-optimizer-2 branch Apr 20, 2015

@kripken

This comment has been minimized.

Show comment
Hide comment
@kripken

kripken Apr 20, 2015

Owner

Ah, yes, good point. I think github is smart about pull diff comments, but I commented on a commit - which is since the rebase not in this pull, I guess! You should get a separate email about it, but it doesn't show up here. Anyhow, all I said there was "makes sense, got it, thanks" or similar.

Owner

kripken commented Apr 20, 2015

Ah, yes, good point. I think github is smart about pull diff comments, but I commented on a commit - which is since the rebase not in this pull, I guess! You should get a separate email about it, but it doesn't show up here. Anyhow, all I said there was "makes sense, got it, thanks" or similar.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment