Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up registerizeHarder optimizer for huge functions #3364

Merged
merged 2 commits into from Apr 17, 2015

Conversation

aidanhs
Copy link
Contributor

@aidanhs aidanhs commented Apr 15, 2015

My benchmarking (with emtcl) indicates that this PR speeds up the js optimisation part of a -O3 build significantly for codebases with huge functions (and therefore especially with llvm-lto). As with my previous optimizer PR, 'caveat emptor' - I first used C++11 a couple of weeks ago. Please review carefully in case I've made a critical mistake which makes it all wrong (:cry:).

The first commit is the main one. My reading of the code is that (previously) a vector was used to attempt to keep track of the 'last' junction in order to try and go from back to front in one shot. Unfortunately using a vector as a stack isn't actually a reliable way to do this as we push junctions back onto the list in an arbitrary order (with for (auto b : junc.inblocks)).
Using an ordered set:

  • lets us reliably choose the 'last' junction, regardless of insertion order
  • allows elimination of a data structure
  • allows us to not worry about checking before inserting

The second commit is just a minor tweak, I don't know if it gives any speedup. It was just misleading to compare against a different variable when we're actually checking for true.

I've run the first 50 tests (alphabetically) from test_core and all of test_other. They all (bar four seemingly irrelevant other tests) pass.

Python is a good example to see the speedup as it has a huge function:

$ # before
$ EMCC_DEBUG=1 emcc -O3 --llvm-lto 1 -s EMULATE_FUNCTION_POINTER_CASTS=1 -o out.js python/python.bc 2>&1 | grep 'js opts'
DEBUG    root: emcc step "js opts" took 9.12 seconds
# after
$ EMCC_DEBUG=1 emcc -O3 --llvm-lto 1 -s EMULATE_FUNCTION_POINTER_CASTS=1 -o out.js python/python.bc 2>&1 | grep 'js opts'
DEBUG    root: emcc step "js opts" took 5.91 seconds

@aidanhs
Copy link
Contributor Author

aidanhs commented Apr 15, 2015

Looking at updating the js-optimizer at the moment.
Thinking about creating an 'orderedset' type.

@kripken
Copy link
Member

kripken commented Apr 15, 2015

js-optimizer.js is probably not worth updating, almost everyone should be using the native compiled one. the .js version is legacy at this point.

@kripken
Copy link
Member

kripken commented Apr 15, 2015

Make sure you rebuild the native optimizer to test it. emcc --clear-cache, or manually erase it from the cache dir.

Also important to test the other suite here, in particular ./tests/runner.py other.test_js_optimizer (which tests native as well, despite the name).

@aidanhs
Copy link
Contributor Author

aidanhs commented Apr 16, 2015

Ah ok, I figured we were keeping the js one around if it wasn't possible to compile for some reason. But that doesn't make sense, since compiling llvm is required to run emscripten.

I've already run the other.* tests (4 unrelated failures - test_crunch, test_scons, test_llvm_lit, test_outline).
The optimizer isn't in my cache, but in the name of belt and braces I've just wiped ~/.em* and run the other.test_js_optimizer test. It passed.

@kripken
Copy link
Member

kripken commented Apr 16, 2015

Hmm, the optimizer must be in the cache. Or a placeholder file saying that building it failed. What does ls -al ~/.emscripten_cache show? Does EMCC_DEBUG=1 output show the native optimizer is being used?

@aidanhs
Copy link
Contributor Author

aidanhs commented Apr 16, 2015

The built optimizer lives in a directory alongside emscripten/incoming - I just cd into emscripten/incoming_optimizer_64bit (or something like that) and type make whenever I switch branches.

~/.emscripten points to the file there as well.

@kripken
Copy link
Member

kripken commented Apr 16, 2015

cc @rfk who wrote this code

@kripken
Copy link
Member

kripken commented Apr 16, 2015

Just to be safe, I ran benchmarks and a bunch of tests on this. I don't see any issues. I also measured on SQLite, and I see a 30% speedup on the optimizer phase, and 40% on poppler - nice! :)

So this looks great, just waiting on @rfk if he has any comments.

@rfk
Copy link
Contributor

rfk commented Apr 17, 2015

code that's cleaner and faster? nice! this looks good to me.

I believe that "order doesnt matter for correctness, but can hugely impact performance" is indeed a known result in this space. Junction ids are not in general ordered by flow position but it's probably a reasonable approximation, and there's no arguing with the empirical results :-)

@kripken kripken merged commit b75bb47 into emscripten-core:incoming Apr 17, 2015
@kripken
Copy link
Member

kripken commented Apr 17, 2015

Great! Merged. Looking forward to the followup stuff.

@aidanhs aidanhs deleted the aphs-optimize-optimizer branch April 17, 2015 18:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants