Make DEAD_FUNCTIONS work #3239

Merged
merged 2 commits into from Apr 6, 2015

Conversation

Projects
None yet
4 participants
@aidanhs
Contributor

aidanhs commented Mar 7, 2015

I've not yet looked into how to fix it, but I figure the failing test is of value in incoming anyway.

@aidanhs

This comment has been minimized.

Show comment
Hide comment
@aidanhs

aidanhs Mar 7, 2015

Contributor

I suppose IGNORED_FUNCTIONS should also have a test, since I can't get that to work either...

I feel I must be doing something wrong.

Contributor

aidanhs commented Mar 7, 2015

I suppose IGNORED_FUNCTIONS should also have a test, since I can't get that to work either...

I feel I must be doing something wrong.

@aidanhs

This comment has been minimized.

Show comment
Hide comment
@aidanhs

aidanhs Mar 7, 2015

Contributor

I've pushed a bit of a hacky way to make this work that I'm having to resort to. It's a really subideal implementation but I just want to get it working - it passes my test at least.

Note that my hacky patch is also incomplete, as it only fixes fastcomp. Ideally I'd have put it in compiler/jsifier but they don't seem to do any function parsing for fastcomp and the functions really need to be stripped as soon as possible.

Contributor

aidanhs commented Mar 7, 2015

I've pushed a bit of a hacky way to make this work that I'm having to resort to. It's a really subideal implementation but I just want to get it working - it passes my test at least.

Note that my hacky patch is also incomplete, as it only fixes fastcomp. Ideally I'd have put it in compiler/jsifier but they don't seem to do any function parsing for fastcomp and the functions really need to be stripped as soon as possible.

@kripken

This comment has been minimized.

Show comment
Hide comment
@kripken

kripken Mar 9, 2015

Owner

Looks like IGNORED and DEAD functions were not functional (no pun intended) in fastcomp, yeah.

Do we need them? IGNORED seems like it can't work in asm.js, given the modularity. What is your use case for DEAD?

Your fix looks fairly good. We could be more efficient by doing this in the backend I suppose, but this way is simpler. We should probably avoid any work though if there are no dead functions. But first, let's discuss the use case here, I'm unclear what we are solving.

Owner

kripken commented Mar 9, 2015

Looks like IGNORED and DEAD functions were not functional (no pun intended) in fastcomp, yeah.

Do we need them? IGNORED seems like it can't work in asm.js, given the modularity. What is your use case for DEAD?

Your fix looks fairly good. We could be more efficient by doing this in the backend I suppose, but this way is simpler. We should probably avoid any work though if there are no dead functions. But first, let's discuss the use case here, I'm unclear what we are solving.

@kripken

This comment has been minimized.

Show comment
Hide comment
@kripken

kripken Mar 9, 2015

Owner

Another option is for people to run llvm-extract, which can remove functions.

Owner

kripken commented Mar 9, 2015

Another option is for people to run llvm-extract, which can remove functions.

@aidanhs

This comment has been minimized.

Show comment
Hide comment
@aidanhs

aidanhs Mar 10, 2015

Contributor

I've got a project which compiles full Tcl and Python interpreters, joins them together with a C extension, adds the Python stdlib and runs an application on top of it. It works, which is pretty incredible, but it results in a 13MB js file.

So my plan was to try and cut down the size by manually excluding functions I know aren't used. There's one particular function (thanks find_bigfuncs.py!) which reduces the final file size of my 'max minimise' build (-Oz for bitcode, -O3 --js-opts 1 --llvm-opts 3 --llvm-lto 3 -s ASSERTIONS=0 for js) by ~100KB just for that single exclusion! So seems useful when compiling a library where you know some functions aren't used.

Unfortunately it hits diminishing returns pretty quickly for my use case. The root problem is needing to set EMULATE_FUNCTION_POINTER_CASTS=1 because of Python, which balloons the size from 7.1MB to 13MB. That 7.1MB contains an inline memory init array, all python *.py stdlib files (with a couple of tricks for size optimisation), the python interpreter and the tcl interpreter. So 6MB is a big jump! I was potentially going to look into making function pointer cast emulation more granular, to only apply it to particular libraries (or even individual functions) but didn't get much time to look.

Contributor

aidanhs commented Mar 10, 2015

I've got a project which compiles full Tcl and Python interpreters, joins them together with a C extension, adds the Python stdlib and runs an application on top of it. It works, which is pretty incredible, but it results in a 13MB js file.

So my plan was to try and cut down the size by manually excluding functions I know aren't used. There's one particular function (thanks find_bigfuncs.py!) which reduces the final file size of my 'max minimise' build (-Oz for bitcode, -O3 --js-opts 1 --llvm-opts 3 --llvm-lto 3 -s ASSERTIONS=0 for js) by ~100KB just for that single exclusion! So seems useful when compiling a library where you know some functions aren't used.

Unfortunately it hits diminishing returns pretty quickly for my use case. The root problem is needing to set EMULATE_FUNCTION_POINTER_CASTS=1 because of Python, which balloons the size from 7.1MB to 13MB. That 7.1MB contains an inline memory init array, all python *.py stdlib files (with a couple of tricks for size optimisation), the python interpreter and the tcl interpreter. So 6MB is a big jump! I was potentially going to look into making function pointer cast emulation more granular, to only apply it to particular libraries (or even individual functions) but didn't get much time to look.

@kripken

This comment has been minimized.

Show comment
Hide comment
@kripken

kripken Mar 11, 2015

Owner

Got it. Ok, how about if we deprecate IGNORED, and fix DEAD? For the fix, what you have is reasonable, but let's make sure the code only runs if there are DEAD functions, due to the compile time overhead.

Owner

kripken commented Mar 11, 2015

Got it. Ok, how about if we deprecate IGNORED, and fix DEAD? For the fix, what you have is reasonable, but let's make sure the code only runs if there are DEAD functions, due to the compile time overhead.

@juj juj added the tests label Mar 12, 2015

@waywardmonkeys

This comment has been minimized.

Show comment
Hide comment
@waywardmonkeys

waywardmonkeys Mar 13, 2015

Contributor

This test probably needs to move to test_other or set a -g2 flag so that minification doesn't happen which would rename _unused to something else ...

Contributor

waywardmonkeys commented Mar 13, 2015

This test probably needs to move to test_other or set a -g2 flag so that minification doesn't happen which would rename _unused to something else ...

@aidanhs

This comment has been minimized.

Show comment
Hide comment
@aidanhs

aidanhs Mar 13, 2015

Contributor

Could we check for the strings in O0 and just check it exits with an error in Oanythingelse?

Though that's a test_other case really. I'll have a think.

Contributor

aidanhs commented Mar 13, 2015

Could we check for the strings in O0 and just check it exits with an error in Oanythingelse?

Though that's a test_other case really. I'll have a think.

@kripken

This comment has been minimized.

Show comment
Hide comment
@kripken

kripken Mar 16, 2015

Owner

Your first line in the last comment sounds fine.

(Also don't forget about my last comment to make it not run any code in emscripten.py if there are no DEAD FUNCTIONS.)

Owner

kripken commented Mar 16, 2015

Your first line in the last comment sounds fine.

(Also don't forget about my last comment to make it not run any code in emscripten.py if there are no DEAD FUNCTIONS.)

@kripken

This comment has been minimized.

Show comment
Hide comment
@kripken

kripken Mar 16, 2015

Owner

More specifically, you can check for the strings in default, asm1, asm2g.

Owner

kripken commented Mar 16, 2015

More specifically, you can check for the strings in default, asm1, asm2g.

@aidanhs

This comment has been minimized.

Show comment
Hide comment
@aidanhs

aidanhs Apr 2, 2015

Contributor

I've added a condition to check whether to enter dead functions removal, added logging on doing so and fixed all the tests:

$ python tests/runner.py ALL.test_dead_functions
Running all test modes on test "test_dead_functions"
test_dead_functions (test_core.default) ... (checking sanity from test runner)
INFO     root: (Emscripten: Running sanity checks)
WARNING  root: java does not seem to exist, required for closure compiler, which is optional (define JAVA in ~/.emscripten if you want it)
ok
test_dead_functions (test_core.asm1) ... ok
test_dead_functions (test_core.asm2) ... ok
test_dead_functions (test_core.asm3) ... ok
test_dead_functions (test_core.asm2f) ... ok
test_dead_functions (test_core.asm2g) ... ok
test_dead_functions (test_core.asm1i) ... WARNING  root: enabling js opts for EMTERPRETIFY
WARNING  root: enabling --memory-init-file for EMTERPRETIFY
WARNING  root: enabling js opts for EMTERPRETIFY
WARNING  root: enabling --memory-init-file for EMTERPRETIFY
WARNING  root: enabling js opts for EMTERPRETIFY
WARNING  root: enabling --memory-init-file for EMTERPRETIFY
ok
test_dead_functions (test_core.asm3i) ... ok
test_dead_functions (test_core.asm2nn) ... ok

----------------------------------------------------------------------
Ran 9 tests in 40.232s

OK

However, I decided to check this generates valid asm.js (after being bitten by #3300 yesterday). Turns out it doesn't - "TypeError: asm.js type error: missing definition of function _unused" when compiling the test code with em++ --memory-init-file 0 -O3 -s DEAD_FUNCTIONS="['_unused']" fn.cpp.
Might be nice to have an asm.js validator run on any tests saying 'use asm'.

I've got an idea how to fix this, will give it a shot.

Contributor

aidanhs commented Apr 2, 2015

I've added a condition to check whether to enter dead functions removal, added logging on doing so and fixed all the tests:

$ python tests/runner.py ALL.test_dead_functions
Running all test modes on test "test_dead_functions"
test_dead_functions (test_core.default) ... (checking sanity from test runner)
INFO     root: (Emscripten: Running sanity checks)
WARNING  root: java does not seem to exist, required for closure compiler, which is optional (define JAVA in ~/.emscripten if you want it)
ok
test_dead_functions (test_core.asm1) ... ok
test_dead_functions (test_core.asm2) ... ok
test_dead_functions (test_core.asm3) ... ok
test_dead_functions (test_core.asm2f) ... ok
test_dead_functions (test_core.asm2g) ... ok
test_dead_functions (test_core.asm1i) ... WARNING  root: enabling js opts for EMTERPRETIFY
WARNING  root: enabling --memory-init-file for EMTERPRETIFY
WARNING  root: enabling js opts for EMTERPRETIFY
WARNING  root: enabling --memory-init-file for EMTERPRETIFY
WARNING  root: enabling js opts for EMTERPRETIFY
WARNING  root: enabling --memory-init-file for EMTERPRETIFY
ok
test_dead_functions (test_core.asm3i) ... ok
test_dead_functions (test_core.asm2nn) ... ok

----------------------------------------------------------------------
Ran 9 tests in 40.232s

OK

However, I decided to check this generates valid asm.js (after being bitten by #3300 yesterday). Turns out it doesn't - "TypeError: asm.js type error: missing definition of function _unused" when compiling the test code with em++ --memory-init-file 0 -O3 -s DEAD_FUNCTIONS="['_unused']" fn.cpp.
Might be nice to have an asm.js validator run on any tests saying 'use asm'.

I've got an idea how to fix this, will give it a shot.

@aidanhs

This comment has been minimized.

Show comment
Hide comment
@aidanhs

aidanhs Apr 2, 2015

Contributor

Ok, I've pushed a new version. It keeps the dead function in the body of the ASM_JS code, but strips out the body (leaving parameter coercions) and replaces it with a comment and an abort. It then adds return 0 if the actual version of the function had a return value.

The downside of all this is that the comment gets stripped in -O2 or above, even with -g. You can at least downgrade to -O1 and see what's going on, but I'd welcome any suggestions with better debuggability.

Contributor

aidanhs commented Apr 2, 2015

Ok, I've pushed a new version. It keeps the dead function in the body of the ASM_JS code, but strips out the body (leaving parameter coercions) and replaces it with a comment and an abort. It then adds return 0 if the actual version of the function had a return value.

The downside of all this is that the comment gets stripped in -O2 or above, even with -g. You can at least downgrade to -O1 and see what's going on, but I'd welcome any suggestions with better debuggability.

@kripken

This comment has been minimized.

Show comment
Hide comment
@kripken

kripken Apr 2, 2015

Owner

Hmm, this looks a little fragile - not sure it can handle getting the return type correctly? That can also break asm validation.

I think this might best be done as a tiny js optimizer pass. See for example asmLastOpts as a small pass. This could be even smaller - just normalizeAsm, then erase the current body, then denormalizeAsm. I think that will fill in all the proper arg coercions and even add a return for you. Then just add a few lines of code in emcc to run this pass. You need to pass in extraInfo with the list of dead functions.

Owner

kripken commented Apr 2, 2015

Hmm, this looks a little fragile - not sure it can handle getting the return type correctly? That can also break asm validation.

I think this might best be done as a tiny js optimizer pass. See for example asmLastOpts as a small pass. This could be even smaller - just normalizeAsm, then erase the current body, then denormalizeAsm. I think that will fill in all the proper arg coercions and even add a return for you. Then just add a few lines of code in emcc to run this pass. You need to pass in extraInfo with the list of dead functions.

@aidanhs

This comment has been minimized.

Show comment
Hide comment
@aidanhs

aidanhs Apr 2, 2015

Contributor

Ah, I made the mistake of assuming the only return types are v and i. Which wasn't very clever.

Nice, I've wanted to play a bit more in depth with emscripten, this will give me a chance. Might take a while to get round to it though.

Contributor

aidanhs commented Apr 2, 2015

Ah, I made the mistake of assuming the only return types are v and i. Which wasn't very clever.

Nice, I've wanted to play a bit more in depth with emscripten, this will give me a chance. Might take a while to get round to it though.

@aidanhs

This comment has been minimized.

Show comment
Hide comment
@aidanhs

aidanhs Apr 3, 2015

Contributor

PR updated. After spending a few hours poring over the optimizer code and coming up with a native pass, I was informed that only a JS pass was necessary. So I converted to that js about 10 mins, and now both exist :P

Anyway, I would say that the C++ code requires careful review given it's been a while since I've written any (try 'ever' for c++11)...but there's not really much there. It seems to work anyway...

Contributor

aidanhs commented Apr 3, 2015

PR updated. After spending a few hours poring over the optimizer code and coming up with a native pass, I was informed that only a JS pass was necessary. So I converted to that js about 10 mins, and now both exist :P

Anyway, I would say that the C++ code requires careful review given it's been a while since I've written any (try 'ever' for c++11)...but there's not really much there. It seems to work anyway...

@@ -1491,6 +1491,13 @@ try:
js_optimizer_queue = []
js_optimizer_extra_info = {}

This comment has been minimized.

@kripken

kripken Apr 3, 2015

Owner

you should enable js_opts, see https://github.com/kripken/emscripten/blob/master/emcc#L1067

we need to know if any js opt is going to be run later, because it affects what is emitted by the backend.

@kripken

kripken Apr 3, 2015

Owner

you should enable js_opts, see https://github.com/kripken/emscripten/blob/master/emcc#L1067

we need to know if any js opt is going to be run later, because it affects what is emitted by the backend.

emcc
+ if shared.Settings.DEAD_FUNCTIONS:
+ js_optimizer_queue += ['eliminateDeadFuncs']
+ js_optimizer_extra_info['dead_functions'] = shared.Settings.DEAD_FUNCTIONS
+ flush_js_optimizer_queue()

This comment has been minimized.

@kripken

kripken Apr 3, 2015

Owner

no need to flush the queue here, when js_opts are on it should be flushed later anyhow

@kripken

kripken Apr 3, 2015

Owner

no need to flush the queue here, when js_opts are on it should be flushed later anyhow

tools/js-optimizer.js
+ assert(extraInfo && extraInfo.dead_functions);
+ var dead_functions = extraInfo.dead_functions;
+ traverseGeneratedFunctions(ast, function (fun, type) {
+ for (var i = 0; i < dead_functions.length; i++) {

This comment has been minimized.

@kripken

kripken Apr 3, 2015

Owner

instead of linear traversal use a set(), and check func[1] in thatSet. see other examples in the file

@kripken

kripken Apr 3, 2015

Owner

instead of linear traversal use a set(), and check func[1] in thatSet. see other examples in the file

This comment has been minimized.

@kripken

kripken Apr 3, 2015

Owner

it's common to receive extraInfo stuff and setify it

@kripken

kripken Apr 3, 2015

Owner

it's common to receive extraInfo stuff and setify it

tools/optimizer/optimizer.cpp
+ AsmData asmData(fun);
+ fun[3]->setSize(1);
+ fun[3][0] = make1(STAT, make2(CALL, makeName(ABORT), makeArray()));
+ while (asmData.vars.size() > 0) {

This comment has been minimized.

@kripken

kripken Apr 3, 2015

Owner

you can do asmData.vars.clear()

@kripken

kripken Apr 3, 2015

Owner

you can do asmData.vars.clear()

This comment has been minimized.

@aidanhs

aidanhs Apr 4, 2015

Contributor

I guess we don't care about leaving the values in .locals because the structure gets thrown away anyway?

@aidanhs

aidanhs Apr 4, 2015

Contributor

I guess we don't care about leaving the values in .locals because the structure gets thrown away anyway?

tools/optimizer/optimizer.cpp
+ assert(extraInfo->has(DEAD_FUNCTIONS));
+ Ref dead_functions = extraInfo[DEAD_FUNCTIONS];
+ traverseFunctions(ast, [&](Ref fun) {
+ for (size_t i = 0; i < dead_functions->size(); i++) {

This comment has been minimized.

@kripken

kripken Apr 3, 2015

Owner

instead of linear search, there is a StringSet

@kripken

kripken Apr 3, 2015

Owner

instead of linear search, there is a StringSet

@aidanhs

This comment has been minimized.

Show comment
Hide comment
@aidanhs

aidanhs Apr 4, 2015

Contributor

Made review changes.

Contributor

aidanhs commented Apr 4, 2015

Made review changes.

@aidanhs aidanhs changed the title from Add failing test for DEAD_FUNCTIONS to Make DEAD_FUNCTIONS work Apr 5, 2015

@aidanhs

This comment has been minimized.

Show comment
Hide comment
@aidanhs

aidanhs Apr 5, 2015

Contributor

On the subject of deprecation, can we not just remove IGNORED_FUNCTIONS since it doesn't work at the moment in fastcomp, and fastcomp mode is now the only mode that works?

Contributor

aidanhs commented Apr 5, 2015

On the subject of deprecation, can we not just remove IGNORED_FUNCTIONS since it doesn't work at the moment in fastcomp, and fastcomp mode is now the only mode that works?

@kripken

This comment has been minimized.

Show comment
Hide comment
@kripken

kripken Apr 6, 2015

Owner

Yeah, let's remove IGNORED.

Owner

kripken commented Apr 6, 2015

Yeah, let's remove IGNORED.

tests/test_core.py
+ test('abort(-1) at Error')
+ test('abort(-1) at Error', args=['x'], no_build=True)
+
+ Settings.DEAD_FUNCTIONS = []

This comment has been minimized.

@kripken

kripken Apr 6, 2015

Owner

no need for this, we wipe it before each run

@kripken

kripken Apr 6, 2015

Owner

no need for this, we wipe it before each run

@kripken

This comment has been minimized.

Show comment
Hide comment
@kripken

kripken Apr 6, 2015

Owner

Looks great. Please just do that last test tweak, and squash the commits. Can remove IGNORED here or in a later pull, either is fine (if here, then as a second commit).

Owner

kripken commented Apr 6, 2015

Looks great. Please just do that last test tweak, and squash the commits. Can remove IGNORED here or in a later pull, either is fine (if here, then as a second commit).

@aidanhs

This comment has been minimized.

Show comment
Hide comment
@aidanhs

aidanhs Apr 6, 2015

Contributor

Done.

Contributor

aidanhs commented Apr 6, 2015

Done.

kripken added a commit that referenced this pull request Apr 6, 2015

@kripken kripken merged commit 53aac03 into kripken:incoming Apr 6, 2015

@kripken

This comment has been minimized.

Show comment
Hide comment
@kripken

kripken Apr 6, 2015

Owner

Thanks!

Testing locally I noticed the test fails on spidermonkey, because the exceptions syntax is different. Pushed a tiny fix in f5aa623

Owner

kripken commented Apr 6, 2015

Thanks!

Testing locally I noticed the test fails on spidermonkey, because the exceptions syntax is different. Pushed a tiny fix in f5aa623

@aidanhs aidanhs deleted the aidanhs:aphs-test-dead-functions branch Apr 6, 2015

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment