Large link time regression compared to fastcomp #14200
Comments
---
It's possible this is an issue with the new LLVM pass manager. Does it happen with the old pass manager as well?
---
Ah, looks like this is with LTO, from the stack trace. But running on the testcase, I see no difference with the new pass manager; it's 9 minutes either way for me, so that's not the issue.

Overall I think 2x slower LTO than fastcomp is about normal - the LLVM wasm backend does a lot more work than fastcomp did in LTO. But the new backend also has wasm object files for much faster linking than fastcomp - have you tried that?

I don't know enough about LLVM internals to know whether that stack trace's result, ~65.4% of the total link time in `hasAddressTaken` and `hasChangeableCC`, is expected.
---
Thanks for the super fast turnaround! The intent of doing LTO for release builds is to optimize code performance, so it is unfortunate if LTO builds are much slower. For iteration builds, we build non-LTO.

Regressing release build times would be unfortunate, but something that is even a bit more awkward is that this is blowing up our CI testing times, since we do need to test the release configuration there as well. Dropping LTO from release testing would be generally undesirable.

Btw, how does LLVM handle linking in a mixed form where some of the inputs are bitcode and some of the inputs are wasm object files? Are there any traps there? (Generally I don't think we need to mix, but in this test case I now notice that we only partially enabled LTO.)
---
Firstly, is this a regression from a previous version of wasm-ld? Or is it just a regression from fastcomp -> upstream in general? I'm guessing it's the latter, and that wasm-ld itself didn't regress?

In answer to your question, wasm-ld will do LTO on any inputs built with `-flto` (i.e. bitcode files), and it's fine to mix LTO and non-LTO object files and libraries. The basic steps are: the linker separates bitcode inputs from regular wasm object files, runs the LLVM optimizer and backend over the bitcode at link time to produce native wasm object code, and then links that output together with the regular object files.
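The mixed-input flow described above can be sketched as a toy model (the names here are invented for illustration, not the real lld/wasm-ld API): separate bitcode from native wasm objects, LTO-compile the bitcode, then link everything as ordinary objects.

```cpp
#include <string>
#include <vector>

// Toy model of the link step described above (assumed names, not the
// real lld/wasm-ld API). Bitcode inputs go through LTO codegen at link
// time; native wasm objects pass straight through.
enum class InputKind { Bitcode, WasmObject };
struct Input { std::string name; InputKind kind; };

// Stand-in for "run the LLVM optimizer + backend over the merged
// bitcode module", producing one native object for the LTO partition.
static std::vector<std::string> runLTOCodegen(const std::vector<Input>& bitcode) {
  std::vector<std::string> out;
  if (!bitcode.empty())
    out.push_back("lto-merged.o"); // placeholder name for the LTO result
  return out;
}

// Mixed LTO / non-LTO link: only some inputs being bitcode is fine.
static std::vector<std::string> linkInputs(const std::vector<Input>& inputs) {
  std::vector<Input> bitcode;
  std::vector<std::string> objects;
  for (const Input& in : inputs) {
    if (in.kind == InputKind::Bitcode)
      bitcode.push_back(in);
    else
      objects.push_back(in.name);
  }
  for (const std::string& obj : runLTOCodegen(bitcode))
    objects.push_back(obj);
  return objects; // everything the final wasm link actually consumes
}
```

This is only the shape of the pipeline; the real linker also handles symbol resolution between the bitcode and object halves before codegen runs.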
---
As Alon said, we do expect LTO link speed to regress somewhat with the fastcomp transition. The expectation is basically: non-LTO links should be much faster than fastcomp, since the inputs are already wasm object files, while LTO links will be slower, since the full LLVM optimization and codegen pipeline now runs at link time.
Also, the slowdown of LTO builds is not a linker regression but a compiler regression. The LTO slowdown is a slowdown in the LLVM compiler, not the linker itself. Perhaps this is stating the obvious, but I didn't want wasm-ld to be blamed here :) LTO builds are known to be very slow in the native world (the Chrome LTO build takes many hours and tens of gigs of memory, for example; it's so big it can't be done on a normal developer workstation and will probably bring down the whole machine!). There are some methods we can employ for speeding up LTO builds:

1. Skip LTO for iteration builds and only use it for release builds.
2. Apply `-flto` to only the subset of the code that benefits from it (mixing LTO and non-LTO inputs works).
3. Lower the LTO optimization level at link time.
4. Use ThinLTO (`-flto=thin`), which optimizes modules mostly independently and in parallel.
Depending on what you want to achieve, one might choose different options. I believe (4), ThinLTO, is the path with the most bang for the buck, although sadly it needs more testing in the wasm world.
---
A note on using LTO in general: from the reports I've seen, it's not always a win in the WebAssembly world. Since it allows a lot of cross-module optimizations, and in particular inlining, it is often a code size regression over non-LTO. IIRC we have had reports of success, but also reports of little or no runtime gains combined with code size regressions. So make sure you measure to see what kind of benefit you are getting from using LTO at all.
---
I did some profiling myself, and I can confirm what @juj saw, with a very large amount of time spent in just two methods (`llvm::Function::hasAddressTaken` and `hasChangeableCC`).
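For readers unfamiliar with the internals being profiled here: `isa<T>()` is LLVM's hand-rolled RTTI, essentially an integer kind-tag comparison, so any single check is cheap. The cost comes from `hasAddressTaken` scanning every user of a function each time it is called. A heavily simplified sketch of that shape (these types are invented for illustration, not LLVM's real class hierarchy):

```cpp
#include <vector>

// Heavily simplified sketch of LLVM-style RTTI: every Value carries a
// kind tag, and isa<T>() boils down to comparing that tag. These types
// are invented for illustration, not LLVM's real hierarchy.
struct Value {
  enum Kind { DirectCall, AddressUse } kind;
  explicit Value(Kind k) : kind(k) {}
};

struct Function {
  std::vector<const Value*> users; // every reference to this function

  // Shape of hasAddressTaken(): walk all users and look for anything
  // that is not a plain direct call. Each tag check is trivial, but
  // the walk is O(#users), and in a large LTO module it can be invoked
  // for many functions, which is how a cheap check ends up dominating
  // the profile.
  bool hasAddressTaken() const {
    for (const Value* u : users)
      if (u->kind != Value::DirectCall) // stand-in for the isa<> checks
        return true;
    return false;
  }
};
```

This is why the hotspot shows up inside the inlined `isa<>` check: the comparison itself is trivial, but it sits in the innermost loop of a linear scan repeated across the whole module.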
---
Thanks for the details!
A regression from fastcomp.
I left the build running on my computer in a debug build, with a breakpoint set on the positive branch side of the hot `isa<BlockAddress>()` check. Looks like we'll have to re-evaluate whether LTO will be giving any benefits, and do some more detailed testing for numbers.
Just to confirm: any such optimizations would be normal compiler optimizations, right? Nothing linker-specific here? I.e. it would speed up normal compilation of a single source file too? It just happens that the LTO case is more noticeable because it's not as parallelizable? For actually making LTO more parallel, we should probably also look into the recommended ThinLTO and ensure it works well.
---
Another way of putting it: switching to the LLVM backend regressed compile times, but because of parallel build systems, that was maybe not as noticeable as the LTO link-time regression? In fact, I would expect the regression in compile time to be way worse, because the upstream backend actually compiles the code and doesn't stop at bitcode by default.
Correct, yes. Based on your last comment, I wonder if you are hitting a case where LTO on LLVM might not help. That is, maybe the extra inlining LTO enables is not paying off here.
---
Interesting idea! LTO all the things, for faster LTO'ing of things.
---
Sadly, the LTO build does not speed things up for me.
---
So it looks like this is just the cost of LLVM LTO? @juj, are you OK with disabling LTO when you want a quick build?
After updating our compiler from 1.38.11 to 2.0.19, we are seeing a very large regression in `wasm-ld` link times in a large project, with link times taking up to 15 minutes(!), or about twice as slow as fastcomp.

Taking a look at this regression in the Visual Studio 2019 performance profiler, there are two functions that consume ~65.4% of the total link time: `llvm::Function::hasAddressTaken` and `hasChangeableCC`.

Looking inside these two functions, they both have a hotspot in the same location: an inlined call to a seemingly `dynamic_cast<>`/`reinterpret_cast<>`-like `isa<BlockAddress>()` check.

To reproduce this issue, throw me a mail or a message on Discord; the test case is unfortunately a bit larger than what can be directly attached here. Unzip the files to the Emscripten root directory, and then run