Description
| | |
| --- | --- |
| Bugzilla Link | 40303 |
| Version | trunk |
| OS | All |
| Reporter | LLVM Bugzilla Contributor |
| CC | @francisvm, @RKSimon, @arsenm |
Extended Description
Clang's -ftime-report seems to add quite a lot of overhead compared to compiling without the flag. Curiously, in the reported times, passes like "X86 Assembly Printer" take up a lot of time.
The overhead of turning -ftime-report on reaches about 60% and is over 40% for most of the files below, which seems like a lot. Some tests I did locally (the actual files don't matter; basically any non-trivial .cpp compilation will do):
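| File | Flags | Regular | With -ftime-report | Overhead |
| --- | --- | --- | --- | --- |
| catch.cpp | (default) | 1.337 | 1.950 | +46% |
| catch.cpp | -O2 | 4.616 | 6.586 | +43% |
| stl.cpp | (default) | 0.882 | 1.269 | +44% |
| unityformat.cpp | (default) | 7.195 | 7.312 | +2% |
| range-compr.cpp | (default) | 5.352 | 6.129 | +15% |
| shader.cpp | (default) | 6.000 | 9.555 | +59% |
| shader.cpp | -O2 | 12.635 | 20.061 | +59% |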
My guess is that this is because lib/CodeGen/AsmPrinter/AsmPrinter.cpp creates two timer regions (NamedRegionTimer) per instruction for every "Handler" it invokes: one around beginInstruction() and one around endInstruction().
EmitFunctionBody basically looks like this:
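```cpp
for (auto &MBB : *MF) {
  for (auto &MI : MBB) {
    // One timer region per handler before the instruction.
    for (const HandlerInfo &HI : Handlers) {
      NamedRegionTimer T(...);
      HI.Handler->beginInstruction(&MI);
    }
    // ... (the instruction itself is emitted here)
    // One timer region per handler after the instruction.
    for (const HandlerInfo &HI : Handlers) {
      NamedRegionTimer T(...);
      HI.Handler->endInstruction();
    }
  }
}
```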
Each timer region, in turn, samples the elapsed time, process times, and memory usage twice: once at the beginning and once at the end of the region.
I'm testing this with a clang trunk build (8.0.0) that is a couple of days old, but the issue seems to have been there for a while. I haven't tracked down how far back it goes.