cmd/link: emitting DWARF doubles time spent in linker #26318
Just plain "go list -json cmd/go", which essentially does the prep for a build, takes about 0.12r, so the upper bound on time spent in link here is about 0.45r without DWARF and 0.93r with DWARF. That is, emitting DWARF doubles the amount of time it takes for the linker to execute. We don't pay this cost for "go test" or "go run", because we turn DWARF off for those already.
The Mac shows a similar ~2X slowdown. I thought maybe #12259 was related but given that Linux clearly is unaffected by that issue, it appears to be a general DWARF problem.
I've put the proposal to turn off DWARF by default (#26074) on hold pending better understanding of where all the time is going (this issue). If we decide that DWARF really is a 2X cost, then we need to go back to #26074 and decide whether that's worth paying on every build when most people make no use of the DWARF info.
referenced this issue
Jul 10, 2018
As a point of comparison, I ran both a
perflock benchcmd -n 10 BuildCmdGoAll sh -c "rm -f /tmp/x1 && go build -o /tmp/x1 -a cmd/go" > dwarf perflock benchcmd -n 10 BuildCmdGoAll sh -c "rm -f /tmp/x1 && go build -o /tmp/x1 -a -gcflags -dwarf=false -ldflags -w cmd/go" > nodwarf perflock benchcmd -n 10 LinkCmdGo sh -c "rm -f /tmp/x1 && go build -o /tmp/x1 cmd/go" >> dwarf perflock benchcmd -n 10 LinkCmdGo sh -c "rm -f /tmp/x1 && go build -o /tmp/x1 -ldflags -w cmd/go" >> nodwarf benchsave nodwarf dwarf
From my linux/amd64 laptop:
This at least probably explains how we missed this. We were exceedingly careful about compiler performance and kept an eye on overall build performance, but weren't looking at the effects on just the link step. Perhaps compilebench should include a linker benchmark.
Looking at the profiles, about half of the additional link time comes from compression. An easy thing to try would be to reduce the compression level to favor speed. About a quarter of the time comes from
This is pretty promising. Using
And the impact on binary size is minimal.
Here's a comparison of building vs just linking jujud between Go 1.10 and tip (including BestSpeed DWARF compression):
So, we're winning on non-incremental builds, probably because of indexed export info, but the link step in isolation is 26% slower. Not great, though certainly better than 2X or the 40% reported in #26074 (comment).
Here is what I see today.
WIth DWARF the average is
So today the slowdown of using DWARF is around 30%, which is a lot better than 200%.
Looking at the CPU profile when linking cmd/go I see 4.76% of CPU time in
These numbers seem plausible to me and I think we've answered the question posed by this issue, so closing.