cmd/compile: random performance fluctuations after unrelated changes #8717
We see constant performance fluctuations after unrelated changes on the perf dashboard. For example:

http://build.golang.org/perfdetail?commit=96c713ab6c6f2a4b2a8a0bb2e8d674637b6ce596&commit0=fee5fcd5f87e75235d93fb297123feb15a59ae38&builder=linux-amd64-perf&benchmark=json
http://build.golang.org/perfdetail?commit=455042166f1366b147e1249b8d5639be7d67bfce&commit0=0a5fafdd2343b083457d0baf6487dfce0f01e25f&builder=windows-amd64-perf&benchmark=json
http://build.golang.org/perfdetail?commit=ad5d9f8f9be743e72f89d85d8bd6348807bdac90&commit0=fc588981a45afa430a2d2cd29d234403cb86e1bd&builder=windows-amd64-perf&benchmark=json

I can reproduce it locally as well. I took 2 consecutive commits:

changeset: 21142:91110f70916a
summary: runtime: allow crash from gsignal stack

changeset: 21141:0768bb1fd027
summary: net: fix inconsistent behavior across platforms in SetKeepAlivePeriod

and ran the go.benchmarks binary as:

./bench -bench=json -benchtime=3s -benchnum=100

alternating old and new binaries. The results are:

old:
GOPERF-METRIC:time=91998310
GOPERF-METRIC:time=91863644
GOPERF-METRIC:time=91491272
GOPERF-METRIC:time=91988322

new:
GOPERF-METRIC:time=93191495
GOPERF-METRIC:time=93222905
GOPERF-METRIC:time=93224972
GOPERF-METRIC:time=93140395

The difference is not that big (probably because my processor penalizes whatever is being penalized less), but it is clearly observable. It looks like a code alignment issue. Probably loops are not 16-byte aligned or something like that. I guess this also penalizes user binaries in the same random and unpredictable way.
If you want to try to figure out how this could be loop alignment, please go ahead. I spent days on this a few years ago and got absolutely nowhere. I can't find any evidence that loop alignment matters. It may be something else entirely, but in general modern CPUs are black boxes that can misbehave on a whim and - at least for people not privy to the inner workings - simply cannot be fully understood. They are subtle and quick to anger. If you want to try the loop alignment hypothesis, you could edit src/liblink/asm6.c. Look for LoopAlign (MaxLoopPad = 0 means alignment is turned off right now). I am removing the Release-Go1.5 tag. If someone wants to work on this, great, but I am not going to promise to waste any more time on this. Long ago I made peace with the fact that this kind of variation is something we just have to live with sometimes.
Labels changed: added release-none, removed release-go1.5.
Status changed to Accepted.
I can confirm your conclusion. We need to wait until Go becomes important enough that processor manufacturers allocate engineers for optimization. I've tried aligning back-branch targets, and then all branch targets, to 16 bytes (https://golang.org/cl/162890043) with no success. Aligning back-branch targets increased binary size by 5.1%; aligning all branch targets increased it by 8.3%. So if we do it, we need something smarter, e.g. align only within real loops. I've checked that in both binaries the stack segment address and the fs register have equal values, so we can strike those out. Since the code has moved, the data segment also has a different address. So maybe it's related to data. But I don't see any functions in the profile that heavily access global data...
Title changed from "cmd/gc: random performance fluctuations after unrelated changes" to "cmd/compile: random performance fluctuations after unrelated changes" on Jun 8, 2015.
Just debugged another case, which turned out to be this issue.
go version devel +b4538d7 Wed May 11 06:00:33 2016 +0000 linux/amd64
Then depending on presence of the following patch:
The test program:
With the call commented out I consistently see:
Without the call commented out:
All time is spent in computations:
drawPaletted magically becomes faster after the change.
The diff in CPU profiles still does not make any sense to me; it looks like the percentages are just randomly shuffled.
In the fast version the function is aligned on 0x10:
and in the slow version on 0x20:
If I set function alignment to 0x20 (which is broken due to asm functions, so it actually gives me 0x10 alignment for the function), it mostly fixes the problem:
From the second paper:
Microsoft Research had a tool that would link a program multiple times, where each binary used a different (randomized) function order, then they'd run tests and pick the best function order. Unfortunately, my Google-fu is failing me and I cannot find a reference. (The closest I can find is VC's /ORDER linker option, which looks like it could be used to implement this feature.)