Description
Please answer these questions before submitting your issue. Thanks!
What version of Go are you using (go version)?
go1.6.3, go version devel +8086e7c
What operating system and processor architecture are you using (go env)?
GOARCH="amd64"
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build584510834=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"
What did you do?
Investigating slowdown of Lgamma benchmark.
1.6 vs 1.7 gives:
name      old time/op  new time/op  delta
Lgamma-4  11.9ns ± 0%  14.2ns ± 0%  +19.33%  (p=0.000 n=29+29)
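For anyone who wants to time the function outside the standard library's own benchmark, a standalone harness could look roughly like the sketch below. It is only an approximation of the measurement above: the argument 2.5 and the names lgammaOnce, benchLgamma and sink are assumptions made for illustration, and the figures in this report come from the math package benchmark, not from this program.

```go
package main

import (
	"fmt"
	"math"
	"testing"
)

// sink keeps the result live so the compiler cannot drop the call.
var sink float64

// lgammaOnce evaluates math.Lgamma at a fixed argument.
// The argument 2.5 is an assumption chosen for illustration.
func lgammaOnce() (float64, int) {
	return math.Lgamma(2.5)
}

// benchLgamma is a hypothetical benchmark body: it calls Lgamma b.N times.
func benchLgamma(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sink, _ = lgammaOnce()
	}
}

func main() {
	// testing.Benchmark picks b.N automatically and reports the elapsed time.
	r := testing.Benchmark(benchLgamma)
	nsPerOp := float64(r.T.Nanoseconds()) / float64(r.N)
	fmt.Printf("Lgamma\t%d iterations\t%.1f ns/op\n", r.N, nsPerOp)
}
```

Building this program with each Go version and comparing the printed ns/op gives a rough cross-check. For statistically meaningful deltas, repeated runs of the real benchmark are still the better source.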
This may be a compiler issue, since the code has not changed. Comparing the objdump output of the lgamma function for both versions, I see a different instruction order and different registers used:
In 1.7:
REPNE MOVSD_XMM 0x14a4b5(IP), X3
REPNE MOVSD_XMM 0x14a4bd(IP), X5
REPNE MOVSD_XMM 0x14a4c5(IP), X6
REPNE MOVSD_XMM 0x14a4cd(IP), X7
REPNE MOVSD_XMM 0x14a4d4(IP), X8
REPNE MOVSD_XMM 0x14a4db(IP), X9
REPNE MULSD X4, X9
REPNE ADDSD X9, X8
REPNE MULSD X4, X8
REPNE ADDSD X8, X7
REPNE MULSD X4, X7
REPNE ADDSD X7, X6
REPNE MULSD X4, X6
...
In 1.6:
REPNE MOVSD_XMM 0x1f94c0(IP), X0
REPNE MOVSD_XMM 0x1f94c0(IP), X1
REPNE MULSD X3, X1
REPNE ADDSD X1, X0
REPNE MULSD X3, X0
REPNE MOVSD_XMM 0x1f949c(IP), X1
REPNE ADDSD X1, X0
REPNE MULSD X3, X0
REPNE MOVSD_XMM 0x1f9484(IP), X1
REPNE ADDSD X1, X0
REPNE MULSD X3, X0
...
I tried forcing the scheduler to generate the same instruction order as in 1.6
(to do that, I expanded all expressions like p1 := _lgamT[0] + w*(_lgamT[3]+w*(_lgamT[6]+w*(_lgamT[9]+w*_lgamT[12])))
using temporary variables).
But the result was still bad:
name      old time/op  new time/op  delta
Lgamma-4  11.9ns ± 0%  13.1ns ± 0%  +10.08%  (p=0.000 n=29+19)
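To make the rewrite with temporaries concrete, here is a minimal sketch of the transformation on a small polynomial. The coefficients in coef are placeholders, not the real _lgamT values, and the names hornerNested and hornerTemps are hypothetical. The point is only that the expanded form performs the same multiply and add chain as the nested one, one operation per statement.

```go
package main

import "fmt"

// coef holds placeholder coefficients (NOT the real _lgamT table).
var coef = [5]float64{1, 2, 3, 4, 5}

// hornerNested evaluates the polynomial as a single nested expression,
// mirroring the shape of the p1 expression in lgamma.
func hornerNested(w float64) float64 {
	return coef[0] + w*(coef[1]+w*(coef[2]+w*(coef[3]+w*coef[4])))
}

// hornerTemps spells out the same chain with a temporary,
// one multiply or add per statement.
func hornerTemps(w float64) float64 {
	t := w * coef[4]
	t = coef[3] + t
	t = w * t
	t = coef[2] + t
	t = w * t
	t = coef[1] + t
	t = w * t
	t = coef[0] + t
	return t
}

func main() {
	// Every intermediate value is exactly representable for w = 0.5,
	// so both forms print the same number.
	fmt.Println(hornerNested(0.5), hornerTemps(0.5)) // prints "3.5625 3.5625"
}
```

Both forms are one serial dependency chain, since each step needs the previous result. Spelling out the temporaries does not by itself add any parallelism.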
I also tried making the register allocator pick registers at random; this had little effect:
name      old time/op  new time/op  delta
Lgamma-4  11.9ns ± 0%  13.5ns ± 0%  +13.45%  (p=0.000 n=29+10)