-
Notifications
You must be signed in to change notification settings - Fork 17.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
math: Lgamma got slower from Go 1.6 to Go 1.7 #17121
Comments
/cc @cherrymui |
Confirmed that there's been no change since 1.7, although the slowdown from 1.6 is less pronounced for me: $ benchstat 16 17
name old time/op new time/op delta
Lgamma-8 21.1ns ± 5% 22.7ns ±11% +7.48% (p=0.000 n=46+50)
$ benchstat 17 tip
name old time/op new time/op delta
Lgamma-8 22.7ns ±11% 22.7ns ±10% ~ (p=0.684 n=50+50)
$ benchstat 16 tip
name old time/op new time/op delta
Lgamma-8 21.1ns ± 5% 22.7ns ±10% +7.18% (p=0.000 n=46+50) |
For record, I tried CL https://go-review.googlesource.com/c/31656/ to make loads and computations interleaving, but it doesn't help on speed. |
I have looked into the internals of the slow case and looks like there is a problem with register spill/fill caused by a call in the hot loop here: |
Hi @aregm, thanks very much for looking at this. I tried to reproduce this result and I can't. If I run
then I get:
That directory has the assembly equivalents of the Go 1.6, Go 1.7, and Go master (ZZZ) compiler output for the math.Lgamma function (lgamma16.s, lgamma17.s, lgammazzz.s). The idea was to be able to modify the assembly directly, to be able to experiment to test hypotheses about the slowdown without having to coerce the compiler into making the changes you want to try. In the Go 1.6 assembly, the spill of the temporary holding lgamma happens before math.Log, instead of above the switch, as you identified. That is, Go 1.6 does:
while current master does (Go 1.7 is similar):
It's true that autotmp_419 is being spilled unnecessarily in the case that the switch doesn't end up in a case that eventually calls Log. But the spll in master to autotmp_419 replaces the (possibly also unnecessary) spill to lgamma+0(FP) in Go 1.6, so there aren't extra spills compared to Go 1.6, at least not in the hot code. Even so, I tried changing the lgammazzz.s assembly to move the spill:
That cut BenchmarkLgammaZZZ from 14.6ns to 14.5ns, so it does help by 0.1ns but it doesn't seem to be the major contributor for the 1.2ns gap between Go 1.6 and master. |
There are two main causes for this degradation: first in bad case the register allocator is not keeping X2 register live enough and does redundant spills in the critical path, which is: In the good case, it keeps X2 alive throughout the critical path and spills it just before ret. |
Please answer these questions before submitting your issue. Thanks!
What version of Go are you using (
go version
)?go1.6.3, go version devel +8086e7c
What operating system and processor architecture are you using (
go env
)?What did you do?
Investigating slowdown of Lgamma benchmark.
1.6 vs 1.7 gives:
name old time/op new time/op delta
Lgamma-4 11.9ns ± 0% 14.2ns ± 0% +19.33% (p=0.000 n=29+29)
It might be compiler issue(code not changed). Looking into objdump of lgamma function for both versions I see different instruction order and registers used:
In 1.7:
In 1.6:
Tried force scheduler to generate instruction order like in 1.6
(for that expanded all expressions like
p1 := _lgamT[0] + w_(_lgamT[3]+w_(_lgamT[6]+w_(_lgamT[9]+w__lgamT[12])))
with tmp variables).But result was still bad:
name old time/op new time/op delta
Lgamma-4 11.9ns ± 0% 13.1ns ± 0% +10.08% (p=0.000 n=29+19)
Also tried to generate random registers - no much effect:
name old time/op new time/op delta
Lgamma-4 11.9ns ± 0% 13.5ns ± 0% +13.45% (p=0.000 n=29+10)
The text was updated successfully, but these errors were encountered: