Does this issue reproduce with the latest release?
What operating system and processor architecture are you using (go env)?
go env Output
$ go env
What did you do?
I noticed some regressions in performance for a few benchmarks since the scheduling change was made in CL 270940.
What did you expect to see?
Same or better performance.
What did you see instead?
10-20% degradation for some crypto benchmarks. So far I've mostly looked in the crypto package, but I'm still looking.
This example is from crypto/internal/bigmod when comparing the commit before the CL was merged against the latest. I verified the degradation happened with CL 270940, and ran against latest to make sure it hadn't been fixed yet.
In the latest code, r3 is incremented at the top of the loop and the result is put into r8, which is then moved back to r3 at the bottom of the loop, even though r3 is never clobbered in the loop. In the previous code, r3 was incremented at the bottom of the loop and stayed in r3 throughout. I also see that the shift of res is in a different place, although I think the degradation is due to using another register and moving the value back.
I've seen problems like this before, where the loop increment is scheduled before all the uses of the pre-incremented value have been scheduled, so it needs to put the incremented value into a different register and move it back at the end of the loop.
In fact, it is worse on x86, because its assembly is 2-arg (the destination register is also one of the sources).