What version of Go are you using (go version)?
$ go version
go version go1.20.1 darwin/amd64
Does this issue reproduce with the latest release?
Uncertain
What operating system and processor architecture are you using (go env)?
go env Output
$ go env
GOARCH="amd64"
GOOS="darwin"
What did you do?
I was profiling some code and found that swapping 2 elements of a slice was slower than expected. I found that swapping elements if they were pointer types instead of interfaces reached the performance I expected.
This link shows the naive method of swapping elements, and the unsafe method which produces the asm and performance I expected: https://go.dev/play/p/NQZn03hsQU4
Here are the benchmarks:
BenchmarkSwapIntPtrs-8 842213125 1.422 ns/op
BenchmarkSwapAny-8 749056459 1.575 ns/op
BenchmarkSwapAnyAsUintptrs-8 812162583 1.471 ns/op
My assumption is that the compiler should get the performance of the 3rd benchmark in which I use unsafe to force my expected asm to be generated, but is represented here as the 2nd benchmark.
What did you expect to see?
For the code
arr[0], arr[1] = arr[1], arr[0]
where arr is an array of the empty interface, I expected to see just one write barrier check, just as I do for the (paraphrased) asm for swapping a normal pointer type.
MOVQ the interfaces into registers
MOVQ both interface *type* parts into their new positions in the array
CMPL runtime.writeBarrier
MOVQ both interface *value* parts into their new positions in the array
What did you see instead?
I unexpectedly found two writebarrier checks, where the asm was writing the values of each interface with separate writebarrier checks, like this (paraphrased) asm.
MOVQ the interfaces into registers
MOVQ both interface *type* parts into their new positions in the slice
CMPL runtime.writeBarrier
MOVQ one interface *value* into its new position in the slice
CMPL runtime.writeBarrier
MOVQ the other interface *value* into its new position in the slice
By using unsafe to treat the [2]any as a [4]uintptr and doing 2 sets of swaps, I was able to observe the asm doing just 1 writebarrier check. I currently believe that one check has the same semantics as two checks (in regards to concurrent gc sweep correctness), and this might be incorrect. I'm almost certain however it has the same semantics in regards to concurrent observability promises.
What version of Go are you using (
go version)?Does this issue reproduce with the latest release?
Uncertain
What operating system and processor architecture are you using (
go env)?go envOutputWhat did you do?
I was profiling some code and found that swapping 2 elements of a slice was slower than expected. I found that swapping elements if they were pointer types instead of interfaces reached the performance I expected.
This link shows the naive method of swapping elements, and the unsafe method which produces the asm and performance I expected: https://go.dev/play/p/NQZn03hsQU4
Here are the benchmarks:
My assumption is that the compiler should get the performance of the 3rd benchmark in which I use unsafe to force my expected asm to be generated, but is represented here as the 2nd benchmark.
What did you expect to see?
For the code
where arr is an array of the empty interface, I expected to see just one write barrier check, just as I do for the (paraphrased) asm for swapping a normal pointer type.What did you see instead?
I unexpectedly found two writebarrier checks, where the asm was writing the values of each interface with separate writebarrier checks, like this (paraphrased) asm.
By using unsafe to treat the [2]any as a [4]uintptr and doing 2 sets of swaps, I was able to observe the asm doing just 1 writebarrier check. I currently believe that one check has the same semantics as two checks (in regards to concurrent gc sweep correctness), and this might be incorrect. I'm almost certain however it has the same semantics in regards to concurrent observability promises.