Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: Swapping elements of a [2]any uses 2 separate writebarriers #62126

Closed
dans-stuff opened this issue Aug 18, 2023 · 6 comments
Closed
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. help wanted NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Milestone

Comments

@dans-stuff
Copy link

dans-stuff commented Aug 18, 2023

What version of Go are you using (go version)?

$ go version
go version go1.20.1 darwin/amd64

Does this issue reproduce with the latest release?

Uncertain

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GOARCH="amd64"
GOOS="darwin"

What did you do?

I was profiling some code and found that swapping 2 elements of a slice was slower than expected. I found that swapping elements if they were pointer types instead of interfaces reached the performance I expected.

This link shows the naive method of swapping elements, and the unsafe method which produces the asm and performance I expected: https://go.dev/play/p/NQZn03hsQU4

Here are the benchmarks:

BenchmarkSwapIntPtrs-8          842213125                1.422 ns/op
BenchmarkSwapAny-8              749056459                1.575 ns/op
BenchmarkSwapAnyAsUintptrs-8    812162583                1.471 ns/op

My assumption is that the compiler should get the performance of the 3rd benchmark in which I use unsafe to force my expected asm to be generated, but is represented here as the 2nd benchmark.

What did you expect to see?

For the code

arr[0], arr[1] = arr[1], arr[0]
where arr is an array of the empty interface, I expected to see just one write barrier check, just as I do for the (paraphrased) asm for swapping a normal pointer type.

MOVQ the interfaces into registers
MOVQ both interface *type* parts into their new positions in the array
CMPL runtime.writeBarrier
MOVQ both interface *value* parts into their new positions in the array

What did you see instead?

I unexpectedly found two writebarrier checks, where the asm was writing the values of each interface with separate writebarrier checks, like this (paraphrased) asm.

MOVQ the interfaces into registers
MOVQ both interface *type* parts into their new positions in the slice
CMPL runtime.writeBarrier
MOVQ one interface *value* into its new position in the slice
CMPL runtime.writeBarrier
MOVQ the other interface *value* into its new position in the slice

By using unsafe to treat the [2]any as a [4]uintptr and doing 2 sets of swaps, I was able to observe the asm doing just 1 writebarrier check. I currently believe that one check has the same semantics as two checks (in regards to concurrent gc sweep correctness), and this might be incorrect. I'm almost certain however it has the same semantics in regards to concurrent observability promises.

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Aug 18, 2023
@dans-stuff
Copy link
Author

It appears my question is much simpler: why do consecutive assignments to interfaces require multiple writebarrier checks, and why is this not required for consecutive assignments to pointer types? I'm not aware of the internals of gc, but this asymmetry strikes me as an unnecessary cost.

@randall77
Copy link
Contributor

The difference you're seeing is that when writing two pointers in a row, we can merge those two pointer writes under a single barrier check. When you're writing interfaces, or strings, the writes are pointer-scalar-pointer-scalar, so we need two write barrier checks because the intervening scalar write prevents merging the barrier checks.

Note that for 1.21 the generated write barrier code has changed significantly. I think the behavior you observed is still present, but may have different performance characteristics. The new barriers also may make this easier to fix because the new barriers can be batched more easily. In particular, it should be easier to just plain ignore intervening scalar writes.

@dans-stuff
Copy link
Author

That pointer-scalar split is what I was observing - though oddly, it reorders the scalars to be first, then the pointers to be next, and still splits them. I will try 1.21 to see if it is fixed.

Very helpful response though, thank you. It may be that in practice, the write barrier is negligible enough that it doesn't matter to perform multiple times.

@randall77
Copy link
Contributor

it reorders the scalars to be first

Yes, that's so the register holding the scalar is used first, so if it dies at that point it doesn't have to be saved around the write barrier call.
(Although that's an old justification, I'm not sure it matters in the last few versions of our barrier.)

@randall77
Copy link
Contributor

Here's a simple example for anyone looking at this:

type S struct {
	a *int
	b int
	c *int
}

func g(p *S) {
	p.a = nil
	p.b = 0
	p.c = nil
}

It would be nice if that compiled to a single barrier check + a gcWriteBarrier2 call, instead of two checks and two gcWriteBarrier1 calls.

@randall77 randall77 added this to the Unplanned milestone Aug 18, 2023
@dmitshur dmitshur added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Aug 18, 2023
@gopherbot
Copy link
Contributor

Change https://go.dev/cl/521498 mentions this issue: cmd/compile: allow non-pointer writes in the middle of a write barrier

cellularmitosis pushed a commit to cellularmitosis/go that referenced this issue Aug 24, 2023
This lets us combine more write barriers, getting rid of some of the
test+branch and gcWriteBarrier* calls.
With the new write barriers, it's easy to add a few non-pointer writes
to the set of values written.

We allow up to 2 non-pointer writes between pointer writes. This is enough
for, for example, adjacent slice fields.

Fixes golang#62126

Change-Id: I872d0fa9cc4eb855e270ffc0223b39fde1723c4b
Reviewed-on: https://go-review.googlesource.com/c/go/+/521498
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. help wanted NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Projects
None yet
Development

No branches or pull requests

4 participants