cmd/compile: broken write barrier #71228

Closed
randall77 opened this issue Jan 11, 2025 · 5 comments
Labels: BugReport, compiler/runtime, Critical, release-blocker
Milestone: Go1.24

randall77 (Contributor) commented Jan 11, 2025

I've found a case where we can get a pointer write without a corresponding write barrier.
(This is distilled from a Google-internal failure.)

Reproducer:

package main

import "math/rand/v2"

type S struct {
	a, b string
	ptr  *T
}

type T struct {
	x [64]*byte
}

// f is a fancy way of doing dst.ptr = ptr, without a write barrier.
//go:noinline
func f(dst *S, ptr *T) {
	_ = *dst // early nil check

	var s S
	sp := &s
	g = nil      // simple write barrier
	sp.ptr = ptr // put target pointer in s
	*dst = *sp   // move write barrier
}

var g *byte

const W = 4

func main() {
	for i := 1; i < W; i++ {
		go worker(i)
	}
	worker(0)
}

const N = 100000

// Keep all the S's in the heap reachable.
var workbufs [W][N]*S

func worker(w int) {
	workbuf := &workbufs[w]
	for i := range N {
		workbuf[i] = new(S)
	}
	const A = 100
	var allocbuf [A]*T
	for i := 0; i < 10000000; i++ {
		s := workbuf[rand.IntN(N)]
		j := rand.IntN(A)
		ptr := allocbuf[j]
		allocbuf[j] = nil
		f(s, ptr)
		allocbuf[j] = new(T)
	}
}

Run with GOMAXPROCS=2. It fails roughly 5% of the time (if anyone has ideas for raising that percentage, please share them). When it fails, we get a zombie object report like this:

runtime: marked free object in span 0x10ab655c8, elemsize=512 freeindex=0 (bad use of unsafe.Pointer? try -d=checkptr)
0x14024644000 alloc unmarked
0x14024644200 free  unmarked
0x14024644400 alloc unmarked
0x14024644600 free  unmarked
0x14024644800 alloc unmarked
0x14024644a00 free  marked   zombie
0x0000014024644a00:  0x0000000000000000  0x0000000000000000 
0x0000014024644a10:  0x0000000000000000  0x0000000000000000 
0x0000014024644a20:  0x0000000000000000  0x0000000000000000 
0x0000014024644a30:  0x0000000000000000  0x0000000000000000 
0x0000014024644a40:  0x0000000000000000  0x0000000000000000 
0x0000014024644a50:  0x0000000000000000  0x0000000000000000 
0x0000014024644a60:  0x0000000000000000  0x0000000000000000 
0x0000014024644a70:  0x0000000000000000  0x0000000000000000 
0x0000014024644a80:  0x0000000000000000  0x0000000000000000 
0x0000014024644a90:  0x0000000000000000  0x0000000000000000 
0x0000014024644aa0:  0x0000000000000000  0x0000000000000000 
0x0000014024644ab0:  0x0000000000000000  0x0000000000000000 
0x0000014024644ac0:  0x0000000000000000  0x0000000000000000 
0x0000014024644ad0:  0x0000000000000000  0x0000000000000000 
0x0000014024644ae0:  0x0000000000000000  0x0000000000000000 
0x0000014024644af0:  0x0000000000000000  0x0000000000000000 
0x0000014024644b00:  0x0000000000000000  0x0000000000000000 
0x0000014024644b10:  0x0000000000000000  0x0000000000000000 
0x0000014024644b20:  0x0000000000000000  0x0000000000000000 
0x0000014024644b30:  0x0000000000000000  0x0000000000000000 
0x0000014024644b40:  0x0000000000000000  0x0000000000000000 
0x0000014024644b50:  0x0000000000000000  0x0000000000000000 
0x0000014024644b60:  0x0000000000000000  0x0000000000000000 
0x0000014024644b70:  0x0000000000000000  0x0000000000000000 
0x0000014024644b80:  0x0000000000000000  0x0000000000000000 
0x0000014024644b90:  0x0000000000000000  0x0000000000000000 
0x0000014024644ba0:  0x0000000000000000  0x0000000000000000 
0x0000014024644bb0:  0x0000000000000000  0x0000000000000000 
0x0000014024644bc0:  0x0000000000000000  0x0000000000000000 
0x0000014024644bd0:  0x0000000000000000  0x0000000000000000 
0x0000014024644be0:  0x0000000000000000  0x0000000000000000 
0x0000014024644bf0:  0x0000000000000000  0x0000000000000000 
0x14024644c00 alloc unmarked
0x14024644e00 alloc unmarked
0x14024645000 alloc unmarked
0x14024645200 alloc marked  
0x14024645400 alloc unmarked
0x14024645600 alloc marked  
0x14024645800 alloc unmarked
0x14024645a00 free  unmarked
0x14024645c00 free  unmarked
fatal error: found pointer to free object

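This happens because, in f, the write of ptr to dst.ptr is erroneously done without a write barrier. That hides a pointer to a white (not yet marked) object inside a black (already scanned) object, so the collector never marks the target and frees it while it is still reachable.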
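To illustrate why a single missing barrier is fatal, here is a toy model of tri-color marking. It is a sketch, not the Go runtime's collector: Go actually uses a hybrid of Yuasa-style deletion and Dijkstra-style insertion barriers, while this model uses only a Dijkstra-style insertion barrier (shade the new pointee on every pointer store). All names (`object`, `collector`, `shade`, `write`, `drain`, `run`) are hypothetical. Once `dst` has been scanned to black, storing `target` into it without shading leaves `target` white when marking ends, so a sweep would free it even though `dst` still points to it.

```go
package main

import "fmt"

// Colors in a toy tri-color marking model.
const (
	white = iota // not yet reached; freed at sweep if still white
	grey         // reached, but its pointers have not been scanned
	black        // reached and fully scanned; never rescanned
)

var colorNames = [...]string{"white", "grey", "black"}

// object is a toy heap object with outgoing pointer slots.
type object struct {
	color int
	ptrs  []*object
}

// collector holds the grey worklist and whether stores run a barrier.
type collector struct {
	worklist []*object
	barrier  bool
}

// shade turns a white object grey and queues it for scanning.
func (c *collector) shade(o *object) {
	if o != nil && o.color == white {
		o.color = grey
		c.worklist = append(c.worklist, o)
	}
}

// write performs dst.ptrs[i] = ptr. With the barrier enabled it first
// shades the new pointee (a Dijkstra-style insertion barrier).
func (c *collector) write(dst *object, i int, ptr *object) {
	if c.barrier {
		c.shade(ptr)
	}
	dst.ptrs[i] = ptr
}

// drain scans grey objects until the worklist is empty.
func (c *collector) drain() {
	for len(c.worklist) > 0 {
		o := c.worklist[len(c.worklist)-1]
		c.worklist = c.worklist[:len(c.worklist)-1]
		for _, p := range o.ptrs {
			c.shade(p)
		}
		o.color = black
	}
}

// run replays a bad interleaving and returns target's color once
// marking finishes, even though dst still points to target.
func run(barrier bool) int {
	target := &object{}
	holder := &object{ptrs: []*object{target}}
	dst := &object{ptrs: []*object{nil}}
	c := &collector{barrier: barrier}

	c.shade(dst)
	c.drain()       // dst is now black and will not be rescanned
	c.shade(holder) // holder is grey: reached, not yet scanned

	// Mutator: move the only pointer to target out of holder into dst.
	local := holder.ptrs[0]
	c.write(holder, 0, nil)
	c.write(dst, 0, local)

	c.drain() // finish marking
	return target.color
}

func main() {
	fmt.Println("with barrier:   ", colorNames[run(true)])
	fmt.Println("without barrier:", colorNames[run(false)])
}
```

With the barrier, the store into `dst` shades `target`, and it ends up black; without it, `target` stays white, which corresponds to the "free marked zombie" situation in the report above.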

This fails at tip, 1.23.4, and 1.22.6.
I suspect it may have started with CL 447780.

gopherbot added the compiler/runtime label Jan 11, 2025
randall77 added this to the Go1.24 milestone Jan 11, 2025
randall77 added the release-blocker and Critical labels Jan 11, 2025
randall77 (Contributor, Author)

@gopherbot please open backport issues.

gopherbot (Contributor)

Backport issue(s) opened: #71229 (for 1.22), #71230 (for 1.23).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://go.dev/wiki/MinorReleases.

randall77 (Contributor, Author) commented Jan 11, 2025

This does not reproduce on 1.21, so I suspect the cause is a combination of CL 447780 (in 1.21) and CL 521498 (in 1.22).

gabyhelp added the BugReport label Jan 11, 2025
gopherbot (Contributor)

Change https://go.dev/cl/642197 mentions this issue: cmd/compile: fix write barrier coalescing

3 participants