Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile,runtime: checkptr race condition #38872

Open
cuonglm opened this issue May 5, 2020 · 13 comments
Open

cmd/compile,runtime: checkptr race condition #38872

cuonglm opened this issue May 5, 2020 · 13 comments
Milestone

Comments

@cuonglm
Copy link
Contributor

@cuonglm cuonglm commented May 5, 2020

What version of Go are you using (go version)?

$ go version
go version devel +9b189686a5 Tue May 5 05:13:26 2020 +0000 darwin/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/cuonglm/Library/Caches/go-build"
GOENV="/Users/cuonglm/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/cuonglm/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/cuonglm/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/Users/cuonglm/sources/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/Users/cuonglm/sources/go/pkg/tool/darwin_amd64"
GCCGO="gccgo"
GOAMD64="alignedjumps"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/y4/hs76ltbn7sb66lw_6934kq4m0000gn/T/go-build903704744=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

$ cat main.go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"unsafe"
)

type onceSpy struct {
	done uint64
	m    sync.Mutex
}

// ResetSyncOnce will reset sync.Once to default state.
func ResetSyncOnce(once *sync.Once) {
	spy := (*onceSpy)(unsafe.Pointer(once))
	atomic.StoreUint64(&spy.done, 0)
}
func main() {
	o := sync.Once{}
	o.Do(func() {
		fmt.Println("hello 1")
	})
	ResetSyncOnce(&o)
	o.Do(func() {
		fmt.Println("hello 2")
	})
}

$ go build -gcflags=-d=checkptr main.go
$ for _ in {1..10}; do ./main; done

What did you expect to see?

Consistent result between 10 times run.

What did you see instead?

The result is inconsistent, sometimes passes, sometimes checkptr report errors.

$ for _ in {1..10}; do ./main; done 
hello 1
hello 2
hello 1
hello 2
hello 1
hello 2
hello 1
hello 2
hello 1
fatal error: checkptr: converted pointer straddles multiple allocations

goroutine 1 [running]:
runtime.throw(0x10d1368, 0x3a)
	/Users/cuonglm/sources/go/src/runtime/panic.go:1116 +0x72 fp=0xc000098f28 sp=0xc000098ef8 pc=0x1030b32
runtime.checkptrAlignment(0xc0000b4004, 0x10b8620, 0x1)
	/Users/cuonglm/sources/go/src/runtime/checkptr.go:20 +0xc9 fp=0xc000098f58 sp=0xc000098f28 pc=0x1005e69
main.ResetSyncOnce(...)
	/Users/cuonglm/main.go:17
main.main()
	/Users/cuonglm/main.go:25 +0x65 fp=0xc000098f88 sp=0xc000098f58 pc=0x10a5625
runtime.main()
	/Users/cuonglm/sources/go/src/runtime/proc.go:204 +0x223 fp=0xc000098fe0 sp=0xc000098f88 pc=0x10332c3
runtime.goexit()
	/Users/cuonglm/sources/go/src/runtime/asm_amd64.s:1374 +0x1 fp=0xc000098fe8 sp=0xc000098fe0 pc=0x1061241
hello 1
fatal error: checkptr: converted pointer straddles multiple allocations

goroutine 1 [running]:
runtime.throw(0x10d1368, 0x3a)
	/Users/cuonglm/sources/go/src/runtime/panic.go:1116 +0x72 fp=0xc000098f28 sp=0xc000098ef8 pc=0x1030b32
runtime.checkptrAlignment(0xc0000b4004, 0x10b8620, 0x1)
	/Users/cuonglm/sources/go/src/runtime/checkptr.go:20 +0xc9 fp=0xc000098f58 sp=0xc000098f28 pc=0x1005e69
main.ResetSyncOnce(...)
	/Users/cuonglm/main.go:17
main.main()
	/Users/cuonglm/main.go:25 +0x65 fp=0xc000098f88 sp=0xc000098f58 pc=0x10a5625
runtime.main()
	/Users/cuonglm/sources/go/src/runtime/proc.go:204 +0x223 fp=0xc000098fe0 sp=0xc000098f88 pc=0x10332c3
runtime.goexit()
	/Users/cuonglm/sources/go/src/runtime/asm_amd64.s:1374 +0x1 fp=0xc000098fe8 sp=0xc000098fe0 pc=0x1061241
hello 1
hello 2
hello 1
hello 2
hello 1
fatal error: checkptr: converted pointer straddles multiple allocations

goroutine 1 [running]:
runtime.throw(0x10d1368, 0x3a)
	/Users/cuonglm/sources/go/src/runtime/panic.go:1116 +0x72 fp=0xc000098f28 sp=0xc000098ef8 pc=0x1030b32
runtime.checkptrAlignment(0xc0000b4004, 0x10b8620, 0x1)
	/Users/cuonglm/sources/go/src/runtime/checkptr.go:20 +0xc9 fp=0xc000098f58 sp=0xc000098f28 pc=0x1005e69
main.ResetSyncOnce(...)
	/Users/cuonglm/main.go:17
main.main()
	/Users/cuonglm/main.go:25 +0x65 fp=0xc000098f88 sp=0xc000098f58 pc=0x10a5625
runtime.main()
	/Users/cuonglm/sources/go/src/runtime/proc.go:204 +0x223 fp=0xc000098fe0 sp=0xc000098f88 pc=0x10332c3
runtime.goexit()
	/Users/cuonglm/sources/go/src/runtime/asm_amd64.s:1374 +0x1 fp=0xc000098fe8 sp=0xc000098fe0 pc=0x1061241
hello 1
hello 2

cc @mdempsky

@cuonglm
Copy link
Contributor Author

@cuonglm cuonglm commented May 5, 2020

The problem does not occur if I removed m sync.Mutex from struct.

@agnivade agnivade added this to the Go1.15 milestone May 5, 2020
@mdempsky
Copy link
Member

@mdempsky mdempsky commented May 5, 2020

sync.Once.done's type is uint32, not uint64, so I think this is a legit checkptr report. Not sure off hand why it wouldn't consistently report this though.

I'd guess it has something to do with either scheduling of the concurrent GC, or with placement of o in the heap.

@cuonglm
Copy link
Contributor Author

@cuonglm cuonglm commented May 5, 2020

sync.Once.done's type is uint32, not uint64, so I think this is a legit checkptr report. Not sure off hand why it wouldn't consistently report this though.

I'd guess it has something to do with either scheduling of the concurrent GC, or with placement of o in the heap.

If m sync.Mutex was removed, then checkptr does not report, sounds like another issue.

@mdempsky
Copy link
Member

@mdempsky mdempsky commented May 5, 2020

Why? It's not an error to convert from *BigType to *SmallType. If you remove m sync.Mutex, then unsafe.Sizeof(onceSpy{}) <= unsafe.Sizeof(sync.Once{}), right?

@cuonglm
Copy link
Contributor Author

@cuonglm cuonglm commented May 5, 2020

Why? It's not an error to convert from *BigType to *SmallType. If you remove m sync.Mutex, then unsafe.Sizeof(onceSpy{}) <= unsafe.Sizeof(sync.Once{}), right?

Ah right 👍

@bcmills
Copy link
Member

@bcmills bcmills commented May 5, 2020

Hmm, but should the race detector emit an error when a 32-bit atomic op occurs concurrently with a 64-bit atomic op?

My recollection is that on some 32-bit architectures, the 32-bit atomic ops use an entirely separate code path from the 64-bit ones, so I can't imagine that the memory model allows it.

@bcmills
Copy link
Member

@bcmills bcmills commented May 5, 2020

(Oh, right: the memory model doesn't define atomic at all: that's #5045.)

@bcmills
Copy link
Member

@bcmills bcmills commented May 5, 2020

Filed that separately as #38881.

@cuonglm
Copy link
Contributor Author

@cuonglm cuonglm commented May 15, 2020

@mdempsky

I'd guess it has something to do with either scheduling of the concurrent GC, or with placement of o in the heap.

Seems concurrent GC problem, because if I set GOGC=off, then checkptr reports it consistently.

@mdempsky
Copy link
Member

@mdempsky mdempsky commented May 15, 2020

Hmm, but should the race detector emit an error when a 32-bit atomic op occurs concurrently with a 64-bit atomic op?

This is worth discussing in #38881, but I think it's a red herring here. The 32-bit and 64-bit atomic ops that modify o are all happening on the same goroutine anyway, so there's no race.

Seems concurrent GC problem, because if I set GOGC=off, then checkptr reports it consistently.

Interesting. I wonder if there's a race condition in the calls to findObject in checkptrAlignment. Maybe it's not actually safe to call checkptrBase(add(p, size-1))?

@mdempsky
Copy link
Member

@mdempsky mdempsky commented May 15, 2020

Hm, I'm not able to reproduce this on linux/amd64. I don't get any checkptr reports, even with GOGC=off.

@cuonglm
Copy link
Contributor Author

@cuonglm cuonglm commented May 16, 2020

Hm, I'm not able to reproduce this on linux/amd64. I don't get any checkptr reports, even with GOGC=off.

Confirm that this does not happen on linux/amd64, but does on darwin/amd64, sounds like a false negative on linux/amd64?

@mdempsky
Copy link
Member

@mdempsky mdempsky commented May 16, 2020

Yeah, not reporting on linux/amd64 is a false negative. (With the usual caveats that checkptr is best-effort, so some false negatives will be inevitable; though I can't immediately see why the behavior here would be OS-specific.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
4 participants
You can’t perform that action at this time.