Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: SIGSEGV in race + coverage mode #60825

Closed
prattmic opened this issue Jun 15, 2023 · 17 comments
Closed

runtime: SIGSEGV in race + coverage mode #60825

prattmic opened this issue Jun 15, 2023 · 17 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. okay-after-rc1 Used by release team to mark a release-blocker issue as okay to resolve either before or after rc1 release-blocker
Milestone

Comments

@prattmic
Copy link
Member

This failure occurred on a linux-amd64-longtest builder preparing for 1.21 RC1. I don't see any others like it in greplogs.

:: Running /tmp/workdir/go/src/all.bash with args ["/tmp/workdir/go/src/all.bash"] and env ["PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin" "HOSTNAME=5f8883b9b15f" "DEBIAN_FRONTEND=noninteractive" "HOME=/root" "USER=root" "GO_STAGE0_NET_DELAY=6s" "GO_STAGE0_DL_DELAY=100ms" "WORKDIR=/tmp/workdir" "GOROOT_BOOTSTRAP=/tmp/workdir/go1.4" "GO_BUILDER_NAME=linux-arm64-longtest" "GO_TEST_SHORT=0" "GO_TEST_TIMEOUT_SCALE=5" "GOPATH=/tmp/workdir/gopath" "TMPDIR=/tmp/workdir/tmp" "GOCACHE=/tmp/workdir/gocache" "GOPLSCACHE=/tmp/workdir/goplscache" "PWD=/tmp/workdir/go/src"] in dir /tmp/workdir/go/src
...
Building Go cmd/dist using /tmp/workdir/go1.4. (go1.17.13 linux/arm64)
Building Go toolchain1 using /tmp/workdir/go1.4.
Building Go bootstrap cmd/go (go_bootstrap) using Go toolchain1.
Building Go toolchain2 using go_bootstrap and Go toolchain1.
Building Go toolchain3 using go_bootstrap and Go toolchain2.
Building packages and commands for linux/arm64.

##### Test execution environment.
# GOARCH: arm64
# CPU: 
# GOOS: linux
# OS Version: Linux 5.15.109+ #1 SMP Fri Jun 9 10:57:31 UTC 2023 aarch64

##### Testing packages.
...
ok  	cmd/distpack	0.019s
ok  	cmd/doc	0.171s
ok  	cmd/fix	1.302s
vcs-test.golang.org rerouted to http://127.0.0.1:39197
https://vcs-test.golang.org rerouted to https://127.0.0.1:45633
go test proxy running at GOPROXY=http://127.0.0.1:45535/mod
2023/06/15 16:47:25 http: TLS handshake error from 127.0.0.1:50530: read tcp 127.0.0.1:45633->127.0.0.1:50530: read: connection reset by peer
2023/06/15 16:48:19 http: TLS handshake error from 127.0.0.1:40128: EOF
2023/06/15 16:48:19 http: TLS handshake error from 127.0.0.1:40124: EOF
2023/06/15 16:48:19 http: TLS handshake error from 127.0.0.1:40166: EOF
2023/06/15 16:48:19 http: TLS handshake error from 127.0.0.1:40152: EOF
2023/06/15 16:48:20 http: TLS handshake error from 127.0.0.1:40208: read tcp 127.0.0.1:45633->127.0.0.1:40208: read: connection reset by peer
2023/06/15 16:48:20 http: TLS handshake error from 127.0.0.1:40180: read tcp 127.0.0.1:45633->127.0.0.1:40180: read: connection reset by peer
2023/06/15 16:48:20 http: TLS handshake error from 127.0.0.1:40196: read tcp 127.0.0.1:45633->127.0.0.1:40196: read: connection reset by peer
2023/06/15 16:48:22 http: TLS handshake error from 127.0.0.1:40770: read tcp 127.0.0.1:45633->127.0.0.1:40770: read: connection reset by peer
2023/06/15 16:48:22 http: TLS handshake error from 127.0.0.1:40762: EOF
2023/06/15 16:48:22 http: TLS handshake error from 127.0.0.1:40746: read tcp 127.0.0.1:45633->127.0.0.1:40746: read: connection reset by peer
--- FAIL: TestScript (0.14s)
    --- FAIL: TestScript/cover_build_simple (1.11s)
        script_test.go:132: 2023-06-15T16:50:25Z
        script_test.go:134: $WORK=/tmp/workdir/tmp/cmd-go-test-3737275864/tmpdir3229299089/cover_build_simple301973454
        script_test.go:156: 
            # This test checks basic "go build -cover" functionality. (0.000s)
            # Hard-wire new coverage for this test. (0.000s)
            # Build for coverage. (0.669s)
            # First execute without GOCOVERDIR set... (0.046s)
            # ... then with GOCOVERDIR set. (0.119s)
            # Program makes a direct call to os.Exit(0). (0.055s)
            # Program makes a direct call to os.Exit(1). (0.067s)
            # Program invokes panic. (0.092s)
            # Skip remainder if no race detector support. (0.050s)
            > [!race] skip
            [condition not met]
            > env GOCOVERDIR=data2/normal
            > exec ./examplewithrace.exe normal
            [stderr]
            hi
            unexpected fault address 0x119f000
            fatal error: fault
            [signal SIGSEGV: segmentation violation code=0x1 addr=0x119f000 pc=0x9f330]
            
            goroutine 1 [running]:
            runtime.throw({0x120f56?, 0xc0000b86c0?})
            	/tmp/workdir/go/src/runtime/panic.go:1077 +0x40 fp=0xc0000c17b0 sp=0xc0000c1780 pc=0x6f240
            runtime.sigpanic()
            	/tmp/workdir/go/src/runtime/signal_unix.go:875 +0x22c fp=0xc0000c1810 sp=0xc0000c17b0 pc=0x85c6c
            racecallatomic()
            	/tmp/workdir/go/src/runtime/race_arm64.s:351 +0x10 fp=0xc0000c1830 sp=0xc0000c1820 pc=0x9f330
            sync/atomic.LoadInt32()
            	/tmp/workdir/go/src/runtime/race_arm64.s:207 +0x18 fp=0xc0000c1840 sp=0xc0000c1830 pc=0x9f598
            sync/atomic.LoadUint32(0x10?)
            	<autogenerated>:1 +0x14 fp=0xc0000c1860 sp=0xc0000c1840 pc=0xa0944
            sync/atomic.(*Uint32).Load(0x119effc)
            	/tmp/workdir/go/src/sync/atomic/type.go:121 +0x30 fp=0xc0000c1880 sp=0xc0000c1860 pc=0x3d730
            runtime/coverage.(*emitState).VisitFuncs.func1({0x119efec, 0x5, 0x1?}, {0xc0000b8678, 0x5?, 0x2})
            	/tmp/workdir/go/src/runtime/coverage/emit.go:474 +0xb0 fp=0xc0000c18f0 sp=0xc0000c1880 pc=0xf41d0
            runtime/coverage.(*emitState).VisitFuncs(0xc0000aa0a0, 0xc0000de030)
            	/tmp/workdir/go/src/runtime/coverage/emit.go:549 +0x5ac fp=0xc0000c1a50 sp=0xc0000c18f0 pc=0xf38bc
            internal/coverage/encodecounter.(*CoverageDataWriter).writeCounters(0xc00009e050, {0x146508, 0xc0000aa0a0}, 0xc0000b2040)
            	/tmp/workdir/go/src/internal/coverage/encodecounter/encode.go:281 +0x1d4 fp=0xc0000c1aa0 sp=0xc0000c1a50 pc=0xec784
            internal/coverage/encodecounter.(*CoverageDataWriter).AppendSegment(0xc00009e050, 0x1c9815deafc5380e?, {0x146508, 0xc0000aa0a0})
            	/tmp/workdir/go/src/internal/coverage/encodecounter/encode.go:181 +0x768 fp=0xc0000c1c40 sp=0xc0000c1aa0 pc=0xec0a8
            internal/coverage/encodecounter.(*CoverageDataWriter).Write(0x1f0d02?, {0xf1, 0x10, 0x5d, 0x90, 0x1, 0x2a, 0xd9, 0xbf, 0xe, ...}, ...)
            	/tmp/workdir/go/src/internal/coverage/encodecounter/encode.go:69 +0x78 fp=0xc0000c1ca0 sp=0xc0000c1c40 pc=0xea6f8
            runtime/coverage.(*emitState).emitCounterDataFile(0xc0000aa0a0?, {0xf1, 0x10, 0x5d, 0x90, 0x1, 0x2a, 0xd9, 0xbf, 0xe, ...}, ...)
            	/tmp/workdir/go/src/runtime/coverage/emit.go:584 +0x7c fp=0xc0000c1d10 sp=0xc0000c1ca0 pc=0xf3cdc
            runtime/coverage.emitCounterDataToDirectory({0xc0000180e3, 0xc})
            	/tmp/workdir/go/src/runtime/coverage/emit.go:324 +0x25c fp=0xc0000c1e60 sp=0xc0000c1d10 pc=0xf1f8c
            runtime/coverage.emitCounterData()
            	/tmp/workdir/go/src/runtime/coverage/emit.go:285 +0xa4 fp=0xc0000c1ed0 sp=0xc0000c1e60 pc=0xf1c64
            runtime.runExitHooks.func1(0xc0000c1f28?)
            	/tmp/workdir/go/src/runtime/exithook.go:53 +0x58 fp=0xc0000c1f00 sp=0xc0000c1ed0 pc=0x99058
            runtime.runExitHooks(0x0)
            	/tmp/workdir/go/src/runtime/exithook.go:63 +0xa8 fp=0xc0000c1f30 sp=0xc0000c1f00 pc=0x45f88
            runtime.main()
            	/tmp/workdir/go/src/runtime/proc.go:269 +0x2c4 fp=0xc0000c1fd0 sp=0xc0000c1f30 pc=0x71b34
            runtime.goexit()
            	/tmp/workdir/go/src/runtime/asm_arm64.s:1197 +0x4 fp=0xc0000c1fd0 sp=0xc0000c1fd0 pc=0x9e664
            
            goroutine 2 [runnable]:
            runtime.forcegchelper()
            	/tmp/workdir/go/src/runtime/proc.go:313 fp=0xc0000567d0 sp=0xc0000567d0 pc=0x71d40
            runtime.goexit()
            	/tmp/workdir/go/src/runtime/asm_arm64.s:1197 +0x4 fp=0xc0000567d0 sp=0xc0000567d0 pc=0x9e664
            created by runtime.init.6 in goroutine 1
            	/tmp/workdir/go/src/runtime/proc.go:310 +0x24
            
            goroutine 18 [GC sweep wait]:
            runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
            	/tmp/workdir/go/src/runtime/proc.go:398 +0xc8 fp=0xc000067f60 sp=0xc000067f40 pc=0x71f68
            runtime.goparkunlock(...)
            	/tmp/workdir/go/src/runtime/proc.go:404
            runtime.bgsweep(0x0?)
            	/tmp/workdir/go/src/runtime/mgcsweep.go:280 +0xa0 fp=0xc000067fb0 sp=0xc000067f60 pc=0x5e950
            runtime.gcenable.func1()
            	/tmp/workdir/go/src/runtime/mgc.go:200 +0x28 fp=0xc000067fd0 sp=0xc000067fb0 pc=0x536c8
            runtime.goexit()
            	/tmp/workdir/go/src/runtime/asm_arm64.s:1197 +0x4 fp=0xc000067fd0 sp=0xc000067fd0 pc=0x9e664
            created by runtime.gcenable in goroutine 1
            	/tmp/workdir/go/src/runtime/mgc.go:200 +0x6c
            
            goroutine 19 [GC scavenge wait]:
            runtime.gopark(0xc000096000?, 0x144fd0?, 0x1?, 0x0?, 0xc00008a4e0?)
            	/tmp/workdir/go/src/runtime/proc.go:398 +0xc8 fp=0xc000066f50 sp=0xc000066f30 pc=0x71f68
            runtime.goparkunlock(...)
            	/tmp/workdir/go/src/runtime/proc.go:404
            runtime.(*scavengerState).park(0x1f0820)
            	/tmp/workdir/go/src/runtime/mgcscavenge.go:425 +0x5c fp=0xc000066f80 sp=0xc000066f50 pc=0x5c1dc
            runtime.bgscavenge(0x0?)
            	/tmp/workdir/go/src/runtime/mgcscavenge.go:653 +0x44 fp=0xc000066fb0 sp=0xc000066f80 pc=0x5c6f4
            runtime.gcenable.func2()
            	/tmp/workdir/go/src/runtime/mgc.go:201 +0x28 fp=0xc000066fd0 sp=0xc000066fb0 pc=0x53668
            runtime.goexit()
            	/tmp/workdir/go/src/runtime/asm_arm64.s:1197 +0x4 fp=0xc000066fd0 sp=0xc000066fd0 pc=0x9e664
            created by runtime.gcenable in goroutine 1
            	/tmp/workdir/go/src/runtime/mgc.go:201 +0xac
            
            goroutine 20 [runnable]:
            runtime.runfinq()
            	/tmp/workdir/go/src/runtime/mfinal.go:176 fp=0xc0000527d0 sp=0xc0000527d0 pc=0x52690
            runtime.goexit()
            	/tmp/workdir/go/src/runtime/asm_arm64.s:1197 +0x4 fp=0xc0000527d0 sp=0xc0000527d0 pc=0x9e664
            created by runtime.createfing in goroutine 1
            	/tmp/workdir/go/src/runtime/mfinal.go:163 +0x80
        script_test.go:156: FAIL: testdata/script/cover_build_simple.txt:50: exec ./examplewithrace.exe normal: exit status 2
FAIL
FAIL	cmd/go	244.157s

cc @thanm @golang/runtime

@prattmic prattmic added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jun 15, 2023
@prattmic prattmic added this to the Backlog milestone Jun 15, 2023
@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jun 15, 2023
@thanm
Copy link
Contributor

thanm commented Jun 15, 2023

Thanks, I will take a look.

You wrote:

This failure occurred on a linux-amd64-longtest

however the test debris looks like it is arm64 (I will assume this is an arm64 only failure).

@prattmic
Copy link
Member Author

Yes, it was arm64, apologies.

It is also worth noting that this wasn't a flake. It failed three times in a row. I'm running a new build again now, which I will be surprised if it doesn't fail again.

@thanm
Copy link
Contributor

thanm commented Jun 15, 2023

So far unable to reproduce on a regular linux-arm64 gomote (after a couple thousand runs). Weird.
Is there something special about the linux-arm64-longtest setup that is different from linux-arm64?
I will try a longtest gomote to see if that helps.

@thanm
Copy link
Contributor

thanm commented Jun 15, 2023

Never mind, I can see the failure now. Something about the script test setup seems to be a factor.

@prattmic
Copy link
Member Author

prattmic commented Jun 15, 2023

It seems to require GOEXPERIMENT=coverageredesign.

With these files, this is ok:

$ go build -o examplewithrace.exe -race -cover example/main
$ GOCOVERDIR=/tmp ./examplewithrace.exe normal
hi

But this fails:

$ GOEXPERIMENT=coverageredesign go build -o examplewithrace.exe -race -cover example/main
$ GOCOVERDIR=/tmp ./examplewithrace.exe normal                                                                                                                                    
hi
unexpected fault address 0x119f000
fatal error: fault
[signal SIGSEGV: segmentation violation code=0x1 addr=0x119f000 pc=0x9f330]

goroutine 1 [running]:
runtime.throw({0x120f56?, 0xc0000a86f0?})
        /tmp/workdir/go/src/runtime/panic.go:1077 +0x40 fp=0xc0000bd7b0 sp=0xc0000bd780 pc=0x6f240
runtime.sigpanic()
        /tmp/workdir/go/src/runtime/signal_unix.go:875 +0x22c fp=0xc0000bd810 sp=0xc0000bd7b0 pc=0x85c6c
racecallatomic()
        /tmp/workdir/go/src/runtime/race_arm64.s:351 +0x10 fp=0xc0000bd830 sp=0xc0000bd820 pc=0x9f330
sync/atomic.LoadInt32()
        /tmp/workdir/go/src/runtime/race_arm64.s:207 +0x18 fp=0xc0000bd840 sp=0xc0000bd830 pc=0x9f598
sync/atomic.LoadUint32(0x10?)
        <autogenerated>:1 +0x14 fp=0xc0000bd860 sp=0xc0000bd840 pc=0xa0944
sync/atomic.(*Uint32).Load(0x119effc)
        /tmp/workdir/go/src/sync/atomic/type.go:121 +0x30 fp=0xc0000bd880 sp=0xc0000bd860 pc=0x3d730
runtime/coverage.(*emitState).VisitFuncs.func1({0x119efec, 0x5, 0x1?}, {0xc0000a86a8, 0x5?, 0x2})
        /tmp/workdir/go/src/runtime/coverage/emit.go:474 +0xb0 fp=0xc0000bd8f0 sp=0xc0000bd880 pc=0xf41d0
runtime/coverage.(*emitState).VisitFuncs(0xc0000a20a0, 0xc0000d6030)
        /tmp/workdir/go/src/runtime/coverage/emit.go:549 +0x5ac fp=0xc0000bda50 sp=0xc0000bd8f0 pc=0xf38bc
internal/coverage/encodecounter.(*CoverageDataWriter).writeCounters(0xc0000da000, {0x146508, 0xc0000a20a0}, 0xc0000ae040)
        /tmp/workdir/go/src/internal/coverage/encodecounter/encode.go:281 +0x1d4 fp=0xc0000bdaa0 sp=0xc0000bda50 pc=0xec784
internal/coverage/encodecounter.(*CoverageDataWriter).AppendSegment(0xc0000da000, 0x8dfb09e0340d5f2a?, {0x146508, 0xc0000a20a0})
        /tmp/workdir/go/src/internal/coverage/encodecounter/encode.go:181 +0x768 fp=0xc0000bdc40 sp=0xc0000bdaa0 pc=0xec0a8
internal/coverage/encodecounter.(*CoverageDataWriter).Write(0x1f0d02?, {0x53, 0x94, 0x8b, 0x96, 0xd0, 0x72, 0x24, 0x3c, 0x2a, ...}, ...)
        /tmp/workdir/go/src/internal/coverage/encodecounter/encode.go:69 +0x78 fp=0xc0000bdca0 sp=0xc0000bdc40 pc=0xea6f8
runtime/coverage.(*emitState).emitCounterDataFile(0xc0000a20a0?, {0x53, 0x94, 0x8b, 0x96, 0xd0, 0x72, 0x24, 0x3c, 0x2a, ...}, ...)
        /tmp/workdir/go/src/runtime/coverage/emit.go:584 +0x7c fp=0xc0000bdd10 sp=0xc0000bdca0 pc=0xf3cdc
runtime/coverage.emitCounterDataToDirectory({0xc00001001c, 0x4})
        /tmp/workdir/go/src/runtime/coverage/emit.go:324 +0x25c fp=0xc0000bde60 sp=0xc0000bdd10 pc=0xf1f8c 
runtime/coverage.emitCounterData()
        /tmp/workdir/go/src/runtime/coverage/emit.go:285 +0xa4 fp=0xc0000bded0 sp=0xc0000bde60 pc=0xf1c64
runtime.runExitHooks.func1(0xc0000bdf28?)
        /tmp/workdir/go/src/runtime/exithook.go:53 +0x58 fp=0xc0000bdf00 sp=0xc0000bded0 pc=0x99058
runtime.runExitHooks(0x0)
        /tmp/workdir/go/src/runtime/exithook.go:63 +0xa8 fp=0xc0000bdf30 sp=0xc0000bdf00 pc=0x45f88
runtime.main()
        /tmp/workdir/go/src/runtime/proc.go:269 +0x2c4 fp=0xc0000bdfd0 sp=0xc0000bdf30 pc=0x71b34
runtime.goexit()
        /tmp/workdir/go/src/runtime/asm_arm64.s:1197 +0x4 fp=0xc0000bdfd0 sp=0xc0000bdfd0 pc=0x9e664

I'm not sure why this isn't failing on the linux-arm64-longtest post-submit builder. Does that not use race mode?

@thanm
Copy link
Contributor

thanm commented Jun 15, 2023

I'm not sure why this isn't failing on the linux-arm64-longtest post-submit builder. Does that not use race mode.

This is also a puzzle for me.

@thanm
Copy link
Contributor

thanm commented Jun 15, 2023

I've debugged this for a bit and it looks as though the faulting load is happening at the very end of the BSS section; I think this is some sort of off-by-one error.

If I take the test case in question and add in another dummy BSS variable, e.g.

var QQQ [1000]int

the failure mode goes away. So somehow I think in this specific case the coverage counter slab in question winds up as the very last thing in BSS, and we wind up indexing off the end by 1 (not sure how, but that is what it looks like).

@prattmic
Copy link
Member Author

Regarding GOEXPERIMENT=coverageredesign:

Without GOEXPERIMENT set, .bss ends at 0x119efc0.
With GOEXPERIMENT set, .bss ends at 0x119f000.

Though coverageredesign is enabled by default, when passed explicitly it appears in build info:

$  go version -m examplewithrace.explicit.exe 
examplewithrace.explicit.exe: devel gomote.XXXXX
        path    example/main
        mod     example (devel)
        build   -buildmode=exe
        build   -compiler=gc
        build   -race=true
        build   CGO_ENABLED=1
        build   CGO_CFLAGS=
        build   CGO_CPPFLAGS=
        build   CGO_CXXFLAGS=
        build   CGO_LDFLAGS=
        build   GOARCH=arm64
        build   GOEXPERIMENT=coverageredesign
        build   GOOS=linux

This doesn't happen when it is enabled implicitly.

$ go version -m examplewithrace.implicit.exe 
examplewithrace.implict.exe: devel gomote.XXXXX
        path    example/main
        mod     example (devel)
        build   -buildmode=exe
        build   -compiler=gc
        build   -race=true
        build   CGO_ENABLED=1
        build   CGO_CFLAGS=
        build   CGO_CPPFLAGS=
        build   CGO_CXXFLAGS=
        build   CGO_LDFLAGS=
        build   GOARCH=arm64
        build   GOOS=linux

I believe the buildinfo string is in .data prior to .bss, so making the string longer would push .bss back a little bit. In other words, this is a coincidence.

Marking as a release blocker. With the current evidence it isn't clear that this is necessary related to either coverage or race modes, that could just be the unlucky combination.

@prattmic prattmic modified the milestones: Backlog, Go1.21 Jun 15, 2023
@thanm
Copy link
Contributor

thanm commented Jun 15, 2023

One other weird thing that I can't explain here:

when I run the faulting program with GOCOVERDEBUG=true, I get the following trace output

=+= contents of covmetalist:
=+= slot: 0 path: example/sub 
=+= slot: 1 path: main 
=+= remap table:
...
=+= 3: pk=2 visit live fcn {i=3 F0 NC1}
=+= 7: pk=1 visit live fcn {i=7 F0 NC5}
unexpected fault address 0x119f000

This is kind of obscure but "=+= 3: pk=2 visit live fcn {i=3 F0 NC1}" is showing that we don't hit a "live" function in the counter segment until slot 3. This should not be happening given the nature of the test program-- it doesn't have any dead functions, so the first element in the counter segment should be non-zero. This also seems to point to some sort of off-by-N error.

I will spend a little time in the debugger to see if I can learn more.

@prattmic prattmic added the okay-after-rc1 Used by release team to mark a release-blocker issue as okay to resolve either before or after rc1 label Jun 15, 2023
@dr2chase
Copy link
Contributor

In the spirit of the snake-bit dog and fighting the previous war, I'd go looking for failure to save a register somewhere.

@cherrymui
Copy link
Member

From the stack trace

sync/atomic.(*Uint32).Load(0x119effc)

which loads from 0x119effc, which is a valid address, last 4 bytes in BSS.

unexpected fault address 0x119f000

But we fault at 4 bytes after it.

Where does the 4 come from? Well, this code https://cs.opensource.google/go/go/+/master:src/runtime/race_arm64.s;l=351 checks for address validity and faults on invalid address. It (blindly) does an 8-byte load (!), which makes it go over the segment...

I'll send a CL.

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/503937 mentions this issue: runtime: use 1-byte load for address checking in racecallatomic

@thanm
Copy link
Contributor

thanm commented Jun 15, 2023

Thanks Cherry! Good detective work, I don't think I would have spotted that.

@prattmic
Copy link
Member Author

@gopherbot Please open backports for 1.19 and 1.20. This bug can cause crashes on seemingly arbitrary atomic operations in race mode.

@gopherbot
Copy link
Contributor

Backport issue(s) opened: #60844 (for 1.19), #60845 (for 1.20).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://go.dev/wiki/MinorReleases.

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/503976 mentions this issue: [release-branch.go1.20] runtime: use 1-byte load for address checking in racecallatomic

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/503977 mentions this issue: [release-branch.go1.19] runtime: use 1-byte load for address checking in racecallatomic

@dmitshur dmitshur added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Jun 19, 2023
gopherbot pushed a commit that referenced this issue Jun 22, 2023
… in racecallatomic

In racecallatomic, we do a load before calling into TSAN, so if
the address is invalid we fault on the Go stack. We currently use
a 8-byte load instruction, regardless of the data size that the
atomic operation is performed on. So if, say, we are doing a
LoadUint32 at an address that is the last 4 bytes of a memory
mapping, we may fault unexpectedly. Do a 1-byte load instead.
(Ideally we should do a load with the right size, so we fault
correctly if we're given an unaligned address for a wide load
across a page boundary. Leave that for another CL.)

Fix AMD64, ARM64, and PPC64. The code already uses 1-byte load
on S390X.

Fixes #60844.
Updates #60825.

Change-Id: I3dee93eb08ba180c85e86a9d2e71b5b520e8dcf0
Reviewed-on: https://go-review.googlesource.com/c/go/+/503937
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Than McIntosh <thanm@google.com>
Reviewed-by: David Chase <drchase@google.com>
(cherry picked from commit 1a7709d)
Reviewed-on: https://go-review.googlesource.com/c/go/+/503977
Auto-Submit: Dmitri Shuralyov <dmitshur@google.com>
TryBot-Bypass: Dmitri Shuralyov <dmitshur@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
gopherbot pushed a commit that referenced this issue Jun 22, 2023
… in racecallatomic

In racecallatomic, we do a load before calling into TSAN, so if
the address is invalid we fault on the Go stack. We currently use
a 8-byte load instruction, regardless of the data size that the
atomic operation is performed on. So if, say, we are doing a
LoadUint32 at an address that is the last 4 bytes of a memory
mapping, we may fault unexpectedly. Do a 1-byte load instead.
(Ideally we should do a load with the right size, so we fault
correctly if we're given an unaligned address for a wide load
across a page boundary. Leave that for another CL.)

Fix AMD64, ARM64, and PPC64. The code already uses 1-byte load
on S390X.

Fixes #60845.
Updates #60825.

Change-Id: I3dee93eb08ba180c85e86a9d2e71b5b520e8dcf0
Reviewed-on: https://go-review.googlesource.com/c/go/+/503937
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Than McIntosh <thanm@google.com>
Reviewed-by: David Chase <drchase@google.com>
(cherry picked from commit 1a7709d)
Reviewed-on: https://go-review.googlesource.com/c/go/+/503976
Auto-Submit: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Austin Clements <austin@google.com>
@golang golang locked and limited conversation to collaborators Jun 18, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. okay-after-rc1 Used by release team to mark a release-blocker issue as okay to resolve either before or after rc1 release-blocker
Projects
None yet
Development

No branches or pull requests

6 participants