
runtime: process crash instead of panic on SIGBUS with SetPanicOnDefault(true) #41155

florisch opened this issue Sep 1, 2020 · 10 comments
help wanted NeedsFix



@florisch florisch commented Sep 1, 2020

What version of Go are you using (go version)?

$ go version
go version go1.15 windows/amd64

Does this issue reproduce with the latest release?


What operating system and processor architecture are you using (go env)?

go env Output
$ go env
set GO111MODULE=
set GOARCH=arm
set GOBIN=
set GOCACHE=C:\Users\Florian\AppData\Local\go-build
set GOENV=C:\Users\Florian\AppData\Roaming\go\env
set GOEXE=
set GOHOSTARCH=amd64
set GOHOSTOS=windows
set GOMODCACHE=C:\Users\Florian\go\pkg\mod
set GOOS=linux
set GOPATH=C:\Users\Florian\go
set GOPROXY=,direct
set GOROOT=c:\go
set GOTOOLDIR=c:\go\pkg\tool\windows_amd64
set GCCGO=gccgo
set GOARM=7
set AR=ar
set CC=gcc
set CXX=g++
set GOMOD=
set CGO_CFLAGS=-g -O2
set CGO_FFLAGS=-g -O2
set CGO_LDFLAGS=-g -O2
set PKG_CONFIG=pkg-config
set GOGCCFLAGS=-fPIC -marm -fmessage-length=0 -fdebug-prefix-map=C:\Users\Florian\AppData\Local\Temp\go-build326602894=/tmp/go-build -gno-record-gcc-switches
GOROOT/bin/go version: go version go1.15 windows/amd64
GOROOT/bin/go tool compile -V: compile version go1.15
gdb --version: GNU gdb (GDB) 8.1

What did you do?

We are using Go for some embedded development (cross-compiled to Linux on 32-bit ARM). We access various FPGA registers from the Go process. To access those registers, we mmap /dev/mem at the physical address range of those registers.
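For readers unfamiliar with this approach, a minimal sketch of mapping a register window through /dev/mem might look like the following. The helper names `pageSpan` and `mapPhys` and the physical address 0x43c00010 are hypothetical, not taken from the reporter's code. The main constraint it shows is that the mmap offset has to be page-aligned, so the register's offset within the page is tracked separately. Opening /dev/mem typically needs root (CAP_SYS_RAWIO) and a kernel configuration that permits access to that physical range.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// pageSpan splits physAddr into a page-aligned base (usable as an mmap
// offset) and the offset of physAddr within that page.
func pageSpan(physAddr uintptr, pageSize int) (base uintptr, off int) {
	mask := uintptr(pageSize - 1) // pageSize must be a power of two
	return physAddr &^ mask, int(physAddr & mask)
}

// mapPhys maps length bytes of physical memory starting at physAddr via
// /dev/mem. It returns the mapping and the offset of physAddr inside it.
func mapPhys(physAddr uintptr, length int) ([]byte, int, error) {
	// O_SYNC is commonly passed with /dev/mem to request an uncached mapping.
	f, err := os.OpenFile("/dev/mem", os.O_RDWR|os.O_SYNC, 0)
	if err != nil {
		return nil, 0, err
	}
	defer f.Close() // the mapping remains valid after the fd is closed

	base, off := pageSpan(physAddr, os.Getpagesize())
	mem, err := syscall.Mmap(int(f.Fd()), int64(base), off+length,
		syscall.PROT_READ|syscall.PROT_WRITE, syscall.MAP_SHARED)
	if err != nil {
		return nil, 0, err
	}
	return mem, off, nil
}

func main() {
	// Hypothetical register block address, for illustration only.
	mem, off, err := mapPhys(0x43c00010, 0x100)
	if err != nil {
		fmt.Println("mapping /dev/mem failed:", err)
		return
	}
	defer syscall.Munmap(mem)
	fmt.Printf("mapped %d bytes; registers start at offset %#x\n", len(mem), off)
}
```

With 4 KiB pages, the faulting address 0x26b48010 in the crash output below splits into base 0x26b48000 and offset 0x10, which appears consistent with r0 and the ReadUint32 argument in that trace.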

When we access registers that are not defined or not accessible in the FPGA, the process crashes with the error reported below.

We use defer debug.SetPanicOnFault(debug.SetPanicOnFault(true)) in the goroutine that performs the register read, as we expect this to make the runtime panic instead of crashing on this kind of memory fault.

What did you expect to see?

A panic where the bad access happened. This way, with a recover call, it would be possible to handle the case where some registers are not available.

What did you see instead?

The process crashes in an unrecoverable way, with the following output:

Unhandled fault: external abort on non-linefetch (0x018) at 0x26b48010
pgd = 5e090000
[26b48010] *pgd=1e234831, *pte=40040703, *ppte=40040e33
SIGBUS: bus error
PC=0x2a8ff0 m=0 sigcode=0

goroutine 43 [running]:
gobv1/pkg/hw/pmem.Access.ReadUint32(...)
        C:/projects/ellisys/bv1go/pkg/hw/pmem/memAccess_linux.go:122
gobv1/pkg/hw/pmem.(*Access).ReadUint32(0x925200, 0x10, 0x28e594)
        <autogenerated>:1 +0x44 fp=0x8a6acc sp=0x8a6aa4 pc=0x2a8ff0
...
main.(*command).initializeDevice(0x9222c0, 0x922b80)
        C:/projects/ellisys/bv1go/cmd/gobv1/main.go:154 +0x94 fp=0x8a6fe4 sp=0x8a6fa0 pc=0x371320
runtime.goexit()
...
goroutine 20 [select]:
io.(*pipe).Read(0x922280, 0x84c000, 0x1000, 0x1000, 0x3b50e8, 0x1136b0, 0x84c000)
        C:/Go/src/io/pipe.go:57 +0xac
...
goroutine 42 [runnable]:
...
trap    0x0
error   0x18
oldmask 0x0
r0      0x26b48000
r1      0x3c
r2      0x8a6acc
r3      0x10
r4      0x1
r5      0x1
r6      0xf1
r7      0x26ccc521
r8      0x925200
r9      0x20
r10     0x883500
fp      0x7
ip      0x925203
sp      0x8a6aa4
lr      0x2a8fdc
pc      0x2a8ff0
cpsr    0x80000010
fault   0x0

Program instance execution terminated


I built a custom runtime with this commit, which makes the call panic as expected.

@ianlancetaylor ianlancetaylor changed the title Process crash instead of panic on SIGBUS runtime: process crash instead of panic on SIGBUS with SetPanicOnDefault(true) Sep 2, 2020
@ianlancetaylor ianlancetaylor added the NeedsFix label Sep 2, 2020
@ianlancetaylor ianlancetaylor added this to the Go1.16 milestone Sep 2, 2020

@tpaschalis tpaschalis commented Sep 3, 2020

I'm not sure how to replicate this failure, but I'd like to give this a shot.

Do we think that the posted workaround could also be a long-term solution?


@ianlancetaylor ianlancetaylor commented Sep 3, 2020

Please avoid looking at the workaround (and, everyone, please avoid posting patches through the issue tracker). We want patches to come in only as Gerrit code reviews or GitHub pull requests, because then we have automation that confirms that the copyright assignments are in order. Thanks.

To put it another way, I can't answer your question about the posted workaround because I'm not going to look at it. Sorry.

I think you might be able to write a test that gets a SIGBUS by using mmap to map memory as read-only and then trying to write to it. I'm not really sure, though.
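A minimal sketch of that suggestion, with the fault converted to a panic and recovered. The function name `writeReadOnly` is hypothetical. One caveat: on Linux, a write to a PROT_READ page is normally reported as SIGSEGV (SEGV_ACCERR) rather than SIGBUS, so a test like this may exercise the SIGSEGV path instead of the SIGBUS one.

```go
package main

import (
	"fmt"
	"runtime/debug"
	"syscall"
)

// writeReadOnly maps one anonymous page read-only, writes to it, and
// returns the recovered panic message (or "no fault").
func writeReadOnly() (msg string) {
	mem, err := syscall.Mmap(-1, 0, syscall.Getpagesize(),
		syscall.PROT_READ, syscall.MAP_ANON|syscall.MAP_PRIVATE)
	if err != nil {
		return "mmap: " + err.Error()
	}
	defer syscall.Munmap(mem)

	// Deferred calls run in reverse order: recover first, then restore the
	// previous SetPanicOnFault value, then unmap.
	defer debug.SetPanicOnFault(debug.SetPanicOnFault(true))
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	mem[0] = 1 // write to a read-only page
	return "no fault"
}

func main() {
	fmt.Println(writeReadOnly())
}
```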


@tpaschalis tpaschalis commented Sep 4, 2020

Thanks for the pointers, I'll try to get a repro done, and then see how the issue can be fixed!


@florisch florisch commented Sep 4, 2020

Thank you for looking into this. I thought I should open a ticket for discussion before creating a PR. Sorry if I didn't respect the rules by adding a link to my workaround commit in the ticket.

If desired, I would be happy to contribute a fix for this issue and make a PR. For now, I am trying to find a way to write a test that could be integrated into the regular test suite and reproduces this issue without our embedded FPGA platform.

I created a test doing what @ianlancetaylor suggested. It doesn't reproduce the issue; instead, it results in the expected panic: runtime error: invalid memory address or nil pointer dereference (both on a Linux desktop and on our embedded platform).


@networkimprov networkimprov commented Sep 4, 2020


@tpaschalis tpaschalis commented Sep 5, 2020

For now, I try to find a way to write a test which could be integrated with the regular test suite to reproduce this issue without our embedded FPGA platform.

This would be a good first step; I hope I can assist with that as well. (And thanks for having a positive attitude toward getting to the bottom of this!)

The following code uses cgo and triggers a SIGBUS. I tried it on darwin and linux, but could not get the same error as reported above. The output below appears both with and without the debug.SetPanicOnFault(debug.SetPanicOnFault(true)) line. On the other hand, cgo is a different beast, and maybe that's why the same error does not appear.

Code:
Output:

fatal error: unexpected signal during runtime execution
[signal SIGBUS: bus error code=0x2 addr=0x7ff893c38000 pc=0x46ca3d]

runtime stack:
runtime.throw(0x48bc4c, 0x2a)
	/usr/local/go/src/runtime/panic.go:1116 +0x72
	/usr/local/go/src/runtime/signal_unix.go:704 +0x4ac

EDIT: Here's the same using syscall.Mmap instead of CGO.
Code:
Output with debug.SetPanicOnFault

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGBUS: bus error code=0x2 addr=0x7f3ac1f2c000 pc=0x46ceac]

Output without debug.SetPanicOnFault

unexpected fault address 0x7fe33aac3000
fatal error: fault
[signal SIGBUS: bus error code=0x2 addr=0x7fe33aac3000 pc=0x494db6]
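One pure-Go way to get a SIGBUS with code=0x2 (BUS_ADRERR) on Linux is to map a file and then shrink it, so that the mapped page lies beyond end of file. The sketch below assumes Go 1.16+ for os.CreateTemp; the helper name `readPastEOF` is hypothetical and this is not necessarily the code used in the comment above.

```go
package main

import (
	"fmt"
	"os"
	"runtime/debug"
	"syscall"
)

// readPastEOF maps one page of a temporary file, truncates the file to zero
// bytes, then reads the mapped page. Touching a page wholly beyond EOF raises
// SIGBUS with si_code BUS_ADRERR. With panicOnFault=true the runtime turns it
// into a recoverable panic; with false the process dies ("fatal error: fault").
func readPastEOF(panicOnFault bool) (result string) {
	f, err := os.CreateTemp("", "sigbus")
	if err != nil {
		return err.Error()
	}
	defer os.Remove(f.Name())
	defer f.Close()

	size := os.Getpagesize()
	if err := f.Truncate(int64(size)); err != nil {
		return err.Error()
	}
	mem, err := syscall.Mmap(int(f.Fd()), 0, size, syscall.PROT_READ, syscall.MAP_SHARED)
	if err != nil {
		return err.Error()
	}
	defer syscall.Munmap(mem)
	if err := f.Truncate(0); err != nil {
		return err.Error()
	}

	defer debug.SetPanicOnFault(debug.SetPanicOnFault(panicOnFault))
	defer func() {
		if r := recover(); r != nil {
			result = fmt.Sprint(r)
		}
	}()
	return fmt.Sprint(mem[0]) // this read faults
}

func main() {
	fmt.Println(readPastEOF(true))
}
```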


@florisch florisch commented Sep 8, 2020

I tried the code using mmap on our embedded platform and saw the same behavior. Then I modified the runtime to print the flags and the sigcode when a SIGBUS is received.

Output of SIGBUS generated by sample from previous comment

SIGBUS flags=0x0x88 sigcode=0x2

Output with SIGBUS generated by a bad register access

SIGBUS flags=0x0x88 sigcode=0x0

Since sigcode 0 matches _SI_USER, the runtime does not treat the SIGBUS from our bad register access as a fault, while the SIGBUS generated by the code from the previous comment (sigcode 2, BUS_ADRERR) is handled properly.
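The printed flags value 0x88 corresponds to _SigPanic|_SigUnblock, the runtime's table entry for SIGBUS. The following is a simplified model (not the runtime source) of the order of checks in the Go 1.15 Unix signal handler, showing why si_code decides between a panic and a crash. It omits details such as the check for ignored signals; the constant and function names are local to this sketch.

```go
package main

import "fmt"

// Signal flag bits mirroring the runtime's _Sig* constants.
const (
	sigNotify  = 1 << 0 // _SigNotify
	sigKill    = 1 << 1 // _SigKill
	sigThrow   = 1 << 2 // _SigThrow
	sigPanic   = 1 << 3 // _SigPanic
	sigUnblock = 1 << 7 // _SigUnblock
)

// si_code values relevant to SIGBUS on Linux.
const (
	siUser    = 0 // SI_USER: sent with kill(2)
	busAdrerr = 2 // BUS_ADRERR: nonexistent physical address
)

type outcome string

const (
	panics   outcome = "converted to a Go panic"
	notified outcome = "delivered to an os/signal channel"
	exits    outcome = "process exits"
	ignored  outcome = "ignored"
)

// classify models the handler's decision: only a signal with
// si_code != SI_USER and _SigPanic set becomes a panic, so only that path
// can honor debug.SetPanicOnFault.
func classify(flags uint32, sigcode int32, hasNotify bool) outcome {
	if sigcode != siUser && flags&sigPanic != 0 {
		return panics
	}
	if (sigcode == siUser || flags&sigNotify != 0) && hasNotify {
		return notified
	}
	if flags&(sigKill|sigThrow|sigPanic) != 0 {
		return exits
	}
	return ignored
}

func main() {
	const sigbus = sigPanic | sigUnblock
	fmt.Printf("flags=%#x\n", sigbus)
	fmt.Println("BUS_ADRERR:", classify(sigbus, busAdrerr, false))
	fmt.Println("SI_USER:   ", classify(sigbus, siUser, false))
}
```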


@florisch florisch commented Sep 14, 2020

Here is minimal code that reproduces the issue on armv7. The same code does not reproduce the issue on amd64, because there mmap simply refuses to map the bad addresses.

SIGBUS flags=0x0x88 sigcode=0x 0x0
SIGBUS: bus error
PC=0xa2728 m=0 sigcode=0

goroutine 1 [running]:
        gobv1/tools/crash/main.go:37 +0x240 fp=0x4227b8 sp=0x422740 pc=0xa2728
        runtime/proc.go:205 +0x208 fp=0x4227e4 sp=0x4227b8 pc=0x427f8
        runtime/asm_arm.s:857 +0x4 fp=0x4227e4 sp=0x4227e4 pc=0x6d8f0

trap    0x0
error   0x1818
oldmask 0x0
r0      0x0
r1      0x1000
r2      0x26c2c000
r3      0x0
r4      0x4
r5      0x0
r6      0x26c2cfff
r7      0x0
r8      0x7
r9      0x1
r10     0x4000e0
fp      0x14d078
ip      0xd
sp      0x422740
lr      0x119f0
pc      0xa2728
cpsr    0x20000010
fault   0x0


@odeke-em odeke-em commented Feb 6, 2021

Punting to Go1.17. Thank you all for your patience and for the discussion; please keep it going.

@odeke-em odeke-em removed this from the Go1.16 milestone Feb 6, 2021
@odeke-em odeke-em added this to the Go1.17 milestone Feb 6, 2021

@ianlancetaylor ianlancetaylor commented Apr 30, 2021

I don't understand why the kernel would send a signal with si_code set to SI_USER. That seems like a kernel bug. The SI_USER code is supposed to indicate an explicit use of the kill system call. I don't mind working around a kernel bug but we don't want to treat all SIGBUS signals with si_code == SI_USER as indicating an actual bus error.

@ianlancetaylor ianlancetaylor removed this from the Go1.17 milestone Apr 30, 2021
@ianlancetaylor ianlancetaylor added this to the Backlog milestone Apr 30, 2021