Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: unexpected signal during runtime execution #41099

Open
dnephin opened this issue Aug 28, 2020 · 10 comments
Open

runtime: unexpected signal during runtime execution #41099

dnephin opened this issue Aug 28, 2020 · 10 comments

Comments

@dnephin
Copy link
Contributor

@dnephin dnephin commented Aug 28, 2020

What version of Go are you using (go version)?

go1.14.7, also observed the same problem with earlier versions: go1.13.7

Does this issue reproduce with the latest release?

Unknown

What operating system and processor architecture are you using (go env)?

Linux 4.14.35-1818.3.3.el7uek.x86_64 #2 SMP Mon Sep 24 14:45:01 PDT 2018 x86_64 x86_64 x86_64 GNU/Linux
(Oracle Linux Server release 7.6)

What did you do?

Original issue reported here: hashicorp/consul#8558

The Consul binary appears to be crashing with errors from the runtime, ex:

  • fatal error: stopg: invalid status
  • fatal error: unexpected signal during runtime execution
  • [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x0]
  • runtime: unexpected return pc for runtime.sigpanic called from 0x0

The stack trace from the application shows the line as a select blocking on a channel from signal.Notify (source). I'm not sure if that is related to the underlying issue, or has to do with the signal from the runtime.

Full trace
runtime: gp: gp=0xc000450180, goid=0, gp->atomicstatus=0
runtime:  g:  g=0xc0005b5680, goid=0,  g->atomicstatus=0
fatal error: stopg: invalid status
runtime: gp: gp=0xc000450300, goid=0, gp->atomicstatus=0
runtime:  g:  g=0xc000604180, goid=0,  g->atomicstatus=0
fatal error: unexpected signal during runtime execution
fatal error: stopg: invalid status
runtime: gp: gp=0xc000450000, goid=0, gp->atomicstatus=0
runtime:  g:  g=0xc000566180, goid=0,  g->atomicstatus=0
fatal error: stopg: invalid status
runtime stack:
runtime.throw(0x2fac299, 0x15)
/usr/local/go/src/runtime/panic.go:774 +0x72
runtime.scang(0xc000450180, 0xc000060170)
/usr/local/go/src/runtime/proc.go:869 +0x379
runtime.markroot.func1()
/usr/local/go/src/runtime/mgcmark.go:221 +0x6d
runtime.markroot(0xc000060170, 0xc00000001b)
/usr/local/go/src/runtime/mgcmark.go:202 +0x2f3
runtime.gcDrain(0xc000060170, 0x7)
/usr/local/go/src/runtime/mgcmark.go:915 +0x110
runtime.gcBgMarkWorker.func2()
/usr/local/go/src/runtime/mgc.go:1936 +0x16b
runtime.systemstack(0x0)
/usr/local/go/src/runtime/asm_amd64.s:370 +0x66
runtime.mstart()
/usr/local/go/src/runtime/proc.go:1146
goroutine 34 [GC worker (idle)]:
runtime.systemstack_switch()
/usr/local/go/src/runtime/asm_amd64.s:330 fp=0xc00028ef60 sp=0xc00028ef58 pc=0x45d6d0
runtime.gcBgMarkWorker(0xc00005ef00)
/usr/local/go/src/runtime/mgc.go:1900 +0x1bd fp=0xc00028efd8 sp=0xc00028ef60 pc=0x41c38d
runtime.goexit()
/usr/local/go/src/runtime/asm_amd64.s:1357 +0x1 fp=0xc00028efe0 sp=0xc00028efd8 pc=0x45f621
created by runtime.gcBgMarkStartWorkers
/usr/local/go/src/runtime/mgc.go:1794 +0x77
goroutine 1 [select, 149 minutes]:
github.com/hashicorp/consul/command/agent.(*cmd).run(0xc000651500, 0xc000170020, 0x8, 0x8, 0x0)
/home/circleci/project/consul/command/agent/agent.go:331 +0x13eb
github.com/hashicorp/consul/command/agent.(*cmd).Run(0xc000651500, 0xc000170020, 0x8, 0x8, 0xc0001452a0)
/home/circleci/project/consul/command/agent/agent.go:78 +0x4d
github.com/mitchellh/cli.(*CLI).Run(0xc00031c000, 0xc00031c000, 0x80, 0xc000145ac0)
/go/pkg/mod/github.com/mitchellh/cli@v1.1.0/cli.go:260 +0x1da
main.realMain(0xc00014a058)
/home/circleci/project/consul/main.go:53 +0x393
main.main()
/home/circleci/project/consul/main.go:20 +0x22
goroutine 20 [syscall, 149 minutes]:
os/signal.signal_recv(0x0)
/usr/local/go/src/runtime/sigqueue.go:147 +0x9c
os/signal.loop()
/usr/local/go/src/os/signal/signal_unix.go:23 +0x22
created by os/signal.init.0
/usr/local/go/src/os/signal/signal_unix.go:29 +0x41
goroutine 23 [chan receive]:
github.com/golang/glog.(*loggingT).flushDaemon(0x50186a0)
/go/pkg/mod/github.com/golang/glog@v0.0.0-20160126235308-23def4e6c14b/glog.go:882 +0x8b
created by github.com/golang/glog.init.0
/go/pkg/mod/github.com/golang/glog@v0.0.0-20160126235308-23def4e6c14b/glog.go:410 +0x26f
goroutine 0 [idle]:
fatal error: unexpected signal during runtime execution
panic during panic
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x452486]
runtime stack:
runtime.throw(0x300b2ed, 0x2a)
/usr/local/go/src/runtime/panic.go:774 +0x72
runtime.sigpanic()
/usr/local/go/src/runtime/signal_unix.go:378 +0x47c
runtime.gentraceback(0xffffffffffffffff, 0xffffffffffffffff, 0x0, 0xc000450000, 0x0, 0x0, 0x64, 0x0, 0x0, 0x0, ...)
/usr/local/go/src/runtime/traceback.go:159 +0x156
runtime.traceback1(0xffffffffffffffff, 0xffffffffffffffff, 0x0, 0xc000450000, 0x0)
/usr/local/go/src/runtime/traceback.go:722 +0xf0
runtime.traceback(0xffffffffffffffff, 0xffffffffffffffff, 0x0, 0xc000450000)
/usr/local/go/src/runtime/traceback.go:676 +0x52
runtime.tracebackothers(0xc0005b5680)
/usr/local/go/src/runtime/traceback.go:929 +0x1a6
runtime.dopanic_m(0xc0005b5680, 0x42f602, 0xc000aabdf8, 0x1)
/usr/local/go/src/runtime/panic.go:974 +0x29e
runtime.fatalthrow.func1()
/usr/local/go/src/runtime/panic.go:829 +0x5f
runtime.fatalthrow()
/usr/local/go/src/runtime/panic.go:826 +0x57
runtime.throw(0x2fac299, 0x15)
/usr/local/go/src/runtime/panic.go:774 +0x72
runtime.scang(0xc000450180, 0xc000060170)
/usr/local/go/src/runtime/proc.go:869 +0x379
runtime.markroot.func1()
/usr/local/go/src/runtime/mgcmark.go:221 +0x6d
runtime.markroot(0xc000060170, 0xc00000001b)
/usr/local/go/src/runtime/mgcmark.go:202 +0x2f3
runtime.gcDrain(0xc000060170, 0x7)
/usr/local/go/src/runtime/mgcmark.go:915 +0x110
runtime.gcBgMarkWorker.func2()
/usr/local/go/src/runtime/mgc.go:1936 +0x16b
runtime.systemstack(0x0)
/usr/local/go/src/runtime/asm_amd64.s:370 +0x66
runtime.mstart()
/usr/local/go/src/runtime/proc.go:1146
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x0]
runtime stack:
runtime: unexpected return pc for runtime.sigpanic called from 0x0
stack: frame={sp:0xc000c61dd8, fp:0xc000c61e08} stack=[0xc000c60000,0xc000c62000)
000000c000c61cd8:  000000c0007bc180  01000000000003e8
000000c000c61ce8:  0000000000000004  000000000000001f
000000c000c61cf8:  0000000000000000  0000000000000000
000000c000c61d08:  0000000000000001  0000000002fdb0bd
000000c000c61d18:  000000c000c61d60  000000000045bfbf <runtime.fatalthrow.func1+95>

I looked around this issue tracker for an existing issue. I saw some related things, but nothing that sounded like exactly the same problem.

One issue mentioned C signal handlers. There should not be any such handlers in this binary.

@davecheney
Copy link
Contributor

@davecheney davecheney commented Aug 28, 2020

Sadly, with the release of go 1.15, go 1.13 is no longer receiving active support.

Can you please try with the latest release of Go can confirm if the problem occurs.

@dnephin
Copy link
Contributor Author

@dnephin dnephin commented Aug 28, 2020

Ah, that's right, thank you. I've asked they try with a more recent release, which uses go1.14.x.

@cagedmantis cagedmantis changed the title [go1.13.15] fatal error: unexpected signal during runtime execution runtime: unexpected signal during runtime execution Aug 30, 2020
@dnephin
Copy link
Contributor Author

@dnephin dnephin commented Sep 8, 2020

More details from hashicorp/consul#8558 (comment). They were able to reproduce with go1.14.7. The full log is attached. They also said it occurs only on hosts with Intel Optane connected in RAM mode.

The stack is a bit different, but the errors look similar

fatal error: unexpected signal during runtime execution
fatal error: runtimer: bad p
runtime: unexpected return pc for runtime.sigpanic called from 0x0
fatal error: cleantimers: bad p

consul-agent-kn-0030_2020.09.02_trace.log

@networkimprov
Copy link

@networkimprov networkimprov commented Sep 8, 2020

cc @aclements @randall77 @prattmic

@gopherbot remove WaitingForInfo

@andybons andybons added this to the Unplanned milestone Sep 8, 2020
@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Sep 8, 2020

This looks a lot like memory corruption.

@fanatl
Copy link

@fanatl fanatl commented Sep 9, 2020

Hello to all.
These fatal errors have appeared since version 1.12 and older.
And is repeated only on hosts with Intel optane as RAM.

@davecheney
Copy link
Contributor

@davecheney davecheney commented Sep 9, 2020

All the more reason to determine if the binary contains data races or cgo related memory corruption.

@networkimprov
Copy link

@networkimprov networkimprov commented Sep 9, 2020

@dnephin have you built the app with the race detector?

@dnephin
Copy link
Contributor Author

@dnephin dnephin commented Sep 9, 2020

I've been working on running the tests with the race detector enabled. We run some packages with -race, but not all. We don't build binaries with it enabled. Would that help identify the problem?

@networkimprov
Copy link

@networkimprov networkimprov commented Sep 9, 2020

A data race could cause memory corruption, so building the app with the race detector may help.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
8 participants
You can’t perform that action at this time.