Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: unexpected signal during runtime execution #41099

Closed
dnephin opened this issue Aug 28, 2020 · 14 comments
Closed

runtime: unexpected signal during runtime execution #41099

dnephin opened this issue Aug 28, 2020 · 14 comments
Labels
FrozenDueToAge WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided.
Milestone

Comments

@dnephin
Copy link
Contributor

dnephin commented Aug 28, 2020

What version of Go are you using (go version)?

go1.14.7, also observed the same problem with earlier versions: go1.13.7

Does this issue reproduce with the latest release?

Unknown

What operating system and processor architecture are you using (go env)?

Linux 4.14.35-1818.3.3.el7uek.x86_64 #2 SMP Mon Sep 24 14:45:01 PDT 2018 x86_64 x86_64 x86_64 GNU/Linux
(Oracle Linux Server release 7.6)

What did you do?

Original issue reported here: hashicorp/consul#8558

The Consul binary appears to be crashing with errors from the runtime, ex:

  • fatal error: stopg: invalid status
  • fatal error: unexpected signal during runtime execution
  • [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x0]
  • runtime: unexpected return pc for runtime.sigpanic called from 0x0

The stack trace from the application shows the line as a select blocking on a channel from signal.Notify (source). I'm not sure if that is related to the underlying issue, or has to do with the signal from the runtime.

Full trace
runtime: gp: gp=0xc000450180, goid=0, gp->atomicstatus=0
runtime:  g:  g=0xc0005b5680, goid=0,  g->atomicstatus=0
fatal error: stopg: invalid status
runtime: gp: gp=0xc000450300, goid=0, gp->atomicstatus=0
runtime:  g:  g=0xc000604180, goid=0,  g->atomicstatus=0
fatal error: unexpected signal during runtime execution
fatal error: stopg: invalid status
runtime: gp: gp=0xc000450000, goid=0, gp->atomicstatus=0
runtime:  g:  g=0xc000566180, goid=0,  g->atomicstatus=0
fatal error: stopg: invalid status
runtime stack:
runtime.throw(0x2fac299, 0x15)
/usr/local/go/src/runtime/panic.go:774 +0x72
runtime.scang(0xc000450180, 0xc000060170)
/usr/local/go/src/runtime/proc.go:869 +0x379
runtime.markroot.func1()
/usr/local/go/src/runtime/mgcmark.go:221 +0x6d
runtime.markroot(0xc000060170, 0xc00000001b)
/usr/local/go/src/runtime/mgcmark.go:202 +0x2f3
runtime.gcDrain(0xc000060170, 0x7)
/usr/local/go/src/runtime/mgcmark.go:915 +0x110
runtime.gcBgMarkWorker.func2()
/usr/local/go/src/runtime/mgc.go:1936 +0x16b
runtime.systemstack(0x0)
/usr/local/go/src/runtime/asm_amd64.s:370 +0x66
runtime.mstart()
/usr/local/go/src/runtime/proc.go:1146
goroutine 34 [GC worker (idle)]:
runtime.systemstack_switch()
/usr/local/go/src/runtime/asm_amd64.s:330 fp=0xc00028ef60 sp=0xc00028ef58 pc=0x45d6d0
runtime.gcBgMarkWorker(0xc00005ef00)
/usr/local/go/src/runtime/mgc.go:1900 +0x1bd fp=0xc00028efd8 sp=0xc00028ef60 pc=0x41c38d
runtime.goexit()
/usr/local/go/src/runtime/asm_amd64.s:1357 +0x1 fp=0xc00028efe0 sp=0xc00028efd8 pc=0x45f621
created by runtime.gcBgMarkStartWorkers
/usr/local/go/src/runtime/mgc.go:1794 +0x77
goroutine 1 [select, 149 minutes]:
github.com/hashicorp/consul/command/agent.(*cmd).run(0xc000651500, 0xc000170020, 0x8, 0x8, 0x0)
/home/circleci/project/consul/command/agent/agent.go:331 +0x13eb
github.com/hashicorp/consul/command/agent.(*cmd).Run(0xc000651500, 0xc000170020, 0x8, 0x8, 0xc0001452a0)
/home/circleci/project/consul/command/agent/agent.go:78 +0x4d
github.com/mitchellh/cli.(*CLI).Run(0xc00031c000, 0xc00031c000, 0x80, 0xc000145ac0)
/go/pkg/mod/github.com/mitchellh/cli@v1.1.0/cli.go:260 +0x1da
main.realMain(0xc00014a058)
/home/circleci/project/consul/main.go:53 +0x393
main.main()
/home/circleci/project/consul/main.go:20 +0x22
goroutine 20 [syscall, 149 minutes]:
os/signal.signal_recv(0x0)
/usr/local/go/src/runtime/sigqueue.go:147 +0x9c
os/signal.loop()
/usr/local/go/src/os/signal/signal_unix.go:23 +0x22
created by os/signal.init.0
/usr/local/go/src/os/signal/signal_unix.go:29 +0x41
goroutine 23 [chan receive]:
github.com/golang/glog.(*loggingT).flushDaemon(0x50186a0)
/go/pkg/mod/github.com/golang/glog@v0.0.0-20160126235308-23def4e6c14b/glog.go:882 +0x8b
created by github.com/golang/glog.init.0
/go/pkg/mod/github.com/golang/glog@v0.0.0-20160126235308-23def4e6c14b/glog.go:410 +0x26f
goroutine 0 [idle]:
fatal error: unexpected signal during runtime execution
panic during panic
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x452486]
runtime stack:
runtime.throw(0x300b2ed, 0x2a)
/usr/local/go/src/runtime/panic.go:774 +0x72
runtime.sigpanic()
/usr/local/go/src/runtime/signal_unix.go:378 +0x47c
runtime.gentraceback(0xffffffffffffffff, 0xffffffffffffffff, 0x0, 0xc000450000, 0x0, 0x0, 0x64, 0x0, 0x0, 0x0, ...)
/usr/local/go/src/runtime/traceback.go:159 +0x156
runtime.traceback1(0xffffffffffffffff, 0xffffffffffffffff, 0x0, 0xc000450000, 0x0)
/usr/local/go/src/runtime/traceback.go:722 +0xf0
runtime.traceback(0xffffffffffffffff, 0xffffffffffffffff, 0x0, 0xc000450000)
/usr/local/go/src/runtime/traceback.go:676 +0x52
runtime.tracebackothers(0xc0005b5680)
/usr/local/go/src/runtime/traceback.go:929 +0x1a6
runtime.dopanic_m(0xc0005b5680, 0x42f602, 0xc000aabdf8, 0x1)
/usr/local/go/src/runtime/panic.go:974 +0x29e
runtime.fatalthrow.func1()
/usr/local/go/src/runtime/panic.go:829 +0x5f
runtime.fatalthrow()
/usr/local/go/src/runtime/panic.go:826 +0x57
runtime.throw(0x2fac299, 0x15)
/usr/local/go/src/runtime/panic.go:774 +0x72
runtime.scang(0xc000450180, 0xc000060170)
/usr/local/go/src/runtime/proc.go:869 +0x379
runtime.markroot.func1()
/usr/local/go/src/runtime/mgcmark.go:221 +0x6d
runtime.markroot(0xc000060170, 0xc00000001b)
/usr/local/go/src/runtime/mgcmark.go:202 +0x2f3
runtime.gcDrain(0xc000060170, 0x7)
/usr/local/go/src/runtime/mgcmark.go:915 +0x110
runtime.gcBgMarkWorker.func2()
/usr/local/go/src/runtime/mgc.go:1936 +0x16b
runtime.systemstack(0x0)
/usr/local/go/src/runtime/asm_amd64.s:370 +0x66
runtime.mstart()
/usr/local/go/src/runtime/proc.go:1146
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x0]
runtime stack:
runtime: unexpected return pc for runtime.sigpanic called from 0x0
stack: frame={sp:0xc000c61dd8, fp:0xc000c61e08} stack=[0xc000c60000,0xc000c62000)
000000c000c61cd8:  000000c0007bc180  01000000000003e8
000000c000c61ce8:  0000000000000004  000000000000001f
000000c000c61cf8:  0000000000000000  0000000000000000
000000c000c61d08:  0000000000000001  0000000002fdb0bd
000000c000c61d18:  000000c000c61d60  000000000045bfbf <runtime.fatalthrow.func1+95>

I looked around this issue tracker for an existing issue. I saw some related things, but nothing that sounded like exactly the same problem.

One issue mentioned C signal handlers. There should not be any such handlers in this binary.

@davecheney
Copy link
Contributor

Sadly, with the release of go 1.15, go 1.13 is no longer receiving active support.

Can you please try with the latest release of Go can confirm if the problem occurs.

@dnephin
Copy link
Contributor Author

dnephin commented Aug 28, 2020

Ah, that's right, thank you. I've asked they try with a more recent release, which uses go1.14.x.

@cagedmantis cagedmantis changed the title [go1.13.15] fatal error: unexpected signal during runtime execution runtime: unexpected signal during runtime execution Aug 30, 2020
@cagedmantis cagedmantis added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Aug 30, 2020
@dnephin
Copy link
Contributor Author

dnephin commented Sep 8, 2020

More details from hashicorp/consul#8558 (comment). They were able to reproduce with go1.14.7. The full log is attached. They also said it occurs only on hosts with Intel Optane connected in RAM mode.

The stack is a bit different, but the errors look similar

fatal error: unexpected signal during runtime execution
fatal error: runtimer: bad p
runtime: unexpected return pc for runtime.sigpanic called from 0x0
fatal error: cleantimers: bad p

consul-agent-kn-0030_2020.09.02_trace.log

@networkimprov
Copy link

cc @aclements @randall77 @prattmic

@gopherbot remove WaitingForInfo

@gopherbot gopherbot removed the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Sep 8, 2020
@andybons andybons added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Sep 8, 2020
@andybons andybons added this to the Unplanned milestone Sep 8, 2020
@ianlancetaylor
Copy link
Contributor

This looks a lot like memory corruption.

@fanatl
Copy link

fanatl commented Sep 9, 2020

Hello to all.
These fatal errors have appeared since version 1.12 and older.
And is repeated only on hosts with Intel optane as RAM.

@davecheney
Copy link
Contributor

All the more reason to determine if the binary contains data races or cgo related memory corruption.

@networkimprov
Copy link

@dnephin have you built the app with the race detector?

@dnephin
Copy link
Contributor Author

dnephin commented Sep 9, 2020

I've been working on running the tests with the race detector enabled. We run some packages with -race, but not all. We don't build binaries with it enabled. Would that help identify the problem?

@networkimprov
Copy link

A data race could cause memory corruption, so building the app with the race detector may help.

@woblerr
Copy link

woblerr commented Feb 20, 2021

@dnephin any new about this problem?

@blanchonvincent
Copy link

@dnephin @woblerr I got the same issues in my code, and it broke since Go 1.14 with the introduction of the goroutine async preemption (I do not have control over the C code I am using, but it looks like it mess up with the signals), so I just disable the async preemption when running my tests/building the binary. Your logs suggests it could be the same issue (crash in the garbage collector that preempt the goroutines). Can you try running your tests with GODEBUG=asyncpreemptoff=1 like GODEBUG=asyncpreemptoff=1 go test ./... to see if it is the cause?

@davecheney davecheney added WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels May 17, 2021
@gopherbot
Copy link

Timed out in state WaitingForInfo. Closing.

(I am just a bot, though. Please speak up if this is a mistake or you have the requested information.)

@networkimprov
Copy link

I previously proposed a way to disable async preemption in all builds of an app; see #40870

It was closed without action, but you might want to comment there to request reconsideration of the idea.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided.
Projects
None yet
Development

No branches or pull requests

10 participants