Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: sigprof handler crashes backtracing runtime.nanotime #24925

Closed
heschik opened this issue Apr 18, 2018 · 6 comments

Comments

Projects
None yet
4 participants
@heschik
Copy link
Contributor

commented Apr 18, 2018

If a sigprof is received while runtime.nanotime is executing, the runtime may crash with a stack like this:

runtime: invalid pc-encoded table [...]
runtime.throw(0xc52c894, 0x1c)
  src/runtime/panic.go:622 +0x8a fp=0xc435e35260 sp=0xc435e35240 pc=0x26adb4a
runtime.pcvalue(0x119a4080, 0x15ea31c0, 0xc401b4e585, 0x39ad635, 0xc435e35488, 0x15ea3101, 0x0)
  src/runtime/symtab.go:763 +0x529 fp=0xc435e35310 sp=0xc435e35260 pc=0x26cc029
runtime.funcspdelta(0x119a4080, 0x15ea31c0, 0x39ad635, 0xc435e35488, 0xb0080b1a00000000)
  src/runtime/symtab.go:815 +0x5f fp=0xc435e35380 sp=0xc435e35310 pc=0x26cc4df
runtime.gentraceback(0x26dfd17, 0x7fc8115df708, 0x0, 0xc43e926300, 0x0, 0xc435e356f8, 0x40, 0x0, 0x0, 0x6, ...)
  src/runtime/traceback.go:249 +0x1873 fp=0xc435e35698 sp=0xc435e35380 pc=0x26d38d3
runtime.sigprof(0x26dfd17, 0x7fc8115df708, 0x0, 0xc43e926300, 0xc43b160c00)
  src/runtime/proc.go:3755 +0x3e9 fp=0xc435e35938 sp=0xc435e35698 pc=0x26b81f9
runtime.sighandler(0xc40000001b, 0xc435e35bf0, 0xc435e35ac0, 0xc43e926300)
  src/runtime/signal_sighandler.go:33 +0x678 fp=0xc435e359b8 sp=0xc435e35938 pc=0x26c3218
runtime.sigtrampgo(0x1b, 0xc435e35bf0, 0xc435e35ac0)
  src/runtime/signal_unix.go:349 +0x21c fp=0xc435e35a68 sp=0xc435e359b8 pc=0x26c3c2c

To add the running stack trace to the profile, the handler needs to walk the stack frame-by-frame, and to do that it needs to know how big each stack frame is. The compiler records that information in the pc-to-sp table, but runtime.nanotime does manual stack alignment:

SUBQ $16, SP // Space for results
ANDQ $~15, SP // Align for C code

which isn't (can't be) tracked in the table. When the signal handler walks the stack, it computes the wrong size for that frame, and then reads the wrong place looking for the return address, finding stack garbage instead and crashing.

The problem was introduced in a158382, which shipped in 1.10, so this should be considered for a backport for 1.10.2.

@heschik heschik added this to the Go1.10.2 milestone Apr 18, 2018

@heschik heschik self-assigned this Apr 18, 2018

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Apr 18, 2018

Dup of #24142?

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Apr 18, 2018

Well, not a dup as such, but perhaps fixed by https://golang.org/cl/97315.

@heschik

This comment has been minimized.

Copy link
Contributor Author

commented Apr 18, 2018

Nope, it turned out we needed a very small additional change, which I'll mail out shortly.

@gopherbot

This comment has been minimized.

Copy link

commented Apr 18, 2018

Change https://golang.org/cl/107778 mentions this issue: runtime: use saved state in SIGPROF handler for vDSO calls

@jgracenin

This comment has been minimized.

Copy link

commented Jun 13, 2018

Hey, was this ever cherrypicked into an official release?

I unfortunately had a service running with 1.10.1 that crashed a few times daily if I enabled pprof with the same stack trace above. I took the cherrypick from https://golang.org/cl/107778 and it fixed it for good.

I didn't see this CL referenced in the Go 1.10.2 or 1.10.3 milestones. Looking over the source (below), I don't see it there either (or was this abandoned for another way to fix it?)

From runtime/proc.go in 1.10.3 line 3689 - 3691:
if gp == nil || sp < gp.stack.lo || gp.stack.hi < sp || setsSP(pc) { traceback = false }

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Jun 13, 2018

@jgracenin This will be fixed in 1.11, but it is not in any earlier releases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.