Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: unexpected return pc for runtime.sigpanic called from 0x1 #43496

Open
egonelbre opened this issue Jan 4, 2021 · 8 comments
Open

runtime: unexpected return pc for runtime.sigpanic called from 0x1 #43496

egonelbre opened this issue Jan 4, 2021 · 8 comments
Milestone

Comments

@egonelbre
Copy link
Contributor

@egonelbre egonelbre commented Jan 4, 2021

What version of Go are you using (go version)?

$ go version
go version go1.15.6 linux/amd64

Does this issue reproduce with the latest release?

Unable to reproduce.

What operating system and processor architecture are you using (go env)?

It's using golang:1.15.6 docker image as the base for CI testing.

What did you do?

We have automated tests running for changes and one of them failed with a crash.
The race-detector is enabled for the tests.

The relevant part of the crash seems to be:

unexpected fault address 0x20000e25d000
fatal error: fault
runtime: unexpected return pc for runtime.sigpanic called from 0x1
stack: frame={sp:0x7fff71c22dd0, fp:0x7fff71c22e00} stack=[0xc0016e4000,0xc0016e8000)

fatal error: unknown caller pc

runtime stack:
runtime.throw(0x21593f2, 0x11)
	/usr/local/go/src/runtime/panic.go:1116 +0x72
runtime.gentraceback(0x4672b3, 0x7fff71c22dd0, 0x0, 0xc00059af00, 0x0, 0x0, 0x7fffffff, 0x7fff71c22ed8, 0x0, 0x0, ...)
	/usr/local/go/src/runtime/traceback.go:273 +0x1aec
runtime.addOneOpenDeferFrame.func1()
	/usr/local/go/src/runtime/panic.go:721 +0x91
runtime.systemstack(0x487db4)
	/usr/local/go/src/runtime/asm_amd64.s:370 +0x66
runtime.mstart()
	/usr/local/go/src/runtime/proc.go:1116


goroutine 8278 [running]:
runtime: unexpected return pc for runtime.systemstack_switch called from 0x0
stack: frame={sp:0x7fff71c22cc0, fp:0x7fff71c22cc8} stack=[0xc0016e4000,0xc0016e8000)

runtime.systemstack_switch()
	/usr/local/go/src/runtime/asm_amd64.s:330 fp=0x7fff71c22cc8 sp=0x7fff71c22cc0 pc=0x487ee0
created by golang.org/x/sync/errgroup.(*Group).Go
	/go/pkg/mod/golang.org/x/sync@v0.0.0-20200625203802-6e8e738ad208/errgroup/errgroup.go:54 +0x74


goroutine 1 [chan receive]:
testing.(*T).Run(0xc000500600, 0x214ec46, 0x9, 0x223ef60, 0x0)
	/usr/local/go/src/testing/testing.go:1169 +0x5f4
testing.runTests.func1(0xc000500600)
	/usr/local/go/src/testing/testing.go:1439 +0xa7
testing.tRunner(0xc000500600, 0xc000121ce0)
	/usr/local/go/src/testing/testing.go:1123 +0x203
testing.runTests(0xc0001b6d80, 0x2daffe0, 0xc, 0xc, 0xbff39cd96b1b208f, 0x1179127ca7c, 0x2e3c840, 0xc0004219b0)
	/usr/local/go/src/testing/testing.go:1437 +0x613
testing.(*M).Run(0xc0001ac300, 0x0)
	/usr/local/go/src/testing/testing.go:1345 +0x3b4
main.main()
	_testmain.go:67 +0x237

...
@egonelbre
Copy link
Contributor Author

@egonelbre egonelbre commented Jan 4, 2021

Maybe related #41099

@egonelbre
Copy link
Contributor Author

@egonelbre egonelbre commented Jan 4, 2021

Going through the stack trace multiple times, the additional thing that stuck out was:

goroutine 8315 [runnable]:
golang.org/x/crypto/argon2.processBlockSSE(0xc002c99800, 0xc002c99400, 0xc00382cc00, 0x1)
	/go/pkg/mod/golang.org/x/crypto@v0.0.0-20200820211705-5c72a883971a/argon2/blamka_amd64.go:24 +0x62b
golang.org/x/crypto/argon2.processBlockXOR(...)
	/go/pkg/mod/golang.org/x/crypto@v0.0.0-20200820211705-5c72a883971a/argon2/blamka_amd64.go:59
golang.org/x/crypto/argon2.processBlocks.func1(0x300000000, 0xc000000000, 0xc000326000)
	/go/pkg/mod/golang.org/x/crypto@v0.0.0-20200820211705-5c72a883971a/argon2/argon2.go:223 +0x21b
created by golang.org/x/crypto/argon2.processBlocks
	/go/pkg/mod/golang.org/x/crypto@v0.0.0-20200g820211705-5c72a883971a/argon2/argon2.go:234 +0x1bd

And maybe:

goroutine 8530 [runnable]:
strings.Index(0x287dad8, 0x2a, 0x2147016, 0x1, 0x2)
	/usr/local/go/src/strings/strings.go:1024 +0x825
strings.genSplit(0x287dad8, 0x2a, 0x2147016, 0x1, 0x0, 0x2, 0x36, 0x14d, 0xbcc060)
	/usr/local/go/src/strings/strings.go:251 +0x131
strings.Split(...)
	/usr/local/go/src/strings/strings.go:299
@egonelbre
Copy link
Contributor Author

@egonelbre egonelbre commented Jan 4, 2021

My best guess is that a bad BP in asm code can cause traceback to crash. Based on seeing something similar in minio/sha256-simd#55

@toothrot toothrot added this to the Backlog milestone Jan 5, 2021
@toothrot
Copy link
Contributor

@toothrot toothrot commented Jan 5, 2021

@randall77
Copy link
Contributor

@randall77 randall77 commented Jan 5, 2021

Traceback doesn't use BP currently, so that isn't it. (Using BP for traceback is #16638.)

That minio bug involves playing with SP, not BP, which could cause problems. (Our traceback code can't handle variable-sized frames.)

goroutine 8278 does look like frames are missing as it starts with a systemstack_switch, which doesn't seem right. The first frame should be a errgroup.(*Group).Go.func1 frame, I think.
Possibly defer related? @danscales

@danscales
Copy link

@danscales danscales commented Jan 5, 2021

Since sighandler/preparePanic do some manipulation of the stack to make it look like there was a direct call to sigpanic(), is it possible that this is some extremely rare race in the sighander or preparePanic?

From the list of stack traces, we can't definitively pinpoint where the original panic happened. But during panic handling, open-coded defers do need to look at the stack trace of the panic goroutine, hence will cause a problem if the backtrace has an issue.

For reference, here is the source code for errgroup.(*Group).Go(). It seems possible that goroutine 8278 (the one that started at line 54 below) is the one that panicked and has a corrupted stack and hence cause secondary panic in the addOneOpenDeferFrame/gentraceback. It's not clear to me if the original panic or problem is defer-related, or just that the defer-processing is highlighting the bad backtrace.

func (g *Group) Go(f func() error) {
	g.wg.Add(1)

	go func() {  // line 54
		defer g.wg.Done()

		if err := f(); err != nil {
			g.errOnce.Do(func() {
				g.err = err
				if g.cancel != nil {
					g.cancel()
				}
			})
		}
	}()
}
@egonelbre
Copy link
Contributor Author

@egonelbre egonelbre commented Jan 6, 2021

That minio bug involves playing with SP.

Sidenote, is or should there be a linter that verifies that SP is not being used?

@randall77
Copy link
Contributor

@randall77 randall77 commented Jan 6, 2021

@egonelbre Yes, that would make sense. We could also be more conservative with tracebacks during profiling interrupts, as that's the only case where we would try to traceback assembly like that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
4 participants
You can’t perform that action at this time.