Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: "traceback did not unwind completely" #62326

Closed
potuz opened this issue Aug 28, 2023 · 12 comments
Closed

runtime: "traceback did not unwind completely" #62326

potuz opened this issue Aug 28, 2023 · 12 comments
Assignees
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@potuz
Copy link

potuz commented Aug 28, 2023

$ go version
go version go1.21.0 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=''
GOARCH='amd64'
GOBIN=''
GOCACHE='/home/heluani/.cache/go-build'
GOENV='/home/heluani/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/home/heluani/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/heluani/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/usr/lib/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/usr/lib/go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.21.0'
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/home/heluani/Documents/code/ethereum/gohashtree/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build1443768762=/tmp/go-build -gno-record-gcc-switches'

What did you do?

The tests in https://github.com/prysmaticlabs/gohashtree fail with a

fatal error: traceback did not unwind completely

Steps to reproduce:

  1. clone the repo
  2. go test .

What did you expect to see?

Tests pass

Culprit

After git bisect the commit that broke the package is https://go-review.googlesource.com/c/go/+/458218

which creates the undwinder.

@potuz potuz changed the title affected/package: prysmaticlabs/gohashtree affected/package: prysmaticlabs/gohashtree "traceback did not unwind completely" Aug 28, 2023
@bcmills bcmills added compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Aug 28, 2023
@bcmills
Copy link
Contributor

bcmills commented Aug 28, 2023

#62182 tracks a similar failure mode on openbsd/arm64, which may or may not be related.

@bcmills
Copy link
Contributor

bcmills commented Aug 28, 2023

After git bisect the commit that broke the package is https://go-review.googlesource.com/c/go/+/458218

(attn @aclements @prattmic)

@aclements
Copy link
Member

Thanks for the bisect!

Reproduced with:

git clone https://github.com/prysmaticlabs/gohashtree
cd gohashtree
GOTOOLCHAIN=go1.21.0 go test

The crash happens in the very first test run, TestHash/hash_1_block, while we're doing a copystack. The one frame we can see is:

goroutine 19 [copystack]:
github.com/prysmaticlabs/gohashtree._hash(0xc0000da080, {0x725c20, 0x2, 0x40}, 0x1)
	/home/austin/tmp/gohashtree/hash_amd64.s:775 +0x18fd5 fp=0xc00009fe58 sp=0xc00009fe50 pc=0x58f6f5
created by testing.(*T).Run in goroutine 18
	/home/austin/r/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.21.0.linux-amd64/src/testing/testing.go:1648 +0x3ad

We appear to reach this via TestHash -> (testing goroutine) -> TestHash.func1 -> Hash -> _hash. _hash is an assembly function, which suggests this might not be a bug in the Go runtime. However, 775 is the TEXT line for _hash, and I confirmed that pc 0x58f6f5 is the instruction immediately following the CALL to morestack_noctxt, so it appears we haven't actually entered any user assembly yet.

@aclements aclements changed the title affected/package: prysmaticlabs/gohashtree "traceback did not unwind completely" runtime: "traceback did not unwind completely" Aug 28, 2023
@aclements
Copy link
Member

On entry to (*unwinder).init -> resolveInternal:
pc = 0x58f6f5, continpc = 0x0, lr = 0x0, sp = 0xc00007ee50, fp = 0x0,

(gdb) x/16gx 0xc00007ee50
0xc00007ee50:	0x00000000005758ec	0x000000c0000001a0
0xc00007ee60:	0x0000000000725c20	0x0000000000000002
0xc00007ee70:	0x0000000000000040	0x0000000000000001
0xc00007ee80:	0x000000c00004a6a8	0x000000000044f0a9
0xc00007ee90:	0x0000000000000020	0x00000000005a68a0
0xc00007eea0:	0x0000000000000001	0x000000c00004a760
0xc00007eeb0:	0x0000000000590c30	0x000000c0000001a0
0xc00007eec0:	0x00000000006ffe38	0x000000000071fc40
(gdb) x/i 0x00000000005758ec
   0x5758ec <github.com/prysmaticlabs/gohashtree.Hash+108>:	xorps  %xmm15,%xmm15

Looks reasonable.

resolveInternal computes frame.fp = 0xc00007ee58 because the spdelta is 0 at this point, which is correct.

However, resolveInternal then sees that FuncFlagSPWrite is set, so it sets frame.lr = 0, which terminates the traceback after this frame, leading to the incomplete unwind.

That leads to two question:

  1. Why is this function marked SPWrite? I don't actually see any SP writes.
  2. What did the old traceback implementation do differently that caused it not to panic in the same way?

@aclements
Copy link
Member

Why is this function marked SPWrite? I don't actually see any SP writes.

Oops, not sure how I missed it, but it clearly does write SP: https://github.com/prysmaticlabs/gohashtree/blob/aafd8b3ca202c8010581affc92aee966cc01ef79/hash_amd64.s#L2055

So, while the current rules around this are a little unfortunate, the unwinder is actually correctly failing. At the moment, we have no way to identify just parts of functions where SP has been adjusted in a way we don't understand (possibly we should), which means the unwinder assumes it can't know anything about the frame size anywhere in the function.

One way to fix this in gohashtree would be to move the "shani" code to its own NOSPLIT function called only from _hash (possibly splitting just after loading the FP-relative arguments into registers; since it would only be called by assembly and would be NOSPLIT, it doesn't have to follow the Go conventions). Ideally, this new function would declare enough stack space so it can align the stack up rather than down, in order to guarantee that Go understands the maximum stack in play. In practice, it looks like it can grow the stack at most 16 bytes, which is going to fit in the NOSPLIT zone, so it's probably fine to align it down.

Now I'm really curious why the old unwinder didn't fail in the same way.

@aclements
Copy link
Member

Now I'm really curious why the old unwinder didn't fail in the same way.

Ah hah. The old unwinder ignored SPWRITE on the innermost frame. The new unwinder rearranged these conditions slightly so it doesn't immediately panic if it sees SPWRITE on the innermost frame, but it does leave frame.lr set to 0, causing a later panic.

We should recreate the old behavior. I think the old condition was brittle, though (e.g., that's definitely the wrong behavior if you're unwinding from a trap that landed in an SPWRITE function). I'll have to figure out what the right condition is.

@mknyszek mknyszek moved this from Todo to In Progress in Go Compiler / Runtime Aug 30, 2023
@mknyszek mknyszek added this to the Backlog milestone Aug 30, 2023
@WGH-
Copy link

WGH- commented Sep 5, 2023

FWIW, also happens with github.com/minio/sha256-simd@v0.1.1 (the problematic assembly runs only on AVX512-enabled CPUs). Old version, I know, so I fixed this with dependency bump, but still.

@aclements
Copy link
Member

#62182 tracks a similar failure mode on openbsd/arm64, which may or may not be related.

I don't believe that's the same issue because this issue should always manifest as a traceback with a single frame, whereas it's printing multiple frames on the crashing goroutine in #62182.

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/525835 mentions this issue: runtime: ignore SPWrite on innermost traceback frame

@aclements
Copy link
Member

@gopherbot , please backport to Go 1.21.

This causes the runtime to crash when an assembly function that modifies SP grows the stack or is preempted for GC. This should be pretty rare, but the crash is difficult to work around.

@gopherbot
Copy link
Contributor

Backport issue(s) opened: #62464 (for 1.21).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://go.dev/wiki/MinorReleases.

@prestonvanloon
Copy link

Confirming that this resolves prysmaticlabs/gohashtree#13

@golang golang locked and limited conversation to collaborators Sep 5, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

7 participants