runtime: missing deferreturn on linux/ppc64le #39049

Closed
4a6f656c opened this issue May 13, 2020 · 27 comments

Contributor

@4a6f656c 4a6f656c commented May 13, 2020

What version of Go are you using (go version)?

$ go version
go version go1.14.2 linux/ppc64le

Does this issue reproduce with the latest release?

Yes, the same issue exists on tip.

This is a regression between Go 1.13 and Go 1.14, presumably either due to the introduction of open coded defers, or due to a bug that is now being triggered.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="ppc64le"
GOBIN=""
GOCACHE="/home/jsing/.cache/go-build"
GOENV="/home/jsing/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="ppc64le"
GOHOSTOS="linux"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/jsing/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/home/jsing/src/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/jsing/src/go/pkg/tool/linux_ppc64le"
GCCGO="gccgo"
GOPPC64="power8"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build042050806=/tmp/go-build -gno-record-gcc-switches"

What did you do?

This was initially observed when trying to run tests in a large code base on linux/ppc64le. To reproduce the issue, a panic() and a defer need to run at a high PC. I've written a Go program (https://play.golang.org/p/CmKwSyteWhX) that generates a Go program that triggers this issue.

Save https://play.golang.org/p/CmKwSyteWhX as gen.go then run:

$ go run gen.go && go run crash/main.go

What did you expect to see?

$ go run gen.go && go run crash/main.go
panic: blah

goroutine 1 [running]:
main.f2()
        /home/jsing/tmp/crash/crash/main.go:20 +0x7c
main.f1()
        /home/jsing/tmp/crash/crash/main.go:16 +0x3c
main.main()
        /home/jsing/tmp/crash/crash/main.go:27 +0x24
exit status 2

What did you see instead?

$ go run gen.go && go run crash/main.go
fatal error: missing deferreturn
                                                                                      
runtime stack:           
runtime.throw(0x3ddde3d, 0x13)                                                                                                                                               
        /home/jsing/src/go/src/runtime/panic.go:1116 +0x5c   
runtime.addOneOpenDeferFrame.func1.1(0x3fffd6114a10, 0x0, 0x4260c00)
        /home/jsing/src/go/src/runtime/panic.go:753 +0x258
runtime.gentraceback(0x3dbf7ec, 0xc000084ed0, 0x0, 0xc000000180, 0x0, 0x0, 0x7fffffff, 0x3fffd6114ae0, 0x0, 0x0, ...)
        /home/jsing/src/go/src/runtime/traceback.go:334 +0xea0                                                                                                               
runtime.addOneOpenDeferFrame.func1()                                                  
        /home/jsing/src/go/src/runtime/panic.go:721 +0x8c
runtime.systemstack(0x0)
        /home/jsing/src/go/src/runtime/asm_ppc64x.s:269 +0x94
runtime.mstart()
        /home/jsing/src/go/src/runtime/proc.go:1041

goroutine 1 [running]:                                                                
runtime.systemstack_switch()                                                          
        /home/jsing/src/go/src/runtime/asm_ppc64x.s:216 +0x10 fp=0xc000084db0 sp=0xc000084d90 pc=0x625b0
runtime.addOneOpenDeferFrame(0xc000000180, 0x3dbf7ec, 0xc000084ed0)
        /home/jsing/src/go/src/runtime/panic.go:720 +0x7c fp=0xc000084e00 sp=0xc000084db0 pc=0x3886c
panic(0x3dc9680, 0x3e0d2c0)                                                           
        /home/jsing/src/go/src/runtime/panic.go:929 +0xdc fp=0xc000084ed0 sp=0xc000084e00 pc=0x38eac
main.f2()
        /home/jsing/tmp/crash/crash/main.go:20 +0x7c fp=0xc000084f00 sp=0xc000084ed0 pc=0x3dbf7ec


main.f1()                             
        /home/jsing/tmp/crash/crash/main.go:16 +0x3c fp=0xc000084f30 sp=0xc000084f00 pc=0x3dbf72c
main.main()
        /home/jsing/tmp/crash/crash/main.go:27 +0x24 fp=0xc000084f50 sp=0xc000084f30 pc=0x3dbf834
runtime.main()   
        /home/jsing/src/go/src/runtime/proc.go:203 +0x248 fp=0xc000084fc0 sp=0xc000084f50 pc=0x3bcd8
runtime.goexit()
        /home/jsing/src/go/src/runtime/asm_ppc64x.s:884 +0x4 fp=0xc000084fc0 sp=0xc000084fc0 pc=0x64b64
exit status 2            

Changing n from 4149 to 4148 in gen.go reduces the number of instructions prior to the defer, and the test then succeeds.

Contributor

@randall77 randall77 commented May 13, 2020

@danscales danscales self-assigned this May 13, 2020

@danscales danscales commented May 13, 2020

@4a6f656c thanks for the repro case!

In trying to reproduce this on a linux-ppc64le-buildlet gomote (after pushing the Go source tree, building it with make.sh, and sshing in via 'gomote ssh'), I got this error:

~# /workdir/go/bin/go run gen.go
~# /workdir/go/bin/go build crash/main.go

# _/root/crash/huge1

crash/huge1/a.s:1: expected '(', found C2
crash/huge1/a.s:1004: expected '(', found C2
crash/huge1/a.s:2007: expected '(', found C2
crash/huge1/a.s:3010: expected '(', found C2
crash/huge1/a.s:4013: expected '(', found C2
crash/huge1/a.s:5016: expected '(', found C2
crash/huge1/a.s:6019: expected '(', found C2
crash/huge1/a.s:7022: expected '(', found C2
crash/huge1/a.s:8025: expected '(', found C2
crash/huge1/a.s:9028: expected '(', found C2
crash/huge1/a.s:10031: expected '(', found C2
asm: too many errors

# _/root/crash/huge2

crash/huge2/a.s:1: expected '(', found C2
crash/huge2/a.s:1004: expected '(', found C2
crash/huge2/a.s:2007: expected '(', found C2
crash/huge2/a.s:3010: expected '(', found C2
crash/huge2/a.s:4013: expected '(', found C2
crash/huge2/a.s:5016: expected '(', found C2
crash/huge2/a.s:6019: expected '(', found C2
crash/huge2/a.s:7022: expected '(', found C2
crash/huge2/a.s:8025: expected '(', found C2
crash/huge2/a.s:9028: expected '(', found C2
crash/huge2/a.s:10031: expected '(', found C2
asm: too many errors

Any suggestions on why the .s files are not assembling properly? Do you think I'll be able to repro on a gomote buildlet? Thanks!

Contributor Author

@4a6f656c 4a6f656c commented May 13, 2020

@danscales - ugh, when I copied and pasted into play.golang.org, the Unicode middle dot (·) got replaced with <C2><B7>:

fmt.Fprintf(buf, "TEXT <C2><B7>f%d(SB),0,$0-0\n", i)

Correcting that should fix the problem. You should be able to repro on a gomote buildlet as long as it's got enough resources.

(I tried attaching gen.go directly to this issue, but GitHub complained about invalid file types :S)

Edit: I've just updated the play.golang.org links to a version that should have this fixed.
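
For readers who cannot open the playground link, the assembly-generating part of such a program might look roughly like the sketch below. It is built around the Fprintf line quoted above and is not the original gen.go. The helper name genAsm, the filler instruction, and the per-function layout are all assumptions. The 1003-line spacing between functions is only inferred from the assembler error line numbers (1, 1004, 2007, ...) reported earlier in this thread.

```go
package main

import (
	"bytes"
	"fmt"
)

// genAsm returns Go assembly source defining n zero-frame functions
// f0..f(n-1), each padded with the given number of filler instructions.
// Hypothetical helper: the name, the filler instruction, and the layout
// are assumptions, not taken from the original gen.go.
func genAsm(n, fillers int) string {
	buf := new(bytes.Buffer)
	for i := 0; i < n; i++ {
		// \u00b7 is the middle dot. Writing it as an escape keeps this
		// source ASCII-only, so copy and paste cannot mangle it.
		fmt.Fprintf(buf, "TEXT \u00b7f%d(SB),0,$0-0\n", i)
		for j := 0; j < fillers; j++ {
			// Assumed filler: the ppc64 nop encoding (ori r0,r0,0).
			buf.WriteString("\tWORD $0x60000000\n")
		}
		buf.WriteString("\tRET\n\n")
	}
	return buf.String()
}

func main() {
	src := genAsm(3, 1000)
	// Report the 1-based line number of each TEXT directive.
	for i, line := range bytes.Split([]byte(src), []byte("\n")) {
		if bytes.HasPrefix(line, []byte("TEXT")) {
			fmt.Printf("line %d: %s\n", i+1, line)
		}
	}
}
```

With 1000 fillers per function, the TEXT directives land on lines 1, 1004 and 2007, which is consistent with the error positions in the log, though the real generator may differ in every other detail.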


@danscales danscales commented May 13, 2020

@4a6f656c Thanks for the fixed gen.go file!

I wasn't able to reproduce the problem on Gomote type linux-ppc64le-buildlet. I get the correct output 'panic: blah' and no 'missing deferreturn' message.

Do you think it would repro better on some other buildlet (maybe linux-ppc64le-power9osu -- I'll try that next)? Is there anything unusual about your ppc64le configuration?


@danscales danscales commented May 13, 2020

Oh, also, I tried both on current tip and on the Go 1.14.2 release, but didn't repro on either.

Contributor Author

@4a6f656c 4a6f656c commented May 14, 2020

@danscales - I don't think there is anything particularly unusual about these machines, but I'll take a closer look later today. You may want to bump n up to a larger value (say 8000), as it may depend on the system stack allocation.

Contributor Author

@4a6f656c 4a6f656c commented May 14, 2020

@danscales - I just tested on a clean machine and noticed that I'd left gen.go with an n of 4148. Setting it to 4149 was insufficient on this host (so presumably memory pressure or OS stack allocation is playing into it). Setting it to 8000 did trigger the issue, however.


@danscales danscales commented May 14, 2020

OK, thanks to the repro case from Joel, I was able to figure out that this is due to the use of trampolines on PPC64 for calling deferreturn in programs with very large text sizes. The current method for finding/marking the deferreturn stub in a function doesn't work with these trampolines. As far as I know, these trampolines are currently used only on arm and ppc64. I will look into the best way to identify these trampolines for deferreturn.
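
To make the "very large text sizes" condition concrete, here is a small sketch of the reach of a direct PPC64 branch. A b/bl instruction carries a 24-bit field with two implied zero bits, so it covers a signed 26-bit byte offset, roughly +/-32 MiB. The call-site PC is main.f2's from the trace in the report. The address of runtime.deferreturn does not appear in the trace, so the PC inside panic is used as a stand-in, on the assumption that runtime functions sit at similarly low addresses. The helper name reachable is hypothetical, and the linker's actual trampoline decision may be more conservative than this check.

```go
package main

import "fmt"

// Reach of a PPC64 I-form branch (b/bl): a 24-bit LI field with two
// implied low zero bits, sign-extended to a 26-bit byte offset.
const (
	minDisp = -(1 << 25)    // -32 MiB
	maxDisp = (1 << 25) - 4 // +32 MiB minus one instruction
)

// reachable reports whether a direct bl located at address from can
// reach target without going through a trampoline.
func reachable(from, target int64) bool {
	d := target - from
	return d >= minDisp && d <= maxDisp
}

func main() {
	const (
		callSite  = 0x3dbf7ec // pc of main.f2 in the reported trace
		runtimePC = 0x38eac   // pc inside panic, a stand-in for deferreturn
		nearby    = 0x3bcd8   // pc inside runtime.main, for comparison
	)
	fmt.Println(reachable(callSite, runtimePC)) // false: about 61.5 MiB away
	fmt.Println(reachable(nearby, runtimePC))   // true: a few KiB away
}
```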

Contributor

@cherrymui cherrymui commented May 14, 2020

Trampolines are very simple functions and don't have deferreturn in them. I wonder why they are special. I'll take a look. Let me know if there is anything I can help with.


@danscales danscales commented May 14, 2020

To clarify, the problem is that a call to deferreturn in a normal function (which has defers) is being turned into a trampoline, and therefore the code to recognize the deferreturn stub (for open-coded defers) based on a call to deferreturn is not working. The code for recognizing the deferreturn call is in pcln.go:computeDeferReturn(). We just need to decide if some extra pattern matching (for trampoline calls) is OK, or if we should do some more complicated passing of the relative position of the deferreturn call within the function from the compiler.
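
The pattern-matching option can be illustrated with a toy model. Nothing below mirrors cmd/link's real data structures or the code in pcln.go: the sym and reloc types, the findDeferReturn helper, the symbol names, and the offsets are all hypothetical. The sketch only shows why a scan that looks for a call targeting deferreturn comes up empty once the call is redirected to a trampoline, and how following the trampoline restores the match.

```go
package main

import "fmt"

// sym is a toy stand-in for a linker symbol.
type sym struct {
	name         string
	trampolineTo *sym // non-nil if this symbol is a trampoline to another symbol
}

// reloc is a call relocation at byte offset off within a function.
type reloc struct {
	off    int32
	target *sym
}

// findDeferReturn returns the offset of the first call that reaches
// deferreturn, either directly or, when followTramp is set, through a
// trampoline. It returns 0 when no such call is found.
func findDeferReturn(relocs []reloc, deferreturn *sym, followTramp bool) int32 {
	for _, r := range relocs {
		t := r.target
		if followTramp && t.trampolineTo != nil {
			t = t.trampolineTo
		}
		if t == deferreturn {
			return r.off
		}
	}
	return 0
}

func main() {
	deferreturn := &sym{name: "runtime.deferreturn"}
	tramp := &sym{name: "deferreturn trampoline (made-up name)", trampolineTo: deferreturn}
	other := &sym{name: "main.g"}

	// A function whose deferreturn call was rewritten to use a trampoline.
	relocs := []reloc{{off: 0x20, target: other}, {off: 0x90, target: tramp}}

	fmt.Println(findDeferReturn(relocs, deferreturn, false)) // 0: call not recognized
	fmt.Println(findDeferReturn(relocs, deferreturn, true))  // 144: found via trampoline
}
```

A zero result here stands for the case where no deferreturn PC is recorded for a function that has open-coded defers, which appears to be what the runtime reports as "missing deferreturn".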

Contributor

@cherrymui cherrymui commented May 14, 2020

Thanks, @danscales . I think I understand the issue now. I can try to write a CL, if that's helpful.


@gopherbot gopherbot commented May 14, 2020

Change https://golang.org/cl/234105 mentions this issue: cmd/link: detect trampoline of deferreturn call


@danscales danscales commented May 14, 2020

@cherrymui Oh, that was quick -- thanks for writing a CL. I'll take a look!

@gopherbot gopherbot closed this in 2b70ffe May 15, 2020

@Prashanth684 Prashanth684 commented Jul 1, 2020

Is there any chance of this being backported to 1.14? Thanks!


@mkumatag mkumatag commented Jul 2, 2020

This is a critical bug for the Kubernetes project. Can someone help us cherry-pick this to the 1.14 release? @danscales @aclements @4a6f656c @thanm @jeremyfaller

Contributor

@ianlancetaylor ianlancetaylor commented Jul 2, 2020

@gopherbot please open a backport to 1.14


@gopherbot gopherbot commented Jul 2, 2020

Backport issue(s) opened: #39991 (for 1.14).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases.


@mkumatag mkumatag commented Jul 2, 2020

@gopherbot please open a backport to 1.14

@ianlancetaylor Thanks for the quick response. May I know when the 1.14.5 bits will be available?

Contributor

@ianlancetaylor ianlancetaylor commented Jul 2, 2020

We normally do minor releases around the start of each month, which in this case due to the U.S. Independence Day holiday means the beginning of next week. But someone will need to backport the change.

Contributor

@cherrymui cherrymui commented Jul 2, 2020

I'll do the backport. I'll note that this won't be a clean cherry-pick, as the fix here is applied to the new linker, while Go 1.14 still uses the old linker. The logic is easy to backport, though.


@danscales danscales commented Jul 2, 2020

Thanks, @cherrymui . Let me know if I can help in any way.

@danscales danscales assigned cherrymui and unassigned danscales Jul 2, 2020

@gopherbot gopherbot commented Jul 3, 2020

Change https://golang.org/cl/240917 mentions this issue: [release-branch.go1.14] cmd/link: detect trampoline of deferreturn call

@dmitshur dmitshur added this to the Go1.15 milestone Jul 6, 2020

@gopherbot gopherbot commented Jul 6, 2020

Change https://golang.org/cl/241087 mentions this issue: cmd/oldlink: port bug fixes to old linker

gopherbot pushed a commit that referenced this issue Jul 6, 2020
This CL ports CL 234105 and CL 240621 to the old linker, which
fix critical bugs (runtime crashes).

Updates #39049.
Updates #39927.

Change-Id: I47afc84349119e320d2e60d64b7188a410835d2b
Reviewed-on: https://go-review.googlesource.com/c/go/+/241087
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Than McIntosh <thanm@google.com>
Reviewed-by: Jeremy Faller <jeremy@golang.org>
@dmitshur dmitshur added the NeedsFix label Jul 10, 2020
gopherbot pushed a commit that referenced this issue Jul 10, 2020
This is a backport of CL 234105. This is not a clean cherry-pick,
as CL 234105 is for the new linker, whereas we still use the old
linker here. This CL backports the logic.

The runtime needs to find the PC of the deferreturn call in a few
places. So for functions that have defer, we record the PC of
deferreturn call in its funcdata.

For very large binaries, the deferreturn call could be made
through a trampoline. The current code of finding deferreturn PC
fails in this case. This CL handles the trampoline as well.

Fixes #39991.
Updates #39049.

Change-Id: I929be54d6ae436f5294013793217dc2a35f080d4
Reviewed-on: https://go-review.googlesource.com/c/go/+/234105
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Jeremy Faller <jeremy@golang.org>
Reviewed-by: Than McIntosh <thanm@google.com>
Reviewed-on: https://go-review.googlesource.com/c/go/+/240917
Run-TryBot: Dmitri Shuralyov <dmitshur@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Joel Sing <joel@sing.id.au>

@runlevel5 runlevel5 commented Jul 16, 2020

I also ran into a similar error with the Kubernetes kubelet tooling, which I reported upstream at kubernetes/kubelet#13.

In my case the app runs correctly with 1.13.x and regressed with 1.14.x. Unfortunately, 1.14.5 does not address this issue. Any guidance is greatly appreciated.

cc @cherrymui

Member

@aclements aclements commented Jul 16, 2020

@runlevel5 , Go 1.14.5 was a security release, so it did not include the fix for this. This should be fixed in the next non-security release (which is very likely to be 1.14.6 and I think should be out soon).


@runlevel5 runlevel5 commented Jul 16, 2020

@aclements thanks for the clarification


@runlevel5 runlevel5 commented Jul 21, 2020

Finally 1.14.6 has resolved the issue :)
