Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/link: SIGSEGV running 'openshift-install version' for release-4.8 using external linking on PPC64LE #45850

Closed
hamzy opened this issue Apr 29, 2021 · 32 comments

Comments

@hamzy
Copy link

@hamzy hamzy commented Apr 29, 2021

What version of Go are you using (go version)?

$ go version
go version go1.16.3 linux/ppc64le

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
[root@mihawklp111 installer]# go env
GO111MODULE=""
GOARCH="ppc64le"
GOBIN=""
GOCACHE="/root/.cache/go-build"
GOENV="/root/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="ppc64le"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/root/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/root/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_ppc64le"
GOVCS=""
GOVERSION="go1.16.3"
GCCGO="gccgo"
GOPPC64="power8"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/root/go/src/github.com/openshift/installer/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build456481733=/tmp/go-build -gno-record-gcc-switches"

[root@mihawklp111 installer]# cat /etc/redhat-release 
Red Hat Enterprise Linux release 8.3 (Ootpa)
[root@mihawklp111 installer]# uname -a
Linux mihawklp111 4.18.0-240.el8.ppc64le #1 SMP Wed Sep 23 05:08:15 EDT 2020 ppc64le ppc64le ppc64le GNU/Linux

What did you do?

[root@mihawklp111 ~]# cd ~/go/src/github.com/openshift/
[root@mihawklp111 openshift]# git clone https://github.com/openshift/installer.git && cd installer
[root@mihawklp111 installer]# git checkout origin/release-4.8
...
HEAD is now at 52fef9e1d Merge pull request #4875 from multi-arch/ppc64le-link
[root@mihawklp111 installer]# hack/build.sh
[root@mihawklp111 installer]# bin/openshift-install version
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x0]
goroutine 1 [running]:
github.com/aws/aws-sdk-go/service/devicefarm.init()
        /root/go/src/github.com/openshift/installer/vendor/github.com/aws/aws-sdk-go/service/devicefarm/errors.go:89 +0x24

What did you expect to see?

The version of the openshift installer.

What did you see instead?

A SIGSEGV

@cherrymui
Copy link
Contributor

@cherrymui cherrymui commented Apr 29, 2021

Does it work with a different version of Go, 1.16 or 1.16.1, or tip of the master branch? Issue #45564 is recently fixed (and not triggered in some earlier versions of Go). It is possible that you don't need to force external linking.

@cherrymui
Copy link
Contributor

@cherrymui cherrymui commented Apr 29, 2021

Or different versions of openshift?

@hamzy
Copy link
Author

@hamzy hamzy commented Apr 29, 2021

FYI, I've tried 1.16 and 1.16.1 and they both fail. Hopefully David Benoit can comment. He also reproduced the problem and investigated a little more thoroughly.

And the previous version of openshift installer had the trampoline problem.

@cherrymui
Copy link
Contributor

@cherrymui cherrymui commented Apr 29, 2021

It should not have trampoline problem if you use Go 1.16, 1.16.1, or tip, per #45564 . Could you try that? Thanks.

@dbenoit17
Copy link

@dbenoit17 dbenoit17 commented Apr 29, 2021

@cherrymui I've been able to reproduce this issue, and can also confirm that combining
https://go-review.googlesource.com/c/go/+/314449/ with internal linking does not produce the error. This SIGSEGV seems to occur with external linking only.

@hamzy
Copy link
Author

@hamzy hamzy commented Apr 29, 2021

[root@mihawklp111 installer]# /bin/rm -rf /usr/local/go/; tar -C /usr/local -xzf /root/go1.16.1.linux-ppc64le.tar.gz
[root@mihawklp111 installer]# go version
go version go1.16.1 linux/ppc64le
[root@mihawklp111 installer]# git checkout 911eb7f69a1316c48e883d08644cda0e7304a373
[root@mihawklp111 installer]# hack/build.sh
...
+ go build -mod=vendor -ldflags ' -X github.com/openshift/installer/pkg/version.Raw=unreleased-master-4518-g911eb7f69a1316c48e883d08644cda0e7304a373 -X github.com/openshift/installer/pkg/version.Commit=911eb7f69a1316c48e883d08644cda0e7304a373 -s -w' -tags ' release' -o bin/openshift-install ./cmd/openshift-install
# github.com/openshift/installer/cmd/openshift-install
github.com/ovirt/go-ovirt.(*IconServiceGetRequest).Send: unexpected trampoline for shared or dynamic linking
...
/usr/local/go/pkg/tool/linux_ppc64le/link: too many errors
[root@mihawklp111 installer]# git checkout origin/release-4.8
Previous HEAD position was 911eb7f69 Merge pull request #4868 from wking/stable-4.8
HEAD is now at 52fef9e1d Merge pull request #4875 from multi-arch/ppc64le-link
[root@mihawklp111 installer]# hack/build.sh
...
+ go build -mod=vendor -ldflags ' -X github.com/openshift/installer/pkg/version.Raw=unreleased-master-4520-g52fef9e1d7b765470e493670177eefd73bc27c6f -X github.com/openshift/installer/pkg/version.Commit=52fef9e1d7b765470e493670177eefd73bc27c6f -s -w -linkmode external' -tags ' release' -o bin/openshift-install ./cmd/openshift-install
[root@mihawklp111 installer]# bin/openshift-install version
unexpected fault address 0x2fffffff4170
fatal error: fault     
[signal SIGSEGV: segmentation violation code=0x1 addr=0x2fffffff4170 pc=0x1a69e95c]
                                                     
goroutine 1 [running, locked to thread]:                                                                                                                                                       
runtime.throw(0x217086d3, 0x5)                          
        /usr/local/go/src/runtime/panic.go:1117 +0x5c fp=0xc00149f4f8 sp=0xc00149f4b8 pc=0x152d1d6c                                                                                            
runtime.sigpanic()           
        /usr/local/go/src/runtime/signal_unix.go:741 +0x274 fp=0xc00149f540 sp=0xc00149f4f8 pc=0x152e9984
github.com/aws/aws-sdk-go/service/efs.init()
        /root/go/src/github.com/openshift/installer/vendor/github.com/aws/aws-sdk-go/service/efs/errors.go:194 +0x24 fp=0xc00149f5b0 sp=0xc00149f560 pc=0x18aa1764
runtime.doInit(0x29778280)
        /usr/local/go/src/runtime/proc.go:6265 +0xf0 fp=0xc00149f710 sp=0xc00149f5b0 pc=0x152e29a0
runtime.doInit(0x297c9440)
        /usr/local/go/src/runtime/proc.go:6242 +0x64 fp=0xc00149f870 sp=0xc00149f710 pc=0x152e2914
runtime.doInit(0x29796cc0)
        /usr/local/go/src/runtime/proc.go:6242 +0x64 fp=0xc00149f9d0 sp=0xc00149f870 pc=0x152e2914
runtime.doInit(0x2978a3a0)
        /usr/local/go/src/runtime/proc.go:6242 +0x64 fp=0xc00149fb30 sp=0xc00149f9d0 pc=0x152e2914
runtime.doInit(0x297a3ae0)
        /usr/local/go/src/runtime/proc.go:6242 +0x64 fp=0xc00149fc90 sp=0xc00149fb30 pc=0x152e2914
runtime.doInit(0x29780700)
        /usr/local/go/src/runtime/proc.go:6242 +0x64 fp=0xc00149fdf0 sp=0xc00149fc90 pc=0x152e2914
runtime.doInit(0x297a6900)
        /usr/local/go/src/runtime/proc.go:6242 +0x64 fp=0xc00149ff50 sp=0xc00149fdf0 pc=0x152e2914
runtime.main()
        /usr/local/go/src/runtime/proc.go:208 +0x214 fp=0xc00149ffc0 sp=0xc00149ff50 pc=0x152d43d4
runtime.goexit()
        /usr/local/go/src/runtime/asm_ppc64x.s:869 +0x4 fp=0xc00149ffc0 sp=0xc00149ffc0 pc=0x15307b74
...
@cherrymui cherrymui changed the title SIGSEGV running 'openshift-install version' for release-4.8 cmd/link: SIGSEGV running 'openshift-install version' for release-4.8 using external linking on PPC64LE Apr 29, 2021
@cherrymui
Copy link
Contributor

@cherrymui cherrymui commented Apr 29, 2021

Thanks. Would it be possible to try Go 1.15.x ? Thanks.

cc @pmur @laboger

(Maybe openshift can stop forcing external linking after next Go 1.16.x and Go 1.17 release.)

@hamzy
Copy link
Author

@hamzy hamzy commented Apr 29, 2021

Either a trap or a trampoline, you take your pick :)

[root@mihawklp111 installer]# /bin/rm -rf /usr/local/go/; tar -C /usr/local -xzf /root/go1.15.11.linux-ppc64le.tar.gz
[root@mihawklp111 installer]# go version
go version go1.15.11 linux/ppc64le
[root@mihawklp111 installer]# git status                                                                                                                                                        HEAD detached at origin/release-4.8                                                                                                                                                            
nothing to commit, working tree clean
[root@mihawklp111 installer]# hack/build.sh
...
+ go build -mod=vendor -ldflags ' -X github.com/openshift/installer/pkg/version.Raw=unreleased-master-4520-g52fef9e1d7b765470e493670177eefd73bc27c6f -X github.com/openshift/installer/pkg/version.Commit=52fef9e1d7b765470e493670177eefd73bc27c6f -s -w -linkmode external' -tags ' release' -o bin/openshift-install ./cmd/openshift-install
[root@mihawklp111 installer]# bin/openshift-install version
unexpected fault address 0xffffffffffff3a60
fatal error: fault
[signal SIGSEGV: segmentation violation code=0x3 addr=0xffffffffffff3a60 pc=0x1bbf3664]
...

[root@mihawklp111 installer]# git checkout 911eb7f69a1316c48e883d08644cda0e7304a373
Previous HEAD position was 52fef9e1d Merge pull request #4875 from multi-arch/ppc64le-link
HEAD is now at 911eb7f69 Merge pull request #4868 from wking/stable-4.8
[root@mihawklp111 installer]# hack/build.sh
...
+ go build -mod=vendor -ldflags ' -X github.com/openshift/installer/pkg/version.Raw=unreleased-master-4518-g911eb7f69a1316c48e883d08644cda0e7304a373 -X github.com/openshift/installer/pkg/version.Commit=911eb7f69a1316c48e883d08644cda0e7304a373 -s -w' -tags ' release' -o bin/openshift-install ./cmd/openshift-install
# github.com/openshift/installer/cmd/openshift-install
github.com/ovirt/go-ovirt.(*ClusterServiceRemoveRequest).Send: unexpected trampoline for shared or dynamic linking
...
@pmur
Copy link
Contributor

@pmur pmur commented Apr 29, 2021

I am investigating. It looks like the external linker is inserted a TOC relative trampoline to runtime.makemap_small<1> from github.com/aws/aws-sdk-go/service/devicefarm.init I see a few other ld generated trampolines too. We aren't maintaining the TOC pointer in this build configuration.

@laboger
Copy link
Contributor

@laboger laboger commented Apr 29, 2021

This is the same problem we saw a while ago with some descriptions in #20497 and also a related discussion in #20492. We don't want ld to be inserting trampolines like this because of this issue -- by default Go doesn't maintain r2. The odd thing is that when I look at the openshift-installer binary I don't see any tramp symbols and the text section is not split unless I'm missing something. I'm not sure why we haven't hit this before either.

@cherrymui
Copy link
Contributor

@cherrymui cherrymui commented Apr 29, 2021

@pmur yes, this is also what I'm seeing. I wonder if it works with -buildmode=pie, which will save/restore R2.

@laboger it is not the Go linker inserting trampolines, but the external linker. I do see external linker inserted trampolines.

@laboger
Copy link
Contributor

@laboger laboger commented Apr 29, 2021

it is not the Go linker inserting trampolines, but the external linker. I do see external linker inserted trampolines.

Yes, the problem is that ld is inserting the tramplines and that is why we are getting the segv. I thought there were some cases where Go would still generate trampolines even with external linking according to the issues I mentioned above, but there are none in this case.

It looks like the text section is not being split like it should, and that is why ld is inserting them. Not sure why that would be. Here is the check before splitting:

if ctxt.Arch.InFamily(sys.PPC64) && ldr.OuterSym(s) == 0 && ctxt.IsExternal()

Is OuterSym !=0 for some reason?

@cherrymui
Copy link
Contributor

@cherrymui cherrymui commented Apr 29, 2021

I think openshift is interesting in that it uses plugins, which makes the Go linker to go with DynlinkingGo path, which makes the Go linker not generating trampolines but wait for the external linker. Maybe checking DynlinkingGo is not what we want.

@cherrymui
Copy link
Contributor

@cherrymui cherrymui commented Apr 29, 2021

Is OuterSym !=0 for some reason?

OuterSym != 0 is for symbols from C objects, for which we have a section as an "outer" symbol and individual symbols within that section as "sub" symbols. This is just to skip sub symbols, which just follow the outer symbol, because you cannot break an Outer symbol to smaller pieces.

@cherrymui
Copy link
Contributor

@cherrymui cherrymui commented Apr 29, 2021

Yeah, -buildmode=pie works. (the builder machine is really too slow, it takes ~hour to build...)

@laboger
Copy link
Contributor

@laboger laboger commented Apr 29, 2021

I think there are multiple issues here.

  1. If Go code (not cgo) calls a plugin we need to generate PIC so that it maintains a valid r2. It doesn't need to link with the external linker unless there is cgo. So that leads to some confusion in the code I think.
  2. In this case we have a very large text sections that is not being split so ld is inserting incorrect trampolines.

Are you saying that 2) is related to the use of DynlinkingGo? Or what did you mean when you said we shouldn't be checking DynlinkingGo?

@cherrymui
Copy link
Contributor

@cherrymui cherrymui commented Apr 29, 2021

Yes, 2 is related to the use of DynlinkingGo. I think we want to change it to a more precise condition that we actually save/restore R2.

@laboger
Copy link
Contributor

@laboger laboger commented Apr 29, 2021

Where is that check being done?

@laboger
Copy link
Contributor

@laboger laboger commented Apr 29, 2021

I'm wondering if Openshift only recently added the use of plugins so that is why we haven't seen these problems before?

@cherrymui
Copy link
Contributor

@cherrymui cherrymui commented Apr 29, 2021

1 is interesting. If Go code uses plugins it must be cgo and must be externally linked, because the plugin implementation internally uses C function dlopen etc..

At compile time, it doesn't know whether we are going to use plugins. Currently, it doesn't generate code that maintains R2. It sounds like plugins on PPC64 is just broken?

If that's the case, maybe the only thing to make it work is that we always save/restore R2, at least at compile time. The linker replace them with NOPs if plugin is not used.

Or we document that if you want to use plugin you must build PIE on PPC64.

@cherrymui
Copy link
Contributor

@cherrymui cherrymui commented Apr 29, 2021

Where is that check being done?

At least this one https://tip.golang.org/src/cmd/link/internal/ppc64/asm.go#L661 . There may be others.

@cherrymui
Copy link
Contributor

@cherrymui cherrymui commented Apr 29, 2021

( @hamzy @dbenoit17 in case this helps you -- besides waiting for the fix of #45564 to be released and change to internal linking, another workaround is to use go build -buildmode=pie )

@dbenoit17
Copy link

@dbenoit17 dbenoit17 commented Apr 29, 2021

Thanks, I have tested and can confirm that -buildmode=pie does effectively work around #45564.

@laboger
Copy link
Contributor

@laboger laboger commented Apr 29, 2021

At compile time, it doesn't know whether we are going to use plugins. Currently, it doesn't generate code that maintains R2. It sounds like plugins on PPC64 is just broken?

In general plugins aren't broken but I agree this setting of DynlinkingGo is not right when there are plugins. But for some reason now we are hitting problems with plugins in large text sections where we haven't before. In the normal case, when the plugin is called it should go through C code and that will set up R2 before invoking the plugin code.

With external linking, I believe the text section should be split if it is too large but in this case it is not and I don't see where that should be related to the setting of DynlinkingGo. If there is a call between text sections there should be a trampoline inserted by Go and that is not happening because of the value of DynlinkingGo. I have to do some experiments to verify my thinking.

@laboger
Copy link
Contributor

@laboger laboger commented Apr 29, 2021

If that's the case, maybe the only thing to make it work is that we always save/restore R2, at least at compile time. The linker replace them with NOPs if plugin is not used.

We've had enough issues with the value of R2 that this would be fine with me, but that means we'd have to generate R2 at the beginning of each function. However what about existing code from previous releases (or does that all get recompiled automatically).

@cherrymui
Copy link
Contributor

@cherrymui cherrymui commented Apr 29, 2021

Could we always split text sections when external linking (even our code maitains R2)? Is there any problem with that?

or does that all get recompiled automatically

Yes. We never link Go objects that are not built with the same version of the toolchains.

@cherrymui
Copy link
Contributor

@cherrymui cherrymui commented Apr 29, 2021

we'd have to generate R2 at the beginning of each function

Maybe we can teach the linker to skip those instructions and resize the symbol. I guess it doesn't really matter compared to replacing them with NOPs at link time. (We have alignment paddings between functions anyway)

@laboger
Copy link
Contributor

@laboger laboger commented Apr 29, 2021

The way it works now (with -shared, etc.), is that when a call is made to a function it will know if it is a local or global call in terms of using the same R2, so it will skip those if it would be the same. Just extra code in the binary but only executed if needed.

@cherrymui
Copy link
Contributor

@cherrymui cherrymui commented Apr 29, 2021

Okay, thanks. I think the way forward is either fix the DynlinkingGo condition or just drop the condition and always split sections. Up to you.

@gopherbot
Copy link

@gopherbot gopherbot commented Apr 29, 2021

Change https://golang.org/cl/315289 mentions this issue: cmd/link/internal: change checks for DynlinkingGo with ppc64le trampolines

@pmur
Copy link
Contributor

@pmur pmur commented May 3, 2021

@gopherbot this affects building openshift with go1.16, please consider a backport.

@gopherbot
Copy link

@gopherbot gopherbot commented May 3, 2021

Backport issue(s) opened: #45927 (for 1.16), #46002 (for 1.15).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
6 participants