Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/link: openshift build fails due to another problem with too far branches on ppc64le #20497

Closed
laboger opened this issue May 25, 2017 · 10 comments

Comments

Projects
None yet
3 participants
@laboger
Copy link
Contributor

commented May 25, 2017

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

go 1.9

What operating system and processor architecture are you using (go env)?

Ubuntu 16.04

What did you do?

Built upstream openshift binaries, which builds using packages from Kubernetes.

What did you expect to see?

Binaries built that work.

What did you see instead?

Some binaries now fail due to a new linker limit being exceeded, which causes the GNU linker to generate plt stubs for long calls. The stub generated in this case by the GNU linker expects that R2 contains the address of the TOC (according to the PPC64LE v2 ABI) but code generated by golang in this case does not maintain that value in R2. As a result the generated address is bad and attempts to branch to a bad address.

I know trying to set up and maintain R2 in this case is not the best solution. I believe the solution as requested here #17917 is the right answer but I need to test it out more. (I had some concerns about this solution which is why I wasn't able to get it into go 1.9).

To reproduce this, clone the repo from here:
https://github.com/openshift/origin.git
cd origin; make all

Try to run one of the tests in:
_output/local/bin/linux/ppc64le with the -h option and the result is a SEGV.

@laboger laboger changed the title cmd/link: another problem with too far branches on ppc64le cmd/link: openshift build fails due to another problem with too far branches on ppc64le May 31, 2017

@laboger laboger changed the title cmd/link: openshift build fails due to another problem with too far branches on ppc64le cmd/link: openshift build fails due to another problem with too far branches on ppc64le, x86 Jun 1, 2017

@laboger laboger changed the title cmd/link: openshift build fails due to another problem with too far branches on ppc64le, x86 cmd/link: openshift build fails due to another problem with too far branches on ppc64le Jun 1, 2017

@laboger

This comment has been minimized.

Copy link
Contributor Author

commented Jun 1, 2017

The failure occurs the first time main.init tries to call a plt_branch sequence like this:

000000001380448c <00000038.plt_branch.36:b9ef>:
1380448c: ff ff 82 3d addis r12,r2,-1
13804490: 10 4e 8c e9 ld r12,19984(r12)
13804494: a6 03 89 7d mtctr r12
13804498: 20 04 80 4e bctr

But even though this is a dynamic executable built by the GNU linker, the code is not generated to load R2 or maintain it. As a result, this sequence expects R2 to be valid when it is not, and jumps off to a bad address, or 0.

This is the first time I've seen a plt_branch to handle this situation, if the branch offset is not too large then it is just a long_branch and that works fine.

I am able to build valid binaries successfully by using -buildmode=pie.

In reading the PPC64 v2 ABI, it sounds like the linker should know whether or not the procedure has initialized R2 and generate the call stub based on that information? If that is true then it shouldn't be creating this type of call stub unless it doesn't have correct information about the procedure making the call. I'm trying to track that down.

@jcajka

This comment has been minimized.

Copy link
Contributor

commented Jun 5, 2017

I have hit this too(Fedora27/26, go1.8.3), with bins(origin/genman,origin/openshift) ~300M+ in size and was able to get (at least partially) functional binaries using linkmode internal(for non shared bins, haven't tried shared at all), but I haven't verified that binaries are generally correct(yet). They don't segfault when executed(which seems to be according to your notes).

@laboger

This comment has been minimized.

Copy link
Contributor Author

commented Jun 6, 2017

The workaround for this is to use -bulidmode=pie when building the executables that are nonstatic. If there are static binaries being built, those shouldn't have a problem because they should still work with internal linking if they are big. I've tested with it and seems to work fine. Using -buildmode=pie will ensure that r2 is valid as needed.

A better fix is planned for Go 1.9 and is being tested to verify no regressions are introduced.

@gopherbot

This comment has been minimized.

Copy link

commented Jun 8, 2017

CL https://golang.org/cl/45130 mentions this issue.

@jcajka

This comment has been minimized.

Copy link
Contributor

commented Jun 8, 2017

Actually internal linker is not a workaround(at least not in full static Go/default build, surprisingly it is when you have intermediate files left from failed build with external linker, haven't investigated further this path yet), it fails with relocation overflow. With build mode pie(without -linkshared) and go1.8.3 I'm now hitting illegal instruction in genman, investigating further.

seems as jump to ctr which point in to data:

  >│0x242e1ec0                      .long 0x0                                                                                                      
          
#0  0x00000000242e1ec0 in ?? () at /usr/lib/golang/src/runtime/asm_ppc64x.s:109
#1  0x00000000242e1db4 in runtime.rt0_go () at /usr/lib/golang/src/runtime/asm_ppc64x.s:17
#2  0x00003fffb7d2291c in generic_start_main (main=0x242e49b0 <main>, argc=<optimized out>, argv=0x3fffffffeed8, auxvec=0x3ffffffff040, init=<optimized out>, rtld_fini=<optimized out>, 
    stack_end=<optimized out>, fini=<optimized out>) at ../csu/libc-start.c:289
#3  0x00003fffb7d22b18 in __libc_start_main (argc=<optimized out>, argv=<optimized out>, ev=<optimized out>, auxvec=<optimized out>, rtld_fini=<optimized out>, stinfo=<optimized out>, 
    stack_on_entry=<optimized out>) at ../sysdeps/unix/sysv/linux/powerpc/libc-start.c:102
#4  0x0000000000000000 in ?? ()
@laboger

This comment has been minimized.

Copy link
Contributor Author

commented Jun 8, 2017

@jcajka Can you provide directions on how you build genman? This was reported to me when building openshift using the directions above, and I don't see how to build genman. If I build openshift with -buildmode=pie using go 1.8.3 it works AFAICT, and it certainly doesn't die in __libc_start_main.

Maybe you noticed, the upstream patch was submitted this morning, and we are testing the go 1.8.3 patch now so that should be available very soon.

@laboger

This comment has been minimized.

Copy link
Contributor Author

commented Jun 8, 2017

I will be changing my solution for this problem as a result of the discussion in #20492.

@jcajka

This comment has been minimized.

Copy link
Contributor

commented Jun 9, 2017

@laboger genman is built using hack/generate-docs.sh, I did some tracing and it seem that following will build genman(invoked in tree of failed build of OS).

export GOPATH=origin/_output/local/go
go build -x -ldflags='-linkmode=internal' -pkgdir origin/_output/local/pkgdir/linux/ppc64le -o genman -tags 'include_gcs include_oss containers_image_openpgp ' github.com/openshift/origin/tools/genman

(path adjustment needed)
Recently I have done most of the investigation by hacking common.sh like this way(which is our proposed workaround)

+        linkshared=""
+      if [[ "${platform}" == "linux/ppc64le" ]]; then
+        local_ldflags+=" -buildmode=pie"
+        linkshared="-linkshared"
+      fi
+
       if [[ ${#nonstatics[@]} -gt 0 ]]; then
-        GOOS=${platform%/*} GOARCH=${platform##*/} go install \
+        GOOS=${platform%/*} GOARCH=${platform##*/} go install $linkshared \
           -pkgdir "${OS_OUTPUT_PKGDIR}/${platform}" \
           -tags "${OS_GOFLAGS_TAGS-} ${!platform_gotags_envvar:-}" \
           -ldflags="${local_ldflags}" \

Dropping the linkshared part will result in the up-mention illegal instruction.

@gopherbot gopherbot closed this in 738739f Jun 14, 2017

@laboger

This comment has been minimized.

Copy link
Contributor Author

commented Jun 14, 2017

A few notes on the workaround. Most of my testing had been done upstream, and I made the wrong assumption that go 1.8 would behave the same. But I've since discovered an upstream fix (not in go 1.8) that is needed to make this workaround work in all cases. Sorry about that confusion. @jcajka

@gopherbot

This comment has been minimized.

Copy link

commented Oct 14, 2017

Change https://golang.org/cl/70837 mentions this issue: [release-branch.go1.8] cmd/link: implement trampolines for ppc64le with ext linking

gopherbot pushed a commit that referenced this issue Oct 25, 2017

[release-branch.go1.8] cmd/link: implement trampolines for ppc64le wi…
…th ext linking

When using golang on ppc64le there have been issues
when building executables that generate extremely large text
sections.  This is due to the call instruction and the limitation
on the offset field, which is smaller than most platforms.  If the
size of the call target offset is too big for the offset field in
the call instruction, then link errors can occur.

The original solution to this problem in golang was to split the
text section when it became too large, allowing the external (GNU)
linker to insert the necessary stub to handle the long call.  That
worked fine until the another size limit for the program size was hit,
where a plt_branch was created instead of a long branch.  In that case
the plt_branch code sequence expects r2 to contain the address of the
TOC, but when golang creates dynamic executables by default
(-buildmode=exe) r2 does not always contain the address of the TOC
and as a result when building programs that reach this extremely
large size, a runtime SEGV or SIGILL can occur due to branching to a bad
address.

When using internal linking, trampolines are generated to handle the
long calls but the text sections are not split.  With this change,
text sections will still be split approrpriately with external linking
but if the buildmode being used does not maintain r2 as the TOC
addresses, then trampolines will be created for those calls.

Fixes #20497

Change-Id: If5400b0f86c2c08e106b332be6db0b259b07d93d
Reviewed-on: https://go-review.googlesource.com/45130
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-on: https://go-review.googlesource.com/70837
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>

@golang golang locked and limited conversation to collaborators Oct 14, 2018

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
You can’t perform that action at this time.