Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/link: internal link issue with outline atomics feature on arm64 #39466

Open
shawn-xdji opened this issue Jun 9, 2020 · 29 comments
Open

cmd/link: internal link issue with outline atomics feature on arm64 #39466

shawn-xdji opened this issue Jun 9, 2020 · 29 comments

Comments

@shawn-xdji
Copy link
Contributor

@shawn-xdji shawn-xdji commented Jun 9, 2020

What version of Go are you using (go version)?

$ go version
tip

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env


GO111MODULE=""
GOARCH="arm64"
GOBIN=""
GOCACHE="/home/xiaji01/.cache/go-build"
GOENV="/home/xiaji01/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="arm64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/xiaji01/.go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/xiaji01/.go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/home/xiaji01/util/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/xiaji01/util/go/pkg/tool/linux_arm64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build044102122=/tmp/go-build -gno-record-gcc-switches"

What did you do?

GCC-9.4 introduced the outline atomics support for arm64 and the feature was made default since GCC-10.1. Go's internal linking mode doesn't work well with the feature, here is what have been spotted so far and a sample case, further analysis is ongoing.

With GCC-10.1 and default CGO flags, or GCC-9.4+with an additional "-moutline-atomics" flag, the sample case runs into a link issue with '-linkmode=internal', internal linking tries to resolve undefined external references in libgcc.a, while the problematic variable __aarch64_have_lse_atomic and its init function don't get resolved.

$ go build -ldflags='-linkmode=internal'
libgcc(.text): unknown symbol __aarch64_have_lse_atomics

lse.c

int test_cas_atomic_int (int *ptr, int *expected, int desired)
{
  return __atomic_compare_exchange_n (ptr, expected, desired, 0, 0, 0);
}

mult.c

#include <stdlib.h>
#include <stdio.h>

extern int test_cas_atomic_int (int *ptr, int *expected, int desired);

int test_cgo(int a, int b) {
  int old = a;
  int new = b;

  printf("before: old = %d, new = %d\n", old, new);
  test_cas_atomic_int(&old, &old, new);
  printf("after: old = %d, new = %d\n", old, new);
  return new * old + old;
}

mult.go

package main

/*
extern int test_cgo(int a, int b);
*/
import "C"
import (
 "fmt"
)

func main() {
 a := 3
 b := 5
 c := C.test_cgo(C.int(a), C.int(b))
 fmt.Println("test_cgo:", int(c))
}

What did you expect to see?

Trying to apply GCC-10 and its new LSE features to CGO part, expect they work smoothly.

What did you see instead?

Internal linking doesn't work.

@shawn-xdji shawn-xdji changed the title cmd/link: cmd/link: internal link issue with outline atomics feature on arm64 Jun 9, 2020
@shawn-xdji
Copy link
Contributor Author

@shawn-xdji shawn-xdji commented Jun 9, 2020

Wondering if intentionally loading the lse-init.o from libgcc.a, if it's present and other criteria satisfies, and letting cgo call the init routine at start-up a right way.
@ianlancetaylor

@shawn-xdji
Copy link
Contributor Author

@shawn-xdji shawn-xdji commented Jun 9, 2020

more details regarding reproducing the issue:

  1. GCC-9.4+
    A. go build -ldflags='-linkmode=internal', no issue
    B. CGO_CFLAGS="-g -O2 -moutline-atomics" go build -ldflags='-linkmode=internal', error out

  2. GCC-10.1
    A. go build -ldflags='-linkmode=internal', error out
    B. CGO_CFLAGS="-g -O2 -mno-outline-atomics" go build -ldflags='-linkmode=internal', no issue

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jun 9, 2020

It's important for internal linking to work when using the standard library. We don't consider it important for internal linking to work when using cgo code that is not in the standard library. There are always going to be cases where it fails. This is probably just another of those cases.

Why do you want to use internal linking?

@shawn-xdji
Copy link
Contributor Author

@shawn-xdji shawn-xdji commented Jun 9, 2020

@ianlancetaylor It was exposed by ./all.bash, and could be reproduced by running "go test -ldflags='-linkmode=internal' under misc/cgo/test, then we created the sample case to analyze,here is the screen shot of ./all.bash, we didn't set any additional flags/variables.

By the way, given the following error message, is there any way to help identify which test causes the issue, thought it should be issue5242 and failed to reproduce the issue by extracting relevant code into a small case.

Thanks.

===============================================================

# misc/cgo/test
./test.go: In function 'issue5242':
./test.go:376:5: note: parameter passing for argument of type 'bar' changed in GCC 9.1
376 | int issue5242(foo f, bar b) {
       |     ^~~~~~~~~
# misc/cgo/test.test
libgcc(.text): unknown symbol __aarch64_have_lse_atomics
FAIL	misc/cgo/test [build failed]
@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jun 9, 2020

Thanks. I misunderstood. Sounds like arm64 internal linking no longer works with GCC 10, at least if the C code calls any atomic instructions. The simple fix might be to add "arm64" to the cases for which (*tester).internalLink returns false in cmd/dist/test.go.

@shawn-xdji
Copy link
Contributor Author

@shawn-xdji shawn-xdji commented Jun 9, 2020

Thanks @ianlancetaylor , have a few more questions.

  1. Did you see any value and feasibility to have internal linking work with the new outline atomics?
  2. Besides the fix to dist test, I wonder if a meaningful message is expected rather than the "unknown symbol" text for user's cgo program, assume we decide not supporting internal linking with the outline atomics being enabled.
@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jun 9, 2020

Looking at the implementation of -moutline-atomics, I now wonder why it doesn't work. The Go linker in internal linking mode will normally link in libgcc, which is whether the required symbols are defined. So it is probably worth somebody looking into exactly what is going wrong.

@cherrymui
Copy link
Contributor

@cherrymui cherrymui commented Jun 9, 2020

I wonder if it has something to do with the symbol __aarch64_have_lse_atomics being marked as hidden in libgcc. It is not surprising to me if there is some bugs in the linker with hidden symbols. In the linker there are some scary TODOs with VisibilityHidden here and there, and, admittedly I don't really understand them. I'll take a careful look... (In the case of internal linking with libgcc.a, I don't know why VisibilityHidden even matters...)

@cherrymui
Copy link
Contributor

@cherrymui cherrymui commented Jun 9, 2020

Out of curiosity, why GCC 10 defaults to outline atomics? (Seems a weird decision to me. I must have missed something...)

@shawn-xdji
Copy link
Contributor Author

@shawn-xdji shawn-xdji commented Jun 10, 2020

Looking at the implementation of -moutline-atomics, I now wonder why it doesn't work. The Go linker in internal linking mode will normally link in libgcc, which is whether the required symbols are defined. So it is probably worth somebody looking into exactly what is going wrong.

Thanks, I'd like to work on it, could you please help assign?

@shawn-xdji
Copy link
Contributor Author

@shawn-xdji shawn-xdji commented Jun 10, 2020

I wonder if it has something to do with the symbol __aarch64_have_lse_atomics being marked as hidden in libgcc. It is not surprising to me if there is some bugs in the linker with hidden symbols. In the linker there are some scary TODOs with VisibilityHidden here and there, and, admittedly I don't really understand them. I'll take a careful look... (In the case of internal linking with libgcc.a, I don't know why VisibilityHidden even matters...)

That symbol is of R_AARCH64_ADR_PREL_PG_HI21 and R_AARCH64_ADD_ABS_LO12_NC types, adding support to hidden visibility of these two types in linker could resolve the symbol, but looks like more changes are needed (tried a simple fix and build worked but execution crashed), among of them might be a call to init_have_lse_atomics at proper place.
Wondering if force loading the object file where the symbol resides (lse-init.o) and calling init_have_lse_atomics in cgo start-up code is reasonable.

@shawn-xdji
Copy link
Contributor Author

@shawn-xdji shawn-xdji commented Jun 10, 2020

Out of curiosity, why GCC 10 defaults to outline atomics? (Seems a weird decision to me. I must have missed something...)

// The author's serial check-ins
https://patchwork.ozlabs.org/project/gcc/list/?series=73689

// announcement email
https://lists.linaro.org/pipermail/cross-distro/2020-June/000937.html

@cherrymui
Copy link
Contributor

@cherrymui cherrymui commented Jun 10, 2020

static void __attribute__((constructor))
init_have_lse_atomics (void)

We need to handle constructors in internal linking now? (I don't think we do currently.) That will be ... interesting.

@ianlancetaylor why do we always statically link in libgcc in internal linking mode? I wonder if we could provide an option for dynamic libgcc?

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jun 10, 2020

In general the way that cmd/link treats libgcc is driven by Windows systems. We don't want to assume that a Go program built for Windows will be run on a system that has a libgcc DLL available.

I suppose we could have an option to link dynamically against libgcc. It's a bit complex as libgcc is broken into two parts, one part that must be statically linked, so the linker is typically invoked with -lgcc_s -lgcc -lgcc_eh (and let's not even get into --as-needed). If we aren't going to invoke the external linker we would still have to figure out which parts of -lgcc we need to pull in.

I agree that it would be annoying to have to support __attribute__((constructor)) when doing internal linking. Bother.

@cherrymui
Copy link
Contributor

@cherrymui cherrymui commented Jun 10, 2020

Thanks, @ianlancetaylor ! Statically linking in libgcc makes sense for Windows.

If we aren't going to invoke the external linker we would still have to figure out which parts of -lgcc we need to pull in.

Okay, it seems this is also a complex option...

It seems init_have_lse_atomics only sets a boolean. @shawn-xdji what happens if this function does not run? Will the program fail or just less efficient?

@shawn-xdji
Copy link
Contributor Author

@shawn-xdji shawn-xdji commented Jun 11, 2020

It seems init_have_lse_atomics only sets a boolean. @shawn-xdji what happens if this function does not run? Will the program fail or just less efficient?

@cherrymui The program will be less efficient in general, as a load-exclusive-store-exclusive loop will be emitted even on CPUs having the LSE feature if the init function is skipped.

@erifan
Copy link
Contributor

@erifan erifan commented Jun 11, 2020

what happens if this function does not run? Will the program fail or just less efficient?

The symbol __aa64_have_atomics is the same as the implementation mechanism and function of arm64HasATOMICS in Go. Choose different versions of certain atomic functions by checking whether the hardware supports LSE.

If we don't guarantee that any cgo code written by the user will work in internal linking mode, maybe we can find the problematic test and add a cgo option -mno-outline-atomics to workaround this problem.

But libgcc.a seems to contain this symbol, 0000000000000000 B __aarch64_have_lse_atomics
I don't understand why the linker can't find it. There may be some work to do related to hidden symbols for the internal linker ? I am not sure, just guess.

@cherrymui
Copy link
Contributor

@cherrymui cherrymui commented Jun 11, 2020

Thanks. If the program can still work correctly, it seems fine for a first step not to call the constructor. Internal linking is not the default mode so it is okay to be less efficient.

@shawn-xdji @erifan if I understand correctly, this only fails in tests, not actually standard library cgo packages (net, os/user, etc.) and commands (cmd/go, etc.)? If so, then it seems fine to skip the test or passing additional flag.

I don't understand why the linker can't find it. There may be some work to do related to hidden symbols for the internal linker ? I am not sure, just guess.

As I commented earlier (#39466 (comment)), it is probably a bug in handling of hidden symbols.

@shawn-xdji
Copy link
Contributor Author

@shawn-xdji shawn-xdji commented Jun 12, 2020

Thanks @cherrymui ,I'm submitting a change to disable the internal linking for cgo test first.

@gopherbot
Copy link

@gopherbot gopherbot commented Jun 15, 2020

Change https://golang.org/cl/237858 mentions this issue: cmd/dist: fix internal linking check for arm64

@jcajka
Copy link
Contributor

@jcajka jcajka commented Jun 24, 2020

This is still observable when building go from master/go1.15beta1 on arm64(with gcc-10), failing the tests in ./all.bash. Is there a plan to fix/workaround this issue for go1.15?

# misc/cgo/test.test
libgcc(.text): unknown symbol __aarch64_have_lse_atomics
FAIL	misc/cgo/test [build failed]
@shawn-xdji
Copy link
Contributor Author

@shawn-xdji shawn-xdji commented Jun 24, 2020

Hi @jcajka , a new patchset as per Cherry's suggestion is ready, but I was not able to have it upstreamed before out of office this week, hopefully it could be under review early next week.
Could you please change CGO_CFLAGS to add -mno-outline-atomics as a workaround for now? Thanks.

@jcajka
Copy link
Contributor

@jcajka jcajka commented Jun 24, 2020

@shawn-xdji thank you :). I will try it.

@jcajka
Copy link
Contributor

@jcajka jcajka commented Jun 25, 2020

works for me

@shawn-xdji
Copy link
Contributor Author

@shawn-xdji shawn-xdji commented Jun 29, 2020

Thanks @jcajka, could you please help to verify the Patchset 3 of https://go-review.googlesource.com/c/go/+/237858 as well? Thanks a lot.

@jcajka
Copy link
Contributor

@jcajka jcajka commented Jun 30, 2020

@shawn-xdji sorry for long notice. I have tested it on Fedora 32(gcc10/binutils2.34), rockpro64 sbc and it resolves the test issue and I haven't seen any regressions. I haven't tested it much past the GC compile/testsuite.

On that note I have missed this issue originally in Fedora 32(go1.14)by mistake and honestly I haven't seen it in the wild(Fedora packages) only in the testsuite. But still it would be great to see it fixed in go1.15(next beta/rc) and I would also much appreciate fix in the upcoming go1.14 point release.

Do you plan back port for go1.14?

@shawn-xdji
Copy link
Contributor Author

@shawn-xdji shawn-xdji commented Jul 1, 2020

Thanks @jcajka, thanks for helping verify the patch, I'm working on a simplified version as per Cherry's comment, personally I think backport is needed but would like the owners @cherrymui @ianlancetaylor to make the call.

@shawn-xdji
Copy link
Contributor Author

@shawn-xdji shawn-xdji commented Jul 3, 2020

Enclose a single-file test-case: https://play.golang.org/p/J-AleD6Zeu6

@shawn-xdji
Copy link
Contributor Author

@shawn-xdji shawn-xdji commented Jul 3, 2020

Test7978 in misc/cgo/test is the only one case that experiences the problem now, and the test commands which will fail at build phase are:

  1. GOFLAGS=-ldflags=-linkmode=internal go test -tags=internal
  2. go test -buildmode=pie -ldflags=-linkmode=internal

I've tried two fixes:

  1. to isolate the test-case in separate files and guarded under build constraints, it's doable for command 1, but for command 2 looks like a new tag is needed.
  2. utilizing CGO_CFLAGS to pass -mno-outline-atomics for the two commands on arm64 if the C compiler supports the option.

@cherrymui and @ianlancetaylor, I'm sending an updated patch having the fix 2.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
6 participants
You can’t perform that action at this time.