misc/cgo/testshared: shared libs tests fail on arm64 with segmentation fault #28334

Open
kyoukim opened this Issue Oct 23, 2018 · 11 comments

kyoukim commented Oct 23, 2018

What version of Go are you using (go version)?

1.9.4, 1.10.3, 1.11.1, and perhaps more

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

GOARCH="arm64"
GOBIN=""
GOEXE=""
GOHOSTARCH="arm64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/aion1223/go"
GORACE=""
GOROOT="/usr/lib/golang"
GOTOOLDIR="/usr/lib/golang/pkg/tool/linux_arm64"
GCCGO="gccgo"
CC="gcc"
GOGCCFLAGS="-fPIC -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build048848957=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"

What did you do?

git clone https://github.com/golang/go go
cd go/src
git fetch --all
git checkout -b mygolang $(git rev-list -n 1 go1.11.1)
./all.bash

I have tried go1.11.1 and go1.10.3 in the same way.

What did you expect to see?

All tests pass and the build completes successfully.

What did you see instead?

../misc/cgo/testshared
--- FAIL: TestTrivialExecutable (3.33s)
shared_test.go:41: executing ./bin/trivial (trivial executable) failed signal: segmentation fault (core dumped):
--- FAIL: TestDivisionExecutable (0.59s)
shared_test.go:41: executing ./bin/division (division executable) failed signal: segmentation fault (core dumped):
--- FAIL: TestCgoExecutable (1.15s)
shared_test.go:41: executing ./bin/execgo (cgo executable) failed signal: segmentation fault (core dumped):
--- FAIL: TestGopathShlib (3.47s)
shared_test.go:41: executing ./bin/exe (executable linked to GOPATH library) failed signal: segmentation fault (core dumped):
--- FAIL: TestTwoGopathShlibs (3.42s)
shared_test.go:41: executing ./bin/exe2 (executable linked to GOPATH library) failed signal: segmentation fault (core dumped):
--- FAIL: TestThreeGopathShlibs (5.23s)
shared_test.go:41: executing ./bin/exe3 (executable linked to GOPATH library) failed signal: segmentation fault (core dumped):
--- FAIL: TestABIChecking (3.08s)
shared_test.go:861: exe failed, but without line "abi mismatch detected between the executable and libdepBase.so"; got output:
--- FAIL: TestImplicitInclusion (1.35s)
shared_test.go:41: executing ./bin/implicitcmd (running executable linked against library that contains same package as it) failed signal: segmentation fault (core dumped):
--- FAIL: TestInterface (1.93s)
shared_test.go:41: executing ./bin/iface (running type/itab uniqueness tester) failed signal: segmentation fault (core dumped):
--- FAIL: TestGlobal (1.33s)
shared_test.go:41: executing ./bin/global (global executable) failed signal: segmentation fault (core dumped):
2018/10/23 10:44:37 executing go test -installsuffix=8674665223082153551 -linkshared -test.short sync/atomic failed exit status 1:
signal: segmentation fault (core dumped)
FAIL sync/atomic 0.088s
exit status 1
FAIL _/home/aion1223/go/misc/cgo/testshared 43.680s

misc/cgo/testshared fails with a segmentation fault. It looks exactly like #24873.
However, unlike that issue, mine reproduces with every Go version I have tried, and only on Oracle Linux 7.5 for ARM64, which is available as a Docker image. I could not reproduce it in the Debian-based golang:1.11.1 Docker container. I also confirmed that the fix for #24873 is already in the go1.11.1 source code.

kyoukim commented Oct 23, 2018

The issue also reproduces with go1.9.4. The only difference is that I had to run "GOROOT_BOOTSTRAP=/usr/lib/golang ./all.bash" instead of "./all.bash".

The error log was nearly, if not exactly, the same.

ianlancetaylor changed the title from misc/cgo: shared libs tests fail on arm64 with segmentation fault to misc/cgo/testshared: shared libs tests fail on arm64 with segmentation fault Oct 23, 2018

ianlancetaylor added this to the Go1.12 milestone Oct 23, 2018

cherrymui commented Oct 23, 2018

Which linker (C linker, not Go cmd/link) are you using, bfd linker or gold or lld? Which version? I vaguely remember that some old version of gold linker on ARM64 doesn't work well with this.

kyoukim commented Oct 24, 2018

Glibc in Oracle Linux seems to be based on 2.17, but with many backports from upstream glibc. The linker seems to be gold.

My suspicion is that this bug reproduces with newer glibc versions, not older ones. Issue #24873 reports problems with glibc 2.27. The golang:1.11.1 Docker container has no problem, but its glibc is as old as 2.24. I am fairly sure that this new Oracle Linux package has patches backported even from the latest glibc release.

kyoukim commented Oct 25, 2018

I have looked into this bug a bit more. First, I modified the following file and ran ./run.bash -run testshared:

$ git diff shared_test.go
diff --git a/misc/cgo/testshared/shared_test.go b/misc/cgo/testshared/shared_test.go
index 846a271..7b03c95 100644
--- a/misc/cgo/testshared/shared_test.go
+++ b/misc/cgo/testshared/shared_test.go
@@ -46,14 +46,15 @@ func run(t *testing.T, msg string, args ...string) {
 // t.Fatalf if the command fails.
 func goCmd(t *testing.T, args ...string) {
        newargs := []string{args[0], "-installsuffix=" + suffix}
-       if testing.Verbose() {
-               newargs = append(newargs, "-x")
-       }
+
+       newargs = append(newargs, "-x")
+       newargs = append(newargs, "-work")        
+
        newargs = append(newargs, args[1:]...)
        c := exec.Command("go", newargs...)
        var output []byte
        var err error
-       if testing.Verbose() {
+       if true {
                fmt.Printf("+ go %s\n", strings.Join(newargs, " "))
                c.Stdout = os.Stdout
                c.Stderr = os.Stderr

I could not find a way to attach a file, so the log is linked here:
Testshared Log file

All the failing test cases are executables built with the "-linkshared" option. Some are also built with "-buildmode=shared", and some with no "-buildmode" at all.

Interestingly, when "-buildmode=pie" is present, I did not see a segmentation fault; it seems that two test cases use that option. More interestingly, the test case called "trivial" is built twice: once with -buildmode=pie and once without it. It gets a segmentation fault only when -buildmode=pie is not given.

kyoukim commented Oct 30, 2018

By bisecting the distro's glibc, I found that the backport of this patch caused the issue:

backport of Sep 15, 2017 commit 6cd380dd366d728da9f579eeb9f7f4c47f48e474
Author: Wang Boshi

eXecute-Only Memory (XOM) is a protection mechanism against some ROP
attacks. XOM sets the code as executable and unreadable, so the access
to any data, like literal pools, in the code section causes the fault
with XOM. The compiler can disable literal pools for C source files,
but not for assembly files, so I use movz/movk instead of literal pools
in start.S for XOM.

I add MOVL macro with movz/movk instructions like movl pseudo-instruction
in armasm, and use the macro instead of literal pools.

* sysdeps/aarch64/start.S: Use MOVL instead of literal pools.
* sysdeps/aarch64/sysdep.h (MOVL): Add MOVL macro.

gitdiff.txt

kyoukim commented Nov 2, 2018

https://patchwork.ozlabs.org/patch/810876/

This glibc patch was applied to the distro's glibc and triggered the issue. start.S uses the MOVL macro defined in sysdeps/aarch64/sysdep.h, which looks like this:

/* Load an immediate into R.
   Note R is a register number and not a register name.  */
#ifdef __LP64__
# define MOVL(R, NAME)					\
	movz	PTR_REG (R), #:abs_g3:NAME;		\
	movk	PTR_REG (R), #:abs_g2_nc:NAME;		\
	movk	PTR_REG (R), #:abs_g1_nc:NAME;		\
	movk	PTR_REG (R), #:abs_g0_nc:NAME;
#else
# define MOVL(R, NAME)					\
	movz	PTR_REG (R), #:abs_g1:NAME;		\
	movk	PTR_REG (R), #:abs_g0_nc:NAME;
#endif

In start.S, the _start function sets x0 to the address of main, which __libc_start_main later calls. With the patch (i.e., with the MOVL macro), x0 is set to 0. Without the patch, x0 is set to a valid address and the testshared tests pass.

I have also tried building an executable with gcc without -pie. I think the same MOVL macro was used, but this time the address is resolved correctly.

I am not yet sure which component is buggy: the runtime linker, the assembler, or "go link".
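The arithmetic behind the 64-bit MOVL macro shows why x0 can only be as good as the four immediates the linker fills in. Assuming S is the 64-bit value the linker substitutes for NAME (here, the address of main), each relocation supplies one 16-bit slice G_k of S; movz writes G_3 into bits 48 to 63 and clears the rest, and each movk overwrites only its own 16-bit field:

```math
\begin{aligned}
G_k &= \left\lfloor S / 2^{16k} \right\rfloor \bmod 2^{16}, \qquad k = 0, 1, 2, 3 \\
x_R &= G_3 \cdot 2^{48} + G_2 \cdot 2^{32} + G_1 \cdot 2^{16} + G_0 = S
\end{aligned}
```

So if every G_k ends up 0, x0 is 0, and the call through that pointer jumps to address 0.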

kyoukim commented Nov 4, 2018

For the case named "trivial" that fails with a segmentation fault, I did the following after increasing verbosity by touching misc/cgo/testshared/src/*.go:
../bin/go tool dist test -run testshared

The last link command to build the executable was:
/home/aion1223/workspace/goupssrc/pkg/tool/linux_arm64/link -o $WORK/b001/exe/a.out -importcfg $WORK/b001/importcfg.link -installsuffix 6129484611666145821_dynlink -buildmode=exe -buildid=i1c7n4lZlgifimsCpVP_/wk1AcZFYvPpL3_5Mq6D_/uQ2s4EatSVx4VatKIAZJ/i1c7n4lZlgifimsCpVP_ -linkshared -w -extld=gcc $WORK/b001/pkg.a

There, I also passed -v to "link", along with -extldflags "-fuse-ld=bfd".

I think the external linker honored the option, and this time I did not see the segmentation fault.

Thus, my guess is that, somehow, when the gold linker resolves #:abs_g0_nc:main, it fails to use the right value.

Here is the output of objdump --all /usr/lib64/crt1.o:

SYMBOL TABLE:
0000000000000000 l d .note.ABI-tag 0000000000000000 .note.ABI-tag
0000000000000000 l d .text 0000000000000000 .text
0000000000000000 l d .rodata.cst4 0000000000000000 .rodata.cst4
0000000000000000 l d .data 0000000000000000 .data
0000000000000000 l d .bss 0000000000000000 .bss
0000000000000000 l d .comment 0000000000000000 .comment
0000000000000000 l d .note.GNU-stack 0000000000000000 .note.GNU-stack
0000000000000000 UND 0000000000000000 __libc_csu_fini
0000000000000000 UND 0000000000000000 abort
0000000000000000 g F .text 0000000000000000 _start
0000000000000000 UND 0000000000000000 __libc_csu_init
0000000000000000 UND 0000000000000000 main
0000000000000000 w .data 0000000000000000 data_start
0000000000000000 g O .rodata.cst4 0000000000000004 _IO_stdin_used

RELOCATION RECORDS FOR [.text]:
OFFSET TYPE VALUE
0000000000000018 R_AARCH64_MOVW_UABS_G3 main
000000000000001c R_AARCH64_MOVW_UABS_G2_NC main
0000000000000020 R_AARCH64_MOVW_UABS_G1_NC main
0000000000000024 R_AARCH64_MOVW_UABS_G0_NC main
0000000000000028 R_AARCH64_MOVW_UABS_G3 __libc_csu_init
000000000000002c R_AARCH64_MOVW_UABS_G2_NC __libc_csu_init
0000000000000030 R_AARCH64_MOVW_UABS_G1_NC __libc_csu_init
0000000000000034 R_AARCH64_MOVW_UABS_G0_NC __libc_csu_init
0000000000000038 R_AARCH64_MOVW_UABS_G3 __libc_csu_fini
000000000000003c R_AARCH64_MOVW_UABS_G2_NC __libc_csu_fini
0000000000000040 R_AARCH64_MOVW_UABS_G1_NC __libc_csu_fini
0000000000000044 R_AARCH64_MOVW_UABS_G0_NC __libc_csu_fini
0000000000000048 R_AARCH64_CALL26 __libc_start_main
000000000000004c R_AARCH64_CALL26 abort

I am not a linker expert, but I do not see much difference between the relocations for main and those for __libc_csu_fini. I guess this might be caused by the combination of the gold linker and the Go tools.
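To make the relocation records above more concrete, here is a minimal sketch of what applying R_AARCH64_MOVW_UABS_G3, G2_NC, G1_NC and G0_NC to the MOVL(0, main) sequence amounts to: each relocation writes one 16-bit slice of the symbol's address into the imm16 field (bits 5 to 20) of a MOVZ/MOVK instruction. It assumes a zero addend and a hypothetical address 0x401234 for main, and it only illustrates the arithmetic; it does not model what gold actually does when main is defined in a shared library. The helper names encode, applyUABS and execute are made up for this sketch.

```go
// movwreloc: sketch of applying AArch64 MOVW_UABS relocations to the
// MOVZ/MOVK sequence emitted by glibc's MOVL(0, main), then emulating it.
package main

import "fmt"

// Base opcodes for the 64-bit forms (sf=1); hw, imm16 and Rd are ORed in.
const (
	movzX uint32 = 0xD2800000 // MOVZ Xd, #imm16, LSL #(16*hw)
	movkX uint32 = 0xF2800000 // MOVK Xd, #imm16, LSL #(16*hw)
)

// encode assembles a MOVZ/MOVK word for register rd and shift group hw.
func encode(base, hw, imm16, rd uint32) uint32 {
	return base | (hw&3)<<21 | (imm16&0xffff)<<5 | rd&0x1f
}

// applyUABS writes bits [16*hw, 16*hw+15] of the symbol value s into the
// imm16 field (bits 5..20) of insn, assuming a zero addend.
func applyUABS(insn, hw uint32, s uint64) uint32 {
	imm := uint32(s>>(16*hw)) & 0xffff
	return insn&^(0xffff<<5) | imm<<5
}

// execute emulates a MOVZ/MOVK-only sequence and returns the register value.
func execute(insns []uint32) uint64 {
	var x uint64
	for _, w := range insns {
		shift := 16 * ((w >> 21) & 3)
		imm := uint64((w >> 5) & 0xffff)
		if (w>>29)&3 == 2 { // opc=10: MOVZ zeroes all other bits
			x = imm << shift
		} else { // opc=11: MOVK keeps all other bits
			x = x&^(0xffff<<shift) | imm<<shift
		}
	}
	return x
}

func main() {
	// As assembled in crt1.o before relocation: all immediates are 0.
	seq := []uint32{
		encode(movzX, 3, 0, 0), // G3    -> bits 48..63
		encode(movkX, 2, 0, 0), // G2_NC -> bits 32..47
		encode(movkX, 1, 0, 0), // G1_NC -> bits 16..31
		encode(movkX, 0, 0, 0), // G0_NC -> bits 0..15
	}
	fmt.Printf("unrelocated x0 = %#x\n", execute(seq))

	const mainAddr = 0x401234 // hypothetical address of main
	for i := range seq {
		seq[i] = applyUABS(seq[i], uint32(3-i), mainAddr)
	}
	fmt.Printf("relocated   x0 = %#x\n", execute(seq))
}
```

Running it prints x0 = 0x0 for the unpatched sequence and 0x401234 after patching, so an x0 of 0 is consistent with the four immediates never receiving main's address.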

kyoukim commented Nov 6, 2018

Finally, I think I know what is going on here. In my opinion, this issue should be fixed in the Go tools.

Here are the summary and justification. In short, the "main" function is NOT in go.o but in the .so file built by the Go tools. Thus, it seems that the external linker should link them against Scrt1.o rather than crt1.o. However, the Go tool (apparently "go link") invokes gold to link them against crt1.o.

This did not cause problems before because crt1.o and Scrt1.o behaved the same way in this respect; the glibc upgrade changed that. Moreover, upstream glibc, as of today, still contains this change. Therefore, I assume that glibc in multiple distros is moving in this direction, which would trigger the reported issue (#28334) there as well. In my opinion, the Go link tool needs some change to address the issue.

Here are the details of the problem, and how I reach the conclusion.

On the Oracle Linux Docker container for aarch64 (OL 7.5, according to the tag), "misc/cgo/testshared" fails as described in the first comment of this issue.

As an example, consider misc/cgo/testshared/src/trivial. The test has the Go tools build libruntime,sync-atomic.so and the executable named "trivial". I believe the following two commands are what run at the top level:

go install -installsuffix=5577006791947779410 -x -work -buildmode=shared runtime sync/atomic
go install -installsuffix=5577006791947779410 -x -work -linkshared trivial

The final "go link" command that creates the executable "trivial" looks like this:

/home/aion1223/workspace/goupssrc/pkg/tool/linux_arm64/link -o $WORK/b001/exe/a.out -importcfg $WORK/b001/importcfg.link -installsuffix 5577006791947779410_dynlink -buildmode=exe -buildid=BwyXpCAKFz8ETFyLTSbF/Zo5wBg54Kfw5VeeP5Uiu/hieskdUyBctShze1jv11/BwyXpCAKFz8ETFyLTSbF -linkshared -w -extld=gcc $WORK/b001/_pkg_.a

I added -extldflags="-v -Wl,-v" after -extld=gcc to see the linker command that is actually used. It looks as follows:

/usr/bin/ld.gold --build-id --no-add-needed --eh-frame-hdr --hash-style=gnu -export-dynamic -dynamic-linker /lib/ld-linux-aarch64.so.1 -X -o ./nopie.a.out /usr/lib/gcc/aarch64-redhat-linux/4.8.5/../../../../lib64/crt1.o /usr/lib/gcc/aarch64-redhat-linux/4.8.5/../../../../lib64/crti.o /usr/lib/gcc/aarch64-redhat-linux/4.8.5/crtbegin.o -L$PWD -L/usr/lib/gcc/aarch64-redhat-linux/4.8.5 -L/usr/lib/gcc/aarch64-redhat-linux/4.8.5/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/usr/lib/gcc/aarch64-redhat-linux/4.8.5/../../.. -znow -znocopyreloc --compress-debug-sections=zlib-gnu ./go.o -rpath=$PWD -lruntime,sync-atomic -v -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed /usr/lib/gcc/aarch64-redhat-linux/4.8.5/crtend.o /usr/lib/gcc/aarch64-redhat-linux/4.8.5/../../../../lib64/crtn.o

Note that crt1.o is used rather than Scrt1.o.

I hacked "go link" so that it does not delete go.o, and I located libruntime,sync-atomic.so by passing the -work option.

It seems that go.o does NOT define "main". Instead, the .so does:

$ objdump -d go.o | egrep -w "<main>:" | wc -l
0
$ objdump -d libruntime,sync-atomic.so | egrep -w "<main>:"
0000000000163338 <main>:

kyoukim commented Nov 7, 2018

It appears to me that the question is what the linker should do when asked to link a PIC libmain.so that contains THE main function together with a non-PIC foo.o that contains some utility functions. In testshared, that appears to be what the Go tools do.

$ cat main.c
#include <stdio.h>
extern int foo(int, int);

int main()
{
  printf("%d\n", foo(1, 3));
  return 0;
}

$ cat foo.c
int __attribute__ ((noinline)) foo(int x, int y)
{
  return x & y;
}

Then, I built libmain.so and the non-PIC foo.o as follows:

$ gcc -o main.o -fPIC -c main.c
$ gcc -shared -o libmain.so main.o
$ gcc -o foo.o -c foo.c

Following that, the executable is built like this:

$ gcc -fuse-ld=gold -o a.out -lmain -L$PWD -Wl,-v,-rpath=$PWD foo.o
collect2 version 4.8.5 20150623 (Red Hat 4.8.5-28.0.4)
/usr/bin/ld.gold --build-id --no-add-needed --eh-frame-hdr --hash-style=gnu -dynamic-linker /lib/ld-linux-aarch64.so.1 -X -o a.out /usr/lib/gcc/aarch64-redhat-linux/4.8.5/../../../../lib64/crt1.o /usr/lib/gcc/aarch64-redhat-linux/4.8.5/../../../../lib64/crti.o /usr/lib/gcc/aarch64-redhat-linux/4.8.5/crtbegin.o -L/home/aion1223/shared -L/usr/lib/gcc/aarch64-redhat-linux/4.8.5 -L/usr/lib/gcc/aarch64-redhat-linux/4.8.5/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/usr/lib/gcc/aarch64-redhat-linux/4.8.5/../../.. -lmain -v -rpath=/home/aion1223/shared foo.o -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed /usr/lib/gcc/aarch64-redhat-linux/4.8.5/crtend.o /usr/lib/gcc/aarch64-redhat-linux/4.8.5/../../../../lib64/crtn.o
GNU gold (version 2.27-28.base.0.2.el7_5.1) 1.12
$ ./a.out 
Segmentation fault (core dumped)

What is the correct behavior of a linker?

If main is instead in the non-PIC .o and linked against the PIC .so file, both linkers use crt1.o and there is no problem:

$ cat foo.c 
int __attribute__ ((noinline)) foo(int x, int y)
{
  return x & y;
}
$ cat main.c
#include <stdio.h>
extern int foo(int, int);

int main()
{
  printf("%d\n", foo(1, 3));
  return 0;
}

$ gcc -fPIC -o foo.o -c foo.c
$ gcc -shared -o libfoo.so foo.o
$ gcc -o main.o -c main.c
$ gcc -fuse-ld=gold -o a.out -lfoo -L$PWD -Wl,-v,-rpath=$PWD main.o
collect2 version 4.8.5 20150623 (Red Hat 4.8.5-28.0.4)
/usr/bin/ld.gold --build-id --no-add-needed --eh-frame-hdr --hash-style=gnu -dynamic-linker /lib/ld-linux-aarch64.so.1 -X -o a.out /usr/lib/gcc/aarch64-redhat-linux/4.8.5/../../../../lib64/crt1.o /usr/lib/gcc/aarch64-redhat-linux/4.8.5/../../../../lib64/crti.o /usr/lib/gcc/aarch64-redhat-linux/4.8.5/crtbegin.o -L/home/aion1223/shared2 -L/usr/lib/gcc/aarch64-redhat-linux/4.8.5 -L/usr/lib/gcc/aarch64-redhat-linux/4.8.5/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/usr/lib/gcc/aarch64-redhat-linux/4.8.5/../../.. -lfoo -v -rpath=/home/aion1223/shared2 main.o -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed /usr/lib/gcc/aarch64-redhat-linux/4.8.5/crtend.o /usr/lib/gcc/aarch64-redhat-linux/4.8.5/../../../../lib64/crtn.o
GNU gold (version 2.27-28.base.0.2.el7_5.1) 1.12
$ ./a.out 
1
$ gcc -fuse-ld=bfd -o a.out -lfoo -L$PWD -Wl,-v,-rpath=$PWD main.o
collect2 version 4.8.5 20150623 (Red Hat 4.8.5-28.0.4)
/usr/bin/ld.bfd --build-id --no-add-needed --eh-frame-hdr --hash-style=gnu -dynamic-linker /lib/ld-linux-aarch64.so.1 -X -o a.out /usr/lib/gcc/aarch64-redhat-linux/4.8.5/../../../../lib64/crt1.o /usr/lib/gcc/aarch64-redhat-linux/4.8.5/../../../../lib64/crti.o /usr/lib/gcc/aarch64-redhat-linux/4.8.5/crtbegin.o -L/home/aion1223/shared2 -L/usr/lib/gcc/aarch64-redhat-linux/4.8.5 -L/usr/lib/gcc/aarch64-redhat-linux/4.8.5/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/usr/lib/gcc/aarch64-redhat-linux/4.8.5/../../.. -lfoo -v -rpath=/home/aion1223/shared2 main.o -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed /usr/lib/gcc/aarch64-redhat-linux/4.8.5/crtend.o /usr/lib/gcc/aarch64-redhat-linux/4.8.5/../../../../lib64/crtn.o
GNU ld version 2.27-28.base.0.2.el7_5.1
$ ./a.out 
1

ianlancetaylor commented Nov 7, 2018

As mentioned elsewhere, it looks like you have a pure C test case that should be reported at https://sourceware.org/bugzilla.

ianlancetaylor commented Nov 7, 2018

For the record this was reported at https://sourceware.org/bugzilla/show_bug.cgi?id=23870 . Thanks!
