Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

misc/cgo/testtls: skip test if C toolchain exhibits known TLS bugs #24758

Open
mvdan opened this issue Apr 7, 2018 · 20 comments
Open

misc/cgo/testtls: skip test if C toolchain exhibits known TLS bugs #24758

mvdan opened this issue Apr 7, 2018 · 20 comments

Comments

@mvdan
Copy link
Member

@mvdan mvdan commented Apr 7, 2018

fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0x1b pc=0xe56e4]

runtime stack:
runtime.throw(0x16739f, 0x2a)
	/workdir/go/src/runtime/panic.go:598 +0x54
runtime.sigpanic()
	/workdir/go/src/runtime/signal_unix.go:372 +0x22c

goroutine 5 [syscall, locked to thread]:
runtime.cgocall(0xe5678, 0x429f78, 0xe5208)
	/workdir/go/src/runtime/cgocall.go:128 +0x64 fp=0x429f60 sp=0x429f48 pc=0x123c0
_/workdir/go/misc/cgo/testtls._Cfunc_getTLS(0x0)
	_cgo_gotypes.go:43 +0x38 fp=0x429f74 sp=0x429f60 pc=0xe5134
_/workdir/go/misc/cgo/testtls.testTLS(0x478120)
	/workdir/go/misc/cgo/testtls/tls.go:21 +0x34 fp=0x429fb0 sp=0x429f74 pc=0xe5214
_/workdir/go/misc/cgo/testtls.TestTLS(0x478120)
	/workdir/go/misc/cgo/testtls/tls_test.go:12 +0x1c fp=0x429fb8 sp=0x429fb0 pc=0xe50ec
testing.tRunner(0x478120, 0x168c34)
	/workdir/go/src/testing/testing.go:791 +0xac fp=0x429fe4 sp=0x429fb8 pc=0xb3074
runtime.goexit()
	/workdir/go/src/runtime/asm_arm.s:840 +0x4 fp=0x429fe4 sp=0x429fe4 pc=0x63184
created by testing.(*T).Run
	/workdir/go/src/testing/testing.go:838 +0x240

goroutine 1 [runnable]:
testing.(*T).Run(0x478120, 0x1604a5, 0x7, 0x168c34, 0x410540)
	/workdir/go/src/testing/testing.go:839 +0x260
testing.runTests.func1(0x478090)
	/workdir/go/src/testing/testing.go:1081 +0x50
testing.tRunner(0x478090, 0x44bef8)
	/workdir/go/src/testing/testing.go:791 +0xac
testing.runTests(0x40c040, 0x206598, 0x1, 0x1, 0x458100)
	/workdir/go/src/testing/testing.go:1079 +0x21c
testing.(*M).Run(0x458100, 0x0)
	/workdir/go/src/testing/testing.go:996 +0x134
main.main()
	_testmain.go:42 +0x12c
exit status 2
FAIL	_/workdir/go/misc/cgo/testtls	0.046s

Some recent occurences:

https://build.golang.org/log/b7e1728e48a73089be64d42ea3b7e581eeae029c
https://build.golang.org/log/56a9b6b774c15433baff5b11a2f2eeb8aaa936a1

@mvdan mvdan added the Testing label Apr 7, 2018
@josharian
Copy link
Contributor

@josharian josharian commented Apr 7, 2018

cc @FiloSottile

Thanks for filing bugs for the flaky tests, @mvdan. Hope we get back to a stable dashboard soon.

@FiloSottile
Copy link
Member

@FiloSottile FiloSottile commented Apr 7, 2018

This is not my kind of TLS ;) it's Thread Local Storage.

@josharian
Copy link
Contributor

@josharian josharian commented Apr 7, 2018

Hahaha. Oops. :P

cc @bcmills @ianlancetaylor

@cherrymui
Copy link
Contributor

@cherrymui cherrymui commented Apr 10, 2018

The faulting PC, 0xe56e4, is in C code getTLS:

TEXT getTLS(SB) 
  :0                    0xe56d0                 e59f0014                MOVW 0x14(R15), R0      
  :0                    0xe56d4                 e92d4010                PUSH [R4,R14]           
  :0                    0xe56d8                 e08f0000                ADD R0, R15, R0         
  :0                    0xe56dc                 fa001a0a                BLX 0xebf0c             // __tls_get_addr
  :0                    0xe56e0                 e59f3008                MOVW 0x8(R15), R3       
  :0                    0xe56e4                 e7930000                MOVW (R3)(R0), R0       // <--- faulting PC
  :0                    0xe56e8                 e8bd8010                POP [R4,R15]            
  :0                    0xe56ec                 00120938                AND.S.EQ R8>>R9, R2, R0 
  :0                    0xe56f0                 0000001c                AND.EQ R12<<R0, R0, R0  

Here, we just loaded R3=0x1c from the constant pool. The faulting address is 0x1b, which means R0=-1, which means __tls_get_addr returned -1. I haven't looked into what could cause it. Maybe the TLS was not initialized correctly?

@ianlancetaylor ianlancetaylor added this to the Go1.11 milestone Apr 18, 2018
@ianlancetaylor ianlancetaylor changed the title cmd/dist: testtls segmentation fault on a linux-arm builder runtime: testtls segmentation fault on a linux-arm builder Jun 14, 2018
@ianlancetaylor ianlancetaylor modified the milestones: Go1.11, Go1.12 Jul 10, 2018
@andybons andybons modified the milestones: Go1.12, Go1.13 Feb 12, 2019
@andybons andybons modified the milestones: Go1.13, Go1.14 Jul 8, 2019
@bcmills
Copy link
Member

@bcmills bcmills commented Sep 11, 2019

Wow — the last non-trivial change to this test was for issue #6992 back in 2014!

@bcmills
Copy link
Member

@bcmills bcmills commented Sep 11, 2019

@cherrymui, do you know who the right person to look into this further would be?

@cherrymui
Copy link
Contributor

@cherrymui cherrymui commented Sep 11, 2019

I'll look into this (as soon as I can, maybe not this week though).

@cherrymui
Copy link
Contributor

@cherrymui cherrymui commented Sep 19, 2019

First, I spent some time rediscovered what I found last year (#24758 (comment)). I should have read all the comments first...

Then, by attaching strace I found that when the test passes the TLS accesses are made on the main thread, whereas when it fails the TLS accesses are made on a non-main thread. (at least for all the runs I investigated.)

If I change the test so that it always run on a non-main thread, it fails 100%. The same change works fine on Linux/AMD64.

On all the test logs, the failed one is the one with static linking (https://go.googlesource.com/go/+/refs/tags/go1.13/src/cmd/dist/test.go#1072). Here, we build the C code with -fPIC, which will generate __tls_get_addr for TLS access. Then we link with -static. On AMD64, this will make the linker patch __tls_get_addr to static TLS access. This doesn't happen on ARM. From glibc source code (https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/arm/libc-tls.c;h=22ec3b0c194d0315dc87e73126a0f5c71241346e;hb=0f02b6cfc44af73d4d4363c46b3cbb18b8ff9171#l22), it claims that linker patching is not necessary. But it doesn't seem to be the case.

The following C program fails on ARM but works fine on AMD64:

#include <stdio.h>
#include <pthread.h>

static __thread int x;

void* nonmain() {
	printf("nonmain &x=%p\n", &x);
	printf("nonmain x=%d\n", x);
	return 0;
}

int main() {
	pthread_t t;

	printf("main &x=%p\n", &x);
	printf("main x=%d\n", x);
	pthread_create(&t, 0, nonmain, 0);
	pthread_join(t, 0);
	return 0;
}
# cc -fPIC tls.c -pthread -static
# ./a.out 
main &x=0x844dc
main x=0
nonmain &x=0x13
Segmentation fault (core dumped)

So, it seems that -fPIC and -static just don't work together on ARM...

BTW, if would be easier if the builder has gdb installed. Could we install gdb there?

@gopherbot
Copy link

@gopherbot gopherbot commented Sep 19, 2019

Change https://golang.org/cl/196378 mentions this issue: misc/cgo/testtls: force TLS accesses on a non-main thread

@cherrymui
Copy link
Contributor

@cherrymui cherrymui commented Sep 19, 2019

Maybe we could work around this by letting the go command scan ldflags, and if it is linking statically on ARM, force recompiling all the C code in non-PIC mode?

@bcmills
Copy link
Member

@bcmills bcmills commented Sep 24, 2019

@cherrymui, would you say this is a bug in the C compiler? If so, should we try upgrading the C compiler on the linux-arm builder?

Or, could we have misc/cgo/testtls probe for this bug explicitly (by, say, running the C test program) and skip the test (or suppress the error) if the test program fails?

@cherrymui
Copy link
Contributor

@cherrymui cherrymui commented Sep 24, 2019

Yeah, I think this is a bug in the C toolchain, probably either the linker or glibc. The compiler seems ok.

I looked at the source code of glibc and the linker (gold) at tip, and I didn't see anything changed that might fix this. I doubt upgrading the C toolchain would help.

@bcmills bcmills added the NeedsFix label Sep 24, 2019
@bcmills bcmills changed the title runtime: testtls segmentation fault on a linux-arm builder misc/cgo/testtls: skip test if C toolchain exhibits known TLS bugs Sep 24, 2019
@bcmills bcmills added the help wanted label Sep 24, 2019
@rsc rsc modified the milestones: Go1.14, Backlog Oct 9, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
9 participants
You can’t perform that action at this time.