New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: race detector on ARM64 doesn't fully work #28848

Open
cherrymui opened this Issue Nov 17, 2018 · 8 comments

Comments

Projects
None yet
5 participants
@cherrymui
Contributor

cherrymui commented Nov 17, 2018

What version of Go are you using (go version)?

tip (6797b32)

Does this issue reproduce with the latest release?

No.

What operating system and processor architecture are you using (go env)?

linux/arm64

What did you do?

CL https://go-review.googlesource.com/c/go/+/138675/ adds race detector support to Linux/ARM64. But when I try to enable the race detector tests in cmd/dist (therefore all.bash), as in CL https://go-review.googlesource.com/c/go/+/149967, it doesn't pass. Similar failures are seen when running race.bash on Linux/ARM64.

##### Testing race detector
ok  	runtime/race	30.416s
ok  	flag	1.059s
ok  	net	1.221s
ok  	os	1.136s
ok  	os/exec	3.118s
ok  	encoding/gob	1.062s
runtime: frame sync/atomic.StoreUint32 untyped args 0x40001626b0+0xffffffff80000000
fatal error: missing stackmap

runtime stack:
runtime.throw(0x667912, 0x10)
	/workdir/go/src/runtime/panic.go:608 +0x54 fp=0xffff6b7fe310 sp=0xffff6b7fe2e0 pc=0x4400a4
runtime.getStackMap(0xffff6b7fe510, 0xffff6b7fe650, 0x6ff101, 0x17, 0x6ff138, 0x17, 0xffff6b7fe458, 0x0, 0x0, 0x0)
	/workdir/go/src/runtime/stack.go:1269 +0x4f8 fp=0xffff6b7fe3f0 sp=0xffff6b7fe310 pc=0x45ac08
runtime.adjustframe(0xffff6b7fe510, 0xffff6b7fe638, 0x812020)
	/workdir/go/src/runtime/stack.go:628 +0x64 fp=0xffff6b7fe4b0 sp=0xffff6b7fe3f0 pc=0x458ff4
runtime.tracebackdefers(0x4000001680, 0x673010, 0xffff6b7fe638)
	/workdir/go/src/runtime/traceback.go:77 +0x90 fp=0xffff6b7fe570 sp=0xffff6b7fe4b0 pc=0x462fb0
runtime.adjustdefers(0x4000001680, 0xffff6b7fe638)
	/workdir/go/src/runtime/stack.go:723 +0x3c fp=0xffff6b7fe5a0 sp=0xffff6b7fe570 pc=0x4592ec
runtime.copystack(0x4000001680, 0x1000, 0x1)
	/workdir/go/src/runtime/stack.go:868 +0x10c fp=0xffff6b7fe760 sp=0xffff6b7fe5a0 pc=0x45969c
runtime.newstack()
	/workdir/go/src/runtime/stack.go:1050 +0x264 fp=0xffff6b7fe900 sp=0xffff6b7fe760 pc=0x459aa4
runtime.morestack()
	/workdir/go/src/runtime/asm_arm64.s:310 +0x70 fp=0xffff6b7fe900 sp=0xffff6b7fe900 pc=0x46c9a0

...

exit status 2
FAIL	_/workdir/go/misc/cgo/test	2.564s
2018/11/17 18:45:18 Failed: exit status 1
ok  	flag	1.037s
ok  	os/exec	3.104s

When using the race detector, sync/atomic.StoreUint32 and other sync/atomic functions are functions whose definitions and declarations are not in the same package, which therefore don't have implicit arg maps. It fails when those functions are called in defer statement.

@cherrymui cherrymui added this to the Go1.12 milestone Nov 17, 2018

@zorrorffm

This comment has been minimized.

Contributor

zorrorffm commented Nov 19, 2018

Any thoughts to resolve this issue? I'm willing to implement that.

@gopherbot

This comment has been minimized.

gopherbot commented Nov 20, 2018

Change https://golang.org/cl/149967 mentions this issue: cmd/dist: enable race detector test on Linux/ARM64

@gopherbot

This comment has been minimized.

gopherbot commented Nov 20, 2018

Change https://golang.org/cl/150457 mentions this issue: runtime: add arg maps for sync/atomic functions in ARM64 race mode

@cherrymui

This comment has been minimized.

Contributor

cherrymui commented Nov 20, 2018

I send CL https://go-review.googlesource.com/c/go/+/150457 to add explicit arg maps. Now the race detector tests in all.bash pass. Many tests in race.bash also pass, but some failed with signal: killed. It's just a plain killed without a stack trace, so I don't know what's going on. OOMing? Probably unlikely, as the builder machine has 32 GB of RAM.

##### Testing packages.
ok  	archive/tar	1.883s
ok  	archive/zip	55.800s
ok  	bufio	5.500s
ok  	bytes	70.969s
ok  	compress/bzip2	7.930s
ok  	compress/flate	30.174s
ok  	compress/gzip	6.491s
ok  	compress/lzw	16.037s
ok  	compress/zlib	98.005s
ok  	container/heap	2.433s
ok  	container/list	1.380s
ok  	container/ring	2.147s
ok  	context	2.629s
ok  	crypto	1.255s
ok  	crypto/aes	1.353s
ok  	crypto/cipher	13.757s
ok  	crypto/des	1.847s
ok  	crypto/dsa	1.782s
ok  	crypto/ecdsa	17.403s
ok  	crypto/elliptic	11.333s
ok  	crypto/hmac	3.644s
ok  	crypto/internal/subtle	9.796s
ok  	crypto/md5	1.745s
ok  	crypto/rand	6.669s
ok  	crypto/rc4	14.028s
ok  	crypto/rsa	4.754s
ok  	crypto/sha1	1.149s
ok  	crypto/sha256	9.450s
ok  	crypto/sha512	2.139s
ok  	crypto/subtle	26.328s
ok  	crypto/tls	20.445s
ok  	crypto/x509	14.565s
ok  	database/sql	3.980s
ok  	database/sql/driver	29.752s
ok  	debug/dwarf	1.681s
ok  	debug/elf	32.789s
ok  	debug/gosym	2.254s
ok  	debug/macho	1.506s
ok  	debug/pe	2.596s
ok  	debug/plan9obj	2.422s
ok  	encoding/ascii85	4.809s
ok  	encoding/asn1	76.031s
ok  	encoding/base32	1.812s
ok  	encoding/base64	10.587s
ok  	encoding/binary	10.628s
ok  	encoding/csv	2.312s
ok  	encoding/gob	3.010s
ok  	encoding/hex	1.870s
ok  	encoding/json	25.972s
ok  	encoding/pem	1.455s
ok  	encoding/xml	1.835s
ok  	errors	1.099s
ok  	expvar	1.548s
ok  	flag	3.437s
ok  	fmt	76.130s
ok  	go/ast	2.069s
ok  	go/build	41.921s
ok  	go/constant	1.944s
ok  	go/doc	2.624s
ok  	go/format	2.454s
ok  	go/importer	76.030s
ok  	go/internal/gccgoimporter	1.644s
signal: killed
FAIL	go/internal/gcimporter	55.711s
ok  	go/internal/srcimporter	104.031s
ok  	go/parser	33.370s
ok  	go/printer	16.614s
ok  	go/scanner	2.469s
ok  	go/token	4.723s
ok  	go/types	111.766s
ok  	hash	1.419s
ok  	hash/adler32	2.346s
ok  	hash/crc32	1.672s
ok  	hash/crc64	1.219s
ok  	hash/fnv	1.424s
ok  	html	1.413s
ok  	html/template	3.088s
ok  	image	6.471s
ok  	image/color	1.299s
ok  	image/draw	4.784s
ok  	image/gif	75.468s
ok  	image/jpeg	12.432s
ok  	image/png	3.299s
ok  	index/suffixarray	1.627s
ok  	internal/cpu	2.110s
ok  	internal/fmtsort	1.764s
ok  	internal/poll	41.782s
ok  	internal/singleflight	5.417s
signal: killed
FAIL	internal/trace	58.495s
ok  	internal/traceparser	94.211s
ok  	internal/traceparser/filebuf	4.297s
ok  	io	3.624s
ok  	io/ioutil	2.128s
ok  	log	2.030s
ok  	log/syslog	35.747s
ok  	math	1.868s
ok  	math/big	93.163s
ok  	math/bits	3.254s
ok  	math/cmplx	1.842s
ok  	math/rand	95.677s
ok  	mime	2.156s
ok  	mime/multipart	34.477s
ok  	mime/quotedprintable	8.616s
ok  	net	55.056s
ok  	net/http	24.772s
ok  	net/http/cgi	7.925s
ok  	net/http/cookiejar	1.715s
ok  	net/http/fcgi	1.537s
ok  	net/http/httptest	2.308s
ok  	net/http/httptrace	2.024s
ok  	net/http/httputil	2.022s
ok  	net/http/internal	3.499s
ok  	net/http/pprof	3.737s
ok  	net/internal/socktest	2.256s
ok  	net/mail	1.198s
ok  	net/rpc	1.672s
ok  	net/rpc/jsonrpc	1.660s
ok  	net/smtp	1.687s
ok  	net/textproto	30.901s
ok  	net/url	2.934s
ok  	os	95.448s
ok  	os/exec	27.750s
ok  	os/signal	95.985s
ok  	os/user	4.281s
ok  	path	3.215s
ok  	path/filepath	4.899s
ok  	reflect	16.060s
ok  	regexp	16.286s
ok  	regexp/syntax	44.480s
ok  	runtime	63.619s
ok  	runtime/debug	9.030s
ok  	runtime/internal/atomic	10.660s
ok  	runtime/internal/math	8.720s
ok  	runtime/internal/sys	1.130s
signal: killed
FAIL	runtime/pprof	87.301s
ok  	runtime/pprof/internal/profile	3.957s
ok  	runtime/race	122.535s
ok  	runtime/trace	21.935s
ok  	sort	11.344s
ok  	strconv	76.323s
ok  	strings	76.240s
ok  	sync	89.754s
ok  	sync/atomic	37.453s
ok  	syscall	3.390s
ok  	testing	76.249s
ok  	testing/quick	41.821s
ok  	text/scanner	1.149s
ok  	text/tabwriter	2.422s
ok  	text/template	4.365s
ok  	text/template/parse	2.361s
ok  	time	7.312s
ok  	unicode	2.067s
ok  	unicode/utf16	1.444s
ok  	unicode/utf8	2.151s
ok  	vendor/golang_org/x/crypto/chacha20poly1305	5.835s
ok  	vendor/golang_org/x/crypto/cryptobyte	1.528s
ok  	vendor/golang_org/x/crypto/curve25519	4.389s
ok  	vendor/golang_org/x/crypto/hkdf	1.599s
ok  	vendor/golang_org/x/crypto/internal/chacha20	4.854s
ok  	vendor/golang_org/x/crypto/poly1305	2.623s
ok  	vendor/golang_org/x/net/dns/dnsmessage	2.472s
ok  	vendor/golang_org/x/net/http/httpguts	2.482s
ok  	vendor/golang_org/x/net/http/httpproxy	1.257s
ok  	vendor/golang_org/x/net/http2/hpack	1.570s
ok  	vendor/golang_org/x/net/idna	2.296s
ok  	vendor/golang_org/x/net/nettest	6.203s
ok  	vendor/golang_org/x/text/transform	2.296s
ok  	vendor/golang_org/x/text/unicode/norm	1.206s
@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Nov 20, 2018

If it's not an OOM, the second most common cause for this is getting a SIGSEGV while running in the SIGSEGV signal handler.

In general you will see this kind of thing if you get a fatal signal such as SIGSEGV or SIGILL when that signal is blocked. The kernel can't deliver the signal, since it is blocked, but it can't let the program continue to run, because it's a fatal problem, so it unconditionally kills the process.

gopherbot pushed a commit that referenced this issue Nov 21, 2018

runtime: add arg maps for sync/atomic functions in ARM64 race mode
In race mode, these functions are defined and declared in
different packages, which therefore don't have implicit arg maps.
When they are defer'd, and the stack needs to move, the runtime
fails with missing stack maps. This CL adds arg maps (FUNCDATA)
to them.

Updates #28848

Change-Id: I0271563b7e78e7797ce2990c303dced957efaa86
Reviewed-on: https://go-review.googlesource.com/c/150457
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
@mengzhuo

This comment has been minimized.

Contributor

mengzhuo commented Nov 21, 2018

@cherrymui I ran all.bash and race.bash at CL150457 on my arm64 box (not qemu, 6GB RAM), all tests passed.

@cherrymui

This comment has been minimized.

Contributor

cherrymui commented Nov 21, 2018

Thanks @ianlancetaylor for the information. The builder machine doesn't seem to have gdb installed, so I couldn't easily find out whether it received a SIGSEGV.

Thanks @mengzhuo for sharing your results. How many CPU cores are there on your machine?

I think we still want to find out why it fails on the builder. One thing I noticed is that the builder has 96 cores, so when running the tests it may launch quite a number of processes/threads. This might lead to high memory consumption?

About OOMing, I tried to run an individual test that is likely to fail (runtime/pprof), having top open in another window to monitor the memory usage. The memory usage seems stay low, never exceeding 10%, and there are always more than 18 GiB free memory. However, it does consume swap space: it starts with more than 1.3 GiB of free swap, but before it's dying there is only a very small amount of free swap left. This may be the reason of killing?

KiB Mem:  32876144 total, 14176960 used, 18699184 free,   461440 buffers
KiB Swap:  2441980 total,  2440304 used,     1676 free.  2225808 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                                                                
52443 root      20   0 6845760 2.855g    848 S 106.6  9.1   1:55.67 pprof.test                                                                                                                                    

It seems odd to me that it keeps consuming swap while there is still a good amount of free memory. Maybe there is some memory limit? ulimit doesn't show anything.

As a comparison, running the same test in non-race mode doesn't use much swap.

bradfitz pushed a commit that referenced this issue Nov 21, 2018

runtime: add arg maps for sync/atomic functions in ARM64 race mode
In race mode, these functions are defined and declared in
different packages, which therefore don't have implicit arg maps.
When they are defer'd, and the stack needs to move, the runtime
fails with missing stack maps. This CL adds arg maps (FUNCDATA)
to them.

Updates #28848

Change-Id: I0271563b7e78e7797ce2990c303dced957efaa86
Reviewed-on: https://go-review.googlesource.com/c/150457
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
@zorrorffm

This comment has been minimized.

Contributor

zorrorffm commented Nov 22, 2018

@cherrymui I also ran race.bash on 3 machines with different configuration, all the tests passed.
machine A: 8 cores, 64GB RAM, 4GB swap
machine B: 32 cores, 128GB RAM, 33GB swap
machine C: 46 cores, 96GB RAM, 0 swap

I didn't observe obvious consumption of swap space during the test on machin A and B

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment