Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gollvm: runtime and runtime/pprof packages test failed #37257

Open
erifan opened this issue Feb 17, 2020 · 20 comments
Open

gollvm: runtime and runtime/pprof packages test failed #37257

erifan opened this issue Feb 17, 2020 · 20 comments
Assignees
Milestone

Comments

@erifan
Copy link
Contributor

@erifan erifan commented Feb 17, 2020

What version of Go are you using (go version)?

$ go version
go version go1.14rc1 gollvm LLVM 11.0.0git linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/erifan01/.cache/go-build"
GOENV="/home/erifan01/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/erifan01/gopath"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/home/erifan01/gollvm-master/install"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/erifan01/gollvm-master/install/tools"
GCCGO="/home/erifan01/gollvm-master/install/bin/llvm-goc"
AR="ar"
CC="/usr/bin/cc"
CXX="/usr/bin/c++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build180143903=/tmp/go-build -gno-record-gcc-switches -funwind-tables"

What did you do?

$ ninja check-gollvm

What did you expect to see?

All test pass

What did you see instead?

Tests of package runtime and runtime/pprof failed

@thanm @ianlancetaylor @cherrymui

@erifan erifan changed the title gollvm: runtime and runtime/pprof package test failed gollvm: runtime and runtime/pprof packages test failed Feb 17, 2020
@erifan
Copy link
Contributor Author

@erifan erifan commented Feb 17, 2020

Error info of runtime package test:

Invalid user of intrinsic instruction!
i64 ptrtoint (void (i8*, i8*, i64, i1)* @llvm.memmove.p0i8.p0i8.i64 to i64)
llvm-goc: /home/erifan01/gollvm-master/llvm-project/llvm/tools/gollvm/bridge/go-llvm.cpp:164: void Llvm_backend::verifyModule(): Assertion `!broken && "Module not well-formed."' failed.
/home/erifan01/gollvm-master/build/./bin/llvm-goc(+0x2933ea1)[0x556ca3d21ea1]
/home/erifan01/gollvm-master/build/./bin/llvm-goc(+0x2933f34)[0x556ca3d21f34]
/home/erifan01/gollvm-master/build/./bin/llvm-goc(+0x2931cc5)[0x556ca3d1fcc5]
/home/erifan01/gollvm-master/build/./bin/llvm-goc(+0x293381e)[0x556ca3d2181e]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12890)[0x7f2a744d0890]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xc7)[0x7f2a7332ee97]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x141)[0x7f2a73330801]
/lib/x86_64-linux-gnu/libc.so.6(+0x3039a)[0x7f2a7332039a]
/lib/x86_64-linux-gnu/libc.so.6(+0x30412)[0x7f2a73320412]
/home/erifan01/gollvm-master/build/./bin/llvm-goc(+0x8faada)[0x556ca1ce8ada]
/home/erifan01/gollvm-master/build/./bin/llvm-goc(+0x7e77df)[0x556ca1bd57df]
/home/erifan01/gollvm-master/build/./bin/llvm-goc(+0x7e44a5)[0x556ca1bd24a5]
/home/erifan01/gollvm-master/build/./bin/llvm-goc(+0x7e85d8)[0x556ca1bd65d8]
/home/erifan01/gollvm-master/build/./bin/llvm-goc(+0x7d8611)[0x556ca1bc6611]
/home/erifan01/gollvm-master/build/./bin/llvm-goc(+0x7d87d3)[0x556ca1bc67d3]
/home/erifan01/gollvm-master/build/./bin/llvm-goc(+0x7cd5dc)[0x556ca1bbb5dc]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7f2a73311b97]
/home/erifan01/gollvm-master/build/./bin/llvm-goc(+0x7cca2a)[0x556ca1bbaa2a]
Stack dump:
0. Program arguments: /home/erifan01/gollvm-master/build/./bin/llvm-goc -L /home/erifan01/gollvm-master/build/tools/gollvm/libgo -fgo-compiling-runtime -g -g -fgo-pkgpath=runtime -c -I . -fno-toplevel-reorder -o gotest.o export_debuglog_test.go export_linux_test.go export_mmap_test.go export_test.go export_unix_test.go proc_runtime_test.go alg.go atomic_pointer.go cgo_gccgo.go cgocall.go cgocheck.go chan.go compiler.go cpuprof.go cputicks.go debug.go debuglog.go debuglog_off.go env_posix.go error.go extern.go fastlog2.go fastlog2table.go ffi.go float.go hash64.go heapdump.go iface.go lfstack.go lfstack_64bit.go lock_futex.go malloc.go map.go map_fast32.go map_fast64.go map_faststr.go mbarrier.go mbitmap.go mcache.go mcentral.go mem_gccgo.go mfinal.go mfixalloc.go mgc.go mgc_gccgo.go mgcmark.go mgcscavenge.go mgcsweep.go mgcsweepbuf.go mgcwork.go mheap.go mpagealloc.go mpagealloc_64bit.go mpagecache.go mpallocbits.go mprof.go mranges.go msan0.go msize.go mstats.go mwbbuf.go nbpipe_pipe2.go netpoll.go netpoll_epoll.go os_gccgo.go os_linux.go os_linux_noauxv.go panic.go panic32.go preempt.go preempt_nonwindows.go print.go proc.go profbuf.go proflabel.go race0.go rdebug.go relax_stub.go runtime.go runtime1.go runtime2.go rwmutex.go select.go sema.go signal_gccgo.go signal_unix.go sigqueue.go sigqueue_note.go sizeclasses.go slice.go string.go stubs.go stubs2.go stubs3.go stubs_linux.go symtab.go time.go time_nofake.go timestub.go timestub2.go trace.go traceback_gccgo.go type.go typekind.go utf8.go write_err.go runtime_sysinfo.go sigtab.go
/home/erifan01/gollvm-master/llvm-project/llvm/tools/gollvm/gofrontend/libgo/testsuite/gotest: line 471: 5976 Aborted (core dumped) $GC -g $pkgpatharg $prefixarg -c -I . -fno-toplevel-reorder -o gotest.o $gofiles $pkgbasefiles

@erifan
Copy link
Contributor Author

@erifan erifan commented Feb 17, 2020

The error log of the test of package runtime/pprof if very long, part of the log:

--- FAIL: TestCPUProfile (7.53s)
pprof_test.go:365: total 9 CPU profile samples collected:
2: 0x51c9b7 (runtime.callers:60) 0x52f256 (runtime.sigprof:3617) 0x40a2af (runtime/pprof.cpuHog0:58) 0x40a235 (runtime/pprof.cpuHog1:52) 0x40a1b9 (runtime/pprof.cpuHogger:38) 0x40a57d (pprof.TestCPUProfile..func1:90) 0x40a6a0 (pprof.testCPUProfile:276) 0x40a46a (runtime/pprof.TestCPUProfile:89) 0x586c80 (testing.tRunner:1029)

    1: 0x51c9b7 (runtime.callers:60) 0x52f256 (runtime.sigprof:3617) 0x40a2ab (runtime/pprof.cpuHog0:58) 0x40a235 (runtime/pprof.cpuHog1:52) 0x40a1b9 (runtime/pprof.cpuHogger:38) 0x40a57d (pprof.TestCPUProfile..func1:90) 0x40a6a0 (pprof.testCPUProfile:276) 0x40a46a (runtime/pprof.TestCPUProfile:89) 0x586c80 (testing.tRunner:1029)

    1: 0x51c9b7 (runtime.callers:60) 0x52f256 (runtime.sigprof:3617) 0x40a290 (runtime/pprof.cpuHog0:58) 0x40a235 (runtime/pprof.cpuHog1:52) 0x40a1b9 (runtime/pprof.cpuHogger:38) 0x40a57d (pprof.TestCPUProfile..func1:90) 0x40a6a0 (pprof.testCPUProfile:276) 0x40a46a (runtime/pprof.TestCPUProfile:89) 0x586c80 (testing.tRunner:1029)

    1: 0x51c9b7 (runtime.callers:60) 0x52f256 (runtime.sigprof:3617) 0x40a298 (runtime/pprof.cpuHog0:58) 0x40a235 (runtime/pprof.cpuHog1:52) 0x40a1b9 (runtime/pprof.cpuHogger:38) 0x40a57d (pprof.TestCPUProfile..func1:90) 0x40a6a0 (pprof.testCPUProfile:276) 0x40a46a (runtime/pprof.TestCPUProfile:89) 0x586c80 (testing.tRunner:1029)

    1: 0x51c9b7 (runtime.callers:60) 0x52f256 (runtime.sigprof:3617) 0x40a289 (runtime/pprof.cpuHog0:58) 0x40a235 (runtime/pprof.cpuHog1:52) 0x40a1b9 (runtime/pprof.cpuHogger:38) 0x40a57d (pprof.TestCPUProfile..func1:90) 0x40a6a0 (pprof.testCPUProfile:276) 0x40a46a (runtime/pprof.TestCPUProfile:89) 0x586c80 (testing.tRunner:1029)

    3: 0x51c9b7 (runtime.callers:60) 0x52f256 (runtime.sigprof:3617) 0x40a2be (runtime/pprof.cpuHog0:58) 0x40a235 (runtime/pprof.cpuHog1:52) 0x40a1b9 (runtime/pprof.cpuHogger:38) 0x40a57d (pprof.TestCPUProfile..func1:90) 0x40a6a0 (pprof.testCPUProfile:276) 0x40a46a (runtime/pprof.TestCPUProfile:89) 0x586c80 (testing.tRunner:1029)
@toothrot toothrot added this to the gollvm milestone Feb 18, 2020
@toothrot
Copy link
Contributor

@toothrot toothrot commented Feb 18, 2020

/cc @thanm

@thanm thanm self-assigned this Feb 18, 2020
@thanm
Copy link
Member

@thanm thanm commented Feb 18, 2020

I'll take a look

@erifan
Copy link
Contributor Author

@erifan erifan commented Mar 25, 2020

I spent some time looking at the test failure of the runtime package. It contains several problems. One is that Memmove intrinsic functions cannot be called indirectly. The second is a segmentation fault. This segfault happens during invoking morestack function. Since there is very little stack trace information, I don't know how to debug it further. Any suggestions?

........
[New Thread 0x7fff39ffb700 (LWP 39198)]
[New Thread 0x7fff397fa700 (LWP 39199)]
[New Thread 0x7fff38ff9700 (LWP 39200)]
[New Thread 0x7fff1bfff700 (LWP 39201)]
[New Thread 0x7fff1b7fe700 (LWP 39202)]
[New Thread 0x7fff1affd700 (LWP 39203)]
[New Thread 0x7fff1a7fc700 (LWP 39204)]
[New Thread 0x7fff19ffb700 (LWP 39205)]
[New Thread 0x7fff197fa700 (LWP 39206)]
[New Thread 0x7fff18a3e700 (LWP 39207)]
[Thread 0x7fff9b3fd700 (LWP 39174) exited]
[Detaching after fork from child process 39236]

Thread 34 "a.out" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fff3a7fc700 (LWP 39197)]
0x00000000005357fc in __generic_morestack ()
(gdb) bt
#0  0x00000000005357fc in __generic_morestack ()
#1  0x00000000005353e6 in __morestack ()
#2  0x00007ffff6f4b6ba in runtime_mstart (arg=<optimized out>)
    at /home/erifan01/gollvm-runtime/llvm-project/llvm/tools/gollvm/gofrontend/libgo/runtime/proc.c:593
#3  0x00007ffff59776db in start_thread (arg=0x7fff3a7fc700) at pthread_create.c:463
#4  0x00007ffff5cb088f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
@erifan
Copy link
Contributor Author

@erifan erifan commented Mar 25, 2020

By the way, the segmentation fault only happens on Linux Amd64 platform when building in release mode. This is my cmake command:
cmake -DCMAKE_C_FLAGS="-g" -DCMAKE_CXX_FLAGS="-g" -DGOLLVM_EXTRA_GOCFLAGS="-g" -DCMAKE_INSTALL_PREFIX=/home/eric/gollvm/install -DCMAKE_BUILD_TYPE=Release -G Ninja ../llvm-project/llvm

gofrontend version: tip
gollvm version: tip
llvm version: af57b139a0808be41383e8b3838bb8277423c2ab

@thanm
Copy link
Member

@thanm thanm commented Mar 25, 2020

I have been very busy with other work, sorry for not responding. Not sure how much time I can devote to this, things are a bit oversubscribed.

The memmove problem is due to recent changes in the runtime test (d6bf2d7 in the main Go repo) and in the LLVM IR verifier. The verifier now issues an error if you try to take the address of an intrinsic function, which is now happening in one of the runtime tests. Problem can be reproduced with this code:

package bad

import "unsafe"

//go:noescape
//extern __builtin_memmove
func memmove(to, from unsafe.Pointer, n uintptr)

var MyMemmove = memmove

I think the way forward here is to catch this case in the bridge and emit a stub or wrapper function, but I haven't written the code for this yet.

If you comment out that test and/or run with a release build (no verifier assertions), the SEGV looks like it is happening in runtime_test.TestSemaHandoff.

From my gdb session:

(gdb) thread apply all bt

Thread 8 (Thread 0x7fff9affb700 (LWP 165734)):
#0  clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:78
#1  0x00007ffff65c3ed0 in ?? () at pthread_create.c:362 from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007fff9affb700 in ?? ()
#3  0x0000000000000000 in ?? ()

Thread 7 (Thread 0x7fffa96c3700 (LWP 165733)):
#0  clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:78
#1  0x00007ffff65c2e0f in create_thread (pd=pd@entry=0x7fff9affb700, attr=attr@entry=0x7fffa82bfd40, stopped_start=stopped_start@entry=0x7fffa82bfc9e, stackaddr=stackaddr@entry=0x7fff9affae40, thread_ran=thread_ran@entry=0x7fffa82bfc9f) at ../sysdeps/unix/sysv/linux/createthread.c:101
#2  0x00007ffff65c4842 in __pthread_create_2_1 (newthread=newthread@entry=0x7fffa82bfd38, attr=attr@entry=0x7fffa82bfd40, start_routine=start_routine@entry=0x7ffff7343f00 <runtime_mstart>, arg=arg@entry=0xc000380000) at pthread_create.c:826
#3  0x00007ffff7343b96 in runtime.newosproc (mp=<optimized out>) at /ssd2/llvm-project/llvm-project/llvm/tools/gollvm/gofrontend/libgo/runtime/proc.c:262
#4  0x000000000049b9f1 in runtime.newm1 (mp=0x3d0f00) at proc.go:1695
#5  0x0000000000497f0f in runtime.newm (fn=<optimized out>, _p_=0xc000030600) at proc.go:1690
#6  0x000000000049bd18 in runtime.startm (_p_=0xc000030600, spinning=1 '\001') at proc.go:1811
#7  0x00000000004997f9 in runtime.wakep () at proc.go:1895
#8  0x0000000000499fca in runtime.resetspinning () at proc.go:2356
#9  runtime.schedule () at proc.go:2483
#10 0x0000000000536e2a in __morestack ()
#11 0x0000000000499cf2 in runtime.mstart1 () at proc.go:1081
#12 0x00007ffff7343fd9 in runtime_mstart (arg=<optimized out>) at /ssd2/llvm-project/llvm-project/llvm/tools/gollvm/gofrontend/libgo/runtime/proc.c:611
#13 0x00007ffff65c3fb7 in start_thread (arg=<optimized out>) at pthread_create.c:486
#14 0x00007ffff66d62df in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 6 (Thread 0x7fffb33ec700 (LWP 165732)):
#0  pthread_sigmask (how=0, newmask=<optimized out>, oldmask=0x7fffb33ec648) at ../sysdeps/unix/sysv/linux/pthread_sigmask.c:48
#1  0x0000000000536dd4 in __morestack ()
#2  0x000000000049d14d in runtime.goschedImpl (gp=0xc000278000) at proc.go:2661
#3  0x0000000000498145 in runtime.gosched_m (gp=0xc000278000) at proc.go:2669
#4  0x00007ffff7343faa in runtime_mstart (arg=<optimized out>) at /ssd2/llvm-project/llvm-project/llvm/tools/gollvm/gofrontend/libgo/runtime/proc.c:593
#5  0x00007ffff65c3fb7 in start_thread (arg=<optimized out>) at pthread_create.c:486
#6  0x00007ffff66d62df in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 5 (Thread 0x7fffbd092700 (LWP 165731)):
#0  pthread_sigmask (how=2, newmask=<optimized out>, oldmask=0x0) at ../sysdeps/unix/sysv/linux/pthread_sigmask.c:48
#1  0x0000000000536e04 in __morestack ()
#2  0x000000000049d136 in runtime.goschedImpl (gp=0xc00027a000) at proc.go:2657
#3  0x0000000000498145 in runtime.gosched_m (gp=0xc00027a000) at proc.go:2669
#4  0x00007ffff7343faa in runtime_mstart (arg=<optimized out>) at /ssd2/llvm-project/llvm-project/llvm/tools/gollvm/gofrontend/libgo/runtime/proc.c:593
#5  0x00007ffff65c3fb7 in start_thread (arg=<optimized out>) at pthread_create.c:486
#6  0x00007ffff66d62df in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 4 (Thread 0x7fffcdd9f700 (LWP 165730)):
#0  0x00000000005371fc in __generic_morestack ()
#1  0x0000000000536de6 in __morestack ()
#2  0x00007ffff7343faa in runtime_mstart (arg=<optimized out>) at /ssd2/llvm-project/llvm-project/llvm/tools/gollvm/gofrontend/libgo/runtime/proc.c:593
#3  0x00007ffff65c3fb7 in start_thread (arg=<optimized out>) at pthread_create.c:486
#4  0x00007ffff66d62df in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 3 (Thread 0x7fffceb2c700 (LWP 165729)):
#0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1  0x0000000000494162 in runtime.futex (addr=0x80, op=<optimized out>, val=<optimized out>, ts=0x0, addr2=0x0, val3=0) at os_linux.go:17
#2  0x0000000000479c29 in runtime.futexsleep (addr=0x6861d8 <runtime.sched+24>, val=0, ns=<optimized out>) at os_linux.go:53
#3  0x00000000004693e9 in runtime.lock (l=0x6861d8 <runtime.sched+24>) at lock_futex.go:113
#4  0x000000000049a256 in runtime.findrunnable () at proc.go:2051
#5  runtime.schedule () at proc.go:2476
#6  0x0000000000536e2a in __morestack ()
#7  0x000000000049d14d in runtime.goschedImpl (gp=0xc000278a80) at proc.go:2661
#8  0x0000000000498145 in runtime.gosched_m (gp=0xc000278a80) at proc.go:2669
#9  0x00007ffff7343faa in runtime_mstart (arg=<optimized out>) at /ssd2/llvm-project/llvm-project/llvm/tools/gollvm/gofrontend/libgo/runtime/proc.c:593
#10 0x00007ffff65c3fb7 in start_thread (arg=<optimized out>) at pthread_create.c:486
#11 0x00007ffff66d62df in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 2 (Thread 0x7fffcf438700 (LWP 165728)):
#0  0x00007ffff66ce1c3 in __GI___select (nfds=nfds@entry=0, readfds=readfds@entry=0x0, writefds=writefds@entry=0x0, exceptfds=exceptfds@entry=0x0, timeout=timeout@entry=0x7fffcec37f70) at ../sysdeps/unix/sysv/linux/select.c:41
#1  0x00007ffff7344a80 in runtime.usleep (us=<optimized out>) at /ssd2/llvm-project/llvm-project/llvm/tools/gollvm/gofrontend/libgo/runtime/yield.c:58
#2  0x0000000000497ace in runtime.sysmon () at proc.go:4185
#3  0x0000000000536e2a in __morestack ()
#4  0x0000000000499cc4 in runtime.mstart1 () at proc.go:1074
#5  0x00007ffff7343fd9 in runtime_mstart (arg=<optimized out>) at /ssd2/llvm-project/llvm-project/llvm/tools/gollvm/gofrontend/libgo/runtime/proc.c:611
#6  0x00007ffff65c3fb7 in start_thread (arg=<optimized out>) at pthread_create.c:486
#7  0x00007ffff66d62df in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 1 (Thread 0x7ffff65b68c0 (LWP 165723)):
#0  __go_makecontext () at /ssd2/llvm-project/llvm-project/llvm/tools/gollvm/gofrontend/libgo/runtime/go-context.S:62
#1  0x000000000049e523 in __go_go (fn=5393648, arg=<optimized out>) at proc.go:3300
#2  0x00000000005246c1 in runtime_test.testSemaHandoff () at sema_test.go:65
#3  runtime_test.TestSemaHandoff (t=0xc000230120) at sema_test.go:21
#4  0x00007ffff77d840c in testing.tRunner (t=0xc000406fd0, fn=0x629f70 <runtime_test.TestSemaHandoff..f>) at /ssd2/llvm-project/llvm-project/llvm/tools/gollvm/gofrontend/libgo/go/testing/testing.go:1028
#5  0x00007ffff77d8cab in testing.testing..thunk31 (__go_thunk_parameter=<optimized out>) at /ssd2/llvm-project/llvm-project/llvm/tools/gollvm/gofrontend/libgo/go/testing/testing.go:1079
#6  0x0000000000499bc5 in runtime.kickoff () at proc.go:1053
#7  0x0000000000000000 in ?? ()
(gdb) 


So far I don't see anything too suspicious -- at this point in the test it is basically just starting a lot of goroutines.

I linked the test with a better copy of libgcc.a (for debugging symbols), here's what it reported:

Thread 3 "a.out" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffceb2c700 (LWP 173818)]
0x000000000053720c in __generic_morestack (pframe_size=0x7fffceb2bd30, old_stack=0x7fffceb2bd50, param_size=0) at ../../../gcc-trunk/libgcc/generic-morestack.c:545
545	  if (*pp != NULL && (*pp)->size < frame_size)
(gdb) p pp
$3 = (struct stack_segment **) 0x7fffceb2c6f8
(gdb) p *pp
$4 = (struct stack_segment *) 0x7fffce22a000
(gdb) p **pp
Cannot access memory at address 0x7fffce22a000
(gdb) p (*pp)->size
Cannot access memory at address 0x7fffce22a010

This seems to point to __morestack_current_segment being corrupted somehow, which I don't have a good theory about.

@erifan
Copy link
Contributor Author

@erifan erifan commented Mar 25, 2020

Yes, I have debugged here too. I just don't know why __morestack_current_segment->next is an illegal address. I commented out several tests which can trigger SIGSEGV, including TestCallersNilPointerPanic, TestCallersDeferNilFuncPanic and TestCallersDivZeroPanic in caller_test.go, and TestPanicInlined in crash_test.go, but the segment fault still occurs. In addition, I used valgrind to check the memory leak situation. The error message is very long, mostly like this:

==9587== Warning: invalid file descriptor -1 in syscall read()
==9587== Warning: invalid file descriptor -1 in syscall write()
==9587== Warning: invalid file descriptor -1 in syscall close()
==9587== Thread 8:
==9587== Invalid write of size 8
==9587==    at 0x5353F6: __morestack (in /home/erifan01/gollvm-runtime/build/tools/gollvm/libgo/gotest24482/test/a.out)
==9587==  Address 0x3fa6d770 is on thread 1's stack
==9587==
==9587== Invalid read of size 8
==9587==    at 0x535D44: __morestack_unblock_signals (in /home/erifan01/gollvm-runtime/build/tools/gollvm/libgo/gotest24482/test/a.out)
==9587==  Address 0x3fa6d700 is on thread 1's stack
==9587==
==9587== Invalid read of size 8
==9587==    at 0x535D54: __morestack_unblock_signals (in /home/erifan01/gollvm-runtime/build/tools/gollvm/libgo/gotest24482/test/a.out)
==9587==  Address 0x3fa6d6c8 is on thread 1's stack
==9587==
==9587== Invalid read of size 8
==9587==    at 0x70911BC: pthread_sigmask (pthread_sigmask.c:27)
==9587==  Address 0x3fa6d728 is on thread 1's stack

Under normal circumstances, valgrind check will also report many errors, but the error type is not the case:

==14407== Use of uninitialised value of size 8
==14407==    at 0x5D8285C: __generic_releasestack (in /home/erifan01/gollvm-w/build/tools/gollvm/libgo/libgo.so.11git)
==14407==
==14407== Invalid read of size 8
==14407==    at 0x40199F0: __tls_get_addr (tls_get_addr.S:30)
==14407==    by 0x5D82953: __morestack_unblock_signals (in /home/erifan01/gollvm-w/build/tools/gollvm/libgo/libgo.so.11git)
==14407==    by 0x5D8334F: __morestack (in /home/erifan01/gollvm-w/build/tools/gollvm/libgo/libgo.so.11git)
==14407==    by 0x589D042: runtime.gogo (proc.c:290)
==14407==    by 0x4943A4: runtime.execute (proc.go:2162)
==14407==    by 0x492E17: runtime.schedule (proc.go:2577)
==14407==    by 0x492291: runtime.mstart1 (proc.go:1241)
==14407==    by 0x589D708: runtime_mstart (proc.c:611)
==14407==    by 0x5135F8: main (go-main.c:58)
==14407==  Address 0x4046988 is in a rw- anonymous segment
==14407==
==14407== Invalid read of size 8
==14407==    at 0x4019A00: __tls_get_addr (tls_get_addr.S:33)
==14407==    by 0x5D82953: __morestack_unblock_signals (in /home/erifan01/gollvm-w/build/tools/gollvm/libgo/libgo.so.11git)
==14407==    by 0x5D8334F: __morestack (in /home/erifan01/gollvm-w/build/tools/gollvm/libgo/libgo.so.11git)
==14407==    by 0x589D042: runtime.gogo (proc.c:290)
==14407==    by 0x4943A4: runtime.execute (proc.go:2162)
==14407==    by 0x492E17: runtime.schedule (proc.go:2577)
==14407==    by 0x492291: runtime.mstart1 (proc.go:1241)
==14407==    by 0x589D708: runtime_mstart (proc.c:611)
==14407==    by 0x5135F8: main (go-main.c:58)
==14407==  Address 0x40472e0 is in a rw- anonymous segment

I have some time to look at this issue recently, but I don't know which direction to go deeper.

@thanm
Copy link
Member

@thanm thanm commented Mar 25, 2020

I suspect that valgrind is not going to be a lot of help; it is not really set up for Go binaries. In particular it seems to be confused by the TLS references that the morestack code is making.

Regarding commenting out tests, you can also run just a single testpoint with the
-test.run=XXX flag, e.g.

./a.out -test.short -test.v -test.run=TestSemaHandoff

In terms of directions to go, maybe running under the debugger and using hardware watchpoints? It would also be useful to see why the same test isn't failing with gccgo (e.g. what's different between gccgo and gollvm).

@cherrymui
Copy link
Contributor

@cherrymui cherrymui commented Mar 25, 2020

Just a quick thought: Could you try disabling getg inlining? (I lost access to my Linux machine so it's not easy for me to test myself.) The getg inlining is one place where gollvm and gccgo differ, and it delicately depends on the detail of LLVM optimizer. If this is broken, a bad G pointer could lead to a bad stack context, which could cause a fault in morestack.

@erifan
Copy link
Contributor Author

@erifan erifan commented Mar 25, 2020

@erifan
Copy link
Contributor Author

@erifan erifan commented Apr 3, 2020

Hi Cherry,disabling inline or building in debug mode can fix this issue.

Hi, this is not correct. Sorry for the false information. I retested turning off inline, and the problem persists. Ian perfected this test in the Go project not long ago (https://go-review.googlesource.com/c/go/+/219277), and I tried his patch in gollvm. This problem still exists.

@erifan
Copy link
Contributor Author

@erifan erifan commented Apr 3, 2020

Turning off split-stack fixes the failure, So I think it should be related to stack splitting, and the location where the error happens also reveals this. I found the following text in the source code of morestack (generic-morestack.c) I don't know if it is related to this problem.

/* The stack segment that we think we are currently using.  This will
--
be correct in normal usage, but will be incorrect if an exception
unwinds into a different stack segment or if longjmp jumps to a
different stack segment.  */
 
__thread struct stack_segment *__morestack_current_segment
__attribute__ ((visibility ("default")));

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Apr 3, 2020

@erifan That comment is unlikely to indicate a problem. That comment is mainly aimed at C code that uses -fsplit-stack. It doesn't apply to Go code, which has neither exceptions nor longjmp.

@gopherbot
Copy link

@gopherbot gopherbot commented Apr 15, 2020

Change https://golang.org/cl/228239 mentions this issue: runtime: fix several minor bugs in runtime and runtime/pprof package

@gopherbot
Copy link

@gopherbot gopherbot commented Apr 22, 2020

Change https://golang.org/cl/229337 mentions this issue: Revert "libgo: prune sighandler frames in runtime.sigprof"

@gopherbot
Copy link

@gopherbot gopherbot commented Apr 26, 2020

Change https://golang.org/cl/230218 mentions this issue: Revert "test: improve runtime/pprof tests for gccgo"

@erifan
Copy link
Contributor Author

@erifan erifan commented Apr 27, 2020

With this patch https://go-review.googlesource.com/c/gofrontend/+/229337/, I found that the pprof package still has a random segfault on arm64. This segfault may occur when running these tests:
TestMathBigDivide, TestTracebackAll, TestAtomicLoadStore64, TestCPUProfileWithFork. Through TestTracebackAll, I found a test case from #29448, and then I tried it with and found that it crashes on arm64 with high probability. The error message is the same as pprof's segfault. The following is the stack trace information when a segfault occurs:

Thread 7 "a.out" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xffff5769f1e0 (LWP 100162)]
0x0000ffffbf5afeb8 in aarch64_fallback_frame_state (context=0xffff940e4350, context=0xffff940e4350,
    fs=0xffff940e4710) at ./md-unwind-support.h:74
74        if (pc[0] != MOVZ_X8_8B || pc[1] != SVC_0)
(gdb) bt
#0  0x0000ffffbf5afeb8 in aarch64_fallback_frame_state (context=0xffff940e4350, context=0xffff940e4350,
    fs=0xffff940e4710) at ./md-unwind-support.h:74
#1  uw_frame_state_for (context=context@entry=0xffff940e4350, fs=fs@entry=0xffff940e4710)
    at ../../../../gcc/libgcc/unwind-dw2.c:1271
#2  0x0000ffffbf5b16ac in _Unwind_Backtrace (trace=trace@entry=0x554140 <unwind>,
    trace_argument=trace_argument@entry=0xffff940e4db8) at ../../../../gcc/libgcc/unwind.inc:302
#3  0x0000000000554294 in backtrace_full (state=state@entry=0xffffbf6ef000, skip=skip@entry=0,
    callback=callback@entry=0x41f460 <callback>, error_callback=error_callback@entry=0x41f298 <error_callback>,
    data=data@entry=0xffff940e4e48)
    at /home/erifan01/gollvm-inasm/llvm-project/llvm/tools/gollvm/libgo/libbacktrace/backtrace.c:127
#4  0x000000000041f728 in runtime_callers (skip=<optimized out>, locbuf=0xffff940e4f40, m=<optimized out>,
    keep_thunks=false)
    at /home/erifan01/gollvm-inasm/llvm-project/llvm/tools/gollvm/gofrontend/libgo/runtime/go-callers.c:255
#5  0x00000000004ddd6c in runtime.callers (skip=<optimized out>, locbuf=...)
    at /home/erifan01/gollvm-inasm/llvm-project/llvm/tools/gollvm/gofrontend/libgo/go/runtime/traceback_gccgo.go:60
#6  0x00000000004ee5c8 in runtime.sigprof (pc=5175376, gp=0x40005b3800, mp=0x4000660000)
    at /home/erifan01/gollvm-inasm/llvm-project/llvm/tools/gollvm/gofrontend/libgo/go/runtime/proc.go:3617
#7  runtime.sighandler (sig=<optimized out>, info=<optimized out>, ctxt=<optimized out>, gp=0x40005b3800)
    at /home/erifan01/gollvm-inasm/llvm-project/llvm/tools/gollvm/gofrontend/libgo/go/runtime/signal_unix.go:471
#8  runtime.sigtrampgo (sig=<optimized out>, info=<optimized out>, ctx=<optimized out>)
    at /home/erifan01/gollvm-inasm/llvm-project/llvm/tools/gollvm/gofrontend/libgo/go/runtime/signal_unix.go:418
#9  <signal handler called>
#10 0x00000000004ef850 in runtime.makeslice (et=<optimized out>, len=<optimized out>, cap=<optimized out>)
    at /home/erifan01/gollvm-inasm/llvm-project/llvm/tools/gollvm/gofrontend/libgo/go/runtime/slice.go:59
#11 0x000000000046f4f8 in math..z2fbig.nat.setWord (pointer=<optimized out>, x=1)
    at /home/erifan01/gollvm-inasm/llvm-project/llvm/tools/gollvm/gofrontend/libgo/go/math/big/nat.go:76
--Type <RET> for more, q to quit, c to continue without paging--
#12 0x24baf15fe1658f99 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) p *context
$1 = {reg = {0xffff940e5ed8, 0xffff940e5ee0, 0xffff940e5ee8, 0xffff940e5ef0, 0xffff940e5ef8, 0xffff940e5f00,
    0xffff940e5f08, 0xffff940e5f10, 0xffff940e5f18, 0xffff940e5f20, 0xffff940e5f28, 0xffff940e5f30,
    0xffff940e5f38, 0xffff940e5f40, 0xffff940e5f48, 0xffff940e5f50, 0xffff940e5f58, 0xffff940e5f60,
    0xffff940e5f68, 0xffff8769fbd8, 0xffff8769fbd0, 0xffff8769fb68, 0xffff8769fb60, 0xffff8769fb58,
    0xffff940e5f98, 0xffff940e5fa0, 0xffff940e5fa8, 0xffff940e5fb0, 0xffff940e5fb8, 0xffff940e5fc0,
    0xffff8769fbc0, 0x0 <repeats 33 times>, 0xffff940e6000, 0xffff940e6010, 0xffff940e6020, 0xffff940e6030,
    0xffff940e6040, 0xffff940e6050, 0xffff940e6060, 0xffff940e6070, 0xffff940e6080, 0xffff940e6090,
    0xffff940e60a0, 0xffff940e60b0, 0xffff940e60c0, 0xffff940e60d0, 0xffff940e60e0, 0xffff940e60f0,
    0xffff940e6100, 0xffff940e6110, 0xffff940e6120, 0xffff940e6130, 0xffff940e6140, 0xffff940e6150,
    0xffff940e6160, 0xffff940e6170, 0xffff940e6180, 0xffff940e6190, 0xffff940e61a0, 0xffff940e61b0,
    0xffff940e61c0, 0xffff940e61d0, 0xffff940e61e0, 0xffff940e61f0, 0x4ef850 <runtime.makeslice+188>, 0x0},
  cfa = 0xffff8769fbe0, ra = 0x24baf15fe1658f99, lsda = 0x0, bases = {tbase = 0x0, dbase = 0x0,
    func = 0x46f4bc <math..z2fbig.nat.setWord>}, flags = 4611686018427387904, version = 0, args_size = 0,
  by_value = '\000' <repeats 96 times>, "\001"}

The frames below the frame <signal handler called> is random and the above frames are always the same. I don't know how this happened. Is there a way to know the stack information at "??" ? Thanks.

@erifan
Copy link
Contributor Author

@erifan erifan commented May 7, 2020

I fixed a few issues but there are still three more difficult ones.
1, Invalid use of intrinsic function memmove, it's better to fix it in middle-end.
2, Building gollvm in release mode on linux/amd64, running TestSemaHandoff will randomly trigger segmentation fault.
3, Building gollvm in arbitrary mode on linux/arm64, running "ninja check_libgo_runtime_pprof" will randomly trigger segmentation fault.
I feel that there is something to do with the second one and the third one, because the errors are both stack corruption. I have been looking at these issues intermittently for some time, but I have no clue at this time. If you know how to fix them, please help do it.

@thanm
Copy link
Member

@thanm thanm commented May 7, 2020

I'm willing to tackle the first issue (taking address of memmove) at some point. Sorry for the delays in responding on this, there have been a lot of other competing projects.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
6 participants
You can’t perform that action at this time.