Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: fatal error: MSpanList_Insert stack trace on go get #17570

Closed
donrudo opened this issue Oct 24, 2016 · 10 comments

Comments

Projects
None yet
6 participants
@donrudo
Copy link

commented Oct 24, 2016

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

1.7.3 (UPDATE: installed from brew)

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/carlos.morales/Code/go"
GORACE=""
GOROOT="/usr/local/go"
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
CC="clang"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/zl/mzwmc4fn12zfnnp_67sc_vm9yflzkj/T/go-build360470061=/tmp/go-build -gno-record-gcc-switches -fno-common"
CXX="clang++"
CGO_ENABLED="1"
Darwin 16.0.0 Darwin Kernel Version 16.0.0: Mon Aug 29 17:56:20 PDT 2016; root:xnu-3789.1.32~3/RELEASE_X86_64 x86_64

What did you do?

upgraded to 1.7.3

run any of

go get -u golang.org/x/tools/...
go get -v -x -u github.com/rogpeppe/godef/...
go get -u -v -x github.com/constabulary/gb/...
 go get -u -v -x gopkg.in/olivere/elastic.v2/...
go get -u -v -x github.com/derekparker/delve/cmd/dlv/...

What did you expect to see?

I was expecting to be able to setup common tools like

What did you see instead?

failed MSpanList_Insert 0x47bd10 0x189339c26f5e9 0x0 0x0
fatal error: MSpanList_Insert

runtime stack:
runtime.throw(0x2a3340, 0x10)
    /usr/local/go/src/runtime/panic.go:530 +0x90
runtime.(*mSpanList).insert(0x3a58a8, 0x47bd10)
    /usr/local/go/src/runtime/mheap.go:933 +0x293
runtime.(*mheap).freeSpanLocked(0x3a50a0, 0x47bd10, 0x100, 0x0)
    /usr/local/go/src/runtime/mheap.go:809 +0x4be
runtime.(*mheap).grow(0x3a50a0, 0x8, 0x0)
    /usr/local/go/src/runtime/mheap.go:675 +0x2a0
runtime.(*mheap).allocSpanLocked(0x3a50a0, 0x1, 0x47bc98)
    /usr/local/go/src/runtime/mheap.go:553 +0x4e3
runtime.(*mheap).alloc_m(0x3a50a0, 0x1, 0x1e, 0x47bca8)
    /usr/local/go/src/runtime/mheap.go:437 +0x119
runtime.(*mheap).alloc.func1()
    /usr/local/go/src/runtime/mheap.go:502 +0x41
runtime.systemstack(0xc820043e58)
    /usr/local/go/src/runtime/asm_amd64.s:307 +0xab
runtime.(*mheap).alloc(0x3a50a0, 0x1, 0x1000000001e, 0xeedf)
    /usr/local/go/src/runtime/mheap.go:503 +0x63
runtime.(*mcentral).grow(0x3a7090, 0x0)
    /usr/local/go/src/runtime/mcentral.go:209 +0x93
runtime.(*mcentral).cacheSpan(0x3a7090, 0xc800000000)
    /usr/local/go/src/runtime/mcentral.go:89 +0x47d
runtime.(*mcache).refill(0x400960, 0xc80000001e, 0xc82566b608)
    /usr/local/go/src/runtime/mcache.go:119 +0xcc
runtime.mallocgc.func2()
    /usr/local/go/src/runtime/malloc.go:642 +0x2b
runtime.systemstack(0xc820020000)
    /usr/local/go/src/runtime/asm_amd64.s:291 +0x79
runtime.mstart()
    /usr/local/go/src/runtime/proc.go:1048

goroutine 1 [running]:
runtime.systemstack_switch()
    /usr/local/go/src/runtime/asm_amd64.s:245 fp=0xc82566b190 sp=0xc82566b188
runtime.mallocgc(0x380, 0x1e8be0, 0xc800000000, 0x0)
    /usr/local/go/src/runtime/malloc.go:643 +0x869 fp=0xc82566b268 sp=0xc82566b190
runtime.newobject(0x1e8be0, 0x0)
    /usr/local/go/src/runtime/malloc.go:781 +0x42 fp=0xc82566b290 sp=0xc82566b268
cmd/link/internal/ld.writelines()
    /usr/local/go/src/cmd/link/internal/ld/dwarf.go:1651 +0xefc fp=0xc82566b648 sp=0xc82566b290
cmd/link/internal/ld.Dwarfemitdebugsections()
    /usr/local/go/src/cmd/link/internal/ld/dwarf.go:2108 +0xa59 fp=0xc82566b830 sp=0xc82566b648
cmd/link/internal/amd64.asmb()
    /usr/local/go/src/cmd/link/internal/amd64/asm.go:663 +0x8bb fp=0xc82566b9c0 sp=0xc82566b830
cmd/link/internal/ld.Ldmain()
    /usr/local/go/src/cmd/link/internal/ld/pobj.go:248 +0x1f5d fp=0xc82566be40 sp=0xc82566b9c0
cmd/link/internal/amd64.Main()
    /usr/local/go/src/cmd/link/internal/amd64/obj.go:44 +0x19 fp=0xc82566be48 sp=0xc82566be40
main.main()
    /usr/local/go/src/cmd/link/main.go:27 +0x36f fp=0xc82566bf20 sp=0xc82566be48
runtime.main()
    /usr/local/go/src/runtime/proc.go:188 +0x2b0 fp=0xc82566bf70 sp=0xc82566bf20
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:1998 +0x1 fp=0xc82566bf78 sp=0xc82566bf70

Some runs will fail if retried with a similar stack in other tool, until it is able to go through all the tools without errors after some retries like godef, dlv and x/tools; others will keep failing (like gb)

@ianlancetaylor ianlancetaylor changed the title fatal error: MSpanList_Insert stack trace on go get runtime: fatal error: MSpanList_Insert stack trace on go get Oct 25, 2016

@ianlancetaylor ianlancetaylor added this to the Go1.8 milestone Oct 25, 2016

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Oct 25, 2016

@aclements

This comment has been minimized.

Copy link
Member

commented Oct 25, 2016

I was unable to reproduce this on linux/amd64.

failed MSpanList_Insert 0x47bd10 0x189339c26f5e9 0x0 0x0

This is really weird. In order this is: span, span.next, span.prev, span.list. But the span pointer looks too low (maybe Darwin mmaps really low?), span.next is clearly nonsense, and span.next should really never be non-nil and span.list is nil.

@donrudo, could you paste a few more failures? I want to look for patterns. Assuming the tracebacks are the same, just the first line of the failure is all I need.

@zchee

This comment has been minimized.

Copy link
Contributor

commented Oct 25, 2016

@aclements It does not reproduce it in the same my environment, but

failed MSpanList_Insert 0x47bd10 0x189339c26f5e9 0x0 0x0

I think the cause is different, but it is similar to first ▶︎ 1. ./make.bash: error from this issue thread.
#16570

@donrudo

This comment has been minimized.

Copy link
Author

commented Oct 25, 2016

No problem, here is a number of logs I created by running this:

gogetlog.tar.gz

rm -rf Code/go
mkdir Code/go
cd Code/go
 go get -u -v -x 9fans.net/go/acme/... > log/acme.log 2>&1
 go get -u -v -x 9fans.net/go/acme/... > log/acme-2.log 2>&1
 go get -u -v -x 9fans.net/go/acme/... > log/acme-3.log 2>&1
 go get -u -v -x 9fans.net/go/acme/... > log/acme-4.log 2>&1
 go get -u -v -x 9fans.net/go/acme/... > log/acme-5.log 2>&1
 go get -u -v -x github.com/nsf/gocode/... > log/gocode-2.log 2>&1
 go get -u -v -x github.com/nsf/gocode/... > log/gocode-3.log 2>&1
 go get -u -v -x golang.org/x/tools/... > log/xtools.log 2>&1
 mv xtools.log xtools-stuck-hadtoCtrlC.log
 go get -u -v -x golang.org/x/tools/... > log/xtools.log 2>&1
 mv log/xtools.log log/xtools-stuck-hadtoCtrlC-2.log
 go get -u -v -x golang.org/x/tools/... > log/xtools.log 2>&1
 mv log/xtools.log log/xtools-stuck-hadtoCtrlC-3.log
 go get -u -v -x golang.org/x/tools/... > log/xtools.log 2>&1
 mv log/xtools.log log/xtools-stuck-hadtoCtrlC-4.log
 go get -u -v -x golang.org/x/tools/... > log/xtools.log 2>&1
 mv log/xtools.log log/xtools-5.log
 go get -u -v -x golang.org/x/tools/... > log/xtools.log 2>&1
 mv log/xtools.log log/xtools-6.log
 go get -u -v -x golang.org/x/tools/... > log/xtools.log 2>&1
 mv log/xtools.log log/xtools-7.log
 go get -u -v -x github.com/derekparker/delve/cmd/dlv/...  > log/dlv.log 2>&1
 mv log/dlv.log log/dlv-1.log
 go get -u -v -x github.com/derekparker/delve/cmd/dlv/...  > log/dlv-2.log 2>&1

@aclements

This comment has been minimized.

Copy link
Member

commented Oct 25, 2016

@zchee, thanks for the pointer. This does appear to be memory corruption, similar to #16570.

@donrudo, thanks for the logs. Digging into this more, it looks like Darwin does in fact allocate mspans from very low addresses, so the low span pointer is not a surprise. But if the span.next pointer were actually pointing to a span, it would also be a low value. That means this is almost certainly memory corruption. This is odd, because the mspans are in a very "stable" region of memory compared to the heap (once an mspan is allocated, it never gets freed, so the memory is type-stable) and are nowhere near the Go heap, so a bad heap pointer is unlikely to cause this. They are, however, close to the data segment (global variables) and intermingled with some runtime-internal data structures.

Also supporting the memory corruption theory is that one of your logs (xtools-5.log) shows a crash following a pointer in workbuf structure.

Also interesting is that the bad span.next value does look like a pointer (it's a "largish" 64-bit number, so it's probably not an int, and it's clearly not byte data). They're also all in roughly the same region: 0x[78]xxxxxxxxxx (even the workbuf crash has a pointer in this region if you undo the lfstack mangling). However, this isn't a Go pointer of any sort; it's nowhere near low enough to point to text, data, or runtime structures, and it doesn't have the right pattern to be a heap or heap bitmap pointer.

@aclements

This comment has been minimized.

Copy link
Member

commented Oct 25, 2016

Also interesting is that the bad span.next value does look like a pointer

It's not a pointer. It's nanoseconds since roughly 2016-10-25T15:15 UTC. I have no idea what the significance of that epoch is, but the progression of the span.next values in the panics correlates precisely with mtimes of your log files.

@aclements

This comment has been minimized.

Copy link
Member

commented Oct 25, 2016

/cc @quentinmit. Random timestamp values corrupting memory on Darwin. Sounds right up your alley. :)

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Oct 26, 2016

Are you absolutely sure that the go tool you are running is one built from 1.7.3? Because this is what I would expect to see when running an earlier version of the go tool on Sierra. For example, what does go version print?

@quentinmit

This comment has been minimized.

Copy link
Contributor

commented Oct 26, 2016

This is exactly consistent with using a Go version <1.7. "Nanoseconds since roughly 2016-10-25T15:15" sounds like the value of mach_absolute_time. Sierra started writing the value of mach_absolute_time into *rdx, which prior to 1.7 contained garbage. Now we put 0 there.

@donrudo You said that you are running 1.7.3 from brew, but I noticed that your logs are using GOROOT=/usr/local/go. That's where our .pkg installer places go; do you happen to have an older Go installed there, and it's shadowing the Go 1.7.3 from brew on your $PATH?

Maybe try running something like

which -a go
go version

to confirm that it's really 1.7.3.

@donrudo

This comment has been minimized.

Copy link
Author

commented Oct 27, 2016

@quentinmit is right, after running his commands I saw this:

Code/templates » which -a go
go version
/usr/local/bin/go
/usr/local/go/bin/go
go version go1.7.3 darwin/amd64

the second /usr/local/go/bin/go, the same as my $GOROOT pointed to an old 1.6.1 setup.

After fixing the $GOROOT to make it point to the correct brew install ( /usr/local/Cellar/go/1.7.3/libexec ) all the libraries were able to be correctly built.

sorry for making you lose your time with this.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
You can’t perform that action at this time.