Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: handle hitting the top of the address space in the allocator more gracefully #35954

Open
rathann opened this issue Dec 4, 2019 · 8 comments
Assignees
Labels
Milestone

Comments

@rathann
Copy link

@rathann rathann commented Dec 4, 2019

What version of Go are you using (go version)?

$ go version
go version go1.13.4 linux/386

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
go env
GO111MODULE=""
GOARCH="386"
GOBIN=""
GOCACHE="/builddir/.cache/go-build"
GOENV="/builddir/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="386"
GOHOSTOS="linux"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/builddir/go"
GOPRIVATE=""
GOPROXY="direct"
GOROOT="/usr/lib/golang"
GOSUMDB="off"
GOTMPDIR=""
GOTOOLDIR="/usr/lib/golang/pkg/tool/linux_386"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m32 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build011990499=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Run the test suite for github.com/klauspost/compress 1.9.3.

What did you expect to see?

All tests pass.

What did you see instead?

zstd test fails with SIGSEGV.

I reported it upstream and they suggested reporting it here as well as it shouldn't segfault.

@ALTree

This comment has been minimized.

Copy link
Member

@ALTree ALTree commented Dec 4, 2019

Thanks for reporting this. If I've understood the report you linked, the issue is that an OOM caused by a goroutines leak is crashing the runtime with a SIGSEV instead of showing a more graceful failure message.

The crash starts like this:


fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x565db1a3]
runtime stack:
runtime.throw(0x5672f88b, 0x2a)
	/usr/lib/golang/src/runtime/panic.go:774 +0x70
runtime.sigpanic()
	/usr/lib/golang/src/runtime/signal_unix.go:378 +0x40a
runtime.(*mheap).setSpans(0x568e89c0, 0xff000000, 0x800, 0xe12c2118)
	/usr/lib/golang/src/runtime/mheap.go:1158 +0x43
runtime.(*mheap).growAddSpan(0x568e89c0, 0xff000000, 0x1000000)
	/usr/lib/golang/src/runtime/mheap.go:1311 +0xcd
runtime.(*mheap).grow(0x568e89c0, 0x800, 0xffffffff)
	/usr/lib/golang/src/runtime/mheap.go:1296 +0x6b
runtime.(*mheap).allocSpanLocked(0x568e89c0, 0x800, 0x568f9368, 0x0)
	/usr/lib/golang/src/runtime/mheap.go:1170 +0x24e
runtime.(*mheap).alloc_m(0x568e89c0, 0x800, 0x568e0101, 0x3bd57)
	/usr/lib/golang/src/runtime/mheap.go:1022 +0xed
runtime.(*mheap).alloc.func1()
	/usr/lib/golang/src/runtime/mheap.go:1093 +0x3f
runtime.(*mheap).alloc(0x568e89c0, 0x800, 0x56010101, 0x61c8e700)
	/usr/lib/golang/src/runtime/mheap.go:1092 +0x72
runtime.largeAlloc(0x1000000, 0x57b00101, 0x5)
	/usr/lib/golang/src/runtime/malloc.go:1138 +0x81
runtime.mallocgc.func1()
	/usr/lib/golang/src/runtime/malloc.go:1033 +0x3a
runtime.systemstack(0x2d673500)
	/usr/lib/golang/src/runtime/asm_386.s:399 +0x62
runtime.mstart()
	/usr/lib/golang/src/runtime/proc.go:1146
goroutine 22760 [running]:
runtime.systemstack_switch()
	/usr/lib/golang/src/runtime/asm_386.s:360 fp=0x59d98e18 sp=0x59d98e14 pc=0x5660fcc0
runtime.mallocgc(0x1000000, 0x567b7e20, 0x1, 0x565f9f25)
	/usr/lib/golang/src/runtime/malloc.go:1032 +0x6ff fp=0x59d98e6c sp=0x59d98e18 pc=0x565c264f
runtime.makeslice(0x567b7e20, 0x0, 0x1000000, 0x565f9ff5)
	/usr/lib/golang/src/runtime/slice.go:49 +0x50 fp=0x59d98e80 sp=0x59d98e6c pc=0x565f9f70
runtime.makeslice64(0x567b7e20, 0x0, 0x0, 0x1000000, 0x0, 0xf7cc036c)
	/usr/lib/golang/src/runtime/slice.go:63 +0x50 fp=0x59d98e94 sp=0x59d98e80 pc=0x565fa040
github.com/klauspost/compress/zstd.(*Decoder).DecodeAll(0x577cc050, 0x6a8043c0, 0x3b, 0x40, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/builddir/build/BUILD/compress-1.9.3/_build/src/github.com/klauspost/compress/zstd/decoder.go:318 +0x346 fp=0x59d98f10 sp=0x59d98e94 pc=0x566f1ac6
github.com/klauspost/compress/zstd.TestEncoderRegression.func1.1.1(0x41abc320)
	/builddir/build/BUILD/compress-1.9.3/_build/src/github.com/klauspost/compress/zstd/encoder_test.go:224 +0x46a fp=0x59d98f9c sp=0x59d98f10 pc=0x5671fe2a
testing.tRunner(0x41abc320, 0x60a9a370)
	/usr/lib/golang/src/testing/testing.go:909 +0xae fp=0x59d98fe8 sp=0x59d98f9c pc=0x5667749e
runtime.goexit()
	/usr/lib/golang/src/runtime/asm_386.s:1325 +0x1 fp=0x59d98fec sp=0x59d98fe8 pc=0x56611721
created by testing.(*T).Run
	/usr/lib/golang/src/testing/testing.go:960 +0x2d3
goroutine 1 [chan receive]:
testing.(*T).Run(0x57a3a320, 0x56729ce1, 0x15, 0x567e43b8, 0x301)
	/usr/lib/golang/src/testing/testing.go:961 +0x2f2
testing.runTests.func1(0x56cdc000)
	/usr/lib/golang/src/testing/testing.go:1202 +0x5b
testing.tRunner(0x56cdc000, 0x56c666dc)
	/usr/lib/golang/src/testing/testing.go:909 +0xae
testing.runTests(0x56c0e080, 0x568d0d40, 0x27, 0x27, 0x0)
	/usr/lib/golang/src/testing/testing.go:1200 +0x28b
testing.(*M).Run(0x56c8e180, 0x0)
	/usr/lib/golang/src/testing/testing.go:1117 +0x16e
main.main()
	_testmain.go:152 +0x14f
goroutine 7254 [chan receive]:
testing.(*T).Run(0x57a3a3c0, 0x567268a8, 0x7, 0x579c2700, 0x0)
	/usr/lib/golang/src/testing/testing.go:961 +0x2f2
github.com/klauspost/compress/zstd.TestEncoderRegression(0x57a3a320)
	/builddir/build/BUILD/compress-1.9.3/_build/src/github.com/klauspost/compress/zstd/encoder_test.go:170 +0x23e
testing.tRunner(0x57a3a320, 0x567e43b8)
	/usr/lib/golang/src/testing/testing.go:909 +0xae
created by testing.(*T).Run
	/usr/lib/golang/src/testing/testing.go:960 +0x2d3

and then goes on with a list of the thousands of (leaked) goroutines.

cc @aclements @mknyszek

@ALTree ALTree changed the title SIGSEGV when running zstd test of github.com/klauspost/compress 1.9.3 on linux/386 runtime: SIGSEGV crash when OOM after goroutines leak on linux/386 Dec 4, 2019
@randall77

This comment has been minimized.

Copy link
Contributor

@randall77 randall77 commented Dec 9, 2019

I can reproduce on 1.13.4, with github.com/klauspost/compress at b87d2c411722993723a9de80b3510411f15ae0a8.

We get a different error with tip:

~/gopath/src/github.com/klauspost/compress$ GOARCH=386 GO111MODULE=off ~/sandbox/readonly/bin/go test -count=10 github.com/klauspost/compress/zstd
fatal error: index out of range

runtime stack:
runtime.throw(0x81eb1de, 0x12)
	/usr/local/google/home/khr/sandbox/readonly/src/runtime/panic.go:1106 +0x6a
runtime.panicCheck1(0x806b3e6, 0x81eb1de, 0x12)
	/usr/local/google/home/khr/sandbox/readonly/src/runtime/panic.go:34 +0xb9
runtime.goPanicIndexU(0x3fc, 0x3fc)
	/usr/local/google/home/khr/sandbox/readonly/src/runtime/panic.go:91 +0x37
runtime.(*pageAlloc).update(0x832b704, 0xff000000, 0x800, 0x1)
	/usr/local/google/home/khr/sandbox/readonly/src/runtime/mpagealloc.go:409 +0x4b6
runtime.(*pageAlloc).grow(0x832b704, 0xff000000, 0x1000000)
	/usr/local/google/home/khr/sandbox/readonly/src/runtime/mpagealloc.go:377 +0x16a
runtime.(*mheap).grow(0x832b700, 0x800, 0x0)
	/usr/local/google/home/khr/sandbox/readonly/src/runtime/mheap.go:1321 +0x7f
runtime.(*mheap).allocSpan(0x832b700, 0x800, 0x100, 0x833c128, 0x8d840e0)
	/usr/local/google/home/khr/sandbox/readonly/src/runtime/mheap.go:1120 +0x5df
runtime.(*mheap).alloc.func1()
	/usr/local/google/home/khr/sandbox/readonly/src/runtime/mheap.go:867 +0x4c
runtime.(*mheap).alloc(0x832b700, 0x800, 0x8c20101, 0x9577500)
	/usr/local/google/home/khr/sandbox/readonly/src/runtime/mheap.go:861 +0x63
runtime.largeAlloc(0x1000000, 0x101, 0x974c0e0)
	/usr/local/google/home/khr/sandbox/readonly/src/runtime/malloc.go:1150 +0x76
runtime.mallocgc.func1()
	/usr/local/google/home/khr/sandbox/readonly/src/runtime/malloc.go:1045 +0x39
runtime.systemstack(0x8d840e0)
	/usr/local/google/home/khr/sandbox/readonly/src/runtime/asm_386.s:395 +0x53
runtime.mstart()
	/usr/local/google/home/khr/sandbox/readonly/src/runtime/proc.go:1077

Slightly better, but still not great.

I can reproduce with this:

package main

type T struct {
	next *T
	buf  [1<<23 - 4]byte
}

func main() {
	var p *T
	for {
		t := &T{}
		t.next = p
		p = t
	}
}

Run with GOARCH=386 and it fails in the same way as the OP's program.
Replace 1<<23 with 1<22 and it fails correctly, giving an out of memory error.

This seems to be a regression from 1.12 to 1.13.

My gut feeling is we should just fix this at tip.
As it is a regression, backporting to 1.13 is possible, but given it just affects the failure message, it doesn't seem worth it. And the allocator has changed quite a bit, which means a whole different CL, probably.
In the other direction, we could just wait until 1.15. I guess it depends on how safe the fix is.

@randall77 randall77 added this to the Go1.14 milestone Dec 9, 2019
@randall77

This comment has been minimized.

Copy link
Contributor

@randall77 randall77 commented Dec 9, 2019

Tentatively milestoning as 1.14, but not release blocker.

@randall77

This comment has been minimized.

Copy link
Contributor

@randall77 randall77 commented Dec 9, 2019

Actually, this is somewhat nondeterministic even with my small repro. Larger sizes tend to make it fail more often, but the demarcation size is not as clear cut as I indicated previously.

@rathann

This comment has been minimized.

Copy link
Author

@rathann rathann commented Dec 10, 2019

From my side, I applied upstream-recommended work-around and I run the test suite with -short on x86 32-bit, which doesn't trigger the out-of-memory condition. Feel free to take your time and implement a proper fix. I wonder why it doesn't happen on ARM 32-bit, though.

@aclements

This comment has been minimized.

Copy link
Member

@aclements aclements commented Dec 10, 2019

@randall77, what commit was that traceback from? (mpagealloc.go:409 is a comment at current tip.)

@randall77

This comment has been minimized.

Copy link
Contributor

@randall77 randall77 commented Dec 10, 2019

@aclements, tip was bf3ee57

@mknyszek

This comment has been minimized.

Copy link
Contributor

@mknyszek mknyszek commented Dec 10, 2019

AFAICT this is the result of the runtime not handling allocations at the top of the address space well.

We're running on 32-bit platforms and the arguments to both setSpans (the failing function in the original post) and grow (the failing function at tip that @randall77 produced) are a base address of 0xff000000 and 0x800 pages which translates to 0x1000000 bytes. Adding that to the base address gives us zero with wrap-around, so it's no wonder its failing.

I think we need a bunch of overflow checking in more than one place in the runtime to fix this properly. I'm not sure why setSpans would suddenly start having problems in Go 1.13 or even Go 1.12 as that function hasn't been touched in several releases IIRC. I suspect memory behavior just changed in recent releases which is surfacing this behavior in this instance.

@mknyszek mknyszek modified the milestones: Go1.14, Go1.15 Jan 16, 2020
@mknyszek mknyszek changed the title runtime: SIGSEGV crash when OOM after goroutines leak on linux/386 runtime: handle hitting the top of the address space in the allocator more gracefully Jan 16, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.