Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: handle hitting the top of the address space in the allocator more gracefully #35954

Closed
rathann opened this issue Dec 4, 2019 · 22 comments
Closed
Assignees
Milestone

Comments

@rathann
Copy link

@rathann rathann commented Dec 4, 2019

What version of Go are you using (go version)?

$ go version
go version go1.13.4 linux/386

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
go env
GO111MODULE=""
GOARCH="386"
GOBIN=""
GOCACHE="/builddir/.cache/go-build"
GOENV="/builddir/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="386"
GOHOSTOS="linux"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/builddir/go"
GOPRIVATE=""
GOPROXY="direct"
GOROOT="/usr/lib/golang"
GOSUMDB="off"
GOTMPDIR=""
GOTOOLDIR="/usr/lib/golang/pkg/tool/linux_386"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m32 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build011990499=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Run the test suite for github.com/klauspost/compress 1.9.3.

What did you expect to see?

All tests pass.

What did you see instead?

zstd test fails with SIGSEGV.

I reported it upstream and they suggested reporting it here as well as it shouldn't segfault.

@ALTree
Copy link
Member

@ALTree ALTree commented Dec 4, 2019

Thanks for reporting this. If I've understood the report you linked, the issue is that an OOM caused by a goroutines leak is crashing the runtime with a SIGSEV instead of showing a more graceful failure message.

The crash starts like this:


fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x565db1a3]
runtime stack:
runtime.throw(0x5672f88b, 0x2a)
	/usr/lib/golang/src/runtime/panic.go:774 +0x70
runtime.sigpanic()
	/usr/lib/golang/src/runtime/signal_unix.go:378 +0x40a
runtime.(*mheap).setSpans(0x568e89c0, 0xff000000, 0x800, 0xe12c2118)
	/usr/lib/golang/src/runtime/mheap.go:1158 +0x43
runtime.(*mheap).growAddSpan(0x568e89c0, 0xff000000, 0x1000000)
	/usr/lib/golang/src/runtime/mheap.go:1311 +0xcd
runtime.(*mheap).grow(0x568e89c0, 0x800, 0xffffffff)
	/usr/lib/golang/src/runtime/mheap.go:1296 +0x6b
runtime.(*mheap).allocSpanLocked(0x568e89c0, 0x800, 0x568f9368, 0x0)
	/usr/lib/golang/src/runtime/mheap.go:1170 +0x24e
runtime.(*mheap).alloc_m(0x568e89c0, 0x800, 0x568e0101, 0x3bd57)
	/usr/lib/golang/src/runtime/mheap.go:1022 +0xed
runtime.(*mheap).alloc.func1()
	/usr/lib/golang/src/runtime/mheap.go:1093 +0x3f
runtime.(*mheap).alloc(0x568e89c0, 0x800, 0x56010101, 0x61c8e700)
	/usr/lib/golang/src/runtime/mheap.go:1092 +0x72
runtime.largeAlloc(0x1000000, 0x57b00101, 0x5)
	/usr/lib/golang/src/runtime/malloc.go:1138 +0x81
runtime.mallocgc.func1()
	/usr/lib/golang/src/runtime/malloc.go:1033 +0x3a
runtime.systemstack(0x2d673500)
	/usr/lib/golang/src/runtime/asm_386.s:399 +0x62
runtime.mstart()
	/usr/lib/golang/src/runtime/proc.go:1146
goroutine 22760 [running]:
runtime.systemstack_switch()
	/usr/lib/golang/src/runtime/asm_386.s:360 fp=0x59d98e18 sp=0x59d98e14 pc=0x5660fcc0
runtime.mallocgc(0x1000000, 0x567b7e20, 0x1, 0x565f9f25)
	/usr/lib/golang/src/runtime/malloc.go:1032 +0x6ff fp=0x59d98e6c sp=0x59d98e18 pc=0x565c264f
runtime.makeslice(0x567b7e20, 0x0, 0x1000000, 0x565f9ff5)
	/usr/lib/golang/src/runtime/slice.go:49 +0x50 fp=0x59d98e80 sp=0x59d98e6c pc=0x565f9f70
runtime.makeslice64(0x567b7e20, 0x0, 0x0, 0x1000000, 0x0, 0xf7cc036c)
	/usr/lib/golang/src/runtime/slice.go:63 +0x50 fp=0x59d98e94 sp=0x59d98e80 pc=0x565fa040
github.com/klauspost/compress/zstd.(*Decoder).DecodeAll(0x577cc050, 0x6a8043c0, 0x3b, 0x40, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/builddir/build/BUILD/compress-1.9.3/_build/src/github.com/klauspost/compress/zstd/decoder.go:318 +0x346 fp=0x59d98f10 sp=0x59d98e94 pc=0x566f1ac6
github.com/klauspost/compress/zstd.TestEncoderRegression.func1.1.1(0x41abc320)
	/builddir/build/BUILD/compress-1.9.3/_build/src/github.com/klauspost/compress/zstd/encoder_test.go:224 +0x46a fp=0x59d98f9c sp=0x59d98f10 pc=0x5671fe2a
testing.tRunner(0x41abc320, 0x60a9a370)
	/usr/lib/golang/src/testing/testing.go:909 +0xae fp=0x59d98fe8 sp=0x59d98f9c pc=0x5667749e
runtime.goexit()
	/usr/lib/golang/src/runtime/asm_386.s:1325 +0x1 fp=0x59d98fec sp=0x59d98fe8 pc=0x56611721
created by testing.(*T).Run
	/usr/lib/golang/src/testing/testing.go:960 +0x2d3
goroutine 1 [chan receive]:
testing.(*T).Run(0x57a3a320, 0x56729ce1, 0x15, 0x567e43b8, 0x301)
	/usr/lib/golang/src/testing/testing.go:961 +0x2f2
testing.runTests.func1(0x56cdc000)
	/usr/lib/golang/src/testing/testing.go:1202 +0x5b
testing.tRunner(0x56cdc000, 0x56c666dc)
	/usr/lib/golang/src/testing/testing.go:909 +0xae
testing.runTests(0x56c0e080, 0x568d0d40, 0x27, 0x27, 0x0)
	/usr/lib/golang/src/testing/testing.go:1200 +0x28b
testing.(*M).Run(0x56c8e180, 0x0)
	/usr/lib/golang/src/testing/testing.go:1117 +0x16e
main.main()
	_testmain.go:152 +0x14f
goroutine 7254 [chan receive]:
testing.(*T).Run(0x57a3a3c0, 0x567268a8, 0x7, 0x579c2700, 0x0)
	/usr/lib/golang/src/testing/testing.go:961 +0x2f2
github.com/klauspost/compress/zstd.TestEncoderRegression(0x57a3a320)
	/builddir/build/BUILD/compress-1.9.3/_build/src/github.com/klauspost/compress/zstd/encoder_test.go:170 +0x23e
testing.tRunner(0x57a3a320, 0x567e43b8)
	/usr/lib/golang/src/testing/testing.go:909 +0xae
created by testing.(*T).Run
	/usr/lib/golang/src/testing/testing.go:960 +0x2d3

and then goes on with a list of the thousands of (leaked) goroutines.

cc @aclements @mknyszek

@ALTree ALTree changed the title SIGSEGV when running zstd test of github.com/klauspost/compress 1.9.3 on linux/386 runtime: SIGSEGV crash when OOM after goroutines leak on linux/386 Dec 4, 2019
@randall77
Copy link
Contributor

@randall77 randall77 commented Dec 9, 2019

I can reproduce on 1.13.4, with github.com/klauspost/compress at b87d2c411722993723a9de80b3510411f15ae0a8.

We get a different error with tip:

~/gopath/src/github.com/klauspost/compress$ GOARCH=386 GO111MODULE=off ~/sandbox/readonly/bin/go test -count=10 github.com/klauspost/compress/zstd
fatal error: index out of range

runtime stack:
runtime.throw(0x81eb1de, 0x12)
	/usr/local/google/home/khr/sandbox/readonly/src/runtime/panic.go:1106 +0x6a
runtime.panicCheck1(0x806b3e6, 0x81eb1de, 0x12)
	/usr/local/google/home/khr/sandbox/readonly/src/runtime/panic.go:34 +0xb9
runtime.goPanicIndexU(0x3fc, 0x3fc)
	/usr/local/google/home/khr/sandbox/readonly/src/runtime/panic.go:91 +0x37
runtime.(*pageAlloc).update(0x832b704, 0xff000000, 0x800, 0x1)
	/usr/local/google/home/khr/sandbox/readonly/src/runtime/mpagealloc.go:409 +0x4b6
runtime.(*pageAlloc).grow(0x832b704, 0xff000000, 0x1000000)
	/usr/local/google/home/khr/sandbox/readonly/src/runtime/mpagealloc.go:377 +0x16a
runtime.(*mheap).grow(0x832b700, 0x800, 0x0)
	/usr/local/google/home/khr/sandbox/readonly/src/runtime/mheap.go:1321 +0x7f
runtime.(*mheap).allocSpan(0x832b700, 0x800, 0x100, 0x833c128, 0x8d840e0)
	/usr/local/google/home/khr/sandbox/readonly/src/runtime/mheap.go:1120 +0x5df
runtime.(*mheap).alloc.func1()
	/usr/local/google/home/khr/sandbox/readonly/src/runtime/mheap.go:867 +0x4c
runtime.(*mheap).alloc(0x832b700, 0x800, 0x8c20101, 0x9577500)
	/usr/local/google/home/khr/sandbox/readonly/src/runtime/mheap.go:861 +0x63
runtime.largeAlloc(0x1000000, 0x101, 0x974c0e0)
	/usr/local/google/home/khr/sandbox/readonly/src/runtime/malloc.go:1150 +0x76
runtime.mallocgc.func1()
	/usr/local/google/home/khr/sandbox/readonly/src/runtime/malloc.go:1045 +0x39
runtime.systemstack(0x8d840e0)
	/usr/local/google/home/khr/sandbox/readonly/src/runtime/asm_386.s:395 +0x53
runtime.mstart()
	/usr/local/google/home/khr/sandbox/readonly/src/runtime/proc.go:1077

Slightly better, but still not great.

I can reproduce with this:

package main

type T struct {
	next *T
	buf  [1<<23 - 4]byte
}

func main() {
	var p *T
	for {
		t := &T{}
		t.next = p
		p = t
	}
}

Run with GOARCH=386 and it fails in the same way as the OP's program.
Replace 1<<23 with 1<22 and it fails correctly, giving an out of memory error.

This seems to be a regression from 1.12 to 1.13.

My gut feeling is we should just fix this at tip.
As it is a regression, backporting to 1.13 is possible, but given it just affects the failure message, it doesn't seem worth it. And the allocator has changed quite a bit, which means a whole different CL, probably.
In the other direction, we could just wait until 1.15. I guess it depends on how safe the fix is.

@randall77 randall77 added this to the Go1.14 milestone Dec 9, 2019
@randall77
Copy link
Contributor

@randall77 randall77 commented Dec 9, 2019

Tentatively milestoning as 1.14, but not release blocker.

@randall77
Copy link
Contributor

@randall77 randall77 commented Dec 9, 2019

Actually, this is somewhat nondeterministic even with my small repro. Larger sizes tend to make it fail more often, but the demarcation size is not as clear cut as I indicated previously.

@rathann
Copy link
Author

@rathann rathann commented Dec 10, 2019

From my side, I applied upstream-recommended work-around and I run the test suite with -short on x86 32-bit, which doesn't trigger the out-of-memory condition. Feel free to take your time and implement a proper fix. I wonder why it doesn't happen on ARM 32-bit, though.

@aclements
Copy link
Member

@aclements aclements commented Dec 10, 2019

@randall77, what commit was that traceback from? (mpagealloc.go:409 is a comment at current tip.)

@randall77
Copy link
Contributor

@randall77 randall77 commented Dec 10, 2019

@aclements, tip was bf3ee57

@mknyszek
Copy link
Contributor

@mknyszek mknyszek commented Dec 10, 2019

AFAICT this is the result of the runtime not handling allocations at the top of the address space well.

We're running on 32-bit platforms and the arguments to both setSpans (the failing function in the original post) and grow (the failing function at tip that @randall77 produced) are a base address of 0xff000000 and 0x800 pages which translates to 0x1000000 bytes. Adding that to the base address gives us zero with wrap-around, so it's no wonder its failing.

I think we need a bunch of overflow checking in more than one place in the runtime to fix this properly. I'm not sure why setSpans would suddenly start having problems in Go 1.13 or even Go 1.12 as that function hasn't been touched in several releases IIRC. I suspect memory behavior just changed in recent releases which is surfacing this behavior in this instance.

@mknyszek mknyszek modified the milestones: Go1.14, Go1.15 Jan 16, 2020
@mknyszek mknyszek changed the title runtime: SIGSEGV crash when OOM after goroutines leak on linux/386 runtime: handle hitting the top of the address space in the allocator more gracefully Jan 16, 2020
@aclements
Copy link
Member

@aclements aclements commented Apr 16, 2020

Ping @mknyszek

@mknyszek
Copy link
Contributor

@mknyszek mknyszek commented Apr 16, 2020

Ah thanks, nearly forgot about this. Time to play overflow whack-a-mole!

@gopherbot
Copy link

@gopherbot gopherbot commented Apr 28, 2020

Change https://golang.org/cl/230719 mentions this issue: runtime: use inclusive upper bounds for all address ranges

@mknyszek
Copy link
Contributor

@mknyszek mknyszek commented Apr 29, 2020

I've confirmed that the latest version of https://golang.org/cl/230719 properly crashes @randall77's program above via an "out of memory" error from the OS allocator, as opposed to a random segfault.

@gopherbot
Copy link

@gopherbot gopherbot commented May 1, 2020

Change https://golang.org/cl/231341 mentions this issue: runtime: have makeAddrRange accept a base and a size

@gopherbot
Copy link

@gopherbot gopherbot commented May 1, 2020

Change https://golang.org/cl/231344 mentions this issue: runtime: make pageAlloc.end inclusive

@gopherbot
Copy link

@gopherbot gopherbot commented May 1, 2020

Change https://golang.org/cl/231339 mentions this issue: runtime: change the addrRange representation to base+length

@gopherbot
Copy link

@gopherbot gopherbot commented May 1, 2020

Change https://golang.org/cl/231345 mentions this issue: runtime: make addrRange.end() return an inclusive upper bound

@gopherbot
Copy link

@gopherbot gopherbot commented May 1, 2020

Change https://golang.org/cl/231338 mentions this issue: runtime: avoid overflow in linearAlloc

@gopherbot
Copy link

@gopherbot gopherbot commented May 1, 2020

Change https://golang.org/cl/231342 mentions this issue: runtime: make addrsToSummaryRange accept inclusive address bounds

@gopherbot
Copy link

@gopherbot gopherbot commented May 1, 2020

Change https://golang.org/cl/231343 mentions this issue: runtime: have sysGrow accept a base and a size instead of addresses

@gopherbot
Copy link

@gopherbot gopherbot commented May 1, 2020

Change https://golang.org/cl/231337 mentions this issue: runtime: avoid overflow with mheap.curArena

@gopherbot
Copy link

@gopherbot gopherbot commented May 1, 2020

Change https://golang.org/cl/231346 mentions this issue: runtime: test the top of the address space in the page allocator

@gopherbot
Copy link

@gopherbot gopherbot commented May 1, 2020

Change https://golang.org/cl/231340 mentions this issue: runtime: clean up scavenging logic with new removeAbove helper

gopherbot pushed a commit that referenced this issue May 7, 2020
Currently when checking if we can grow the heap into the current arena,
we do an addition which may overflow. This is particularly likely on
32-bit systems.

Avoid this situation by explicitly checking for overflow, and adding in
some comments about when overflow is possible, when it isn't, and why.

For #35954.

Change-Id: I2d4ecbb1ccbd43da55979cc721f0cd8d1757add2
Reviewed-on: https://go-review.googlesource.com/c/go/+/231337
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
@gopherbot gopherbot closed this in 2e455ec May 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
6 participants
You can’t perform that action at this time.