
runtime: OOM when running in sparse available memory system #39400

Open
jeremyshisong opened this issue Jun 4, 2020 · 8 comments
@jeremyshisong commented Jun 4, 2020

What version of Go are you using (go version)?

$ go version
go version go1.11 linux/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/root/.cache/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/data/golang"
GOPROXY=""
GORACE=""
GOROOT="/usr/local/go1.11"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go1.11/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build473055106=/tmp/go-build -gno-record-gcc-switches"

What did you do?

When I compile elastic metricbeat (tag v7.0.0) with go1.11, the resulting binary runs out of memory when the system's available memory is low.

The system memory usage looks like this:

free -m
              total        used        free      shared  buff/cache   available
Mem:           1868        1686          14          97         167          43
Swap:             0           0           0

What did you expect to see?

Although available memory is low, Metricbeat uses less than 10 MB when it starts, so I expected it to start successfully.

What did you see instead?

It failed to start, crashing with an out-of-memory error.

The OOM runtime stack trace is:

fatal error: runtime: out of memory

runtime stack:
runtime.throw(0x283d1da, 0x16)
        /usr/local/go1.11/src/runtime/panic.go:608 +0x72 fp=0x7ffc2108ab90 sp=0x7ffc2108ab60 pc=0x11123e2
runtime.sysMap(0xc000000000, 0x4000000, 0x3e26558)
        /usr/local/go1.11/src/runtime/mem_linux.go:156 +0xc7 fp=0x7ffc2108abd0 sp=0x7ffc2108ab90 pc=0x10fd687
runtime.(*mheap).sysAlloc(0x3e0c560, 0x4000000, 0x0, 0x0)
        /usr/local/go1.11/src/runtime/malloc.go:619 +0x1c7 fp=0x7ffc2108ac58 sp=0x7ffc2108abd0 pc=0x10f14d7
runtime.(*mheap).grow(0x3e0c560, 0x1, 0x0)
        /usr/local/go1.11/src/runtime/mheap.go:920 +0x42 fp=0x7ffc2108acb0 sp=0x7ffc2108ac58 pc=0x11099c2
runtime.(*mheap).allocSpanLocked(0x3e0c560, 0x1, 0x3e26568, 0x0)
        /usr/local/go1.11/src/runtime/mheap.go:848 +0x337 fp=0x7ffc2108acf0 sp=0x7ffc2108acb0 pc=0x1109847
runtime.(*mheap).alloc_m(0x3e0c560, 0x1, 0x2a, 0x0)
        /usr/local/go1.11/src/runtime/mheap.go:692 +0x119 fp=0x7ffc2108ad30 sp=0x7ffc2108acf0 pc=0x1109059
runtime.(*mheap).alloc.func1()
        /usr/local/go1.11/src/runtime/mheap.go:759 +0x4c fp=0x7ffc2108ad68 sp=0x7ffc2108ad30 pc=0x113ef1c
runtime.(*mheap).alloc(0x3e0c560, 0x1, 0x101002a, 0x7ffc2108add0)
        /usr/local/go1.11/src/runtime/mheap.go:758 +0x8a fp=0x7ffc2108adb8 sp=0x7ffc2108ad68 pc=0x11092fa
runtime.(*mcentral).grow(0x3e0e318, 0x0)
        /usr/local/go1.11/src/runtime/mcentral.go:232 +0x94 fp=0x7ffc2108ae00 sp=0x7ffc2108adb8 pc=0x10fd084
runtime.(*mcentral).cacheSpan(0x3e0e318, 0x0)
        /usr/local/go1.11/src/runtime/mcentral.go:106 +0x2f8 fp=0x7ffc2108ae48 sp=0x7ffc2108ae00 pc=0x10fcbd8
runtime.(*mcache).refill(0x7f237641b000, 0x220000002a)
        /usr/local/go1.11/src/runtime/mcache.go:122 +0x95 fp=0x7ffc2108ae78 sp=0x7ffc2108ae48 pc=0x10fc795
runtime.(*mcache).nextFree.func1()
        /usr/local/go1.11/src/runtime/malloc.go:749 +0x32 fp=0x7ffc2108ae98 sp=0x7ffc2108ae78 pc=0x113e332
runtime.(*mcache).nextFree(0x7f237641b000, 0x3e2652a, 0x4000, 0x7f237641b000, 0x7ffc2108af58)
        /usr/local/go1.11/src/runtime/malloc.go:748 +0xb6 fp=0x7ffc2108aef0 sp=0x7ffc2108ae98 pc=0x10f1b86
runtime.mallocgc(0x180, 0x2807a20, 0x7ffc2108af01, 0x7f237641f000)
        /usr/local/go1.11/src/runtime/malloc.go:903 +0x793 fp=0x7ffc2108af90 sp=0x7ffc2108aef0 pc=0x10f24d3
runtime.newobject(0x2807a20, 0x3e265c0)
        /usr/local/go1.11/src/runtime/malloc.go:1032 +0x38 fp=0x7ffc2108afc0 sp=0x7ffc2108af90 pc=0x10f28b8
runtime.malg(0x7f2300008000, 0x7f237641b000)
        /usr/local/go1.11/src/runtime/proc.go:3285 +0x31 fp=0x7ffc2108b000 sp=0x7ffc2108afc0 pc=0x111c661
runtime.mpreinit(0x3e06840)
        /usr/local/go1.11/src/runtime/os_linux.go:311 +0x29 fp=0x7ffc2108b020 sp=0x7ffc2108b000 pc=0x1110819
runtime.mcommoninit(0x3e06840)
        /usr/local/go1.11/src/runtime/proc.go:624 +0xc1 fp=0x7ffc2108b058 sp=0x7ffc2108b020 pc=0x1115f71
runtime.schedinit()
        /usr/local/go1.11/src/runtime/proc.go:546 +0x89 fp=0x7ffc2108b0c0 sp=0x7ffc2108b058 pc=0x1115c39
runtime.rt0_go(0x7ffc2108b1c8, 0x1, 0x7ffc2108b1c8, 0x0, 0x7f2375b95c05, 0x2000000000, 0x7ffc2108b1c8, 0x100000000, 0x1140e50, 0x0, ...)
        /usr/local/go1.11/src/runtime/asm_amd64.s:195 +0x11a fp=0x7ffc2108b0c8 sp=0x7ffc2108b0c0 pc=0x1140f7a

However, when I build the same elastic metricbeat code with go1.10.8, it starts without OOM, with the same roughly 40 MB of available memory left on the system.

Were there any breaking changes starting from Go 1.11?

@davecheney (Contributor) commented Jun 4, 2020

Go 1.11 is no longer supported (http://golang.org/s/release). Can you please retest with Go 1.14?

@jeremyshisong (Author) commented Jun 4, 2020

@davecheney Thanks for your reply. I have tested with the latest release, Go 1.14.4; the result is the same.

The runtime stack trace is:

fatal error: runtime: out of memory

runtime stack:
runtime.throw(0x26d9370, 0x16)
        /usr/local/go1.14.4/src/runtime/panic.go:1116 +0x72 fp=0x7fff2e91e338 sp=0x7fff2e91e308 pc=0x1034092
runtime.sysMap(0xc000000000, 0x4000000, 0x3afca58)
        /usr/local/go1.14.4/src/runtime/mem_linux.go:169 +0xc5 fp=0x7fff2e91e378 sp=0x7fff2e91e338 pc=0x1017d55
runtime.(*mheap).sysAlloc(0x3ae7920, 0x400000, 0x0, 0x0)
        /usr/local/go1.14.4/src/runtime/malloc.go:715 +0x1cd fp=0x7fff2e91e420 sp=0x7fff2e91e378 pc=0x100b17d
runtime.(*mheap).grow(0x3ae7920, 0x1, 0x0)
        /usr/local/go1.14.4/src/runtime/mheap.go:1286 +0x11c fp=0x7fff2e91e488 sp=0x7fff2e91e420 pc=0x102600c
runtime.(*mheap).allocSpan(0x3ae7920, 0x1, 0xd352a00, 0x3afca68, 0x1)
        /usr/local/go1.14.4/src/runtime/mheap.go:1124 +0x6a0 fp=0x7fff2e91e508 sp=0x7fff2e91e488 pc=0x1025d50
runtime.(*mheap).alloc.func1()
        /usr/local/go1.14.4/src/runtime/mheap.go:871 +0x64 fp=0x7fff2e91e560 sp=0x7fff2e91e508 pc=0x10630e4
runtime.(*mheap).alloc(0x3ae7920, 0x1, 0x12a, 0x1)
        /usr/local/go1.14.4/src/runtime/mheap.go:865 +0x81 fp=0x7fff2e91e5b0 sp=0x7fff2e91e560 pc=0x10252b1
runtime.(*mcentral).grow(0x3af87f8, 0x0)
        /usr/local/go1.14.4/src/runtime/mcentral.go:255 +0x79 fp=0x7fff2e91e5f0 sp=0x7fff2e91e5b0 pc=0x1017779
runtime.(*mcentral).cacheSpan(0x3af87f8, 0x0)
        /usr/local/go1.14.4/src/runtime/mcentral.go:106 +0x2bc fp=0x7fff2e91e638 sp=0x7fff2e91e5f0 pc=0x10172ac
runtime.(*mcache).refill(0x7f8695204108, 0x2a)
        /usr/local/go1.14.4/src/runtime/mcache.go:138 +0x85 fp=0x7fff2e91e658 sp=0x7fff2e91e638 pc=0x1016d95
runtime.(*mcache).nextFree(0x7f8695204108, 0x7f869520412a, 0xfffffffffffffff8, 0x7fff2e91e6e0, 0x7fff2e91e708)
        /usr/local/go1.14.4/src/runtime/malloc.go:868 +0x87 fp=0x7fff2e91e690 sp=0x7fff2e91e658 pc=0x100b9a7
runtime.mallocgc(0x180, 0x26a4580, 0x7fff2e91e701, 0x100cb42)
        /usr/local/go1.14.4/src/runtime/malloc.go:1036 +0x793 fp=0x7fff2e91e730 sp=0x7fff2e91e690 pc=0x100c2e3
runtime.newobject(0x26a4580, 0x4000)
        /usr/local/go1.14.4/src/runtime/malloc.go:1165 +0x38 fp=0x7fff2e91e760 sp=0x7fff2e91e730 pc=0x100c6d8
runtime.malg(0x8000, 0x3afcac0)
        /usr/local/go1.14.4/src/runtime/proc.go:3360 +0x31 fp=0x7fff2e91e7a0 sp=0x7fff2e91e760 pc=0x103f3a1
runtime.mpreinit(0x3acfc40)
        /usr/local/go1.14.4/src/runtime/os_linux.go:339 +0x2d fp=0x7fff2e91e7c0 sp=0x7fff2e91e7a0 pc=0x103119d
runtime.mcommoninit(0x3acfc40)
        /usr/local/go1.14.4/src/runtime/proc.go:630 +0x108 fp=0x7fff2e91e808 sp=0x7fff2e91e7c0 pc=0x1038898
runtime.schedinit()
        /usr/local/go1.14.4/src/runtime/proc.go:547 +0x95 fp=0x7fff2e91e860 sp=0x7fff2e91e808 pc=0x10384b5
runtime.rt0_go(0x7fff2e91e968, 0x1, 0x7fff2e91e968, 0x0, 0x7f869497ec05, 0x2000000000, 0x7fff2e91e968, 0x100000000, 0x1065850, 0x0, ...)
        /usr/local/go1.14.4/src/runtime/asm_amd64.s:214 +0x125 fp=0x7fff2e91e868 sp=0x7fff2e91e860 pc=0x1065985
@davecheney (Contributor) commented Jun 4, 2020

Thank you for confirming that the problem occurs with Go 1.14.

As an experiment, are you able to add a small amount of swap to the system, and verify if the problem continues to occur?

@jeremyshisong (Author) commented Jun 4, 2020

@davecheney After creating about 100 MB of swap space and setting swappiness=0 to avoid swapping whenever possible, it runs successfully without OOM.

Here is the system memory usage:

free -m
              total        used        free      shared  buff/cache   available
Mem:           1868        1772          15          23          80          36
Swap:           102         102           0
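For readers who want to reproduce this experiment, a sketch of the swap setup follows. It assumes a root shell and uses the illustrative path /swapfile (the issue doesn't state which path or tool was actually used):

```shell
# Create and enable a ~100 MB swap file (illustrative; requires root).
dd if=/dev/zero of=/swapfile bs=1M count=100
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
# Discourage swapping except under memory pressure.
sysctl -w vm.swappiness=0
```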
@davecheney (Contributor) commented Jun 4, 2020

ping @mknyszek

@mknyszek (Contributor) commented Jun 4, 2020

As of Go 1.11, heap arenas are mapped in 64 MiB increments on linux/amd64 (golang.org/cl/85887). Looking at the stack trace, it looks like there's a failure to map the first arena (as read-write). In theory this shouldn't happen with default settings on Linux, because we don't immediately go and touch all of those 64 MiB, i.e. we map the region but don't use it unless we actually need it. Given that it's the actual mmap call that's failing, it looks like the OS is just telling us we can't map that memory for whatever reason.

One reason this could be is overcommit rules, or virtual address space limits. Can you show me the output of

$ cat /proc/sys/vm/overcommit_memory
$ cat /proc/sys/vm/overcommit_ratio
$ ulimit -v

on your system? Also, what version of Linux are you running on?

A work-around here could be to compile for GOARCH=386, where we use 4 MiB arenas; you should be able to map at least 2 arenas before Linux thinks you're out of memory. It will likely mean slower code overall (bit intrinsics are missing, for example, and they're used on several critical paths in the runtime; you also get 32-bit registers instead of 64-bit), but your code should work.

@odeke-em changed the title from "go1.11 compiled elastic metricbeat result in oom when runing in sparse available memory system" to "runtime: OOM when runing in sparse available memory system" Jun 5, 2020
@jeremyshisong (Author) commented Jun 5, 2020

@mknyszek Thanks for your detailed explanation.

The output of the system configs is as follows:

$ cat /proc/sys/vm/overcommit_memory
0
$ cat /proc/sys/vm/overcommit_ratio
50
$ ulimit -v
unlimited

The Linux kernel version is 3.10.107.

It looks like the overcommit rules forbade mapping the 64 MiB heap arena.

As an experiment, when setting overcommit_memory to 1, it worked without OOM.
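For anyone hitting the same wall, a sketch of that experiment, assuming a root shell. Note that vm.overcommit_memory=1 tells the kernel to always allow overcommit system-wide, which trades a failure at mmap time for possible OOM-killer action later — weigh that for your environment:

```shell
# 0 = heuristic overcommit (default), 1 = always allow, 2 = strict accounting.
sysctl -w vm.overcommit_memory=1
# Persist across reboots.
echo 'vm.overcommit_memory = 1' >> /etc/sysctl.conf
```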

The workaround of compiling for GOARCH=386 also worked, but as you explained it will likely mean slower code overall; maybe there is a better option.

Is the 64 MiB heap arena size hardcoded, or is it changeable via a compiler flag?

@networkimprov commented Jun 5, 2020

@gopherbot remove WaitingForInfo

@dmitshur dmitshur added this to the Backlog milestone Jun 5, 2020