runtime: high startup address space usage (RLIMIT_AS) on Linux AMD64 #38010

Open
pkramme opened this issue Mar 22, 2020 · 16 comments

@pkramme pkramme commented Mar 22, 2020

What version of Go are you using (go version)?

$ go version
go version go1.14 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/vorvvbgc/.cache/go-build"
GOENV="/home/vorvvbgc/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/vorvvbgc/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/home/vorvvbgc/go1.14/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/vorvvbgc/go1.14/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build969828813=/tmp/go-build -gno-record-gcc-switches"

What did you do?

I am trying to run a FastCGI server behind an Apache2 web server on a shared hosting system, using the net/http/fcgi library. The web server limits my software to 512 MB of memory.

This is the code: https://play.golang.org/p/Z-Gc6icOpw5

What did you expect to see?

I expect to see "This was generated by Go running as a FastCGI app" on the website generated by the FastCGI server.

What did you see instead?

I have modified the sysReserve() function in the runtime to include println() to print out the error code from mmap() and the requested memory size. This is a diff of src/runtime/mem_linux.go and my version:

157a158,159
>       println(err)
>       println(n)

I left that debug output in the trace below in the hope that it might be useful.

The application crashes with this trace:

0
131072
0
1048576
0
8388608
0
67108864
12
536870912
fatal error: failed to reserve page summary memory

runtime stack:
runtime.throw(0x6f3456, 0x25)
        /home/vorvvbgc/go1.14/go/src/runtime/panic.go:1112 +0x72 fp=0x7ffc17e5b170 sp=0x7ffc17e5b140 pc=0x433a12
runtime.(*pageAlloc).sysInit(0x939428)
        /home/vorvvbgc/go1.14/go/src/runtime/mpagealloc_64bit.go:80 +0x13f fp=0x7ffc17e5b1e8 sp=0x7ffc17e5b170 pc=0x42ac1f
runtime.(*pageAlloc).init(0x939428, 0x939420, 0x94db38)
        /home/vorvvbgc/go1.14/go/src/runtime/mpagealloc.go:297 +0x75 fp=0x7ffc17e5b210 sp=0x7ffc17e5b1e8 pc=0x4288b5
runtime.(*mheap).init(0x939420)
        /home/vorvvbgc/go1.14/go/src/runtime/mheap.go:694 +0x274 fp=0x7ffc17e5b238 sp=0x7ffc17e5b210 pc=0x425ad4
runtime.mallocinit()
        /home/vorvvbgc/go1.14/go/src/runtime/malloc.go:470 +0xff fp=0x7ffc17e5b268 sp=0x7ffc17e5b238 pc=0x40c41f
runtime.schedinit()
        /home/vorvvbgc/go1.14/go/src/runtime/proc.go:545 +0x60 fp=0x7ffc17e5b2c0 sp=0x7ffc17e5b268 pc=0x437100
runtime.rt0_go(0x7ffc17e5b2f8, 0x1, 0x7ffc17e5b2f8, 0x0, 0x7fd2029790ca, 0x1, 0x7ffc17e5cbb6, 0x0, 0x7ffc17e5cbc4, 0x7ffc17e5cbe6, ...)
        /home/vorvvbgc/go1.14/go/src/runtime/asm_amd64.s:214 +0x125 fp=0x7ffc17e5b2c8 sp=0x7ffc17e5b2c0 pc=0x460655

The application works fine with Go 1.13.9.

I have no idea how to debug this further.
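For readers following the trace: the numbers printed before the fatal error are pairs of mmap() error code and requested size. Each request is eight times the previous one, and the last one (536870912 bytes, which failed with error 12, ENOMEM on Linux) is by itself as large as a 512 MB limit. The sketch below only redoes that arithmetic with the values from the trace. The function name firstOverLimit is made up for illustration, and the calculation ignores every other mapping in the process (binary, stacks), so it understates real usage.

```go
package main

import "fmt"

// Reservation sizes in bytes, copied from the debug output in the report.
var requests = []uint64{131072, 1048576, 8388608, 67108864, 536870912}

// limit is the 512 MB address space limit mentioned in the report.
const limit = 536870912

// firstOverLimit returns the index of the first request that pushes the
// running total above limit, or -1 if every request fits.
// (Hypothetical helper, not part of the Go runtime.)
func firstOverLimit(reqs []uint64, limit uint64) int {
	var total uint64
	for i, n := range reqs {
		total += n
		if total > limit {
			return i
		}
	}
	return -1
}

func main() {
	var total uint64
	for i, n := range requests {
		total += n
		fmt.Printf("request %d: %d bytes (%.3f MiB), running total %d bytes\n",
			i+1, n, float64(n)/(1<<20), total)
	}
	fmt.Println("index of first request over the limit:", firstOverLimit(requests, limit))
}
```

The first four requests add up to about 73 MiB, so only the final 512 MiB reservation cannot fit.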

Member

@andybons andybons commented Mar 23, 2020

@andybons andybons changed the title High startup memory allocation on Linux AMD64 runtime: high startup memory allocation on Linux AMD64 Mar 23, 2020
@andybons andybons added this to the Unplanned milestone Mar 23, 2020

@ivzhh ivzhh commented Mar 28, 2020

@pkramme Hi, would you mind posting your Apache setup for this too? I could not reproduce this on a fresh Apache install. Maybe it is due to my configuration.


@alexzorin alexzorin commented Mar 29, 2020

@pkramme is this shared hosting environment cPanel by any chance?

We also started getting reports of this same panic with our Go application, which exposes itself as a .live.cgi FastCGI net/http/cgi server integrating with cPanel's LiveAPI, as soon as we upgraded to 1.14.

Going to downgrade to 1.13.9 for now.

Author

@pkramme pkramme commented Apr 2, 2020

@aleksator No, it is not, it is a custom build setup. @ivzhh I'm not able to share the config, as it is proprietary.

Theoretically, if we execute any code with 512MB memory limitation, the problem should become visible. I will try to produce something not based on fcgi as a reproducer, so that no apache2 setup is necessary.
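One possible shape for an Apache-free reproducer is a tiny launcher that lowers RLIMIT_AS and then execs the binary under test. This is a minimal sketch, not something confirmed in this thread: the program name limitexec and the helper asLimit are invented for illustration, it uses only syscall.Setrlimit and syscall.Exec from the standard library, and whether it actually triggers the crash may depend on the environment.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// limitBytes mirrors the 512 MB limit from the report.
const limitBytes = 512 << 20

// asLimit builds an rlimit with identical soft and hard values.
// (Hypothetical helper for this sketch.)
func asLimit(n uint64) syscall.Rlimit {
	return syscall.Rlimit{Cur: n, Max: n}
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: limitexec /path/to/binary [args...]")
		return
	}
	// Gather everything before lowering the limit so that this launcher
	// needs as little new memory as possible afterwards.
	target, args, env := os.Args[1], os.Args[1:], os.Environ()

	lim := asLimit(limitBytes)
	if err := syscall.Setrlimit(syscall.RLIMIT_AS, &lim); err != nil {
		fmt.Fprintln(os.Stderr, "setrlimit:", err)
		os.Exit(1)
	}
	// Exec replaces this process image; the limit is inherited by the target.
	if err := syscall.Exec(target, args, env); err != nil {
		fmt.Fprintln(os.Stderr, "exec:", err)
		os.Exit(1)
	}
}
```

Running it against a Go 1.14 binary and a Go 1.13 binary should show whether the limit alone is enough to provoke the failure on a given system.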

Contributor

@aleksator aleksator commented Apr 2, 2020

Tagging a proper person here: @alexzorin


@alexzorin alexzorin commented Apr 4, 2020

I think what @pkramme suggested about the 512MB memory limit is correct - specifically RLIMIT_AS.

"Back in the day" (EL5-ish era), shared web hosting admins did not have access to the RSS cgroups controller (because of EL5's ancient kernel), and so controlling VSZ limits was the only choice available to them. In the long term, this has resulted in a lot of misguided admins keeping these VSZ limits around for no good reason.

Anyway, the Apache-based reproducer is straightforward. (For some reason, a simple Go hello world wrapped in a bash ulimit -v didn't repro for me, not sure why).

  1. Compile a very simple net/http/cgi binary using Go 1.14.1 and stick it in Apache httpd 2.4's cgi-bin/:
package main

import (
	"net/http/cgi"
)

func main() {
	if err := cgi.Serve(nil); err != nil {
		panic(err)
	}
}
go build -o /var/www/html/cgi-bin/reproducer.cgi reproducer.go
  2. Configure Apache with a 512MB RLimitMEM and restart Apache (note: don't try this in Docker or LXC-like environments; setrlimit will just fail and the repro won't work):
RLimitMEM 536870912
apachectl -k restart
  3. Access http://localhost/cgi-bin/reproducer.cgi. It will produce an HTTP 500, and in Apache's error_log, you will see the panic stack from the original report.

I would prefer not to ask our customers to remove the rlimit (or else we'll be stuck shipping with Go 1.13 for all eternity).

Would it be practical for the Go runtime to try to work within whatever it sees via getrlimit?


@mashedkeyboard mashedkeyboard commented May 24, 2020

Adding that I'm also seeing this issue in a different memory-limited environment with a 512 MB limit (a Grid Engine setup). Raising the memory limit to 950 MB fixes the issue, but it's unclear to me why it should ever be an issue anyway: the program does not use that much memory while running.

Member

@sbinet sbinet commented Jun 2, 2020

Apologies for the "me too" post, but this has also prevented us from migrating a small Go-based "script" belonging to one of my colleagues at CERN from Go 1.13.x to the latest Go 1.14.x.

Author

@pkramme pkramme commented Jun 11, 2020

Well, after reinvestigating this issue I stumbled over the proposal for the new page allocator which was introduced in go1.14: https://github.com/golang/proposal/blob/master/design/35112-scaling-the-page-allocator.md

There are only two known adverse effects of this large mapping on Linux:

  1. ulimit -v, which restricts even PROT_NONE mappings.
  2. Programs like top, when they report virtual memory footprint, include PROT_NONE mappings.

In the grand scheme of things, these are relatively minor consequences. The former is not used often, and in cases where it is, it's used as an inaccurate proxy for limiting a process's physical memory use. The latter is mostly cosmetic, though perhaps some monitoring system uses it as a proxy for memory use, and will likely result in some harmless questions.

So, this explains it. @aclements Is there a workaround for cases like this?
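To illustrate the first point in the quoted passage, here is a minimal sketch of a PROT_NONE reservation made from Go with syscall.Mmap. It is not the runtime's own code: the helper names reserve and release are invented, and the 512 MiB size is chosen only to match the failing request in the trace. Without an address space limit the call normally succeeds without committing physical memory; under a 512 MB RLIMIT_AS on Linux it would be expected to fail with ENOMEM, in the same way as the runtime's reservation.

```go
package main

import (
	"fmt"
	"syscall"
)

// reserve maps n bytes of anonymous memory with PROT_NONE. The range
// occupies address space but must not be read or written.
// (Hypothetical helper for this sketch.)
func reserve(n int) ([]byte, error) {
	return syscall.Mmap(-1, 0, n, syscall.PROT_NONE, syscall.MAP_ANON|syscall.MAP_PRIVATE)
}

// release unmaps a range returned by reserve.
func release(b []byte) error {
	return syscall.Munmap(b)
}

func main() {
	const size = 512 << 20 // same size as the failing request in the trace
	b, err := reserve(size)
	if err != nil {
		fmt.Println("reserve failed:", err)
		return
	}
	defer release(b)
	fmt.Printf("reserved %d MiB of address space without touching it\n", len(b)>>20)
}
```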


@networkimprov networkimprov commented Jun 11, 2020

Contributor

@mknyszek mknyszek commented Jun 11, 2020

As @pkramme points out, we were aware of this issue when the changes to the page allocator were proposed. As @alexzorin points out, ulimit -v is an outdated mechanism for limiting memory use.

I would prefer not to ask our customers to remove the rlimit (or else we'll be stuck shipping with Go 1.13 for all eternity).

Would it be practical for the Go runtime to try to work within whatever it sees via getrlimit?

The short answer is no. The virtual memory mappings made to support structures in the page allocator significantly simplified the improvements made in the 1.14 release. Earlier in the release cycle, the amount of memory mapped was much larger, which caused problems on certain platforms where the default ulimit -v value for default users was fairly low, so out-of-the-box Go programs would not work on an out-of-the-box system without additional privileges (see #35568). This is generally not true on Linux, where ulimit -v is unlimited by default (at least for the versions I'm aware of). We took steps to reduce the size of these mappings at the cost of additional complexity and a small performance regression. We experimented a little with additional mitigations but concluded they weren't practical.

@sbinet @mashedkeyboard @alexzorin @pkramme:

In order to understand your situations better, could you elaborate on the reasons why you and/or your customers cannot set RLIMIT_AS/ulimit -v to unlimited, or an otherwise sufficiently high number for your Go programs?

As a side note (and to be totally clear, I'm not recommending this as an official workaround), compiling your code with GOARCH=386 should allow your code to run on amd64 platforms with a low ulimit -v, since the memory mapping we make is proportional to the size of the address space and the address space is much smaller on 386. I recognize that this has its issues, and is not generally a feasible alternative. The most notable issues that come to mind are that your code might run slower (due to 32-bit registers and a lack of certain intrinsics) or some libraries your code depends on might not support 32-bit platforms (I'm not sure how common it is for libraries to support amd64 but not 386, but it is possible).
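When trying the GOARCH=386 route, it can help to confirm which target a deployed binary was really built for. The following is a small illustrative check, not part of the suggestion above: wordBits is an invented helper, and the program only reports runtime.GOARCH and the native word size from math/bits.

```go
package main

import (
	"fmt"
	"math/bits"
	"runtime"
)

// wordBits reports the size of a uint in bits for the compiled target:
// 32 for GOARCH=386, 64 for GOARCH=amd64.
// (Hypothetical helper for this sketch.)
func wordBits() int {
	return bits.UintSize
}

func main() {
	fmt.Printf("GOARCH=%s, word size=%d bits\n", runtime.GOARCH, wordBits())
}
```

Building the same file once with the default settings and once with GOARCH=386 set in the environment should print different values on an amd64 host.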


@alexzorin alexzorin commented Jun 11, 2020

could you elaborate on the reasons why you and/or your customers cannot set RLIMIT_AS/ulimit -v to unlimited, or an otherwise sufficiently high number for your Go programs?

This is our plan. It's going to be a challenge for XX,000 hosts between X,000 customers, so we are first planning to add telemetry to our 1.13 builds to see how many systems run the CGI under restricted virtual memory.
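A telemetry probe of this kind could be as small as one getrlimit call. The sketch below is an assumption about how such a check might look, not the code actually shipped: describeLimit is an invented helper, and treating the maximum uint64 value as "unlimited" matches RLIM_INFINITY on Linux but may not hold on other systems.

```go
package main

import (
	"fmt"
	"math"
	"syscall"
)

// describeLimit renders an RLIMIT_AS value for logging. Treating the
// maximum uint64 as "unlimited" is a Linux assumption.
// (Hypothetical helper for this sketch.)
func describeLimit(cur uint64) string {
	if cur == math.MaxUint64 {
		return "unlimited"
	}
	return fmt.Sprintf("%d MiB", cur>>20)
}

func main() {
	var lim syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_AS, &lim); err != nil {
		fmt.Println("getrlimit failed:", err)
		return
	}
	fmt.Println("RLIMIT_AS soft limit:", describeLimit(lim.Cur))
}
```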

Author

@pkramme pkramme commented Jun 11, 2020

Member

@sbinet sbinet commented Jun 12, 2020

We don't have much leverage over how the CGI environment is configured, and CERN-IT is a bit conservative w/ changing the configuration of services they provide for their physicists (who are sometimes a bit "cavalier" with how they set up their things.)

nonetheless, I've sent a ticket on raising the RLIMIT_AS.
I've also passed on to my colleague the 32b workaround.

we'll see.

(anyways, it's not a high profile CGI service, we won't miss supersymmetry nor mini-blackholes, or lose the beam if we're stuck w/ Go-1.13.x b/c of that. :P)

Author

@pkramme pkramme commented Sep 6, 2020

@mknyszek Is there any progress on this on your end?

@prattmic prattmic changed the title runtime: high startup memory allocation on Linux AMD64 runtime: high startup address space usage (RLIMIT_AS) on Linux AMD64 Sep 10, 2020

@alexzorin alexzorin commented Sep 14, 2020

we are first planning to add telemetry to our 1.13 builds to see how many systems run the CGI under restricted virtual memory.

To put a conclusion on this from my end, we gathered some RLIMIT_AS stats and the number of affected users is around 0.5%. The majority have a limit of 4096MB set on Apache, which is the vendor default on this platform.

As long as Go continues to work within that limit, we're happy to live with it and will ask those other users to adapt. Thanks.
