
runtime: excessive memory use between 1.21.0 -> 1.21.1 due to hugepages and the linux/amd64 max_ptes_none default of 512 #64332

@LeGEC

Description

What version of Go are you using (go version)?

$ go version
go version go1.21.4 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=''
GOARCH='amd64'
GOBIN=''
GOCACHE='/home/builder/.cache/go-build'
GOENV='/home/builder/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/home/builder/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/builder/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/usr/local/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/usr/local/go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.21.4'
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='0'
GOMOD='/dev/null'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build3335037910=/tmp/go-build -gno-record-gcc-switches'

What did you do?

Our production service was recently shut down by the OOM killer, which led us to inspect its memory usage in detail.

We discovered that, while running, our process had a memory consumption (RSS) that grew over time and never shrank.

A sample runtime.MemStats report:

	"Alloc": 830731560,
	"TotalAlloc": 341870177656,
	"Sys": 8023999048,
	"Lookups": 0,
	"Mallocs": 3129044622,
	"Frees": 3124956536,
	"HeapAlloc": 830731560,
	"HeapSys": 7836532736,
	"HeapIdle": 6916292608,
	"HeapInuse": 920240128,
	"HeapReleased": 6703923200,
	"HeapObjects": 4088086,
	"StackInuse": 15204352,
	"StackSys": 15204352,
	"MSpanInuse": 8563968,
	"MSpanSys": 17338944,
	"MCacheInuse": 4800,
	"MCacheSys": 15600,
	"BuckHashSys": 5794138,
	"GCSys": 146092920,
	"OtherSys": 3020358,
	"NextGC": 1046754240,
	"LastGC": 1700579048506142728,
	"PauseTotalNs": 108783964,

At that time, the reported RSS for our process was 3.19 GB.

We looked at our history in more detail and observed a big change in production behavior when we upgraded our Go version from 1.19.5 to 1.20.0. We unfortunately didn't notice the issue at the time, because we upgrade (and restart) our service on a regular basis.

To confirm this theory, we downgraded our Go version back to 1.19.13, and our memory consumption is now small and stable again.

Here is a graph of the RSS of our service over the last 48 hours; the drop corresponds to our new deployment with Go 1.19.13:


It should be noted that our production kernel is a hardened kernel based on grsecurity 5.15.28, which may be related to this issue (randomized heap addresses?).

What did you expect to see?

A constant and stable memory usage.

What did you see instead?

The Go runtime does not seem to release memory back to the system.


Unfortunately, we have only been able to observe this issue on our production system, under production conditions.

We have not yet been able to reproduce it on other systems, or by running isolated features in test programs deployed on our production infrastructure.

Metadata

Labels: WaitingForInfo (issue is not actionable because of missing required information, which needs to be provided), compiler/runtime (issues related to the Go compiler and/or runtime)