Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: scavenger freezes up in Go 1.14 in Windows due to coarse time granularity #38617

Closed
kingreatwill opened this issue Apr 23, 2020 · 9 comments
Assignees
Milestone

Comments

@kingreatwill
Copy link

@kingreatwill kingreatwill commented Apr 23, 2020

What version of Go are you using (go version)?

$ go version
go version go1.14.1 windows/amd64

Does this issue reproduce with the latest release?

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
set GO111MODULE=
set GOARCH=amd64
set GOBIN=
set GOCACHE=C:\Users\35084\AppData\Local\go-build
set GOENV=C:\Users\35084\AppData\Roaming\go\env
set GOEXE=.exe
set GOFLAGS=
set GOHOSTARCH=amd64
set GOHOSTOS=windows
set GOINSECURE=
set GONOPROXY=*.lingcb.net
set GONOSUMDB=*.lingcb.net
set GOOS=windows
set GOPATH=D:\go
set GOPRIVATE=*.lingcb.net
set GOPROXY=https://goproxy.cn,direct
set GOROOT=c:\go
set GOSUMDB=sum.golang.org
set GOTMPDIR=
set GOTOOLDIR=c:\go\pkg\tool\windows_amd64
set GCCGO=gccgo
set AR=ar
set CC=gcc
set CXX=g++
set CGO_ENABLED=1
set GOMOD=
set CGO_CFLAGS=-g -O2
set CGO_CPPFLAGS=
set CGO_CXXFLAGS=-g -O2
set CGO_FFLAGS=-g -O2
set CGO_LDFLAGS=-g -O2
set PKG_CONFIG=pkg-config
set GOGCCFLAGS=-m64 -mthreads -fmessage-length=0 -fdebug-prefix-map=C:\Users\35084\AppData\Local\Temp\go-build075262326=/tmp/go-build -gno-record-gcc-switches
GOROOT/bin/go version: go version go1.14.1 windows/amd64
GOROOT/bin/go tool compile -V: compile version go1.14.1
gdb --version: GNU gdb (GDB) 8.2.1

What did you do?

Used memory cannot be freed.

image

package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
	"time"
)

func main() {

	var a []int
	for i := 0; i < 2000; i++ {
		a = make([]int, i*100000)
	}
	a = nil
	fmt.Printf("%v\n", a)

	go func() {
		for {
			m := runtime.MemStats{}
			d := debug.GCStats{}
			runtime.ReadMemStats(&m)
			debug.ReadGCStats(&d)
			fmt.Printf("%v\t%v\t%v\n", d.NumGC, m.HeapIdle/1024/1024, m.HeapInuse/1024/1024)
			time.Sleep(time.Second)
		}
	}()
	for {
		runtime.GC()
		time.Sleep(time.Second)
	}
}
@andybons
Copy link
Member

@andybons andybons commented Apr 23, 2020

Thanks for the issue.

This is expected on linux and RSS will drop if the system is under memory pressure. But this is Windows. Is the same behavior expected @aclements @mknyszek ?

@andybons andybons changed the title Why does not return memory to the system? runtime: why does not return memory to the system? Apr 23, 2020
@andybons andybons changed the title runtime: why does not return memory to the system? runtime: memory not being returned to the system in Windows Apr 23, 2020
@andybons andybons added this to the Unplanned milestone Apr 23, 2020
@mknyszek
Copy link
Contributor

@mknyszek mknyszek commented Apr 23, 2020

On Windows it should be eager.

Firstly I'll note that what the reproducer is printing doesn't subtract out HeapReleased from HeapIdle, so it's not going to give an accurate picture of what the runtime thinks it returned to the OS.

The fact that Windows' Task Manager is reporting that high of an RSS is concerning, considering that Windows reports physical memory used. I ran this on a Windows VM and HeapReleased didn't increase on Windows like it does on Linux.

Puzzled, I started printing some diagnostics, and noticed that scavenge pacing math was producing a bunch of NaN values... the problem is that the pacing is based on a measurement of how long it takes the scavenge, and on Windows the time granularity is really coarse (like 15ms or something). So the time measured for the scavenge pacing was exactly 0, because it happened within that time granule. The scavenge pacing sees some NaN values, and basically sleeps forever.

So, this is a real bug. I'm not sure what the fix is, and it may be worth back-porting to 1.14? There is a workaround, which is to call debug.FreeOSMemory occasionally.

@mknyszek mknyszek self-assigned this Apr 23, 2020
@mknyszek mknyszek modified the milestones: Unplanned, Go1.15 Apr 23, 2020
@mknyszek mknyszek changed the title runtime: memory not being returned to the system in Windows runtime: scavenger freezes up in Go 1.14 in Windows due to coarse time granularity Apr 23, 2020
@mknyszek
Copy link
Contributor

@mknyszek mknyszek commented Apr 23, 2020

As @aclements informed me it looks like the theoretical granularity should be 100ns, which would be enough, but the time is only updated on every OS tick.

Experimentally, I found in my Windows VM that the best granularity I could get (spin until nanotime()-start != 0 in the scavenger) was around 0.95ms, which is definitely not good enough.

I think we need a fallback here to the Go 1.13 pacing which didn't rely on measuring time.

CC @aclements

@mknyszek
Copy link
Contributor

@mknyszek mknyszek commented Apr 23, 2020

@aclements also just informed me that Go sets Windows' interrupt frequency to 1ms, so there we go.

@gopherbot
Copy link

@gopherbot gopherbot commented Apr 24, 2020

Change https://golang.org/cl/229997 mentions this issue: runtime: make the scavenger's pacing logic more defensive

@gopherbot gopherbot closed this in c791537 Apr 30, 2020
@mknyszek
Copy link
Contributor

@mknyszek mknyszek commented Apr 30, 2020

I've confirmed that the change which closed this issue fixed the scavenger not kicking in.

However, because of a lack of feedback, the Go runtime has to be conservative, which means about a 3-4 MiB/s release rate back to the OS. Perhaps we should think about being more aggressive about scavenging when the application goes idle, though it's hard because we don't know when the application will wake back up again to do work.

@networkimprov
Copy link

@networkimprov networkimprov commented May 4, 2020

@gopherbot please backport to 1.14; this is a regression.

@gopherbot
Copy link

@gopherbot gopherbot commented May 4, 2020

Backport issue(s) opened: #38856 (for 1.14).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases.

@gopherbot
Copy link

@gopherbot gopherbot commented May 7, 2020

Change https://golang.org/cl/232743 mentions this issue: [release-branch.go1.14] runtime: make the scavenger's pacing logic more defensive

gopherbot pushed a commit that referenced this issue May 7, 2020
…re defensive

This change adds two bits of logic to the scavenger's pacing. Firstly,
it checks to make sure we scavenged at least one physical page, if we
released a non-zero amount of memory. If we try to release less than one
physical page, most systems will release the whole page, which could
lead to memory corruption down the road, and this is a signal we're in
this situation.

Secondly, the scavenger's pacing logic now checks to see if the time a
scavenging operation takes is measured to be exactly zero or negative.
The exact zero case can happen if time update granularity is too large
to effectively capture the time the scavenging operation took, like on
Windows where the OS timer frequency is generally 1ms. The negative case
should not happen, but we're being defensive (against kernel bugs, bugs
in the runtime, etc.). If either of these cases happen, we fall back to
Go 1.13 behavior: assume the scavenge operation took around 10µs per
physical page. We ignore huge pages in this case because we're in
unknown territory, so we choose to be conservative about pacing (huge
pages could only increase the rate of scavenging).

Currently, the scavenger is broken on Windows because the granularity of
time measurement is around 1 ms, which is too coarse to measure how fast
we're scavenging, so we often end up with a scavenging time of zero,
followed by NaNs and garbage values in the pacing logic, which usually
leads to the scavenger sleeping forever.

For #38617.
Fixes #38856.

Change-Id: Iaaa2a4cbb21338e1258d010f7362ed58b7db1af7
Reviewed-on: https://go-review.googlesource.com/c/go/+/229997
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Austin Clements <austin@google.com>
(cherry picked from commit c791537)
Reviewed-on: https://go-review.googlesource.com/c/go/+/232743
xujianhai666 added a commit to xujianhai666/go-1 that referenced this issue May 21, 2020
This change adds two bits of logic to the scavenger's pacing. Firstly,
it checks to make sure we scavenged at least one physical page, if we
released a non-zero amount of memory. If we try to release less than one
physical page, most systems will release the whole page, which could
lead to memory corruption down the road, and this is a signal we're in
this situation.

Secondly, the scavenger's pacing logic now checks to see if the time a
scavenging operation takes is measured to be exactly zero or negative.
The exact zero case can happen if time update granularity is too large
to effectively capture the time the scavenging operation took, like on
Windows where the OS timer frequency is generally 1ms. The negative case
should not happen, but we're being defensive (against kernel bugs, bugs
in the runtime, etc.). If either of these cases happen, we fall back to
Go 1.13 behavior: assume the scavenge operation took around 10µs per
physical page. We ignore huge pages in this case because we're in
unknown territory, so we choose to be conservative about pacing (huge
pages could only increase the rate of scavenging).

Currently, the scavenger is broken on Windows because the granularity of
time measurement is around 1 ms, which is too coarse to measure how fast
we're scavenging, so we often end up with a scavenging time of zero,
followed by NaNs and garbage values in the pacing logic, which usually
leads to the scavenger sleeping forever.

Fixes golang#38617.

Change-Id: Iaaa2a4cbb21338e1258d010f7362ed58b7db1af7
Reviewed-on: https://go-review.googlesource.com/c/go/+/229997
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Austin Clements <austin@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
5 participants
You can’t perform that action at this time.