Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: deleted timer was not cleaned up in time, causing the context chain saved in timerCtx to not be GCed in time #60144

Closed
578559967 opened this issue May 12, 2023 · 8 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. FrozenDueToAge

Comments

@578559967
Copy link

What version of Go are you using (go version)?

$ go version
go version go1.20.4 darwin/arm64

Does this issue reproduce with the latest release?

Yes.
It can also reproduce in 1.19.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="arm64"
GOBIN=""
GOCACHE="/Users/r/Library/Caches/go-build"
GOENV="/Users/r/Library/Application Support/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="arm64"
GOHOSTOS="darwin"
GOMODCACHE="/Users/r/go/pkg/mod"
GOOS="darwin"
GOPATH="/Users/r/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/Users/robert/sdk/go1.20.4"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/Users/r/sdk/go1.20.4/pkg/tool/darwin_arm64"
GOVCS=""
GOVERSION="go1.20.4"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/dev/null"
GOWORK=""
CGO_CFLAGS="-O2 -g"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-O2 -g"
CGO_FFLAGS="-O2 -g"
CGO_LDFLAGS="-O2 -g"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/j_/rq9ph2cd3h50w468sgv34lwh0000gn/T/go-build3730981437=/tmp/go-build -gno-record-gcc-switches -fno-common"
GOROOT/bin/go version: go version go1.20.4 darwin/arm64
GOROOT/bin/go tool compile -V: compile version go1.20.4
uname -v: Darwin Kernel Version 22.3.0: Mon Jan 30 20:38:37 PST 2023; root:xnu-8792.81.3~2/RELEASE_ARM64_T6000
ProductName:		macOS
ProductVersion:		13.2.1
BuildVersion:		22D68
lldb --version: lldb-1403.0.17.64
Apple Swift version 5.8 (swiftlang-5.8.0.124.2 clang-1403.0.22.11.100)

What did you do?

test4.go:

package main

import (
	"context"
	"flag"
	"fmt"
	"runtime"
	"time"
)

type A [100]int

func run(i int) func() {
	c := context.TODO()
	a := &A{}
	runtime.SetFinalizer(a, func(a *A) {
		fmt.Println("Free ", i)
	})
	c = context.WithValue(c, "a", a)
	c, cancel := context.WithTimeout(c, time.Second*time.Duration(i))
	return cancel
}

func main() {
	count := flag.Int("c", 1, "Count")
	flag.Parse()

	fmt.Println("Count: ", *count)

	for i := 0; i < *count; i++ {
		run(6)
	}
	run(3)()  // Cancel the context
	begin := time.Now()
	for i := 0; i < 10; i++ {
		runtime.GC()
		time.Sleep(time.Second)
		fmt.Printf("After %.2f seconds\n", time.Since(begin).Seconds())
	}
}

go1.20.4 run test4.go -c 2
go1.20.4 run test4.go -c 4

What did you expect to see?

$ go1.20.4 run test4.go -c 2
Count:  2
Free  3
After 1.00 seconds
After 2.00 seconds
After 3.01 seconds
After 4.01 seconds
After 5.01 seconds
After 6.01 seconds
Free  6
Free  6
After 7.01 seconds
After 8.01 seconds
After 9.01 seconds
After 10.01 seconds

$ go1.20.4 run test4.go -c 4
Count:  4
Free  3
After 1.00 seconds
After 2.00 seconds
After 3.00 seconds
After 4.01 seconds
After 5.01 seconds
After 6.01 seconds
Free  6
Free  6
Free  6
Free  6
After 7.01 seconds
After 8.01 seconds
After 9.01 seconds
After 10.01 seconds

What did you see instead?

$ go1.20.4 run test4.go -c 2
Count:  2
Free  3
After 1.00 seconds
After 2.00 seconds
After 3.01 seconds
After 4.01 seconds
After 5.01 seconds
After 6.01 seconds
Free  6
Free  6
After 7.01 seconds
After 8.01 seconds
After 9.01 seconds
After 10.01 seconds

$ go1.20.4 run test4.go -c 4
Count:  4
After 1.00 seconds
After 2.00 seconds
After 3.00 seconds
Free  3
After 4.01 seconds
After 5.01 seconds
After 6.01 seconds
Free  6
Free  6
Free  6
Free  6
After 7.01 seconds
After 8.01 seconds
After 9.01 seconds
After 10.01 seconds

Expect to see Free 3 in the first round of GC. But when -c 4, it appeared after 3 seconds.

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label May 12, 2023
@578559967
Copy link
Author

time.AfterFunc also has the same problem.

package main

import (
	"context"
	"flag"
	"fmt"
	"runtime"
	"time"
)

type A [100]int

func run(i int) func() {
	c := context.TODO()
	a := &A{}
	runtime.SetFinalizer(a, func(a *A) {
		fmt.Println("Free ", i)
	})
	c = context.WithValue(c, "a", a)
	c, cancel := context.WithTimeout(c, time.Second*time.Duration(i))
	return cancel
}

func run2(i int) *time.Timer {
	a := &A{}
	runtime.SetFinalizer(a, func(a *A) {
		fmt.Println("Free ", i)
	})
	return time.AfterFunc(time.Duration(i)*time.Second, func() {
		fmt.Println(i, a[3])
	})
}

func main() {
	count := flag.Int("c", 1, "Count")
	flag.Parse()

	fmt.Println("Count: ", *count)

	for i := 0; i < *count; i++ {
		// run(6)
		run2(6)
	}
	// run(3)()
	run2(3).Stop()
	begin := time.Now()
	for i := 0; i < 10; i++ {
		runtime.GC()
		time.Sleep(time.Second)
		fmt.Printf("After %.2f seconds\n", time.Since(begin).Seconds())
	}
}
$ go1.20.4 run test4.go -c 1
Count:  1
Free  3
After 1.00 seconds
After 2.00 seconds
After 3.00 seconds
After 4.01 seconds
After 5.01 seconds
6 0
After 6.01 seconds
Free  6
After 7.01 seconds
After 8.01 seconds
After 9.01 seconds
After 10.01 seconds

$ go1.20.4 run test4.go -c 5
Count:  5
After 1.00 seconds
After 2.00 seconds
After 3.00 seconds
Free  3
After 4.01 seconds
After 5.01 seconds
6 0
6 0
6 0
6 0
6 0
After 6.01 seconds
Free  6
Free  6
Free  6
Free  6
Free  6
After 7.01 seconds
After 8.01 seconds
After 9.01 seconds
After 10.01 seconds

@cherrymui cherrymui changed the title runtime: deleted timer was not cleaned up in time, causing the context chain saved in timerCtx to not be GCed in time. runtime: deleted timer was not cleaned up in time, causing the context chain saved in timerCtx to not be GCed in time May 12, 2023
@ianlancetaylor
Copy link
Contributor

There are absolutely no guarantees as to when finalizers are run. They may never be run. Quoting https://pkg.go.dev/runtime#SetFinalizer

The finalizer is scheduled to run at some arbitrary time after the program can no longer reach the object to which obj points. There is no guarantee that finalizers will run before a program exits, so typically they are useful only for releasing non-memory resources associated with an object during a long-running program.

@cherrymui
Copy link
Member

As @ianlancetaylor noted, the finalizer doesn't require to run at a specific time. If this causes issues for real production programs, please add more detail. Closing for now. Thanks.

@cherrymui cherrymui closed this as not planned Won't fix, can't repro, duplicate, stale May 12, 2023
@578559967
Copy link
Author

There are absolutely no guarantees as to when finalizers are run. They may never be run. Quoting https://pkg.go.dev/runtime#SetFinalizer

The finalizer is scheduled to run at some arbitrary time after the program can no longer reach the object to which obj points. There is no guarantee that finalizers will run before a program exits, so typically they are useful only for releasing non-memory resources associated with an object during a long-running program.

The focus of this issue is not on finalizers, they are only used to demonstrate when "a" should be garbage collected. By setting GODEBUG=allocfreetrace=1, we can also see that "a" is not being garbage collected in time.
The main point of this issue is that the timer cleanup is too slow, causing the Context to not be garbage collected in time. This problem has indeed caused OOM in our product.
Our product is an online service, and for each request to the service, there is a context chain that contains some contexts that store a large amount of data, including a sub-context created by context.WithTimeout. Usually, this context is canceled quickly and does not wait for a timeout. The entire request will also return quickly. However, we found that the memory was not being garbage collected in a timely manner, even if we actively called GC. Eventually, we identified that the timer was not being cleaned up in a timely manner, causing timer.arg to store context.WithDeadline.func2, which contains timerCtx, and then all the way to the entire context chain, making it impossible for the kv stored in the context to be recycled. This is easily causing OOM under high QPS.

@ianlancetaylor
Copy link
Contributor

It's kind of the same problem, though. Garbage collection is not guaranteed to happen on any particular time scale.

I'm not sure we can fix this. The timer code is tied into the scheduler. One aspect of this is that timers are not released from the queue when they are stopped. Instead they are simply marked as deleted, and a later scheduling pass will remove them. So the timer will remain pointing to the function in the queue. We can't simply clear the function in the Stop method, because code is permitted to call Stop and then Reset, and have the function be called at that time.

Actually, though, I think I do see a way, though I'm not sure whether it is a good idea. The context code knows that it won't call the Reset method, so it can clear the timer.

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/495080 mentions this issue: context: clear timer after stopping it

@golang golang deleted a comment from jiangbajin May 16, 2023
@ianlancetaylor
Copy link
Contributor

OK, that change to the context package doesn't work. We could perhaps add a Timer.Clear method. That would require a proposal. See https://go.dev/s/proposal.

@578559967
Copy link
Author

OK, that change to the context package doesn't work. We could perhaps add a Timer.Clear method. That would require a proposal. See https://go.dev/s/proposal.

Thanks @ianlancetaylor . I think Timer.Clear should just clear Timer.r.f and Timer.r.arg. I have tested it and it works fine.

@golang golang locked and limited conversation to collaborators May 15, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. FrozenDueToAge
Projects
None yet
Development

No branches or pull requests

4 participants