runtime: crashes with "runtime: traceback stuck" #62086
Labels
compiler/runtime: Issues related to the Go compiler and/or runtime.
NeedsInvestigation: Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
WaitingForInfo: Issue is not actionable because of missing required information, which needs to be provided.
What version of Go are you using (go version)?
Does this issue reproduce with the latest release?
We can't use 1.20.6 because of #61431
What operating system and processor architecture are you using (go env)?
What did you do?
Our production environment runs tens of thousands of containers written in Go, most of them running for only a few minutes or hours. About once a day, one of them crashes with "runtime: traceback stuck". It is not always the same service, and this has been happening for months and across multiple Go versions, going back to at least Go 1.18. We are not sure exactly when it started.
We did see a common pattern: the stack trace always belongs to the goroutine running our internal MemoryMonitor. It is a small library that runs in all our services, samples the cgroup memory parameters from procfs every second, and logs all running operations once we use 90% of the available memory. When we turn this functionality off, the problem disappears, so we know it is related. None of the containers that crashed reached this limit during their run, so only the sampling occurred.
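For anyone trying to reproduce this, the following is a rough sketch of what such a monitor might look like. It is not our actual MemoryMonitor code. The cgroup v2 file paths, the function names, and the three-sample loop in main are assumptions made for illustration only.

```go
package main

import (
	"log"
	"os"
	"strconv"
	"strings"
	"time"
)

// Assumed cgroup v2 locations; the real library may read different files.
const (
	usagePath = "/sys/fs/cgroup/memory.current"
	limitPath = "/sys/fs/cgroup/memory.max"
	threshold = 0.9 // log once usage reaches 90% of the limit
)

// parseCgroupValue parses the content of a cgroup memory file.
// The literal "max" means no limit is set, reported as limited == false.
func parseCgroupValue(s string) (value uint64, limited bool, err error) {
	s = strings.TrimSpace(s)
	if s == "max" {
		return 0, false, nil
	}
	v, err := strconv.ParseUint(s, 10, 64)
	if err != nil {
		return 0, false, err
	}
	return v, true, nil
}

// readCgroupValue reads and parses a single cgroup memory file.
func readCgroupValue(path string) (uint64, bool, error) {
	b, err := os.ReadFile(path)
	if err != nil {
		return 0, false, err
	}
	return parseCgroupValue(string(b))
}

// overThreshold reports whether usage is at or above fraction of limit.
func overThreshold(usage, limit uint64, fraction float64) bool {
	return limit > 0 && float64(usage) >= fraction*float64(limit)
}

// sampleOnce takes one sample and logs only if the threshold is reached.
func sampleOnce() {
	usage, _, err := readCgroupValue(usagePath)
	if err != nil {
		log.Printf("memory monitor: %v", err)
		return
	}
	limit, limited, err := readCgroupValue(limitPath)
	if err != nil {
		log.Printf("memory monitor: %v", err)
		return
	}
	if limited && overThreshold(usage, limit, threshold) {
		log.Printf("memory monitor: using %d of %d bytes", usage, limit)
	}
}

func main() {
	ticker := time.NewTicker(time.Second)
	defer ticker.Stop()
	// Sample a few times and exit; a real service would loop until shutdown.
	for i := 0; i < 3; i++ {
		<-ticker.C
		sampleOnce()
	}
}
```

In the crashing containers only the read-and-compare path would have run, since the 90% branch was never taken.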
Another thing we always see in the dump is active runtime profiling by the DataDog agent we integrate with. It runs every 30 seconds and takes a CPU and a memory profile using the standard pprof library. We are not sure whether this is related.
What did you expect to see?
No crashes.
What did you see instead?