Description
I have a process that experiences significant stop-the-world (STW) pauses during the mark termination phase of garbage collection. With around 100,000 goroutines and a live heap of around 500MB, its STW pauses take tens of milliseconds. The process runs on linux/amd64 with GOMAXPROCS=18, built with a recent version of Go tip (30b9663).
I profiled its CPU usage with perf_events, looking specifically at samples with runtime.gcMark on the stack. 7.5% of samples include runtime.gcMark -> runtime.freeStackSpans (non-parallelized work, discussed in #11465). 90% of samples include runtime.gcMark -> runtime.parfordo -> runtime.markroot, and 75% of samples include runtime.gcMark -> runtime.parfordo -> runtime.markroot -> runtime.shrinkstack.
The stack shrinking work is split 18 ways in my case, but at 75% of samples it is still by far the largest contributor to mark termination pauses. The pauses my program sees are well beyond the 10ms goal, even with only a modest heap size.
Does stack shrinking need to take place while the world is stopped?
$ go version
go version devel +30b9663 Tue Oct 13 21:06:58 2015 +0000 linux/amd64