runtime: high-percentile latency of memory allocations has regressed significantly #31678
Heavily-allocating and highly-parallel applications' memory allocation latencies regressed significantly in the Go 1.12 release (up to 3-4x in k8s' 99th percentile latency).
Much of the background and work done for this issue has been summarized in kubernetes/kubernetes#75833.
The offending commit with bisection was found to be 8e093e7, which introduced the notion of scavenging to make up for the RSS increase for scavenging from scavenged memory (a mouthful). This change landed to fix an issue for some users where, as a result of previous changes in Go 1.12, the allocator could outrun things like
However, these extra syscalls caused an unacceptable increase in the latency of memory allocations which took this slow path which had the heap lock acquired. In the case of an already heavily-allocating and highly-parallel application (like the k8s situation: hundreds of thousands of goroutines and 64 Ps), this can have serious consequences in terms latency because subsequent span allocations' would end up spending a lot of time trying to acquire the heap lock.
While the fundamental issue is more the fact that the span allocator is not very scalable (all span allocations are serialized via a single lock), this is much more difficult to fix. At the very least we should bring back the latency to Go 1.11 or better in these situations.
In investigating this further, we discovered that the fact that the allocator could outrun things like
The text was updated successfully, but these errors were encountered:
@gopherbot please backport open a backport to 1.12. I think it makes sense to fix this issue for Go 1.12 since it would at least let users safely move to Go 1.12 without having to skip a release due to performance issues.
We should do more in the future to ensure that performance regressions like this get discovered prior to release as opposed to after, perhaps through improved benchmarks.