runtime: improve heap hugepage utilization #59960
Comments
Landed in https://go.dev/cl/436395.
Change https://go.dev/cl/492018 mentions this issue:
For golang/go#59960.

Change-Id: I4c6ee87db54952ccacdfa6c66b419356e5842620
Reviewed-on: https://go-review.googlesource.com/c/proposal/+/492018
Reviewed-by: Michael Pratt <mpratt@google.com>
Change https://go.dev/cl/531816 mentions this issue:
Change https://go.dev/cl/532117 mentions this issue:
This has caused performance issues in production environments. Disable it until further notice.

Fixes #63334.
Related to #61718 and #59960.

Change-Id: If84c5a8685825d43c912a71418f2597e44e867e5
Reviewed-on: https://go-review.googlesource.com/c/go/+/531816
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
After the previous CL, this is now all dead code. This change is separated out to make the previous one easy to backport.

For #63334.
Related to #61718 and #59960.

Change-Id: I109673ed97c62c472bbe2717dfeeb5aa4fc883ea
Reviewed-on: https://go-review.googlesource.com/c/go/+/532117
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Change https://go.dev/cl/532255 mentions this issue:
This has caused performance issues in production environments. MADV_COLLAPSE can go into direct reclaim, but we call it with the heap lock held. This means that the process could end up stalled fairly quickly if just one allocating goroutine ends up in the madvise call, at least until the madvise(MADV_COLLAPSE) call returns. A similar issue occurred with madvise(MADV_HUGEPAGE), because that could go into direct reclaim on any page fault for MADV_HUGEPAGE-marked memory.

My understanding was that the calls to madvise(MADV_COLLAPSE) were fairly rare, and its "best-effort" nature prevented it from going into direct reclaim often, but this was wrong. It tends to be fairly heavyweight even when it doesn't end up in direct reclaim, and it's almost certainly not worth it. Disable it until further notice and let the kernel fully dictate hugepage policy. The updated scavenger policy is still more hugepage-friendly, since it delays scavenging until hugepages are no longer densely packed, so we don't lose all that much.

The Sweet benchmarks show a minimal difference. A couple of less realistic benchmarks seem to slow down a bit; they might just be getting unlucky with what the kernel decides to back with a huge page. Some benchmarks, on the other hand, improve. Overall, it's a wash.

name                  old time/op            new time/op            delta
BiogoIgor             13.1s ± 1%             13.2s ± 2%             ~       (p=0.182 n=9+10)
BiogoKrishna          12.0s ± 1%             12.1s ± 1%             +1.23%  (p=0.002 n=9+10)
BleveIndexBatch100    4.51s ± 4%             4.56s ± 3%             ~       (p=0.393 n=10+10)
EtcdPut               20.2ms ± 4%            19.8ms ± 2%            ~       (p=0.079 n=10+9)
EtcdSTM               109ms ± 3%             111ms ± 3%             +1.63%  (p=0.035 n=10+10)
GoBuildKubelet        31.2s ± 1%             31.3s ± 1%             ~       (p=0.780 n=9+10)
GoBuildKubeletLink    7.77s ± 0%             7.81s ± 2%             ~       (p=0.237 n=8+10)
GoBuildIstioctl       31.8s ± 1%             31.7s ± 0%             ~       (p=0.136 n=9+9)
GoBuildIstioctlLink   7.88s ± 1%             7.89s ± 1%             ~       (p=0.720 n=9+10)
GoBuildFrontend       11.7s ± 1%             11.8s ± 1%             ~       (p=0.278 n=10+9)
GoBuildFrontendLink   1.15s ± 4%             1.15s ± 5%             ~       (p=0.387 n=9+9)
GopherLuaKNucleotide  19.7s ± 1%             20.6s ± 0%             +4.48%  (p=0.000 n=10+10)
MarkdownRenderXHTML   194ms ± 3%             196ms ± 3%             ~       (p=0.356 n=9+10)
Tile38QueryLoad       633µs ± 2%             629µs ± 2%             ~       (p=0.075 n=10+10)

name                  old average-RSS-bytes  new average-RSS-bytes  delta
BiogoIgor             69.2MB ± 3%            68.4MB ± 1%            ~       (p=0.190 n=10+10)
BiogoKrishna          4.40GB ± 0%            4.40GB ± 0%            ~       (p=0.605 n=9+9)
BleveIndexBatch100    195MB ± 3%             195MB ± 2%             ~       (p=0.853 n=10+10)
EtcdPut               107MB ± 4%             108MB ± 3%             ~       (p=0.190 n=10+10)
EtcdSTM               91.6MB ± 5%            92.6MB ± 4%            ~       (p=0.481 n=10+10)
GoBuildKubelet        2.26GB ± 1%            2.28GB ± 1%            +1.22%  (p=0.000 n=10+10)
GoBuildIstioctl       1.53GB ± 0%            1.53GB ± 0%            +0.21%  (p=0.017 n=9+10)
GoBuildFrontend       556MB ± 1%             554MB ± 2%             ~       (p=0.497 n=9+10)
GopherLuaKNucleotide  39.0MB ± 3%            39.0MB ± 1%            ~       (p=1.000 n=10+8)
MarkdownRenderXHTML   21.2MB ± 2%            21.4MB ± 3%            ~       (p=0.190 n=10+10)
Tile38QueryLoad       5.99GB ± 2%            6.02GB ± 0%            ~       (p=0.243 n=10+9)

name                  old peak-RSS-bytes     new peak-RSS-bytes     delta
BiogoIgor             90.2MB ± 4%            89.2MB ± 2%            ~       (p=0.143 n=10+10)
BiogoKrishna          4.49GB ± 0%            4.49GB ± 0%            ~       (p=0.190 n=10+10)
BleveIndexBatch100    283MB ± 8%             274MB ± 6%             ~       (p=0.075 n=10+10)
EtcdPut               147MB ± 4%             149MB ± 2%             +1.55%  (p=0.034 n=10+8)
EtcdSTM               117MB ± 5%             117MB ± 4%             ~       (p=0.905 n=9+10)
GopherLuaKNucleotide  44.9MB ± 1%            44.6MB ± 1%            ~       (p=0.083 n=8+8)
MarkdownRenderXHTML   22.0MB ± 8%            22.1MB ± 9%            ~       (p=0.436 n=10+10)
Tile38QueryLoad       6.24GB ± 2%            6.29GB ± 2%            ~       (p=0.218 n=10+10)

name                  old peak-VM-bytes      new peak-VM-bytes      delta
BiogoIgor             1.33GB ± 0%            1.33GB ± 0%            ~       (p=0.504 n=10+9)
BiogoKrishna          5.77GB ± 0%            5.77GB ± 0%            ~       (p=1.000 n=10+9)
BleveIndexBatch100    3.53GB ± 0%            3.53GB ± 0%            ~       (p=0.642 n=10+10)
EtcdPut               12.1GB ± 0%            12.1GB ± 0%            ~       (p=0.564 n=10+10)
EtcdSTM               12.1GB ± 0%            12.1GB ± 0%            ~       (p=0.633 n=10+10)
GopherLuaKNucleotide  1.26GB ± 0%            1.26GB ± 0%            ~       (p=0.297 n=9+10)
MarkdownRenderXHTML   1.26GB ± 0%            1.26GB ± 0%            ~       (p=0.069 n=10+10)
Tile38QueryLoad       7.47GB ± 2%            7.53GB ± 2%            ~       (p=0.280 n=10+10)

name                  old p50-latency-ns     new p50-latency-ns     delta
EtcdPut               19.8M ± 5%             19.3M ± 3%             -2.74%  (p=0.043 n=10+9)
EtcdSTM               81.4M ± 4%             83.4M ± 4%             +2.46%  (p=0.029 n=10+10)
Tile38QueryLoad       241k ± 1%              240k ± 1%              ~       (p=0.393 n=10+10)

name                  old p90-latency-ns     new p90-latency-ns     delta
EtcdPut               30.4M ± 5%             30.6M ± 5%             ~       (p=0.971 n=10+10)
EtcdSTM               222M ± 3%              226M ± 4%              ~       (p=0.063 n=10+10)
Tile38QueryLoad       687k ± 2%              691k ± 1%              ~       (p=0.173 n=10+8)

name                  old p99-latency-ns     new p99-latency-ns     delta
EtcdPut               42.3M ±10%             41.4M ± 7%             ~       (p=0.353 n=10+10)
EtcdSTM               486M ± 7%              487M ± 4%              ~       (p=0.579 n=10+10)
Tile38QueryLoad       6.43M ± 2%             6.37M ± 3%             ~       (p=0.280 n=10+10)

name                  old ops/s              new ops/s              delta
EtcdPut               48.6k ± 3%             49.5k ± 2%             ~       (p=0.065 n=10+9)
EtcdSTM               9.09k ± 2%             8.95k ± 3%             -1.56%  (p=0.045 n=10+10)
Tile38QueryLoad       28.4k ± 1%             28.6k ± 1%             +0.87%  (p=0.016 n=9+10)

Fixes #63335.
For #63334.
Related to #61718 and #59960.

Change-Id: If84c5a8685825d43c912a71418f2597e44e867e5
Reviewed-on: https://go-review.googlesource.com/c/go/+/531816
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
(cherry picked from commit 595deec)
Reviewed-on: https://go-review.googlesource.com/c/go/+/532255
Auto-Submit: Dmitri Shuralyov <dmitshur@google.com>
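For context on why the disabled call is so costly: madvise(MADV_COLLAPSE) is a synchronous, best-effort request for the kernel to assemble huge pages for a range, and it can block for a long time if the kernel enters direct reclaim. Below is a minimal user-space sketch of such a call — not the runtime's actual code. MADV_COLLAPSE requires Linux 6.1+, and the constant is defined by hand here in case the x/sys/unix version in use predates it:

```go
package main

import (
	"fmt"

	"golang.org/x/sys/unix"
)

// MADV_COLLAPSE (Linux 6.1+) synchronously asks the kernel to back the
// given range with huge pages. Value taken from the Linux uapi headers.
const MADV_COLLAPSE = 25

func main() {
	const size = 2 << 20 // one 2 MiB huge page worth of anonymous memory
	mem, err := unix.Mmap(-1, 0, size,
		unix.PROT_READ|unix.PROT_WRITE,
		unix.MAP_PRIVATE|unix.MAP_ANONYMOUS)
	if err != nil {
		panic(err)
	}
	defer unix.Munmap(mem)

	mem[0] = 1 // fault the range in

	// This call may enter direct reclaim and stall. Issuing it while
	// holding a global lock (as the runtime did with the heap lock)
	// blocks every other allocating goroutine until it returns.
	if err := unix.Madvise(mem, MADV_COLLAPSE); err != nil {
		fmt.Println("collapse not performed (old kernel or THP disabled):", err)
	}
}
```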
Background
Transparent huge pages (THP) admin guide.
Go scavenging policy. (Implementation details are out-of-date, but linked policy is relevant.)
Motivation
Currently, Go's hugepage-related policies do not play well together and have bit-rotted. The result is that the memory regions the Go runtime chooses to mark as MADV_NOHUGEPAGE and MADV_HUGEPAGE are somewhat haphazard, resulting in memory overuse for small heaps: upwards of 40% overhead in some cases. Turning off huge pages entirely fixes the problem, but leaves CPU performance on the table. This policy also means large heaps might have dense sections that are erroneously mapped as MADV_NOHUGEPAGE, costing up to 1% throughput.

The goal of this work is to eliminate this overhead for small heaps while improving huge page utilization for large heaps.
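One way to observe the overuse described above is to compare a process's heap size against how much anonymous memory the kernel has actually backed with transparent huge pages. A small diagnostic sketch, assuming a Linux kernel with /proc/PID/smaps_rollup (4.14+); the program is purely illustrative, not part of the runtime:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// Prints this process's AnonHugePages total: the amount of its anonymous
// memory currently backed by transparent huge pages. Comparing it against
// the Go heap size shows how well (or poorly) huge pages are utilized.
func main() {
	f, err := os.Open("/proc/self/smaps_rollup")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	sc := bufio.NewScanner(f)
	for sc.Scan() {
		if line := sc.Text(); strings.HasPrefix(line, "AnonHugePages:") {
			fmt.Println(line) // e.g. "AnonHugePages:    104448 kB"
		}
	}
}
```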
Design
Full design doc soon. (CL)
TL;DR: Determine which parts of the heap should be backed by huge pages based on the near-future density of page allocator chunks. The same density heuristic is then used to delay the background scavenger: only chunks that have been sparse for at least one GC cycle are eligible for scavenging. The scavenger also breaks up huge pages before releasing memory.
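As a rough illustration of that heuristic — the struct, field names, and threshold below are invented for the sketch; only the chunk geometry matches the real page allocator, which manages 4 MiB chunks of 8 KiB pages:

```go
package main

import "fmt"

const (
	pagesPerChunk  = 512  // 4 MiB chunk / 8 KiB pages, as in the page allocator
	denseThreshold = 0.96 // hypothetical "dense enough for huge pages" cutoff
)

// chunk is a stand-in for the page allocator's per-chunk metadata.
type chunk struct {
	inUse       int    // pages currently allocated in this chunk
	sparseSince uint32 // GC cycle at which the chunk last became sparse
}

// dense reports whether the chunk is packed tightly enough that it should
// be backed by huge pages rather than handed to the scavenger.
func (c *chunk) dense() bool {
	return float64(c.inUse)/pagesPerChunk >= denseThreshold
}

// scavengeEligible reports whether the background scavenger may release
// pages from c: per the policy above, only chunks that have stayed sparse
// for at least one full GC cycle are candidates.
func (c *chunk) scavengeEligible(gcCycle uint32) bool {
	return !c.dense() && gcCycle-c.sparseSince >= 1
}

func main() {
	c := chunk{inUse: 100, sparseSince: 7}
	fmt.Println(c.dense())             // false: only ~20% of pages in use
	fmt.Println(c.scavengeEligible(8)) // true: sparse for one full GC cycle
}
```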