runtime: make the scavenger more prompt #16930
Comments
ping @aclements
Not a simple change, I'm afraid. Moving to 1.9.
Closely related: #14045 (comment)
An interesting report of this problem in the field: https://medium.com/samsara-engineering/running-go-on-low-memory-devices-536e1ca2fe8f
Change https://golang.org/cl/141937 mentions this issue:
Change https://golang.org/cl/142959 mentions this issue:
Change https://golang.org/cl/142960 mentions this issue:
Change https://golang.org/cl/143157 mentions this issue:
@mknyszek, should this issue be closed now?
Yes, I think so.
Currently, there's a lag of at least five minutes (and potentially much longer) between the Go heap size shrinking and the process's RSS shrinking. This is done to amortize the cost of releasing and re-acquiring system memory, but it has several downsides.
I believe we should make the scavenger release memory promptly and aggressively.
I believe we can do this without significant overhead by improving the design of the scavenger. Currently the scavenger is careful to recycle spans that have been unused for more than five minutes. This is largely wasted effort because virtual memory is fungible: there's a slight TLB locality boost to retaining very recently used memory (on the order of a few megabytes), but beyond this it doesn't matter what unused pages we return to the OS. The inflexibility of the scavenger has several downsides. Primarily, it requires many system calls to release sparse unused regions of memory, and these system calls have a very high per-call cost because the OS needs to do remote TLB invalidation. This cost also grows with the number of CPUs. It can also needlessly delay freeing memory because coalescing two neighboring free spans takes the most recent "used" time of the two.
I propose separating the concerns of how many pages to release from which pages to release.
Which pages to release should be based on minimizing the number of `sysUnused` calls (`madvise` on Linux). Roughly, releasing n pages should attempt to release an n-page span, and if no spans of n pages or larger are on the free list, it should release as few smaller spans as possible. There are many algorithms that satisfy this outline, and I suspect the details don't matter. It would be reasonable to adopt the exact algorithm used by tcmalloc, since that has been field-tested. We should preferentially release spans from the end of the free lists, since those have been least recently used and have the least preference for being reused.

How many pages to release is a harder question, but by separating these concerns we have the flexibility to choose a good answer. For example, the current policy can be expressed as retaining `max(HeapInUse over the past 5 minutes)`. The "right" answer depends on the costs of releasing and re-acquiring memory, the future size of the heap, and the relative CPU versus memory costs we're willing to incur. Because of the "sawtooth" behavior of GC, we do know something about the future size of the heap, at least: the heap will grow to the heap goal in the near future, and often that heap goal is in a rough steady state. Hence, as a simple heuristic, I propose we use the heap goal times a "steady state variance" factor, with some hysteresis to retain cost amortization in case garbage collections are happening rapidly.

/cc @matloob @RLH