runtime: ill-fated GC outcome in spiky/bursty scenarios #42805
The Go GC machinery leaves the door open to several ill-fated scenarios that have been reported in other issues. The one closest to what I'm describing here (and providing a reproduction for) is #10064.
Imagine the Go program has 64GiB of available memory. The last garbage collection resulted in a live set of 54GiB. With the default GOGC=100 value, the pacer will schedule a collection once another 54GiB have been allocated, or in the next 2 minutes, whichever happens first.
If there's a rapid spike of heap allocation due to program mode change (e.g. a database compaction, which is how my team discovered this), such that it amounts to more than 10GiB, an OOM panic will occur.
And that appears to be reasonable behaviour, if those 10GiB are effectively retained / reachable / in scope.
However, what's not reasonable is that the same will occur even if 9.90GiB of those 10GiB have been released / become unreachable. For example, the program underwent 99 iterations of this logic, in under 2 minutes from the last GC:
The next iteration (100th) will cause the Go runtime to expand the heap beyond its available memory, and that will cause an OOM panic. I would've expected the runtime to detect the impending OOM and choose to run a forced GC instead.
The above scenario is greatly simplifying things, of course.
I built a reproduction harness here: https://github.com/raulk/trampoline/.
This program creates a cgroup and enforces the memory limit indicated by the -limit parameter (default: 32MiB). The cgroup's swap memory value is set to the same value, to prevent the program from using any swap. (IMPORTANT: make sure the right cgroup options are enabled to enforce this caging; check README for more info).
The program will then allocate a byte slice of size 90% of the configured limit (+ slice overhead). This will simulate a spike in heap usage, and will very likely induce GC at around 30MiB (with the default limit value).
Of course, the exact numbers are dependent on many conditions, and thus non-deterministic. Could be less or more in your setup, and you may need to tweak the limit parameter.
Given the default value of GOGC=100, the GC pacer will schedule to run when the allocated heap amounts to 2x of the live set at GC mark phase end. In my setup, this clocks in at 60MiB. Of course, that's beyond our 32MiB limit.
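The arithmetic above can be sketched in a few lines of Go. This is a simplified model only: `heapGoal` is a hypothetical helper, not a runtime API, and the real pacer takes more inputs than the live set and GOGC. The 30MiB live set and 32MiB limit are the figures quoted above.

```go
package main

import "fmt"

// heapGoal models the simplified rule described in the text:
// the next collection is scheduled once the heap reaches
// live * (1 + GOGC/100). Hypothetical helper, not part of the runtime.
func heapGoal(liveBytes, gogc uint64) uint64 {
	return liveBytes + liveBytes*gogc/100
}

func main() {
	const MiB = 1 << 20
	limit := uint64(32 * MiB) // cgroup limit (default -limit value)
	live := uint64(30 * MiB)  // approximate live set when GC was induced

	goal := heapGoal(live, 100)
	// prints: goal=60 MiB limit=32 MiB exceeds=true
	fmt.Printf("goal=%d MiB limit=%d MiB exceeds=%v\n",
		goal/MiB, limit/MiB, goal > limit)
}
```

With these inputs the goal lands well past the limit, which is why the next collection never gets a chance to run before the cgroup limit is hit.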
Next, the program releases the 90% byte slab, and allocates the remaining 10%. With the default limit value, it releases 30198988 bytes to allocate 3355443 bytes (ignoring slice headers).
At that point, the program has enough unused heap space that it could reclaim and assign to the new allocation. But unfortunately, GC is scheduled too far out, and the Go runtime does not run GC as a last resort before going above its limit. Therefore, instead of reusing vacant, resident memory, it decides to expand the heap and goes beyond its cgroup limit, thus triggering the OOM killer.
The gist here is that the Go runtime had roughly 9 times as much memory free as it needed to allocate, but it was not capable of reclaiming it in time.
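For illustration, the allocation pattern can be sketched as below. This is not the code from the trampoline harness: `splitLimit` and `sink` are hypothetical names, and the explicit `debug.FreeOSMemory()` call is a manual, application-level stand-in for the "GC as a last resort" step that the runtime does not perform by itself. Run outside a memory-limited cgroup, it simply completes.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// splitLimit returns the sizes of the two allocations described in the
// text: 90% of the limit for the slab, 10% for the follow-up allocation.
// Hypothetical helper for this sketch.
func splitLimit(limit int) (slab, rest int) {
	return limit * 9 / 10, limit / 10
}

// sink is package-level so the allocations are not optimized away.
var sink []byte

func main() {
	limit := 32 << 20 // default -limit value: 32MiB
	slabSize, restSize := splitLimit(limit)

	// Allocate the 90% slab and touch every byte so it becomes resident.
	sink = make([]byte, slabSize)
	for i := range sink {
		sink[i] = 1
	}

	// Release the slab. It is now unreachable but not yet collected.
	sink = nil

	// Manual workaround: force a collection and ask the runtime to
	// return freed memory to the OS before allocating again.
	debug.FreeOSMemory()

	sink = make([]byte, restSize)
	fmt.Println(slabSize, restSize, len(sink))
}
```

Removing the `debug.FreeOSMemory()` line gives the shape of the problematic sequence, where the second allocation may grow the heap instead of reusing the released slab.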
Discussion & ideas
We'll probably end up setting up a memory watchdog, initialized with a user-configured memory limit (à la JVM -Xmx). As the heap grows, we'll probably reduce GOGC dynamically by calling
I would say it's related, but also somewhat orthogonal. I can see how SetMaxHeap helps in a server environment. But, say, I am compiling lots of beefy Go code on my not super beefy laptop; there is lots of parallelism and lots of compiler/linker/vet/test invocations. Frequently it badly freezes my machine, and sometimes I even need to hard reboot. I am not sure who would set SetMaxHeap for all these subprocesses, how, and what the limits would be. But if Go processes were overall more careful about consuming large amounts of memory, it may help.
@dvyukov Sure that's a problem as well, but not the OP's problem. He has and knows a hard limit.
The difficulty I see in your scenario is how we would know. I don't see how a Go process can reliably tell that it is "using too much memory" until mmaping new memory fails. We could run a GC at that point, sure. But we never get a mmap failure in your scenario: the OS is stealing all the available memory, paging out your window manager, etc., in order to satisfy our requests. How do we know the OS is "trying too hard" and we should back off?
@raulk Thanks for the report! From my perspective (and I agree with Keith), this boils down to more evidence suggesting we should have a configurable maximum heap. You mention that you think
@dvyukov We're reasonably careful about heap growths nowadays since we'll eagerly return memory to the OS in that case, but you still run into trouble with the amount of memory needed doubling (which is independent of "heap growth" as far as the runtime's meaning (i.e.
Your example of lots of Go processes is a real issue, but I think it's also somewhat orthogonal to this issue which seems to be focused on a server application (@raulk correct me if I'm wrong). Unfortunately, Go generally doesn't play well with co-tenants (ironically it's usually worse if they're all Go code too, because the idle GC will try to eat up
@mknyszek @randall77 I think I agree with your assessments here. I believe mmap would succeed when expanding the heap beyond the hard limit. But when the process actually writes to the mapped memory, that's when it would become backed by physical memory and it would summon the OOM killer. There might be ways that one can interrogate the OS about the limits in force, but this would be hugely platform dependent.
I think the most deterministic way of achieving sympathy here is through
The reason why I suggested decreasing GOGC as we approach the max heap is that if one doesn't do that, GC pacing becomes entirely reactive instead of proactive. For example, if I have a 64GiB max heap and my previous live set was 32GiB, with the default GOGC we wouldn't run GC proactively until another 32GiB have been allocated (or until the 2min timer fires), which is obviously too late.
If that's the case, we would have had this timeline, potentially:
Instead, if one decreases GOGC as the headroom remaining until max heap shrinks, you could get a much more balanced pattern, for example:
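One hypothetical way to compute such a GOGC value is sketched below. It is only one reading of the idea: `gogcFor`, `defaultGOGC` and `minGOGC` are names made up for this sketch, and the policy (pick the largest GOGC whose heap goal still fits under the max heap, clamped to a floor and a ceiling) is an assumption rather than anything the runtime implements. Only `debug.SetGCPercent` is a real standard-library call.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// gogcFor returns a GOGC percentage chosen so that the simplified heap
// goal, live*(1+GOGC/100), does not exceed maxHeap. The result is clamped
// to [minGOGC, defaultGOGC]. Hypothetical policy for illustration only.
func gogcFor(live, maxHeap uint64, defaultGOGC, minGOGC int) int {
	if live == 0 {
		return defaultGOGC
	}
	if live >= maxHeap {
		return minGOGC
	}
	pct := int((maxHeap - live) * 100 / live)
	if pct > defaultGOGC {
		pct = defaultGOGC
	}
	if pct < minGOGC {
		pct = minGOGC
	}
	return pct
}

func main() {
	const GiB = 1 << 30
	maxHeap := uint64(64 * GiB)

	// Show how the value tightens as the live set approaches the max heap.
	for _, liveGiB := range []uint64{16, 32, 54, 60, 63} {
		pct := gogcFor(liveGiB*GiB, maxHeap, 100, 5)
		fmt.Printf("live=%dGiB -> GOGC=%d\n", liveGiB, pct)
	}

	// A watchdog would re-evaluate this periodically or after each GC and
	// apply it. Here it is applied once, for a 54GiB live set.
	prev := debug.SetGCPercent(gogcFor(54*GiB, maxHeap, 100, 5))
	fmt.Println("previous GOGC:", prev)
}
```

A real watchdog would also need a source for the live-set estimate (for example, heap statistics sampled after a collection) and some damping to avoid thrashing near the limit; both are left out of this sketch.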
Nice observation about fragmentation. This is something that has come up already recently, and going forward I think anything like
All feedback in this space is useful. :) The notification system for
@dvyukov Yeah, I think I get it. Almost like making the heap goal itself an EWMA or something. I think I can see the value in that. The cliff that's dependent on timing is a bit of a problem and it would be nice to smooth that out.