runtime: concurrent GC leads to 5x memory consumption #11677
/CC @dvyukov
That's a concurrent GC issue. I've run the test. If I sprinkle runtime.GC calls in the test, in particular inside of the following loop:

```go
for i := 0; i < 1e3; i++ {
	_ = make([]byte, 1<<20)
}
```

memory usage stabilizes at ~68MB. But if I remove the calls, and let the concurrent GC do its work, memory steadily grows to 240MB, then to 300MB, then to 340MB. If I enable gctrace, then I see lines like this:

```
gc #1095 @60.193s 9%: 0.35+0.59+0.66+7.4+0.61 ms clock, 1.4+0.59+0+0/1.3/0.16+2.4 ms cpu, 4->125->2 MB, 4 MB goal, 10 P
```

GC is basically 30x off of the goal. GC must not ever let the heap grow above the GOGC goal. If it needs some reserve room, it should aim at 0.9 GOGC or so, but then cap the heap at GOGC if it fails to reclaim memory in time.
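A self-contained sketch of that experiment (the surrounding test isn't quoted here, so the program below is an assumed reconstruction; run it with GODEBUG=gctrace=1 and toggle forceGC to compare the two behaviors):

```go
package main

import (
	"fmt"
	"runtime"
)

func main() {
	const forceGC = false // set to true to mimic sprinkling runtime.GC() calls in the loop

	for i := 0; i < 1e3; i++ {
		_ = make([]byte, 1<<20)
		if forceGC {
			runtime.GC()
		}
	}

	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	fmt.Printf("Sys = %d MB\n", ms.Sys>>20)
}
```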
This is certainly worth looking into, but stating that "GC must not ever let heap grow above the GOGC goal" assumes GOGC is a hard limit, which it isn't.
If it is not a hard limit, then it is not possible to provision a machine for a Go program.
Yet machines run Go programs. What is the use case? The example is worth investigating, though.
@RLH, tests are failing regularly. The problem is not hypothetical. Very senior Go developers can't reason about memory usage and write reliable tests. Surely there is some problem here. It may not be the GC, but let's not be so quick to dismiss it as a possibility. We need to get the world not flaky.
We can argue all we want about GOGC and provisioning, but the gctrace line quoted above definitely shows a bug:
It was supposed to finish at 4 MB but finished at 125 MB. It shouldn't be possible to miss by that much. We're probably mis-accounting something. We spent less than a microsecond in assists (probably we spent no time in assists), which should have acted as the back-pressure against the heap going this far past the goal.

Another odd thing here is that the duration of the mark phase was 7.4 ms, but we only spent 1.46 ms of CPU time in GC during that phase. Those times are small enough that it's possible it's just scheduling noise delaying the start of the mark workers (from the OS, the runtime, or both), but that shouldn't impact assists.
The role of the GC in this issue is real. Whether the GC can ever exceed the GOGC goal is a separate question.
Let's make this about the bug that Austin found and not about GC philosophy.
Here's what's going on. The loop that allocates 1,000 1MB byte slices does enter the assist, and the assist knows that it needs to do a lot of work to pay for the allocation, but it finds no work to do, so it returns immediately. It keeps building up assist debt, but also keeps finding no work to do, so there's no back pressure.

The reason it's not finding any work to do appears to be related to #11694. There's no work to do because we're actually out of work. But we haven't entered mark termination yet, so the mutator is still running. This can happen simply because it takes us a while to go into mark termination when we run out of work, because we do things like rescanning globals to pick up more work to do in concurrent mark.

It's also theoretically possible for this to happen when there is work to do, but it's not available to the assist. For example, the code in #11694 creates a heap that's almost entirely a long singly-linked list. A dedicated mark worker could be working on that list, but there's no way to hand off that work, so the assist won't be able to do anything, regardless of the assist debt.

So it seems like we need to do something if we're in the assist and can't find enough work to pay down the assist debt.
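A toy model of the failure mode described above (illustrative names only; this is not the actual runtime code):

```go
package main

import "fmt"

// workbuf stands in for a runtime mark work buffer.
type workbuf struct{ words int64 }

// markQueue stands in for the global queue of full work buffers.
var markQueue []*workbuf

// tryGetMarkWork models trygetfull: return a queued buffer, or nil if none is available.
func tryGetMarkWork() *workbuf {
	if len(markQueue) == 0 {
		return nil
	}
	wb := markQueue[len(markQueue)-1]
	markQueue = markQueue[:len(markQueue)-1]
	return wb
}

// gcAssistAlloc sketches the pre-fix assist: when no work is available it
// simply returns, leaving the debt unpaid, so a rapidly allocating mutator
// feels no back-pressure at all.
func gcAssistAlloc(debt int64) (unpaid int64) {
	for debt > 0 {
		wb := tryGetMarkWork()
		if wb == nil {
			return debt // the bug: give up instead of blocking or stealing credit
		}
		debt -= wb.words // "scan" the buffer and earn credit for it
	}
	return 0
}

func main() {
	// The queue is empty (mark is out of work but hasn't terminated yet),
	// so the entire debt for a 1 MB allocation goes unpaid.
	fmt.Println("unpaid assist debt:", gcAssistAlloc(1<<20))
}
```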
#11694 is related. The solution to both probably includes back pressure that forces the mutator to help pay for its allocations. The trick of course is to minimize the cost to more mainstream applications.
I think for 1.6 we need to make all work equally available to all goroutines. That is, allow mutators to rescan stacks and globals if that's the next piece of work that GC needs to perform.
We need to put them to sleep before the allocation. That is, you already have credit to pay for the allocation, or you are not allowed to do it. I am also worried about the transition period when the heap profile/size changes. If controller parameters are incorrect, can't mutators under-help and lead to overallocation?
There appears to be one other cause of this. We don't enable assists until the concurrent mark phase, which means there's some time between when we hit the GC trigger heap size and when we finish the scan phase and go into mark, during which the mutator can continue to allocate without back-pressure. With my prototype fix for the "no work" problem, I see that when we still go over the heap goal, we're over by exactly the amount that was allocated between the GC trigger and the beginning of concurrent mark.
Currently, there are three ways to satisfy an assist: 1) the mutator steals credit from background GC, 2) the mutator actually does GC work, and 3) there is no more work available. 3 is one of the causes of overshooting the heap goal (and was never intended as a way to satisfy an assist). I have a prototype solution that disallows 3 by making the assist block until it can satisfy 1 or 2, or GC terminates. This actually looks a lot like the getfull barrier, except that it also repeatedly tries to steal credit. Unfortunately it's another spin-with-sleep loop like getfull, for the same reasons getfull is. I'm writing up a Go1.6Early issue about improving the GC barrier overall, since it's really showing its inappropriateness for concurrent GC.
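A sketch of what the blocking variant looks like (again with illustrative names; the real change is in the runtime's assist and is not reproduced here):

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

var (
	bgScanCredit int64 // scan credit banked by background mark workers
	cycleDone    int32 // set to 1 when the GC cycle terminates
)

// stealCredit models way 1: take up to want units of banked background credit.
func stealCredit(want int64) int64 {
	have := atomic.LoadInt64(&bgScanCredit)
	if have <= 0 {
		return 0
	}
	if want > have {
		want = have
	}
	atomic.AddInt64(&bgScanCredit, -want)
	return want
}

// doMarkWork models way 2: do mark work directly. Stubbed out as "no work found".
func doMarkWork(max int64) int64 { return 0 }

// gcAssistAlloc now loops until the debt is paid via 1 or 2, or the cycle ends,
// instead of returning with unpaid debt. Like getfull, this is effectively a
// spin-with-yield loop.
func gcAssistAlloc(debt int64) {
	for debt > 0 {
		debt -= stealCredit(debt)
		if debt <= 0 {
			break
		}
		debt -= doMarkWork(debt)
		if debt <= 0 {
			break
		}
		if atomic.LoadInt32(&cycleDone) == 1 {
			break // mark terminated; remaining debt is dropped
		}
		runtime.Gosched()
	}
	fmt.Println("assist done, remaining debt:", debt)
}

func main() {
	atomic.StoreInt64(&bgScanCredit, 1<<20) // background workers banked 1 MB of credit
	atomic.StoreInt32(&cycleDone, 1)        // pretend the cycle ends while we're assisting
	gcAssistAlloc(2 << 20)                  // half paid by stolen credit, the rest forgiven at cycle end
}
```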
> I think for 1.6 we need to make all work equally available to all goroutines.

I think this is a good idea.
> We need to put them to sleep before the allocation.

Perhaps. Currently, we only do assists if we just did a large allocation or got a new span, in order to keep the assist off the common path, which means it would be tricky to do the assist before we do the allocation. But we could change this so that we update the assist debt first and do the assist if it's over some small-ish threshold (or any number of variations on this theme; see the sketch after this comment).
> If controller parameters are incorrect, can't mutators under-help and lead to overallocation?

No, modulo bugs like this one. We conservatively base the total assist work on the entire allocated heap size. (Note that this is different from what went out in the original GC pacing proposal, which was much less conservative and occasionally had very bad results.)
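A minimal sketch of that debt-threshold variation (the threshold value and all names here are assumptions for illustration, not the runtime's malloc path):

```go
package main

// assistDebtThreshold is an assumed small-ish threshold; the real value would
// be a tuning decision.
const assistDebtThreshold = 64 << 10

// assistDebt accumulates bytes allocated but not yet paid for by GC work.
var assistDebt int64

// payDebt stands in for the assist itself (stealing credit or doing mark work).
func payDebt() { assistDebt = 0 }

// mallocSketch charges the allocation to the debt first and only enters the
// assist once the debt crosses the threshold, keeping the common path cheap
// while still assisting before, rather than after, a large allocation lands.
func mallocSketch(size int64) []byte {
	assistDebt += size
	if assistDebt > assistDebtThreshold {
		payDebt()
	}
	return make([]byte, size)
}

func main() {
	for i := 0; i < 1000; i++ {
		_ = mallocSketch(1 << 20)
	}
}
```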
My plan to fix this was to enable assists earlier. The write barrier must be on before we enable assists, but since we turn it on in the scan phase, we can in principle enable assists quite early. Unfortunately, this introduces other problems. For example, it's possible for assists during the scan phase to drain everything we have so far and signal completion even though the scan is still collecting pointers. I can work around this, but this change still has deleterious effects I don't yet understand, like doubling the duration and heap size of TestTraceStress.
CL https://golang.org/cl/12672 mentions this issue.
CL https://golang.org/cl/12674 mentions this issue.
CL https://golang.org/cl/12671 mentions this issue.
CL https://golang.org/cl/12670 mentions this issue.
Currently it's possible for the GC assist to signal completion of the mark phase, which puts the GC coordinator goroutine on the current P's run queue, and then return to mutator code that delays until the next forced preemption before actually yielding control to the GC coordinator, dragging out completion of the mark phase. This delay can be further exacerbated if the mutator makes other goroutines runnable before yielding control, since this will push the GC coordinator on the back of the P's run queue.

To fix this, this adds a Gosched to the assist if it completed the mark phase. This immediately and directly yields control to the GC coordinator. This already happens implicitly in the background mark workers because they park immediately after completing the mark.

This is one of the reasons completion of the mark phase is being dragged out and allowing the mutator to allocate without assisting, leading to the large heap goal overshoot in issue #11677. This is also a prerequisite to making the assist block when it can't pay off its debt.

Change-Id: I586adfbecb3ca042a37966752c1dc757f5c7fc78
Reviewed-on: https://go-review.googlesource.com/12670
Reviewed-by: Russ Cox <rsc@golang.org>
Currently, there are three ways to satisfy a GC assist: 1) the mutator steals credit from background GC, 2) the mutator actually does GC work, and 3) there is no more work available. 3 was never really intended as a way to satisfy an assist, and it causes problems: there are periods when it's expected that the GC won't have any work, such as when transitioning from mark 1 to mark 2 and from mark 2 to mark termination. During these periods, there's no back-pressure on rapidly allocating mutators, which lets them race ahead of the heap goal. For example, test/init1.go and the runtime/trace test both have small reachable heaps and contain loops that rapidly allocate large garbage byte slices. This bug lets these tests exceed the heap goal by several orders of magnitude.

Fix this by forcing the assist (and hence the allocation) to block until it can satisfy its debt via either 1 or 2, or the GC cycle terminates.

This fixes one of the causes of #11677. It's still possible to overshoot the GC heap goal, but with this change the overshoot is almost exactly the amount of allocation that happens during the concurrent scan phase, between when the heap passes the GC trigger and when the GC enables assists.

Change-Id: I5ef4edcb0d2e13a1e432e66e8245f2bd9f8995be
Reviewed-on: https://go-review.googlesource.com/12671
Reviewed-by: Russ Cox <rsc@golang.org>
Currently the GC coordinator enables GC assists at the same time it enables background mark workers, after the concurrent scan phase is done. However, this means a rapidly allocating mutator has the entire scan phase during which to allocate beyond the heap trigger and potentially beyond the heap goal, with no back-pressure from assists. This prevents the feedback system that's supposed to keep the heap size under the heap goal from doing its job.

Fix this by enabling mutator assists during the scan phase. This is safe because the write barrier is already enabled and globally acknowledged at this point.

There's still a very small window between when the heap size reaches the heap trigger and when the GC coordinator is able to stop the world, during which the mutator can allocate unabated. This lets *very* rapidly allocating mutators like TestTraceStress still occasionally exceed the heap goal by a small amount (~20 MB at most for TestTraceStress). However, this seems like a corner case.

Fixes #11677.

Change-Id: I0f80d949ec82341cd31ca1604a626efb7295a819
Reviewed-on: https://go-review.googlesource.com/12674
Reviewed-by: Russ Cox <rsc@golang.org>
For future reference, this will print all GC cycles that go over the goal:
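One way to do that, assuming GODEBUG=gctrace=1 output is piped in on stdin (a stand-in sketch, not necessarily the snippet originally posted; the regexp matches the gctrace heap fields quoted earlier, e.g. `4->125->2 MB, 4 MB goal`):

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"regexp"
	"strconv"
)

// Capture the peak heap size and the goal from lines like
// "... 4->125->2 MB, 4 MB goal, 10 P".
var overGoal = regexp.MustCompile(`\d+->(\d+)->\d+ MB, (\d+) MB goal`)

func main() {
	sc := bufio.NewScanner(os.Stdin)
	for sc.Scan() {
		line := sc.Text()
		m := overGoal.FindStringSubmatch(line)
		if m == nil {
			continue // not a gctrace line
		}
		peak, _ := strconv.Atoi(m[1])
		goal, _ := strconv.Atoi(m[2])
		if peak > goal {
			fmt.Println(line)
		}
	}
}
```

For example: `GODEBUG=gctrace=1 ./prog 2>&1 | go run overgoal.go` (the file name is arbitrary).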
CL https://golang.org/cl/16297 mentions this issue.
Currently dedicated mark workers participate in the getfull barrier during concurrent mark. However, the getfull barrier wasn't designed for concurrent work and this causes no end of headaches.

In the concurrent setting, participants come and go. This makes mark completion susceptible to live-lock: since dedicated workers are only periodically polling for completion, it's possible for the program to be in some transient worker each time one of the dedicated workers wakes up to check if it can exit the getfull barrier. It also complicates reasoning about the system because dedicated workers participate directly in the getfull barrier, but transient workers must instead use trygetfull because they have exit conditions that aren't captured by getfull (e.g., fractional workers exit when preempted). The complexity of implementing these exit conditions contributed to #11677. Furthermore, the getfull barrier is inefficient because we could be running user code instead of spinning on a P. In effect, we're dedicating 25% of the CPU to marking even if that means we have to spin to make that 25%. It also causes issues on Windows because we can't actually sleep for 100µs (#8687).

Fix this by making dedicated workers no longer participate in the getfull barrier. Instead, dedicated workers simply return to the scheduler when they fail to get more work, regardless of what other workers are doing, and the scheduler only starts new dedicated workers if there's work available. Everything that needs to be handled by this barrier is already handled by detection of mark completion.

This makes the system much more symmetric because all workers and assists now use trygetfull during concurrent mark. It also loosens the 25% CPU target so that we can give some of that 25% back to user code if there isn't enough work to keep the mark worker busy. And it eliminates the problematic 100µs sleep on Windows during concurrent mark (though not during mark termination).

The downside of this is that if we hit a bottleneck in the heap graph that then expands back out, the system may shut down dedicated workers and take a while to start them back up. We'll address this in the next commit.

Updates #12041 and #8687.

No effect on the go1 benchmarks. This slows down the garbage benchmark by 9%, but we'll more than make it up in the next commit.

name              old time/op  new time/op  delta
XBenchGarbage-12  5.80ms ± 2%  6.32ms ± 4%  +9.03%  (p=0.000 n=20+20)

Change-Id: I65100a9ba005a8b5cf97940798918672ea9dd09b
Reviewed-on: https://go-review.googlesource.com/16297
Reviewed-by: Rick Hudson <rlh@golang.org>
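Conceptually, the dedicated-worker side of that change looks something like the following (an illustrative model with assumed names, not the CL's code):

```go
package main

// workbuf stands in for a mark work buffer.
type workbuf struct{}

// queue stands in for the global list of full work buffers.
var queue []*workbuf

// trygetfull returns a queued buffer without blocking, or nil if none exists.
func trygetfull() *workbuf {
	if len(queue) == 0 {
		return nil
	}
	wb := queue[len(queue)-1]
	queue = queue[:len(queue)-1]
	return wb
}

func drain(wb *workbuf) { /* scan the objects in wb */ }

// dedicatedMarkWorker no longer spins in a getfull barrier: if trygetfull
// finds nothing, it returns to the scheduler, which only restarts a dedicated
// worker once work is available again. Completion detection handles the rest.
func dedicatedMarkWorker() {
	for {
		wb := trygetfull()
		if wb == nil {
			return
		}
		drain(wb)
	}
}

func main() {
	queue = append(queue, &workbuf{})
	dedicatedMarkWorker()
}
```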
go version devel +2ae7737 Sat Jul 11 14:36:33 2015 +0000 freebsd/386