-
Notifications
You must be signed in to change notification settings - Fork 17.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
runtime: eliminate stack rescanning #17503
Comments
CL https://golang.org/cl/31362 mentions this issue. |
CL https://golang.org/cl/31450 mentions this issue. |
CL https://golang.org/cl/31451 mentions this issue. |
CL https://golang.org/cl/31369 mentions this issue. |
CL https://golang.org/cl/31453 mentions this issue. |
CL https://golang.org/cl/31452 mentions this issue. |
CL https://golang.org/cl/31457 mentions this issue. |
CL https://golang.org/cl/31454 mentions this issue. |
CL https://golang.org/cl/31367 mentions this issue. |
CL https://golang.org/cl/31368 mentions this issue. |
CL https://golang.org/cl/31456 mentions this issue. |
CL https://golang.org/cl/31455 mentions this issue. |
CL https://golang.org/cl/31366 mentions this issue. |
CL https://golang.org/cl/31550 mentions this issue. |
CL https://golang.org/cl/31572 mentions this issue. |
CL https://golang.org/cl/31570 mentions this issue. |
CL https://golang.org/cl/31571 mentions this issue. |
CL https://golang.org/cl/31655 mentions this issue. |
We used the "double barrier" to address stack scanning issues in the IBM J9 implementation of Metronome. It's in section 4.3 of "Design and implementation of a comprehensive real-time java virtual machine" by Auerbach et al, section 4.3. In that case we were incrementalizing over many thread's stacks, as opposed to the individual stack, but it solves the same problem. It worked well, although the extra barrier overhead was annoying. Let me know if you have any questions. david (dfb@google.com) |
@davidfbacon, thanks for the reference! Indeed, that looks like the same barrier design. It's good to know that it worked well for Metronome. I'll update the proposal document to add a citation. |
Updates golang/go#17503. Change-Id: Ib635a49f9fde36493ba98a6d87b9c1dd114e0c7d Reviewed-on: https://go-review.googlesource.com/31362 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
@aclements any numbers on the performance impact of the new barrier? Also, are they harder to eliminate at compile time? |
@rasky, it's about a 1.7% performance hit on the x/benchmarks garbage benchmark (which, as the name suggests, is designed to hammer the garbage collector). I haven't checked, but I suspect we've gained more than that from other optimizations since Go 1.7. They are harder to eliminate at compile time. I completely disabled the optimizations that don't carry over directly to the hybrid barrier and binaries got about 1% larger. We could eliminate some of these, but it requires flow analysis and the current insertion code doesn't do any flow analysis. OTOH, the places where we can eliminate write barriers with the current write barrier aren't all that common anyway (how often do you write the address of a global to something?), so we're not losing much. |
morestack writes the context pointer to gobuf.ctxt, but since morestack is written in assembly (and has to be very careful with state), it does *not* invoke the requisite write barrier for this write. Instead, we patch this up later, in newstack, where we invoke an explicit write barrier for ctxt. This already requires some subtle reasoning, and it's going to get a lot hairier with the hybrid barrier. Fix this by simplifying the whole mechanism. Instead of writing gobuf.ctxt in morestack, just pass the value of the context register to newstack and let it write it to gobuf.ctxt. This is a normal Go pointer write, so it gets the normal Go write barrier. No subtle reasoning required. Updates #17503. Change-Id: Ia6bf8459bfefc6828f53682ade32c02412e4db63 Reviewed-on: https://go-review.googlesource.com/31550 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
CL https://golang.org/cl/31764 mentions this issue. |
CL https://golang.org/cl/31766 mentions this issue. |
CL https://golang.org/cl/31765 mentions this issue. |
Austin, looks like this can be closed? I'm trying to close or remilestone all Go1.9Early bugs. |
Ping @aclements : Should this issue be kept open? |
Change https://golang.org/cl/134318 mentions this issue: |
Change https://golang.org/cl/134785 mentions this issue: |
Change https://golang.org/cl/134777 mentions this issue: |
The mark 2 phase was originally introduced as a way to reduce the chance of entering STW mark termination while there was still marking work to do. It works by flushing and disabling all local work caches so that all enqueued work becomes immediately globally visible. However, mark 2 is not only slow–disabling caches makes marking and the write barrier both much more expensive–but also imperfect. There is still a rare but possible race (~once per all.bash) that can cause GC to enter mark termination while there is still marking work. This race is detailed at https://github.com/golang/proposal/blob/master/design/17503-eliminate-rescan.md#appendix-mark-completion-race The effect of this is that mark termination must still cope with the possibility that there may be work remaining after a concurrent mark phase. Dealing with this increases STW pause time and increases the complexity of mark termination. Furthermore, a similar but far more likely race can cause early transition from mark 1 to mark 2. This is unfortunate because it causes performance instability because of the cost of mark 2. This CL fixes this by replacing mark 2 with a distributed termination detection algorithm. This algorithm is correct, so it eliminates the mark termination race, and doesn't require disabling local caches. It ensures that there are no grey objects upon entering mark termination. With this change, we're one step closer to eliminating marking from mark termination entirely (it's still used by STW GC and checkmarks mode). This CL does not eliminate the gcBlackenPromptly global flag, though it is always set to false now. It will be removed in a cleanup CL. This led to only minor variations in the go1 benchmarks (https://perf.golang.org/search?q=upload:20180909.1) and compilebench benchmarks (https://perf.golang.org/search?q=upload:20180910.2). This significantly improves performance of the garbage benchmark, with no impact on STW times: name old time/op new time/op delta Garbage/benchmem-MB=64-12 2.21ms ± 1% 2.05ms ± 1% -7.38% (p=0.000 n=18+19) Garbage/benchmem-MB=1024-12 2.30ms ±16% 2.20ms ± 7% -4.51% (p=0.001 n=20+20) name old STW-ns/GC new STW-ns/GC delta Garbage/benchmem-MB=64-12 138k ±44% 141k ±23% ~ (p=0.309 n=19+20) Garbage/benchmem-MB=1024-12 159k ±25% 178k ±98% ~ (p=0.798 n=16+18) name old STW-ns/op new STW-ns/op delta Garbage/benchmem-MB=64-12 4.42k ±44% 4.24k ±23% ~ (p=0.531 n=19+20) Garbage/benchmem-MB=1024-12 591 ±24% 636 ±111% ~ (p=0.309 n=16+18) (https://perf.golang.org/search?q=upload:20180910.1) Updates #26903. Updates #17503. Change-Id: Icbd1e12b7a12a76f423c9bf033b13cb363e4cd19 Reviewed-on: https://go-review.googlesource.com/c/134318 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Rick Hudson <rlh@golang.org>
Currently, setting GODEBUG=gcrescanstacks=1 enables a debugging mode where the garbage collector re-scans goroutine stacks during mark termination. This was introduced in Go 1.8 to debug the hybrid write barrier, but I don't think we ever used it. Now it's one of the last sources of mark work during mark termination. This CL removes it. Updates #26903. This is preparation for unifying STW GC and concurrent GC. Updates #17503. Change-Id: I6ae04d3738aa9c448e6e206e21857a33ecd12acf Reviewed-on: https://go-review.googlesource.com/c/134777 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Rick Hudson <rlh@golang.org>
Now that we do no mark work during mark termination, we no longer need the gchelper mechanism. Updates #26903. Updates #17503. Change-Id: Ie94e5c0f918cfa047e88cae1028fece106955c1b Reviewed-on: https://go-review.googlesource.com/c/134785 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Rick Hudson <rlh@golang.org>
With the work on #26903, I'm declaring this done. There are a few minor optimizations that could still be done. |
One of the largest remaining contributors to GC STW time is stack rescanning. I have an approach for eliminating this entirely. This is a tracking bug for implementing this approach.
I will upload a design document and proof soon, and I have a working implementation that I plan to have cleaned up and mailed out in a day or two.
I'm marking this Go 1.9. My current plan is to get the change in for Go 1.8, but have a GODEBUG flag to fall back to the current algorithm for debugging purposes (and in case something goes wrong). Assuming things go smoothly, we'll actually rip out the stack rescanning code when Go 1.9 opens.
Edit: Design doc
Edit: Things to follow up on in Go 1.9+:
debug.gcrescanstacks
(Go 1.12, 198440c)work.helperDrainBlock
(Go 1.12, runtime: simplify mark termination and eliminate mark 2 #26903)stopTheWorldWithSema
(should happen as part of non-cooperative preemption)/cc @RLH
The text was updated successfully, but these errors were encountered: