Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: eliminate stack rescanning #17503

Closed
5 of 7 tasks
aclements opened this issue Oct 18, 2016 · 45 comments
Closed
5 of 7 tasks

runtime: eliminate stack rescanning #17503

aclements opened this issue Oct 18, 2016 · 45 comments
Labels
early-in-cycle A change that should be done early in the 3 month dev cycle. FrozenDueToAge
Milestone

Comments

@aclements
Copy link
Member

aclements commented Oct 18, 2016

One of the largest remaining contributors to GC STW time is stack rescanning. I have an approach for eliminating this entirely. This is a tracking bug for implementing this approach.

I will upload a design document and proof soon, and I have a working implementation that I plan to have cleaned up and mailed out in a day or two.

I'm marking this Go 1.9. My current plan is to get the change in for Go 1.8, but have a GODEBUG flag to fall back to the current algorithm for debugging purposes (and in case something goes wrong). Assuming things go smoothly, we'll actually rip out the stack rescanning code when Go 1.9 opens.

Edit: Design doc

Edit: Things to follow up on in Go 1.9+:

/cc @RLH

@gopherbot
Copy link
Contributor

CL https://golang.org/cl/31362 mentions this issue.

@gopherbot
Copy link
Contributor

CL https://golang.org/cl/31450 mentions this issue.

@gopherbot
Copy link
Contributor

CL https://golang.org/cl/31451 mentions this issue.

@gopherbot
Copy link
Contributor

CL https://golang.org/cl/31369 mentions this issue.

@gopherbot
Copy link
Contributor

CL https://golang.org/cl/31453 mentions this issue.

@gopherbot
Copy link
Contributor

CL https://golang.org/cl/31452 mentions this issue.

@gopherbot
Copy link
Contributor

CL https://golang.org/cl/31457 mentions this issue.

@gopherbot
Copy link
Contributor

CL https://golang.org/cl/31454 mentions this issue.

@gopherbot
Copy link
Contributor

CL https://golang.org/cl/31367 mentions this issue.

@gopherbot
Copy link
Contributor

CL https://golang.org/cl/31368 mentions this issue.

@gopherbot
Copy link
Contributor

CL https://golang.org/cl/31456 mentions this issue.

@gopherbot
Copy link
Contributor

CL https://golang.org/cl/31455 mentions this issue.

@gopherbot
Copy link
Contributor

CL https://golang.org/cl/31366 mentions this issue.

@gopherbot
Copy link
Contributor

CL https://golang.org/cl/31550 mentions this issue.

@gopherbot
Copy link
Contributor

CL https://golang.org/cl/31572 mentions this issue.

@gopherbot
Copy link
Contributor

CL https://golang.org/cl/31570 mentions this issue.

@gopherbot
Copy link
Contributor

CL https://golang.org/cl/31571 mentions this issue.

@gopherbot
Copy link
Contributor

CL https://golang.org/cl/31655 mentions this issue.

@davidfbacon
Copy link

We used the "double barrier" to address stack scanning issues in the IBM J9 implementation of Metronome. It's in section 4.3 of "Design and implementation of a comprehensive real-time java virtual machine" by Auerbach et al, section 4.3. In that case we were incrementalizing over many thread's stacks, as opposed to the individual stack, but it solves the same problem.

It worked well, although the extra barrier overhead was annoying. Let me know if you have any questions.

david (dfb@google.com)

@aclements
Copy link
Member Author

@davidfbacon, thanks for the reference! Indeed, that looks like the same barrier design. It's good to know that it worked well for Metronome.

I'll update the proposal document to add a citation.

gopherbot pushed a commit to golang/proposal that referenced this issue Oct 21, 2016
Updates golang/go#17503.

Change-Id: Ib635a49f9fde36493ba98a6d87b9c1dd114e0c7d
Reviewed-on: https://go-review.googlesource.com/31362
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
@rasky
Copy link
Member

rasky commented Oct 22, 2016

@aclements any numbers on the performance impact of the new barrier? Also, are they harder to eliminate at compile time?

@aclements
Copy link
Member Author

@rasky, it's about a 1.7% performance hit on the x/benchmarks garbage benchmark (which, as the name suggests, is designed to hammer the garbage collector). I haven't checked, but I suspect we've gained more than that from other optimizations since Go 1.7.

They are harder to eliminate at compile time. I completely disabled the optimizations that don't carry over directly to the hybrid barrier and binaries got about 1% larger. We could eliminate some of these, but it requires flow analysis and the current insertion code doesn't do any flow analysis. OTOH, the places where we can eliminate write barriers with the current write barrier aren't all that common anyway (how often do you write the address of a global to something?), so we're not losing much.

gopherbot pushed a commit that referenced this issue Oct 24, 2016
morestack writes the context pointer to gobuf.ctxt, but since
morestack is written in assembly (and has to be very careful with
state), it does *not* invoke the requisite write barrier for this
write. Instead, we patch this up later, in newstack, where we invoke
an explicit write barrier for ctxt.

This already requires some subtle reasoning, and it's going to get a
lot hairier with the hybrid barrier.

Fix this by simplifying the whole mechanism. Instead of writing
gobuf.ctxt in morestack, just pass the value of the context register
to newstack and let it write it to gobuf.ctxt. This is a normal Go
pointer write, so it gets the normal Go write barrier. No subtle
reasoning required.

Updates #17503.

Change-Id: Ia6bf8459bfefc6828f53682ade32c02412e4db63
Reviewed-on: https://go-review.googlesource.com/31550
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
@gopherbot
Copy link
Contributor

CL https://golang.org/cl/31764 mentions this issue.

@gopherbot
Copy link
Contributor

CL https://golang.org/cl/31766 mentions this issue.

@gopherbot
Copy link
Contributor

CL https://golang.org/cl/31765 mentions this issue.

@bradfitz
Copy link
Contributor

bradfitz commented May 3, 2017

Austin, looks like this can be closed? I'm trying to close or remilestone all Go1.9Early bugs.

@bradfitz bradfitz modified the milestones: Go1.9, Go1.9Early May 3, 2017
@aclements aclements modified the milestones: Go1.10Early, Go1.9 Jun 7, 2017
@bradfitz bradfitz added early-in-cycle A change that should be done early in the 3 month dev cycle. and removed early-in-cycle A change that should be done early in the 3 month dev cycle. labels Jun 14, 2017
@bradfitz bradfitz modified the milestones: Go1.10Early, Go1.10 Jun 14, 2017
@rsc rsc modified the milestones: Go1.10, Go1.11 Nov 22, 2017
@ianlancetaylor
Copy link
Contributor

Ping @aclements : Should this issue be kept open?

@ianlancetaylor ianlancetaylor modified the milestones: Go1.11, Go1.12 Jul 9, 2018
@gopherbot
Copy link
Contributor

Change https://golang.org/cl/134318 mentions this issue: runtime: eliminate mark 2 and fix mark termination race

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/134785 mentions this issue: runtime: eliminate gchelper mechanism

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/134777 mentions this issue: runtime: remove GODEBUG=gcrescanstacks=1 mode

gopherbot pushed a commit that referenced this issue Oct 2, 2018
The mark 2 phase was originally introduced as a way to reduce the
chance of entering STW mark termination while there was still marking
work to do. It works by flushing and disabling all local work caches
so that all enqueued work becomes immediately globally visible.
However, mark 2 is not only slow–disabling caches makes marking and
the write barrier both much more expensive–but also imperfect. There
is still a rare but possible race (~once per all.bash) that can cause
GC to enter mark termination while there is still marking work. This
race is detailed at
https://github.com/golang/proposal/blob/master/design/17503-eliminate-rescan.md#appendix-mark-completion-race
The effect of this is that mark termination must still cope with the
possibility that there may be work remaining after a concurrent mark
phase. Dealing with this increases STW pause time and increases the
complexity of mark termination.

Furthermore, a similar but far more likely race can cause early
transition from mark 1 to mark 2. This is unfortunate because it
causes performance instability because of the cost of mark 2.

This CL fixes this by replacing mark 2 with a distributed termination
detection algorithm. This algorithm is correct, so it eliminates the
mark termination race, and doesn't require disabling local caches. It
ensures that there are no grey objects upon entering mark termination.
With this change, we're one step closer to eliminating marking from
mark termination entirely (it's still used by STW GC and checkmarks
mode).

This CL does not eliminate the gcBlackenPromptly global flag, though
it is always set to false now. It will be removed in a cleanup CL.

This led to only minor variations in the go1 benchmarks
(https://perf.golang.org/search?q=upload:20180909.1) and compilebench
benchmarks (https://perf.golang.org/search?q=upload:20180910.2).

This significantly improves performance of the garbage benchmark, with
no impact on STW times:

name                        old time/op    new time/op   delta
Garbage/benchmem-MB=64-12    2.21ms ± 1%   2.05ms ± 1%   -7.38% (p=0.000 n=18+19)
Garbage/benchmem-MB=1024-12  2.30ms ±16%   2.20ms ± 7%   -4.51% (p=0.001 n=20+20)

name                        old STW-ns/GC  new STW-ns/GC  delta
Garbage/benchmem-MB=64-12      138k ±44%     141k ±23%     ~    (p=0.309 n=19+20)
Garbage/benchmem-MB=1024-12    159k ±25%     178k ±98%     ~    (p=0.798 n=16+18)

name                        old STW-ns/op  new STW-ns/op                delta
Garbage/benchmem-MB=64-12     4.42k ±44%    4.24k ±23%     ~    (p=0.531 n=19+20)
Garbage/benchmem-MB=1024-12     591 ±24%      636 ±111%    ~    (p=0.309 n=16+18)

(https://perf.golang.org/search?q=upload:20180910.1)

Updates #26903.
Updates #17503.

Change-Id: Icbd1e12b7a12a76f423c9bf033b13cb363e4cd19
Reviewed-on: https://go-review.googlesource.com/c/134318
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
gopherbot pushed a commit that referenced this issue Oct 2, 2018
Currently, setting GODEBUG=gcrescanstacks=1 enables a debugging mode
where the garbage collector re-scans goroutine stacks during mark
termination. This was introduced in Go 1.8 to debug the hybrid write
barrier, but I don't think we ever used it.

Now it's one of the last sources of mark work during mark termination.
This CL removes it.

Updates #26903. This is preparation for unifying STW GC and concurrent
GC.

Updates #17503.

Change-Id: I6ae04d3738aa9c448e6e206e21857a33ecd12acf
Reviewed-on: https://go-review.googlesource.com/c/134777
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
gopherbot pushed a commit that referenced this issue Oct 2, 2018
Now that we do no mark work during mark termination, we no longer need
the gchelper mechanism.

Updates #26903.
Updates #17503.

Change-Id: Ie94e5c0f918cfa047e88cae1028fece106955c1b
Reviewed-on: https://go-review.googlesource.com/c/134785
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
@aclements
Copy link
Member Author

With the work on #26903, I'm declaring this done. There are a few minor optimizations that could still be done.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
early-in-cycle A change that should be done early in the 3 month dev cycle. FrozenDueToAge
Projects
None yet
Development

No branches or pull requests

9 participants