runtime: eliminate stack rescanning #17503

Open
aclements opened this Issue Oct 18, 2016 · 44 comments

9 participants
@aclements
Member

aclements commented Oct 18, 2016

One of the largest remaining contributors to GC STW time is stack rescanning. I have an approach for eliminating this entirely. This is a tracking bug for implementing this approach.

I will upload a design document and proof soon, and I have a working implementation that I plan to have cleaned up and mailed out in a day or two.

I'm marking this Go 1.9. My current plan is to get the change in for Go 1.8, but have a GODEBUG flag to fall back to the current algorithm for debugging purposes (and in case something goes wrong). Assuming things go smoothly, we'll actually rip out the stack rescanning code when Go 1.9 opens.

Edit: Design doc

Edit: Things to follow up on in Go 1.9/1.10:

  • Remove stack rescanning [done]
  • Remove (or replace) stack barriers and delete TestStackBarrierProfiling [done]
  • Remove debug.gcrescanstacks
  • Fix early mark termination race
  • Remove work draining from mark termination and work.helperDrainBlock
  • Revisit 100us wait in stopTheWorldWithSema
  • Revisit making the second shade conditional (and the condition for channel ops)

/cc @RLH

@gopherbot

gopherbot commented Oct 18, 2016

CL https://golang.org/cl/31362 mentions this issue.

@gopherbot

gopherbot commented Oct 19, 2016

CL https://golang.org/cl/31450 mentions this issue.

@gopherbot

gopherbot commented Oct 19, 2016

CL https://golang.org/cl/31451 mentions this issue.

@gopherbot

gopherbot commented Oct 19, 2016

CL https://golang.org/cl/31369 mentions this issue.

@gopherbot

gopherbot commented Oct 19, 2016

CL https://golang.org/cl/31453 mentions this issue.

@gopherbot

gopherbot commented Oct 19, 2016

CL https://golang.org/cl/31452 mentions this issue.

@gopherbot

gopherbot commented Oct 19, 2016

CL https://golang.org/cl/31457 mentions this issue.

@gopherbot

gopherbot commented Oct 19, 2016

CL https://golang.org/cl/31454 mentions this issue.

@gopherbot

gopherbot commented Oct 19, 2016

CL https://golang.org/cl/31367 mentions this issue.

@gopherbot

gopherbot commented Oct 19, 2016

CL https://golang.org/cl/31368 mentions this issue.

@gopherbot

gopherbot commented Oct 19, 2016

CL https://golang.org/cl/31456 mentions this issue.

@gopherbot

gopherbot commented Oct 19, 2016

CL https://golang.org/cl/31455 mentions this issue.

@gopherbot

gopherbot commented Oct 19, 2016

CL https://golang.org/cl/31366 mentions this issue.

@gopherbot

gopherbot commented Oct 20, 2016

CL https://golang.org/cl/31550 mentions this issue.

@gopherbot

gopherbot commented Oct 20, 2016

CL https://golang.org/cl/31572 mentions this issue.

@gopherbot

gopherbot commented Oct 20, 2016

CL https://golang.org/cl/31570 mentions this issue.

@gopherbot

gopherbot commented Oct 20, 2016

CL https://golang.org/cl/31571 mentions this issue.

@gopherbot

gopherbot commented Oct 21, 2016

CL https://golang.org/cl/31655 mentions this issue.

@davidfbacon

davidfbacon commented Oct 21, 2016

We used the "double barrier" to address stack scanning issues in the IBM J9 implementation of Metronome. It's in section 4.3 of "Design and implementation of a comprehensive real-time java virtual machine" by Auerbach et al. In that case we were incrementalizing over many threads' stacks, as opposed to the individual stack, but it solves the same problem.

It worked well, although the extra barrier overhead was annoying. Let me know if you have any questions.

david (dfb@google.com)

@aclements

Member

aclements commented Oct 21, 2016

@davidfbacon, thanks for the reference! Indeed, that looks like the same barrier design. It's good to know that it worked well for Metronome.

I'll update the proposal document to add a citation.

gopherbot pushed a commit to golang/proposal that referenced this issue Oct 21, 2016

design: design doc for eliminating stack re-scanning
Updates golang/go#17503.

Change-Id: Ib635a49f9fde36493ba98a6d87b9c1dd114e0c7d
Reviewed-on: https://go-review.googlesource.com/31362
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
@rasky

Member

rasky commented Oct 22, 2016

@aclements any numbers on the performance impact of the new barrier? Also, are the new barriers harder to eliminate at compile time?

@aclements

Member

aclements commented Oct 23, 2016

@rasky, it's about a 1.7% performance hit on the x/benchmarks garbage benchmark (which, as the name suggests, is designed to hammer the garbage collector). I haven't checked, but I suspect we've gained more than that from other optimizations since Go 1.7.

They are harder to eliminate at compile time. I completely disabled the optimizations that don't carry over directly to the hybrid barrier and binaries got about 1% larger. We could eliminate some of these, but it requires flow analysis and the current insertion code doesn't do any flow analysis. OTOH, the places where we can eliminate write barriers with the current write barrier aren't all that common anyway (how often do you write the address of a global to something?), so we're not losing much.
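
To make the parenthetical concrete, the kind of store in question looks like the sketch below. The `settings`, `defaults`, `holder`, and `attach` names are made up for illustration. As I understand it, a barrier that only cares about the pointer being written can skip a store of a global's address, since globals are always treated as roots; the hybrid barrier also shades the value being overwritten, so that reasoning does not carry over directly.

```go
package main

import "fmt"

// settings is a hypothetical type used only for this illustration.
type settings struct{ verbose bool }

// defaults is a package-level (global) variable, so &defaults is never
// a pointer into the heap.
var defaults = settings{verbose: true}

// holder is a hypothetical heap-allocated type with one pointer field.
type holder struct {
	cfg *settings
}

// attach stores the address of a global into a heap object's pointer
// field. The new value needs no shading, but the old value of h.cfg
// might, which is why the store is harder to optimize under a barrier
// that also looks at the overwritten pointer.
func attach(h *holder) {
	h.cfg = &defaults
}

func main() {
	h := &holder{cfg: &settings{}} // old value is a heap pointer
	attach(h)
	fmt.Println(h.cfg.verbose)
}
```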

gopherbot pushed a commit that referenced this issue Oct 24, 2016

runtime: make morestack less subtle
morestack writes the context pointer to gobuf.ctxt, but since
morestack is written in assembly (and has to be very careful with
state), it does *not* invoke the requisite write barrier for this
write. Instead, we patch this up later, in newstack, where we invoke
an explicit write barrier for ctxt.

This already requires some subtle reasoning, and it's going to get a
lot hairier with the hybrid barrier.

Fix this by simplifying the whole mechanism. Instead of writing
gobuf.ctxt in morestack, just pass the value of the context register
to newstack and let it write it to gobuf.ctxt. This is a normal Go
pointer write, so it gets the normal Go write barrier. No subtle
reasoning required.

Updates #17503.

Change-Id: Ia6bf8459bfefc6828f53682ade32c02412e4db63
Reviewed-on: https://go-review.googlesource.com/31550
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
@gopherbot

gopherbot commented Oct 24, 2016

CL https://golang.org/cl/31764 mentions this issue.

@gopherbot

gopherbot commented Oct 24, 2016

CL https://golang.org/cl/31766 mentions this issue.

@gopherbot

gopherbot commented Oct 24, 2016

CL https://golang.org/cl/31765 mentions this issue.

@aclements

Member

aclements commented Oct 28, 2016

@cherrymui, I think you're right that channel operations only need the second shade if the source stack is grey and the destination stack is black. For 1.8 we're always performing the second shade (channel operations or not), which is more conservative for correctness and makes it easier to have the GODEBUG setting to fall back to the current behavior (or a superset thereof). But I'll definitely consider your observation in depth for Go 1.9 when I look at making the second shade conditional.
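
For readers following the "second shade" discussion, here is a toy model of the barrier. It is not the runtime's code: `object`, `shade`, and `writePointer` are illustrative names, and the shape follows the pseudocode in the design doc (shade the old value of the slot, then shade the new pointer, either always or only when the writing goroutine's stack is still grey).

```go
package main

import "fmt"

type color int

const (
	white color = iota // not yet reached by the collector
	grey               // reached, contents not yet scanned
	black              // reached and scanned
)

// object is a toy heap object with a single pointer slot.
type object struct {
	c    color
	next *object
}

// shade greys a white object; nil, grey, and black objects are left alone.
func shade(o *object) {
	if o != nil && o.c == white {
		o.c = grey
	}
}

// writePointer models the store *slot = ptr under a hybrid barrier.
// stackIsGrey says whether the writing goroutine's stack is unscanned.
// With alwaysShadeNew set, the second shade is unconditional, which is
// the conservative choice described above for Go 1.8.
func writePointer(slot **object, ptr *object, stackIsGrey, alwaysShadeNew bool) {
	shade(*slot) // first shade: the pointer being overwritten
	if alwaysShadeNew || stackIsGrey {
		shade(ptr) // second shade: the pointer being installed
	}
	*slot = ptr
}

func main() {
	oldObj := &object{}
	newObj := &object{}
	holder := &object{c: black, next: oldObj}

	// Go 1.8 style: both the old and the new pointer are shaded.
	writePointer(&holder.next, newObj, false, true)
	fmt.Println(oldObj.c == grey, newObj.c == grey) // prints: true true
}
```

Making the second shade conditional amounts to passing `false` for `alwaysShadeNew` in this model, so that only `stackIsGrey` decides whether the new pointer is shaded.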

@gopherbot

gopherbot commented Feb 9, 2017

CL https://golang.org/cl/36621 mentions this issue.

@gopherbot

gopherbot commented Feb 9, 2017

CL https://golang.org/cl/36619 mentions this issue.

@gopherbot

gopherbot commented Feb 9, 2017

CL https://golang.org/cl/36620 mentions this issue.

gopherbot pushed a commit that referenced this issue Feb 14, 2017

runtime: remove rescan list
With the hybrid barrier, rescanning stacks is no longer necessary so
the rescan list is no longer necessary. Remove it.

This leaves the gcrescanstacks GODEBUG variable, since it's useful for
debugging, but changes it to simply walk all of the Gs to rescan
stacks rather than using the rescan list.

We could also remove g.gcscanvalid, which is effectively a distributed
rescan list. However, it's still useful for gcrescanstacks mode and it
adds little complexity, so we'll leave it in.

Fixes #17099.
Updates #17503.

Change-Id: I776d43f0729567335ef1bfd145b75c74de2cc7a9
Reviewed-on: https://go-review.googlesource.com/36619
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>

gopherbot pushed a commit that referenced this issue Feb 14, 2017

runtime: remove stack barriers
Now that we don't rescan stacks, stack barriers are unnecessary. This
removes all of the code and structures supporting them as well as
tests that were specifically for stack barriers.

Updates #17503.

Change-Id: Ia29221730e0f2bbe7beab4fa757f31a032d9690c
Reviewed-on: https://go-review.googlesource.com/36620
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>

gopherbot pushed a commit that referenced this issue Feb 14, 2017

runtime: remove g.stackAlloc
Since we're no longer stealing space for the stack barrier array from
the stack allocation, the stack allocation is simply
g.stack.hi-g.stack.lo.

Updates #17503.

Change-Id: Id9b450ae12c3df9ec59cfc4365481a0a16b7c601
Reviewed-on: https://go-review.googlesource.com/36621
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
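
The relationship stated in the last commit message can be written out directly. This is a sketch with a hypothetical `stack` type that mirrors the `lo`/`hi` bounds named above; it is not the runtime's definition.

```go
package main

import "fmt"

// stack is a hypothetical stand-in holding the bounds of a goroutine
// stack, covering the address range [lo, hi).
type stack struct {
	lo uintptr
	hi uintptr
}

// allocSize returns the size of the stack allocation. With no space
// carved out for a stack barrier array, this is simply hi - lo.
func (s stack) allocSize() uintptr {
	return s.hi - s.lo
}

func main() {
	s := stack{lo: 0x1000, hi: 0x1000 + 8192}
	fmt.Println(s.allocSize()) // prints 8192
}
```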
@bradfitz

Member

bradfitz commented May 3, 2017

Austin, looks like this can be closed? I'm trying to close or remilestone all Go1.9Early bugs.

@bradfitz bradfitz modified the milestones: Go1.9, Go1.9Early May 3, 2017

@aclements aclements modified the milestones: Go1.10Early, Go1.9 Jun 7, 2017

@bradfitz bradfitz modified the milestones: Go1.10Early, Go1.10 Jun 14, 2017

@rsc rsc modified the milestones: Go1.10, Go1.11 Nov 22, 2017

@ianlancetaylor

Contributor

ianlancetaylor commented Jul 9, 2018

Ping @aclements : Should this issue be kept open?

@ianlancetaylor ianlancetaylor modified the milestones: Go1.11, Go1.12 Jul 9, 2018

@gopherbot

gopherbot commented Sep 10, 2018

Change https://golang.org/cl/134318 mentions this issue: runtime: eliminate mark 2 and fix mark termination race

@gopherbot

gopherbot commented Sep 11, 2018

Change https://golang.org/cl/134785 mentions this issue: runtime: eliminate gchelper mechanism

@gopherbot

gopherbot commented Sep 11, 2018

Change https://golang.org/cl/134777 mentions this issue: runtime: remove GODEBUG=gcrescanstacks=1 mode

gopherbot pushed a commit that referenced this issue Oct 2, 2018

runtime: eliminate mark 2 and fix mark termination race
The mark 2 phase was originally introduced as a way to reduce the
chance of entering STW mark termination while there was still marking
work to do. It works by flushing and disabling all local work caches
so that all enqueued work becomes immediately globally visible.
However, mark 2 is not only slow (disabling caches makes marking and
the write barrier both much more expensive) but also imperfect. There
is still a rare but possible race (~once per all.bash) that can cause
GC to enter mark termination while there is still marking work. This
race is detailed at
https://github.com/golang/proposal/blob/master/design/17503-eliminate-rescan.md#appendix-mark-completion-race
The effect of this is that mark termination must still cope with the
possibility that there may be work remaining after a concurrent mark
phase. Dealing with this increases STW pause time and increases the
complexity of mark termination.

Furthermore, a similar but far more likely race can cause early
transition from mark 1 to mark 2. This is unfortunate because the
cost of mark 2 makes performance unstable.

This CL fixes this by replacing mark 2 with a distributed termination
detection algorithm. This algorithm is correct, so it eliminates the
mark termination race, and doesn't require disabling local caches. It
ensures that there are no grey objects upon entering mark termination.
With this change, we're one step closer to eliminating marking from
mark termination entirely (it's still used by STW GC and checkmarks
mode).

This CL does not eliminate the gcBlackenPromptly global flag, though
it is always set to false now. It will be removed in a cleanup CL.

This led to only minor variations in the go1 benchmarks
(https://perf.golang.org/search?q=upload:20180909.1) and compilebench
benchmarks (https://perf.golang.org/search?q=upload:20180910.2).

This significantly improves performance of the garbage benchmark, with
no impact on STW times:

name                        old time/op    new time/op   delta
Garbage/benchmem-MB=64-12    2.21ms ± 1%   2.05ms ± 1%   -7.38% (p=0.000 n=18+19)
Garbage/benchmem-MB=1024-12  2.30ms ±16%   2.20ms ± 7%   -4.51% (p=0.001 n=20+20)

name                        old STW-ns/GC  new STW-ns/GC  delta
Garbage/benchmem-MB=64-12      138k ±44%     141k ±23%     ~    (p=0.309 n=19+20)
Garbage/benchmem-MB=1024-12    159k ±25%     178k ±98%     ~    (p=0.798 n=16+18)

name                        old STW-ns/op  new STW-ns/op  delta
Garbage/benchmem-MB=64-12     4.42k ±44%    4.24k ±23%     ~    (p=0.531 n=19+20)
Garbage/benchmem-MB=1024-12     591 ±24%      636 ±111%    ~    (p=0.309 n=16+18)

(https://perf.golang.org/search?q=upload:20180910.1)

Updates #26903.
Updates #17503.

Change-Id: Icbd1e12b7a12a76f423c9bf033b13cb363e4cd19
Reviewed-on: https://go-review.googlesource.com/c/134318
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
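
The general idea behind flush-based termination detection can be sketched as a single-threaded toy. This is an assumption-laden illustration, not the algorithm in the CL: `worker`, `flush`, `markDone`, and `drain` are hypothetical names, there is no real concurrency or write barrier, and the only point shown is "keep local caches enabled, flush them all, and declare termination only when a round finds nothing to flush and no global work".

```go
package main

import "fmt"

// worker is a hypothetical stand-in for a P with a local work cache.
type worker struct {
	local []int // locally cached work items (think: grey objects)
}

// flush moves the worker's local work to the global queue and reports
// whether anything was moved.
func (w *worker) flush(global *[]int) bool {
	if len(w.local) == 0 {
		return false
	}
	*global = append(*global, w.local...)
	w.local = w.local[:0]
	return true
}

// markDone flushes every worker and reports whether marking has
// terminated: no worker had local work and the global queue is empty.
func markDone(workers []*worker, global *[]int) bool {
	flushed := false
	for _, w := range workers {
		if w.flush(global) {
			flushed = true
		}
	}
	return !flushed && len(*global) == 0
}

// drain processes all global work. In this toy, processing item n > 0
// creates item n-1 in the local cache of worker n % len(workers).
func drain(workers []*worker, global *[]int) {
	for len(*global) > 0 {
		n := (*global)[len(*global)-1]
		*global = (*global)[:len(*global)-1]
		if n > 0 {
			w := workers[n%len(workers)]
			w.local = append(w.local, n-1)
		}
	}
}

func main() {
	workers := []*worker{{}, {}, {}}
	global := []int{3}
	rounds := 0
	for !markDone(workers, &global) {
		drain(workers, &global)
		rounds++
	}
	fmt.Println("rounds:", rounds, "remaining global work:", len(global))
}
```

Unlike mark 2 as described above, nothing here disables the local caches; a detection round that finds flushed work simply goes back to draining and tries again.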

gopherbot pushed a commit that referenced this issue Oct 2, 2018

runtime: remove GODEBUG=gcrescanstacks=1 mode
Currently, setting GODEBUG=gcrescanstacks=1 enables a debugging mode
where the garbage collector re-scans goroutine stacks during mark
termination. This was introduced in Go 1.8 to debug the hybrid write
barrier, but I don't think we ever used it.

Now it's one of the last sources of mark work during mark termination.
This CL removes it.

Updates #26903. This is preparation for unifying STW GC and concurrent
GC.

Updates #17503.

Change-Id: I6ae04d3738aa9c448e6e206e21857a33ecd12acf
Reviewed-on: https://go-review.googlesource.com/c/134777
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>

gopherbot pushed a commit that referenced this issue Oct 2, 2018

runtime: eliminate gchelper mechanism
Now that we do no mark work during mark termination, we no longer need
the gchelper mechanism.

Updates #26903.
Updates #17503.

Change-Id: Ie94e5c0f918cfa047e88cae1028fece106955c1b
Reviewed-on: https://go-review.googlesource.com/c/134785
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>