Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: stopped Ms can't become dedicated or fractional GC workers #44313

Open
prattmic opened this issue Feb 16, 2021 · 8 comments
Open

runtime: stopped Ms can't become dedicated or fractional GC workers #44313

prattmic opened this issue Feb 16, 2021 · 8 comments

Comments

@prattmic
Copy link
Member

@prattmic prattmic commented Feb 16, 2021

When a GC starts, Ms are enlisted to run a dedicated or fractional GC worker via a call to findRunnableGCWorker early in schedule.

This is problematic for Ms blocked in findrunnable, which never calls findRunnableGCWorker, and thus cannot run dedicated or fractional GC workers. findrunnable can run idle-priority GC workers, and thus the GC work should get done regardless, but this is going to become particularly problematic if idle-priority GC is removed (#44163).

The typical flow should look something like:

  1. Normal goroutine calls gcStart.
  2. During the process of starting GC, startTheWorldWithSema calls wakep, waking an M stopped in findrunnable.
  3. The woken M runs a GC mark worker at idle-priority.
  4. The M from (1) eventually enters the scheduler and calls findRunnableGCWorker, running a dedicated GC mark worker, and then calls wakep from schedule, waking another M stopped in findrunnable.
  5. The woken M runs a GC mark worker at idle-priority.

If the Ms at (3) and (5) are not spinning (which they won't be if it will be fully stopped), I don't believe there will be anymore wakep calls, thus limiting the total number of workers to a maximum of three.

FWIW, I don't think it should be fundamentally difficult to move a findRunnableGCWorker check into findrunnable.

This is somewhat related to #39004, another case of a source of work not reachable from findrunnable (though in that case there is no wakep at all).

cc @mknyszek @aclements

@prattmic prattmic added this to the Go1.17 milestone Feb 16, 2021
@prattmic prattmic self-assigned this Feb 16, 2021
@gopherbot
Copy link

@gopherbot gopherbot commented Apr 7, 2021

Change https://golang.org/cl/307914 mentions this issue: runtime: refactor findrunnable spinning recheck

Loading

@gopherbot
Copy link

@gopherbot gopherbot commented Apr 7, 2021

Change https://golang.org/cl/307913 mentions this issue: runtime: refactor work stealing to dedicated function

Loading

@gopherbot
Copy link

@gopherbot gopherbot commented Apr 7, 2021

Change https://golang.org/cl/307910 mentions this issue: runtime: move delta computation closer to use

Loading

@gopherbot
Copy link

@gopherbot gopherbot commented Apr 7, 2021

Change https://golang.org/cl/307911 mentions this issue: runtime: remove redudant tryWakeP component

Loading

@gopherbot
Copy link

@gopherbot gopherbot commented Apr 8, 2021

Change https://golang.org/cl/307912 mentions this issue: runtime: clarify which work needs spinning coordination

Loading

gopherbot pushed a commit that referenced this issue Apr 16, 2021
findrunnable has a couple places where delta is recomputed from a new
pollUntil value. This proves to be a pain in refactoring, as it is easy
to forget to do properly.

Move computation of delta closer to its use, where it is more logical
anyways.

This CL should have no functional changes.

For #43997.
For #44313.

Change-Id: I89980fd7f40f8a4c56c7540cae03ff99e12e1422
Reviewed-on: https://go-review.googlesource.com/c/go/+/307910
Trust: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
gopherbot pushed a commit that referenced this issue Apr 16, 2021
Here tryWakeP can't already be true, so there is no need to combine the
values.

This CL should have no functional changes.

For #43997.
For #44313.

Change-Id: I640c7bb88a5f70c8d22f89f0b5b146b3f60c0136
Reviewed-on: https://go-review.googlesource.com/c/go/+/307911
Trust: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
gopherbot pushed a commit that referenced this issue Apr 16, 2021
The overview comments discuss readying goroutines, which is the most
common source of work, but timers and idle-priority GC work also require
the same synchronization w.r.t. spinning Ms.

This CL should have no functional changes.

For #43997
Updates #44313

Change-Id: I7910a7f93764dde07c3ed63666277eb832bf8299
Reviewed-on: https://go-review.googlesource.com/c/go/+/307912
Trust: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
gopherbot pushed a commit that referenced this issue Apr 20, 2021
findrunnable has grown very large and hard to follow over the years.
Parts we can split out into logical chunks should help make it more
understandable and easier to change in the future.

The work stealing loop is one such big chunk that is fairly trivial to
split out into its own function, and even has the advantage of
simplifying control flow by removing a goto around work stealing.

This CL should have no functional changes.

For #43997.
For #44313.

Change-Id: Ie69670c7bc60bd6c114e860184918717829adb22
Reviewed-on: https://go-review.googlesource.com/c/go/+/307913
Trust: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Chris Hines <chris.cs.guy@gmail.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
gopherbot pushed a commit that referenced this issue Apr 20, 2021
Break the main components of the findrunnable spinning -> non-spinning
recheck out into their own functions, which simplifies both findrunnable
and the new functions, which can make use of fancy features like early
returns.

This CL should have no functional changes.

For #43997
For #44313

Change-Id: I6d3060fcecda9920a3471ff338f73d53b1d848a3
Reviewed-on: https://go-review.googlesource.com/c/go/+/307914
Trust: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
@mknyszek
Copy link
Contributor

@mknyszek mknyszek commented May 24, 2021

I don't think anything else is going to be done here in this release. @prattmic does that sound right? Optimistically kicking to 1.18.

Loading

@mknyszek mknyszek removed this from the Go1.17 milestone May 24, 2021
@mknyszek mknyszek added this to the Go1.18 milestone May 24, 2021
@prattmic
Copy link
Member Author

@prattmic prattmic commented Jun 1, 2021

Agreed. I don't think this warrants a backport/late fix.

Loading

@mknyszek
Copy link
Contributor

@mknyszek mknyszek commented Nov 10, 2021

It's probably too late to do anything here for this cycle. Moving this to backlog since it was already moved back one cycle. Feel free to move it back into 1.19 if you plan to do some work on it.

Loading

@mknyszek mknyszek removed this from the Go1.18 milestone Nov 10, 2021
@mknyszek mknyszek added this to the Backlog milestone Nov 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
3 participants