runtime: non-cooperative goroutine preemption #24543
Comments
aclements added this to the Go1.12 milestone Mar 26, 2018
aclements self-assigned this Mar 26, 2018
gopherbot added the Proposal label Mar 26, 2018
gopherbot commented Mar 26, 2018
Change https://golang.org/cl/102600 mentions this issue.
gopherbot commented Mar 26, 2018
Change https://golang.org/cl/102603 mentions this issue.
gopherbot commented Mar 26, 2018
Change https://golang.org/cl/102604 mentions this issue.
Forwarding some questions from @hyangah on the CL:

All of cgo is currently considered a safe-point (one of the reasons it's relatively expensive to enter and exit cgo), and this won't change.

I don't think the runtime can avoid sending signals to threads that may be in cgo without expensive synchronization on common paths, but I don't think it matters. When the runtime signal handler runs, it can recognize that the thread was in cgo and do the appropriate thing (which will probably be to just ignore the signal, or maybe queue up an action like stack scanning).

It should be okay if cgo code uses the signal, as long as it's correctly chained. I'm hoping to use POSIX real-time signals on systems where they're available, so the runtime will attempt to find one that's unused (which is usually all of them anyway), though that isn't an option on Darwin.

And a question from @randall77 (which I answered on the CL, but should have answered here):

There's really no cost to the current technique, and we'll continue to rely on it in the runtime for the foreseeable future, so my current plan is to leave it in. However, we could be much more aggressive about removing stack bounds checks (for example, if we can prove that a whole call tree will fit in the nosplit zone).
So it is still possible to make a goroutine non-preemptible with something like:
Yes, that would still make a goroutine non-preemptible. However, with some extra annotations in the assembly to indicate registers containing pointers, it will become preemptible without any extra work or run-time overhead to reach an explicit safe-point. I'll add a paragraph to the design doc about this.
Will the design doc be posted here?
The design doc is under review here: https://golang.org/cl/102600
(As a reminder, please only post editing comments to the CL itself and keep technical discussion on the GitHub issue.)
pushed a commit to golang/proposal that referenced this issue Mar 28, 2018
The doc is now submitted: Proposal: Non-cooperative goroutine preemption
mtstickney commented Mar 30, 2018
Disclaimer: I'm not a platform expert, or an expert on language implementations, or involved with Go aside from having written a few toy programs in it. That said, there's a (potentially) fatal flaw here: as some old notes for SBCL point out, Windows has no working version of preemptive signals without loading a kernel driver, which is generally prohibitive for applications.
JamesBielby commented Mar 31, 2018
I think the example code to avoid creating a past-the-end pointer has a problem if the slice has a capacity of 0. You need to declare _p after the first if statement.
creker commented Mar 31, 2018 (edited)
@mtstickney Looks like that's true, but we can look at other implementations and how they go about the same problem. CoreCLR talks about the same problem: they also need to preempt threads for GC, they describe the same bugs with wrong thread contexts, and they also talk about how they solve it. I'm not an expert in this kind of stuff, so I'm sorry if this has nothing to do with solving the problem here.
mtstickney commented Mar 31, 2018
@creker Nor me, so we're in the same boat there. I hadn't seen the CoreCLR reference before, but that's the same idea as the Lisp approach: the trick is capturing the original register set, which you can do either with an OS primitive or from a signal handler. It looks like on some Windows versions, some of the time, you can detect and avoid the race conditions involved.
Thanks for the pointers.

For GC preemption, we can always resume the same goroutine on the same thread after preemption, so there's no need to call into those resume mechanisms.

For scheduler preemption, things are a bit more complicated, but I think still okay.
aclements added the Proposal-Accepted label Apr 9, 2018
gopherbot commented Apr 20, 2018
Change https://golang.org/cl/108497 mentions this issue.
gopherbot commented Apr 20, 2018
Change https://golang.org/cl/108496 mentions this issue.
gopherbot commented Apr 20, 2018
Change https://golang.org/cl/108498 mentions this issue.
pushed a commit that referenced this issue Apr 20, 2018
pushed a commit that referenced this issue Apr 20, 2018
pushed a commit that referenced this issue Apr 23, 2018
bradfitz referenced this issue Apr 24, 2018: crypto/elliptic: hang in doubleJacobian with Curve P-521 #25054 (Closed)
gopherbot commented Apr 25, 2018
Change https://golang.org/cl/109351 mentions this issue.
pushed a commit that referenced this issue May 22, 2018
pushed a commit that referenced this issue May 22, 2018
pushed a commit that referenced this issue May 22, 2018
wsc1 commented Sep 30, 2018
Hi all, a question about this design. First, I've only read it at a high level, and haven't followed the use cases where cooperative preemption is problematic. But the big-picture question I have is: what about code which depends on cooperative preemption?

I can give an example where non-cooperative preemption might be problematic, from a project I am working on. The context is communication with an OS/C thread via atomics rather than cgo, where timing is very sensitive: real-time deadline misses will render things useless (audio I/O). If some code currently controls preemption via spinning and runtime.Gosched(), then it seems to me this proposal will break that code, because it will introduce preemption and hence delay the thing which is programmed not to be preempted.

Again, there is no way for us to know how much such code there is, and without an assessment of that, it seems to me this proposal risks entering a game of whack-a-mole w.r.t. Go scheduling, where you solve one problem and as a result another pops up. Please don't take away programmer control of preemption.

Last question: what good could runtime.Gosched() possibly serve with per-instruction preemption? Again, sorry I don't know the details of this proposal, but that might be the case with a lot of other code that uses runtime.Gosched() under the assumption of cooperative preemption.
alikhil referenced this issue Oct 16, 2018: Benchmark parallel and sequential image proccesing #32 (Closed)
foubarre referenced this issue Oct 19, 2018: calling runtime.ReadMemStats freezes runtime while any goroutines are in a non pre-emptable tight loop #28289 (Closed)
wsc1 referenced this issue Oct 25, 2018: cmd/go, cmd/compile: record language version to support language transitions #28221 (Closed)
networkimprov commented Oct 25, 2018
@wsc1 Can you provide a code example which breaks without cooperative scheduling?
wsc1 commented Oct 25, 2018 (edited)
@networkimprov please see
@wsc1 |
wsc1 commented Oct 26, 2018
@crvv Testing audio latency violations is hard. A test isn't expected to agree except on the same hardware under the same operating conditions, or under OS simulation, and the scheduler random seed can come into play. Re-creating those conditions to placate the interests of preemptive scheduling is not my job, although I'm happy to help along the way. No one said what hardware or OS operating conditions to use either.

There are also a myriad of reasons, documented by audio processing professionals, why preemption in a real-time audio processing loop would cause problems. The Go runtime uses locks, and its memory management can lock the system. These things are widely accepted as causes of glitches in real-time audio, because they can take longer than the wall-clock time allocated to a low-latency application. It is also widely accepted that the glitches "will eventually" happen, meaning that it is very hard to create a test in which they happen on demand, because it depends on the state of the whole system (OS, hardware, Go runtime).

I do not find it credible to quote out of context against the grain of best practice in the field. You could also provide a reason why you want a test on the GitHub issue tracker. Do you believe that the worst-case wall-clock time of preemption work inserted into a critical segment of real-time audio processing code shouldn't cause problems? Why? To me, the burden of proof lies there. On my end, I'll continue to provide what info and alternatives I can to help inform the discussion.
ianlancetaylor changed the title from "proposal: runtime: non-cooperative goroutine preemption" to "runtime: non-cooperative goroutine preemption" Dec 21, 2018
aclements modified the milestones: Go1.12 → Go1.13 Jan 8, 2019
Non-cooperative preemption as proposed shouldn't introduce any more overhead or jitter than you would already see from any general-purpose operating system handling hardware and timer interrupts. If the system is not overloaded, which is already a requirement for such work, then it will simply resume from the preemption after perhaps a few thousand instructions. And we could probably optimize the scheduler to not even attempt preemption if it would be a no-op. On the flip side, tight atomic loops like you describe currently run the very real risk of deadlocking the entire Go system, which would certainly violate any latency target.
There's quite a lot of evidence that cooperative preemption has caused a large number of issues for many users (see #543, #12553, #13546, #14561, #15442, #17174, #20793, #21053 for a sampling).
ayanamist commented Jan 18, 2019
@aclements It's easy to write a micro program that shows this infinite loop, like all the issues you mentioned, but hard to find a real case in a larger project. I cannot see what useful purpose such a goroutine, running without calling any non-inlinable functions, could serve.
@ayanamist This bug has been reported multiple times based on real code in real projects, including cases mentioned in the issues cited above. It's true that the cut-down versions look like unimportant toys, and it's true that the root cause is often a bug in the original code, but this is still surprising behavior in the language that has tripped up many people over the years.
I'd just like to add some data to this discussion. I gave a talk at several conferences [1,2,3,4,5] about this topic and collected some user experiences and feedback that can be summarized as:
I believe this should be addressed, as long as some way to annotate code as not preemptible is provided (surprises should be opt-in, and not opt-out).
wsc1 commented Jan 21, 2019 (edited)
Thanks very much for the well-balanced data. I wanted to especially emphasize agreement with your last statement, because not allowing un-preemptible code has real costs of its own. I am not arguing that those costs are more important to take into account than the problems caused by the surprise, just emphasizing that a way to annotate code as not preemptible should be a requirement.
wsc1 commented Jan 21, 2019
I would like to ask that the title of this issue be changed so that it is not in the form of an opinion about the solution but rather in the form of the stated problems. I think that might help guide the discussion toward consensus, as there is plenty of evidence of a lack of consensus on preemptive vs. cooperative scheduling. For example, "scheduling improvements" may be a less divisive choice.
gopherbot commented Jan 21, 2019
Change https://golang.org/cl/158857 mentions this issue.
gopherbot commented Jan 21, 2019
Change https://golang.org/cl/158861 mentions this issue.
gopherbot commented Jan 21, 2019
Change https://golang.org/cl/158858 mentions this issue.
gopherbot commented Jan 21, 2019
Change https://golang.org/cl/158859 mentions this issue.
gopherbot commented Jan 21, 2019
Change https://golang.org/cl/158860 mentions this issue.
aclements commented Mar 26, 2018 (edited)
I propose that we solve #10958 (preemption of tight loops) using non-cooperative preemption techniques. I have a detailed design proposal, which I will post shortly. This issue will track this specific implementation approach, as opposed to the general problem.
Edit: Design doc
Go currently uses compiler-inserted cooperative preemption points in function prologues. The majority of the time, this is good enough to allow Go developers to ignore preemption and focus on writing clear parallel code, but it has sharp edges that we've seen degrade the developer experience time and time again. When it goes wrong, it goes spectacularly wrong, leading to mysterious system-wide latency issues (#17831, #19241) and sometimes complete freezes (#543, #12553, #13546, #14561, #15442, #17174, #20793, #21053). And because this is a language implementation issue that exists outside of Go's language semantics, these failures are surprising and very difficult to debug.
@dr2chase has put significant effort into prototyping cooperative preemption points in loops, which is one way to solve this problem. However, even sophisticated approaches to this led to unacceptable slow-downs in tight loops (where slow-downs are generally least acceptable).
I propose that the Go implementation switch to non-cooperative preemption using stack and register maps at (essentially) every instruction. This would allow goroutines to be preempted without explicit preemption checks. This approach will solve the problem of delayed preemption with zero run-time overhead and have side benefits for debugger function calls (#21678).
I've already prototyped significant components of this solution, including constructing register maps and recording stack and register maps at every instruction and so far the results are quite promising.
/cc @drchase @RLH @randall77 @minux