runtime,iter: the interaction of Pull with runtime.LockOSThread is currently buggy, and generally not well-defined #65889
Comments
CC @rsc |
just to get this right, as I'm often working on system-related Go code dealing with Linux kernel namespaces that needs OS-level thread locking: the caller creating the iterator and asking to iterate needs to lock the thread, as the iterator itself cannot, correct? Is there a way for the iterator to ensure its caller has locked the thread? |
I worry this is too restrictive, as some libraries call runtime.LockOSThread. A few examples I have found:
Admittedly I haven't found a ton of cases like this, but from a user perspective it seems like a very orthogonal limitation to for loop iterators. |
Correct. The caller creating the iterator and any caller of next may lock the thread; the iterator goroutine itself may not. |
I'm not sure I understand. The
That's a good point. You might not be in control of, or be aware of, what the library is doing.

We could also punt the question, pick a more restrictive implementation to begin with, and see what happens in practice. We can always backwards-compatibly lift the restriction in the future.

Thinking out loud, could we have a slow implementation, and only fall back to it if the iterator goroutine locks itself to the thread? That might be OK? However, we then have a bit of a "worst of both worlds" situation. Not only do you have a performance cliff (it's just in a different place now), you still have the pitfall wherein the iterator goroutine (not locked) may see thread state the code author didn't expect it to. |
slow would be better than panic. |
I was confused as well initially. The problem here is that
|
IMO, it's better to start with something more restrictive (such as panicking for now), and then lift some of those restrictions over time as issues arise and solutions become clearer. |
@aclements, @cherrymui, @dr2chase, and I discussed this offline a little bit more, and if hypothetically panics are off the table, I think we have an idea of what the implementation would have to look like.

If the calling goroutine is locked but the iterator goroutine isn't, that iterator goroutine can share the thread with the calling goroutine. If the iterator goroutine calls runtime.LockOSThread, we fall back to the channel-based implementation. Even if the calling goroutine is not locked to a thread, for simplicity, we'll always fall back to the channel-based implementation in that case.

In short: the iterator goroutine can share threads with thread-locked goroutines, but the calling goroutine cannot. The reason for this asymmetry is that you can pass the iterator goroutine's context to another thread, and it would be difficult to support a calling goroutine being able to share a thread with the iterator goroutine when the iterator goroutine is locked specifically. (The calling goroutine would have to be able to force itself to switch threads to the iterator goroutine's thread. This might require a substantial amount of work in the scheduler.) We suspect the iterator wanting to lock itself to a thread is less common in general than the calling goroutine being locked to a thread.

There's still a performance pitfall here, but it hopefully shouldn't show up most of the time. That is, if a thread-locked goroutine is just iterating over, say, some straightforward all-in-Go data structures with iter.Pull, it will still get the fast path. |
What happens when the iterator goroutine is switched? Will this program successfully hand over the OS main thread to a different goroutine?

```go
package main
import (
"fmt"
"iter"
"runtime"
)
func init() {
// Lock main thread to main goroutine.
runtime.LockOSThread()
}
func main() {
next, _ := iter.Pull(switchThread)
// Pass thread to iterator, and get back a different
// thread.
next()
// Continue on a different, non-main thread.
fmt.Println("successfully passed thread")
}
func switchThread(yield func(struct{}) bool) {
// Unblock the waiting pull iterator with different
// thread.
go yield(struct{}{})
// Perform main-thread work.
fmt.Println("received thread")
}
```
|
My understanding is that this is what would happen |
I don't understand how
But to your argument the proposal also states
which is not necessarily true with the |
I was talking about @mknyszek's comment that immediately precedes yours. |
And of course, as long as the iterator goroutine runs on the locked main thread, I can force the runtime to let me keep it:

```go
package main
import (
"fmt"
"iter"
"runtime"
)
/*
extern void switchBack(void);

static void main_thread_api(void) {
	switchBack();
	call_main_thread_api_that_never_returns();
}
*/
import "C"
func init() {
runtime.LockOSThread()
}
func main() {
next, _ := iter.Pull(switchThread)
// Runs on the main thread.
next()
// Continues on some other thread.
fmt.Println("successfully passed thread")
}
//export switchBack
func switchBack() {
// Runs in a Cgo callback which effectively locks the main
// thread so the runtime can't give it back through the call
// to yield.
go yieldMain(struct{}{})
}
var yieldMain func(struct{}) bool
func switchThread(yield func(struct{}) bool) {
yieldMain = yield
fmt.Println("received main thread, running main thread API")
C.main_thread_api()
}
```
|
Which states "the iterator goroutine can share threads with thread-locked goroutines, but the calling goroutine cannot",
which to me seems to support my use-case: my calling goroutine is locked to the main thread, the iterator goroutine isn't, so they can (will?) share threads. |
My understanding is that the iterator goroutine can run on the locked thread but it isn't locked to it. The scheduler can still migrate it to a different thread if it wants to. So you don't have any guarantees as to which thread will execute any of the instructions in switchThread.

It would be weird if it worked the way you are proposing; it would mean that in general people writing iterators shouldn't pass yield to other goroutines.

To be honest, I'm now a bit worried about this thread sharing, @mknyszek: what happens if the body of |
|
I see. Your analysis makes sense, but then I'm surprised that the iterator does not run on the same thread as the caller, regardless of iter.Pull:

```go
package main
import "runtime"
func main() {
runtime.LockOSThread()
it := func(yield func() bool) {
// Surprisingly, this body of code may run on
// a completely different thread.
}
for range it {
}
}
```
|
No, iter.Pull would be what makes the difference, IIUC. If you are using a plain for range loop, the range function runs on the same goroutine as its caller, so it stays on the locked thread. |
As @aarzilli says, this does not affect plain for range loops. |
That example is breaking my brain. I need time to digest that. It may well be that having the iterator call into cgo or a syscall means we need to treat it the same as calling runtime.LockOSThread. |
Are you saying that:

```go
package main
import "runtime"
import "iter"
func main() {
runtime.LockOSThread()
it := func(yield func(struct{}) bool) {
// Always on the main thread if called in a
// `for range f`, (most of?) the time on the
// main thread if called by `iter.Pull`.
}
for _ = range it {
}
next, _ := iter.Pull(it)
for {
next()
}
// Bonus question: what happens when
// yield is called from another goroutine?
it2 := func(yield func(struct{}) bool) {
go yield(struct{}{})
}
for _ = range it2 {
}
}
```
Yeah, considering cgo/syscalls as implicitly calling LockOSThread for the duration of their calls seems consistent. |
|
FWIW, the documentation doesn't mention goroutines:
In any case it seems better to have consistent behaviour between for range loops and iter.Pull.
This has several advantages:
|
I've been trying to figure out what should/shouldn't work and where something must change. I have a default bias that LockOSThread is the pig at the party here, but maybe that's wrong. I think that these things should by-default work, with possible exceptions in the case of LockOSThread "somewhere":
The last case can create problems, depending on why it called LockOSThread. Did the goroutine need it for reasons unrelated to the iteration (context), or was it because of iteration (iterator), or was it because the loop body needed to run on a particular OS thread (body)?

If it's just context, it doesn't matter what the range function does or where the loop body runs; they are incidental. And if Pull is called, that is not a problem either, whether it uses a goroutine or a coroutine, because the iterator is insensitive to goroutine/thread.

If it's the loop body, then Pull is not an issue because Pull works on iterators. In the default case of rangefunc, one that does not explicitly do interesting things with goroutines (range functions can do things with goroutines, but it's not the default, and it looks like perhaps running yield (the loop body) in a different goroutine should be discouraged), this will also "just work": the range function runs in the same goroutine as the body and as its callers, so it all works there, too.

The interesting case is the iterator. I think this breaks down into three subcases, which differ in how they deal with undoing LockOSThread.
In the first case, it's obvious that LockOSThread is needed, and thus the special rule applies, "don't pass the range function to other goroutines". This case is also sensitive to where the Pull function executes, so a coroutine implementation is preferred. 2a is the same. For 2b and 3, because the locked goroutine is hidden within the range function, it doesn't matter where the range function itself runs.

I think the only tricky case is for an iterator that requires locking to an OS thread, but these are either obvious when they are created, or else insulate iterator value generation inside another goroutine. It seems like a range function that runs "yield" (the loop body) in a different thread than the outer caller is officially a bad idea unless this is documented (i.e., it has a purpose). I think it is also a bad idea if a range function calls LockOSThread (in the caller's goroutine, so, 2a) without unlocking before it returns to the caller. That implies a need for a balancing UnlockOSThread that is not necessarily provided. GoodStyle™ would be to lock the OS thread when the range function is created, not when it is called.

This is the best I've got on this much time. I'll think about it some more, but maybe other people will have better thoughts, improve it, or shoot it down. |
Thank you for the detailed example. I believe I've identified where our "coroutine" semantics differ. Hopefully, the following explanation is clearer. First, assumptions:
Then my proposal: let the locked thread (if any) and its lock count follow goroutine switches between caller and iterator, for both rangefunc and iter.Pull. The consequences are:
```go
package main

import "runtime"

func main() {
// This iterator runs on the locked thread.
rng := func(yield func(struct{}) bool) {
go yield(struct{}{}) // Run the loop body with a different thread.
select{} // Or a blocking Cgo function.
}
runtime.LockOSThread()
for range rng {
// The loop body is *not* running on the locked thread!
}
}
```
With my proposal, the following happens:
By passing locked threads between caller and iterator, LockOSThread deadlocks are no longer possible.
The same happens for Q, leaving A locked to T. Now, B and C have no threads locked so P and Q will run on some arbitrary thread when called by them. |
This seems problematic:
Since P was previously locked to T1, and might have made that call expecting that subsequent calls would also be locked to T1, running on T2 seems wrong. However, in my "how is this used anyway?" scenario, the iterator inside the pull function "should not" have exported (left unreversed) a call to LockOSThread; that should have occurred in the creating context, perhaps, and this is even something we can check for in the Pull implementation, and panic.

Locking the OS thread when the iterator is created in A/T2 creates the opposite problem of "what happens in B/T1 when B calls P", and how do we distinguish that "bad" case from the "not bad" case where the iterator does not care that it is thread-locked, and just happens to be embedded within code that A called, and A was locked to T2 for other reasons?

I am inclined to assert that if there is an iterator that requires thread-locking to work properly, it must be coded to hide that from its callers by creating its own goroutine, locking that to an OS thread, and using it to generate values, which are then passed to the yield function in the caller's context (that way, the calling function can also require thread-locking for the body, perhaps even to a different thread). Probably such an iterator would be paired with a dispose/stop function and might also have a finalizer, to recover the goroutine and locked OS thread.

So if I were to try to get closer to a proposal: Pull functions use coroutines; if the locked count changes between entry and exit of the yield function (i.e., an unpaired lock/unlock), panic; and if an iterator (a range function) wishes to run on a locked OS thread, it must create a separate goroutine for that purpose, and the calling context is not guaranteed. |
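A minimal sketch of that "hide the locked goroutine inside the iterator" pattern (the name lockedValues and the placeholder work are made up for illustration, not taken from any real code):

```go
package main

import (
	"fmt"
	"iter"
	"runtime"
)

// lockedValues produces its values on a dedicated, thread-locked goroutine,
// while yield itself runs in the caller's context. The name and the
// "thread-sensitive work" are purely illustrative.
func lockedValues(n int) (seq iter.Seq[int], stop func()) {
	values := make(chan int)
	done := make(chan struct{})

	go func() {
		// Lock this helper goroutine, not the caller, to an OS thread.
		runtime.LockOSThread()
		defer runtime.UnlockOSThread()
		defer close(values)
		for i := 0; i < n; i++ {
			// Imagine thread-sensitive work producing i here.
			select {
			case values <- i:
			case <-done:
				return
			}
		}
	}()

	seq = func(yield func(int) bool) {
		// yield runs on the caller's goroutine (and thread), not the locked one.
		for v := range values {
			if !yield(v) {
				return
			}
		}
	}
	stop = func() { close(done) }
	return seq, stop
}

func main() {
	seq, stop := lockedValues(3)
	defer stop()
	for v := range seq {
		fmt.Println(v)
	}
}
```

Converting such a seq with iter.Pull then never requires the iterator goroutine itself to call LockOSThread; the thread-sensitive work stays on the dedicated goroutine.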
This only applies to iterators that require a locked thread separate from their caller, correct? That is, iterators that must be locked to their caller's locked thread will just work like the equivalent rangefunc does.
Would rangefunc iterators also panic? If not, why not?
By "calling context is not guaranteed" do you mean "the calling context is what's calling |
I think so, including "if you pass the range function to a different goroutine, it will see a different context" (don't do that if you want it to run in a constant context -- in this situation the programmer is/should-be aware that thread locking is in play).
Probably not, and the reason is cost. We're one optimization and some boosted inlining away from range function iterators running really well, and all the checking code that gets inserted into the generated yield function constant-propagates away. Checking the locked-thread-state won't go away like that. We might include a more-checking mode for this, maybe we would always check it anyway.
Right. The underlying problem here is that LockOSThread is probably the wrong primitive; it would have been better expressed as

One performance-related question -- if we declare that it is GoodStyle™ for range functions that need locking to create a separate locked goroutine and access that through channels, that's going to be even more expensive when passed to Pull. A problem with range function iterators is that they're opaque -- they don't provide the option of offering a cheaper implementation of Pull (e.g., a SliceIterator has a much cheaper implementation of Pull, but that's not available). It seems like in this case, if you know that the iterator will require a thread-locked goroutine, we might be better off inverting the relationship: have Pull be the default, and the range function derived from that. (I am not at all sure about this, but I don't like the pile of expensive layers.) |
The other reason that rangefunc iterators likely won't panic is that the range function is written by a human and really is "just a function". The compiler rewrite of

    for x := range f {
        body
    }

is, roughly,

    f(func(x T) bool {
        body
        return true
    })

plus bookkeeping to re-establish control flow (break, continue, return, and so on) after the call.

The most we could check is that thread locking state doesn't change between initialization and each iteration of the loop body, but the bare range function is just a function. And strictly speaking, nobody's required to use iter.Pull; a somewhat slower non-coroutine version would also work.

I think the interesting question here is whether we want to sign up for the different, extra, and potentially useful semantics of passing OS-thread-locked state across a coroutine switch. Originally, this had just been a performance optimization. |
May I suggest not panic'ing for On the other hand, starting with the panic and relaxing it later is backwards compatible.
In addition, starting with a non-context-passing |
I'm not sure I understand the reflect example. Could you explain more about it, maybe with a concrete example? As I understand, rangefunc is just a function, so the way to use reflection to run a rangefunc loop is just |
Please ignore the reflect example; I failed to find an instance that is not a case of the general second example in my comment. That is, cases where (1) |
I am very sympathetic to wanting to implement range-over-func using iter.Pull. The FAQ already covers that iter.Pull and similar approaches are not equivalent: https://go.dev/wiki/RangefuncExperiment#why-are-the-semantics-not-exactly-like-if-the-iterator-function-ran-in-a-coroutine-or-goroutine . Let me try to give an expanded version. This explanation is mostly outside of the scope of this issue, but I think it addresses what you are asking.

There are two main proposed semantics/implementations for range-over-func statements: function rewriting, and concurrent execution of the range function.
The first implementation is function rewriting. This takes a range-over-function statement and constructs a new synthetic function that communicates across function boundaries via hidden state, calls the iterator function on the synthetic function, and reestablishes control flow after the function call. Adjusting what David had a bit:
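A hand-written approximation might look like the following (a simplified sketch; the real compiler-generated rewrite involves more hidden state to handle continue, return, goto, defer, and panic):

```go
package main

import "fmt"

// f is a plain range function over 0..9.
func f(yield func(int) bool) {
	for i := 0; i < 10; i++ {
		if !yield(i) {
			return
		}
	}
}

func main() {
	// The loop
	//
	//	for x := range f {
	//		if x > 3 {
	//			break
	//		}
	//		fmt.Println(x)
	//	}
	//
	// becomes, roughly, a direct call to f with a synthetic function holding
	// the loop body; hidden state records how the body exited so control flow
	// can be re-established after the call.
	brokeOut := false
	f(func(x int) bool {
		if x > 3 {
			brokeOut = true
			return false // ask f to stop iterating
		}
		fmt.Println(x)
		return true
	})
	fmt.Println("broke out:", brokeOut)
}
```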
What exactly those tweaks are is a bit complicated, but they should be mostly invisible to regular users. They are not invisible to compilers and interpreters though. The overwhelming advantage of this approach is that it is as fast as users expect. This is mostly accomplished by inlining the range function and the synthetic function.

The second implementation is to concurrently execute somerangefunc and to communicate between the concurrent routines. In practice, concurrent routines are implemented via goroutines in Go. This leads to a candidate semantics based around iter.Pull or similar.
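For the same loop, a sketch of that candidate semantics using iter.Pull (illustrative only):

```go
package main

import (
	"fmt"
	"iter"
)

// somerangefunc is a stand-in range function over 0..9.
func somerangefunc(yield func(int) bool) {
	for i := 0; i < 10; i++ {
		if !yield(i) {
			return
		}
	}
}

func main() {
	// The range function runs concurrently behind iter.Pull (on its own
	// goroutine in the current implementation); the loop body pulls values.
	next, stop := iter.Pull(iter.Seq[int](somerangefunc))
	defer stop()
	for {
		x, ok := next()
		if !ok {
			break
		}
		if x > 3 {
			break
		}
		fmt.Println(x)
	}
}
```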
This is a bit oversimplified, but the key point is that the loop body and the range function run concurrently. There are an enormous number of additional possible tweaks to both of the above approaches.

These semantics do differ. Stacks and goroutines are observable in Go. It is easier to see this with an example.
Let's suppose rand.Int() returns 1 first, so panic(x) happens. What are the stacks at the panic(x) call in the different semantics?
Because the stacks are different, the panic executes differently in these two cases.
So the value of count is an example of an observable difference in the two semantics. The root cause is that the goroutines and calls are structured differently in the two proposals, and this is observable.

It is also worth considering that a panic within the iterator function ('somerangefunc' in the example) plays out differently. One semantics terminates the program. Also, runtime.Goexit() within the iterator function plays out differently. Exercises left to the reader. This is not even getting into whether the runtime's stack traces are considered observable. The runtime library is particularly prone to having functions like LockOSThread that make them even more observable.

In the end, range-over-funcs were chosen to be equivalent to the function rewrite semantics. This is mostly due to efficiency (and partially debugging and partially robustness). So users of range-over-func should have the function rewrite semantics in mind when they use it (the simple version above), not iter.Pull.

You can then ask whether anything can be done to narrow the divide. Maybe. But I would recommend against trying to do this on the issue tracker. You need to poke some fundamental stuff like "what is a stack in Go?", or "what if there was a second kind of coroutine in the runtime?" and do this in a backward compatible fashion. I recommend taking a few days [or weeks] and trying to come up with something offline. I hope this helped.

With background out of the way, on to specific questions.
ssa/interp does not need this. The ssa package does the function rewrite so the interpreter did not need serious modification. It did need modification to handle defer elegantly. (Alternatives for the defer changes exist but they are really ugly.)
It is not trivial. And it probably will be difficult. Range-over-function is a big language change enabled by compiler magic. This places a burden on tools sensitive to detailed semantics like compilers and interpreters. FWIW ssa/builder.go is now ~17% more lines of code.
I do not know how to do this. Maybe someone more clever than me has already solved this? Based on my own experience updating an interpreter, one can get something 90% of an implementation of range-over-func using iter.Pull, but it will not end up 100% equivalent to the function rewrite semantics. And interpreters need to be >= 99% to be useful. The stacks are different and trying to work around this is hard. range-over-func is a big language change, and I expect interpreters to need to do a big adjustment. I cannot accurately predict what these will be for other interpreters. Interpreters likely need to understand the code transformation to understand the semantics of a range-over-func statement. A good place to start is the comments here https://go.googlesource.com/go/+/refs/heads/master/src/cmd/compile/internal/rangefunc/rewrite.go (and the tests in rangefunc_test.go). Interpreter maintainers can then determine what support they need from the Go standard library to implement range-over-func in their tool. They can file proposals requesting these features. FWIW I suspect what they need is to attach state equivalent to
Fair enough. But hopefully it is still useful enough.
Maybe try to rephrase how you think of this requirement as 'On the statement |
Thank you very much for your detailed comment. Your section about function rewriting and Go interpreters made me realize that function rewriting is irrelevant for Given
I tried and failed to understand this rephrasing. Can you elaborate? What I'm proposing is not that two goroutines must run on the same locked thread. I'm proposing that I don't have enough knowledge of the Go runtime to know whether such behaviour is easy to implement, but I presume the runtime scheduler is already involved every time a goroutine sends or receives from a channel, which is what |
Seems like I misunderstood what you were proposing. I will let others comment on the plausibility of implementing this. The internal I will point out that |
Indeed. On the other hand without cooperative scheduling, a caller of So it seems to me it's a trade-off between two kinds of sharp edges. |
With the freeze approaching, we need to lock in an answer to this issue. Building on everyone's input, this post explains what we plan to implement for this release. We're sure this won't please everyone, but we believe this handles the major use cases efficiently, we know how to implement it in the next few weeks, and it's flexible enough that we believe it can be relaxed in the future if necessary.

One thing to keep in mind here is that LockOSThread is a spanner in the works; this is not first-class semantics, this is something that Go does to support thread-linked state in operating systems, which in turn is mostly an artifact of C lacking closures, the resulting proliferation of global state, and the collision of that global state with threads and multiprocessors. Explicit use of LockOSThread gets second-class support here. Iterators over state that requires LockOSThread will either not work with Pull, or will require an extra layer of pushing results back and forth to an additional (locked) goroutine.

The second thing to keep in mind is that Pull iterators, even with coroutines, are slower than plain iteration. Thus, we expect people to use Pull iterators only when necessary. Their primary use case is operations over 2 (or more) iterators, and because of their higher cost, it's likely that these binary (or more-ary) operations will not even be symmetric; they'll iterate normally over one operand, and Pull the others.

TLDR, what we propose to implement is:
The effect of this is that the caller of iter.Pull may be on a locked thread, as long as it doesn't lock or unlock the thread and then call next.

The advantage of this proposal is that it maintains good performance composability. That is, Pull iterators remain fast in the presence of the two oblivious uses of LockOSThread.
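For example, a caller that is locked for unrelated reasons and never locks or unlocks around next, as in the illustrative sketch below, is intended to stay on the fast path under this plan:

```go
package main

import (
	"fmt"
	"iter"
	"runtime"
	"slices"
)

func init() {
	// Locked for reasons unrelated to iteration (e.g., a C UI library that
	// must stay on the main thread).
	runtime.LockOSThread()
}

func main() {
	// The caller is on a locked thread but does not lock or unlock around
	// next, and the iterator itself never touches LockOSThread.
	next, stop := iter.Pull(slices.Values([]int{1, 2, 3}))
	defer stop()
	for {
		v, ok := next()
		if !ok {
			break
		}
		fmt.Println(v)
	}
}
```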
The disadvantage of this proposal is that if someone wishes to write an iterator over some OS thread state, if that iterator should work properly when converted to a Pull iterator, then either
And |
*cough* I would like to make you aware that IIRC, Docker/runc was one of the prominent use cases to introduce runtime.LockOSThread. If anything, you'll need to put up a big, very bright flashing warning sign in Go's documentation to prevent contributors to, say, any of the existing container engine/executor code bases from infecting them with unexpected and difficult-to-diagnose bugs -- calling this "second class support" is a totally misleading euphemism for the problems this can and probably will cause. |
@thediveo , yes, you're right of course that |
Change https://go.dev/cl/583675 mentions this issue: |
Problem
The current implementation of iter.Pull in the rangefunc experiment creates a new goroutine and uses an optimized goroutine switch path during iteration.

The problem with this implementation, as it stands, is that it interacts poorly with runtime.LockOSThread. If the goroutine spawned for the iterator (let's call it the iterator goroutine) calls runtime.LockOSThread, first and foremost, that call will not be respected. For example, if the next function is passed to a different goroutine, the runtime will not schedule the iterator goroutine on the thread it was locked to. Bad things might happen. Simultaneously, even in the easy case, if the iterator goroutine always switches with the same calling goroutine on the same thread, then once that iterator goroutine exits, it will take down the current thread with it. This is expected, but it happens before the calling goroutine gets placed back into the scheduler's ownership. The calling goroutine instead stays blocked forever with no hope of executing it again.
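For concreteness, the problematic shape is roughly the following (an illustrative program, not taken from any real library):

```go
package main

import (
	"fmt"
	"iter"
	"runtime"
)

func main() {
	// The function passed to iter.Pull runs on the iterator goroutine that
	// Pull creates, so this LockOSThread call happens on that goroutine,
	// not on the caller of next.
	next, stop := iter.Pull(iter.Seq[int](func(yield func(int) bool) {
		runtime.LockOSThread()
		defer runtime.UnlockOSThread()
		for i := 0; i < 3; i++ {
			if !yield(i) {
				return
			}
		}
	}))
	defer stop()
	for {
		v, ok := next()
		if !ok {
			break
		}
		fmt.Println(v)
	}
}
```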
One possible solution

The most straightforward fix to this is to fall back to a slower kind of switching of goroutines if either the iterator goroutine or the calling goroutine locks itself to a thread. This slower kind of switching would probably be done with channels, and would have semantics that are equivalent to the existing implementation of iter.Pull. Because at least one of the goroutines is locked to a thread, the overhead of OS context switching also comes into play, so this implementation would likely be dramatically slower.

This fix, then, has a really unfortunate performance cliff. One could imagine that a UI library (for example) locks a goroutine to a thread because it needs to interact with C libraries that require staying on the main thread. Using iter.Pull on such a goroutine would be a very, very bad idea if this workaround was adopted. The resulting performance regression would be very surprising.
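A rough sketch of such a channel-based implementation with equivalent semantics (illustrative only, not the actual runtime code; it omits panic propagation, and stop is not safe to call twice):

```go
package main

import (
	"fmt"
	"iter"
)

// pullViaChannels is an illustrative, channel-based stand-in for iter.Pull.
func pullViaChannels[V any](seq iter.Seq[V]) (next func() (V, bool), stop func()) {
	values := make(chan V)
	done := make(chan struct{})

	go func() {
		defer close(values)
		seq(func(v V) bool {
			select {
			case values <- v:
				return true
			case <-done:
				return false // the consumer called stop
			}
		})
	}()

	next = func() (V, bool) {
		v, ok := <-values
		return v, ok
	}
	stop = func() { close(done) }
	return next, stop
}

func main() {
	var seq iter.Seq[int] = func(yield func(int) bool) {
		for i := 0; i < 3; i++ {
			if !yield(i) {
				return
			}
		}
	}
	next, stop := pullViaChannels(seq)
	defer stop()
	for {
		v, ok := next()
		if !ok {
			break
		}
		fmt.Println(v)
	}
}
```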
The proposed solution

Discussing with @aclements, @dr2chase, and @cherrymui, we have an alternative fix that avoids the performance cliff. Specifically, we would restrict iterators from being able to call runtime.LockOSThread, by panicking if it is called from an iterator goroutine (the runtime handles the iterator goroutine's creation, and thus can enforce this). Meanwhile, the calling goroutine (the caller of next, that is) is fully free to be locked to a thread.

If the iterator is prevented from locking itself to a thread from the iterator goroutine, then it's totally fine for that iterator goroutine to just execute on the locked calling goroutine's thread. That iterator goroutine effectively gets locked to the same thread while it's executing, and we know that it will eventually switch back to the calling goroutine. It's fine for the calling goroutine to send next to another goroutine as well, even if that other goroutine is locked, because it can only execute once the iterator goroutine has ceded control of the thread (by the semantics and implementation of iter.Pull).

One hazard of this approach is that the iterator, running on the iterator goroutine, may care about thread-local state, and threads locked to goroutines tend to be locked precisely because there is some important thread-local state that needs to be preserved. However, if the iterator goroutine really cared about thread-local state, it should call runtime.LockOSThread, which in this case will panic. That does mean that the iterator is slightly restricted in what it can do, but we feel that it's better for the code author to find that out rather than deal with some subtle issue related to runtime.LockOSThread and local thread state not being what they expect.

Furthermore, we think it's very unlikely that an iterator will want to lock itself to a thread anyway, so the extra restriction is unlikely to matter to the vast majority of Go users. Simultaneously, making this choice lets us keep iter.Pull fast for many more (more realistic) use cases, lets us avoid the pain of maintaining two implementations of iter.Pull, and prevents surprising performance cliffs for users of iter.Pull.