Incompatibility of Rust's stdlib with Coroutines #33368
Comments
You could add yet another way to panic, as per rust-lang/rfcs#1513, in a way which circumvents any
If |
@nagisa Would that work across crates? I'm also not sure if that would be easier than simply splitting up
@arielb1 Yeah, it would solve the reference problem, but then it would set the
The RFC requires only one panicking mechanism to be used in the final binary.
If we want to think about that at all we gotta act fast, because 1.9 with a stable
Something towards this would be my preferred solution, but keep in mind that |
I don't think another panic behaviour is the right choice here @nagisa, since changing the "observable" behaviour of panic isn't really what coroutine implementations need. Rather, the API or the implementation has to change. Maybe I misunderstood your idea though... Hooking into
Apart from that I think that @arielb1's idea is quite good and could be implemented right now without causing any regressions.
zonyitoo commented May 4, 2016
I still cannot get the idea behind the
If we can remove the
I don't know why we need to accept panic in
@sfackler Yeah, making it a bool probably wouldn't solve it. But why is this implemented in the "runtime" instead of doing it like C++ with its
EDIT: Thanks GitHub for putting the close button right beside the comment button, without adding any confirmation dialogue. UI Design? Anyone?
lhecker closed this May 4, 2016
lhecker reopened this May 4, 2016
zonyitoo commented May 4, 2016
@sfackler Well, yes. Changing it to bool won't solve all the problems. But, at least,
    thread_local! { pub static IS_PANICKING: Cell<bool> = Cell::new(false); }

    // Here is a function that will be called when panic! happens
    // Just like the one in https://github.com/rust-lang/rust/blob/master/src/libstd/panicking.rs#L198
    fn on_panic(obj: &(Any+Send), file: &'static str, line: u32) {
        let is_panicking = IS_PANICKING.with(|s| {
            let orig = s.get();
            s.set(true);
            orig
        });
        if is_panicking {
            // Abort right here
            util::dumb_print(format_args!("thread panicked while processing \
                                           panic. aborting.\n"));
            unsafe { intrinsics::abort() }
        }
        // ...
    }

    // https://github.com/rust-lang/rust/blob/master/src/libstd/sys/common/unwind/mod.rs#L159
    pub fn panicking() -> bool {
        IS_PANICKING.with(|s| s.get())
    }

    // The catch_panic
    // https://github.com/rust-lang/rust/blob/master/src/libstd/sys/common/unwind/mod.rs#L131
    unsafe fn inner_try(f: fn(*mut u8), data: *mut u8)
                        -> Result<(), Box<Any + Send>> {
        if panicking() {
            // It should not be allowed (to catch panic while panicking)
            unsafe { intrinsics::abort() }
        }
        let mut payload = imp::payload();
        let r = intrinsics::try(f, data, &mut payload as *mut _ as *mut _);
        // Clear the flag because we already caught the panic
        IS_PANICKING.with(|s| s.set(false));
        if r == 0 {
            Ok(())
        } else {
            Err(imp::cleanup(payload))
        }
    }
I think this solution can also fulfill this purpose and |
Because unwinding in Rust is always a cold path and we do not want to generate extra drop glue just to accommodate unwinds from drop glue.
Because it has perfectly sensible use-cases. |
Hmm... Your statement sounds reasonable, but is that based on actual benchmarking?
I kind of don't understand that though, because if your destructor can panic, it will sometimes correctly unwind the thread and sometimes crash the whole program, because the destructor was called as part of an ongoing unwind. I can't say I like that "wonkiness". If you have time: would you care to point out a valid use case? It's purely optional and out of interest though.
@lhecker The compiler's own rust/src/libsyntax/errors/mod.rs Lines 354 to 364 in 3157691 |
Implementing nounwind drop glue would simply double the size of binary code used by drop glues. There’s nothing to benchmark here.
One case, as pointed out by @jonas-schievink, would be ensuring that all objects go out of scope in a valid state. You cannot drop a
You still do not want these to just
Basically, IME, disallowing panicking in
Aaaah... That makes sense! That's actually a really good argument for panicking drops. Thanks! |
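For readers following along, here is a minimal sketch of the "panicking drop" use case discussed above, under stated assumptions: `MustFinish` is a hypothetical type invented for this illustration, not something from the compiler or the stdlib. It panics in `drop` when it goes out of scope in an invalid state, but checks `std::thread::panicking()` first so that it never starts a second panic during an ongoing unwind (which would abort the process).

```rust
use std::panic;
use std::thread;

/// Hypothetical guard that must be explicitly finished before it is dropped.
struct MustFinish {
    finished: bool,
}

impl MustFinish {
    fn new() -> Self {
        MustFinish { finished: false }
    }

    fn finish(&mut self) {
        self.finished = true;
    }
}

impl Drop for MustFinish {
    fn drop(&mut self) {
        // Only panic when no unwind is in progress. A second panic while
        // the thread is already unwinding would abort the whole process.
        if !self.finished && !thread::panicking() {
            panic!("MustFinish dropped without calling finish()");
        }
    }
}

/// Returns true if dropping an unfinished guard on the regular path panicked.
fn unfinished_drop_panics() -> bool {
    panic::catch_unwind(|| {
        let _guard = MustFinish::new();
    })
    .is_err()
}

/// Returns true if a finished guard drops without panicking.
fn finished_drop_is_silent() -> bool {
    panic::catch_unwind(|| {
        let mut guard = MustFinish::new();
        guard.finish();
    })
    .is_ok()
}

/// Returns true if the original panic was caught normally, i.e. the guard
/// dropped during unwinding did not trigger a double panic.
fn drop_during_unwind_is_caught() -> bool {
    panic::catch_unwind(|| {
        let _guard = MustFinish::new();
        panic!("original failure");
    })
    .is_err()
}

fn main() {
    assert!(unfinished_drop_panics());
    assert!(finished_drop_is_silent());
    assert!(drop_during_unwind_is_caught());
}
```

The `thread::panicking()` check is exactly the kind of query that the stdlib answers from thread-local state, which is why this pattern is relevant to the coroutine discussion.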
lhecker changed the title from "Incompatibility of std::panic with coroutines" to "Incompatibility Rust's stdlib with Coroutines" May 6, 2016
lhecker changed the title from "Incompatibility Rust's stdlib with Coroutines" to "Incompatibility of Rust's stdlib with Coroutines" May 6, 2016
I edited the issue because @alexcrichton finally brought me to my senses regarding the widespread use of TLS in the stdlib. There is actually a specific reason why I always thought that it's not an issue and easily solvable, but I'm too embarrassed to disclose that dumb idea.
How much code is used by drop glue? It might even be worth doing RTTI-based drop glue for unwinding.
@arielb1 I’m not sure if I did it correctly, but it seems like librustc has about 53591 bytes of drop glue. libsyntax = 33031B. (command used: |
That's small potatoes. |
@arielb1 I wouldn’t call librustc drop-glue intensive, though. Far from it. |
I still do wonder why the binary size would blow up that much though... Since all the "nounwind drop glue" would do is literally call a single method. Isn't that just one EH entry plus one
Well, if exception handling in Rust were a bit less "cumbersome" than it is now, it would already solve most of the problems with coroutines. All that would be left after that is, afaik,
I think the only other option is to add a compile time option akin to RFC 1513. But if we did that I might as well write an RFC to add full opt-in coroutine support to Rust, because the difference in effort is probably negligible. (In fact
If the function (e.g. drop glue) marked as nounwind actually begins unwinding, you have undefined behaviour, therefore you must replace all the occurrences of panicking in the nounwind glue with some other non-terminating side effect (e.g.
This basically means the compiler would have to generate two kinds of drop glue:
This is basically a 100% or close to 100% increase in drop glue size. |
Uhm... Why don't you just call abort in You can see it here as LLVM IR: http://llvm.org/docs/ExceptionHandling.html#new-exception-handling-instructions |
The problem is that then, any panic in a destructor will cause the program to OTOH, with the current situation, if there is a |
I think they mean we should abort on unwind edges from drop glue that is executed as part of unwinding already.
That’s a fair point. I guess there’s a few factors here:
But then won't we need the double drop glue? Either
Doubling up all drop-glue will probably require Rust 2.0 - because the destructor for |
I'm unfortunately not that familiar with the terminology in this community so I don't fully understand what you meant with that. (I'm sorry for that.) But I think @arielb1 understood me correctly. To say it as plain as possible again: I do like C++'s exception rules for destructors more than Rust's complicated system in
Yeah... I understand why panicking destructors can be quite useful. So it's not like I don't understand it. But it still feels so incredibly "wrong" that in a "safe" language a program can one time survive a thread unwind (after which it could spawn the thread again) and another time crash the whole process. And all of that is decided on a "whim", depending on whether a destructor runs as part of the regular flow or as part of an active unwind. You know? That just isn't deterministic, and I personally think that determinism should be something very valuable in programming languages.
BTW: Using libcore is absolutely unacceptable since pretty much every crate uses libstd. And removing TLS is no small feat, since it does require modifying the landing pads of destructors, which has not been endorsed yet (and probably would get rejected at this point), while still very likely requiring a huge amount of time to get into rustc's code base to make that change (which - as I said - I'd gladly do but only if it's merged at the end). |
Choosing a solution for non-trivial problems is exactly what the RFC process is for. I’m pretty sure a proposal to reimplement the standard library parts which currently use TLS with something else would be accepted, provided thread-safety and speed are not compromised. The question is what you’d replace TLS with. I, personally, have no idea, given the requirements.
The way I see it modification of codegen is the least of the problems (if necessary at all). Removing TLS doesn’t imply doing that either. While the discussion naturally flowed towards the destructors, libstd also uses thread locals in ways which aren’t related to panicking. You’ll need to fix these as well, at which point the
You can have your own |
I do believe that this is a hard change, but depending on the solution it's not a "substantial" one. The only thing right now that might require an RFC would be "abort on panic inside destructors", since it does change semantics (where you can safely panic). (I don't think that this will affect many people though.) I do think that I should have opened this issue in the RFC repo though.
Which is why it'd be cool if @rust-lang/libs could (finally) say something or at least give some hints about this.
It's only
Thanks! Didn't know that.
nagisa added the T-libs label Jun 1, 2016
I'm just gonna take a moment to point out that 64-bit Windows has UMS threads, which give you proper OS threads that support TLS and such, but using a user mode scheduler so you effectively have coroutines, and also have other advantages like returning control to your scheduler whenever you block in a syscall. Doesn't solve the problem on other platforms though. |
zonyitoo commented Jun 2, 2016
Well, obviously this is not a good solution. I am happy to make any PRs for this, but what I am asking for is Rust's team to tell me which solution would be acceptable. |
I'd be in favor of either
Again related to 'compilation scenarios' you could easily imagine a global switch that turned on 'slow' TLS and did the right thing everywhere. |
Is it really so unlikely that coroutine aware TLS could be achieved? I can see UMS becoming a thing going forward which would make this much easier so maybe it's not the worst idea in the world to investigate this? |
What's the "dependency graph" of TLS within the standard lib? That is:
Without this information explicitly stated it is very hard to know which way is the best way forward. In my opinion the best solution to this problem is not to fork the standard library, but to:
Depending on how big is the dependency graph of TLS in the standard lib, we might be able to split that out in a EDIT: the objective should not be to encourage people to fork libstd to implement green threading, but to make it really easy to implement different green threading solutions as a library that can work "as seamlessly as possible" with libstd. |
@gnzlbg TLS is not a global variable, it's very much local. You could think of it like a hidden parameter that is passed to all functions. |
Isn't the TLS variable available during the whole lifetime of a thread (independently of which function you are in)? In C++ at least that is the case. I need to read more on Rust thread local variables, but I assumed they would work the same.
@gnzlbg global variables have to deal with concurrency, TLS does not. TLS gets away with a |
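One way to read the distinction made above, as a small sketch: a `static` shared by all threads needs a thread-safe type such as an atomic, while a `thread_local!` value has one independent copy per thread and can use a plain `Cell`. The names `GLOBAL_HITS`, `LOCAL_HITS`, `hit` and `run_workers` are made up for this illustration.

```rust
use std::cell::Cell;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

// Shared by every thread, so it has to be a thread-safe type.
static GLOBAL_HITS: AtomicUsize = AtomicUsize::new(0);

thread_local! {
    // One independent copy per thread, so a non-atomic Cell is enough.
    static LOCAL_HITS: Cell<usize> = Cell::new(0);
}

/// Bumps both counters and returns the calling thread's local count.
fn hit() -> usize {
    GLOBAL_HITS.fetch_add(1, Ordering::SeqCst);
    LOCAL_HITS.with(|c| {
        c.set(c.get() + 1);
        c.get()
    })
}

/// Spawns `threads` workers that each call `hit` `per_thread` times and
/// returns every worker's final thread-local count.
fn run_workers(threads: usize, per_thread: usize) -> Vec<usize> {
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            thread::spawn(move || {
                let mut last = 0;
                for _ in 0..per_thread {
                    last = hit();
                }
                last
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    let locals = run_workers(4, 100);
    // Each worker only ever saw its own copy of LOCAL_HITS.
    assert!(locals.iter().all(|&n| n == 100));
    // The global counter saw the hits of all workers.
    assert!(GLOBAL_HITS.load(Ordering::SeqCst) >= 400);
}
```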
Is a closure using a thread local variable EDIT: and if that is the case, does the compiler treat TLS as volatile (i.e. does it reload the address of the TLS variable on every access)? |
@gnzlbg Right now TLS does not have to be treated as volatile because during execution of a function you'll never suddenly be on a different thread. If a closure is sent to a different thread then it'll see the TLS of the thread it was sent to and is running on. |
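The behaviour described above can be checked with a small sketch. `WORKER_ID`, `read_id` and `demo` are hypothetical names used only here; the closure reads the thread-local at call time, so it observes the value of whichever thread happens to run it.

```rust
use std::cell::Cell;
use std::thread;

thread_local! {
    static WORKER_ID: Cell<u32> = Cell::new(0);
}

fn read_id() -> u32 {
    WORKER_ID.with(|id| id.get())
}

/// Calls the same closure on the current thread and on a freshly spawned
/// thread, and returns what it saw in each place.
fn demo() -> (u32, u32) {
    WORKER_ID.with(|id| id.set(1));

    // The closure does not capture the TLS value; it looks it up when called.
    let task = || read_id();

    let here = task();
    let there = thread::spawn(move || {
        // The new thread has its own copy of WORKER_ID.
        WORKER_ID.with(|id| id.set(2));
        task()
    })
    .join()
    .unwrap();

    (here, there)
}

fn main() {
    assert_eq!(demo(), (1, 2));
}
```

This only works out because a plain closure runs start to finish on one thread. The coroutine problem in this issue is about a function body that starts on one thread and continues on another.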
@retep998 Right, but if a coroutine yields before finishing, and gets sent to another thread where it resumes execution, then when it accesses TLS it will see the values of the thread it was sent from (which might no longer exist! and is unsafe!). I can only think of two different situations involving coroutines and TLS, and neither of them makes sense to me in practice.
In Case A, when execution is resumed, the TLS variables still refer to the storage of the thread the coroutine was migrated from. There are two options: either we update them to refer to the TLS of the thread the coroutine was migrated to, or we prevent sending coroutines that access TLS between threads. I think that updating TLS to refer to completely different storage on resumption is a recipe for disaster, since it makes it very hard to reason about what is going on. So in my opinion, the best thing would be to forbid Case A completely by forbidding coroutines that access TLS from being migrated between threads. That probably means making thread local storage
Case B makes no sense to me either, since in this case the variable should not be thread local in the first place. Still, Case B is safe as long as the thread the coroutine was migrated from outlives the thread the coroutine has been migrated to. So this could be allowed if the lifetimes can be enforced.
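As a rough sketch of the "forbid migration" idea (the "mark all TLS as !Send" variant that comes up later in this thread), one can make a handle to thread-local state `!Send`, so that any task holding it is rejected by a scheduler that requires `Send`. Everything here (`TlsHandle`, `spawn_migratable`, `run_pinned`) is hypothetical and not an existing stdlib API; plain threads stand in for a coroutine scheduler.

```rust
use std::cell::Cell;
use std::marker::PhantomData;
use std::thread;

thread_local! {
    static COUNTER: Cell<u64> = Cell::new(0);
}

/// Hypothetical handle to thread-local state. The raw-pointer PhantomData
/// makes the type !Send and !Sync, so it cannot leave its thread.
struct TlsHandle {
    _not_send: PhantomData<*const ()>,
}

impl TlsHandle {
    fn acquire() -> Self {
        TlsHandle { _not_send: PhantomData }
    }

    fn bump(&self) -> u64 {
        COUNTER.with(|c| {
            c.set(c.get() + 1);
            c.get()
        })
    }
}

/// Stand-in for a scheduler that may migrate tasks: only Send tasks allowed.
fn spawn_migratable<F, T>(task: F) -> T
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    thread::spawn(task).join().unwrap()
}

/// Runs the task on the current thread only, so !Send tasks are fine.
fn run_pinned<F, T>(task: F) -> T
where
    F: FnOnce() -> T,
{
    task()
}

fn main() {
    let handle = TlsHandle::acquire();
    let first = run_pinned(|| handle.bump());
    let second = run_pinned(|| handle.bump());
    assert_eq!(second, first + 1);

    // This would be rejected at compile time, because the closure captures
    // a !Send TlsHandle:
    //
    //     let handle = TlsHandle::acquire();
    //     spawn_migratable(move || handle.bump());

    // Acquiring the handle after the task has reached its thread is fine.
    assert_eq!(spawn_migratable(|| TlsHandle::acquire().bump()), 1);
}
```

The catch, as noted elsewhere in the thread, is that the existing `thread_local!` API hands out access without any such handle, so retrofitting this would not be backwards compatible.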
zonyitoo commented Jun 29, 2016
Well, it is not a wise move to forbid all coroutines from using TLS values. Adding that constraint on TLS values will confuse users, and it also becomes an obstacle for users who want to migrate their programs to coroutines. For your first case, there is a good solution: make TLS volatile. If a coroutine is switched out and migrated to another thread, then when execution is resumed in that thread, all subsequent TLS accesses will go to the TLS of the thread the coroutine was migrated to. We can do that by adding a flag to the compiler.
I suggested adding volatile above as well, but I don't like that myself: at every suspension point of a coroutine, every time the coroutine is resumed, the values of all TLS variables might have changed. I think this will make it really hard to reason about what a coroutine is going to do, since it is impossible to reason about its current state (think about a generator suspending inside of a loop, where at every loop iteration you might have different TLS values). I am pretty sure that there are valid use cases for wanting to silently update TLS on coroutine resumption, but I think that forbidding it will prevent hard-to-debug user errors (the scheduler might migrate coroutines at will), and when the user really wants the coroutine state to change on resumption, there are more explicit alternatives (e.g. get a handle to the thread on which the coroutine is currently running and use that to access some "thread local" state). Another reason to be against making all TLS volatile, even when implemented as opt-in via a compiler flag, is that it pessimizes all the code that accesses TLS and is not migrated between threads. Still, the most important reason is being able to reason about the code. In C, C++, and D, migrating a coroutine/fiber that accesses thread local storage between threads is undefined behavior, and they cannot catch it at compile time. Is there any low level language that allows migrating coroutines that access thread local storage between threads? What semantics do they choose?
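The "more explicit alternative" mentioned above could look roughly like the following sketch, in which the per-worker state is passed in on every resume instead of being read from TLS. `WorkerContext` and `Task` are hypothetical, and a hand-written state machine stands in for a real coroutine.

```rust
use std::thread;

/// Hypothetical per-worker state, owned by the scheduler and handed to
/// whichever task the worker is currently running.
struct WorkerContext {
    worker_id: u32,
}

/// Hand-written stand-in for a suspended coroutine: a tiny state machine
/// that records which worker ran each of its steps.
struct Task {
    step: u32,
    seen_workers: Vec<u32>,
}

impl Task {
    fn new() -> Self {
        Task { step: 0, seen_workers: Vec::new() }
    }

    /// Runs one step and returns true while there is more work to do.
    /// The worker state is an explicit argument, so the task cannot keep
    /// a stale reference to it across a suspension point.
    fn resume(&mut self, ctx: &WorkerContext) -> bool {
        self.seen_workers.push(ctx.worker_id);
        self.step += 1;
        self.step < 3
    }
}

/// Runs the first step on the current thread, then "migrates" the task to
/// another thread to finish, and returns the worker ids it observed.
fn run_across_workers() -> Vec<u32> {
    let mut task = Task::new();

    let first = WorkerContext { worker_id: 1 };
    task.resume(&first);

    let task = thread::spawn(move || {
        let second = WorkerContext { worker_id: 2 };
        while task.resume(&second) {}
        task
    })
    .join()
    .unwrap();

    task.seen_workers
}

fn main() {
    assert_eq!(run_across_workers(), vec![1, 2, 2]);
}
```

Here the change of worker is visible in the signature of `resume`, which addresses the "hard to reason about" concern, at the cost of threading a context argument through the code.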
zonyitoo commented Jun 29, 2016
As far as I know, none of them (systems programming languages) allow migrating coroutines that access thread local storage between threads. Take Go as an example: the only way you can use TLS is through cgo, and they run FFI calls in a separate thread (can't be sure). Your case about suspending inside of a loop is very convincing! In that case, making TLS volatile is not a good choice, or rather, it may make the result of the program completely wrong! Back to this issue: is there a way to get rid of those TLS usages in libstd? If we want to make TLS
So I just checked and in C++'s Coroutines Technical Specification reading a And no, You initiate the coroutine on a particular thread. The coroutine runs on that thread until its first suspension point. Then it gets suspended. When you resume the coroutine (in whatever thread you decide to do so), resuming the coroutine is just a function call that calls the system scheduler. The system scheduler then "possibly" migrates the coroutine to a different thread, which resumes the coroutine by calling a function that continues after the suspension point of the coroutine. When the coroutine after this point reads a For this to work the compiler only needs to avoid caching / reordering reads of |
zonyitoo commented Jul 24, 2016
It seems that someone is going to add coroutine support directly to LLVM, which means that it is possible to tell LLVM not to inline TLS calls between context swaps. https://internals.rust-lang.org/t/llvm-coroutines-to-bring-awarness |
I think that Rust should never support M:N goroutines as Go implements them. This was decided a long time ago. Kernel-assisted UMS-style solutions should be fine, however. |
@pcwalton Your comment literally left @zonyitoo and me speechless... P.S.: Just call them suspend-down coroutines.
zonyitoo commented Jan 13, 2017
@pcwalton I can totally understand the reason Rust's team does not want to support coroutines officially. But could you please open a door for us to give it a try? Or could you please give us a chance to implement anything like I admit |
To be clear, the door isn't closed to non-callback style. A lot of folks do want JS-style generators in the language. That, and/or @pcwalton's comment was about the Go model specifically. The Go model does have plenty of costs associated with it that all programs will have to pay. Rust can solve the same problems without implementing the Go model. I personally am hoping for generator syntax or async/await to clean this up. The door is open here; you need a better proposal than "stop using TLS in the standard library" (or "mark all TLS as !Send", which is not backwards compatible). This is very similar to the issues we had with libgreen in the first place; folks had to pay an extra cost for it even if they didn't need green threads, which is antithetical to Rust's zero-cost-abstraction philosophy. Something like Brian's proposal would work (#33368 (comment)). There are other proposals in this thread (some of them yours) that might work as well. I suggest making a comment listing all the viable proposals with their pros and cons, and perhaps making a discussion post on internals.rust-lang.org to figure out what folks like best. Then, make an RFC. (Discussion of proposals on Rust issues rarely gets anywhere, Rust issues do not have that kind of visibility. This issue tracker is for tracking implementation work that needs to be done on rustc itself, where the user-facing design decisions have already been made.) |
That's the point I don't get. You can't implement 1:1 scheduling on top of N:M anyways, so the discussion if Go's model is fit for Rust is out of the window anyways. This is only about N:M scheduling and coroutines specifically and can be implemented as a library on top of 1:1 scheduling without hurting the performance of anyone else whatsoever.
@Manishearth Can you give me an idea what that might be? The Rust stdlib comes prebuilt, which makes it impossible to fix the TLS problem in a way that's comfortable for Rust users. Is there possibly anything I missed (seriously)? P.S.: |
You could possibly have your own marker trait that works similar to A pluggable TLS is a viable solution, and you can try to flesh that out into a pre-RFC. The problem with an "alternate" stdlib is that it can end up being incompatible with large parts of the ecosystem, which we don't want. Even a flag for volatile TLS sounds like it could work here, though I'm not sure. The change has to be one that can't affect existing libraries, and if it's a flag or pluggable solution the only effect it can have on existing libraries is a performance difference. That's the standard anything of this form has to go through. There are probably other solutions this thread hasn't explored. |
Mark-Simulacrum added the C-bug label Jul 25, 2017
Triage: we have generators as an experimental RFC, implemented in nightly. |
lhecker commented May 3, 2016 (edited)
The issue
thread_local! is used in the stdlib, which does not work well with coroutines. Repeated access to a TLS variable inside a method might be cached and optimized by LLVM. If a coroutine is transferred from one thread to another, this can lead to problems, because the coroutine still uses the cached reference to the TLS storage of the previous thread.
What is not the issue
TLS being incompatible with coroutines for the most part (e.g. here) is well known and not an issue per se. You want to use rand::thread_rng() with coroutines? Just use rand::StdRng::new() instead! Most of the time it's quite easy to circumvent the TLS by simply using something different. This is not true for the stdlib though. One way or the other you're probably using it somewhere.
Possible solutions
thread_local!. I think that this could be hard to achieve in a performant way though.
PANIC_COUNT and its wonky implementation and still make entirely sure that a stack is unwound twice. Other uses of TLS inside the stdlib could be wrapped inside inline(never) without causing large overheads.
I hope we can find a solution for this as this is really a huge problem for using stackful coroutines with Rust and who doesn't want "Go" but with Rust's syntax, eh? 😉
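To make the `inline(never)` suggestion above concrete, here is a minimal sketch, assuming a hypothetical thread-local `CURRENT_WORKER` and accessor functions invented for this example. The idea is that every TLS access goes through a function the compiler is asked not to inline, so the caller (for example a coroutine body) does not hold on to a TLS address computed before a context switch. Note that `#[inline(never)]` is only a hint to the compiler, and whether it is sufficient in practice is exactly the open question of this issue.

```rust
use std::cell::Cell;
use std::thread;

thread_local! {
    static CURRENT_WORKER: Cell<u32> = Cell::new(0);
}

/// Reads the thread-local through a function that should not be inlined,
/// so each call looks up the TLS slot of the thread it runs on.
#[inline(never)]
fn current_worker() -> u32 {
    CURRENT_WORKER.with(|w| w.get())
}

/// Writes the thread-local through the same kind of out-of-line accessor.
#[inline(never)]
fn set_current_worker(id: u32) {
    CURRENT_WORKER.with(|w| w.set(id));
}

fn main() {
    set_current_worker(7);
    assert_eq!(current_worker(), 7);

    // Another thread has its own slot and does not disturb ours.
    let other = thread::spawn(|| {
        set_current_worker(9);
        current_worker()
    })
    .join()
    .unwrap();

    assert_eq!(other, 9);
    assert_eq!(current_worker(), 7);
}
```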