Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Stabilize catch_panic #1236

Merged
merged 6 commits into from
Aug 31, 2015

Conversation

alexcrichton
Copy link
Member

Stabilize std::thread::catch_panic after removing the Send and 'static bounds from the closure parameter.

Rendered

@alexcrichton alexcrichton added the T-libs-api Relevant to the library API team, which will review and decide on the RFC. label Aug 4, 2015
@sfackler
Copy link
Member

sfackler commented Aug 4, 2015

I'm on board with this, but catch_panic's placement in std::thread seems more of a historical artifact of its old life as task::try than anything. It doesn't make a ton of sense to put it there since it really has nothing to do with threads.

Maybe std::ffi would make more sense? It'd also make it a bit more explicit that it should not be used as a general purpose mechanism.

@alexcrichton
Copy link
Member Author

I personally feel that it's more appropriate in std::thread than std::ffi because it's not solely related to FFI (e.g. the thread pool use case). I feel that std::thread is relatively appropriate as it's a "thread related" construct in the sense that catching a panic prevents the thread from being torn down.

That being said I'm not 100% comfortable with std::thread, it just seems to me as the most appropriate module for now given the standard library's current set of modules.

@pythonesque
Copy link
Contributor

Big 👎 on removing the bounds. I think I've made my position clear numerous times on this one in other RFCs. The drawback (watering down Rust's exception handling story) is a big drawback. Stabilizing it with the bounds does not make it backwards incompatible to remove them later (or if it does, it's in a pretty subtle way).

And yes, before you respond, I'm well aware that Rust already has exception unsafety. But it only really has to be dealt with in destructors and unsafe code right now (plus some misuses of global state). The current variant with the Send + 'static bounds adds some new things to that (notably, thread locals) but already suffices for the main purposes people need it for (catching on FFI boundaries and preventing an outright crash; thread pools needing unsafe code isn't a huge deal, IMO). I don't think "we can't be perfect at this" is a good reason to remove it entirely.

(On an unrelated note, I do wish that thread local sharing could be mitigated as well, but I don't think it can be done backwards compatibly).

Finally, I'd add that stabilizing this has very serious consequences around allowing panics to turn into aborts (which I definitely want). This makes it far too easy to write code that visibly changes behavior under the two options. If the idea is that by accepting this RFC in any form we would be closing off that ability, I can't in good conscience say the RFC should be accepted at all. (Please don't respond to me that you can already do this with unsafe code--I know that. Virtually nobody does it in practice).

Edit: I see that the abort thing was specifically addressed in the RFC. But I don't think I found the way it was phrased ("people will know that we plan to do this eventually and therefore won't misuse panics") too comforting.

@mcpherrinm
Copy link

I very much want to see catch_panic stabilized.

I see arguments to both sides about the bounds. What if we stabilize a bounded version and an unsafe, unbounded version? This does nothing to address the TLS subverting concerns, but could plausibly help in some cases. Of course, offering two functions is mental overhead (and requires a name for the unbounded & unsafe function).

std::thread doesn't seem like the right place. I'd almost want to see an std::panic, but I'm not sure what else would belong there.

@kentonv
Copy link

kentonv commented Aug 4, 2015

Commenting as someone who has lots of production C++ experience (7.5 years Google, 2.5 years Sandstorm.io / Cap'n Proto) and would very much like to switch to Rust.

Here's the real-world facts that need to be dealt with:

  • All code can fail, because all code can have bugs.
  • Obviously, we don't want every single function everywhere to return Result<T, E> as a way to signal arbitrary "I had a bug" failures. This is what panic is for.
  • In a server handling multiple concurrent requests from different users, it is not OK to kill the whole server when some code fails. Servers need to be fault-tolerant. Only the request in which the error occurred should fail. If you want to "fail loudly", arrange monitoring such that someone gets paged when such a failure happens; don't return errors to 100 unrelated live users.
  • In an event-driven server, failing a thread is approximately as bad as failing the whole process in this regard, as each thread handles many concurrent requests.

Therefore: We need a way to isolate failures just to the individual event and its dependents, without harming unrelated events.

We can do this with catch_panic today, but it will force every event callback to act like it's in a separate thread, using redundant synchronization when accessing shared data structures. The synchronization is redundant if the event loop is single-threaded or otherwise guarantees that events targeting the same object will be handled in only one thread at a time. Normally this is one of the big advantages of "actor model" programming: that events are effectively atomic and need not synchronize. Forcing synchronization not only forces redundant boilerplate on programmers but also hurts performance, as those atomic ops aren't cheap even when not contended.

So for Sandstorm and Cap'n Proto, we really need this -- or some other as-yet unknown solution to the above problems.

@Gankra
Copy link
Contributor

Gankra commented Aug 4, 2015

My thoughts on this matter:

First and foremost, the bounds on catch_panic are about stronger exception safety than memory safety. All rust programs must already be sufficiently exception safe that the inclusion of an "anything goes" catch_panic is trivially memory safe, because you can arbitrarily emulate the "observing bad state" consequences of catch_panic with destructors further down the stack.

Therefore no matter what we do catch_panic should never be unsafe. It is safe. It has to be safe. We've already had this argument with mem::forget, and strict usage of unsafe won.

Second, my current model of Rust's error handling story is that everyone should endeavour to never panic if correctly written, but also be ready for everyone else to panic (modulo unsafe code which may need to assume certain operations don't panic).

From this perspective, catch_panic simply makes this story less painful. It gives you a new tool to be ready for panics with, without having to spawn a whole OS thread. Once you've removed the OS thread,
Send and 'static seem entirely arbitrary to me. Also it's not like adding catch_panic makes exception safety any more pervasive unless you explicitly use it. If you personally want the bounds, you can even
write your own wrapper around catch_panic that has this requirement, and forbid using otherwise in your projects. Alternatively you could lint against sharing/mutable-sharing/internal-mutability.

I do not want to see a scheme where we encourage people to unsafely transmute their types to be 'static just to handle panics properly.

I do not believe catch_panic will make panics into a proper try/catch error-handling scheme for several reasons:

  • Panics can become aborts. First, explicitly as a compiler flag in the future. But also double-panics, destructor panics, and "extra extreme" boundary conditions (Vec allocating may panic or abort based on the circumstances).
  • Panics are optimized for the "won't panic" case. The current implementation is zero-cost at runtime if your program never panics. This results in actually panicking being slower than in e.g. Java.
  • Panics are information-poor. Results are much more useful in general.

@dgrunwald
Copy link
Contributor

Big 👍 on removing the bounds.

catch_panic is most useful at FFI boundaries to prevent panics from unwinding into non-Rust code. FFI boundaries often deal with raw pointers *mut T, which are not Send.
Keeping the bounds around would make the primary usage of this API unergonomic, as it would force all FFI code to wrap raw pointers in newtypes just to transport them into the catch_panic closure.

@kentonv
Copy link

kentonv commented Aug 4, 2015

Panics can become aborts. First, explicitly as a compiler flag in the future. But also double-panics, destructor panics, and "extra extreme" boundary conditions (Vec allocating may panic or abort based on the circumstances).

FWIW, C++'s behavior of turning double/destructor-exceptions into aborts is a huge thorn in the side of fault-tolerant programming. If a destructor panics during unwind, the right thing to do (in the name of fault tolerance) is merge the two panics into one panic and keep unwinding. C++ can't really do that because its exceptions are typed and intended to be caught by type, making it unclear how to "merge" them. Rust panics are not typed and not intended to be machine-decoded at all, so you can define whatever arbitrary merge function you want, e.g. "just concatenate the description strings".

I think having a compiler flag that turns panics into aborts is a great idea, though. It would arguably make sense to use such a compiler flag for batch jobs (e.g. MapReduce) but not use it for interactive servers.

@pythonesque
Copy link
Contributor

Therefore no matter what we do catch_panic should never be unsafe. It is safe. It has to be safe. We've already had this argument with mem::forget, and strict usage of unsafe won.

And we've already had issues with Rc and RefCell because we weren't able to think through all the consequences of making it safe. This isn't law, and we don't need to decide all cases based on precedent (and FWIW, I was a big proponent of making mem::forget safe). It's entirely possible that "able to experience exception safety in destructors" is qualitatively different from doing so with catch_panic.

The reason for the existing bounds was that anything you can see through Send + 'static can already experience exception safety issues. So far the strongest argument I've seen is that Send is too strict because it forces the use of atomic operations with *mut T. What about removing Send but retaining 'static? That at least means you shouldn't be able to see issues through local variables.

@alexcrichton
Copy link
Member Author

@pythonesque

I think I've made my position clear numerous times on this one in other RFCs. The drawback (watering down Rust's exception handling story) is a big drawback. Stabilizing it with the bounds does not make it backwards incompatible to remove them later (or if it does, it's in a pretty subtle way).

It would be helpful to me to at least copy/paste a few of your thoughts here, and perhaps even word them in the context of this RFC. Some discussion at lunch today made me realize that the wording around the rationale here was a little light, so I've pushed an update which expands a bit on why I believe the bounds should be removed.

And yes, before you respond, I'm well aware that Rust already has exception unsafety. But it only really has to be dealt with in destructors and unsafe code right now (plus some misuses of global state).

As stated in the RFC, however, this isn't actually quite true. Broken invariants can be exposed through types like Mutex where antidotes to the poison are available. Additionally it's key to remember that exception safety is not just dealing with memory safety, logical invariants are often just as important to uphold. As a result only limiting the scope of exception safety to unsafe code doesn't quite reflect the state of Rust today.

Finally, I'd add that stabilizing this has very serious consequences around allowing panics to turn into aborts (which I definitely want).

Can you elaborate a bit on why you're worried here? I also very much want to have a panic-to-abort option to the compiler, and I'm under the impression that this RFC does not affect this design much. The only consequence I can think of is that catch_panic turns into just a function call instead of setting up a "catch" block.

What about removing Send but retaining 'static? That at least means you shouldn't be able to see issues through local variables.

This is certainly a possibility, but I've tried to articulate in the RFC why I believe 'static should be removed as well as Send. Could you take a peek at some of the new text?


@mcpherrinm

What if we stabilize a bounded version and an unsafe, unbounded version?

@gankro explained this as well, but because exception safety doesn't directly cause memory safety without otherwise having unsafe blocks it goes against the design principles of the standard library to mark this function unsafe. Unsafe functions in the standard library are only annotated when they can directly cause memory unsafety in otherwise safe situations, and this isn't quite the case for catch_panic.

std::thread doesn't seem like the right place. I'd almost want to see an std::panic, but I'm not sure what else would belong there.

There's actually some possibility for perhaps filling out std::panic over time, for example:

  • std::panic::catch
  • std::panic::is_panicking (moved from std::thread)
  • std::panic::panic (implementation detail of the panic! macro)
  • std::panic::panic! (actual panic! macro)

That being said we could always put it in std::thread and then perhaps deprecate it to another location at a future date.


@kentonv

So for Sandstorm and Cap'n Proto, we really need this -- or some other as-yet unknown solution to the above problems.

Thanks for weighing in! It's definitely helpful to get more use cases for either having or removing the bounds.


@dgrunwald

Keeping the bounds around would make the primary usage of this API unergonomic, as it would force all FFI code to wrap raw pointers in newtypes just to transport them into the catch_panic closure.

Excellent point! I forgot to include this in the text originally but I've now pushed an update which mentions this.

@pythonesque
Copy link
Contributor

It would be helpful to me to at least copy/paste a few of your thoughts here, and perhaps even word them in the context of this RFC. Some discussion at lunch today made me realize that the wording around the rationale here was a little light, so I've pushed an update which expands a bit on why I believe the bounds should be removed.

Catchable exceptions requires coding transactionally to preserve strong exception safety. Graydon talked about this quite a bit about this as well when he was maintaining Rust. I do not believe programmers are capable of coding transactionally in any useful way, at least not consistently. I also believe that adding ("with a caveat around thread locals") doesn't materially damage this proposition for normal programming purposes, because of the ways people tend to use thread locals.

As stated in the RFC, however, this isn't actually quite true. Broken invariants can be exposed through types like Mutex where antidotes to the poison are available. Additionally it's key to remember that exception safety is not just dealing with memory safety, logical invariants are often just as important to uphold. As a result only limiting the scope of exception safety to unsafe code doesn't quite reflect the state of Rust today.

I've been trying to forestall responses like these (but apparently not well enough). Suffice it to say that I'm well aware of the precise guarantees Rust provides and I think they are largely sufficient for everyday programming. I'm not convinced that's the case with an unbounded catch_panic.

Can you elaborate a bit on why you're worried here? I also very much want to have a panic-to-abort option to the compiler, and I'm under the impression that this RFC does not affect this design much. The only consequence I can think of is that catch_panic turns into just a function call instead of setting up a "catch" block.

The problem is that if people do start using catch_panic as I fear they will, we will effectively have two, incompatible versions of Rust--the one that panics on aborts and the one that doesn't. For an example of how this could affect real functionality in safe code, consider a parser that throws exceptions on errors (later catching them) rather than using Result, to speed up the fast path. People currently avoid doing things like this largely because there is the perception that catching panics is for FFI purposes only (due to its unsafety), but I don't think that will continue if we make it completely bounds-free.

This is certainly a possibility, but I've tried to articulate in the RFC why I believe 'static should be removed as well as Send. Could you take a peek at some of the new text?

I don't see any new text.

@alexcrichton
Copy link
Member Author

Catchable exceptions requires coding transactionally to preserve strong exception safety.

As noted in the RFC, it's important to consider this in the perspective of idiomatic Rust. One of the key parts of exception safety is observing a broken invariant, and all the methods of doing so today aren't necessarily idiomatic in everyday code. The RFC also explains how it's unlikely that catch_panic will not become idiomatic. Do you disagree with these points?

I don't see any new text.

Gah, sorry! Now updated.

@pythonesque
Copy link
Contributor

As noted in the RFC, it's important to consider this in the perspective of idiomatic Rust. One of the key parts of exception safety is observing a broken invariant, and all the methods of doing so today aren't necessarily idiomatic in everyday code. The RFC also explains how it's unlikely that catch_panic will not become idiomatic. Do you disagree with these points?

I somewhat agree with point (1), but I don't think conventions this early in a programming language's life will necessarily have that much importance going forward. I disagree with (2) because I think many Rust programmers simply aren't aware of this. If catch_panic and/or panic! mentioned (in the documentation) that panics could turn into aborts someday, I would be more likely to agree with this point.

Gah, sorry! Now updated.

Like I said, I don't think the thread pool scenario having to use unsafe code is a big deal; I expect fundamental libraries like that to have to use a certain amount of unsafe. The argument against 'static ("it's not perfect") isn't very compelling to me; it still capture many useful cases.

@kentonv
Copy link

kentonv commented Aug 4, 2015

Catchable exceptions requires coding transactionally to preserve strong exception safety.

Correct me if I'm wrong, but I think you're making the oft-repeated assertion that exceptions may leave shared data structures in an inconsistent state, that it's hard to make sure that doesn't happen, and that it's dangerous to continue under such circumstances.

This sounds logical in theory, but doesn't match with my real-world experience. In practice, I've found that the vast majority of the time, this just isn't actually a problem. The vast majority of actual exceptions occurring in our C++ system (which liberally throws exceptions in any unexpected circumstances) have no lasting harm other than erroring the request in which they were thrown. Off the top of my head I remember only one case where servers were left in a bad state after a certain kind of exception and manual cleanup was required -- and the bad state was in the kernel, so even restarting the process wouldn't have helped. But even if it would have helped, it would absolutely not be worth it for us to throw away the fault tolerance benefits we get from our exception strategy in order to avoid one-a-year persistent bad state bugs. On the contrary, having more common aborts -- or trying to sweep problems under the rug to avoid aborting -- would make production problems far more common for us.

@pythonesque
Copy link
Contributor

@kentonv At this risk of turning this into he-said-she-said, I have to disagree with you here. Web servers are only one class of software, and one that Rust is not even specifically aimed at, and you're clearly talking about a server / daemon of some sort, because in other programs dissecting things into a single request / response wouldn't really work. Either way, I don't see how preserving the 'static bound hurts the "completely fail the request" case, especially given the ability to rely on an existing thread pool library (maybe I'm not thinking hard enough).

@kentonv
Copy link

kentonv commented Aug 5, 2015

@pythonesque Yes, my experience is focused on distributed systems serving live traffic (but not just web servers -- also their back-ends, etc.). I don't mean to claim that my requirements are universal. If Rust doesn't want to cover this use case, that's fair -- but sad for me.

That said, it does seem to me that the arguments I'm making could also apply to interactive client apps too. User clicks on button. App fires off some processing in response. Something goes wrong. It's perhaps nicer to the user to cancel the one task initiated by the button press rather than take down the whole program and lose all their state. But yeah, I don't have much experience with such apps.

Either way, I don't see how preserving the 'static bound hurts the "completely fail the request" case.

Well... In C++, Cap'n Proto's event framework is based on promises. When you arrange for some event callback to be called later and return a value of type T, you get a Promise<T> to hold on to representing that registration. If you drop the Promise<T>, the callback never executes. Therefore it is safe for the event callback to contain references to anything that is known to outlive the promise.

I'm not sure to what extent Rust's lifetime checking is expressive enough to support that. The alternative is to fall back to refcounting everything, in which case 'static is fine. But if the semantics we use in C++ could be supported (and checked) in Rust, that would be nice, so it would be sad to see catch_panic() standing in the way of that, especially if the only purpose of keeping the constraint is to create obstacles to discourage catch_panic()'s use.

I guess to be honest I don't understand what the 'static bound is accomplishing here. Send makes sense in that it forces the poisoning system to kick in -- which to be honest, I could kind of see myself liking if not for the atomic ops overhead. But I don't understand why 'static was ever required in the first place.

@pythonesque
Copy link
Contributor

The 'static bound is intended to prevent borrowing, without which you know you have the only reference to the thing. But now that I thing about it, if you have access to Cell, RefCell and Rc this falls apart--Send + 'static was preventing that combination before. TBH, I rather like the existing bound, as it really does capture most of the stuff I'd be worried about (namely: shared, mutable state without poisoning, without which exceptions are largely fine). It's really a pity about *mut T and *const T. Unique is Send, of course, but I haven't seen it used much in client libraries.

FWIW, atomic operations don't hurt much if they're uncontended--on x86, a relaxed operation is pretty much free except that it doesn't allow coalescing or reordering multiple operations on the same memory address, and unordered (which is "atomic enough" for Rust) is the Java default. There could be an OIBIT for poisoning that allowed you to forego atomic operations altogether, but people seem to be against a proliferation of OIBITs.

What about a safe OIBIT Panic that marks a type exception safe, with impl Panic for .., impl<T> !Panic for Cell<T> {}, impl<T> !Panic for RefCell<T>, impl<'a, T> !Panic for &'a mut T? This wouldn't fall afoul of the unsafety concerns Gankro brought up, but it would still allow somewhat sane "default" behavior while allowing people to easily override it if necessary. This could then be made a bound for use with catch_panic. Given that the expectation is that people should be using it rarely, and the fact that adding it to a type would be safe, I think existing concerns about (e.g.) boxed traits would be much less important, since fixing them wouldn't require unsafe; it would just discourage using catch_panic without thinking it through.

I would say this: if the bounds are going to change, I would prefer the bounds to change and then stabilization, rather than doing both at once. Right now we have no experience with how catch_panic is likely to be used without such bounds; everything in this RFC regarding that is conjecture.

@Gankra
Copy link
Contributor

Gankra commented Aug 5, 2015

And we've already had issues with Rc and RefCell because we weren't able to think through all the consequences of making it safe.

We only had an issue with Rc, and even that issue was ridiculous. As in patching it was basically a political move, and had little technical merit. No reasonable program would ever run into this behaviour.

RefCell never had an issue (usize::MAX is the value for borrow_mut, so the whole thing just locks up until you drop one of your Refs).

Our mistake was in trying to say that mem::forget was unsafe, which meant we didn't think through the consequences of destructor leaks in our designs. This lead to issues in thread::scoped and Vec::drain but these were known when we opted to make mem::forget safe. By marking unbounded catch_panic safe, I like to think we'll avoid getting similarly reckless. Marking it unsafe will make us think "oh no this is unsafe so I don't need to care about it".

@ben0x539
Copy link

ben0x539 commented Aug 5, 2015

@dgrunwald

catch_panic is most useful at FFI boundaries to prevent panics from unwinding into non-Rust code. FFI boundaries often deal with raw pointers *mut T, which are not Send.
Keeping the bounds around would make the primary usage of this API unergonomic, as it would force all FFI code to wrap raw pointers in newtypes just to transport them into the catch_panic closure.

Seems like FFI code could just use an unrestricted unsafe catch_panic while the safe one keeps the bounds.

@alexcrichton alexcrichton self-assigned this Aug 5, 2015
@alexcrichton
Copy link
Member Author

@pythonesque

But now that I thing about it, if you have access to Cell, RefCell and Rc this falls apart--Send + 'static was preventing that combination before.

It's important to remember though (as I'm sure you do) that Send + 'static isn't bullet-proof. There are holes (via TLS) that can bypass this. As a result these sorts of bounds are basically purely advisory, e.g. they're not concretely required for memory safety. I think the ergonomics of the FFI case makes a pretty compelling argument for dropping the Sync bound, and I feel that the ergonomics of actually wanting to share data are a pretty good motivation for dropping the 'static bound as well, personally.

I'd be hesitant to add a one-off OIBIT which isn't as core to the language as Send or Sync because it's "yet another trait" to think about, especially with trait objects. It's certainly a plausible route, but I'm not sure if it'd end up pulling its weight.

@tomaka
Copy link

tomaka commented Aug 7, 2015

👍 for keeping the 'static + Send bounds.

The RFC states that exception-unsafe code can't be memory unsafe. But that's assuming that the unsafe code that the program is using indirectly is very well written and carefully auditted, like the stdlib is. I expect very few libraries that contain unsafe code to be exception safe.

In addition to this, the Rust language and stdlib have had several decisions that aim to reduce the number of logical mistakes that the programmer could make. For example variables being immutable by default, the hash map using the SipHasher by default, the strings being UTF8, etc.

In all these examples the default way is the "proper way", and you have to explicitely opt-in to do something potentially erroneous. Even if the potentially erroneous code is still memory safe.

I think that this principle ("the proper way by default") should be applied to exception safety as well. Forbid the user from observing any poisonned state, unless they explicitely opt-in to see it.

@Gankra
Copy link
Contributor

Gankra commented Aug 7, 2015

@tomaka I agree that I expect some wild exception-unsafety, but I imagine it's of the vanilla variety (as in, it's not safe even without catch-panic).

Also I believe this RFC doesn't change the proper default, which is just "don't catch panics". This is largely just needed for FFI and high-reliability systems.

@tomaka
Copy link

tomaka commented Aug 7, 2015

I'm definitely going to use catch_panic:

1 - In an HTTP server to return a 505 error to the user and send an email to the webmaster in case of panic.

2 - In my game to actually show an error to the user. Games are usually not run through the terminal, so in case of panic you'd just see the window magically disappear. I can't use thread::spawn, because on OS/X the window has to be created in the main thread.

Of course that's really subjective, but I don't see catch_panic as something that is rarely needed.
Even if you discourage using it, people are going to use it as it's the only way to catch panics without spawning a thread.

@ticki
Copy link
Contributor

ticki commented Dec 10, 2015

IMO panics should never be used as a form of exception throwing. If you're doing so, you're doing it wrong. We should not encourage library writers to use panic (which is what this proposal indirectly does by providing stabilizing a way to catch these panics).

The original reasoning behind renaming fail! to panic! was this exact thing. Panic is not failure, since panic is by definition not recoverable. If you use panics as exceptions, you're not rusting right.

Failable (and also recoverable) methods should return Option<T> or Result<T, E>, not panic. If this rule is followed, catch_panic is completely unneeded. Providing a stable catch_panic would just make more people misuse panic! in this way.

@ticki
Copy link
Contributor

ticki commented Dec 10, 2015

I strongly believe that if this should be stabilized it should be marked with unsafe and/or intrinsics. As @graydon said unsafe is more than just memory safety.

@BurntSushi
Copy link
Member

@ticki Take a look at the motivation section of the RFC. If something like catch_panic is not stabilized, what alternative would you propose? (Option<T> and Result<T, E> are insufficient.)

We should not encourage library writers to use panic

We are far past that point. Any library that uses, e.g., xs[i] can panic.

@ticki
Copy link
Contributor

ticki commented Dec 10, 2015

@BurntSushi I know the major selling point of this is FFI, and, sure, that's a problem for languages having exceptions etc., but it should be avoided in pure Rust code, hence giving it a unsafe would be ideal.

@BurntSushi
Copy link
Member

Today, unsafe is pretty rigidly defined, and its definition is reasonably simple to grasp. Changing unsafe from "memory unsafety" to "possible footgun, probably shouldn't do this" is, IMO, a major change beyond the scope of this RFC. (I personally hope that ship has sailed.) Instead of using unsafe to prevent others from using it as a general purpose exception handling mechanism, we already merged a recover function with a PanicSafe bound.

If PanicSafe turns out to work well in practice, then this particular RFC will be closed I think.

@ticki
Copy link
Contributor

ticki commented Dec 10, 2015

"memory unsafety" to "possible footgun, probably shouldn't do this" is, IMO, a major change beyond the scope of this RFC.

What? unsafe has never been solely for memory safety's sake. Read @graydon's comments above.

@ticki
Copy link
Contributor

ticki commented Dec 10, 2015

@BurntSushi Type violation (invalid enums, for example) are (in most cases) not memory unsafe, but still illegal.

We are far past that point. Any library that uses, e.g., xs[i] can panic.

True. Indexing is panicable, because indexing is in most cases unrecoverable. Handling OoB is something that often leads to bugs in C++ and Java.

But encouraging people to use panics while still being able to recover them would mean that they'd be abused for error handling, which is out of the scope of panics (hence the name). There's a big difference between using panics as abnormal, unrecoverable failure and as fixable exceptional cases.

@BurntSushi
Copy link
Member

Type violation (invalid enums, for example) are (in most cases) not memory unsafe, but still illegal.

"unsafe" is not synonymous with "illegal."

But encouraging people to use panics while still being able to recover them would mean that they'd be abused for error handling.

See the RFC with PanicSafe. It is specifically addressing your concern about "encouraging catching panics as error handling," but without using unsafe specifically.

True. Indexing is panicable, because indexing is in most cases unrecoverable.

Of course they are. If I'm running a web server and a bug is triggered during a request that causes a panic, then it sure would be nice to not bring down the entire web server. This use case is part of the motivation in this RFC, albeit with "thread pools."

@ticki
Copy link
Contributor

ticki commented Dec 10, 2015

"unsafe" is not synonymous with "illegal."

Okay, fair enough, let me rephrase: I think that catching panics outside unsafe blocks should be illegal.

@RalfJung
Copy link
Member

Type violation (invalid enums, for example) are (in most cases) not memory unsafe, but still illegal.

In most cases, but not in all. unsafe is guarding exactly what safe code can rely on to always hold. There are some arbitrary choices to be made here, the contract is not entirely fixed by saying "We don't want crashes".

As an illustrative, but unrealistic example, imagine there was a way to decide, given a pointer, whether that pointer is still valid at the given type - then we could decide that pointer validity is not part of the contract that safe code relies on. Converting an integer to a shared borrow would be safe. And every pointer access would have to do the validity check to prevent crashes.
However, it turns out that doing such a check would be insanely expensive, and hence it was decided that all safe code can rely on pointers being valid. Now safe code does not need to do any of these checks, but converting integers to shared borrows becomes an unsafe operation.

As a more realistic example, we could decide for invalid enum discriminants to be safe. However, that would mean that safe code cannot rely on enum discriminants being valid, and hence every single match statement has to check for that case. That's not the decision that has been made; match statements in safe code assume that the enum discriminant is valid. As a consequence, code that could leave behind invalid enum discriminants must be marked unsafe.

The question that is discussed here is different. There is no property (that we know of) that safe code could rely on, that holds unless recover is used. There is this vague idea of "abstraction safety", but it's entirely unclear what that means in terms of "when safe code executes, which assumptions can it make?". (Notice that both of my examples above have a very clear answer to this questions: Pointers are allocated and of the right type. Enum discriminants are valid.) In other words, mis-use of recover cannot make safe code crash right now, and we cannot think of a situation in the future where something will be unsafe just because recover is around. Hence recover should not be unsafe.

Note that the PanicSafe trait has been added exactly to compensate this - to avoid accidentally using data that may have been poisoned by an exception. It provides the necessary precautions against accidentally using this footgun, without undermining and diluting our understanding of what unsafe is and protects.

@BurntSushi
Copy link
Member

I think that catching panics outside unsafe blocks should be illegal.

That's just another way of saying, "I think we should change unsafe to protect against more than just memory safety." Regardless of what previous commenters said, the current language specification is pretty clear about what unsafe means, and catching panics simply isn't unsafe according to that definition.

I'm not sure why exactly you're stuck on this. Your concern is that people will abuse this for standard error handling. I think almost everyone has that same concern. We are trying to address that concern with PanicSafe, rather than expanding the mandate of unsafe. It seems like you're stuck on the latter, but I haven't heard your thoughts on the former.

@ticki
Copy link
Contributor

ticki commented Dec 10, 2015

We are trying to address that concern with PanicSafe, rather than expanding the mandate of unsafe. It seems like you're stuck on the latter, but I haven't heard your thoughts on the former.

I'm not exactly sure how you'd define PanicSafes (Is it a attribute? What restrictions are there? Can I have "no panic" blocks?), but the idea is interesting. Is there a RFC for that thing?

I'm not against panics, though. I do not think panics should be considered unsafe {}, I'm saying that panic catching should be (since panics shouldn't be recoverable anyways).

@BurntSushi
Copy link
Member

Is there a RFC for that thing?

I've linked to it a few times already. It has been merged. #1323

@ticki
Copy link
Contributor

ticki commented Dec 10, 2015

@BurntSushi Cool. I must have missed it.

@glaebhoerl
Copy link
Contributor

Shall we also provide a safe mechanism to subvert trait coherence? After all, trait coherence is not directly implicated in memory safety.

As expressed by @graydon and @aturon above, I believe the fundamental requirement for a safe/unsafe distinction is that it has to be a property which can be precisely formulated and strictly enforced by the language. In other words, I believe the proper mantra to recite should not be "unsafe is only for memory safety", but rather "unsafe is not a lint".

@RalfJung
Copy link
Member

Shall we also provide a safe mechanism to subvert trait coherence? After all, trait coherence is not directly implicated in memory safety.

I am pretty sure that in combination with associated types, trait coherence is needed for memory safety. I know this is the case in Haskell, where type families have to be coherent, otherwise the program can crash. I don't have the time right now to come up with an example, though.

@glaebhoerl
Copy link
Contributor

Good point. But even so.

@jamwt
Copy link

jamwt commented Jan 8, 2016

I'm of the school of thought that Rust has one way of handling recoverable errors--Result. Panic should be like assert() in C--something terrible has happened, abandon all hope. An invariant. At work we use the panic2abort crate to ensure this, and it helps us sleep at night. As one data point, our Rust daemons have many weeks of uptime serving many billions of requests (as of course do various C daemons that use assert/abort and golang daemons that use panic), so it doesn't seem particularly daunting to achieve reliability using this approach. We use a lot of 3rd party crates (95, as of right now), and the community currently seems to treat panic with the same rigor--we don't have random .unwraps happening unexpectedly.

There are two principal reasons why I'd prefer we not make it easier or more idiomatic to catch panics:

  1. IME, writing RAII/dtor type code that unfailingly doesn't leak resources, always abort the database transaction, releases the flock etc in the face of unwinding is basically unsolvable-hard. Those types of issues are seldom examined rigorously and almost never exhaustively tested because there are so many implied code paths. Programmers just don't get it right, particularly when it's not the primary method of error propagation (vs. Result). Furthermore, often some other invariant ends up violated in the destructor (due to earlier execution than expected), and then you abort anyway, but sort of mask the "real" issue. Given this, in practice (in the kind of applications I manage, at least) crashing on the initial problem is far preferable to nondeterministic transactional behavior. It crashes loudly, the daemon restarts in a clean state, drops its fds and sockets and mmaps and later on the devs shuffle on by and fix the bug. The punishment fits the crime.
  2. Result and friends are so much better of a way to handle failure that I'm scared of encouraging exception-type behavior as yet-another recoverable error path for new rust developers used to exceptions.

RAII plus destructors plus lexical-scope drop is just awesomely powerful to use so long as it's easy to determine when your destructors run. Rust is sooo close to giving you exactly what you expect that it's a shame there's this odd corner where what it can give you is rather complicated to reason about.

FFI Counterargument

I'm going to cheaply lean on the C defense here--C libraries have plenty of abort() type invariants in practice. To say they're widely used is obviously an understatement. Yes, unwinding is indeed terrible if you have non-rust frames in your stack. Abort instead!

Responding to the "but you could recover.." or "if you're in e.g. a desktop app, don't punish the user.." Do you really know for sure nothing has gone wrong when you unwound? Maybe you'll try again and that transaction that didn't abort correctly will deadlock you.

Basically...

I'm not totally against catching panics because Rust is all about power and if you Know What You're Doing, by all means... If you're convinced it's the right call for your use case and you're pretty darn sure everything that needed to be undone was. Maybe Servo is one of those codebases, for example. Everyone hates browsers crashing, and the I imagine the Servo devs have a shot at getting early drop pervasively right.

But:

  1. Make it difficult to use, to be frank. Make it clear it's dangerous. Going as far as unsafe seems appropriate to me, if not in a letter-of-the-law way, then certainly a spirit-of-the-law one.
  2. Make it really easy to get panic2abort behavior, and continue to encourage the idea that any panic should be an irrecoverable the-sky-is-falling type event.

@jamwt
Copy link

jamwt commented Jan 8, 2016

I should admit that I'm sure my bias is in play here. I use rust for backend storage/database systems where a crash isn't a huge deal (the systems have to be highly redundant anyway) but incorrect behavior is a very big deal (customer data loss). And queuing/gradual degradation is a far more insidious problem than fast failures/restarts.

I'm sure the world looks very different if you're using Rust for an application running on an end-user's computer or something. Or if you're hosting collections of somewhat unknown code in a kind of general application server and you can't always enforce the applications will follow a predefined set of rules.

@ticki
Copy link
Contributor

ticki commented Jan 8, 2016

I agree with @jamwt, that it would be appropriate to make this unsafe. Unsafes are more than memory unsafety. They're a general barrier to avoid users doing things which they should only do if they know exactly what they are doing and why.

@bluss
Copy link
Member

bluss commented Jan 8, 2016

@jamwt Catching panics is a necessary evil for ffi: It's UB to unwind across the boundary.

A correct rust library exposing a rust function using the C ABI must catch panics at the boundary, otherwise unpredictable results happen. This is an area where Rust is not safe by default, instead you have to remember to catch panics, so in that sense we need to make it easy to do it correctly.

@barrkel
Copy link

barrkel commented Feb 21, 2016

One tiny perspective on this long discussion:

Panics in libraries are making a policy decision; catching them is required in order to prevent them from being global policy decisions. Abort on panic means your third party dependencies get to decide when your application aborts, but third parties do not know your application and what constraints it's working with. There are exactly three choices: (1) libraries must not use panics (APIs will be polluted with Results used for bugs instead of error conditions), (2) third party libraries that panic may not be used at all, or (3) panics must be catchable.

The point about policy decisions having transitivity is isomorphic to the situation where Rust is being used across an ffi; but ffi is not the only situation. Abort on panic is not the right behaviour for all applications. It's probably the wrong behaviour for most applications that aren't factored as compositions of small restartable processes; most applications are event-driven, whether servers or UIs. Tearing down anything beyond a simplistic single-threaded server is bad policy as previously discussed (one might consider tainting the process and killing it once all outstanding requests complete, but this still requires catching panics), and disappearing UI is a poor user experience - you'll want to best-effort save state and restore on restart, at a minimum.

My experience does not match @jamwt - exceptions do work well in servers written in GC languages. In practice it's mostly a non-issue, because there are far fewer things that need careful coding when you have a rock-solid guarantee of memory safety. Without memory safety, every memory allocation is hazardous; and in a language like C++ with copy constructors, destructors etc. mere function calls are fraught with danger. But this is mostly a social property of lack of memory safety, not of exceptions.

@arielb1
Copy link
Contributor

arielb1 commented Feb 21, 2016

@barrkel

A buggy library can cause problems in other ways than panicking - e.g. it can infinite-loop, or allocate too much memory and get killed, or it can corrupt data structures it has access to.

A "segmentation fault" abort is often not the desired thing to do with a GUI program. That is a slightly different issue than "catch_panic" - here you want to handle the panic, show a proper error message, and then abort the process.

@diwic
Copy link

diwic commented Feb 23, 2016

Panics in libraries are making a policy decision; catching them is required in order to prevent them from being global policy decisions.

So an interesting thought here is, in the case of a Rust library and non-Rust main program, would it be a good best practice to export a symbol which is a wrapper around panic::set_handler? So the non-Rust program can decide to abort, unload the Rust library or whatever makes the most sense given the type of panic?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. T-libs-api Relevant to the library API team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.