Default behavior of unwinding in FFI functions #58794

Open
Mark-Simulacrum opened this Issue Feb 28, 2019 · 46 comments

@Mark-Simulacrum
Member

Mark-Simulacrum commented Feb 28, 2019

This is the tracking issue for the behavior of unwinding through FFI functions.

There are two choices here: we can abort if unwinding occurs through an extern "C" boundary, or we can permit it. We currently abort on beta 1.34 and nightly 1.35, but will permit unwinding in stable 1.33.

We previously attempted this change in 1.24 and reverted in 1.24.1. We attempted to do so again in 1.33, but reverted once again pending lang team discussion on the topic.

There has been discussion on this topic in #52652, #58760, and #55982.

The stable behavior of permitting unwinding is UB, and can be triggered in safe code (#52652 (comment)). Notably, mozjpeg depends on this behavior and seems to have no good stable alternatives; there's been some discussion on internals.

Mark-Simulacrum added a commit to Mark-Simulacrum/rust that referenced this issue Feb 28, 2019

Mark-Simulacrum added a commit to Mark-Simulacrum/rust that referenced this issue Feb 28, 2019

@jethrogb

Contributor

jethrogb commented Feb 28, 2019

It would be helpful for the discussion if someone knowledgeable could write a summary covering the following:

  • What are the issues with allowing unwinding through foreign languages on major platforms? What is the interaction with C++ exceptions? (Saying it's "UB" doesn't cut it)
  • What is the interaction between setjmp/longjmp and unwinding (in general, and on major platforms)?
  • Are there any concrete plans for writing a custom unwinder/"tweaking the unwinder"? Has anyone tried this/is anyone working on this? (The answer to this question may be used to determine how Rust restricts itself by locking into the standard unwinder on major platforms.)

When I say major platforms, I mean GNU libunwind, Windows SEH, possibly others.


Is it possible to have different default behavior depending on which unwinder you're using? Say, unwind normally on major platforms, abort on others?

@comex

Contributor

comex commented Feb 28, 2019

  • What are the issues with allowing unwinding through foreign languages on major platforms? What is the interaction with C++ exceptions? (Saying it's "UB" doesn't cut it)

In another thread, @alexcrichton wrote:

From a technical perspective this is pretty feasible, but from a stabilization perspective is historically something we've never wanted to provide. We want the technical freedom to tweak unwinding as we see fit, which means it's not guaranteed to match what C++ does across every single platform.

As for the current implementation, my understanding is: on Unix it works; on Windows it mostly works, with some issues that could be solved. See my comment in that thread for more details.

  • What is the interaction between setjmp/longjmp and unwinding (in general, and on major platforms)?

On Unix, longjmp just resets the stack pointer and ignores unwinding.

On Windows, longjmp triggers SEH unwinding and so will run Rust destructors, AFAIK. *

* (I said in other threads that it didn't, because I misread the description of this PR and thought that it changed things so destructors wouldn't run when unwinding via longjmp; in reality, it only did that to the abort-on-unwind handler itself.)

@nagisa

Contributor

nagisa commented Feb 28, 2019

On Windows, longjmp triggers SEH unwinding and so will run Rust destructors

As far as the last part of that is concerned, that is an implementation-specific and unspecified behaviour.

@Centril

Contributor

Centril commented Mar 10, 2019

The current behavior on stable amounts to a soundness hole. For example, based on #52652 (comment), we can write (playground):

extern "C" fn bad() {
    panic!()
}

fn main() {
    bad()
}

The behavior of this program is undefined on stable because we attach the nounwind LLVM attribute to bad.
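For illustration, the abort-on-unwind behavior discussed here can be approximated by hand on a single function. This is only a minimal sketch: the names `abort_on_panic` and `checked_increment` are hypothetical and not taken from #55982, and the compiler's own mechanism is not shown.

```rust
use std::panic::{self, AssertUnwindSafe};
use std::process;

/// Hypothetical helper (not from the PR): run `body`, aborting the process
/// if it panics, so that no unwind can leave the caller's frame.
fn abort_on_panic<R>(body: impl FnOnce() -> R) -> R {
    match panic::catch_unwind(AssertUnwindSafe(body)) {
        Ok(value) => value,
        // The panic is stopped here; the process aborts instead of unwinding on.
        Err(_) => process::abort(),
    }
}

/// An `extern "C"` function whose body may panic (here, on overflow).
extern "C" fn checked_increment(x: i32) -> i32 {
    abort_on_panic(|| x.checked_add(1).expect("increment overflowed"))
}
```

With this wrapper, `checked_increment(i32::MAX)` would abort the process rather than let the panic cross the extern "C" boundary.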

Soundness is non-negotiable and as such we landed #55982 to close this soundness hole. However, since there was no explicit confirmation of this step by the language team the change was reverted on 1.33 pending confirmation. The change is still seen in beta and nightly compilers.

Based on notes by @alexcrichton in #52652 (comment), #55982 (comment), #55982 (comment), and #55982 (comment), I propose that we go ahead with and confirm the change in #55982.

@rfcbot merge

@rfcbot


rfcbot commented Mar 10, 2019

Team member @Centril has proposed to merge this. The next step is review by the rest of the tagged team members:

Concerns:

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@jethrogb

Contributor

jethrogb commented Mar 10, 2019

This change was tried twice, and twice it was reverted because important parts of the ecosystem broke. Do we really want to try to merge it again without any changes?

The discussion on this topic is pretty fragmented; the internals thread mentioned in the top post has been quite active as well. I'm still waiting for the summary I requested in #58794 (comment). I thought the scope of this issue is much bigger than what @Centril just mentioned. If you only care about the soundness issue at the IR level, another way to fix it is to never emit the nounwind attribute.

@Centril

Contributor

Centril commented Mar 10, 2019

This change was tried twice, and twice it was reverted because important parts of the ecosystem broke.

This is untrue. The second time it was reverted it was reverted only because of the lack of a completed T-Lang FCP (which we are doing now).

@jethrogb

Contributor

jethrogb commented Mar 10, 2019

The second time it was reverted it was reverted only because of the lack of a completed T-Lang FCP

I don't think so. If no one in the community would've complained about the change, I don't think this would've been reverted a day before the stable release even though no lang team discussion had happened yet. That might've been used as justification to actually do the revert, but it's certainly not the only reason.

@Centril

Contributor

Centril commented Mar 10, 2019

@jethrogb It most definitely was the only reason; the release team cannot undo language team decisions and had there been one we would not have reverted.

@jethrogb

Contributor

jethrogb commented Mar 10, 2019

@Centril I'm saying that if there hadn't been any backlash no one would've even proposed to undo the change.

@Centril

Contributor

Centril commented Mar 10, 2019

@jethrogb Yes, there was backlash, but that was irrelevant to the acceptance or non-acceptance of the undo-PR itself. The sole reason for accepting the undo-PR was the lack of a completed T-Lang FCP.

@rkruppe

Member

rkruppe commented Mar 10, 2019

I don't have enough information to argue about procedural details and people's rationale for r+'ing this or that PR, and I wouldn't be very interested in doing so anyway. I just want to say that in the light of the ongoing discussions and continued lack of consensus on how to address the legitimate needs of some projects to unwind through FFI, it seems premature to me to take this step now, just as it was premature the last times. Soundness is ultimately not negotiable, but there can absolutely be bad times and ways to roll out soundness fixes.

@jethrogb

Contributor

jethrogb commented Mar 10, 2019

If you only care about the soundness issue at the IR level, another way to fix it is to never emit the nounwind attribute.

It was suggested to me in a private conversation that this might lead to performance loss. I'd like to see some numbers on that. Because things “work” most of the time right now, it seems to me that LLVM currently generates code that would be similar to the code it would generate without nounwind.

I wholeheartedly agree with @rkruppe. I feel like not emitting nounwind is a good alternative to fix the unsoundness now (although not solving UB in general), while keeping users happy, and it gives us time to search for a real solution. For this real solution, I'd like to see an RFC-style discussion with a solid motivation and discussion of alternatives.

@Centril

Contributor

Centril commented Mar 10, 2019

As long as the soundness hole is closed one way or the other (aborting, not emitting nounwind, ...) I think it's fine.

However, I think we should separate discussion about new mechanisms like #[unwind(...)] from fixing the soundness hole. It cannot be that people knowingly depend on UB (and some do it unknowingly) and that therefore, we are forced to accept more additions to the language so that the soundness hole can be closed. If "soundness is ultimately not negotiable" is to have any meaning, a possible outcome must be that the hole is fixed but there's no #[unwind(...)]. I think it is long overdue to fix the hole as it was reported first in 2014! We have also communicated about the problem in two separate release notes.

@RalfJung

Member

RalfJung commented Mar 10, 2019

However, I think we should separate discussion about new mechanisms like #[unwind(...)] from fixing the soundness hole. It cannot be that people knowingly depend on UB (and some do it unknowingly) and that therefore, we are forced to accept more additions to the language so that the soundness hole can be closed.

I don't feel it is helpful to draw such an antagonistic picture. There literally is no way to do FFI unwinding safely in Rust currently, and some people got frustrated enough by that that they went with something that "happens to work". I have hacked around limitations in ugly ways often enough that I can totally sympathize. Sure, they should instead have written an RFC to provide a defined way to do what they needed to do, but that's a lot of work and not everyone is up for that kind of contribution.

The #[unwind] attribute was planned anyway, so there are no forced additions to the language here---just an adjustment to the transition plan. I hope this discussion kickstarts the RFC process for #[unwind], that's the most constructive outcome I can imagine here.

@rkruppe

Member

rkruppe commented Mar 10, 2019

@Centril It is indeed a possible outcome that the relevant teams ultimately decide "damn those programs and use cases, we won't provide a way to unwind through FFI". However, that would be a decision with severe downsides (more social ones than technical ones) which I don't think should be taken lightly, and IMO not at this very moment but rather after the other options have been explored and rejected -- as @RalfJung said, the trajectory of #[unwind] so far is rather the opposite!

While it's all good and well to say "this is UB, we've always said so, and programs with UB are completely invalid", the Rust project really made its bed itself here by not acting on the subject for years and in particular not providing an alternative way to address the very reasonable needs that cause users to write programs with this UB. We now have the situation that people trying to do certain (fairly reasonable!) things with Rust not only have no way to achieve it without writing programs that have UB, they do not even have an alternative in sight that they could switch to when those programs break.

Rust is well within its rights to break those programs, and I am definitely not arguing that the de facto behavior of today should be ad-hoc blessed as defined behavior, but it will cause users serious problems to not provide some alternative way to do what they need to do. We should not cause users such problems if we can reasonably avoid it, even if it means delaying a soundness fix. For comparison, some type system soundness bugs get a long grace period of time where the compiler warns instead of erroring on wrong programs to help people fix it before they get broken. Such a warning is not possible in this case (as it's about runtime behavior), but we should similarly do our best to ease the pain. Holding off pulling the trigger for another couple months (peanuts compared to how long the soundness issue has been open!) while waiting for other issues to get worked out is quite an easy way to do that.

@Centril

Contributor

Centril commented Mar 10, 2019

However, that would be a decision with severe downsides (more social ones than technical ones) which I don't think should be taken lightly, and IMO not at this very moment but rather after the other options have been explored and rejected

I think there are also severe social downsides to not going ahead with this. Namely, we legitimize "There's no way to do X currently, so we'll do something that happens to work".

as @RalfJung said, the trajectory of #[unwind] so far is rather the opposite!

I first heard of the existence of #[unwind(...)] during the T-Release meeting where we decided to revert the change on 1.33. It also didn't have a tracking issue until 12 days ago. Moreover, @alexcrichton said this in #55982 (comment):

The #[unwind] attribute was added as a necessary evil when this first came up (but wasn't supposed to be necessary long-term) and is otherwise only tweaking a very low-level detail of LLVM that doesn't relate to the correctness of the API.

For #[unwind] to become stable we'd need to provide a guarantee that we actually implement a sound unwinding strategy for all possible platforms to go through C/C++, which I don't really see happening any time soon, especially when we want to leave ourselves to implement unwinding via checked return values on targets where necessary.

None of this suggests that "The #[unwind] attribute was planned anyway, [...]" or "the trajectory of #[unwind] so far is rather the opposite!".

While it's all good and well to say "this is UB, we've always said so, and programs with UB are completely invalid", the Rust project really made its bed itself here by not acting on the subject for years [...]

Yes, I'm quite unhappy about the inaction here. I think the reason for the inaction has precisely been that we didn't want to break anyone. In the future I hope that we set deadlines for and better track soundness holes and C-future-compatibility issues.

We should not cause users such problems if we can reasonably avoid it, even if it means delaying a soundness fix.

I think it is entirely reasonable that people use nightly until such time and help test the #[unwind(...)] attribute in the process. As @alexcrichton noted:

Additionally I don't think this can really bake without actually testing, this will remain virtually undetected unless everyone opts-in to testing it, so the only real way to get testing is to actually flip the defaults and see what happens. That's what we did last time and it's easy to always flip the defaults back if something comes up!

For comparison, some type system soundness bugs get a long grace period of time where the compiler warns instead of erroring on wrong programs to help people fix it before they get broken.

I'm well aware of C-future-compatibility issues, but I think we let them sit around for far too long without actionable and well-triaged plans to address them. I think we are in need of schedules and deadlines.

@comex

Contributor

comex commented Mar 11, 2019

I think there are also severe social downsides to not going ahead with this. Namely, we legitimize "There's no way to do X currently, so we'll do something that happens to work".

On a philosophical level, I disagree. It's not a question of "legitimizing". It's a fact of life that people will rely on implementation details whether they're supposed to or not, unless you actively prevent them from doing so. Ideally you do prevent them, like rustc does with #[feature] flags, or at least actively assist them in avoiding it, like the hypothetical undefined behavior checker will do for violations of the memory model in unsafe code. But if you don't (and in some situations you can't), you can't shrug off responsibility for the breakage that ensues when the implementation changes.

On a more practical note, among the links in the original post, I think this (from here) is a key quote:

For #[unwind] to become stable we'd need to provide a guarantee that we actually implement a sound unwinding strategy for all possible platforms to go through C/C++, which I don't really see happening any time soon, especially when we want to leave ourselves to implement unwinding via checked return values on targets where necessary.

My thoughts:

  • The nounwind LLVM attribute is a red herring, since sticking it on a handful of functions in a larger codebase is very unlikely to have a measurable effect on performance. (If you want it on all your functions, you can always use panic=abort.) If abort-on-unwind is not merged, the default nounwind should just be removed for now. Likewise, if we wanted to make extern "C" functions unwindable by default and the only downside was that we wouldn't be able to automatically mark them nounwind, IMO it would easily be worth it.
  • However, the suggestion of unwinding via implicit return values changes the story. It doesn't necessarily conflict with unwinding across C code, since you could implement the transformation at the LLVM level and apply it to both C and Rust code. But if the C code is already compiled for an existing ABI (e.g. because it's a system library, or just using a different compiler toolchain), there's clear value in being able to interoperate with that while still supporting unwinding within Rust code.
  • There do exist targets where unwinding across arbitrary stack frames is not just unimplemented or difficult to implement, but impossible. WebAssembly in its current state is a partial example; you can implement unwinding using a helper written in JavaScript, but there are some pure-WebAssembly environments that don't have JS at all.
  • That implies that it is arguably beneficial to (eventually) require an explicit #[unwind] for functions that want to unwind into C, so that it can produce a compile error on such targets. On the other hand, if unwindable were the default, it could still produce a runtime error.
  • That also implies that making "a guarantee that we actually implement a sound unwinding strategy for all possible platforms to go through C/C++" is impossible. If that's the requirement for #[unwind] to exist, then it can never exist.
    • I don't think we have to be that strict. After all, there is already panic=abort mode where unwinding is not supported at all, yet we still have catch_unwind. Using it makes your code less than fully portable, but you can still use it. The same should be true for unwinding across FFI.
    • Indeed, I believe it would be against Rust's philosophy to not expose useful and necessary platform functionality just because it is not portable.
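The portability caveat around catch_unwind mentioned in the list above can be sketched as follows. The helper names `panics_unwind` and `parse_or_error` are hypothetical. In a build with `-C panic=abort` the same panic would terminate the process instead of producing an `Err`.

```rust
use std::panic;

/// Hypothetical helper: report whether this build unwinds on panic,
/// i.e. whether `catch_unwind` can ever observe a panic.
fn panics_unwind() -> bool {
    cfg!(panic = "unwind")
}

/// Hypothetical helper: turn a panic inside the closure into an ordinary error.
/// This only works when the crate is built with the unwinding panic strategy.
fn parse_or_error(input: &str) -> Result<u32, String> {
    panic::catch_unwind(|| input.parse::<u32>().expect("input is not a number"))
        .map_err(|_| format!("parsing {:?} panicked", input))
}
```

Code that relies on `parse_or_error` returning `Err` is therefore less than fully portable across panic strategies, which is the trade-off the list describes.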

Of course, this needs to go through an RFC. I don't think it needs to be a particularly "hard" RFC, at least if we're just stabilizing unwinding across C; it could be accepted quickly enough that there would be basically no benefit in changing the implementation to abort by default in the meantime. But this seems to have been rather controversial so far, so who knows...

For that to happen, someone needs to write the RFC. Does anyone want to volunteer to do that? Should I?

I also think it would be useful to fix and stabilize unwinding across C++, as a feature which might have an even narrower set of supported platforms, but which is not at all hard to implement on most existing platforms and could be quite useful for mixed C++-Rust codebases. But that comes later.

@comex

Contributor

comex commented Mar 11, 2019

Oh, one more thing (I'd edit this in, but that doesn't help people reading via email):

If the unwind attribute is stabilized, rather than #[unwind(allowed)], I'd like to see something like #[unwind(C)], indicating that you want to unwind across C code. In the future there could be a separate #[unwind(C++)] and perhaps others. There wouldn't necessarily be any way to verify that you chose the right language, but it would make it more evident what exactly the implementation is guaranteeing to be safe, and would allow targets that supported unwinding across C but not C++ to make #[unwind(C++)] a compile error.

@joshtriplett

Member

joshtriplett commented Mar 13, 2019

@rfcbot concern need-documented-replacement

In the absence of a documented replacement for how people should handle errors in C libraries that only support handling through unwinding, closing this would break a common use case.

Another way to fix the UB might be to drop the LLVM "nounwind" attribute. We could also add #[unwind], though it'll take some exploration of the details there.

I'm happy to support this as a sensible default after we document exactly what we expect people to do when interacting with inflexible C libraries that expect unwind-based error handling.

@eddyb

Member

eddyb commented Mar 13, 2019

@joshtriplett Yes, if they want to be portable.
Also, if the non-Rust library is using C++ exceptions, the glue code should be in C++, not C.

If you need to "unwind through C code", you either need to compile the C code yourself with a compiler that supports a form of unwinding as an extension, or try to find another way. I believe that's one of the things that's impossible/UB in the general case.

(setjmp/longjmp might work in many cases with already-compiled C code but I can't in good faith recommend that approach)

@joshtriplett

Member

joshtriplett commented Mar 13, 2019

@eddyb

(setjmp/longjmp might work in many cases with already-compiled C code but I can't in good faith recommend that approach)

Why not? This is precisely what those C libraries that only support unwind-based error handling expect callbacks to do.

In any case, unwinding works in Rust today for this particular use case (Rust -> C -> Rust callback)
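For the Rust -> C -> Rust callback shape, one way to avoid unwinding through the C frames is to catch the panic in the callback, stash its payload, and resume it once the C call has returned. The sketch below assumes the C library lets the callback return normally, which the libraries discussed here may not. `c_library_run` is a hypothetical stand-in written in Rust so the example is self-contained; `CallState`, `trampoline`, and `run_through_c` are likewise illustrative names.

```rust
use std::any::Any;
use std::os::raw::c_void;
use std::panic::{self, AssertUnwindSafe};

type Callback = extern "C" fn(user_data: *mut c_void);

/// Hypothetical stand-in for a C library entry point that invokes a callback.
/// A real library would be declared in an `extern "C"` block instead.
extern "C" fn c_library_run(callback: Callback, user_data: *mut c_void) {
    callback(user_data);
}

/// State shared between the Rust caller and the callback.
struct CallState<F> {
    work: Option<F>,
    panic_payload: Option<Box<dyn Any + Send + 'static>>,
}

/// Callback handed to the library: it never unwinds into the caller's frames.
extern "C" fn trampoline<F: FnOnce()>(user_data: *mut c_void) {
    // SAFETY: `user_data` is the `&mut CallState<F>` passed by `run_through_c`.
    let state = unsafe { &mut *(user_data as *mut CallState<F>) };
    if let Some(work) = state.work.take() {
        if let Err(payload) = panic::catch_unwind(AssertUnwindSafe(work)) {
            state.panic_payload = Some(payload);
        }
    }
}

/// Run `work` inside the library callback, re-raising any panic on the Rust side.
fn run_through_c<F: FnOnce()>(work: F) {
    let mut state = CallState { work: Some(work), panic_payload: None };
    c_library_run(trampoline::<F>, &mut state as *mut CallState<F> as *mut c_void);
    if let Some(payload) = state.panic_payload {
        // The library call has returned, so only Rust frames are unwound from here.
        panic::resume_unwind(payload);
    }
}
```

This does not help with libraries whose error handling requires the callback itself to longjmp or unwind, which is the case this comment is concerned with.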

@RalfJung

Member

RalfJung commented Mar 13, 2019

you either need to compile the C code yourself with a compiler that supports a form of unwinding as an extension

Crates can reasonably do that by doing the build as part of build.rs.

@eddyb

Member

eddyb commented Mar 13, 2019

@joshtriplett One thing off the top of my head (but there might be more reasons): libraries written without unwinding support in mind and/or not compiled with -fexceptions (or equivalent) may be UB if certain code is skipped (e.g. a loop waiting for a mutex to be unlocked / a thread to exit).

@MOZGIII


MOZGIII commented Mar 13, 2019

I'd say in general it's really UB; however, we can have defined behavior under certain known conditions. For example, in the Elbrus architecture frames are explicit at the CPU level, and stack unwinding is guaranteed to actually jump through the stack frames (no longjmp there). So, on Elbrus, it seems reasonable to assume for the whole platform that the behavior is known, and that unwinding works for everything by default. I'm not an expert in Elbrus though, I just heard it somewhere. (Please validate my claims before actually relying on this!) The point is, maybe it can be different per-platform?
I'm not an expert in Rust internals either, but maybe there could be a certain auto trait that the compiler generates for Fn on a case-by-case basis if it has an indication that unwinding is safe? Just a wild idea...

@cramertj

Member

cramertj commented Mar 13, 2019

Certainly unsafe code is allowed to assume its destructors won't be skipped when unrolling up the stack, otherwise things like the crossbeam scoped API would be completely unsound. I think the only thing being discussed here is whether it's okay to unwind past specific, known chunks of Rust code.

@RalfJung

Member

RalfJung commented Mar 13, 2019

We are mostly worried about unwinding past C code.

@MOZGIII


MOZGIII commented Mar 13, 2019

I'm confused. Unwinding Rust code can be made safe within a sequence of Rust frames, whether that sequence is the top-level app stack or a callback to which C/C++/whatever code passed control, right? Problems and uncertainties arise when we unwind across Rust and non-Rust stack frames, for example from Rust code (running in a callback), to C++ code (that invoked the callback), to Rust code again (the Rust app, which uses a C++ library with a call that takes a callback). In this case it might be tricky to guarantee that unwinding will correctly pass through the C++ layer (Rust -> C++ -> Rust). And so on with other combinations. My point about Elbrus is that all such cases behave very similarly on that architecture with its multiple stacks (they have separate frame and data stacks).
Also, any part of the code is allowed to catch the unwinding, in theory. Yeah, that's not as straightforward as I anticipated initially.

@gnzlbg

Contributor

gnzlbg commented Mar 14, 2019

C code compiled with C++ exceptions enabled is not C. ABI-wise it is probably more C++ than C.

If the aim is to convert Rust panics to C++ exceptions, and then catch C++ exceptions and convert them to Rust panics, supporting that via extern "c++" makes more sense to me.

@cuviper

Member

cuviper commented Apr 2, 2019

What's the status on this? We reverted the abort in #58795 for the stable branch, but AFAICS it's still there on beta and master. If the decision is still pending, we should also revert on beta so it doesn't land in 1.34 next week!

@gnzlbg

Contributor

gnzlbg commented Apr 2, 2019

Rust programs have UB without the PR, we want to exploit that UB to improve the performance of Rust panics in the near future (e.g. CraneStation/cranelift#553), and the PR makes those programs have defined behavior by guaranteeing an abort instead which is IMO a big improvement.

The revert was intended to buy us more time to explore some solutions and we have done so. Somebody needs to put in the work to write RFCs, implement them, etc.

Programs affected by this, like mozjpeg, have a migration path available: catch panics in the Rust code, pass error codes through FFI, re-raise as exceptions/longjmps/etc. in whatever other language is at the other side of the FFI. This is what those programs should have done in the first place.
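A minimal sketch of the first two steps of that migration path (catch the panic in Rust, report it across the FFI boundary as an error code). The status constants, `sum_row`, and `row_callback` are hypothetical names and do not come from mozjpeg or any real library. Re-raising the error on the other side of the boundary is left to the foreign code.

```rust
use std::os::raw::c_int;
use std::panic::{self, AssertUnwindSafe};

// Hypothetical status codes agreed on with the foreign side.
const STATUS_OK: c_int = 0;
const STATUS_PANICKED: c_int = -1;

/// Ordinary Rust code that may panic (here, on an empty row).
fn sum_row(row: &[u8]) -> u32 {
    assert!(!row.is_empty(), "empty row");
    row.iter().map(|&b| u32::from(b)).sum()
}

/// Hypothetical callback exposed to foreign code. Panics are caught before
/// they reach the `extern "C"` boundary and reported as a status code.
///
/// Safety: `data` must point to `len` readable bytes and `out` must be
/// valid for writing a `u32`.
pub unsafe extern "C" fn row_callback(data: *const u8, len: usize, out: *mut u32) -> c_int {
    let result = panic::catch_unwind(AssertUnwindSafe(|| {
        let row = unsafe { std::slice::from_raw_parts(data, len) };
        sum_row(row)
    }));
    match result {
        Ok(sum) => {
            unsafe { *out = sum };
            STATUS_OK
        }
        // The foreign caller decides how to surface this (exception, longjmp, ...).
        Err(_) => STATUS_PANICKED,
    }
}
```

The foreign caller checks the returned status and, on `STATUS_PANICKED`, raises the error in whatever form its own language expects.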

Therefore I think we should revert the revert.

While it is sad that Rust cannot directly interoperate with C++ in C FFI (automatically inserting shims to convert from Rust panics to C++ exceptions and vice-versa), the right way to solve that problem is to submit an RFC with a solution.

@cuviper

Member

cuviper commented Apr 2, 2019

I think we should revert the revert.

The revert was only on stable, so if we do nothing right now, the unwind-abort will be in 1.34. Maybe that's fine, but it doesn't seem like there's consensus per rfcbot #58794 (comment).

@joshtriplett

Member

joshtriplett commented Apr 2, 2019

I agree that we should propagate this to stable to avoid a stable regression, yes. This should never have been broken in the first place without discussion before changing the behavior. This worked in prior stable versions, and that it happens to not be well defined doesn't change that it worked in many cases and people were able to successfully use it in those cases.

I'd be all for seeing RFCs, to propose alternatives that would allow optimizations like the proposed one in cranelift. And in the meantime, let's not break stable users.

I'm aware that "this is unsound" is a permissible justification for breaking stable. However, there's a difference between "this is unsound" and "this is undefined (but people know how it works)". By all means, let's find a solution for this, and until then let's not stick a crowbar through the engine of a running vehicle to stop it for maintenance. ;)

cuviper added a commit to cuviper/rust that referenced this issue Apr 2, 2019

@cuviper

Member

cuviper commented Apr 2, 2019

I opened #59640 on beta to preserve the current stable behavior.

@gnzlbg

Contributor

gnzlbg commented Apr 2, 2019

This worked in prior stable versions, and that it happens to not be well defined doesn't change that it worked in many cases and people were able to successfully use it in those cases.

It is not that the behavior "wasn't well defined" - the behavior is undefined by design. The main reason being that we can change the implementation to alert users that they are invoking undefined behavior, as well as to enable optimizations in FFI code and in the Rust panic implementation.

By all means, let's find a solution for this,

There is already a straightforward solution to this. People arguing that it's not what they wish it would be does not change that fact.

and until then let's not stick a crowbar through the engine of a running vehicle to stop it for maintenance. ;)

LLVM is allowed to optimize all this code under the assumption that it won't unwind because of the nounwind attribute. If we don't loudly alert users that their code is broken, the next LLVM upgrade could silently break their production code, potentially introducing security vulnerabilities.

I'd rather have user complaints of the form "You changed the implementation of some code that had undefined behavior and now I have to fix my code" than complaints of the form "You knew my code had undefined behavior, had an implementation of a way to alert me, yet decided not to do so, which resulted in my software having a security vulnerability".

I'd be all for seeing RFCs, to propose alternatives that would allow optimizations like the proposed one in cranelift. And in the meantime, let's not break stable users.

The current stable C FFI already allows these optimizations. If people want language features to more ergonomically interface with the unwinding strategies of other programming languages, like C++, they should open RFCs to do that.

@jethrogb

Contributor

jethrogb commented Apr 3, 2019

Process question: Should we revert this on nightly too so we don't have to keep reverting this until the final decision has been made?

@comex

Contributor

comex commented Apr 3, 2019

Rust programs have UB without the PR, we want to exploit that UB to improve the performance of Rust panics in the near future (e.g. CraneStation/cranelift#553), and the PR makes those programs have defined behavior by guaranteeing an abort instead which is IMO a big improvement.

That link does not show that we want to exploit it "in the near future", given that Cranelift does not perform optimizations and so any small performance benefit from changing the calling convention would be purely academic – not to mention that rustc doesn't even support Cranelift yet.

Arguably it shows that we want to exploit it in the far future; that gives us plenty of time to add FFI unwinding attributes first, rather than rushing to break things and, at best, making crates like mozjpeg write pointless workarounds that will soon become obsolete.

I'd rather have user complaints of the form "You changed the implementation of some code that had undefined behavior and now I have to fix my code" than complaints of the form "You knew my code had undefined behavior, had an implementation of a way to alert me, yet decided not to do so, which resulted in my software having a security vulnerability".

Then remove nounwind.

bors added a commit that referenced this issue Apr 3, 2019

Auto merge of #59640 - cuviper:beta-no-unwind-abort, r=Mark-Simulacrum
[beta] Permit unwinding through FFI by default

Let's kick the can down the road, keeping FFI-unwind-abort out of stable until #58794 is resolved.

cc @rust-lang/release
@gnzlbg

Contributor

gnzlbg commented Apr 3, 2019

Then remove nounwind.

As discussed yesterday on discord (cc @Centril @joshtriplett ), I believe that while these libraries have knowingly decided to rely on a particular implementation of undefined behavior, we might have failed to communicate during the last cycle that it's up to them to put in the work. Iff these libraries show willingness to fix their code, and require more time to do that, I'll be fine with delaying the landing of abort for another 6 weeks, as long as it then lands unconditionally.

Penalizing correct C FFI code (e.g. by removing nounwind) would IMO set a bad precedent (what's next? remove noalias, dereferenceable and align from Rust references because some other code exploits UB and these break that?). Also, I'm more sympathetic to incorrect-by-accident C FFI code that would be alerted by abort, and then fixed, than to libraries knowingly abusing UB.

making crates like mozjpeg write pointless workarounds that will soon become obsolete.

No other programming language allows unwinding through C FFI. The only standard way of propagating exceptions in C++ and D through FFI boundaries is catching all exceptions at the FFI boundary, and passing error codes instead - depending on the language at the other side, those error codes can be re-raised as panics/exceptions/sjlj/etc. The behavior of throwing from extern "C" functions in those languages is undefined as well, and aborting is what C++'s noexcept also chooses to do if an exception is thrown.
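A minimal sketch of that catch-at-the-boundary approach in Rust, assuming the standard library's `std::panic::catch_unwind`. The names `sum_of_digits`, `sum_of_digits_ffi` and `ERR_PANIC` are hypothetical, chosen only for illustration; a real API would pick its own error-code convention.

```rust
use std::os::raw::c_int;
use std::panic;

/// Hypothetical error code handed to the foreign caller when a panic was caught.
pub const ERR_PANIC: c_int = -1;

/// Ordinary Rust logic that may panic.
fn sum_of_digits(input: c_int) -> c_int {
    assert!(input >= 0, "negative input: {}", input);
    let mut rest = input;
    let mut sum = 0;
    while rest > 0 {
        sum += rest % 10;
        rest /= 10;
    }
    sum
}

/// C ABI wrapper: no panic is allowed to leave this function.
/// (A real export would also carry a `no_mangle` attribute.)
pub extern "C" fn sum_of_digits_ffi(input: c_int) -> c_int {
    // `catch_unwind` stops an unwinding panic at the boundary. It cannot help
    // when the crate is built with `panic=abort`, where a panic aborts at once.
    match panic::catch_unwind(|| sum_of_digits(input)) {
        Ok(sum) => sum,
        Err(_payload) => ERR_PANIC,
    }
}
```

The foreign caller then checks the return value, for example treating `ERR_PANIC` as a failure and re-raising it in whatever error mechanism its own language uses.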

In the last 6 weeks, no pre-RFC or RFC has been filed to extend the language to support this use case, and this issue has received very little attention. The design work required to extend the language with a more ergonomic solution is IMO significant (should we have #[unwind(c++/c-sjlj/c++itanium/c++seh/c++sjlj/rust)] vs extern "C++"/extern "Rust" etc.). It is also unclear to me whether the added complexity of doing so is worth the improved ergonomics, given that a solution to this problem that works 100% reliably between Rust and all other PLs (not only C++ or C, but also Python, Ruby, and everything else) has been available since Rust 1.0, and that the number of affected crates is small.

So I am not as certain as you are about "how soon" these workarounds will become obsolete.


There have been cases where there just was no way to do something in Rust without invoking UB (e.g. taking the address of a packed struct field), but this is not one of those cases. These libraries had a perfectly valid stable Rust alternative available, yet decided to pursue the UB route instead. I think that's unfortunate, but I think it would be more unfortunate to penalize those using C FFI correctly, as well as leaving vulnerable those who are invoking undefined behavior by accident instead of by design.
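For the case where a C library calls back into Rust, one commonly used shape of such an alternative is to catch the panic inside the callback, stash its payload, return normally through the C frames, and resume unwinding once control is back in Rust. The sketch below assumes this pattern; `c_library_run` is a Rust stand-in for a real C function so that the example is self-contained, and every name in it is hypothetical.

```rust
use std::any::Any;
use std::os::raw::c_void;
use std::panic::{self, AssertUnwindSafe};

type Callback = extern "C" fn(user_data: *mut c_void);

/// Stand-in for a C library entry point that invokes a callback once.
extern "C" fn c_library_run(callback: Callback, user_data: *mut c_void) {
    callback(user_data);
}

/// State shared between the Rust caller and the callback trampoline.
struct CallbackState<F> {
    closure: F,
    panic_payload: Option<Box<dyn Any + Send + 'static>>,
}

/// Runs the closure and makes sure no panic crosses the C ABI boundary.
extern "C" fn trampoline<F: FnMut()>(user_data: *mut c_void) {
    // SAFETY: `run_with_callback` passes a valid, exclusive pointer to a
    // `CallbackState<F>` that outlives the call into the library.
    let state = unsafe { &mut *(user_data as *mut CallbackState<F>) };
    let result = panic::catch_unwind(AssertUnwindSafe(|| (state.closure)()));
    if let Err(payload) = result {
        // Remember the panic; it is re-raised on the Rust side later.
        state.panic_payload = Some(payload);
    }
}

/// Calls the library and re-raises any panic only after the C frames returned.
pub fn run_with_callback<F: FnMut()>(closure: F) {
    let mut state = CallbackState { closure, panic_payload: None };
    let user_data = &mut state as *mut CallbackState<F> as *mut c_void;
    c_library_run(trampoline::<F>, user_data);
    if let Some(payload) = state.panic_payload.take() {
        panic::resume_unwind(payload);
    }
}
```

This only helps when the C library tolerates the callback returning. It does not by itself cover a C API whose error handler is not allowed to return.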

@RalfJung

Member

RalfJung commented Apr 3, 2019

@joshtriplett

This worked in prior stable versions, and that it happens to not be well defined doesn't change that it worked in many cases and people were able to successfully use it in those cases.

However, there's a difference between "this is unsound" and "this is undefined (but people know how it works)".

I agree in principle. However, notice that the same is true for the kind of UB that we will eventually introduce with whatever derivative of or alternative to Stacked Borrows becomes "the real thing": we currently definitely do not exploit many kinds of UB in this space (we only emit noalias on function boundaries), so one could say "this is undefined (but people know how it works)".

I appreciate that unwinding across FFI might be different, but "it is UB just on paper, not in practice" is a slippery slope.

@mjbshaw

Contributor

mjbshaw commented Apr 3, 2019

In the last 6 weeks, no pre-RFC or RFC has been filed to extend the language to support this use case,

I started one (which is largely based on jcranmer's post). If jcranmer's ideas sound reasonable and interesting to people, I can try expediting writing the RFC (at the cost of interacting with my family...).

I have a crate (objrs) that makes Rust code adhere to the Objective-C ABI, allowing Rust and Objective-C to interop. Throwing exceptions across Rust+Objective-C boundaries is a critical part of my crate (libobjc even has a C FFI function (objc_exception_throw) for throwing Objective-C exceptions). And I'm not the only one doing this: the more popular rust-objc crate also supports Objective-C exceptions across the Rust boundaries. I bring this up to show that there is interest in not only Rust → FFI → Rust exceptions (e.g., mozjpeg), but also C++ → Rust → C++ and Objective-C → Rust → Objective-C (e.g., objrs and rust-objc).

In short, there's a good argument to make for exceptions across FFI boundaries. It definitely needs an RFC to iron out the details and make clear what is and is not permissible. I just hope people are at least open to the idea.

No other programming language allows unwinding through C FFI. The only standard way

Focusing on a "standard way" for a language is a red herring. This kind of stuff is in the land of implementation-defined behavior, not standard-defined behavior. There's a reason GCC has -fexceptions and allows it to be used with C (to quote their docs: "you may need to enable this option when compiling C code that needs to interoperate properly with exception handlers written in C++"). With Rust, I imagine this being a mix of standard- and implementation-defined behavior (meaning Rust supports some common exception ABIs for FFI, but not necessarily on all platforms, and the onus is on the programmer to make sure the ABI used by Rust matches the ABI used by the FFI).

@gnzlbg

Contributor

gnzlbg commented Apr 3, 2019

@mjbshaw jcranmer's post is one of the many directions in which we could go. I think the direction is worth exploring, but not at the cost of interacting with your family. Before writing an RFC, it would probably make sense to first open an internals thread to collect all the use cases we want to support and all the constraints that we have, and try to reach a consensus on that, since those are going to restrict the design space for the RFC.


On discord we also discussed that maybe we could add a -Z no-ffi-abort-on-panic-Rust-1.34.0 feature flag to allow users to temporarily, and for particular toolchain versions, turn off abort on panic through C FFI.

@cuviper

Member

cuviper commented Apr 3, 2019

@gnzlbg

Iff these libraries show willingness to fix their code, and require more time to do that, I'll be fine with delaying the landing of abort for another 6 weeks, as long as it then lands unconditionally.

I take this as an ultimatum for the lang team -- from the release team perspective, IMO we should keep maintaining the status quo on stable until this issue is decided.

On discord we also discussed that maybe we could add a -Z no-ffi-abort-on-panic-Rust-1.34.0 feature flag to allow users to temporarily, and for particular toolchain versions, turn off abort on panic through C FFI.

Would you make this flag available to stable users? -Z options usually aren't, and if users have to play games with RUSTC_BOOTSTRAP, they might as well go further and use #[unwind(allowed)].

@gnzlbg

Contributor

gnzlbg commented Apr 3, 2019

Would you make this flag available to stable users?

That's a good question. mozjpeg is used where stable Rust is needed, but Firefox is using RUSTC_BOOTSTRAP anyway. So... we should just ask them if they need it. While we usually don't do this, I think that as long as we clearly delimit for which Rust versions the flag is available, and make it clear that the flag will be removed in the future, allowing such a temporary flag to be used on 1-2 stable Rust toolchains is a good compromise.

@joshtriplett

Member

joshtriplett commented Apr 4, 2019

cc @cuviper @kornelski @alexcrichton

Based on discussion in the language team meeting: We'd like to address the undefined behavior, and in the process of doing so we want to provide a stable Rust alternative for this. We'd like to set a reasonable deadline for a stable release making undeclared unwinds through FFI abort, something in the ~12 week range. In order to make a plan that seems likely to succeed, we'd like to have the folks who need this feature (either unwind-through-FFI or some manner of well-defined setjmp/longjmp) involved in the conversation and specifying the replacement feature.

Could we please get some positive confirmation, from the Rust-bindings-to-mozjpeg folks or others who need this, that it seems reasonable to develop a replacement for this in the near-term future?

@Centril Centril removed the I-nominated label Apr 4, 2019
