Default behavior of unwinding in FFI functions #58794

Open
Mark-Simulacrum opened this Issue Feb 28, 2019 · 31 comments

@Mark-Simulacrum
Member

Mark-Simulacrum commented Feb 28, 2019

This is the tracking issue for the behavior of unwinding through FFI functions.

There are two choices here: we can either abort when unwinding reaches an extern "C" boundary, or permit the unwind to continue. We currently abort on beta 1.34 and nightly 1.35, but permit unwinding on stable 1.33.

We previously attempted this change in 1.24 and reverted in 1.24.1. We attempted to do so again in 1.33, but reverted once again pending lang team discussion on the topic.

There has been discussion on this topic in #52652, #58760, and #55982.

The stable behavior of permitting unwinding is UB, and can be triggered in safe code (#52652 (comment)). Notably, mozjpeg depends on this behavior and seems to have no good stable alternatives; there's been some discussion on internals.

Mark-Simulacrum added a commit to Mark-Simulacrum/rust that referenced this issue Feb 28, 2019

@jethrogb

Contributor

jethrogb commented Feb 28, 2019

It would be helpful for the discussion if someone knowledgeable could write a summary covering the following:

  • What are the issues with allowing unwinding through foreign languages on major platforms? What is the interaction with C++ exceptions? (Saying it's "UB" doesn't cut it)
  • What is the interaction between setjmp/longjmp and unwinding (in general, and on major platforms)?
  • Are there any concrete plans for writing a custom unwinder/"tweaking the unwinder"? Has anyone tried this/is anyone working on this? (The answer to this question may be used to determine how Rust restricts itself by locking into the standard unwinder on major platforms.)

When I say major platforms, I mean GNU libunwind, Windows SEH, possibly others.


Is it possible to have different default behavior depending on which unwinder you're using? Say, unwind normally on major platforms, abort on others?

@comex

Contributor

comex commented Feb 28, 2019

  • What are the issues with allowing unwinding through foreign languages on major platforms? What is the interaction with C++ exceptions? (Saying it's "UB" doesn't cut it)

In another thread, @alexcrichton wrote:

From a technical perspective this is pretty feasible, but from a stabilization perspective is historically something we've never wanted to provide. We want the technical freedom to tweak unwinding as we see fit, which means it's not guaranteed to match what C++ does across every single platform.

As for the current implementation, my understanding is: on Unix it works; on Windows it mostly works, with some issues that could be solved. See my comment in that thread for more details.

  • What is the interaction between setjmp/longjmp and unwinding (in general, and on major platforms)?

On Unix, longjmp just resets the stack pointer and ignores unwinding.

On Windows, longjmp triggers SEH unwinding and so will run Rust destructors, AFAIK. *

* (I said in other threads that it didn't, because I misread the description of this PR and thought that it changed things so destructors wouldn't run when unwinding via longjmp; in reality, it only did that to the abort-on-unwind handler itself.)

@nagisa

Contributor

nagisa commented Feb 28, 2019

On Windows, longjmp triggers SEH unwinding and so will run Rust destructors

As far as the last part of that is concerned, that is an implementation-specific and unspecified behaviour.

@Centril

Contributor

Centril commented Mar 10, 2019

The current behavior on stable amounts to a soundness hole. For example, based on #52652 (comment), we can write (playground):

extern "C" fn bad() {
    panic!()
}

fn main() {
    bad()
}

The behavior of this program is undefined on stable because we attach the nounwind LLVM attribute to bad.

Soundness is non-negotiable and as such we landed #55982 to close this soundness hole. However, since there was no explicit confirmation of this step by the language team the change was reverted on 1.33 pending confirmation. The change is still seen in beta and nightly compilers.

Based on notes by @alexcrichton in #52652 (comment), #55982 (comment), #55982 (comment), and #55982 (comment), I propose that we go ahead with and confirm the change in #55982.

@rfcbot merge

@rfcbot

rfcbot commented Mar 10, 2019

Team member @Centril has proposed to merge this. The next step is review by the rest of the tagged team members:

Concerns:

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@jethrogb

Contributor

jethrogb commented Mar 10, 2019

This change was tried twice, and twice it was reverted because important parts of the ecosystem broke. Do we really want to try to merge it again without any changes?

The discussion on this topic is pretty fragmented; the internals thread mentioned in the top post has been quite active as well. I'm still waiting for the summary I requested in #58794 (comment). I thought the scope of this issue was much bigger than what @Centril just mentioned. If you only care about the soundness issue at the IR level, another way to fix it is to never emit the nounwind attribute.

@Centril

Contributor

Centril commented Mar 10, 2019

This change was tried twice, and twice it was reverted because important parts of the ecosystem broke.

This is untrue. The second time it was reverted it was reverted only because of the lack of a completed T-Lang FCP (which we are doing now).

@jethrogb

Contributor

jethrogb commented Mar 10, 2019

The second time it was reverted it was reverted only because of the lack of a completed T-Lang FCP

I don't think so. If no one in the community would've complained about the change, I don't think this would've been reverted a day before the stable release even though no lang team discussion had happened yet. That might've been used as justification to actually do the revert, but it's certainly not the only reason.

@Centril

Contributor

Centril commented Mar 10, 2019

@jethrogb It most definitely was the only reason; the release team cannot undo language team decisions and had there been one we would not have reverted.

@jethrogb

Contributor

jethrogb commented Mar 10, 2019

@Centril I'm saying that if there hadn't been any backlash no one would've even proposed to undo the change.

@Centril

Contributor

Centril commented Mar 10, 2019

@jethrogb Yes, there was backlash, but that was irrelevant to the acceptance or non-acceptance of the undo-PR itself. The sole reason for accepting the undo-PR was the lack of a completed T-Lang FCP.

@rkruppe

Member

rkruppe commented Mar 10, 2019

I don't have enough information to argue about procedural details and people's rationale for r+'ing this or that PR, and I wouldn't be very interested in doing so anyway. I just want to say that in light of the ongoing discussions and the continued lack of consensus on how to address the legitimate needs of some projects to unwind through FFI, it seems premature to me to take this step now, just as it was premature the previous times. Soundness is ultimately not negotiable, but there can absolutely be bad times and ways to roll out soundness fixes.

@jethrogb

Contributor

jethrogb commented Mar 10, 2019

If you only care about the soundness issue at the IR level, another way to fix it is to never emit the nounwind attribute.

It was suggested to me in a private conversation that this might lead to performance loss. I'd like to see some numbers on that. Because things “work” most of the time right now, it seems to me that LLVM currently generates code that would be similar to the code it would generate without nounwind.

I wholeheartedly agree with @rkruppe. I feel like not emitting nounwind is a good alternative to fix the unsoundness now (although not solving UB in general), while keeping users happy, and it gives us time to search for a real solution. For this real solution, I'd like to see an RFC-style discussion with a solid motivation and discussion of alternatives.

@Centril

Contributor

Centril commented Mar 10, 2019

As long as the soundness hole is closed one way or the other (aborting, not emitting nounwind, ...) I think it's fine.

However, I think we should separate discussion about new mechanisms like #[unwind(...)] from fixing the soundness hole. It cannot be that people knowingly depend on UB (and some do it unknowingly) and that therefore, we are forced to accept more additions to the language so that the soundness hole can be closed. If "soundness is ultimately not negotiable" is to have any meaning, a possible outcome must be that the hole is fixed but there's no #[unwind(...)]. I think it is long overdue to fix the hole, as it was first reported in 2014! We have also communicated about the problem in two separate release notes.

@RalfJung

Member

RalfJung commented Mar 10, 2019

However, I think we should separate discussion about new mechanisms like #[unwind(...)] from fixing the soundness hole. It cannot be that people knowingly depend on UB (and some do it unknowingly) and that therefore, we are forced to accept more additions to the language so that the soundness hole can be closed.

I don't feel it is helpful to draw such an antagonistic picture. There literally is no way to do FFI unwinding safely in Rust currently, and some people got frustrated enough by that that they went with something that "happens to work". I have hacked around limitations in ugly ways often enough that I can totally sympathize. Sure, they should instead have written an RFC to provide a defined way to do what they needed to do, but that's a lot of work and not everyone is up for that kind of contribution.

The #[unwind] attribute was planned anyway, so there are no forced additions to the language here---just an adjustment to the transition plan. I hope this discussion kickstarts the RFC process for #[unwind], that's the most constructive outcome I can imagine here.

@rkruppe

Member

rkruppe commented Mar 10, 2019

@Centril It is indeed a possible outcome that the relevant teams ultimately decide "damn those programs and use cases, we won't provide a way to unwind through FFI". However, that would be a decision with severe downsides (more social ones than technical ones) which I don't think should be taken lightly, and IMO not at this very moment but rather after the other options have been explored and rejected -- as @RalfJung said, the trajectory of #[unwind] so far is rather the opposite!

While it's all good and well to say "this is UB, we've always said so, and programs with UB are completely invalid", the Rust project really made its bed itself here by not acting on the subject for years and in particular not providing an alternative way to address the very reasonable needs that cause users to write programs with this UB. We now have the situation that people trying to do certain (fairly reasonable!) things with Rust not only have no way to achieve it without writing programs that have UB, they do not even have an alternative in sight that they could switch to when those programs break.

Rust is well within its rights to break those programs, and I am definitely not arguing that the de facto behavior of today should be ad-hoc blessed as defined behavior, but it will cause users serious problems to not provide some alternative way to do what they need to do. We should not cause users such problems if we can reasonably avoid it, even if it means delaying a soundness fix. For comparison, some type system soundness bugs get a long grace period of time where the compiler warns instead of erroring on wrong programs to help people fix it before they get broken. Such a warning is not possible in this case (as it's about runtime behavior), but we should similarly do our best to ease the pain. Holding off on pulling the trigger for another couple of months (peanuts compared to how long the soundness issue has been open!) while other issues get worked out is quite an easy way to do that.

@Centril

Contributor

Centril commented Mar 10, 2019

However, that would be a decision with severe downsides (more social ones than technical ones) which I don't think should be taken lightly, and IMO not at this very moment but rather after the other options have been explored and rejected

I think there are also severe social downsides to not going ahead with this. Namely, we legitimize "There's no way to do X currently, so we'll do something that happens to work".

as @RalfJung said, the trajectory of #[unwind] so far is rather the opposite!

I first heard of the existence of #[unwind(...)] during the T-Release meeting where we decided to revert the change on 1.33. It also didn't have a tracking issue until 12 days ago. Moreover, @alexcrichton said this in #55982 (comment):

The #[unwind] attribute was added as a necessary evil when this first came up (but wasn't supposed to be necessary long-term) and is otherwise only tweaking a very low-level detail of LLVM that doesn't relate to the correctness of the API.

For #[unwind] to become stable we'd need to provide a guarantee that we actually implement a sound unwinding strategy for all possible platforms to go through C/C++, which I don't really see happening any time soon, especially when we want to leave ourselves to implement unwinding via checked return values on targets where necessary.

None of this suggests that "The #[unwind] attribute was planned anyway, [...]" or "the trajectory of #[unwind] so far is rather the opposite!".

While it's all good and well to say "this is UB, we've always said so, and programs with UB are completely invalid", the Rust project really made its bed itself here by not acting on the subject for years [...]

Yes, I'm quite unhappy about the inaction here. I think the reason for the inaction has precisely been that we didn't want to break anyone. In the future I hope that we set deadlines for and better track soundness holes and C-future-compatibility issues.

We should not cause users such problems if we can reasonably avoid it, even if it means delaying a soundness fix.

I think it is entirely reasonable that people use nightly until such time and help test the #[unwind(...)] attribute in the process. As @alexcrichton noted:

Additionally I don't think this can really bake without actually testing, this will remain virtually undetected unless everyone opts-in to testing it, so the only real way to get testing is to actually flip the defaults and see what happens. That's what we did last time and it's easy to always flip the defaults back if something comes up!

For comparison, some type system soundness bugs get a long grace period of time where the compiler warns instead of erroring on wrong programs to help people fix it before they get broken.

I'm well aware of C-future-compatibility issues, but I think we let them sit around for far too long without actionable and well-triaged plans to address them. I think we are in need of schedules and deadlines.

@comex

Contributor

comex commented Mar 11, 2019

I think there are also severe social downsides to not going ahead with this. Namely, we legitimize "There's no way to do X currently, so we'll do something that happens to work".

On a philosophical level, I disagree. It's not a question of "legitimizing". It's a fact of life that people will rely on implementation details whether they're supposed to or not, unless you actively prevent them from doing so. Ideally you do prevent them, like rustc does with #[feature] flags, or at least actively assist them in avoiding it, like the hypothetical undefined behavior checker will do for violations of the memory model in unsafe code. But if you don't (and in some situations you can't), you can't shrug off responsibility for the breakage that ensues when the implementation changes.

On a more practical note, among the links in the original post, I think this (from here) is a key quote:

For #[unwind] to become stable we'd need to provide a guarantee that we actually implement a sound unwinding strategy for all possible platforms to go through C/C++, which I don't really see happening any time soon, especially when we want to leave ourselves to implement unwinding via checked return values on targets where necessary.

My thoughts:

  • The nounwind LLVM attribute is a red herring, since sticking it on a handful of functions in a larger codebase is very unlikely to have a measurable effect on performance. (If you want it on all your functions, you can always use panic=abort.) If abort-on-unwind is not merged, the default nounwind should just be removed for now. Likewise, if we wanted to make extern "C" functions unwindable by default and the only downside was that we wouldn't be able to automatically mark them nounwind, IMO it would easily be worth it.
  • However, the suggestion of unwinding via implicit return values changes the story. It doesn't necessarily conflict with unwinding across C code, since you could implement the transformation at the LLVM level and apply it to both C and Rust code. But if the C code is already compiled for an existing ABI (e.g. because it's a system library, or just using a different compiler toolchain), there's clear value in being able to interoperate with that while still supporting unwinding within Rust code.
  • There do exist targets where unwinding across arbitrary stack frames is not just unimplemented or difficult to implement, but impossible. WebAssembly in its current state is a partial example; you can implement unwinding using a helper written in JavaScript, but there are some pure-WebAssembly environments that don't have JS at all.
  • That implies that it is arguably beneficial to (eventually) require an explicit #[unwind] for functions that want to unwind into C, so that it can produce a compile error on such targets. On the other hand, if unwindable were the default, it could still produce a runtime error.
  • That also implies that making "a guarantee that we actually implement a sound unwinding strategy for all possible platforms to go through C/C++" is impossible. If that's the requirement for #[unwind] to exist, then it can never exist.
    • I don't think we have to be that strict. After all, there is already panic=abort mode where unwinding is not supported at all, yet we still have catch_unwind. Using it makes your code less than fully portable, but you can still use it. The same should be true for unwinding across FFI.
    • Indeed, I believe it would be against Rust's philosophy to not expose useful and necessary platform functionality just because it is not portable.

Of course, this needs to go through an RFC. I don't think it needs to be a particularly "hard" RFC, at least if we're just stabilizing unwinding across C; it could be accepted quickly enough that there would be basically no benefit in changing the implementation to abort by default in the meantime. But this seems to have been rather controversial so far, so who knows...

For that to happen, someone needs to write the RFC. Does anyone want to volunteer to do that? Should I?

I also think it would be useful to fix and stabilize unwinding across C++, as a feature which might have an even narrower set of supported platforms, but which is not at all hard to implement on most existing platforms and could be quite useful for mixed C++-Rust codebases. But that comes later.

@comex

Contributor

comex commented Mar 11, 2019

Oh, one more thing (I'd edit this in, but that doesn't help people reading via email):

If the unwind attribute is stabilized, rather than #[unwind(allowed)], I'd like to see something like #[unwind(C)], indicating that you want to unwind across C code. In the future there could be a separate #[unwind(C++)] and perhaps others. There wouldn't necessarily be any way to verify that you chose the right language, but it would make it more evident what exactly the implementation is guaranteeing to be safe, and would allow targets that supported unwinding across C but not C++ to make #[unwind(C++)] a compile error.

@joshtriplett

Member

joshtriplett commented Mar 13, 2019

@rfcbot concern need-documented-replacement

In the absence of a documented replacement for how people should handle errors in C libraries that only support handling through unwinding, closing this would break a common use case.

Another way to fix the UB might be to drop the LLVM "nounwind" attribute. We could also add #[unwind], though it'll take some exploration of the details there.

I'm happy to support this as a sensible default after we document exactly what we expect people to do when interacting with inflexible C libraries that expect unwind-based error handling.

@eddyb

Member

eddyb commented Mar 13, 2019

Whenever you have incompatible exception handling machinery you can always wrap it on both sides (using panic! & catch_unwind in Rust, throw & try/catch in C++, longjmp & setjmp or compiler extensions in C) and pass something closer to Result across the FFI boundary.

There's nothing special about exceptions when doing this, they're like any other "unrepresentable ABI" problem with FFI: you translate to something that is representable in both languages and pass that.
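To make the "translate to something representable" idea concrete, here is a minimal sketch of the Rust half of such a wrapper. The names `div_for_c`, `checked_div`, `STATUS_OK` and `STATUS_PANICKED` are made up for illustration and do not come from any real library. It also assumes the crate is built with the default panic=unwind strategy; under panic=abort, `catch_unwind` cannot catch anything because the process aborts first.

```rust
use std::os::raw::c_int;
use std::panic;

/// Hypothetical status codes agreed on with the C side.
const STATUS_OK: c_int = 0;
const STATUS_PANICKED: c_int = -1;

/// Ordinary Rust logic that may panic.
fn checked_div(a: i32, b: i32) -> i32 {
    if b == 0 {
        panic!("division by zero");
    }
    a / b
}

/// Entry point meant to be called from C. Any panic is caught before it
/// can reach the `extern "C"` boundary and is reported as an integer.
pub extern "C" fn div_for_c(a: i32, b: i32, out: *mut i32) -> c_int {
    match panic::catch_unwind(|| checked_div(a, b)) {
        Ok(value) => {
            if !out.is_null() {
                // The caller promises `out` points to writable memory.
                unsafe { *out = value };
            }
            STATUS_OK
        }
        Err(_) => STATUS_PANICKED,
    }
}

fn main() {
    let mut out = 0;
    assert_eq!(div_for_c(6, 3, &mut out), STATUS_OK);
    assert_eq!(out, 2);
    // The panic message is still printed by the default panic hook,
    // but the unwind stops inside `div_for_c`.
    assert_eq!(div_for_c(1, 0, &mut out), STATUS_PANICKED);
}
```

The C caller only ever sees a `c_int`, so nothing here depends on how (or whether) the platform unwinds through foreign frames.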

Opt-in language functionality like #[unwind], #[ffi_returns_twice], etc. can help in the long-term, but is IMO harder to get right, since you still have UB hazards left and right (sometimes more than translating a language feature to C ABI in that language) and can be less portable (e.g. incompatible unwinding mechanisms), so it's not a silver bullet.

I don't think that dropping nounwind will solve anything in the general case, as incompatible forms of exception handling are still UB by their very nature; it'd just make some code appear to work on some platforms.

EDIT: just saw #58794 (comment), and I agree with @comex (who went into more detail than me).

@joshtriplett

Member

joshtriplett commented Mar 13, 2019

@eddyb Are you suggesting that any user of this should write C code to wrap any Rust callbacks and handle unwind there, and translate to/from a Rust Result?

@eddyb

Member

eddyb commented Mar 13, 2019

@joshtriplett Yes, if they want to be portable.
Also, if the non-Rust library is using C++ exceptions, the glue code should be in C++, not C.

If you need to "unwind through C code", you either need to compile the C code yourself with a compiler that supports a form of unwinding as an extension, or try to find another way. I believe that's one of the things that's impossible/UB in the general case.

(setjmp/longjmp might work in many cases with already-compiled C code but I can't in good faith recommend that approach)
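For the Rust -> C -> Rust callback case, one way to do the translation is sketched below: the callback catches the panic, parks the payload behind the user-data pointer, returns an error status, and the outer Rust caller re-raises the panic once the foreign frames have returned normally. Everything named here (`c_library_run`, `trampoline`, `State`, `run_with_callback`) is hypothetical, and the "C library" is a Rust stand-in so that the sketch runs on its own. It assumes the library lets a callback return an error code; libraries whose error handler must never return are not covered by this approach.

```rust
use std::any::Any;
use std::os::raw::{c_int, c_void};
use std::panic::{self, AssertUnwindSafe};

/// Callback signature a hypothetical C library would accept.
type Callback = extern "C" fn(user_data: *mut c_void) -> c_int;

/// Stand-in for the foreign library. Real code would declare this in an
/// `extern "C"` block; it is written in Rust only to keep the sketch runnable.
extern "C" fn c_library_run(cb: Callback, user_data: *mut c_void) -> c_int {
    // The "C" side only sees an integer status and passes it on.
    cb(user_data)
}

/// State shared with the callback through the `user_data` pointer.
struct State<F> {
    closure: F,
    panic_payload: Option<Box<dyn Any + Send>>,
}

/// Runs the closure, catches any panic, and stores its payload.
extern "C" fn trampoline<F: FnMut()>(user_data: *mut c_void) -> c_int {
    // `user_data` was created from `&mut State<F>` in `run_with_callback`.
    let state = unsafe { &mut *(user_data as *mut State<F>) };
    match panic::catch_unwind(AssertUnwindSafe(|| (state.closure)())) {
        Ok(()) => 0,
        Err(payload) => {
            state.panic_payload = Some(payload);
            1
        }
    }
}

/// Calls into the library, then resumes a stored panic once control is
/// back in Rust frames.
fn run_with_callback<F: FnMut()>(closure: F) -> c_int {
    let mut state = State { closure, panic_payload: None };
    let user_data = &mut state as *mut State<F> as *mut c_void;
    let status = c_library_run(trampoline::<F>, user_data);
    if let Some(payload) = state.panic_payload.take() {
        panic::resume_unwind(payload);
    }
    status
}

fn main() {
    assert_eq!(run_with_callback(|| {}), 0);
    let caught = panic::catch_unwind(|| run_with_callback(|| panic!("boom")));
    let payload = caught.unwrap_err();
    assert_eq!(payload.downcast_ref::<&str>(), Some(&"boom"));
}
```

From the caller's point of view the panic appears to propagate out of `run_with_callback`, yet no unwind ever crosses the foreign frames.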

@joshtriplett

Member

joshtriplett commented Mar 13, 2019

@eddyb

(setjmp/longjmp might work in many cases with already-compiled C code but I can't in good faith recommend that approach)

Why not? This is precisely what those C libraries that only support unwind-based error handling expect callbacks to do.

In any case, unwinding works in Rust today for this particular use case (Rust -> C -> Rust callback)

@RalfJung

Member

RalfJung commented Mar 13, 2019

you either need to compile the C code yourself with a compiler that supports a form of unwinding as an extension

Crates can reasonably do that by doing the build as part of build.rs.

@eddyb

Member

eddyb commented Mar 13, 2019

@joshtriplett One thing off the top of my head (but there might be more reasons): libraries written without unwinding support in mind and/or not compiled with -fexceptions (or equivalent) may be UB if certain code is skipped (e.g. a loop waiting for a mutex to be unlocked / a thread to exit).

@MOZGIII

MOZGIII commented Mar 13, 2019

I'd say that in general it really is UB; however, we can have defined behavior under certain known conditions. For example, on the Elbrus architecture, frames are explicit at the CPU level, and stack unwinding is guaranteed to actually step through the stack frames (no longjmp there). So, on Elbrus, it seems reasonable to assume for the whole platform that the behavior is known and that unwinding works for everything by default. I'm not an expert in Elbrus though; I just heard it somewhere. (Please validate my claims before actually relying on this!) The point is, maybe it can be different per platform?
I'm not an expert in Rust internals either, but maybe there could be a certain auto trait that the compiler generates for Fn on a case-by-case basis if it has an indication that unwinding is safe? Just a wild idea...

@cramertj

Member

cramertj commented Mar 13, 2019

Certainly unsafe code is allowed to assume its destructors won't be skipped when unrolling up the stack, otherwise things like the crossbeam scoped API would be completely unsound. I think the only thing being discussed here is whether it's okay to unwind past specific, known chunks of Rust code.
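As a small illustration of the guarantee being relied on here, the sketch below shows a destructor running while a panic unwinds through its frame. `Guard`, `work` and `CLEANED_UP` are illustrative names only, and the sketch assumes the default panic=unwind strategy. It says nothing about what happens when a frame is skipped by longjmp, which is the case under discussion.

```rust
use std::panic;
use std::sync::atomic::{AtomicBool, Ordering};

/// Records whether the guard's destructor has run.
static CLEANED_UP: AtomicBool = AtomicBool::new(false);

/// A value whose destructor performs cleanup that other code relies on.
struct Guard;

impl Drop for Guard {
    fn drop(&mut self) {
        CLEANED_UP.store(true, Ordering::SeqCst);
    }
}

/// Panics while a `Guard` is alive in this frame.
fn work() {
    let _guard = Guard;
    panic!("unwinding through `work`");
}

fn main() {
    // Unwinding drops `_guard` on its way out of `work`.
    let result = panic::catch_unwind(work);
    assert!(result.is_err());
    assert!(CLEANED_UP.load(Ordering::SeqCst));
}
```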

@RalfJung

Member

RalfJung commented Mar 13, 2019

We are mostly worried about unwinding past C code.

@MOZGIII

MOZGIII commented Mar 13, 2019

I'm confused. Unwinding Rust code can be made safe within a sequence of Rust frames, whether that sequence is the top-level app stack or a callback to which C/C++/whatever code passed control, right? Problems and uncertainties arise when we unwind across Rust and non-Rust stack frames, for example from Rust code (running in a callback), through C++ code (that invoked the callback), into Rust code (the Rust app, which uses a C++ library that has a call taking a callback). In this case it might be tricky to guarantee that unwinding will correctly pass through the C++ layer (Rust -> C++ -> Rust). And so on with other combinations. My point about Elbrus is that all such cases behave very similarly on their architecture with multiple stacks (they have separate frame and data stacks).
Also, any part of the code is allowed to catch the unwinding, in theory. Yeah, that's not as straightforward as I anticipated initially.

@gnzlbg

Contributor

gnzlbg commented Mar 14, 2019

C code compiled with C++ exceptions enabled is not C. ABI-wise it is probably more C++ than C.

If the aim is to convert Rust panics to C++ exceptions, and then catch C++ exceptions and convert them to Rust panics, supporting that via extern "c++" makes more sense to me.
