Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Default behavior of unwinding in FFI functions #58794

Closed
Mark-Simulacrum opened this issue Feb 28, 2019 · 133 comments
Closed

Default behavior of unwinding in FFI functions #58794

Mark-Simulacrum opened this issue Feb 28, 2019 · 133 comments

Comments

@Mark-Simulacrum
Copy link
Member

@Mark-Simulacrum Mark-Simulacrum commented Feb 28, 2019

This is the tracking issue for the behavior of unwinding through FFI functions.

There are two choices here: we can abort if unwinding occurs through an extern "C" boundary. We abort on beta 1.34 and nightly 1.35, but will permit unwinding in stable 1.33.

We previously attempted this change in 1.24 and reverted in 1.24.1. We attempted to do so again in 1.33, but reverted once again pending lang team discussion on the topic.

There has been discussion on this topic in #52652, #58760, and #55982.

The stable behavior of permitting unwinding is UB, and can be triggered in safe code (#52652 (comment)). Notably, mozjpeg depends on this behavior and seems to have no good stable alternatives; there's been some discussion on internals.

There is an RFC discussing this: rust-lang/rfcs#2699.

@jethrogb
Copy link
Contributor

@jethrogb jethrogb commented Feb 28, 2019

It would be helpful for the discussion if someone knowledgeable could write a summary covering the following:

  • What are the issues with allowing unwinding through foreign languages on major platforms? What is the interaction with C++ exceptions? (Saying it's "UB" doesn't cut it)
  • What is the interaction between setjmp/longjmp and unwinding (in general, and on major platforms)?
  • Are there any concrete plans for writing a custom unwinder/"tweaking the unwinder"? Has anyone tried this/is anyone working on this? (The answer to this question may be used to determine how Rust restricts itself by locking into the standard unwinder on major platforms.)

When I say major platforms, I mean GNU libunwind, Windows SEH, possibly others.


Is it possible to have different default behavior depending on which unwinder you're using? Say, unwind normally on major platforms, abort on others?

@comex
Copy link
Contributor

@comex comex commented Feb 28, 2019

  • What are the issues with allowing unwinding through foreign languages on major platforms? What is the interaction with C++ exceptions? (Saying it's "UB" doesn't cut it)

In another thread, @alexcrichton wrote:

From a technical perspective this is pretty feasible, but from a stabilization perspective is historically something we've never wanted to provide. We want the technical freedom to tweak unwinding as we see fit, which means it's not guaranteed to match what C++ does across every single platform.

As for the current implementation, my understanding is: on Unix it works; on Windows it mostly works, with some issues that could be solved. See my comment in that thread for more details.

  • What is the interaction between setjmp/longjmp and unwinding (in general, and on major platforms)?

On Unix, longjmp just resets the stack pointer and ignores unwinding.

On Windows, longjmp triggers SEH unwinding and so will run Rust destructors, AFAIK. *

* (I said in other threads that it didn't, because I misread the description of this PR and thought that it changed things so destructors wouldn't run when unwinding via longjmp; in reality, it only did that to the abort-on-unwind handler itself.)

@nagisa
Copy link
Contributor

@nagisa nagisa commented Feb 28, 2019

On Windows, longjmp triggers SEH unwinding and so will run Rust destructors

As far as the last part of that is concerned, that is an implementation-specific and unspecified behaviour.

@Centril
Copy link
Contributor

@Centril Centril commented Mar 10, 2019

The current behavior on stable amounts to a soundness hole. For example, based on #52652 (comment), we can write (playground):

extern "C" fn bad() {
    panic!()
}

fn main() {
    bad()
}

The behavior of this program is undefined on stable because we attach the nounwind LLVM attribute to bad.

Soundness is non-negotiable and as such we landed #55982 to close this soundness hole. However, since there was no explicit confirmation of this step by the language team the change was reverted on 1.33 pending confirmation. The change is still seen in beta and nightly compilers.

Based on notes by @alexcrichton in #52652 (comment), #55982 (comment), #55982 (comment), and #55982 (comment), I propose that we go ahead with and confirm the change in #55982.

@rfcbot merge

@rfcbot
Copy link

@rfcbot rfcbot commented Mar 10, 2019

Team member @Centril has proposed to merge this. The next step is review by the rest of the tagged team members:

Concerns:

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@jethrogb
Copy link
Contributor

@jethrogb jethrogb commented Mar 10, 2019

This change was tried twice, and twice it was reverted because important parts of the ecosystem broke. Do we really want to try merge it again without any changes?

The discussion on this topic is pretty fragmented, the internals thread mentioned in the top post has been quite active as well. I'm still waiting for the summary I requested #58794 (comment). I thought the scope of this issue is much bigger than what @Centril just mentioned. If you only care about the soundness issue at the IR level, another way to fix it is to never emit the nounwind attribute.

@Centril
Copy link
Contributor

@Centril Centril commented Mar 10, 2019

This change was tried twice, and twice it was reverted because important parts of the ecosystem broke.

This is untrue. The second time it was reverted it was reverted only because of the lack of a completed T-Lang FCP (which we are doing now).

@jethrogb
Copy link
Contributor

@jethrogb jethrogb commented Mar 10, 2019

The second time it was reverted it was reverted only because of the lack of a completed T-Lang FCP

I don't think so. If no one in the community would've complained about the change, I don't think this would've been reverted a day before the stable release even though no lang team discussion had happened yet. That might've been used as justification to actually do the revert, but it's certainly not the only reason.

@Centril
Copy link
Contributor

@Centril Centril commented Mar 10, 2019

@jethrogb It most definitely was the only reason; the release team cannot undo language team decisions and had there been one we would not have reverted.

@jethrogb
Copy link
Contributor

@jethrogb jethrogb commented Mar 10, 2019

@Centril I'm saying that if there hadn't been any backlash no one would've even proposed to undo the change.

@Centril
Copy link
Contributor

@Centril Centril commented Mar 10, 2019

@jethrogb Yes, there was backlash, but that was irrelevant to the acceptance or non-acceptance of the undo-PR itself. The sole reason for accepting the undo-PR was the lack of a completed T-Lang FCP.

@hanna-kruppe
Copy link
Contributor

@hanna-kruppe hanna-kruppe commented Mar 10, 2019

I don't have enough information to argue about procedural details and people's rationale for r+'ing this or that PR, and I wouldn't be very interested in doing so anyway. I just want to say that in the light of the the ongoing discussions and continued lack of consensus on how to address the legitimate needs of some projects to unwind through FFI, it seems premature to me to take this step now, just as it was premature the last times. Soundness is ultimately not negotiable, but there can absolutely be bad times and ways to roll out soundness fixes.

@jethrogb
Copy link
Contributor

@jethrogb jethrogb commented Mar 10, 2019

If you only care about the soundness issue at the IR level, another way to fix it is to never emit the nounwind attribute.

It was suggested to me in a private conversation that this might lead to performance loss. I'd like to see some numbers on that. Because things “work” most of the time right now, it seems to me that LLVM currently generates code that would be similar to the code it would generate without nounwind.

I wholeheartedly agree with @rkruppe. I feel like not emitting nounwind is a good alternative to fix the unsoundness now (although not solving UB in general), while keeping users happy, and it gives us time to search for a real solution. For this real solution, I'd like to see an RFC-style discussion with a solid motivation and discussion of alternatives.

@Centril
Copy link
Contributor

@Centril Centril commented Mar 10, 2019

As long as the soundness hole is closed one way or the other (aborting, not emitting nounwind, ...) I think it's fine.

However, I think we should separate discussion about new mechanisms like #[unwind(...)] from fixing the soundness hole. It cannot be that people knowingly depend on UB (and some do it unknowingly) and that therefore, we are forced to accept more additions to the language as so that the soundness hole can be closed. If "soundness is ultimately not negotiable" is to have any meaning a possible outcome must be that the hole is fixed but there's no #[unwind(...)]. I think it is long overdue to fix the hole as it was reported first in 2014! We have also communicated about the problem in two separate release notes.

@RalfJung
Copy link
Member

@RalfJung RalfJung commented Mar 10, 2019

However, I think we should separate discussion about new mechanisms like #[unwind(...)] from fixing the soundness hole. It cannot be that people knowingly depend on UB (and some do it unknowingly) and that therefore, we are forced to accept more additions to the language as so that the soundness hole can be closed.

I don't feel it is helpful to draw such an antagonistic picture. There literally is no way to do FFI unwinding safely in Rust currently, and some people got frustrated enough by that that they went with something that "happens to work". I have hacked around limitations in ugly ways often enough that I can totally sympathize. Sure, they should instead have written an RFC to provide a defined way to do what they needed to do, but that's a lot of work and not everyone is up for that kind of contribution.

The #[unwind] attribute was planned anyway, so there are no forced additions to the language here---just an adjustment to the transition plan. I hope this discussion kickstarts the RFC process for #[unwind], that's the most constructive outcome I can imagine here.

@hanna-kruppe
Copy link
Contributor

@hanna-kruppe hanna-kruppe commented Mar 10, 2019

@Centril It is indeed a possible outcome that the relevant teams ultimately decide "damn those programs and use cases, we won't provide a way to unwind through FFI". However, that would be a decision with severe downsides (more social ones than technical ones) which I don't think should be taken lightly, and IMO not at this very moment but rather after the other options have been explored and rejected -- as @RalfJung said, the trajectory of #[unwind] so far is rather the opposite!

While it's all good and well to say "this is UB, we've always said so, and programs with UB are completely invalid", the Rust project really made its bed itself here by not acting on the subject for years and in particular not providing an alternative way to address the very reasonable needs that cause users to write programs with this UB. We now have the situation that people trying to do certain (fairly reasonable!) things with Rust not only have no way to achieve it without writing programs that have UB, they do not even have an alternative in sight that they could switch to when those programs break.

Rust is well within its rights to break those programs, and I am definitely not arguing that the de facto behavior of today should be ad-hoc blessed as defined behavior, but it will cause users serious problems to not provide some alternative way to do what they need to do. We should not cause users such problems if we can reasonably avoid it, even if it means delaying a soundness fix. For comparison, some type system soundness bugs get a long grace period of time where the compiler warns instead of erroring on wrong programs to help people fix it before they get broken. Such a warning is not possible in this case (as it's about runtime behavior), but we should similarly do our best to ease the pain. Holding off pulling the trigger for another couple months (peanuts compared to how long the soundness issue has been open!) while waiting on other issues get worked out is a quite easy way to do that.

@Centril
Copy link
Contributor

@Centril Centril commented Mar 10, 2019

However, that would be a decision with severe downsides (more social ones than technical ones) which I don't think should be taken lightly, and IMO not at this very moment but rather after the other options have been explored and rejected

I think there are also severe social downsides to not going ahead with this. Namely, we legitimize "There's no way to do X currently, so we'll do something that happens to work".

as @RalfJung said, the trajectory of #[unwind] so far is rather the opposite!

I first heard of the existence of #[unwind(...)] during the T-Release meeting where we decided to revert the change on 1.33. It also didn't have a tracking issue until 12 days ago. Moreover, @alexcrichton said this in #55982 (comment):

The #[unwind] attribute was added as a necessary evil when this first came up (but wasn't supposed to be necessary long-term) and is otherwise only tweaking a very low-level detail of LLVM that doesn't relate to the correctness of the API.

For #[unwind] to become stable we'd need to provide a guarantee that we actually implement a sound unwinding strategy for all possible platforms to go through C/C++, which I don't really see happening any time soon, especially when we want to leave ourselves to implement unwinding via checked return values on targets where necessary.

None of this suggests that "The #[unwind] attribute was planned anyway, [...]" or "the trajectory of #[unwind] so far is rather the opposite!".

While it's all good and well to say "this is UB, we've always said so, and programs with UB are completely invalid", the Rust project really made its bed itself here by not acting on the subject for years [...]

Yes, I'm quite unhappy about the inaction here. I think the reason for the inaction has precisely been that we didn't want to break anyone. In the future I hope that we set deadlines for and better track soundness holes and C-future-compatibility issues.

We should not cause users such problems if we can reasonably avoid it, even if it means delaying a soundness fix.

I think it is entirely reasonable that people use nightly until such time and help test the #[unwind(...)] attribute in the process. As @alexcrichton noted:

Additionally I don't think this can really bake without actually testing, this will remain virtually undetected unless everyone opts-in to testing it, so the only real way to get testing is to actually flip the defaults and see what happens. That's what we did last time and it's easy to always flip the defaults back if something comes up!

For comparison, some type system soundness bugs get a long grace period of time where the compiler warns instead of erroring on wrong programs to help people fix it before they get broken.

I'm well aware of C-future-compatibility issues and but I think we let them sit around for far too long without actionable and well-triaged plans to address them. I think we are in need of schedules and deadlines.

@comex
Copy link
Contributor

@comex comex commented Mar 11, 2019

I think there are also severe social downsides to not going ahead with this. Namely, we legitimize "There's no way to do X currently, so we'll do something that happens to work".

On a philosophical level, I disagree. It's not a question of "legitimizing". It's a fact of life that people will rely on implementation details whether they're supposed to or not, unless you actively prevent them from doing so. Ideally you do prevent them, like rustc does with #[feature] flags, or at least actively assist them in avoiding it, like the hypothetical undefined behavior checker will do for violations of the memory model in unsafe code. But if you don't (and in some situations you can't), you can't shrug off responsibility for the breakage that ensues when the implementation changes.

On a more practical note, among the links in the original post, I think this (from here) is a key quote:

For #[unwind] to become stable we'd need to provide a guarantee that we actually implement a sound unwinding strategy for all possible platforms to go through C/C++, which I don't really see happening any time soon, especially when we want to leave ourselves to implement unwinding via checked return values on targets where necessary.

My thoughts:

  • The nounwind LLVM attribute is a red herring, since sticking it on a handful of functions in a larger codebase is very unlikely to have a measurable effect on performance. (If you want it on all your functions, you can always use panic=abort.) If abort-on-unwind is not merged, the default nounwind should just be removed for now. Likewise, if we wanted to make extern "C" functions unwindable by default and the only downside was that we wouldn't be able to automatically mark them nounwind, IMO it would easily be worth it.
  • However, the suggestion of unwinding via implicit return values changes the story. It doesn't necessarily conflict with unwinding across C code, since you could implement the transformation at the LLVM level and apply it to both C and Rust code. But if the C code is already compiled for an existing ABI (e.g. because it's a system library, or just using a different compiler toolchain), there's clear value in being able to interoperate with that while still supporting unwinding within Rust code.
  • There do exist targets where unwinding across arbitrary stack frames is not just unimplemented or difficult to implement, but impossible. WebAssembly in its current state is a partial example; you can implement unwinding using a helper written in JavaScript, but there are some pure-WebAssembly environments that don't have JS at all.
  • That implies that it is arguably beneficial to (eventually) require an explicit #[unwind] for functions that want to unwind into C, so that it can produce a compile error on such targets. On the other hand, if unwindable were the default, it could still produce a runtime error.
  • That also implies that making "a guarantee that we actually implement a sound unwinding strategy for all possible platforms to go through C/C++" is impossible. If that's the requirement for #[unwind] to exist, then it can never exist.
    • I don't think we have to be that strict. After all, there is already panic=abort mode where unwinding is not supported at all, yet we still have catch_unwind. Using it makes your code less than fully portable, but you can still use it. The same should be true for unwinding across FFI.
    • Indeed, I believe it would be against Rust's philosophy to not expose useful and necessary platform functionality just because it is not portable.

Of course, this needs to go through an RFC. I don't think it needs to be a particularly "hard" RFC, at least if we're just stabilizing unwinding across C; it could be accepted quickly enough that there would be basically no benefit in changing the implementation to abort by default in the meantime. But this seems to have been rather controversial so far, so who knows...

For that to happen, someone needs to write the RFC. Does anyone want to volunteer to do that? Should I?

I also think it would be useful to fix and stabilize unwinding across C++, as a feature which might have an even narrower set of supported platforms, but which is not at all hard to implement on most existing platforms and could be quite useful for mixed C++-Rust codebases. But that comes later.

@comex
Copy link
Contributor

@comex comex commented Mar 11, 2019

Oh, one more thing (I'd edit this in, but that doesn't help people reading via email):

If the unwind attribute is stabilized, rather than #[unwind(allowed)], I'd like to see something like #[unwind(C)], indicating that you want to unwind across C code. In the future there could be a separate #[unwind(C++)] and perhaps others. There wouldn't necessarily be any way to verify that you chose the right language, but it would make it more evident what exactly the implementation is guaranteeing to be safe, and would allow targets that supported unwinding across C but not C++ to make #[unwind(C++)] a compile error.

@joshtriplett
Copy link
Member

@joshtriplett joshtriplett commented Mar 13, 2019

@rfcbot concern need-documented-replacement

In the absence of a documented replacement for how people should handle errors in C libraries that only support handling through unwinding, closing this would break a common use case.

Another way to fix the UB might be to drop the LLVM "nounwind" attribute. We could also add #[unwind], though it'll take some exploration of the details there.

I'm happy to support this as a sensible default after we document exactly what we expect people to do when interacting with inflexible C libraries that expect unwind-based error handling.

@gnzlbg
Copy link
Contributor

@gnzlbg gnzlbg commented Jul 11, 2019

I wanted to apologize. I was called out by using the unconstructive word malice here. In that sentence, by us, I meant rustc, the tool, as in, it would be "evil" for rustc to continue emitting nounwind if we aren't going to land a fix for this soon.

@joshtriplett I agree with you about just trying to remove nounwind. There is precedence about doing this, e.g., when LLVM was miscompiling noalias, we stopped emitting it, for a long time.

In retrospect, we have had a mis-optimization that impacts all Rust users and a PR to fix it for over a year. During this whole time, we always had the chance to just stop emitting nounwind and protect all Rust users. Instead, we didn't do that. First, because the fix was landing "real soon", so why waste time on a workaround. Second, because we didn't wanted to land something that would justify to continue to exploit the undefined behavior. For some reason, for some of us, banning the exploit of UB here was more important than protecting all Rust users. Both were mistakes, and I hope that we can learn from them and, in the future, try to land the quickest fix possible that protects all users first, and only start worrying about proper solutions once that is done. You never know when a simple fix is going to take 1 day or 1 year to land.

bors added a commit that referenced this issue Jul 12, 2019
[beta] Permit unwinding through FFI by default

This repeats #61569 for Rust 1.37, as #58794 is not yet resolved.

cc @rust-lang/release
r? @Mark-Simulacrum
@joshtriplett
Copy link
Member

@joshtriplett joshtriplett commented Jul 18, 2019

@gnzlbg Thank you for taking the time to explain in more detail, and thank you for taking the time to understand the problem and past stresses involved.

It sounds like we're now in agreement on the right ways forward here?

Let's introduce the simplest possible mechanism to just remove nounwind from a function. I don't know if that should be a simplified version of @BatmanAoD's RFC 2699, or if we should just introduce a new RFC, but either way we should get this in place ASAP.

Once we have that in place, we also need to decide how long after that we start aborting again on unwind of an extern function call tagged with nounwind. At this point, with the amount of attention this has received, I wouldn't have an objection to just doing so at the same time, and providing some very clear release notes explaining the issue.

@rust-lang/lang and @gnzlbg: Do you believe that the easiest path forward is a new RFC or some updates to RFC 2699? I'm starting to think that a new RFC might be better, and then we can leave RFC 2699 for the problem of standardizing panic/unwind ABIs.

@BatmanAoD
Copy link
Member

@BatmanAoD BatmanAoD commented Jul 18, 2019

@joshtriplett Note that the #[unwind(allowed)] annotation already exists; an RFC would just need to stabilize it.

Having recommended in my RFC that the attribute itself be dependent on the compiler's ability to verify that the panic implementation has been explicitly selected (rather than using the default, which may be a moving target as rustc continues to evolve), I'm not sure I'd be in favor of actually stabilizing the attribute as-is. But it would be sufficient to permit mozjpeg-sys et al. continue to work, I think.

@jethrogb
Copy link
Contributor

@jethrogb jethrogb commented Jul 19, 2019

@joshtriplett I don't want to create a divide where there is none, but it seems to me that @gnzlbg is advocating to just stop emitting nounwind to LLVM altogether while we wait for a solution. I don't see them expressing an opinion on stabilizing (or removing) the unwind(allowed) attribute.

@joshtriplett
Copy link
Member

@joshtriplett joshtriplett commented Jul 19, 2019

@BatmanAoD I would like to see your RFC continue, to develop a more precise specification of panic behavior. But I'd also like to see #[unwind(allowed)] stabilized more-or-less as-is, as soon as reasonably possible.

@jethrogb If we can't trivially get #[unwind(allowed)] stabilized, then I'm all for fixing the problem that way, but if we can get #[unwind(allowed)] stabilized quickly then let's do that.

@BatmanAoD
Copy link
Member

@BatmanAoD BatmanAoD commented Jul 19, 2019

@joshtriplett Here is the tracking issue for stabilizing #[unwind(allowed)]: #58760

cuviper added a commit to cuviper/rust that referenced this issue Jul 31, 2019
cuviper added a commit to cuviper/rust that referenced this issue Aug 25, 2019
bors added a commit that referenced this issue Aug 26, 2019
Permit unwinding through FFI by default

This repeats #62505 for master (Rust 1.38+), as #58794 is not yet resolved. This is a stopgap until a stable alternative is available, like [RFC 2699](rust-lang/rfcs#2699), as long as progress is being made to that end.

r? @joshtriplett
@cuviper
Copy link
Member

@cuviper cuviper commented Aug 29, 2019

For those watching, @joshtriplett just proposed a simple #[unwind] specification.

@joshtriplett
Copy link
Member

@joshtriplett joshtriplett commented Sep 9, 2019

@rfcbot resolved blocked-on-rfc-2699
@rfcbot concern blocked-on-rfc-2753

Summarizing much further discussion: RFC 2699 is a much more general "how do we handle cross-language panics/unwinds/etc", while RFC 2753 is a much simpler proposal to just provide an #[unwind] attribute to disable nounwind and fix the UB.

@nikomatsakis
Copy link
Contributor

@nikomatsakis nikomatsakis commented Nov 22, 2019

@rfcbot fcp cancel

So a lot has happened since this FCP was created. In RFC rust-lang/rfcs#2797, we created the ffi-unwind project group, for one thing. One of the first things we've been doing is debating this exact topic in some depth. Therefore, I'm going to cancel this FCP and close this issue -- we expect to try and resolve this issue very soon with a firmer decision about the direction we're going, and we'll re-evaluate in light of that.

@rfcbot
Copy link

@rfcbot rfcbot commented Nov 22, 2019

@nikomatsakis proposal cancelled.

@nikomatsakis
Copy link
Contributor

@nikomatsakis nikomatsakis commented Nov 22, 2019

If you'd like to participate in the group, btw, most of the discussion has been taking place in Zulip in the #wg-ffi-unwind stream.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet