Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix nounwind attribute logic #63909

Open
wants to merge 4 commits into
base: master
from

Conversation

@RalfJung
Copy link
Member

commented Aug 26, 2019

With #62603 landed, we no longer add abort-on-panic shims to extern "C" functions. However, that means we should also not emit nounwind for extern fn because we are not actually catching those panics. The comment justifying that nounwind in the source is just factually wrong.

I also noticed that we annotate extern declarations (of any ABI) with nounwind, which seems wrong? I removed this. Code like mozjpeg (the code that prompted the removal of the abort-on-panic shim) throws a panic from Rust code that enters C code (meaning the extern "C" definition must not have nounwind, or else we got UB), but then the panic bubbles through the C code and re-enters Rust code (meaning the extern "C" declaration that gets called from the Rust side must not have nounwind either).

This should be beta-backported if #62603 gets backported.

TODO: add test case for #63943.

@rust-highfive

This comment has been minimized.

Copy link
Collaborator

commented Aug 26, 2019

r? @estebank

(rust_highfive has picked a reviewer for you, use r? to override)

@@ -6,14 +6,36 @@
extern {
// CHECK: Function Attrs: nounwind
// CHECK-NEXT: declare void @extern_fn
fn extern_fn();
// CHECK-NOT: Function Attrs: nounwind
fn extern_fn(); // assumed not to unwind

This comment has been minimized.

Copy link
@RalfJung

RalfJung Aug 26, 2019

Author Member

This part of the test seems dubious to me. Wouldn't code that bubbles Rust panics through C (the kind of code for which #62603 got merged) also have an extern "C" declaration like this where the panic enters back into Rust?

This comment was marked as resolved.

Copy link
@gnzlbg

gnzlbg Aug 26, 2019

Contributor

For nightly Rust, where extern "C" definitions panic, and extern "C" declarations that panic are UB, this test is correct.

For stable Rust, where extern "C" declarations that panic are UB, but we don't want to temporarily break code that does it, this test should check that nounwind is not present..

This comment has been minimized.

Copy link
@RalfJung

RalfJung Aug 26, 2019

Author Member

We don't have a stable vs nightly difference in semantics. That would be a bad mess.

We document panicking from extern "C" as UB, but (similar to #62825) are willing to patch things such that code exploiting this UB is not broken until we provide adequate alternatives to said code.

So, I think this function should not be unwind. I am waiting for @alexcrichton to confirm.

This comment has been minimized.

Copy link
@gnzlbg

gnzlbg Aug 26, 2019

Contributor

So, I think this function should not be unwind

This is right. The check should say CHECK-NOT: nounwind.

This comment has been minimized.

Copy link
@alexcrichton

alexcrichton Aug 26, 2019

Member

I believe this is correct, all imported FFI functions, by default, should be tagged with nounwind

This comment has been minimized.

Copy link
@RalfJung

RalfJung Aug 26, 2019

Author Member

But doesn't this mean that when e.g. mozjpeg calls C that calls Rust (the situation due to which #62603 was accepted), and a panic passes from C to Rust, we still violate nounwind?

This PR removes the nounwind on the first edge the panic crosses (Rust -> C), but there's another edge (C -> Rust) and we should not have nounwind there either.

This comment has been minimized.

Copy link
@gnzlbg

gnzlbg Aug 27, 2019

Contributor

This PR removes the nounwind on the first edge the panic crosses (Rust -> C), but there's another edge (C -> Rust) and we should not have nounwind there either.

Yes, that's correct, that is, the check should be CHECK-NOT: nounwind. Otherwise, the second example in #63943, which is equivalent to mozjpeg use case, would still be miscompiled.

This comment has been minimized.

Copy link
@RalfJung

RalfJung Aug 27, 2019

Author Member

Okay, I have implemented that.

@gnzlbg

This comment has been minimized.

Copy link
Contributor

commented Aug 26, 2019

I also noticed that we annotate extern "Rust" declarations with nounwind, which seems wrong?

Yeah, that sounds like a big bug. cc @eddyb (I think we discussed that at some point?)


This fixes a soundness bug that causes mis-compilations, and is IMO release notes worthy. cc @Centril

@gnzlbg
Copy link
Contributor

left a comment

So I've left some comments, under the assumption that this primarly targets master, and therefore Rust nightly. Some of the tests would be incorrect for beta/stable, because there the semantics differ. I mentioned that in the comments, but I don't know how best to adress that. I think it would probably make sense to add what makes sense for nightly here, and tune the tests for stable Rust when doing the beta backport.

@@ -6,14 +6,36 @@
extern {
// CHECK: Function Attrs: nounwind
// CHECK-NEXT: declare void @extern_fn
fn extern_fn();
// CHECK-NOT: Function Attrs: nounwind
fn extern_fn(); // assumed not to unwind

This comment was marked as resolved.

Copy link
@gnzlbg

gnzlbg Aug 26, 2019

Contributor

For nightly Rust, where extern "C" definitions panic, and extern "C" declarations that panic are UB, this test is correct.

For stable Rust, where extern "C" declarations that panic are UB, but we don't want to temporarily break code that does it, this test should check that nounwind is not present..

src/test/codegen/nounwind-extern.rs Outdated Show resolved Hide resolved
src/test/codegen/unwind-extern.rs Show resolved Hide resolved
src/test/codegen/unwind-extern.rs Show resolved Hide resolved
src/test/codegen/unwind-extern.rs Show resolved Hide resolved
src/test/codegen/unwind-extern.rs Show resolved Hide resolved
@RalfJung

This comment has been minimized.

Copy link
Member Author

commented Aug 26, 2019

So I've left some comments, under the assumption that this primarly targets master, and therefore Rust nightly. Some of the tests would be incorrect for beta/stable, because there the semantics differ.

How do they differ? #62603 has landed on master and I am assuming will be backported to beta. So the semantics should be the same.

@RalfJung

This comment has been minimized.

Copy link
Member Author

commented Aug 26, 2019

This fixes a soundness bug that causes mis-compilations, and is IMO release notes worthy.

Which mis-compilations does this fix? I do not know of any.

@gnzlbg

This comment has been minimized.

Copy link
Contributor

commented Aug 26, 2019

On my machine using stable Rust 1.37.0:

extern "C" fn bar() -> ! { panic!("lolz") }
extern "C" fn baz() -> i32 {
    if let Ok(_) = std::panic::catch_unwind(|| bar() ) {
        unsafe { std::hint::unreachable_unchecked() }
    }
    42
}
fn main() { std::process::exit((baz() != 42) as i32); }

returns success with cargo run --release but fails with RUSTFLAGS="-C lto=fat" cargo run --release. If I change bar to:

fn bar() -> ! { panic!("lolz") } // no extern "C"

then it always works.

If unwinding from extern "C" should be supported, e.g., for Rust->C->Rust FFI, then the code above should always work.

This means that for Rust->C-Rust, adapting the code above to the following:

// foo.cpp
using fn_type = void(*)();
[[ noreturn ]] extern "C" void foo(fn_type x);
[[ noreturn ]] extern "C" void foo(fn_type x)  { x(); /* unreachable: */ throw 0; }
// main.rs
extern "C" { fn foo(x: extern "C" fn() -> !) -> !; }
extern "C" fn bar() -> ! { panic!("lolz") }
extern "C" fn baz() -> i32 {
    if let Ok(_) = std::panic::catch_unwind(|| unsafe { foo(bar) }) {
        unsafe { std::hint::unreachable_unchecked() }
    }
    42
}
fn main() {
   std::process::exit((baz() != 42) as i32);
}

should also always return success, but like the example above, it returns success with cargo run --release but fails with RUSTFLAGS="-C lto=fat" cargo run --release. Removing extern "C" from bar fixes it.

That is, AFAICT, as long as we make extern "C" function definitions nounwind, Rust->C->Rust is unsound, and examples like this get miscompiled on stable Rust. It looks like this PR should actually fix that - otherwise, there is another issue somewhere else breaking Rust->C->Rust FFI.

@RalfJung

This comment has been minimized.

Copy link
Member Author

commented Aug 26, 2019

returns success with cargo run --release but fails with RUSTFLAGS="-C lto=fat" cargo run --release

Good catch!
Still, we don't usually put all soundness fixes into the release notes, do we?

That is, AFAICT, as long as we make extern "C" function definitions nounwind, Rust->C->Rust is unsound, and examples like this get miscompiled on stable Rust. It looks like this PR should actually fix that - otherwise, there is another issue somewhere else breaking Rust->C->Rust FFI.

This PR does fix that, doesn't it?

@gnzlbg

This comment has been minimized.

Copy link
Contributor

commented Aug 26, 2019

Still, we don't usually put all soundness fixes into the release notes, do we?

I don't know, no idea who decides that. I thought this might at least be interesting enough for that.

This PR does fix that, doesn't it?

I haven't tried a patched compiler - will try when this lands on nightly unless somebody beats me to it.

@cuviper

This comment has been minimized.

Copy link
Member

commented Aug 26, 2019

@gnzlbg Note that we've removed the abort-unwind behavior difference between nightly/beta/stable, per #62505 (comment) -> #62603.

@gnzlbg

This comment has been minimized.

Copy link
Contributor

commented Aug 26, 2019

@gnzlbg Note that we've removed the abort-unwind behavior difference between nightly/beta/stable, per #62505 (comment) -> #62603.

Ah! I did not know. Sorry @RalfJung , my comments are all wrong, they were referring to the old nightly-only behavior.

@estebank

This comment has been minimized.

Copy link
Contributor

commented Aug 26, 2019

@rust-lang/compiler this is outside of my wheelhouse.

@estebank

This comment has been minimized.

Copy link
Contributor

commented Aug 27, 2019

@alexcrichton
Copy link
Member

left a comment

In its current state this seems like it's undoing an optimization which has been in rustc for quite some time? I was under the impression that it was well known that extern imported functions (and defined functions in Rust) are undefined behavior if they unwind. This looks to be fixing that situation but this has historically been a relatively important optimization for some scenarios.

src/librustc_codegen_llvm/attributes.rs Outdated Show resolved Hide resolved
@rfcbot

This comment has been minimized.

Copy link

commented Sep 12, 2019

Team member @joshtriplett has proposed to merge this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@cuviper

This comment has been minimized.

Copy link
Member

commented Sep 12, 2019

We will make sure to allow enough time for everyone to either check a box or raise a concern,

Time-check: 1.38 releases in two weeks, on September 26th. Can we at least get a beta decision on #62603's backport? Just to maintain the status quo compared to stable...

@gnzlbg

This comment has been minimized.

Copy link
Contributor

commented Sep 12, 2019

Just to confirm:

since we don't abort, we should also stop marking nounwind.

This would only be the case for function definitions, right ? nounwind would still be emitted for extern "C" function declarations, right ?

@gnzlbg

This comment has been minimized.

Copy link
Contributor

commented Sep 12, 2019

Ah no, this removes nounwind for declarations and definitions.

@joshtriplett

This comment has been minimized.

Copy link
Member

commented Sep 12, 2019

@cuviper I believe that got discussed on Zulip or Discord, with a decision to proceed as it was on previous stable and current nightly. @Mark-Simulacrum Can you confirm?

@Mark-Simulacrum

This comment has been minimized.

Copy link
Member

commented Sep 12, 2019

The release team discussed this general problem and wanted to see compiler team backport approval for #62603 specifically, not this PR. I do not believe that such an approval has happened as of this writing.

@joshtriplett

This comment has been minimized.

Copy link
Member

commented Sep 12, 2019

@Mark-Simulacrum Sorry, I think I misunderstood. I thought it was that the team agreed on a backport of #62603, and hadn't decided anything about this PR because it wasn't even on nightly yet.

@Mark-Simulacrum

This comment has been minimized.

Copy link
Member

commented Sep 12, 2019

#62603 has not been approved for beta backport by the compiler team. The release team wants to see that approval, so as to avoid 1-release-long breakage. This PR is separate (the release team is not currently interested in a beta backport for this PR, primarily because it is an improvement whereas #62603 is a straight "let's do the exact same as on stable").

(Edit: To be clear, I and I believe the release team is not really against a backport of this PR, just that we haven't discussed it ourselves).

@joshtriplett

This comment has been minimized.

Copy link
Member

commented Sep 12, 2019

@Mark-Simulacrum Got it, thank you for clarifying.

@cuviper

This comment has been minimized.

Copy link
Member

commented Sep 12, 2019

@joshtriplett I know we've discussed that a few times in different settings. I took your verbal approval at RustConf to r+ #62603, but it's still waiting for beta-accepted. Call it an abundance of process-oriented caution. :)

@nikomatsakis

This comment has been minimized.

Copy link
Contributor

commented Sep 12, 2019

@rfcbot fcp cancel -- I'm canceling the FCP because T-compiler didn't really need to be included. The question here is more of a "policy" question.

@rfcbot

This comment has been minimized.

Copy link

commented Sep 12, 2019

@nikomatsakis proposal cancelled.

@nikomatsakis

This comment has been minimized.

Copy link
Contributor

commented Sep 12, 2019

@rfcbot merge

In the language team meeting, we confirmed that we would like to merge this, to remove the current inconsistency: since we don't abort, we should also stop marking nounwind.

We would like to see progress on RFC 2753, to provide a long-term solution, and when an implementation of RFC 2753 gets merged, we should go back to using nounwind and aborting on unwind.

Meanwhile, we feel that merging this will actually provide motivation to make that progress, as it removes an optimization opportunity. We would very much like to see that optimization opportunity reclaimed.

Note: using rfcbot for consensus, but if rfcbot completes we'd like to go ahead and merge and beta-nominate this. We will make sure to allow enough time for everyone to either check a box or raise a concern, though.

Please feel free to use/edit any portion of this message to help with release notes; I'm also happy to help summarize this issue for release notes.

-- Text originally by @joshtriplett

@rfcbot

This comment has been minimized.

Copy link

commented Sep 12, 2019

Team member @nikomatsakis has proposed to merge this. The next step is review by the rest of the tagged team members:

Concerns:

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@Centril

This comment has been minimized.

Copy link
Member

commented Sep 13, 2019

@rfcbot concern lets-not-increase-de-facto-reliance-meanwhile

I noted in #63909 (comment) and in conversations with @joshtriplett (where I also said I'd leave such a concern on FCP-merge...) that:

Based on recent discussion in rust-lang/rfcs#2753 I think the idea of reaching quick consensus and stabilization on that RFC was overly optimistic. Therefore, I'd like for this to remain UB (i.e. the PR should not land) as I believe a prolonged time within which we don't emit nounwind will just increase reliance on us not doing so.

Also, situations like noalias (although that was due to LLVM bugs) suggests that once we remove these attributes it becomes increasingly hard to readd them.

I strongly disagree. Landing #62603 and not landing my PR here feels disingenious -- I am aware that that is not your intent, but putting on the hat of a Rust language user, that is the message that would come across to me. The lang team should take a stance and either respect or not respect the existing UB-exploiting code -- whereas right now, we are somewhat respecting it but then not really: landing #62603 indicates "we respect your use-case", but then not landing this PR here means we still are happily risking subtle miscompilations in such code. These are very contradicting signals we are sending here, which is no good.

I don't respect the existing UB-exploiting code. I think I've made that clear before that I do not agree with "people need to do their jobs" based reasoning. On a social level, I find that objectionable behavior to do so knowingly and it leads to "tragedy of the commons" situation. There are differences of opinion on the language which is why there are contradicting signals. What I think we're really agreeing to at the moment is "keeping the status quo". I should note that when I FCP-ed #58794 (comment), I expected this to be a mere formality.

The story would be different if there is a big benefit to actually telling LLVM about this bit of Rust-level UB -- but so far, all we have in that regard are claims that the optimization is important, but no data. The perf results for this PR are negative (as in, no perf change); rustc is not the heaviest user of FFI but it does a fair bit of FFI to LLVM so I think this is not an entirely silly benchmark either.

@gnzlbg seems to be of a different opinion re. noalias here at least. I would also like to not end up in a situation where we de-facto stabilize the ABI.

You were also not using performance impact as an argument, making it seem like you want to use miscompilations(*) as a means to "teach users a lesson", which I consider one of the worst possible reasons to miscompile code -- in particular because landing #62603 teaches the opposite lesson.

I don't think that not actively facilitating reliance on spec-UB is "teaching users a lesson".

Note that this is not about using UB to disincentivize users to use some feature; you are proposing to risk actual breakage in the compiled code. All I want to do is remove that risk. There is no spec change involved here. This PR does not change what is UB in Rust.

When you have a single compiler, not-rustc-UB tends to translate into not-spec-UB over time.

And just to reiterate, I would be totally on board with closing this PR if we re-enable the abort-on-panic shims, and if we do it for real (i.e., let it ride the trains to stable).

That would be my preference but that option seems foreclosed until we have a stable alternative.

Meanwhile, we feel that merging this will actually provide motivation to make that progress, as it removes an optimization opportunity. We would very much like to see that optimization opportunity reclaimed.

I disagree with this rationale. First, I think it puts pressure on the wrong people. Second, I think this will contribute to stress and rushing through a sub-optimal solution because time is running out. I don't think either of those are a good idea.


I would not like to spend further language team meeting time discussing this as it seems clear to me that it would not change things and the time is better spent on other things, e.g. figuring out a stable alternative.

@BatmanAoD

This comment has been minimized.

Copy link

commented Sep 13, 2019

First, I think it puts pressure on the wrong people.

Maybe this is just a poor choice of phrase, but are you suggesting that there are people on whom there should be more pressure? If so, why?

I do not agree with "people need to do their jobs" based reasoning.

I haven't heard this phrase before; do you mind explaining what kind of reasoning you're objecting to?

@Centril

This comment has been minimized.

Copy link
Member

commented Sep 13, 2019

Maybe this is just a poor choice of phrase, but are you suggesting that there are people on whom there should be more pressure? If so, why?

What I think is that more pressure should not be laid on the folks who benefit from the optimization. Keeping the status-quo of pressure on folks who do rely on what we have said is UB is (acceptable) fallout from that.

I haven't heard this phrase before; do you mind explaining what kind of reasoning you're objecting to?

I object to the reasoning that it's socially acceptable to rely on things that are explicitly UB "because there's no other way to do it" and that "folks need to do their jobs". Lotsa folks need to do their job but such a reliance may be detrimental to other users, the lang team, etc. Also, who determines "what you want to do is reasonable"? Folks have varying opinions on that... e.g. some think a fully stable ABI is reasonable (I don't).

@RalfJung

This comment has been minimized.

Copy link
Member Author

commented Sep 14, 2019

Also, situations like noalias (although that was due to LLVM bugs) suggests that once we remove these attributes it becomes increasingly hard to readd them.

I see no parallel here. No noalias violations that we have currently would have been prevented by keeping the attribute in. They got detected only because Miri was able to catch them, not because of miscompilations; and some of them predate the noalias removal.

I don't respect the existing UB-exploiting code.

Then we should add the abort-on-panic shim and call it done.

The current situation is not a middle-ground. It is a "both sides are worse off" kind of situation.

I don't think that not actively facilitating reliance on spec-UB is "teaching users a lesson".

Well, I am not sure which other terms to use for your argument here. The main argument I could extract so far is "if we don't do this, it could happen that someone else might also start doing what mozjpeg already does". I think that argument can be rephrased as "we should use de-facto (not de-jure!) UB as a teaching device" (the kind of teaching device that hurts you when you do the wrong thing). Maybe I missed other arguments you made?

FWIW, this argument I mentioned is not backed by data (quite the contrary: it's been many months since we first reverted the abort-on-panic shim and no new users came up), and I don't even see the thing it wants to prevent as a big problem (we are not changing our intentions to re-add the attribute once the shim is back in place). I see miscompilations in production code as a much bigger problem.

When you have a single compiler, not-rustc-UB tends to translate into not-spec-UB over time.

I see little to no evidence for that. In the UCG discussions, a lot of the UB we are considering is not-de-facto UB. For example, barely any of our aliasing rules are de-facto UB right now, and still the general reception of Miri being very strict about checking these rules is positive.

I object to the reasoning that it's socially acceptable to rely on things that are explicitly UB "because there's no other way to do it" and that "folks need to do their jobs". Lotsa folks need to do their job but such a reliance may be detrimental to other users, the lang team, etc. Also, who determines "what you want to do is reasonable"? Folks have varying opinions on that... e.g. some think a fully stable ABI is reasonable (I don't).

Language design involves (amongst others) trade-offs between the interests of the language designers and its users -- and in your consideration, I see no indication that user's interests are taken into account. I would be entirely on-board with your reasoning if this was a research language, but I think for a language that wants to be a foundational piece of infrastructure, this kind of no-compromise attitude will do more harm than good.

The picture you are painting of Rust users is an adversarial one: if we give in an inch by not making LLVM exploit this UB any more, they'll jump on that and completely disregard the intentions behind that change.

This picture does not at all reflect my experience. Most of the time my impression is that users understand the language design concerns around UB, and understand that we want to keep our options open; my discussions along those lines have mostly been constructive (there are exceptions, of course). Users don't want to halt Rust language development, but they also want to use Rust to do "something" and sometimes, only after doing all the work they realize that what they want to do is not (de-jure) possible in Rust.

To speak more concretely, I am regularly talking with users about Stacked Borrows or validity invariants; much of that is de-jure but not de-facto UB and it is also much more fuzzy than nounwind. For example, for validity invariants, the current de-jure UB contradicts our own advice from two years ago (mem::uninitialized)! I expected a lot of pushback, people saying "this always used to work, it should still work". But mostly, people were just happy that we gave them precise rules, even if that requires changing the entire ecosystem to use MaybeUninit. So, if anything, the data we have indicates that we can constructively work with our users and turn de-jure into de-facto UB once we give them a proper migration path and help them detect UB (all of which is given for nounwind, much more so than for uninitialized data). The same kinds of conversations happened for Stacked Borrows.

So, I think your worry is unfounded. I doubt there will be a lot of pushback against re-adding the attribute and the abort-on-panic shim once one of the RFCs lands; if past experience tells us anything, people will just be happy that finally there are some proper rules for what they want to do.

@BatmanAoD

This comment has been minimized.

Copy link

commented Sep 14, 2019

Thank you very much for your comment. I do have one small correction.

it's been many months since we first reverted the abort-on-panic shim and no new users came up.

There are several users of cross-language unwinding participating in RFC 2753 (though to be fair, they may have been doing so since before the abort revert). In particular, @acfoltzer has explained how Lucet uses this functionality and is helping draft the RFC text itself.

@acfoltzer

This comment has been minimized.

Copy link

commented Sep 16, 2019

I doubt there will be a lot of pushback against re-adding the attribute and the abort-on-panic shim once one of the RFCs lands; if past experience tells us anything, people will just be happy that finally there are some proper rules for what they want to do.

As one of the people who would benefit in the interim from the lack of nounwind, I can confirm that I'll be absolutely thrilled to ditch that behavior once an RFC lands. After all, it'll hopefully contain some of my own text 😂

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.