Join GitHub today
GitHub is home to over 31 million developers working together to host and review code, manage projects, and build software together.
Sign upRegression in rlua crate tests between rustc 1.23.0 and 1.24.0, only on windows. #48251
Comments
This comment has been minimized.
This comment has been minimized.
|
One possibility as to why that it only happens on windows is that the implementation of longjmp on that platform uses the same exception framework that Rust uses for panics. |
This comment has been minimized.
This comment has been minimized.
That is interesting! That WOULD explain the behavior I'm seeing. |
kyren
referenced this issue
Feb 16, 2018
Closed
rlua error handling is completely broken on windows with rustc 1.24+ #71
This comment has been minimized.
This comment has been minimized.
It's possible to temporarily unlock unstable features on stable channel with env var ( |
kennytm
added
O-windows
regression-from-stable-to-stable
C-bug
labels
Feb 16, 2018
This comment has been minimized.
This comment has been minimized.
That's incredibly helpful, thank you! |
This comment has been minimized.
This comment has been minimized.
|
Here is a summary of the current situation as I understand it: kyren/rlua#71 (comment) I may have gotten some details wrong, if I have feel free to |
This comment has been minimized.
This comment has been minimized.
SoniEx2
commented
Feb 18, 2018
|
I guess when they say it's a "C API" they are not kidding. :p Would it make sense to make an standalone wrapper, called "Universal API for Lua, with C calling conventions"? |
This comment has been minimized.
This comment has been minimized.
Lua's API is not my favorite :( |
This comment has been minimized.
This comment has been minimized.
|
Hmm. I think there's a good case to be made for disabling #46833 -- or at least making it opt-in -- until there are stable attributes to opt-out. @wycats points out that this likely to affect Helix as well. I would definitely prefer not to have public packages relying on |
This comment has been minimized.
This comment has been minimized.
|
Re-reading the comments on that PR, I see that I wrote this:
But this got overlooked. |
This comment has been minimized.
This comment has been minimized.
I knew that was wrong when I did it, I'll forget this hack exists :P |
This comment has been minimized.
This comment has been minimized.
Something I wanted to point out, is that even if a crater run was done, it wouldn't have caught my issue at least because as I understand it they're only done on linux? If there was some kind of donation box for running crater on windows I would contribute to it! |
This comment has been minimized.
This comment has been minimized.
|
@kyren I believe crater actually tests windows these days, but I could be wrong. Also, cross-posting from the chucklefish/rlua repo:
Can anyone verify if this is correct? I think in my ideal world, we would not require any attributes -- rather, we would have a way to skip the panic in the case where Rust is using the native unwind facility (but you would opt-in to that, probably, similar to |
This comment has been minimized.
This comment has been minimized.
|
I talked a bit more with @wycats about the Helix situation, which is somewhat different, though very much related: Ruby also sometimes longjmps. @wycats wanted some form of attribute that could be used to label functions that longjmp so that the compiler could verify that there is nothing on the stack with a dtor. This seems like it would also help @kyren in verifying this claim that they made:
|
This comment has been minimized.
This comment has been minimized.
|
At this point, it seems pretty clear to me that:
In the short term, I think we should amend #46833 so that the "panic wrapper" uses an (unstable) opt-in -- e.g., |
This comment has been minimized.
This comment has been minimized.
|
Is it possible to adjust the implementation to only catch Rust-generated unwinding? I seem to remember that we did something like this in the past to not conflict with C++ exceptions but I may be misremembering. |
This comment has been minimized.
This comment has been minimized.
I'm sorry to say that it doesn't :( off the top of my head it's going to require a reasonable amount of work as well. |
This comment has been minimized.
This comment has been minimized.
|
Sorry for the breakage y'all are seeing here @kyren! We've long wanted to improve our ecosystem-wide regression testing to include Windows as well to catch issues like this sooner, but we still have yet to implement that :( Your explanation of what's happening was quite helpful to me, and it also reminded me of what I believe is an existing bug in Rust! I'm unfortunately traveling right now and not at a Windows computer, but I can try to verify this information when I get back home in a week. In any case though I believe this is actually a fixable bug in Rust without actually adding any new attributes or stabilizing anything. I agree with @nikomatsakis that we'll probably want to have some form of dealing with exceptions at the FFI boundary eventually but the longer we can put that off the better! (in my opinion at least) The tl;dr; is that I believe we need to implement exception filters in our landing pads. Rust landing pads should, in theory, only run for Rust exceptions. Not for all SEH exceptions. I'm pretty certain this bug has come up before in the past but I was unfortunately unable to dig up the issue report. The crux of this issue is that code like this: struct A;
impl Drop for A {
fn drop(&mut self) {
println!("hello!");
}
}
fn main() {
let _a = A;
cause_a_segfault();
}
fn cause_a_segfault() {
unsafe { *(0x1 as *mut u32) = 3; }
}I think will actually print out "hello!". The bug may be more nuanced than this but I'm pretty certain something like this is a problem, I can try to create an actual reproduction of this sort of issue when I get back home to a Windows computer to try out some various pieces. Basically what this means is that our "landing pads" are executing for any and all SEH exceptions. This seems like it includes setjmp/longjmp, which is a bug! (a very longstanding bug!) This is part of the whole "we don't want you unwinding into Rust" in that making this work is tricky and whatnot, but it turns out that this is our own problem where there's a whole bunch of ways to "unwind into Rust" on MSVC using SEH that we must deal with. For those familiar with LLVM I believe the problem looks like this: fn foo2() {}
#[no_mangle]
pub extern fn bar() {
foo2();
}On MSVC this code currently generates IR that looks like: define void @bar() unnamed_addr #1 personality i32 (...)* @__CxxFrameHandler3 {
start:
; invoke foo::foo2
invoke void @_ZN3foo4foo217h18e6b2931a4bedc6E()
to label %bb2 unwind label %funclet_bb1
bb1: ; preds = %funclet_bb1
call void @llvm.trap()
unreachable
bb2: ; preds = %start
ret void
funclet_bb1: ; preds = %start
%cleanuppad = cleanuppad within none []
br label %bb1
}The problem (as I understand it), is that if %cleanuppad = cleanuppad within none [](or something about that line I believe) I remember that when implementing unwinding on MSVC I never did figure out how to place a filter here on "only run this for Rust exceptions". Other platforms like MinGW which use a custom personality function have filtering enabled which (IIRC) only runs cleanups for Rust-originating exceptions (what we want). On MSVC I never got a custom personality working (LLVM only recognizes a few I think?) so sticking to one of the standard personality functions meant we got something working quickly at least! Now all is not lost I think! I believe what we may want to do is alter the default codegen for landing pads. We already have implemented a method of "only catch Rust exceptions" as that's why Phew! That's quite a lot of information. To summarize, though:
|
This comment has been minimized.
This comment has been minimized.
|
Such a wonderful, detailed explanation, thank you @alexcrichton!
I'm totally fine with this hacky fix until 1.25! If you're okay with it, I'd like to do a release of my crate with this hack for the time being, and then when 1.25 hits remove it and just require rustc >= 1.25. However, let's be realistic, I'd obviously prioritize the entire rust ecosystem over my one little crate, so if you think that's a problem I don't have to do that. This is not such a problem for Chucklefish, since we have sort of a chucklefish_hax branch anyway where this hack could live, so I'm fine if it's just too gross to release publicly with that hack. I'm actually relieved really, because the Lua C API had always been in this "legal gray area" w.r.t. Rust. @nikomatsakis suggested changes regarding having the compiler help about Drop types being on the stack, and inter-operating with other unwinding mechanism obviously sound really great and helpful, but even just knowing that calling into the Lua C API should be supported at all is plenty for right now :D For the record, I think having a public API that relies on something like longjmp or C++ exceptions is really, exceptionally gross (pun intended), but I'm glad that Rust is attempting to deal with the unfortunate reality of it. I'm completely fine with it though if the finer details of it take a while to work out. |
This comment has been minimized.
This comment has been minimized.
|
@kyren |
This comment has been minimized.
This comment has been minimized.
|
@nikomatsakis said:
For what it's worth, I think this space deserves a more full-fledged RFC. As @nikomatsakis pointed out, Helix has similar but not identical concerns, and it would be great to get a conversation about longjmps and "unwinding past the FFI boundary" into one place so we can work out the requirements and solve them holistically. |
This comment has been minimized.
This comment has been minimized.
|
@alexcrichton do you object to reverting this change while we sort it out? |
This comment has been minimized.
This comment has been minimized.
|
My take is that we should revert now but possibly bring the feature back with @alexcrichton's filtering suggestion -- although it's not entirely obvious to me that filtering is what end users will want, either. (But it's probably a decent default.) |
This comment has been minimized.
This comment has been minimized.
|
@kyren do you think (by any chance) it would be possible to make a standalone regression test? |
nikomatsakis
added a commit
to nikomatsakis/rust
that referenced
this issue
Feb 20, 2018
This comment has been minimized.
This comment has been minimized.
|
This is my interim plan. I have PR to revert on stable (#48378) and will prepare a similar PR for beta. Then I have a more expansive diff that reverts but allows one to opt back in to the wrapper by doing |
bors
added a commit
that referenced
this issue
Feb 27, 2018
This comment has been minimized.
This comment has been minimized.
As of rustc 1.25.0-beta.4 (62ad16c 2018-02-26), rlua tests pass on windows! The revert seems to have 100% fixed the problem.
I've been following the "saga" haha, and I know you're now working on #48572 now instead, but I'm hopeful that you can get that change into beta / stable since that seems to be the best of all worlds. This bug had a lot more tendrils than I expected, thank you guys so much for all the work you've been doing on this! |
alexcrichton
added a commit
to alexcrichton/rust
that referenced
this issue
Feb 27, 2018
bors
added a commit
that referenced
this issue
Feb 27, 2018
sdroege
referenced this issue
Feb 28, 2018
Closed
Change `CallbackGuard` to require explicit destruction #250
This comment has been minimized.
This comment has been minimized.
FWIW, this requires reverting a bunch of changes to gtk-rs and related crates which are now running into UB with 1.24.1 due to this being reverted. |
This comment has been minimized.
This comment has been minimized.
Say more? Did they have their own panic guards which were removed? |
Manishearth
added a commit
to Manishearth/rust
that referenced
this issue
Feb 28, 2018
Manishearth
added a commit
to Manishearth/rust
that referenced
this issue
Feb 28, 2018
This comment has been minimized.
This comment has been minimized.
@nikomatsakis Yes, exactly that :) But fortunately not in any released version yet (unless there was a release of some crate outside of the gtk-rs organization that I'm not aware of). It was planned to release this really soon now though... |
This comment has been minimized.
This comment has been minimized.
|
I don't really know the state of it or whether it would be appropriate for a stable point release, but it would be nice if #48572 could be merged instead of a straight rollback, to avoid missing out on the feature for another rust stable version. I do feel a bit guilty for triggering the reverted behavior, not having to worry about panic unwinds at extern function boundaries is quite nice, even if I'm not currently relying on it. |
This comment has been minimized.
This comment has been minimized.
|
We plan to discuss this in the core team meeting tonight. Given #48572, we may forego the revert. @kyren, is it ok w/ you to wait until the next beta to have a fix? (I'm not sure how I feel about backporting #48572 to beta, but I guess it's ok -- backporting to stable seems too risky to me, personally.) |
This comment has been minimized.
This comment has been minimized.
|
@kyren Did you try @petrochenkov 's suggestion to change longjmp to __instrinsic_longjmp ? If so, did it work? |
This comment has been minimized.
This comment has been minimized.
So I finally got around to trying this and I can confirm that this DOES work, thank you very much for the suggestion! Since I basically have a fix for this, I'm not in a huge rush to have this change in the rust stable compiler, but this is only speaking for me. This issue probably causes problems for other people that have to deal with at least any of Lua or Ruby or maybe libpng, depending on how strict they are about handling errors properly or how deep into the APIs they go. I would LIKE it if the fix was in the next stable version of Rust, but even then it's not critical because like I said, I have a fix. The main thing I really needed from all of this was an assurance that longjmp APIs are at least supposed to be compatible with Rust in general, which just practically saves me a lot of heartache. If #48572 lands in nightly and the regression test for longjmp stays long term, but I have to wait two rust stable cycles to remove the now small rlua hack, that's certainly not the end of the world. The only way this would be a problem were if the fix for longjmping APIs ended up requiring an unstable feature that was a very long way off, or the hacky fix I have broke for some other reason in the interim, or if #48572 or similar was not merged, or if there was another decision that similarly changed direction so that I can't reliably call into a longjmp based API. If the fix did make it into beta though that would certainly feel a lot more "certain" to me and allow me to relax about it a lot more :D
I was typing this reply as I saw your comment :D |
kyren
referenced this issue
Feb 28, 2018
Closed
lua_pcall/lua_error invocations might be unsafe (all of them) #21
alexcrichton
added a commit
to alexcrichton/rust
that referenced
this issue
Feb 28, 2018
Manishearth
added a commit
to Manishearth/rust
that referenced
this issue
Mar 1, 2018
This comment has been minimized.
This comment has been minimized.
|
triage: leaving assigned to niko but it might be good for him to delegate to alex or someone else if there's someone else taking point on this now. |
Manishearth
added a commit
to Manishearth/rust
that referenced
this issue
Mar 1, 2018
This comment has been minimized.
This comment has been minimized.
1.24.1 was released with this reverted now. |
This comment has been minimized.
This comment has been minimized.
|
So, we talked about this in the core team meeting, and -- as you can see -- decided to revert after all. It was a bit of a tough call, but we felt like when in doubt, we ought to bias towards "back off and try again more carefully". The plan is to land @alexcrichton's change in #48572, which should enable us to re-enable the abort-guards by default. We are also keeping the explicit (I still think we ought to address the interop question in a more thorough way, however.) @sdroege I do apologize for whatever churn this caused gtk-rs and anyone else! |
This comment has been minimized.
This comment has been minimized.
|
@nikomatsakis No problem, "git revert" is cheap and there was no release yet. All options were bad, let's just hope it sticks the next time :) |
This comment has been minimized.
This comment has been minimized.
|
Since the revert made it into 1.24.1, this is now fixed! Thanks for the great work, closing! |
kyren commentedFeb 16, 2018
•
edited
[The following explanation of the problem contains a lot of assumptions about the problem's nature, I could be extremely wrong! Take this explanation with a grain of salt]
Wild guess, it has something to do with #46833.
So, I'm not sure this is a rust bug exactly, but I wanted to make you aware of a crate regression that's causing me quite a lot of problems. Ultimately, I believe this comes from me doing some things that are not.. exactly definitely kosher, but that I don't have a convenient workaround for.
Ultimately, the problem stems from the Lua C API having error handling fundamentally based around setjmp / longjmp. In order to write a usable wrapper in Rust, without having to write inconvenient, slow C shims, Rust must, at times, call into Lua C API functions that in turn trigger a longjmp, and that longjmp must necessarily pass over Rust frames.
I realize how unsafe this is! However, there are some limitations here that I think make it at least somewhat reasonable to be able to do:
This last point is unfortunate, and causes me a lot of headaches, and at least one strange practical problem. We at Chucklefish build rlua for consoles, but for consoles the gcc crate does not work, and we have to build Lua out of tree (also to include Lua console specific patches). Needing to include C shims in rlua means that we would also have to copy paste those shims into the console projects, and this is a less than ideal situation.
The reason I think this issue has to do with #46833 is that I can actually fix the regression if I simply add an
#[unwind]attribute to each rust function that calls lua_error. However, this is only necessary for some reason on windows, and the attribute is currently unstable!I think it's completely reasonable to need to mark functions as
#[unwind]if they will trigger a longjmp, but I'm not exactly sure what to do in the meantime untilunwind_attributesbecomes stable. Maybe this behavior on windows IS simply a bug? Maybe it's a bug on linux / macos that this doesn't abort? I would especially not like to have to write C shims if ultimately I can get rid of them whenunwind_attributesbecomes stable.(Side Note: Chucklefish actually uses the stable rust compiler in production now, so this is especially annoying. We were very excited about the 1.24 release, because it includes incremental compilation and rustfmt-preview while also being stable, and we noticed this bug in trying to switch to the stable compiler on windows.)
Edit: If people have to dig into the
rluasource, I'm sorry, it is a tangled nest of "unsafe, unsafe everywhere". The best example to look at is thetest_errortest, and the methodLua::create_callback_function. Adding the#[unwind]attribute to:callback_call_impl,safe_pcall, andsafe_xpcallfixes that test on windows, on unstable.Edit 2: Edited for clarity, but also I need to point out that I need to do something, because this bug currently makes
rluajust basically broken on windows. It's not "just" a test failure, error handling in general will just abort.