Support for setjmp / longjmp #2625
Something like this would add zero checks over what C offers, moving the responsibility for solving all these issues to the user. Particularly, Drop values moved in one branch need to be manually
MVP
Add
pub type JmpBuf = <implementation-defined>;
/// Saves the current execution context into `env`. If `longjmp(env, val)` is used
/// to restore execution at this call site, this function returns `val`. Otherwise,
/// it returns `0`.
#[rustc_returns_twice]
pub unsafe fn setjmp(env: &mut JmpBuf) -> c_int;
/// Restores execution at the `setjmp` call-site that stored `env`, returning
/// there the `value` (if `value == 0`, then `setjmp` returns `1`).
///
/// If the function that called `setjmp` exits before the call to `longjmp`, the behavior is
/// undefined.
///
/// All variables for which `drop` would be called if `longjmp` was to be replaced with
/// `panic!` and `setjmp` with `catch_unwind` are not dropped.
pub unsafe fn longjmp(env: &JmpBuf, value: c_int);
The invocation of
If Upon return to the scope of |
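For contrast with the documentation above, here is a minimal sketch of the `panic!` / `catch_unwind` baseline it refers to: when a panic unwinds to `catch_unwind`, locals in the unwound frames are dropped. Those are the drops that a `longjmp` back to a `setjmp` would skip. The names `Guard`, `DROPS`, and `unwind_past_guard` are illustrative and not part of any proposal in this thread.

```rust
use std::panic;
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how many `Guard` values have been dropped (illustrative helper).
static DROPS: AtomicUsize = AtomicUsize::new(0);

struct Guard;

impl Drop for Guard {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

/// Creates a `Guard`, then panics past it. Returns `true` if the panic
/// was caught by `catch_unwind`.
fn unwind_past_guard() -> bool {
    let result: Result<(), _> = panic::catch_unwind(|| {
        let _guard = Guard;
        // Unwinding runs `_guard`'s destructor on the way out.
        // A `longjmp` over this frame would not.
        panic!("simulated non-local exit");
    });
    result.is_err()
}

fn main() {
    let caught = unwind_past_guard();
    println!("caught = {caught}, drops = {}", DROPS.load(Ordering::SeqCst));
}
```

The panic message is still printed to stderr by the default panic hook. The point is only that the destructor count increases during unwinding.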
Ultimately, what I am hoping to achieve is something like: Which would call It would also be nice to have: The MVP suggested above allows such a macro to be written, so I am in favor of that approach (at least until we find something better). There may be opportunity for something higher-level than the MVP that can be more safely exposed to the user and more easily documented (and leave rust more free to change the internals). The challenge is that, even if this is the only use case to support, we need some knowledge about how the FFI function might call My use case is postgres, to do something like
All of this depends on So, if we want to make something higher-level in rust, it would need a way to specify what this magic global variable is. |
It's not that easy. For example, Moreover, In other words, the "observational equivalence" argument does not apply in this case, |
To clarify, what I meant by "is not unsound" is that performing a I did not mean to say that it is not unsound in the sense that safe abstractions around unsafe code written without considering setjmp/longjmp remain safe - as you mention, they don't.
That's a good point. |
@RalfJung I think C++'s semantic might be what we need:
We might just need to replace "non-trivial destructor" with something else, and also |
C's As for From what I can tell, this is a (probably horribly incomplete) list of what we must make undefined to permit all of Rust's existing safety invariants to continue to hold up:
Presumably a |
FWIW, here is the macro I am currently using, which seems like a valid use of
It is intended to wrap a FFI function call which, when evaluated, may call
(In the above example, I looked at the optimized IR, and it looks fine to me, but someone who understands this better might see a problem:
This usage is quite constrained, but it serves my purposes. Because the wrapper function is always almost identical (just the FFI call expression changes), and because it's marked Note that
but it should have the |
I think this should work as long as we use the |
In practice,
At least the AddressSanitizer, X86SpeculativeLoadHardening, and SPARC ones seem potentially relevant even to a simple function like |
The path of least resistance would be to add a |
fn bar(_a: A) { println!("a moved") }
fn foo() {
let mut buf: jmp_buf = [0; 8];
let a = A;
if unsafe { setjmp(&mut buf) } != 0 {
        bar(a); // error[E0382]: use of moved value: `a`
return;
}
bar(a); // Note: `a` was previously moved here
// Note: `setjmp` may return multiple times
unsafe { longjmp(&mut buf, 1) };
} |
@eaglgenes101 I don't think it can work like that, at least if the lint is only based on
fn bar(_a: A) { println!("a moved") }
fn foo() {
let mut buf: jmp_buf = [0; 8];
let a = A;
if unsafe { setjmp(&mut buf) } == 1 {
bar(a);
return;
} else {
bar(a);
}
}
At best we could error stating that different branches depending on a |
Why is it important for Rust to support Strawman: Document in the |
@briansmith some C libraries emulate exceptions using |
Some C++ libraries use C++ exceptions too, but Rust doesn't support them either, even though they're much more reasonable than Of course it's nice to support Postgres but I think we should try to find ways of doing it that don't involve adding |
Rust has C FFI; AFAIK it does not have C++ FFI (yet). These are C functions available on all systems with a C standard library and are easily callable via C FFI. If we want Rust to be able to interoperate with all correct and potentially legacy C code out there via its C FFI, we need to support them in some form. There is precedent for supporting bad and dangerous C APIs in Rust C FFI for the purpose of being able to interface with C code that uses them (e.g.
I would be interested in better ways of interfacing with Postgres, and I think @jeff-davis would be interested too. My first suggestion was for @jeff-davis to write a C wrapper that performs the |
I don't think that is a goal, and I don't think it should be, especially in cases where one can work around the issues by writing a little bit of C that wraps the other C code and resolves the issue. I don't know about |
I would argue it really doesn't have a lot of impact. One might add some extra cases to the borrow checker, but other than that it's fairly similar to simple exception handling, something that Rust already has. On the other hand, would it be possible to make it |
So currently all rustc internal attributes start with
That's an open question. Is |
Here's what compcert's documentation has to say about
|
GCC, for one, supports a |
@briansmith A lot of languages work well with C in the easy cases. I love rust because it works well with C in the hard cases. I would really like to allow pure rust extensions to be written for postgres. Introducing C creates a lot of headaches that I'd rather not deal with. |
@comex
If it were only Does anybody know why it is called
But we probably want to keep the same name as C since that is what people using these interfaces will probably search for. |
Unrelated question, the GCC docs for
Since C++17, the signature of [[noreturn]] void longjmp( std::jmp_buf env, int status ); so in Rust |
Yes, it's legal in C to
I believe so. |
I have seen a fair amount of interest in my project (postgres-extension.rs), including multiple conference talks and internal presentations to several groups at my place of employment. There are also similar projects, like pg-extend.rs. Unfortunately, it's hard to make the case that these projects actually work without a working Any suggestions? |
What that commit is doing is the safe bet, because right now the situation is unclear. If I were to do this right now, I would just write a tool that auto-generates C shims that automatically insert I think that the best way to attack this problem would be an RFC to add In the meantime, one can play with |
I don't see why we'd want to make |
Currently the behavior of jumping over Rust code is undefined in all cases, and there is no RFC that changes that in any way. There is an RFC for a |
I don't need |
Still, something like this:
let b = setjmp(&mut buf);
c_ffi_fn(&mut buf);
has a jump in Rust code if The jump is minimal, but it is still there =/ |
What if we just defined very tight restrictions for the use of a For any function that calls a
In other words, I could live with even the most sweeping restrictions if they help avoid the problems that you are worried about. The pattern for something like |
I'm just pointing out that somebody would need to write an RFC explaining what it means to jump over at least some subset of Rust code, however minimal that is, and push it through the process. Like even if we were only to allow this pattern:
setjmp(&mut buf);
c_ffi_fn(&mut buf, some_arg);
if
setjmp(&mut buf);
{
    let _tmp = copy some_arg;
    c_ffi_fn(&mut buf, _tmp);
}
such that We have explored here a lot of things that would be ok, and a lot of things that would not. So there is definitely enough material for someone to try to write down such an RFC. We don't have a WG FFI yet, but such work could be part of it, which could help push it through the process (cc @joshtriplett ). |
Thank you. That sounds like a good way to move forward. |
Just checking in to see if any off-thread progress has been made? I'm happy to help - just not sure what that help would be at this stage ... |
I don't know how useful this feedback is, but I use setjmp() regularly in nasty kernel selftests that I've written, and I don't actually want the returns-twice behavior. For example, I use it to exit a block of code from a signal handler. The fact that The feature I would actually want is a lot closer to
The semantics are that, if no one calls Some benefits over And I think this would give me all of the functionality I have ever wanted from |
IIRC, that's all that's needed when trying to use libpng, as long as you can choose where the |
You can implement |
I assume |
…ngjmp` `pthread_exit` can't unwind through `catch_unwind` anymore because of [1]. Because the actual `longjmp` [2] isn't supported by Rust at this time and [3] is tricky to set up, this commit implements something similar using inline assembler. [1]: rust-lang/rust#70212 [2]: rust-lang/rfcs#2625 [3]: https://github.com/jeff-davis/setjmp.rs
just some questions (perhaps rust internal is a better place? But here is already an open issue.) The example translated from C is very promising and it works: But considering that the library relies heavily on the operations in the manner of longjmp, I am afraid that such operations may easily break up rust runtime. |
Was there ever any resolution to this? In R, for example, errors are thrown using setjmp/longjmp and we would like a way to catch Currently, I am using a panic as a poor cousin alternative but panics from C-called functions @amluto, @Amanieu I like the |
The question is, where should Ideally it should be in A final possibility would be for this to belong in a third party crate which wraps the |
Well, |
Maybe |
Not exactly: Then there's also |
hmm, maybe instead there could be a macro kinda like this:
// in core/std
macro_rules! returns_twice_wrapper {
($vis:vis extern $abi:literal unsafe fn $id:ident($($args:ident: $arg_tys:ty),*) -> $ret:ty;) => {
$vis unsafe fn $id<R>(f: impl Fn($ret) -> R, $($args: $arg_tys),*) -> R {
extern $abi {
#[returns_twice]
fn $id($($args: $arg_tys),*) -> $ret;
}
let v = $id($($args),*);
f(v)
}
};
}
use like:
// in libc
returns_twice_wrapper! {
pub extern "C-unwind" unsafe fn setjmp(v: *mut jmp_buf) -> c_int;
}
// in user code
fn test_setjmp() {
unsafe {
let mut jb: MaybeUninit<jmp_buf> = MaybeUninit::uninit();
let jb = &mut jb as *mut _;
setjmp(|v: c_int| {
            if v == 0 {
println!("between setjmp and longjmp");
longjmp(jb, 1);
} else {
println!("after longjmp");
}
}, jb)
}
} |
Summary: We have seen edenfs crashing with SIGBUS in production. The crashes are caused by btrfs checksum errors when reading mmap buffers. This diff adds a SIGBUS handler that zero-fills the page causing SIGBUS on demand as an attempt to avoid a SIGBUS crash. The idea is that the zero-filled page will fail the indexedlog checksum somehow so it will become a (seemingly regular) error, and might trigger re-tries (ex. re-fetch via network) to transparently fix the issues without user intervention.
Considerations and discoveries:
- `setjmp` error handling: hard to verify soundness in Rust. See rust-lang/rfcs#2625
- Simply `return` from the signal handler: The SIGBUS will happen again and the program enters an infinite loop trying to read the bad data.
- `mmap` support to return zero pages without SIGBUS: There was some interest and there were attempts in the Linux kernel community. However, nothing usable is merged yet.
This signal handler assumes the SIGBUS behavior stays consistent during the process lifetime. Namely, for previously returned `Ok` data, if read again later from the same process, it should remain `Ok`, not `SIGBUS`. It's okay to get `Ok` consistently, and after a hard reboot, only get `SIGBUS`. While practically this assumption seems okay to make (and the use of `mmap` relies on the assumption that the returned `Ok` data remain unchanged), it is hard to predict all corner cases, and the code is optimized for easy-to-understand rather than "async signal safety". Therefore, the feature is default off for now.
Reviewed By: zzl0
Differential Revision: D51623839
fbshipit-source-id: 506b676ee1c6d4075a02417622adb0d3477c7a7a
Motivation
rust-lang/libc#1216 proposes adding `setjmp` and `longjmp` to `libc`. These functions are required to interface with some C libraries, but sadly these functions are currently impossible to use correctly. Users deserve a solution that works and warns about or prevents common pitfalls.
Issues with current solution
The first issue is that `libc` cannot add the LLVM `returns_twice` attribute to `setjmp`:
Basically, LLVM assumes that `setjmp` can return only once without this attribute, potentially leading to miscompilations (playground):
Because `setjmp` is not `returns_twice`, LLVM assumes that the `return x;` will only be reached before `x = 13`, so it will always return `42`. However, `setjmp` returns `0` the first time, and returns `1` when jumped into it from the `longjmp`.
Using a volatile load instead works around this issue (e.g. playground). Basically, if stack variables are modified after a `setjmp`, all reads and writes until all possible `longjmp`s will probably need to be volatile.
One way to improve this could be to add a stable attribute `#[returns_twice]` that users can use to mark that `extern "C" { ... }` functions can return multiple times and for Rust to handle these functions specially (e.g. by emitting the corresponding LLVM attribute). An alternative would be for Rust to provide these functions as part of the language.
Modulo LLVM-level misoptimizations, C only allows `setjmp` in specific "contexts" [0], and these features interact badly with languages with destructors. Rust does not guarantee that destructors will run (e.g. `mem::forget` is safe), so skipping destructors using `longjmp` is not unsound [1]. However, `unsafe` code will need to take into account that code outside it can use `longjmp` to skip destructors when creating safe abstractions (e.g. see Observational equivalency and unsafe code).
More worrying is how `longjmp` subverts the borrow checker to, e.g., produce undefined behavior of the form use-after-move without triggering a type error (playground):
This prints "a moved" twice, which means that the variable `a` was moved from twice, so the second time a use-after-move happened which type-checked. Obviously, `longjmp` is not the only way to achieve this in Rust, e.g. it is trivial to use the pointer methods to do so as well, but `longjmp` combined with `Drop` types makes this happen with no effort (playground):
That is, using `setjmp`+`longjmp` to create double-drops (undefined behavior) is trivial.
Finally, there are problems with creating wrappers around these functions (playground):
Prints `b = 666` and `a = 0`. The problem here is that this code saves the stack pointer inside `foo`, but then `foo` returns, and afterwards the `longjmp` jumps to a stack frame that is no longer live, so `dbg!(a)` reads `a` after it has been freed (use-after-free).
There are probably many other problems with these two functions, but that does not mean that they are impossible to use correctly. Still, it would be good to have a solution here that at least warns about potentially incorrect usages since reasoning about these and the surrounding unsafe code is often very tricky.
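As a rough illustration of the volatile workaround mentioned above, the sketch below routes every access to a local through `ptr::write_volatile` and `ptr::read_volatile`. No `setjmp` or `longjmp` is actually called: the comments only mark where such calls would sit, and the function name `update_through_volatile` is made up for this example. Whether volatile accesses are sufficient in a real `setjmp` setting is exactly the open question of this issue.

```rust
use std::ptr;

/// Stand-in for a function whose local `x` is modified between a
/// `setjmp` call and a later `longjmp` (neither is called here).
fn update_through_volatile() -> i32 {
    let mut x: i32 = 42;
    let p: *mut i32 = &mut x;

    // Hypothetical: `setjmp(&mut buf)` would be called here.

    unsafe {
        // Volatile write: the compiler may not elide or reorder it
        // relative to other volatile accesses.
        ptr::write_volatile(p, 13);

        // Hypothetical: a `longjmp` would transfer control back to the
        // `setjmp` call site before this read.

        // Volatile read: forces a real load instead of reusing the
        // value the optimizer believes `x` still has.
        ptr::read_volatile(p)
    }
}

fn main() {
    println!("x = {}", update_through_volatile());
}
```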
It would also be good to have a way to soundly model these in miri and detect when they are used incorrectly.
At a minimum we should be able to write down documentation for these functions in Rust. Where exactly can they be used, what does the unsafe code surrounding them need to uphold to be correct, etc.
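The claim above that Rust does not guarantee destructors will run can be checked in safe code. The following is a small sketch in which `Noisy` and `drops_observed` are illustrative names: `mem::forget` consumes a value without running its `Drop` impl, which is the same observable effect a `longjmp` has on the frames it skips.

```rust
use std::cell::Cell;
use std::mem;

/// Increments a shared counter when dropped (illustrative type).
struct Noisy<'a> {
    dropped: &'a Cell<u32>,
}

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.dropped.set(self.dropped.get() + 1);
    }
}

/// Returns how many destructor runs were observed for one `Noisy` value,
/// either forgetting it or dropping it normally.
fn drops_observed(forget: bool) -> u32 {
    let counter = Cell::new(0);
    let value = Noisy { dropped: &counter };
    if forget {
        // Safe code: the destructor is never run.
        mem::forget(value);
    } else {
        drop(value);
    }
    counter.get()
}

fn main() {
    println!("forgotten: {}", drops_observed(true));
    println!("dropped:   {}", drops_observed(false));
}
```

This is why `unsafe` code building safe abstractions cannot rely on a destructor running for soundness, independently of `longjmp`.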
cc @nikomatsakis (wrote blog post about observational equivalence and unsafe code), @rkruppe , @RalfJung @ubsan - the unsafe code guidelines should probably say whether extern "C" functions are allowed to modify the stack pointer etc. like setjmp/longjmp do.
`setjmp`: While C11 7.13.1.1p5 states:
That is, `let x = setjmp(...);` would be UB in C. In Rust having a result of a function be usable only in some contexts would be weird.
`longjmp` is undefined behavior, e.g., [support.runtime] states: