#[ffi_returns_twice] #2633

Open: wants to merge 5 commits into master from gnzlbg:returns_twice
@gnzlbg
Contributor

gnzlbg commented Feb 9, 2019

@gnzlbg gnzlbg force-pushed the gnzlbg:returns_twice branch from 8088b92 to edb3a68 Feb 9, 2019

@mark-i-m

Contributor

mark-i-m commented Feb 9, 2019

My initial thought is that I’m not sure from the RFC what the soundness and safety implications are, e.g.:

  • I feel like you should only be able to apply this attribute to unsafe fn and maybe only externs.
  • Is it sound to allow this on a function that has parameters with lifetimes?
  • Is it safe to invoke drop in a function with this attribute?
@gnzlbg

Contributor Author

gnzlbg commented Feb 9, 2019

  • I feel like you should only be able to apply this attribute to unsafe fn and maybe only externs.

This attribute, as proposed in this RFC and as implemented in the PR, can only be applied to extern functions. These are unsafe to call.

  • Is it sound to allow this on a function that has parameters with lifetimes?

C FFI is already unsound.

  • Is it safe to invoke drop in a function with this attribute?

This RFC does not extend Rust with the ability to write bare functions that return multiple times.


Stable Rust already allows you to just add an extern function that returns multiple times and call it. That's already possible. This RFC adds an attribute, #[ffi_returns_twice] that allows users to tell the compiler that these extern functions might return multiple times - that's it.

This RFC doesn't allow you to write your own in Rust, nor does it tell you what these unsafe functions are or aren't allowed to do; that would be part of the UCG. The RFC mentions some types of UB that are easily introduced when writing code using these types of functions (use after move, deallocating memory without running destructors, etc.), but these things are always UB.

@mark-i-m

Contributor

mark-i-m commented Feb 9, 2019

The RFC only mentions that it can be added to extern functions, but it does not explicitly prohibit use anywhere else (in fact, it doesn’t mention what happens for other types of functions at all). Perhaps that can be clarified?

@gnzlbg

Contributor Author

gnzlbg commented Feb 9, 2019

The RFC only mentions that it can be added to extern functions, but it does not explicitly prohibit use anywhere else (in fact, it doesn’t mention what happens for other types of functions at all). Perhaps that can be clarified?

Uh, indeed. I meant functions in an extern { ... } block, not extern "C" fn ... definitions. I've replaced all uses of "extern" with "foreign"; does that make it clear?
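A minimal sketch of the two kinds of items being distinguished here. The C library's `abs` is used only as a stand-in foreign function (it never returns twice); the commented-out line marks where `#[ffi_returns_twice]` would sit on a declaration such as `setjmp` or `vfork`. The attribute was unstable and nightly-only at the time (behind `#![feature(ffi_returns_twice)]`), so it is not compiled here. `double_it` and `call_both` are hypothetical names, and the block uses the newer `unsafe extern` block syntax so it builds under current editions.

```rust
use std::os::raw::c_int;

// Foreign items: declared in an extern block, defined elsewhere (here, by the
// platform C library). Calling them always requires `unsafe`.
unsafe extern "C" {
    // #[ffi_returns_twice] would go here, on a declaration such as `setjmp`
    // or `vfork`. It is commented out because it is not available on stable
    // Rust, and `abs` does not return twice anyway.
    fn abs(x: c_int) -> c_int;
}

// A Rust function that merely uses the C ABI: defined in Rust, callable from
// C, and safe to call from Rust. This is not a foreign item, so the RFC's
// attribute does not apply to it.
pub extern "C" fn double_it(x: c_int) -> c_int {
    x * 2
}

pub fn call_both(x: c_int) -> (c_int, c_int) {
    // SAFETY: `abs` is provided by the C library with this signature.
    let a = unsafe { abs(x) };
    (a, double_it(x))
}
```

Only the first kind of item is "foreign" in the RFC's sense; an `extern "C" fn` written in Rust has a body the compiler can see, so there is nothing for such an attribute to describe.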

@gnzlbg gnzlbg force-pushed the gnzlbg:returns_twice branch from d1daf77 to c33392c Feb 9, 2019

@mark-i-m

Contributor

mark-i-m commented Feb 11, 2019

I think that’s a big improvement, but I think it would also be good to have a statement like “Adding this attribute to any other type of item is an error”.

@pyfisch

Contributor

pyfisch commented Feb 12, 2019

This RFC adds a new function attribute, #[ffi_returns_twice], which indicates that a foreign function can return multiple times.

I find it confusing that the RFC always talks about returning multiple times but the attribute is called "twice".

@gnzlbg

Contributor Author

gnzlbg commented Feb 12, 2019

I find it confusing that the RFC always talks about returning multiple times but the attribute is called "twice".

So do I.

This attribute is called #[ffi_returns_twice] and only works in Rust FFI function declarations. Those need to match the C function declaration, and in C this attribute is called returns_twice for whatever reason (maybe returns_multiple_times was too long).

The only people who will need this attribute are those reading C code on one side and writing the Rust FFI wrapper on the other. So keeping the name the same, even if it's unfortunate, makes life easier for the only users this attribute will have.

EDIT: the docs do say that it returns multiple times, though; maybe it is worth calling out in the docs why it is called returns_twice (C legacy).

@scottmcm

Member

scottmcm commented Feb 13, 2019

Since you have a couple of these ffi_* attributes, another alternative I'd like considered is having it be one attribute with different parameters #[ffi(returns_twice)], #[ffi(nounwind, readnone)], etc.

(I don't know if it's better, but if they interact with each other -- like IIRC you mentioned const and pure would -- it might be easier to check and read one attribute than multiple.)

@gnzlbg

Contributor Author

gnzlbg commented Feb 14, 2019

@eddyb

Member

eddyb commented Feb 14, 2019

@RalfJung

Member

RalfJung commented Feb 14, 2019

Basically all I ask is that you make sure this

In the presence of types that implement Drop, usage of APIs that return multiple times requires extreme care to avoid deallocating memory without invoking destructors.

is known to everyone using setjmp/longjmp.

Thou shalt not jump over stack frames thou dost not own.

undefined behavior of the type "use-after-move".

In the presence of types that implement `Drop`, usage of APIs that return
multiple times requires extreme care to avoid deallocating memory without


@RalfJung

RalfJung Feb 14, 2019

Member

Might be worth saying "deallocating pinned memory" here. At least to my knowledge that is the only case where we really care in terms of type system guarantees.


@gnzlbg

gnzlbg Feb 18, 2019

Author Contributor

Isn't Pin just a normal Rust library?


@RalfJung

RalfJung Feb 18, 2019

Member

It is. So what?

Unsound thread spawning and Rc are also both just normal Rust libraries, and yet having one makes the other unsound. The same goes for Pin and functions that let you deallocate memory without running drop.

bors added a commit to rust-lang/rust that referenced this pull request Feb 15, 2019

Auto merge of #58315 - gnzlbg:returns_twice, r=alexcrichton
Implement unstable ffi_return_twice attribute

This PR implements [RFC2633](rust-lang/rfcs#2633)

r? @eddyb
@eddyb

Member

eddyb commented Feb 16, 2019

What does this attribute actually do in LLVM?
I would've assumed it only disables certain optimizations, but maybe I'm wrong?

@joshtriplett

Member

joshtriplett commented Feb 16, 2019

@eddyb In GCC at least, it's described as:

The returns_twice attribute tells the compiler that a function may return more than one time. The compiler will ensure that all registers are dead before calling such a function and will emit a warning about the variables that may be clobbered after the second return from the function. Examples of such functions are setjmp and vfork.

@eddyb

Member

eddyb commented Feb 16, 2019

Then the implementation is misleadingly incomplete: we should probably also have the frontend diagnostics that GCC (and presumably Clang) emit for this, and possibly more.

The "dead registers" part can be emulated with the right asm! invocation AFAIK (see https://github.com/edef1c/libfringe for something similar to setjmp/longjmp but the control-flow is redirected in a cooperative multitasking way, so no double-returns), so I doubt backends would have more trouble with it than asm! itself.

@comex

comex commented Feb 16, 2019

@eddyb I tracked that down in an earlier thread:
#2625 (comment)

In particular, LLVM (and thus Clang) actually does not do the dead-registers thing; the effects are more subtle.

@eddyb

Member

eddyb commented Feb 16, 2019

Thanks, that's pretty comprehensive.
I'm not sure "returns twice" is even the right way to describe this, it seems everything about it is designed around "capturing state".

Wouldn't you need all those effects even if you had a function which, after storing enough state, never returned the first time, but rather switched to some other execution (with a different stack pointer), to come back later and "returned" via longjmp?

If not, why? It's still a bit unclear to me why LLVM goes to such lengths (is it working around platform bugs/limitations?).

If yes, then maybe this could be made more general than "returns twice", and serve more purposes.

cc @sunfish @edef1c @Amanieu

@comex

comex commented Feb 16, 2019

Wouldn't you need all those effects even if you had a function which, after storing enough state, never returned the first time, but rather switched to some other execution (with a different stack pointer), to come back later and "returned" via longjmp?

Mostly no, one yes.

No:

  • Stack slot coloring (and wasm local coloring): has to do with the current function's stack frame. It would normally allow a stack slot used for one variable at the time of setjmp to be reused for another variable later in the function if, based on the function's control flow, the original variable is dead. This would be a problem if you longjmp back and then try to access the original variable.

  • Tail call optimization: also has to do with the stack frame. If you do a tail call, the caller's stack frame will be clobbered in its entirety by the callee :)

  • Inlining: not a correctness issue AFAIK. Rather, since the above two pessimizations apply to the entire function that contains a call to setjmp, avoid inlining it into another function since that would pessimize that function as well.

  • For the SPARC-specific check: well, I looked into it a bit more, and you probably don't care about the details, but TL;DR: on SPARC, after longjmp, registers will be restored to the most recent values they had within the setjmp-ing function, not the values they had at the time of setjmp. This is the same behavior as stack variables. But whereas reusing stack slots for unrelated things in a function is unusual (see above), reusing registers is necessary since there's only a fixed number of them; therefore the check just avoids saving data in registers across the call.

  • ASAN: supposedly because it introduces variables which might be modified between setjmp and longjmp (I think the check is wrong)

Yes:

  • The X86SpeculativeLoadHardening check is really about the fact that the "return" from longjmp may not be an actual ret instruction. It should probably be fixed to also apply to coroutine-switching functions such as Unix swapcontext and Windows SwitchToFiber. (X86SpeculativeLoadHardening is rather new code, being a Spectre mitigation.)
@comex

comex commented Feb 16, 2019

Postgres is not the only API based on this; I know Lua does this as well, though I wonder why they don't seem to need this attribute?

Lua provides a function lua_pcall ("protected call") which encapsulates the setjmp for you. I looked at rlua and hlua and they both use this. Interestingly, rlua also uses the non-protected lua_call in some circumstances, which can longjmp to an outer stack frame, killing the Rust stack frame in the middle. That's a bit spooky, but it seems to be careful about it (not having variables with destructors in those functions)... anyway, that's not relevant to ffi_returns_twice.

@eddyb

Member

eddyb commented Feb 16, 2019

@comex So the pessimisations are all about reloading a saved state, knowing it couldn't have been invalidated by anything "downstack" of the call that saved said state?

Which would mean "returns twice" is really "can return more than once", right? Other than for vfork I guess.

Couldn't a hypothetical setjmp take a closure instead of returning, making it a lot more like catch_unwind?
In Rust, that would mean jmp_buf can be passed to the closure by reference, ensuring it cannot be used outside of the scope it's valid for.

And longjmp could have semantics similar to unwinding, except without destructors running.
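A rough sketch of the closure-scoped API shape described above. It is not setjmp: it is emulated with `catch_unwind` and `panic_any` from std, so it only works with `panic = "unwind"`, and unlike `longjmp` it runs the destructors of every frame it skips, which is exactly what keeps it sound. What it does illustrate is the scoping: the jump point (a stand-in for `jmp_buf`) is only ever lent to the closure, so it cannot be used after the frame that created it has returned. `JumpPoint`, `jump` and `with_jump_point` are hypothetical names.

```rust
use std::marker::PhantomData;
use std::panic::{self, AssertUnwindSafe};
use std::sync::atomic::{AtomicUsize, Ordering};

// Unique id per jump point, so a jump aimed at an outer point is not
// intercepted by an inner one.
static NEXT_ID: AtomicUsize = AtomicUsize::new(0);

// Hypothetical stand-in for a `jmp_buf`. It is only ever handed out by
// reference, so it cannot outlive the `with_jump_point` call that made it.
pub struct JumpPoint<T> {
    id: usize,
    _value: PhantomData<fn(T)>,
}

// Panic payload carrying the value delivered by the "second return".
struct Jump<T> {
    id: usize,
    value: T,
}

impl<T: Send + 'static> JumpPoint<T> {
    // Analogue of `longjmp`: never returns to its caller. Unlike `longjmp`,
    // destructors of the skipped frames DO run while unwinding.
    pub fn jump(&self, value: T) -> ! {
        panic::panic_any(Jump { id: self.id, value })
    }
}

// Scoped analogue of `setjmp`: Ok(r) if `f` finishes normally,
// Err(v) if `f` (or anything it calls) jumps back with `v`.
pub fn with_jump_point<T, R, F>(f: F) -> Result<R, T>
where
    T: Send + 'static,
    F: FnOnce(&JumpPoint<T>) -> R,
{
    let point = JumpPoint {
        id: NEXT_ID.fetch_add(1, Ordering::Relaxed),
        _value: PhantomData,
    };
    match panic::catch_unwind(AssertUnwindSafe(|| f(&point))) {
        Ok(r) => Ok(r),
        Err(payload) => match payload.downcast::<Jump<T>>() {
            Ok(jump) if jump.id == point.id => Err(jump.value),
            // A jump aimed at another point, or an ordinary panic: keep unwinding.
            Ok(jump) => panic::resume_unwind(jump),
            Err(other) => panic::resume_unwind(other),
        },
    }
}
```

Because every `jump` goes through the panic machinery, the default panic hook will print a message to stderr unless a custom hook is installed. The higher-ranked closure bound is what gives the "can't longjmp after the frame has returned" guarantee for free; the cost is that this can only model unwinding-style jumps, not ones that skip destructors.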

@jeff-davis

jeff-davis commented Feb 17, 2019

For my use case -- catching longjmps from C code -- a setjmp with semantics like catch_unwind sounds like it would also work. I might have to see some more details though to make sure I can set up the C globals properly to interface with the C code to longjmp to the right place.

@gnzlbg

Contributor Author

gnzlbg commented Feb 17, 2019

Which would mean "returns twice" is really "can return more than once", right?

Correct (see the RFC).

Couldn't a hypothetical setjmp take a closure instead of returning, making it a lot more like catch_unwind?

I haven't tried, but I believe you could implement similar APIs on top of setjmp, sigsetjmp, etc. I don't know if you could make them safe: you'd need a way to prevent closures not only from moving Drop types in, but also from using Drop types at all.

@eddyb

Member

eddyb commented Feb 17, 2019

@gnzlbg Disallowing Drop types in the closure's captures can be done with bounds on the closure itself (e.g. F: Copy, although that's a bit too restrictive), but restricting what the code inside does is outside the scope of Rust right now.

You'd need "no unwind cleanup in any code reachable from here" which is just as hard as reentrance restrictions (you can make a safe Cell::with_mut with those, they're really powerful inter-procedural analyses to expose via traits).
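The point about closure bounds can be checked directly. In this sketch (`run_without_owned_drops` and `demo` are hypothetical names), an `F: Copy` bound rejects a closure that owns a `String`, yet it still accepts closures that merely borrow one or that create and drop values inside their bodies. That is why the bound alone cannot rule out skipped destructors.

```rust
// Hypothetical wrapper. A closure is `Copy` only when every captured value is
// `Copy`, and `Copy` types cannot implement `Drop`, so the closure itself
// owns nothing with a destructor.
pub fn run_without_owned_drops<F, R>(f: F) -> R
where
    F: FnOnce() -> R + Copy,
{
    f()
}

pub fn demo() -> (i32, usize, usize) {
    let n = 41;
    // Captures an `i32` by value: accepted.
    let a = run_without_owned_drops(move || n + 1);

    let s = String::from("hello");
    // Rejected if uncommented: `move` captures the `String` itself,
    // so the closure is not `Copy`.
    // let bad = run_without_owned_drops(move || s.len());

    // Accepted: this closure only captures `&String`, and shared references
    // are `Copy`. The bound says nothing about what the body uses.
    let b = run_without_owned_drops(|| s.len());

    // Also accepted: the body allocates and drops a `Vec` internally,
    // which the bound cannot see either.
    let c = run_without_owned_drops(|| vec![1, 2, 3].len());

    (a, b, c)
}
```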

@gnzlbg

Contributor Author

gnzlbg commented Feb 18, 2019

but restricting what the code inside does is outside the scope of Rust right now.

Yeah, I don't know if writing a catch_unwind-like API is worth it. It would restrict and contain the scope in which users need to make sure that no destructors are skipped, which adds some value, but within those scopes this would still need to be checked manually, so it wouldn't just allow you to rely on the compiler for catching issues with destructors.

There are also a whole lot of things that can trigger undefined behavior while performing a longjmp and that one has to keep in mind: not performing volatile reads / writes to memory that's modified between the setjmp and the longjmp (across multiple functions, e.g., if these functions get inlined), calling longjmp from a different thread (e.g. spawning a thread down the stack and calling longjmp there, or creating a task and sending it to an executor and trying to do longjmp from there), you need to be aware of signals, etc.

One could experiment with creating APIs that reduce the amount of care required to use these correctly, but for any such experiment to be useful, you first need to be able to use this at all.

@eddyb

Member

eddyb commented Feb 18, 2019

Yeah I was only interested in mitigating the "returns multiple times" and "can't longjmp after exiting the stack frame setjmp was called in" aspects, although you'd probably still need a way to tell LLVM about the state saving and potential downstack restore.

I wasn't thinking about destructors at all.

Centril added a commit to Centril/rust that referenced this pull request Feb 23, 2019

Rollup merge of rust-lang#58315 - gnzlbg:returns_twice, r=alexcrichton
Implement unstable ffi_return_twice attribute

This PR implements [RFC2633](rust-lang/rfcs#2633)

r? @eddyb

bors added a commit to rust-lang/rust that referenced this pull request Feb 24, 2019

Auto merge of #58315 - gnzlbg:returns_twice, r=alexcrichton
Implement unstable ffi_return_twice attribute

This PR implements [RFC2633](rust-lang/rfcs#2633)

r? @eddyb
@kornelski

Contributor

kornelski commented Feb 27, 2019

If it were followed by support in the borrow checker, it would add extra baggage to an already very complex and critical component of the language. Extra complexity in the borrow checker for such a niche feature is IMHO not acceptable.

OTOH without support in the borrow checker, these functions are inherently dangerous to use. It's questionable whether to even try to use them in Rust when all Rust adds is more footguns. Calling an intermediate C function to access these two unusual functions seems perfectly fine, and it even makes the code safer and more robust, since returning twice is easier to reason about in the context of C functions.

For emulating exceptions (such as aforementioned Postgresql, as well as libjpeg and libpng), catch_unwind + panic!() is a much much better solution.

@gnzlbg

Contributor Author

gnzlbg commented Feb 27, 2019

If it were followed by support in the borrow checker, it would add extra baggage to an already very complex and critical component of the language. Extra complexity in the borrow checker for such a niche feature is IMHO not acceptable.

I'd guess it's a good thing nobody is proposing that.

OTOH without support in the borrow checker, these functions are inherently dangerous to use

So is all unsafe code that we can't prove correct.

@gnzlbg

Contributor Author

gnzlbg commented Feb 27, 2019

Note that #[returns_twice] only allows a function to return twice, nothing more, nothing less. vfork is such a function: the process gets cloned, so you end up with two processes, and the vfork function returns a different value in each process (so it returns twice). AFAICT this isn't really that dangerous at all.

For emulating exceptions [...] Calling an intermediate C function to access these two unusual functions [...]

This RFC does not propose adding setjmp to the language, only adding an attribute to specify that some functions might return multiple times (setjmp is one of those functions, but there are others).

@Amanieu

Contributor

Amanieu commented Feb 27, 2019

vfork is such a function: the process gets cloned, so you end up with two processes, and the vfork function returns a different value in each process (so it returns twice). AFAICT this isn't really that dangerous at all.

Actually, vfork acts exactly like setjmp from the point of view of the language. What you are missing is that vfork doesn't really create a new process: it creates a "thread" in the current process (sharing the same address space) and runs it on the same stack as the original thread.

Since another thread is using its stack, the original thread is suspended until the vfork child has either exited or called execve. Once that happens, the effect is just as if a longjmp had been performed back to the vfork call.

Incidentally this is also why, like with setjmp, it is UB for a vfork child to return from the function that called vfork.

@RalfJung

Member

RalfJung commented Feb 28, 2019

Also, note that even fork is dangerous in some sense.

@jeff-davis

jeff-davis commented Mar 1, 2019

@kornelski I don't understand the point about catching a postgresql exception using catch_unwind + panic(). Are you saying that catch_unwind is guaranteed to catch a longjmp regardless of the argument to longjmp? Is that documented somewhere?

@kornelski

Contributor

kornelski commented Mar 1, 2019

@jeff-davis No, never! You can't mix the two, and I would never expect them to co-operate in any way.

But you can nicely replace jumping altogether. In the case of libjpeg, you set your error callback to extern "C" fn handler() { panic!() } and then the Rust code can catch it. libjpeg doesn't mind that, since it's not longjmp-specific.

@jeff-davis

jeff-davis commented Mar 2, 2019

@kornelski OK. My use case is working with postgres, and its error reporting / exception facility is based on longjmp. Even if I could replace it, I don't think panic!() would be appropriate, because sometimes those exceptions must be caught in C.

So I would really like some form of setjmp in rust, even if it's dangerous and must be used in very specific ways. #[ffi_returns_twice] seems like a reasonable solution, but my knowledge of rust internals is limited.

@eaglgenes101

eaglgenes101 commented Mar 12, 2019

I was the one that personally suggested augmenting the borrow checker to account for multiple returns (to support library implementations of continuation mechanisms). I still have getting that through as a long-term goal, but if making it initially ffi-only and without borrow checker additions is the plan, well then.

@eddyb

Member

eddyb commented Mar 13, 2019

@kornelski Do you mean with some #[unwind] attributes on the Rust side and compiling the C code with... -fexceptions I guess?

Otherwise unwinding is UB.

@kornelski

Contributor

kornelski commented Mar 13, 2019

@eddyb Yes, that's what I mean. I realize it's not fully baked in Rust yet.
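A sketch of the callback-that-panics pattern, written with the `C-unwind` ABI that was stabilized after this discussion (Rust 1.71) as the supported way to let a panic cross an `extern` boundary. To keep it self-contained, the "C library" is simulated in Rust: `fake_decode`, `ErrorHandler`, `panic_on_error` and `decode` are hypothetical names standing in for something like libjpeg's error hook. With a real C library, the entry point would be a foreign declaration using `extern "C-unwind"`, and the C side would have to be built with unwind tables (e.g. `-fexceptions`, as noted above) for the panic to pass through it.

```rust
use std::panic;

// Hypothetical stand-in for a C library's error-callback type.
type ErrorHandler = extern "C-unwind" fn(code: i32);

// Hypothetical stand-in for a C entry point that reports errors through the
// callback instead of returning an error code.
extern "C-unwind" fn fake_decode(input: i32, on_error: ErrorHandler) -> i32 {
    if input < 0 {
        on_error(input); // does not return: the handler below panics
    }
    input * 2
}

// Rust-side handler: turn the error callback into a panic carrying the code.
extern "C-unwind" fn panic_on_error(code: i32) {
    panic::panic_any(code);
}

// Catch the panic at the Rust boundary and convert it into a Result.
pub fn decode(input: i32) -> Result<i32, i32> {
    panic::catch_unwind(|| fake_decode(input, panic_on_error)).map_err(|payload| {
        match payload.downcast::<i32>() {
            Ok(code) => *code,
            // Some unrelated panic: re-raise it unchanged.
            Err(other) => panic::resume_unwind(other),
        }
    })
}
```

This only helps when, as with libjpeg, the library lets the callback decide how to escape. It does not cover the Postgres case mentioned above, where the C side itself longjmps and may need to catch the error in C.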
