New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support defining C-compatible variadic functions in Rust #2137

Merged
merged 25 commits into from Sep 29, 2017

Conversation

@joshtriplett
Copy link
Member

joshtriplett commented Sep 3, 2017

Support defining C-compatible variadic functions in Rust, via new intrinsics.
Rust currently supports declaring external variadic functions and calling them
from unsafe code, but does not support writing such functions directly in Rust.
Adding such support will allow Rust to replace a larger variety of C libraries,
avoid requiring C stubs and error-prone reimplementation of platform-specific
code, improve incremental translation of C codebases to Rust, and allow
implementation of variadic callbacks.

This RFC does not propose an interface intended for native Rust code to pass
variable numbers of arguments to a native Rust function, nor an interface that
provides any kind of type safety. This proposal exists primarily to allow Rust
to provide interfaces callable from C code.

Rendered

@joshtriplett

This comment has been minimized.

Copy link
Member

joshtriplett commented Sep 3, 2017

A bit of background: this was inspired by a co-worker who wanted to define a debugging library to intercept certain C calls, for use with $LD_PRELOAD. He wanted to try writing that library in Rust. However, some of the calls he wanted to intercept included variadic functions; that was the only thing stopping him from writing that code in Rust, and forcing him to write it in C instead.

@comex

This comment has been minimized.

Copy link

comex commented Sep 3, 2017

/// Obtain the variable arguments of the current function. Produces a
/// compile-time error if called from a non-variadic function. The compiler
/// will supply the appropriate lifetime when called, and prevent that
/// lifetime from outliving the variadic function.

Having the compiler enforce special lifetime rules for one function sounds really dubious. Either don't enforce lifetimes or make it work with the normal rules - either with a API that takes impl for<'a> FnOnce(VaList<'a>), or with the …args option. (There's no rule that the extra argument has to be a VaList itself, as opposed to something that can be turned into one.)

@joshtriplett

This comment has been minimized.

Copy link
Member

joshtriplett commented Sep 3, 2017

@comex I didn't intend it to be a "special lifetime rule", so much as the compiler simply defining and passing the appropriate lifetime.

I don't really think an API based around calling a closure would work very smoothly, but I'd be open to switching to the ...args option. And having that be an object that can produce a VaList (borrowed from that object), rather than the VaList itself, seems plausible. I might experiment with that and see how it turns out. Suggestions for the name of such a type?

@joshtriplett

This comment has been minimized.

Copy link
Member

joshtriplett commented Sep 3, 2017

I'd also appreciate feedback from appropriate compiler folks, regarding what solution might make the most sense from a compiler-internals point of view, in terms of preventing the type from outliving the variadic function associated with it.

Cc @arielb1 @eddyb @petrochenkov

@canndrew

This comment has been minimized.

Copy link
Contributor

canndrew commented Sep 3, 2017

Either don't enforce lifetimes or make it work with the normal rules

People have talked about introducing a special 'fn lifetime which represents the lifetime of the current function. This could be a good usecase for that.

ie.

impl<'a> VaList<'a> {
    pub unsafe fn start() -> VaList<'fn>
}
impl<'a> Drop for VaList<'a>;
/// The type of arguments extractable from VaList
trait VaArg;

This comment has been minimized.

@kennytm

kennytm Sep 3, 2017

Member

This could be an unsafe trait?

This comment has been minimized.

@joshtriplett
the raw C types corresponding to the Rust integer and float types above:
```rust
impl VaArg for c_char;

This comment has been minimized.

@kennytm

kennytm Sep 3, 2017

Member

Unless these type aliases became concrete types, I don't think the std and libc crates need to provide any additional impls.

This comment has been minimized.

@joshtriplett
let args = VaList::start();
let x: u8 = args::arg();
let y: u16 = args::arg();
let z: u32 = args::arg();

This comment has been minimized.

@kennytm

kennytm Sep 3, 2017

Member

You mean

let mut args = VaList::start();
let x: u8 = args.arg();
let y: u16 = args.arg();
let z: u32 = args.arg();

?

This comment has been minimized.

@joshtriplett

joshtriplett Sep 3, 2017

Member

Good catch, thank you.

```rust
let closure = |arg, arg2, ...| {
// implementation
};

This comment has been minimized.

@kennytm

kennytm Sep 3, 2017

Member

You can't cast a closure to extern "C" fn, so I don't think supporting variadic closure is necessary in this RFC.

fn main() {
    unsafe {
        let _x = (|| {}) as extern fn(); //~ ERROR E0605
    }
}

This comment has been minimized.

@jethrogb

jethrogb Sep 3, 2017

Contributor

Plus, I think extern "C" variadic functions should always be unsafe functions, and closures currently can't be unsafe.

This comment has been minimized.

@joshtriplett

joshtriplett Sep 3, 2017

Member

I didn't realize this; I'd assumed that RFC 1558 made this possible, but digging through it, it doesn't allow coercing to extern fn, only to fn. I filed rust-lang/rust#44291 to allow coercing to extern fn, but in the meantime, I'll drop this for now and move it to an open for the future. Would have been nice for more convenient callbacks.

@eddyb

This comment has been minimized.

Copy link
Member

eddyb commented Sep 3, 2017

The only safe way that comes to mind is requiring the function take a VaList<'a> lang item type and not use the same lifetime passed to it (anonymous or not, elision would still see it) anywhere else in the signature, and we synthesize the necessary intrinsic calls to create the VaList.

EDIT: there would need to be a way to mark the function as variadic, args: VaList... maybe?

@mark-i-m

This comment has been minimized.

Copy link
Contributor

mark-i-m commented Sep 3, 2017

The RFC says that only libc would be allowed to provide impls of VaArg. Does this mean that it is an unstable intrinsic or language feature? Or does it mean that libc becomes explicitly magical among creates?

joshtriplett added some commits Sep 3, 2017

Drop variadic closures
Rust doesn't support passing a closure as an `extern "C" fn`, so these
wouldn't serve any purpose yet.
@joshtriplett

This comment has been minimized.

Copy link
Member

joshtriplett commented Sep 3, 2017

@mark-i-m Fixed now, since libc's types are just type aliases.

joshtriplett added some commits Sep 3, 2017

Drop VaList::start; name the VaList argument instead, to improve life…
…times

This allows the compiler to pass an appropriate lifetime without as much
magic.
@joshtriplett

This comment has been minimized.

Copy link
Member

joshtriplett commented Sep 3, 2017

@eddyb OK, based on your comment, I've tried reworking the RFC so that it accepts a VaList<'a> as an argument, with the syntax args: ... (not mentioning the type or its lifetime, so you can't reuse it elsewhere in the signature); the compiler can then pass an appropriate lifetime that prevents the VaList from outliving the variadic function. Does that look reasonable to you?

use std::intrinsics::VaArg;
#[no_mangle]
pub unsafe extern "C" fn func(fixed: u32, args: ...) {

This comment has been minimized.

@kennytm

kennytm Sep 3, 2017

Member

This should be mut args: ... then?

This comment has been minimized.

@joshtriplett

joshtriplett Sep 3, 2017

Member

Good call! Fixed.

Such a declaration looks like this:

```rust
pub unsafe extern "C" fn func(arg: T, arg2: T2, args: ...) {

This comment has been minimized.

@kennytm

kennytm Sep 3, 2017

Member

For symmetry, I think we should also allow

extern {
    fn func(arg: u8, arg2: u16, args: ...);
//                              ^~~~~
}

This comment has been minimized.

@joshtriplett

joshtriplett Sep 3, 2017

Member

Ah, for external variadic functions with the current mechanism?

I'm not sure that makes sense to add, since the argument isn't named in C. What would be the advantage of this?

This comment has been minimized.

@kennytm

kennytm Sep 3, 2017

Member

@joshtriplett Just for consistency, the args: has no meaning here 😄. This is really outside of the scope of this RFC, feel free to ignore it.

Note that extracting an argument from a `VaList` follows the platform-specific
rules for argument passing and promotion. In particular, many platforms promote
any argument smaller than a C `int` to an `int`. On such platforms, extracting

This comment has been minimized.

@ubsan

ubsan Sep 3, 2017

Contributor

This is true for all platforms, it's required by the C standard.

If the argument has integral or enumeration type that is subject to the integral promotions, or a floating-point type that is subject to the floating-point promotion, the value of the argument is converted to the promoted type before the call.

Basically, any integral type smaller than an int is converted to int (and any floating point type smaller than a double (basically float or short float, for implementations that provide the latter, is converted to double)).

Side note, this paragraph points out something important:

After these conversions, if the argument does not have arithmetic, enumeration, pointer, pointer to member, or class type, the program is ill-formed.

emphasis on class type and enumeration - this means that it should be valid for programs to impl VaArg for their #[repr(C|uN|iN)] structure and simple enumeration types.

I'd also argue that VaArg should be implemented for &T, &mut T, Option<&T>, and Option<&mut T>. This is all unsafe anyways :)

This comment has been minimized.

@joshtriplett

joshtriplett Sep 3, 2017

Member

I'll phrase the bit about promotions more carefully, and put some thought into the structure and enum cases.

I'd prefer not to implement VaArg for references; there are plenty of ways to convert a pointer to a reference in unsafe code, if you really want a reference, and those ways allow you to inject the appropriate lifetime of that reference more easily.

This comment has been minimized.

@ubsan

ubsan Sep 4, 2017

Contributor

I prefer not treating references specially - they are just pointers after all. And it just makes sense sometimes - if you're implementing printf:

#[no_mangle]
unsafe extern "C" printf(fmt: &CStr, args: ...) -> c_int {
  for ch in fmt {
    // ...
    // found a %s:
    {
      print!("{}", args.next::<&CStr>());
    }
  }
}

This comment has been minimized.

@jethrogb

jethrogb Sep 4, 2017

Contributor

Except &CStr is a fat pointer.

This comment has been minimized.

@ubsan

ubsan Sep 4, 2017

Contributor

@jethrogb for now ;)

This comment has been minimized.

@joshtriplett

joshtriplett Sep 5, 2017

Member

@ubsan If, at some point in the future, there's a reasonable possibility of extracting a reference directly from a variadic function argument and getting a reasonable result, then we can add such a mechanism at that time. Until then, though, I'd like to stick with only allowing raw pointers.

This comment has been minimized.

@joshtriplett

joshtriplett Sep 5, 2017

Member

@ubsan I clarified the RFC's details about promotion to take your explanation into account, and to avoid suggesting "platform-specific".

This comment has been minimized.

@ubsan

ubsan Sep 6, 2017

Contributor

@joshtriplett I disagree with the reasoning there. There is no reason not to allow references. You almost never actually want raw pointers when dealing with C; you want Option<&T> (and given that references are valid in C functions...)

impl<'a> VaList<'a> {
/// Extract the next argument from the argument list.
pub unsafe fn arg<T: VaArg>(&mut self) -> T;

This comment has been minimized.

@ubsan

ubsan Sep 3, 2017

Contributor

I don't like this name - I'd prefer next.

This comment has been minimized.

@cramertj

cramertj Sep 3, 2017

Member

bikeshed: next_unchecked?

This comment has been minimized.

@mark-i-m

mark-i-m Sep 3, 2017

Contributor

I think arg was chosen to match the C name: va_arg... If we are going to deviate from this, we could change the interface altogether...

fn my_vararg_fn(args: VaList<c_int>) {...}

This comment has been minimized.

@ubsan

ubsan Sep 3, 2017

Contributor

but you can pass any type through variable arguments, not just one type.

This comment has been minimized.

@joshtriplett

joshtriplett Sep 3, 2017

Member

This was chosen to match va_arg, yes. I intentionally didn't use next, because I don't want to give the impression that this works anything like an iterator, rather than a razor-sharp ball of unsafety.

impl<'a> Drop for VaList<'a>;
/// The type of arguments extractable from VaList
unsafe trait VaArg;

This comment has been minimized.

@ubsan

ubsan Sep 3, 2017

Contributor

what's the point of the VaArg trait anyways? No Rust types should be invalid to pass/take in variadic functions (although perhaps non-#[repr(C)] types shouldn't be valid? there's no technical reason though).

This comment has been minimized.

@cramertj

cramertj Sep 3, 2017

Member

+1 for replacing the trait with a #[repr(C)] check.

This comment has been minimized.

@joshtriplett

joshtriplett Sep 3, 2017

Member

@ubsan I don't think it makes sense to accept arbitrary Rust types, and in particular not references, or arbitrary structures. But accepting #[repr(C)] structures makes sense (that's possible in C), as does accepting C-style enums with a declared size.

@eddyb

This comment has been minimized.

Copy link
Member

eddyb commented Sep 3, 2017

@joshtriplett Not bad, I don't have any more suggestions at this time, as I believe that solves the problem of the short-lived data escaping the function.

[reference-level-explanation]: #reference-level-explanation

LLVM already provides a set of intrinsics, implementing `va_start`, `va_arg`,
`va_end`, and `va_copy`. The implementation of `VaList::start` will call the

This comment has been minimized.

@kennytm

kennytm Sep 3, 2017

Member

Since you've changed the RFC to args: ..., the VaList::start function should no longer be mentioned. Just say va_start will be automatically inserted at the start of the function.

name along with the `...`:

```rust
pub unsafe extern "C" fn func(fixed: u32, ...args) {

This comment has been minimized.

@kennytm

kennytm Sep 3, 2017

Member

VaList::start should now be an alternative itself. This is at best an "alternative syntax".

@kennytm

This comment has been minimized.

Copy link
Member

kennytm commented Sep 3, 2017

This is getting long, so I write it as the main comment here.

You mentioned

A function declared with extern "C" may accept a VaList parameter, corresponding to a va_list parameter in the corresponding C function. For instance, the libc crate could define the va_list variants of printf as follows:

unsafe extern "C" { pub fn vprintf(format: *const c_char, ap: VaList) -> c_int; }

This will conflict with impl Drop for VaList which associated va_end to it. Consider

unsafe extern fn forward_to_printf(fmt: *const c_char, ap: ...) -> c_int {
    vprintf(fmt, ap)
}

Here ap is moved into vprintf, which means vprintf should be responsible for dropping ap. However, in C, it is the caller's responsibility to call va_end:

int forward_to_printf(const char* fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    vprintf(fmt, ap);
    va_end(ap);
}

That means vprintf cannot take VaList as-is. Perhaps it could provide a method:

impl VaList {
    unsafe fn as_va_list(&mut self) -> va_list;
}

so the caller still retains the ownership of VaList:

unsafe extern fn forward_to_printf(fmt: *const c_char, ap: ...) -> c_int {
    vprintf(fmt, ap.as_va_list())
}
(Some non-standard stuff)

Uniquely-borrowing the VaList instead of moving it will allow you to translate weird things like this into Rust:

void just_print_one(va_list ap) {
    printf("%d\n", va_arg(ap, int));
}
void print_nums(size_t n, ...) {
    va_list ap;
    va_start(ap, n);
    for (size_t i = 0; i < n; ++ i) {
        just_print_one(ap);
    }
    va_end(ap);
}
int main() {
    print_nums(4, 1, 2, 3, 4);
}

Note that this violates the C standard §7.16/3, so it is actually invalid. But this works on my machine™.

The object ap may be passed as an argument to another function; if that function invokes the va_arg macro with parameter ap, the value of ap in the calling function is indeterminate and shall be passed to the va_end macro prior to any further reference to ap253).

253) It is permitted to create a pointer to a va_list and pass that pointer to another function, in which case the original function may make further use of the original list after the other function returns.

Actually, would it be even better if args: ... means the type of args is &mut VaList instead of VaList? And va_list is an alias of *mut VaList.

@aturon aturon merged commit 09e34be into rust-lang:master Sep 29, 2017

@aturon

This comment has been minimized.

Copy link
Member

aturon commented Sep 29, 2017

This RFC has been merged! The tracking issue keeps the syntax explicitly opened as a not-completely-resolved question; further discussion on that point should move over to the tracking issue.

Thanks @joshtriplett!

@joshtriplett

This comment has been minimized.

Copy link
Member

joshtriplett commented May 16, 2018

The RFC as accepted proposes putting VaList in core::intrinsics. However, in rust-lang/rust#49878 , @dlrobertson observed that core::intrinsics has a module-level #![unstable] attached to it, requiring the core_intrinsics feature to use. On that basis, he suggested putting it in core::ffi instead. If that seems reasonable, I could update the RFC to change the proposed location of the API.

(I'd like to avoid bikeshedding this if possible. It definitely has to go in core so that it works without std. The exact location in core doesn't matter much. core::ffi seems like the appropriate place to me.)

@ubsan

This comment has been minimized.

Copy link
Contributor

ubsan commented May 16, 2018

@joshtriplett that seems reasonable, although I want to make sure - it's still on unstable, right?

core::intrinsics seems like an incredibly weird place for VaList and associated things.

@joshtriplett

This comment has been minimized.

Copy link
Member

joshtriplett commented May 16, 2018

@ubsan It's still going to be unstable until stabilized, yes. The issue was that it'd be non-trivial to ever stabilize bits of core::intrinsics.

Also, I think core::intrinsics still makes sense for the underlying functions like va_start and va_arg, which we likely won't stabilize, unlike the higher-level VaList interface.

@dlrobertson

This comment has been minimized.

Copy link

dlrobertson commented May 16, 2018

@ubsan The VaList implementation will be unstable under the feature c_variadic

@ubsan

This comment has been minimized.

Copy link
Contributor

ubsan commented May 16, 2018

@joshtriplett right, that makes sense. We really don't want to stabilize bits of core::intrinsics - we did it once with transmute, and it's... eergh

@dlrobertson

This comment has been minimized.

Copy link

dlrobertson commented Dec 11, 2018

I've started implementing variadic functions in rust, but some people have suggested changing the syntax to pub unsafe extern "C" fn func(fixed: u32, mut args: ...VaList). The benefit of this being, this will be closer to the syntax used for variadic generics. If we go this route, this will also mean I will not have to add another ast::TyKind and hir::TyKind, which will drastically simplify the code required.

@joshtriplett

This comment has been minimized.

Copy link
Member

joshtriplett commented Dec 11, 2018

@dlrobertson Writing VaList gets into questions of "but what about lifetimes". We discussed that alternative as part of the RFC and the syntax we settled on didn't have that. Could we please stick with just mut args: ... if possible?

Could you explain more about the complexity that would create?

@dlrobertson

This comment has been minimized.

Copy link

dlrobertson commented Dec 11, 2018

@joshtriplett I have git branches with each implemented, just not generating the va_start/va_end yet. My main concern is the diverging syntaxes for variadic generics and variadic arguments.

Writing VaList gets into questions of "but what about lifetimes".

I'm not sure what is meant by this? The VaList is generated inside the function, so it must not be able to escape. As a result, the type we generate in typeck must be subst with a scoped region for the given function. Then you get a lifetime error when you run something like the following.

fn test(fixed: i32, ap: ...VaList) -> VaList { ap }
// or even
fn test<'a>(fixed: i32, ap: ...VaList<'a>) -> VaList<'a> { ap }

Could you explain more about the complexity that would create?

The complexity is just a side point and mainly has to do with the internals of parsing in the compiler. Either way we've converged after typeck.

  • Using ... syntax
    • Add a new TyKind to the ast and hir (not TyS)
    • Parse the new type in parse_ty_common
    • set variadic for the function if TyKind::VarArgs was found
  • Using ...VaList
    • Add variadic to the ast and hir Arg
    • If we find ... eat it and set variadic for the arg
    • set variadic for the function if arg.variadic == true

Adding a new TyKind is fairly invasive, but like I said, I have both implemented, so I'll be okay with either.

@jethrogb

This comment has been minimized.

Copy link
Contributor

jethrogb commented Dec 11, 2018

I'm not sure what is meant by this? The VaList is generated inside the function, so it must not be able to escape. As a result, the type we generate in typeck must be subst with a scoped region for the given function. Then you get a lifetime error when you run something like the following.

fn test(fixed: i32, ap: ...VaList) -> VaList { ap }
// or even
fn test<'a>(fixed: i32, ap: ...VaList<'a>) -> VaList<'a> { ap }

I believe the issue is that you're writing a type that has a lifetime parameter, except it's not possible to write the correct lifetime as there is no name for it. By writing mut args: ... you're sidestepping the issue because you're not naming the type.

@dlrobertson

This comment has been minimized.

Copy link

dlrobertson commented Dec 11, 2018

I believe the issue is that you're writing a type that has a lifetime, except it's not possible to write the correct lifetime.

Good point. That could indeed be confusing. For now I'll continue with just ... then.

@joshtriplett

This comment has been minimized.

Copy link
Member

joshtriplett commented Dec 11, 2018

@Ixrec

This comment has been minimized.

Copy link
Contributor

Ixrec commented Dec 11, 2018

I also see it as a good thing that C variadics get a syntax which is similar enough to variadic generics to be unsurprising, but still distinct enough that it's hard to get confused about which feature you're looking at. Because they are two very different and fundamentally incompatible features.

@eddyb

This comment has been minimized.

Copy link
Member

eddyb commented Dec 11, 2018

@dlrobertson Writing VaList gets into questions of "but what about lifetimes"

This is easily solved by checking that the lifetime parameter of VaList doesn't show up anywhere else in the signature.

It needs to be an "universally quantified" lifetime anyway, just like VaList::copy, so that the body of the function is checked with 0 assumptions, and without the type being explicitly spelled out, we would have to synthesize that ourselves somewhere later in the compiler.


@dlrobertson got a bit into the weeds of the implementation, but this is my perspective on why I think it's better for the user to write out VaList:

With e.g. args: VaList... you're explicit that the variadics you're using are C ones, and what the mechanism you interact with to extract arguments, is (i.e. that you have a VaList value).
I'd rather have args: VaList... (C) vs args: A... (VG) than args: ... vs args: A....

We could even rename VaList to make it clearer it's C variadics (CVaList?).

(Also, a random note: Pat: ...Type or Pat...: Type are suboptimal choices of syntax when you take into account Pat: Type becoming Pat syntax)

@joshtriplett

This comment has been minimized.

Copy link
Member

joshtriplett commented Dec 11, 2018

That seems like revisiting the same argument that was made and explored during the RFC. Is there a new argument here that wasn't previously explored?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment