FFI and union #5492

Closed
sanxiyn opened this Issue Mar 22, 2013 · 28 comments

Member
sanxiyn commented Mar 22, 2013

How would one call C functions involving union with Rust FFI?

SpiderMonkey's jsval is one example.

Contributor

There could be an unsafe enum whose layout is defined to match C's for interoperability. The only other way to deal with it would be to find the alignof and sizeof of the union in C on each platform and then translate that to Rust.

Member
sanxiyn commented Apr 24, 2013

Referencing Aatch/rust-xcb#2.

Contributor
yichoi commented Apr 25, 2013

referencing servo/servo#398

referencing servo/rust-mozjs#9

Contributor
Aatch commented May 9, 2013

The unsafe enum idea appeals to me, since I thought about it as an option when trying to solve the union issue in rust-xcb, but decided that relying on the representation of enums was too "hacky" and fragile.

Member
pnkfelix commented Jul 1, 2013

brson mentions in the description for #6346 that a "macro based solution" would be appropriate here, though I do not currently know what that would entail. (It sounds to me like a potential alternative to the grammar changes for adding unsafe enum that have been discussed here.)

Member
pnkfelix commented Jul 1, 2013

Nominating for milestone 3, feature complete.

Contributor
cmr commented Jul 31, 2013

I don't think a "macro-based solution" would be appropriate, as you need to restrict the valid range of values at the site of usage, which macros cannot do.

Contributor
graydon commented Aug 8, 2013

An attribute on an enum that makes it have no discriminant and makes any match on the variant part succeed should be sufficient. Not pretty, but neither are C union semantics.

Contributor
graydon commented Aug 8, 2013

accepted for feature-complete milestone

Skrylar commented Dec 13, 2013

I ran into this problem recently as well; Allegro makes use of unions for passing events around in C, which turns out to be a pain to deal with in Rust.

@mzabaluev mzabaluev referenced this issue in gi-rust/grust Feb 8, 2014
Open

Need a way to represent union values #11

Member

We do want to solve this problem eventually, but it need not block 1.0. Assigning P-low.

@pnkfelix pnkfelix added P-low and removed P-high-untriaged labels Feb 13, 2014

alxkolm commented Nov 3, 2014

What's the status?

What's the recommended way to do FFI-compatible unions?

Contributor
jdm commented Jan 21, 2015

I believe structs containing a field which is at least as big as the largest type the union can represent, plus manual transmutes, are the state of the art right now.

Contributor

> I believe structs containing a field which is at least as big as the largest type the union can represent, plus manual transmutes, are the state of the art right now.

Make sure you get the alignments right. The struct should have #[repr(C)], and the field posing as the union (or the inner type, if a newtype struct emulates the union itself) should have the alignment of the most-aligned variant.

@jdm Even when variants are different sizes? transmute fails to compile when T and U have different sizes, and transmute_copy is just as dangerous since it copies sizeof(U) bytes, which is undefined behavior when U is larger than T.
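One way to handle variants of different sizes without transmute is to cast a pointer to the backing storage and read or write only the bytes of the requested variant. The sketch below is an illustration under assumptions: `Storage`, `read_as`, and `write_as` are hypothetical names, and the 16-byte, 8-byte-aligned storage is chosen arbitrarily. The runtime assertions guard against the case raised here, where the target type is larger or more aligned than the storage.

```rust
use std::mem::{align_of, size_of};
use std::ptr;

/// Hypothetical backing storage for a union: 16 bytes, 8-byte aligned.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct Storage {
    data: [u64; 2],
}

impl Storage {
    pub fn zeroed() -> Self {
        Storage { data: [0; 2] }
    }

    /// Reads the leading bytes of the storage as a `T`.
    ///
    /// # Safety
    /// The leading `size_of::<T>()` bytes must hold a valid `T`.
    pub unsafe fn read_as<T: Copy>(&self) -> T {
        // Refuse variants that would read past the storage or need
        // stricter alignment than the storage provides.
        assert!(size_of::<T>() <= size_of::<Storage>());
        assert!(align_of::<T>() <= align_of::<Storage>());
        unsafe { ptr::read(self as *const Storage as *const T) }
    }

    /// Overwrites the leading bytes of the storage with `value`.
    ///
    /// # Safety
    /// Later reads must use a type that is valid for the bytes written.
    pub unsafe fn write_as<T: Copy>(&mut self, value: T) {
        assert!(size_of::<T>() <= size_of::<Storage>());
        assert!(align_of::<T>() <= align_of::<Storage>());
        unsafe { ptr::write(self as *mut Storage as *mut T, value) }
    }
}
```

Only `size_of::<T>()` bytes are touched on each access, so a 2-byte variant and an 8-byte variant can share the same storage.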

Contributor

Also, the overall size of the union is a multiple of the alignment of its most-aligned variant. This union has a size of 8:

union A {
    int32_t intval;
    char chars[5];
};

Which would require a Rust representation like:

#[repr(C)]
struct A {
    union_data: [i32; 2]
}

So yes, representing unions is not for the unwary.
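The size rule above can be spelled out as arithmetic: take the largest member size and round it up to the strictest member alignment. The following sketch assumes the usual 4-byte size and alignment for `int32_t`; `c_union_layout` is a hypothetical helper written for this illustration, not a library function.

```rust
use std::mem::{align_of, size_of};

/// Rust stand-in for `union A { int32_t intval; char chars[5]; }`.
#[repr(C)]
pub struct A {
    pub union_data: [i32; 2],
}

/// Computes (size, alignment) of a C union from its members,
/// each given as a (size, alignment) pair in bytes.
pub fn c_union_layout(members: &[(usize, usize)]) -> (usize, usize) {
    let size = members.iter().map(|m| m.0).max().unwrap_or(0);
    let align = members.iter().map(|m| m.1).max().unwrap_or(1);
    // Round the largest member size up to a multiple of the alignment.
    let padded = (size + align - 1) / align * align;
    (padded, align)
}

/// Layout of the Rust stand-in, for comparison with the computed one.
pub fn rust_layout_of_a() -> (usize, usize) {
    (size_of::<A>(), align_of::<A>())
}
```

For the union above, the members are `(4, 4)` for `intval` and `(5, 1)` for `chars`, so the largest size 5 is rounded up to the alignment 4, giving 8 bytes.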

@mzabaluev For a C union like this:

struct INPUT {
  DWORD type;
  union {
    MOUSEINPUT    mi;
    KEYBDINPUT    ki;
    HARDWAREINPUT hi;
  };
};

I use a struct field rather than bytes. It's easier because the size and alignment change between platforms, and you can't do [u8; size_of::<MOUSEINPUT>()].

#[repr(C)]
pub struct MOUSEINPUT { ... }
#[repr(C)]
pub struct KEYBDINPUT { ... }
#[repr(C)]
pub struct HARDWAREINPUT { ... }

#[repr(C)]
pub struct INPUT {
    pub tag_: DWORD,
    pub union_: MOUSEINPUT, // MOUSEINPUT largest and most aligned
}

Contributor

@alexchandel Good when it works, but sometimes the largest variant is not the most aligned, like in my example above.
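When the largest variant is not the most aligned one, a zero-length array of the most-aligned type can carry the alignment while the largest variant carries the size. This is a sketch of that trick applied to the earlier `union A` example, not something proposed in the thread; the field and method names are hypothetical. With `#[repr(C)]`, the struct size is rounded up to its alignment, so the 5-byte field yields an 8-byte struct.

```rust
use std::mem::{align_of, size_of};
use std::ptr;

/// Stand-in for `union A { int32_t intval; char chars[5]; }`.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct A {
    /// Zero-sized field: contributes only the alignment of `i32`.
    _align: [i32; 0],
    /// Largest variant: contributes the size.
    largest: [u8; 5],
}

impl A {
    pub fn from_chars(chars: [u8; 5]) -> Self {
        A { _align: [], largest: chars }
    }

    pub fn chars(&self) -> [u8; 5] {
        self.largest
    }

    /// Reinterprets the first four bytes as the `intval` variant.
    ///
    /// # Safety
    /// Marked unsafe to mirror C union access; the caller decides
    /// which variant is meaningful.
    pub unsafe fn intval(&self) -> i32 {
        // `A` is 4-byte aligned thanks to `_align`, so this read is aligned.
        unsafe { ptr::read(self as *const A as *const i32) }
    }
}

/// (size, alignment) of the stand-in.
pub fn layout_of_a() -> (usize, usize) {
    (size_of::<A>(), align_of::<A>())
}
```

The same pattern works with named struct types in place of `i32` and `[u8; 5]`, as long as the zero-length array uses the most-aligned variant and the data field uses the largest one.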

Is there a reason why this bug is tagged as "P-low"?
The alternatives that are proposed, and I guess currently used, require great care to handle alignment properly.
The last example of how this can be worked around without any language addition is a perfect example of how the language encourages writing incorrect code because it doesn't provide a proper solution.

I don't know how feasible it would be to implement, but an example usage could be:

#[repr(union)]
pub struct XEvent {
  pub type_: c_int,
  pub xany: XAnyEvent,
  // ...
  pub pad: [c_long; 24],
}

Like C unions, each field would start at the beginning of the struct, and the size of the struct would be that of its largest field (rounded up to the strictest field alignment, as in C). This wouldn't require adding union as a language keyword. The only limitation I can think of would be that accessing a field in the union would require unsafe, which is already used often when interfacing with C libraries.
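For comparison, the layout described here can be checked against the `union` item that later Rust versions provide (stable since Rust 1.19). This is a sketch for illustration, not the attribute syntax proposed in this comment; `Event` is a hypothetical, cut-down stand-in for `XEvent`, and the assumed contract is that every variant begins with a `c_int` tag.

```rust
use std::mem::{align_of, size_of};
use std::os::raw::{c_int, c_long};

/// Hypothetical, reduced stand-in for an Xlib-style event union.
#[repr(C)]
#[derive(Clone, Copy)]
pub union Event {
    pub type_: c_int,
    pub pad: [c_long; 4],
}

/// Reads the leading tag shared by all variants.
pub fn event_type(ev: &Event) -> c_int {
    // Reading a union field requires `unsafe`: the compiler cannot
    // know which field was last written.
    unsafe { ev.type_ }
}

/// (size, alignment) of the union.
pub fn event_layout() -> (usize, usize) {
    (size_of::<Event>(), align_of::<Event>())
}
```

The union takes the size of its largest field, `pad`, and the alignment of its most-aligned field, matching what a C compiler would produce for the same declaration on the same target.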

A macro based solution could look something like:

union! {
  pub union XEvent {
    pub type_: c_int,
    pub xany: XAnyEvent,
    // ...
    pub pad: [c_long; 24],
  }
}

// functions generated by macro:
impl XEvent {
  pub unsafe fn type_<'a> (&'a self) -> &'a c_int { ::std::mem::transmute(self) }
  pub unsafe fn type__mut<'a> (&'a mut self) -> &'a mut c_int { ::std::mem::transmute(self) }
  pub unsafe fn xany<'a> (&'a self) -> &'a XAnyEvent { ::std::mem::transmute(self) }
  pub unsafe fn xany_mut<'a> (&'a mut self) -> &'a mut XAnyEvent { ::std::mem::transmute(self) }
  // ...
  pub unsafe fn pad<'a> (&'a self) -> &'a [c_long; 24] { ::std::mem::transmute(self) }
  pub unsafe fn pad_mut<'a> (&'a mut self) -> &'a mut [c_long; 24] { ::std::mem::transmute(self) }
}

The only thing that prevented me from writing this macro is the inability to determine the size of the union at compile time. The best workaround I could come up with is to provide a guess for the size of the largest field and have the macro generate tests that verify it.

union! {
  pub union XEvent : [c_long; 24] {
    pub type_: c_int,
    pub xany: XAnyEvent,
    // ...
    pub pad: [c_long; 24],
  }
}

// test generated by macro:
#[test]
fn test_union_size_XEvent () {
  use std::cmp::max;
  use std::mem::size_of;
  let sizes = [
    size_of::<c_int>(),
    size_of::<XAnyEvent>(),
    // ...
    size_of::<[c_long; 24]>(),
  ];
  assert!(sizes.iter().fold(0, |a, b| max(a, *b)) == size_of::<[c_long; 24]>());
}

Of course, it would be much easier on developers of language bindings to have unions available as a language feature.
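The macro idea above can be sketched with `macro_rules!`. On a recent toolchain, `size_of` and `align_of` can be evaluated in constant context, so the size guess can be checked at compile time rather than in a generated test. Everything here is an assumption made for illustration: the macro name `c_union!`, its syntax (accessor names are passed explicitly because `macro_rules!` cannot build new identifiers), and the `Event` type with its `[i64; 3]` storage.

```rust
/// Hypothetical macro: declares a `#[repr(C)]` struct backed by `$storage`
/// and generates unsafe accessors that reinterpret it as each variant.
macro_rules! c_union {
    ($name:ident : $storage:ty {
        $( $get:ident, $get_mut:ident : $ty:ty ),* $(,)?
    }) => {
        #[repr(C)]
        pub struct $name {
            storage: $storage,
        }

        // Compile-time checks: every variant must fit in the storage
        // and must not need stricter alignment than it provides.
        $(
            const _: () = assert!(
                ::std::mem::size_of::<$ty>() <= ::std::mem::size_of::<$storage>()
            );
            const _: () = assert!(
                ::std::mem::align_of::<$ty>() <= ::std::mem::align_of::<$storage>()
            );
        )*

        impl $name {
            /// Assumes the storage is plain integer data, for which
            /// an all-zero bit pattern is valid.
            pub fn zeroed() -> Self {
                unsafe { ::std::mem::zeroed() }
            }

            $(
                pub unsafe fn $get(&self) -> &$ty {
                    unsafe { &*(self as *const Self as *const $ty) }
                }

                pub unsafe fn $get_mut(&mut self) -> &mut $ty {
                    unsafe { &mut *(self as *mut Self as *mut $ty) }
                }
            )*
        }
    };
}

c_union! {
    Event: [i64; 3] {
        kind, kind_mut: i32,
        bytes, bytes_mut: [u8; 20],
    }
}
```

If a variant were larger or more aligned than the declared storage, the `const` assertions would turn that into a compile error instead of a failing test.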

Member

winapi would benefit massively from unions as part of the core language. I currently use a macro to make do, but it's just not the same.

Member

I'm interested in unions as well, for several Linux kernel APIs. The proposal of having an "unsafe union", guaranteed to match the C layout, would work perfectly; almost any non-trivial instance of such a C union only makes sense to access in an unsafe block, given its trivial equivalence to the unsafe std::mem::transmute.

serprex commented Oct 25, 2015

Most unions in C have a descriptor field, so there's a need for 2 cases (has-descriptor & has-no-descriptor). Being able to specify a struct-unique enum with a custom type descriptor and the fields' corresponding values would allow Rust to use the union in a type-safe manner while being able to interoperate with C APIs.

Essentially something like

#[enum_explicit_descriptor(t)]
#[enum_explicit_values = "I: 0, N: 1"]
unsafe struct TValue{
  t: u8,
  val: unsafe enum IntOrFloat{
    I(i32),
    N(f32),
  },
}

Using unsafe struct to handle cases where the type descriptor isn't adjacent to the union. Even then, something could be done like

#[enum_explicit_descriptor_type(u8)]
#[enum_explicit_descriptor_typeoffset(-1)] // This could be behind-the-struct by default
#[enum_explicit_values = "I: 0, N: 1"]
enum IntOrFloat{
  I(i32),
  N(f32),
}

Then there'd need to be compile-time machinery that makes sure there's a valid u8 behind the enum in definitions, though user code would access a struct TValue{ t:u8, val: IntOrFloat }

The issue of having typeoffset could be resolved by requiring that explicit enums only be contained in structs and that enum_explicit_layout_typeoffset be specified by the struct. That would require a bit more strictness, though, since one wouldn't be able to know how to find the descriptor of an &IntOrFloat parameter.

Contributor

@serprex: I don't think it's worthwhile to add language support for external descriptors of unions, even in cases where there is a 1:1 match between a single descriptor field value and a union variant. The code using unions is expected to be close to FFI, where unsafe is the norm; so variant matching can be always unsafe, and the burden of ensuring the correct variant would be completely on the programmer, as it is in C.

Member

@mzabaluev I agree. For a first pass, at least, we just need an unsafe construct to access fields of a C union in a C-compatible, interoperable way. We can always produce a safe wrapper around that, and even produce macros to generate such wrappers for common cases.
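A safe wrapper of the kind mentioned here could look roughly like the following. The sketch assumes a toolchain with the `union` item (stable since Rust 1.19) and reuses the `TValue` / `IntOrFloat` names and the `I: 0, N: 1` tag values from the earlier comment; `IntOrFloatRaw` and the method names are hypothetical.

```rust
/// Raw, C-compatible payload: which field is valid is not tracked here.
#[repr(C)]
#[derive(Clone, Copy)]
pub union IntOrFloatRaw {
    i: i32,
    n: f32,
}

/// C-compatible tagged value: `t` says which payload field is valid.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct TValue {
    t: u8,
    val: IntOrFloatRaw,
}

/// Safe view of the payload.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum IntOrFloat {
    I(i32),
    N(f32),
}

impl TValue {
    pub fn from_int(v: i32) -> Self {
        TValue { t: 0, val: IntOrFloatRaw { i: v } }
    }

    pub fn from_float(v: f32) -> Self {
        TValue { t: 1, val: IntOrFloatRaw { n: v } }
    }

    /// Returns the payload selected by the tag, or `None` for a tag
    /// value this wrapper does not know about.
    pub fn get(&self) -> Option<IntOrFloat> {
        match self.t {
            // Both fields are 4 bytes and accept any bit pattern, so
            // these reads are sound whichever field was written last.
            0 => Some(IntOrFloat::I(unsafe { self.val.i })),
            1 => Some(IntOrFloat::N(unsafe { self.val.n })),
            _ => None,
        }
    }
}
```

The unsafe reads stay inside `get`, so callers match on an ordinary Rust enum while the struct itself keeps the C layout.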

Member

I posted a preliminary proposal using #[repr(C,union)] struct { ... } (requiring unsafe blocks for field accesses, assignments, or initializations) to https://internals.rust-lang.org/t/pre-rfc-unsafe-enums/2873/23.

Owner
huonw commented Jan 5, 2016

Closing in favour of rust-lang/rfcs#877.

@huonw huonw closed this Jan 5, 2016