unions #1444
sfackler
reviewed
Jan 5, 2016
> A union may have trait implementations, using the same syntax as a struct.
>
> The compiler should warn if a union field has a type that implements the `Drop`
sfackler
Jan 5, 2016
Member
Should this be a warning or an error? I assume that the destructor of the field would not run when the union is dropped, right?
joshtriplett
Jan 5, 2016
Author
Member
I would prefer to make it an error, yes. However, Rust does not consider leaks or failing to run a destructor unsafe behavior, per the discussion that occurred around scoped threads. See the documentation of std::mem::forget.
So, I assumed that people would object to making this an error. If not, then I can quite happily change this.
sfackler
Jan 5, 2016
Member
My preference would be to forbid Drop types for now. We can always change it to allow them later if there turn out to be compelling use cases.
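To make the destructor question concrete, here is a minimal sketch of how this looks with `union` as it exists in current stable Rust, where a field whose type needs dropping has to be wrapped in `std::mem::ManuallyDrop`. The names `Noisy`, `Slot`, `DROPS`, and `drop_counts` are hypothetical and exist only for illustration; this is not code from the RFC.

```rust
use std::mem::ManuallyDrop;
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how many times `Noisy::drop` has run (illustrative helper).
static DROPS: AtomicUsize = AtomicUsize::new(0);

struct Noisy;

impl Drop for Noisy {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

// A field whose type needs dropping must be wrapped in `ManuallyDrop`;
// a bare `noisy: Noisy` field is rejected by the compiler.
#[allow(dead_code)]
union Slot {
    noisy: ManuallyDrop<Noisy>,
    raw: u8,
}

/// Returns (drops seen after a union went out of scope,
///          drops seen after additionally dropping a field by hand).
fn drop_counts() -> (usize, usize) {
    let before = DROPS.load(Ordering::SeqCst);
    {
        let _slot = Slot { noisy: ManuallyDrop::new(Noisy) };
        // `_slot` goes out of scope here; no destructor runs for `noisy`.
    }
    let after_scope = DROPS.load(Ordering::SeqCst) - before;

    let mut slot = Slot { noisy: ManuallyDrop::new(Noisy) };
    // Only the programmer knows that `noisy` is the live field,
    // so running its destructor is an explicit, unsafe step.
    unsafe { ManuallyDrop::drop(&mut slot.noisy) };
    let after_manual = DROPS.load(Ordering::SeqCst) - before;

    (after_scope, after_manual)
}
```

Dropping the union itself never runs a field destructor, which is the behavior the question above assumes; the count only changes once the field is dropped by hand.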
nrc
added
the
T-lang
label
Jan 5, 2016
|
It might be worth summarising some of the discussion from the internals thread - there is a lot of it and it's not easy to follow the threads of conversation. I strongly agree that we should support untagged unions in Rust. However, I think that unions as enums works particularly nicely if we are able to use variants as types (which may well be a long way off or may never happen, sadly). In that case only the downcast from the enum type to the variant type has to be unsafe (which it would be for any enum, I imagine) and then other uses of the variant can be in safe code. In this case, the only difference for repr(union)/unsafe enums is that you can't match them.
|
See the mention of
I can certainly see the argument for that, given that Rust enums represent tagged unions. However, modeling untagged unions on enums produces some syntactic challenges. How do you access a field of a union? Enums normally only support pattern-matching syntax; since the pattern-matching requires unsafe code, pulling out a field F would require something like this:
I suspect such syntax would also drive people to include more code in the unsafe block than necessary. By contrast, field access syntax would simplify that to
As discussed in the rust-internals thread and mentioned in the alternatives section of this RFC, you could potentially support struct field access syntax with
An
Writing to fields seems similarly more complicated with enums.
As a minor additional nit, Rust warns by default for enum constructors that start with a lowercase character; many FFI interfaces would end up needing to disable those warnings.
I think the case of defining an inline structure would work better with an RFC for anonymous struct and union types; I'd be quite happy to write such an RFC as well. Many FFI interfaces will want those anyway, for the common case of a struct containing an anonymous union. However, I don't think that should form part of this RFC; I would suggest a followup after resolving this one. In the meantime, it seems simple enough to define a struct (or tuple struct) and make that a field of the union.
All that said, I could live with |
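The contrast between the two access styles can be sketched with `union` as it later landed in stable Rust. This is not the snippet elided from the comment above; the type `U` and the functions `read_by_field` and `read_by_pattern` are hypothetical names chosen for illustration.

```rust
#[repr(C)]
#[allow(dead_code)]
union U {
    f: u32,
    g: f32,
}

/// Field-access style: the unsafe block covers exactly one read.
fn read_by_field(u: &U) -> u32 {
    unsafe { u.f }
}

/// Pattern-matching style: the same read, expressed as a match.
/// A union pattern names exactly one field.
fn read_by_pattern(u: &U) -> u32 {
    unsafe {
        match *u {
            U { f } => f,
        }
    }
}
```

Both functions return the same value for the same input; the field-access form simply keeps the unsafe region smaller, which is the point made in the comment.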
As far as I know, I've captured all the major threads of discussion (including alternatives raised and the reasons for them) in the alternatives section. If I've missed one, I can certainly add it. The largest discussion was between |
mahkoh
reviewed
Jan 5, 2016
> If a union contains multiple fields of different sizes, assigning to a field
> smaller than the entire union must not change the memory of the union outside
> that field.
mahkoh
Jan 5, 2016
Contributor
In particular, what happens here:
untagged_union X {
    a: u8,
    b: u16,
}
let mut x = X { b: 1 };
x.a = 1;
let y = x;
Does the compiler have to copy the unused part?
joshtriplett
Jan 5, 2016
Author
Member
Copying a union into some other variable must always copy the entire memory of the union, unless the compiler can prove that nothing reads from other fields of the destination, in which case it could potentially elide moving some data around.
For instance, if you pass y to an FFI function, Rust can't know what parts of the union you intend to read, so it needs to copy the whole thing. On the other hand, if you pass y to a Rust function, and rustc can see that the called function only reads y.a, never y.b, then rustc could potentially elide the copy.
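A small sketch of the scenario under discussion, written with the `union` syntax that later became stable rather than the `untagged_union` keyword used in this thread. The function name `copy_after_partial_write` is hypothetical; native-endian byte conversions are used so the result does not depend on the platform's byte order.

```rust
#[repr(C)]
#[derive(Clone, Copy)]
union X {
    a: u8,
    b: u16,
}

/// Writes `b`, overwrites only `a`, copies the union, and returns the
/// bytes of `b` as seen through the copy (in memory order).
fn copy_after_partial_write() -> [u8; 2] {
    // Memory now holds the bytes [1, 2].
    let mut x = X { b: u16::from_ne_bytes([1, 2]) };
    // `a` lives at offset 0, so this touches only the first byte.
    x.a = 9;
    // The copy has to carry the second byte along as well.
    let y = x;
    unsafe { y.b.to_ne_bytes() }
}
```

The copy observes `[9, 2]`: the byte outside `a` survives both the partial write and the copy, which is the behavior the RFC text quoted above requires.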
mahkoh
Jan 5, 2016
Contributor
> Copying a union into some other variable must always copy the entire memory of the union
Why? Simply make accessing any variant but the one that was written to last undefined.
joshtriplett
Jan 5, 2016
Author
Member
That would break many valid usages. For instance, consider a union of a common_header struct and several structs that start with that header; writing to common_header should not invalidate the rest of the data. Ditto for many other common patterns used with unions.
Note that factoring the common header out of the union does not solve the problem. For instance, you might have different types of common headers used for subsets of other fields. And in general, moving fields into or out of a union could require platform-specific understanding of size and alignment.
mahkoh
Jan 6, 2016
Contributor
It would be good to have some examples of such code to see how unions must behave.
joshtriplett
Jan 6, 2016
Author
Member
Trivial example:
struct S {
    header: COMMON_HEADER,
    otherfields: SOME_TYPE,
}
untagged_union U {
    header: COMMON_HEADER,
    s: S,
    // ...
}
Writing to u.header (or fields of u.header) should not invalidate u.s and in particular u.s.otherfields.
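The same pattern can be written as a compilable sketch using the `union` syntax that was eventually stabilized. The concrete types here (`CommonHeader` with a single `kind` field, a `u64` payload) are assumptions standing in for `COMMON_HEADER` and `SOME_TYPE`, and `rewrite_header` is a hypothetical helper.

```rust
#[repr(C)]
#[derive(Clone, Copy)]
struct CommonHeader {
    kind: u32,
}

#[repr(C)]
#[derive(Clone, Copy)]
struct S {
    header: CommonHeader,
    otherfields: u64,
}

#[repr(C)]
union U {
    header: CommonHeader,
    s: S,
}

/// Initializes the union through `s`, rewrites only `header`, and
/// returns what `s` looks like afterwards.
fn rewrite_header() -> (u32, u64) {
    let mut u = U {
        s: S {
            header: CommonHeader { kind: 1 },
            otherfields: 42,
        },
    };
    // Overwrites the first four bytes only; `otherfields` is untouched.
    u.header = CommonHeader { kind: 2 };
    unsafe { (u.s.header.kind, u.s.otherfields) }
}
```

After the write, `u.s.header.kind` reflects the new header while `u.s.otherfields` still holds its original value.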
retep998
Jan 6, 2016
Member
As I mentioned in my other reply, MSVC does support writing to one variant and reading from another, which means that writing to one variant does not invalidate the non-overlapping bytes of other variants. So regardless of what the C standard dictates, we'd have to support this case on Windows at the very least, and I'm sure other major C compilers behave similarly.
joshtriplett
Jan 6, 2016
Author
Member
@mahkoh ACPICA includes a union with that exact pattern; see "union acpi_object" and ACPI_OBJECT_TYPE in https://github.com/acpica/acpica/blob/master/source/include/actypes.h .
mahkoh
reviewed
Jan 5, 2016
> The compiler should consider a union uninitialized if declared without an
> initializer. However, providing a field during instantiation, or assigning to
> a field, should cause the compiler to treat the entire union as initialized.
joshtriplett
Jan 5, 2016
Author
Member
What do you mean? This shouldn't matter one way or another for efficiency; I wrote this to clarify under what circumstances the compiler should give an error about accessing uninitialized data.
mahkoh
Jan 5, 2016
Contributor
If that's what you mean by initialized then that's fine. But since many fields might not be initialized, why do you even require that one field is initialized before the union can be accessed?
joshtriplett
Jan 5, 2016
Author
Member
Because it seems straightforward for the compiler to detect, and unambiguously an error. It makes sense to write one field and then pass the union to some function expecting to read that field; it never makes sense to read a field from a newly declared union that you've never written to or initialized at all.
Given your comment, I should update the RFC to clarify this paragraph to specifically reference compiler errors about accessing uninitialized variables.
mahkoh
Jan 5, 2016
Contributor
> it never makes sense to read a field from a newly declared union that you've never written to or initialized at all.
It makes sense to pass a reference to a union to another function which then fills it.
solson
Jan 6, 2016
Member
@mahkoh The same is true of `()`, but we don't allow `let x: (); f(&mut x);`. We shouldn't start allowing stuff like this now without a strong reason.
joshtriplett
Jan 6, 2016
Author
Member
Right now, you can declare a mutable structure without any initializer, write to all of the fields, and then use the structure; you only get an error if you don't write to a field. Given the definition of a union, it seems completely equivalent to say that you can declare a mutable union without any initializer, write to one of its fields, and then use the union.
That said, if this proves a sticking point, I don't think dropping it would make uses of unions significantly more onerous.
solson
Jan 6, 2016
Member
> Right now, you can declare a mutable structure without any initializer, write to all of the fields, and then use the structure; you only get an error if you don't write to a field.
This doesn't seem to be true: http://is.gd/PGiVd6
I agree that dropping it doesn't make unions much worse.
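For the "declare it, then let another function fill it" use case raised earlier in this thread, one way to express it in today's Rust without any special initialization rule for unions is `std::mem::MaybeUninit`. This is a sketch of that alternative, not something proposed in the RFC; `Value`, `fill`, and `declare_then_fill` are hypothetical names.

```rust
use std::mem::MaybeUninit;

#[repr(C)]
#[derive(Clone, Copy)]
#[allow(dead_code)]
union Value {
    int: i32,
    float: f32,
}

/// Stand-in for a callee (for example an FFI function) that fills the slot.
fn fill(out: &mut MaybeUninit<Value>) {
    out.write(Value { int: 7 });
}

/// Declares an uninitialized slot, lets `fill` initialize it, then reads it.
fn declare_then_fill() -> i32 {
    let mut slot = MaybeUninit::<Value>::uninit();
    fill(&mut slot);
    // Sound only because `fill` is known to have written the value.
    let value = unsafe { slot.assume_init() };
    // Sound only because `fill` is known to have written the `int` field.
    unsafe { value.int }
}
```

The uninitialized state is visible in the type, so the compiler's ordinary "use of possibly uninitialized variable" check does not need to know anything about unions.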
mahkoh
reviewed
Jan 5, 2016
> behavior](https://doc.rust-lang.org/nightly/reference.html#behavior-considered-undefined).
> In particular, Rust code must not use unions to break the pointer aliasing
> rules with raw pointers, or access a field containing a primitive type with an
> invalid value.
mahkoh
Jan 5, 2016
Contributor
Undef propagation seems to be the bigger problem because it can actually happen with the primitive types usually used in FFI unions.
joshtriplett
Jan 5, 2016
Author
Member
If you mean the "Reads of undef (uninitialized) memory" item, yes, agreed; your unsafe code should avoid that. (Rust already has unsafe functions that would make it possible to read uninitialized memory.)
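As an illustration of the "invalid value" rule in the quoted RFC text, the sketch below shows one defensive pattern: inspect the raw byte first, and only interpret it as a `bool` when it holds a valid bit pattern. The type `Flag` and the function `checked_flag` are hypothetical, and the code uses the `union` syntax from stable Rust.

```rust
#[repr(C)]
#[derive(Clone, Copy)]
union Flag {
    raw: u8,
    value: bool,
}

/// Reading `value` directly would be undefined behavior if the byte is
/// anything other than 0 or 1, so the byte is checked through `raw`,
/// for which every bit pattern is valid.
fn checked_flag(f: Flag) -> Option<bool> {
    match unsafe { f.raw } {
        0 => Some(false),
        1 => Some(true),
        _ => None,
    }
}
```

This only addresses invalid bit patterns; it assumes the byte itself has been initialized, which remains the caller's responsibility.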
mahkoh
reviewed
Jan 5, 2016
> existing construct the `#[repr(union)]` attribute modifies).
> - Use a compound keyword like `unsafe union`, while not reserving `union` on
>   its own as a keyword, to avoid breaking use of `union` as an identifier.
>   Potentially more appealing syntax, if the Rust parser can support it.
joshtriplett
Jan 5, 2016
Author
Member
Interpreting "union" as both a keyword and as an identifier seems rather challenging to support, and potentially fragile for future parser changes.
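For reference, the design that eventually shipped treats `union` as a contextual keyword, so in current stable Rust the word can introduce a declaration and still serve as an ordinary identifier. A minimal sketch, with the hypothetical names `IntOrFloat` and `demo`:

```rust
#[repr(C)]
union IntOrFloat {
    i: u32,
    f: f32,
}

// `union` used as a plain function name.
fn union(a: u32, b: u32) -> u32 {
    a | b
}

/// Uses `union` as a declaration keyword (above), as a local variable
/// name, and as a function name, all in one program.
fn demo() -> u32 {
    let bits = {
        // `union` used as a local variable name.
        let union = IntOrFloat { f: 1.0 };
        unsafe { union.i }
    };
    // The local binding is out of scope, so this calls the function.
    union(bits, 1)
}
```

Here `1.0f32` has the bit pattern `0x3F80_0000`, so `demo` returns that value with the lowest bit set.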
|
I would rather have tuple structs with a |
|
I am desperate to have untagged unions in some form to make my life easier, and I support this RFC being accepted and implemented as soon as possible. |
|
Since |
|
@nagisa For FFI purposes, we need named fields. A tuple version of |
|
@joshtriplett that was more of an "instead of" comment. IMHO you should not expose these untagged unions to safe Rust code anyway, so either way is fine, I guess, but I'm not sold on the named fields argument. As an additional (presumably negative) point, untagged unions are something @graydon said he was happy to see Rust 1.0 ship without. All in all, my general opinion is that we should have some easier way to produce opaque untagged unions (essentially an opaque struct occupying as much space as necessary), perhaps implemented as a macro, but not the full-blown way to do (and abuse) untagged unions.
|
@nagisa Macro solutions are what I already use for unions, and they're a pain to write and use and ensure that they are correct. |
ohAitch
commented
Jan 6, 2016
|
The @graydon post definitely suggests |
ohAitch
commented
Jan 6, 2016
|
(The library I'm to interface with has essentially a tagged |
I skimmed the conversation but don't feel I can really summarize it. Still, here are some posts at least that I found interesting:
I've not yet read the final RFC to see where it ended up though! |
|
Microsoft uses unions in two ways.
|
This is covered by footnote 95.
|
|
@mahkoh Thanks for looking up the exact reference in the C standard! That's exactly the behavior I'd expect in Rust as well. |
graydon
commented
Jan 6, 2016
|
As I was mentioned here, some opinions (caveat: still not a core-team member, just opinions):
|
|
@graydon I have no objection whatsoever to switching back to |
graydon
commented
Jan 7, 2016
|
Didn't assume you did |
|
@graydon: I didn't assume you assumed I did. :) I'm just trying to reiterate that I really don't care what color the declaration-syntax bikeshed is painted, just the syntax and semantics for usage. Thus far, that declaration syntax seems like the bit that has produced the largest arguments. I appreciated your comment, and I found it a compelling argument. The only thing making me hesitate to change the RFC on the basis of that argument is that I've also received one vehement complaint (via IRC) against any syntax that uses |
I don't think empty unions should act that way; I would suggest that
Agreed. |
Except
And |
|
Specifically:
Instantiating
An n-element union can be instantiated in n different ways by specifying one of its n fields.
A zero-element union can be instantiated in zero different ways. It's statically impossible to create one.
Reading/Writing
An n-element union can be read/written in n different ways by accessing one of its n fields.
A zero-element union can never be read/written.
Pattern matching
The RFC doesn't specify whether matching zero elements is allowed. All the examples show matching on a single element, which means matching on a zero-element union is impossible. The unanswered questions section asks whether we should allow matching on a number of elements other than one. If so then
Representation
The RFC leaves representation open, but let's assume a union can be thought of as a chunk of memory with a size and alignment equal to the max size and max alignment of its elements.
The max of the empty set is the identity of the max operation on two elements, i.e. negative infinity. This is, conceptually, the size and alignment of uninhabited types.
Possible states
The number of possible states of a union is, at most, the sum of the number of possible states of its elements. Of course some of these states may overlap, but this at least gives us an upper bound.
A zero-element union can only be in one of at most zero possible states. Therefore it is uninhabited.
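The "max size and max alignment" model from the representation paragraph can be checked for non-empty `#[repr(C)]` unions in current Rust, with the refinement that the size is rounded up to a multiple of the alignment. The sketch below says nothing about the zero-field case being debated; `Small`, `Padded`, and `layouts` are hypothetical names.

```rust
use std::mem::{align_of, size_of};

#[repr(C)]
#[allow(dead_code)]
union Small {
    a: u8,
    b: u32,
}

#[repr(C)]
#[allow(dead_code)]
union Padded {
    bytes: [u8; 5],
    word: u32,
}

/// Returns (size, alignment) for `Small` and `Padded`, in that order.
fn layouts() -> [(usize, usize); 2] {
    [
        // Largest field is `u32`, which also has the strictest alignment.
        (size_of::<Small>(), align_of::<Small>()),
        // Largest field is five bytes, but alignment comes from `u32`,
        // so the size is padded up to a multiple of that alignment.
        (size_of::<Padded>(), align_of::<Padded>()),
    ]
}
```

On common targets where `u32` is four bytes with four-byte alignment, `Small` matches `u32` exactly and `Padded` occupies eight bytes.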
|
I wish there were a reaction icon to indicate "nice comment" without necessarily implying "I agree". |
|
#1444 (comment) is still strongly based on the "enum for which we don't know the discriminant" interpretation of union; it can easily be rewritten on the basis of the "a struct with overlapping zero-offset fields" interpretation to "prove" that the opposite behavior is correct. For example, the first statement, "An n-element union can be instantiated in n different ways by specifying one of its n fields.", is incorrect in the "struct" interpretation, in which the union can be instantiated by providing a "sufficient number" of fields, which is 0 for empty unions.
|
So the two possible rules look like:
The former rule seems far more elegant. I can see the argument that the struct interpretation would work, but it seems clearly uglier. (Note that for the latter rule you cannot say something like "at most one field value", because for non-empty unions you must provide a field. Well, you can say "sufficient number" like @petrochenkov, but then you have to define what that means, which is what my parenthetical in the rule is doing.) |
|
@joshtriplett @canndrew Perhaps one of you could open an issue to discuss empty unions? I fear that discussion on an already merged PR will be ignored or lost (I only noticed my ping because I was clearing out an email folder). |
|
@nrc I don't feel strongly about it one way or another, and I don't have a use case for empty unions. However, I'd be happy to review an RFC or issue about this, or to discuss it further with anyone who does have a use case. |
|
I was looking at the The real problem here though is that unions are implemented using |
|
@canndrew That will never be true for unions though, because |
Until the definition of
struct EmptyStruct {
    x: !,
    y: u32,
}
Then this union would get mis-detected as empty:
union NonEmptyUnion {
    x: !,
    y: u32,
}
And yes, we could check the |
|
@canndrew DRY prevails, so checking the |
bluss referenced this pull request Sep 11, 2016
Open: Unions interacting with Enum layout optimization #36394
|
Btw the RFC says that the feature is |
True. It'd be nice to either fix rustc to use the feature name |
|
@joshtriplett |
|
Oh, and I changed the lint name to match conventions too, this is more important. |
|
That seems fine. I do think we should update the RFC with those two changes. Do you want to write that patch or should I? |
|
@joshtriplett |
joshtriplett commented Jan 5, 2016
•
edited by mbrubeck
RFC: native C-compatible unions via contextually recognized keyword `union`
EDIT: After extensive discussion, and grammar experiments by @nikomatsakis to verify feasibility, this RFC and pull request now propose recognizing `union` as a "contextual keyword", allowing `union` to introduce a union declaration while not breaking any existing code that uses `union` as an identifier.
As discussed in the alternatives section, proposals for unions in Rust have extensively explored possible variations on declaration syntax, including longer keywords (`untagged_union`), built-in syntax macros (`union!`), compound keywords (`unsafe union`), pragmas (`#[repr(union)] struct`), and combinations of existing keywords (`unsafe enum`).
Rendered
Discussion on rust-internals
(edited by @nrc to add old title)
(edited by @mbrubeck to link to final rendered version)