Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unions #1444

Merged
merged 14 commits into from Apr 8, 2016

Conversation

Projects
None yet
@joshtriplett
Copy link
Member

joshtriplett commented Jan 5, 2016

RFC: native C-compatible unions via contextually recognized keyword union

EDIT: After extensive discussion, and grammar experiments by @nikomatsakis to verify feasibility, this RFC and pull request now proposes recognizing union as a "contextual keyword", allowing union to introduce a union declaration while not breaking any existing code that uses union as an identifier.

As discussed in the alternatives section, proposals for unions in Rust have extensively explored possible variations on declaration syntax, including longer keywords (untagged_union), built-in syntax macros (union!), compound keywords (unsafe union), pragmas (#[repr(union)] struct), and combinations of existing keywords (unsafe enum).

Rendered

Discussion on rust-internals

(edited by @nrc to add old title)

(edited by @mbrubeck to link to final rendered version)


A union may have trait implementations, using the same syntax as a struct.

The compiler should warn if a union field has a type that implements the `Drop`

This comment has been minimized.

@sfackler

sfackler Jan 5, 2016

Member

Should this be a warning or an error? I assume that the destructor of the field would not run when the union is dropped, right?

This comment has been minimized.

@joshtriplett

joshtriplett Jan 5, 2016

Author Member

I would prefer to make it an error, yes. However, Rust does not consider leaks or failing to run a destructor unsafe behavior, per the discussion that occurred around scoped threads. See the documentation of std::mem::forget.

So, I assumed that people would object to making this an error. If not, then I can quite happily change this.

This comment has been minimized.

@sfackler

sfackler Jan 5, 2016

Member

My preference would be to forbid Drop types for now. We can always change it to allow them later if there turn out to be compelling use cases.

This comment has been minimized.

@joshtriplett

joshtriplett Jan 5, 2016

Author Member

Done.

@nrc nrc added the T-lang label Jan 5, 2016

@nrc

This comment has been minimized.

Copy link
Member

nrc commented Jan 5, 2016

It might be worth summarising some of the discussion from the internals thread - there is a lot of it and it's not easy to follow the threads of conversation.

I strongly agree that we should support untagged unions in Rust. However, I think that #[repr(union)] enum is the better way to do it - it is more flexible since the variants can have multiple fields, it saves us adding another data structure, and I think unions fit the enum intuition more closely than structs.

Unions as enums works particularly nicely if are able to use variants as types (which may well be a long way off or may never happen, sadly). In that case only the downcast from the enum type to the variant type has to be unsafe (which it would be for any enum, I imagine) and then other use of the variant can be in safe code. In this case, the only difference for repr(union)/unsafe enums is that you can't match them.

@joshtriplett

This comment has been minimized.

Copy link
Member Author

joshtriplett commented Jan 5, 2016

See the mention of unsafe enum in the alternatives section; @retep998 wrote an RFC for that.

I can certainly see the argument for that, given that Rust enums represent tagged unions. However, modeling untagged unions on enums produces some syntactic challenges, though. How do you access a field of a union? Enum normally only supports pattern-matching syntax; since the pattern-matching requires unsafe code, pulling out a field F would require something like this: let f = unsafe { let MyUnion::F(f) = u; f };. That gets uglier if you have structs inside unions inside structs, which FFI interfaces actually use quite often (a struct of simultaneously valid fields, inside a union of many such structs, inside an outer struct containing a tag and other common fields). At that point, pattern matching for field access starts to approach the level of syntactic complexity introduced by using macros to define and access union fields; in particular, it does not naturally chain as well, and it requires mixing reading right-to-left with left-to-right ordering.

I suspect such syntax would also drive people to include more code in the unsafe block than necessary.

By contrast, field access syntax would simplify that to let f = unsafe { u.f }; And even with nested structures and unions, you'd have something like let f = unsafe { s.u.f.x };.

As discussed in the rust-internals thread and mentioned in the alternatives section of this RFC, you could potentially support struct field access syntax with unsafe enum, allowing access to the fields using dotted notation. However, that would make unsafe enum's syntax differ from the obvious expectation someone would have by looking at its definition, and dotted field notation doesn't make as much sense for safe, tagged enums. A new construct doesn't come with those syntactic expectations (and anyone coming from C will expect a union to use struct-like syntax).

An unsafe enum with multiple fields would also adds to the complexity of supporting field access syntax: if one constructor of the unsafe enum looks like foo(u32, f32), how would you name the two sub-fields? u.foo.0 and u.foo.1? Most FFI interfaces name the sub-fields, which would require using a separate struct anyway; in that case, would you have to write u.foo.0.subfieldname?

Writing to fields seems similarly more complicated with enums.

As a minor additional nit, Rust warns by default for enum constructors that start with a lowercase character; many FFI interfaces would end up needing to disable those warnings.

I think the case of defining an inline structure would work better with an RFC for anonymous struct and union types; I'd be quite happy to write such an RFC as well. Many FFI interfaces will want those anyway, for the common case of a struct containing an anonymous union. However, I don't think that should form part of this RFC; I would suggest a followup after resolving this one. In the meantime, it seems simple enough to define a struct (or tuple struct) and make that a field of the union.

All that said, I could live with unsafe enum or similar along with struct field access syntax; it doesn't seem as intuitive to me, but I'd take it over not having native support for unions at all. (Only having pattern matching, by contrast, seems like a major wart for ergonomic union usage.) As mentioned, I don't care that deeply about the syntax for declaring unions; I care a lot more about the syntax and semantics for using them.

@joshtriplett

This comment has been minimized.

Copy link
Member Author

joshtriplett commented Jan 5, 2016

It might be worth summarising some of the discussion from the internals thread - there is a lot of it and it's not easy to follow the threads of conversation.

As far as I know, I've captured all the major threads of discussion (including alternatives raised and the reasons for them) in the alternatives section. If I've missed one, I can certainly add it.

The largest discussion was between unsafe enum and struct-like syntax (originally presented as #[repr(union)] struct, though I've now presented it here with a new keyword instead).


If a union contains multiple fields of different sizes, assigning to a field
smaller than the entire union must not change the memory of the union outside
that field.

This comment has been minimized.

@mahkoh

mahkoh Jan 5, 2016

Contributor

Why doesn't the memory of the rest simply become undef?

This comment has been minimized.

@mahkoh

mahkoh Jan 5, 2016

Contributor

In particular, what happens here:

untagged_union X {
    a: u8,
    b: u16,
}

let mut x = X { b: 1 };
x.a = 1;
let y = x;

Does the compiler have to copy the unused part?

This comment has been minimized.

@joshtriplett

joshtriplett Jan 5, 2016

Author Member

Copying a union into some other variable must always copy the entire memory of the union, unless the compiler can prove that nothing reads from other fields of the destination, in which case it could potentially elide moving some data around.

For instance, if you pass y to an FFI function, Rust can't know what parts of the union you intend to read, so it needs to copy the whole thing. On the other hand, if you pass y to a Rust function, and rustc can see that the called function only reads y.a, never y.b, then rustc could potentially elide the copy.

This comment has been minimized.

@mahkoh

mahkoh Jan 5, 2016

Contributor

Copying a union into some other variable must always copy the entire memory of the union

Why? Simply make accessing any variant but the one that was written to last undefined.

This comment has been minimized.

@joshtriplett

joshtriplett Jan 5, 2016

Author Member

That would break many valid usages. For instance, consider a union of a common_header struct and several structs that start with that header; writing to common_header should not invalidate the rest of the data. Ditto for many other common patterns used with unions.

Note that factoring the common header out of the union does not solve the problem. For instance, you might have different types of common headers used for subsets of other fields. And in general, moving fields into or out of a union could require platform-specific understanding of size and alignment.

This comment has been minimized.

@mahkoh

mahkoh Jan 6, 2016

Contributor

It would be good to have some examples of such code to see how unions must behave.

This comment has been minimized.

@joshtriplett

joshtriplett Jan 6, 2016

Author Member

Trivial example:

struct S {
    header: COMMON_HEADER,
    otherfields: SOME_TYPE,
}

untagged_union U {
    header: COMMON_HEADER,
    s: S,
    // ...
}

Writing to u.header (or fields of u.header) should not invalidate u.s and in particular u.s.otherfields.

This comment has been minimized.

@mahkoh

mahkoh Jan 6, 2016

Contributor

I mean open source C code from well known projects.

This comment has been minimized.

@retep998

retep998 Jan 6, 2016

Member

As I mentioned in my other reply, MSVC does support writing to one variant and reading from another, which means that writing to one variant does not invalidate the non-overlapping bytes of other variants. So regardless of what the C standard dictates, we'd have to support this case on Windows at the very least, and I'm sure other major C compilers behave similarly.

This comment has been minimized.

@joshtriplett

joshtriplett Jan 6, 2016

Author Member

@mahkoh ACPICA includes a union with that exact pattern; see "union acpi_object" and ACPI_OBJECT_TYPE in https://github.com/acpica/acpica/blob/master/source/include/actypes.h .


The compiler should consider a union uninitialized if declared without an
initializer. However, providing a field during instantiation, or assigning to
a field, should cause the compiler to treat the entire union as initialized.

This comment has been minimized.

@mahkoh

mahkoh Jan 5, 2016

Contributor

That doesn't seem to be very efficient.

This comment has been minimized.

@joshtriplett

joshtriplett Jan 5, 2016

Author Member

What do you mean? This shouldn't matter one way or another for efficiency; I wrote this to clarify under what circumstances the compiler should give an error about accessing uninitialized data.

This comment has been minimized.

@mahkoh

mahkoh Jan 5, 2016

Contributor

If that's what you mean by initialized the that's fine. But since many fields might not be initialized, why do you even require that one field is initialized before the union can be accessed?

This comment has been minimized.

@joshtriplett

joshtriplett Jan 5, 2016

Author Member

Because it seems straightforward for the compiler to detect, and unambiguously an error. It makes sense to write one field and then pass the union to some function expecting to read that field; it never makes sense to read a field from a newly declared union that you've never written to or initialized at all.

Given your comment, I should update the RFC to clarify this paragraph to specifically reference compiler errors about accessing uninitialized variables.

This comment has been minimized.

@mahkoh

mahkoh Jan 5, 2016

Contributor

it never makes sense to read a field from a newly declared union that you've never written to or initialized at all.

It makes sense to pass a reference to a union to another function which then fills it.

This comment has been minimized.

@solson

solson Jan 6, 2016

Member

@mahkoh The same is true of (), but we don't allow let x: (); f(&mut x);. We shouldn't start allowing stuff like this now without a strong reason.

This comment has been minimized.

@mahkoh

mahkoh Jan 6, 2016

Contributor

() doesn't allow one to access uninitialized bytes. unions do.

This comment has been minimized.

@solson

solson Jan 6, 2016

Member

I don't see that as a compelling argument.

This comment has been minimized.

@joshtriplett

joshtriplett Jan 6, 2016

Author Member

Right now, you can declare a mutable structure without any initializer, write to all of the fields, and then use the structure; you only get an error if you don't write to a field. Given the definition of a union, it seems completely equivalent to say that you can declare a mutable union without any initializer, write to one of its fields, and then use the union.

That said, if this proves a sticking point, I don't think dropping it would make uses of unions significantly more onerous.

This comment has been minimized.

@solson

solson Jan 6, 2016

Member

Right now, you can declare a mutable structure without any initializer, write to all of the fields, and then use the structure; you only get an error if you don't write to a field.

This doesn't seem to be true: http://is.gd/PGiVd6

I agree that dropping it doesn't make unions much worse.

behavior](https://doc.rust-lang.org/nightly/reference.html#behavior-considered-undefined).
In particular, Rust code must not use unions to break the pointer aliasing
rules with raw pointers, or access a field containing a primitive type with an
invalid value.

This comment has been minimized.

@mahkoh

mahkoh Jan 5, 2016

Contributor

Undef propagation seems to be the bigger problem because it can actually happen with the primitive types usually used in FFI unions.

This comment has been minimized.

@joshtriplett

joshtriplett Jan 5, 2016

Author Member

If you mean the "Reads of undef (uninitialized) memory" item, yes, agreed; your unsafe code should avoid that. (Rust already has unsafe functions that would make it possible to read uninitialized memory.)

existing construct the `#[repr(union)]` attribute modifies).
- Use a compound keyword like `unsafe union`, while not reserving `union` on
its own as a keyword, to avoid breaking use of `union` as an identifier.
Potentially more appealing syntax, if the Rust parser can support it.

This comment has been minimized.

@mahkoh

mahkoh Jan 5, 2016

Contributor

Why not just parse it depending on the context?

This comment has been minimized.

@joshtriplett

joshtriplett Jan 5, 2016

Author Member

Interpreting "union" as both a keyword and as an identifier seems rather challenging to support, and potentially fragile for future parser changes.

@nagisa

This comment has been minimized.

Copy link
Contributor

nagisa commented Jan 5, 2016

I would rather have tuple structs with a #[repr(union)] instead. The only thing that needs to be figured out is initialisation then.

@retep998

This comment has been minimized.

Copy link
Member

retep998 commented Jan 5, 2016

I am desperate to have untagged unions in some form to make my life easier, and I support this RFC being accepted and implemented as soon as possible.

@mahkoh

This comment has been minimized.

Copy link
Contributor

mahkoh commented Jan 5, 2016

Since forget is safe, how about allowing Drop types but guaranteeing that the destructors are not run? Then the user can still use drop_in_place to manually drop the correct variant.

@joshtriplett

This comment has been minimized.

Copy link
Member Author

joshtriplett commented Jan 5, 2016

@mahkoh: I'd like to at least see a warning about it, absent some explicit indication of "meant to do that". I don't deeply care whether it uses a warning or an error; @sfackler suggested making it an error for now, so I did.

@joshtriplett

This comment has been minimized.

Copy link
Member Author

joshtriplett commented Jan 5, 2016

@nagisa For FFI purposes, we need named fields. A tuple version of untagged_union seems easy enough to support as well (using the same declaration syntax and field access syntax as a tuple struct), if you'd find that useful. However, I'd like to avoid expanding this RFC for a first pass unless there's a strong demand for "tuple unions"; perhaps a followup RFC after this one? The bigger this RFC gets and the more it adds, the more it takes to get accepted.

@nagisa

This comment has been minimized.

Copy link
Contributor

nagisa commented Jan 5, 2016

@joshtriplett that was more of a “instead of” comment. IMHO you should not expose these untagged unions to safe rust code anyway, so either way is fine, I guess, but I’m not sold by the named fields argument.

Also, for additional point (assumingly, a negative one), untagged unions is something @graydon said he was happy to see rust 1.0 to ship without.


All in all, my general opinion is that we should have some easier way to produce opaque untagged unions (essentially an opaque struct occupying as much space as necessary), perhaps implemented as a macro; but not the full blown way to do (and abuse) untagged unions.

@retep998

This comment has been minimized.

Copy link
Member

retep998 commented Jan 6, 2016

@nagisa Macro solutions are what I already use for unions, and they're a pain to write and use and ensure that they are correct.

@joshtriplett

This comment has been minimized.

Copy link
Member Author

joshtriplett commented Jan 6, 2016

@nagisa: I absolutely agree that we shouldn't expose untagged unions to safe Rust code; only unsafe code can do anything interesting with them.

I've seen the macros @retep998 refers to; we desperately need a portable, safe solution.

@ohAitch

This comment has been minimized.

Copy link

ohAitch commented Jan 6, 2016

The @graydon post definitely suggests unsafe union. Null pointers and array "overruns", after all, are also not difficult to achieve in unsafe code, being similarly necessary for C FFI.

@ohAitch

This comment has been minimized.

Copy link

ohAitch commented Jan 6, 2016

(The library I'm to interface with has essentially a tagged union { u31 direct; 0x80000000 | mpz_t* indirect }, where mpz_t is the GMP bigint struct. This RFC seems like the correct approach.)

@nikomatsakis

This comment has been minimized.

Copy link
Contributor

nikomatsakis commented Jan 6, 2016

@nrc

It might be worth summarising some of the discussion from the internals thread - there is a lot of it and it's not easy to follow the threads of conversation.

I skimmed the conversation but don't feel I can really summarize it. Still, here are some posts at least that I found interesting:

I've not yet read the final RFC to see where it ended up though!

@retep998

This comment has been minimized.

Copy link
Member

retep998 commented Jan 6, 2016

Microsoft uses unions in two ways.

  1. A union inside a struct which has a tag field. In this case the user would only need to access the union from the variant that it was initialized with. There are no doubts about this having well specified behavior.
  2. A union that provides several representations of the same data. Such as having one variant be a UINT64 and the other variant a struct of four USHORTs. In this case, the user is expected to be able to write to one of the variants and read from the other variant. Even though the behavior of using a union like this isn't specified by some set of the C/C++ standards, it is specified by MSVC. Thus Rust would need to support it too.
@mahkoh

This comment has been minimized.

Copy link
Contributor

mahkoh commented Jan 6, 2016

Even though the behavior of using a union like this isn't specified by some set of the C/C++ standards

This is covered by footnote 95.

If the member used to read the contents of a union object is not the same as the member last used to
store a value in the object, the appropriate part of the object representation of the value is reinterpreted
as an object representation in the new type as described in 6.2.6 (a process sometimes called ‘‘type
punning’’). This might be a trap representation.

@joshtriplett

This comment has been minimized.

Copy link
Member Author

joshtriplett commented Jan 6, 2016

@mahkoh Thanks for looking up the exact reference in the C standard! That's exactly the behavior I'd expect in Rust as well.

@graydon

This comment has been minimized.

Copy link

graydon commented Jan 6, 2016

As I was mentioned here, some opinions (caveat: still not a core-team member, just opinions):

  • I'm fine with this as an unsafe FFI feature. I don't think anyone's proposing it as safe, so no problem.
  • I'd pretty strongly prefer doing it as an attribute like #[repr(union)] struct rather than a new keyword.
    • It is just a question of representation for a structure-with-fields. The representation is "the fields overlap". It's a weird fact of the representation, but memory changing underfoot is a fact of life in unsafe-land, for many reasons. Cf. multithreading, mmio.
    • Since they're inherently unsafe, the only function that ever sees them will be a wrapper that enforces safe access, and is very aware of their union-ness. Only the -sys code. They won't be drifting into safe library-user code. So I disagree that these will cause any user confusion
    • Yet more C-interop FFI tricks may yet wind up in repr(...); it's the right place for this. Cf. RFC 314, bitfields
    • I think the bar for new keywords is / should be extremely high, post-1.0
    • The question "what is the language versioning plan?" is interesting but unrelated since nobody asking for this feature would likely want to wait until 2.0, which might be years off. Meanwhile imo it's bad precedent to "allow new keywords, so long as they're suitably long/improbable/unwieldy".
@joshtriplett

This comment has been minimized.

Copy link
Member Author

joshtriplett commented Jan 7, 2016

@graydon I have no objection whatsoever to switching back to #[repr(union)] struct rather than a new keyword.

@graydon

This comment has been minimized.

Copy link

graydon commented Jan 7, 2016

Didn't assume you did 😄 Just. Hrm. Sigh. Re-reading my words. I'm sorry, I need to figure out some way of participating in this community that doesn't come down like a ton of bricks / overstate my case all the time. I tried rewriting that 4 times and even still it's too pushy. Bleh. I realize my words carry more weight than they should. I haven't contributed a line of code in over 2 years now!

@joshtriplett

This comment has been minimized.

Copy link
Member Author

joshtriplett commented Jan 7, 2016

@graydon: I didn't assume you assumed I did. :) I'm just trying to reiterate that I really don't care what color the declaration-syntax bikeshed is painted, just the syntax and semantics for usage.

Thus far, that declaration syntax seems like the bit that has produced the largest arguments.

I appreciated your comment, and I found it a compelling argument. The only thing making me hesitate to change the RFC on the basis of that argument is that I've also received one vehement complaint (via IRC) against any syntax that uses struct.

@joshtriplett

This comment has been minimized.

Copy link
Member Author

joshtriplett commented Aug 29, 2016

I do feel a bit uncomfortable having a type that can be written, but not instantiated

I don't think empty unions should act that way; I would suggest that union U {} has one value, U {}, just like the empty struct or empty tuple.

I think the way forward is to implement without empty unions and for a follow-up RFC to discuss them.

Agreed.

@canndrew

This comment has been minimized.

Copy link
Contributor

canndrew commented Aug 29, 2016

If we do allow them, I think it makes sense to treat them as having one possible value (U {}), rather than zero possible values.

Except U {} isn't a possible value. The RFC clearly states

A union instantiation uses the same syntax as a struct instantiation, except that it must specify exactly one field:

And U {} doesn't specify exactly one field. I think debating whether unions are more "enum-like" or "struct-like" is a red herring. Unions are union-like, if we want to know what the semantics are for a zero-element union just look at the semantics we've defined for an n-element union and set n to zero.

@canndrew

This comment has been minimized.

Copy link
Contributor

canndrew commented Aug 29, 2016

Specifically..

Instantiating

An n-element union can be instantiated in n different ways by specifying one of its n fields.

n = 2 n = 1 n = 0
union U2 { f0: T0, f1: T1, } union U1 { f0: T0, } union U0 {}
let u: U2 = U2 { f0: foo }
let u: U2 = U2 { f1: foo }
let u: U1 = U1 { f0: foo }

A zero-element union can be instantiated in zero different ways. It's statically impossible to create one.

Reading/Writing

An n-element union can be read/written in n different ways by accessing one of its n fields.

n = 2 n = 1 n = 0
union U2 { f0: T0, f1: T1, } union U1 { f0: T0, } union U0 {}
u.f0 = foo
u.f1 = foo
u.f0 = foo

A zero-element union can never be read/written.

Pattern matching

The RFC doesn't specify whether matching zero elements is allowed. All the examples show matching on a single element which means matching on a zero-element union is impossible. The unanswered questions section asks whether we should allow matching on a number of elements other than one. If so then U {} => ... would be allowed however this could only appear in dead code because U is impossible to create.

Representation

The RFC leaves representation open but lets assume a union can be thought of a chunk of memory with a size and alignment equal to the max size and max alignment of its elements.

n = 2 n = 1 n = 0
union U2 { f0: T0, f1: T1, } union U1 { f0: T0, } union U0 {}
max({T0, T1}) max({T0}) max({})

The max of the empty set is the identity of the max operation on two elements, ie. negative infinity. This is, conceptually, the size and alignment of uninhabited types.

Possible states

The number of possible states of a union is, at most, the sum of the number of possible states of it's elements. Of course some of these states may overlap, but this at least gives us an upper bound.

n = 2 n = 1 n = 0
union U2 { f0: T0, f1: T1, } union U1 { f0: T0, } union U0 {}
states(U2) = 0
+ states(T0)
+ states(T1)
states(U1) = 0
+ states(T0)
states(U0) = 0

A zero-element union can only be in one of at most zero possible states. Therefore it is uninhabited.

@glaebhoerl

This comment has been minimized.

Copy link
Contributor

glaebhoerl commented Aug 29, 2016

I wish there were a reaction icon to indicate "nice comment" without necessarily implying "I agree".

@petrochenkov

This comment has been minimized.

Copy link
Contributor

petrochenkov commented Aug 29, 2016

@canndrew

#1444 (comment) is still strongly based on the "enum for which we don't know the discriminant" interpretation of union, it can be easily rewritten on the basis of "a struct with overlapping zero-offset fields" interpretation to "prove" that the opposite behavior is correct.

For example, the first statement: "An n-element union can be instantiated in n different ways by specifying one of its n fields." is incorrect in the "struct" interpretation in which the union can be instantiated by providing "sufficient number" of fields, which is 0 for empty unions.

@solson

This comment has been minimized.

Copy link
Member

solson commented Aug 29, 2016

So the two possible rules look like:

  1. unions are always initialized by providing exactly one field value
  2. unions are initialized by providing exactly one field value (or zero for the special case of an empty union)

The former rule seems far more elegant. I can see the argument that the struct interpretation would work, but it seems clearly uglier.

(Note that for the latter rule you cannot say something like "at most one field value", because for non-empty unions you must provide a field. Well, you can say "sufficient number" like @petrochenkov, but then you have to define what that means, which is what my parenthetical in the rule is doing.)

@nrc

This comment has been minimized.

Copy link
Member

nrc commented Aug 29, 2016

@joshtriplett @canndrew Perhaps one of you could open an issue to discuss empty unions? I fear that discussion on an already merged PR will be ignored or lost (I only noticed my ping because I was clearing out an email folder).

@joshtriplett

This comment has been minimized.

Copy link
Member Author

joshtriplett commented Aug 29, 2016

@nrc I don't feel strongly about it one way or another, and I don't have a use case for empty unions. However, I'd be happy to review an RFC or issue about this, or to discuss it further with anyone who does have a use case.

@canndrew

This comment has been minimized.

Copy link
Contributor

canndrew commented Sep 5, 2016

I was looking at the is_uninhabited method in librustc/ty/sty.rs and found this line. That's kinda wrong. It'll work for now but once that FIXME(24885) gets fixed (which is something I went there to do) it'll cause things to break because a union with a single uninhabited member type will be seen as uninhabited.

The real problem here though is that unions are implemented using AdtDefData. AdtDefData is for defining algebraic data types and unions are in no way, shape or form an algebraic data type.

@eddyb

This comment has been minimized.

Copy link
Member

eddyb commented Sep 5, 2016

@canndrew That will never be true for unions though, because .is_empty() is .variants.is_empty(), and unions always have one variant.
Sure, using AdtDefData for unions may seem wrong, but anything else would be anti-DRY in the compiler.
I'd rather rename AdtDef to DataDef or something instead of moving unions off of it.

@canndrew

This comment has been minimized.

Copy link
Contributor

canndrew commented Sep 5, 2016

That will never be true for unions though, because .is_empty() is .variants.is_empty(), and unions always have one variant.

Until the definition of .is_empty() changes to variants.iter().all(|v| v.is_empty()) so that it can detect that this struct is empty:

struct EmptyStruct {
    x: !,
    y: u32,
}

Then this union would get mis-detected as empty:

union NonEmptyUnion {
    x: !,
    y: u32,
}

And yes, we could check the IS_UNION flag in is_empty, but it seems bad for the compiler to be reusing the same datatype for both structs and unions in the first place. Structs and unions are only superficially similar in terms of syntax, the underlying types have completely different semantics.

@eddyb

This comment has been minimized.

Copy link
Member

eddyb commented Sep 5, 2016

@canndrew DRY prevails, so checking the AdtKind is the correct solution.
Also, it should probably be is_uninhabited instead of is_empty, maybe make it a Ty flag?
Come to think of it, you also have to handle cycles, not just union, so that'll be the bulk of the complexity.

@nox

This comment has been minimized.

Copy link
Contributor

nox commented Sep 11, 2016

Btw the RFC says that the feature is union, but rustc says untagged_unions.

@joshtriplett

This comment has been minimized.

Copy link
Member Author

joshtriplett commented Sep 16, 2016

@nox

Btw the RFC says that the feature is union, but rustc says untagged_unions.

True. It'd be nice to either fix rustc to use the feature name union, or fix the RFC. I'd prefer the former, since it's still hopefully early enough.

@petrochenkov

This comment has been minimized.

Copy link
Contributor

petrochenkov commented Sep 16, 2016

@joshtriplett
I chose the feature name in rustc to follow naming conventions + it's temporary and not really important + I wouldn't want to break people's code for nothing. I'd suggest to fix the RFC and forget about it (or simply forget about it).

@petrochenkov

This comment has been minimized.

Copy link
Contributor

petrochenkov commented Sep 16, 2016

Oh, and I changed the lint name to match conventions too, this is more important.
#[allow(union_field_drop)] => #[allow(unions_with_drop_fields)]

@joshtriplett

This comment has been minimized.

Copy link
Member Author

joshtriplett commented Sep 16, 2016

That seems fine. I do think we should update the RFC with those two changes. Do you want to write that patch or should I?

@petrochenkov

This comment has been minimized.

Copy link
Contributor

petrochenkov commented Sep 16, 2016

@joshtriplett
It's Friday evening here and I don't want to write any patches 😄

@alexbool alexbool referenced this pull request Nov 18, 2016

Closed

Support untagged unions #781

@petrochenkov petrochenkov referenced this pull request Feb 23, 2017

Closed

Unions 1.2 #1897

@dariost dariost referenced this pull request Sep 3, 2017

Merged

Introducing bindgen #695

xfix added a commit to xfix/rfcs that referenced this pull request Oct 18, 2017

petrochenkov added a commit that referenced this pull request Oct 18, 2017

Merge pull request #2181 from xfix/patch-1
Correct pull request URL in RFC #1444

petertodd added a commit to petertodd/rfcs that referenced this pull request Nov 1, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.