Join GitHub today
GitHub is home to over 31 million developers working together to host and review code, manage projects, and build software together.
Sign upUnions 1.2 #1897
Conversation
petrochenkov
referenced this pull request
Feb 12, 2017
Open
Untagged unions (tracking issue for RFC 1444) #32836
kennytm
reviewed
Feb 13, 2017
| - `Eq` (but not `PartialEq`), supported for unions with `Eq` fields. | ||
|
|
||
| Since accessing union fields reliably requires extra knowledge, traits trying to | ||
| do it (e.g. `PartialEq`) cannot be derived automatically. |
This comment has been minimized.
This comment has been minimized.
kennytm
Feb 13, 2017
Member
Do you mean one cannot derive PartialEq alone? I don't see how can you derive Eq without deriving PartialEq.
This comment has been minimized.
This comment has been minimized.
petrochenkov
Feb 13, 2017
•
Author
Contributor
derive(PartialEq) needs to actually generate code while derive(Eq) is just an assertion like derive(Copy). Otherwise #1897 (comment)
| u = U { a: NoisyDrop }; // Silence | ||
| } | ||
| } | ||
| ``` |
This comment has been minimized.
This comment has been minimized.
kennytm
Feb 13, 2017
Member
What happens when we write to an inactive field with Drop, UB due to a read? Unresolved question?
union U {
a: NoisyDrop1,
b: NoisyDrop2,
}
fn main() {
unsafe {
let mut u = U { a: NoisyDrop1 };
u.b = NoisyDrop2; // calls `NoisyDrop2::drop(&mut u.b)` (UB)? or something else?
}
}
This comment has been minimized.
This comment has been minimized.
petrochenkov
Feb 13, 2017
•
Author
Contributor
Calls NoisyDrop2::drop if u contains a valid value of NoisyDrop2, UB due to a read otherwise.
I will add this example to the text.
nagisa
reviewed
Feb 13, 2017
| used to introduce a union declaration. A declaration `fn union() {}` will not | ||
| produce such an error. | ||
| The key property of unions is that all fields of a union share common storage. | ||
| As a result writes to one field of a union can overwrite its other fields, |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
petrochenkov
Feb 13, 2017
Author
Contributor
If we write into an empty field, e.g. (), it will not overwrite anything (#1897 (comment)).
| - `Copy`, supported only for unions with `Copy` fields | ||
| - `Clone`, supported only for unions implementing `Copy`, which can be cloned | ||
| trivially | ||
| - `Eq` (but not `PartialEq`), supported for unions with `Eq` fields. |
This comment has been minimized.
This comment has been minimized.
nagisa
Feb 13, 2017
Contributor
I would expect Eq to be derivable just fine for union that implements PartialEq manually. This requirement seems overly restrictive to me – and also doesn’t really match behaviour of derive(Eq) elsewhere IIRC.
This comment has been minimized.
This comment has been minimized.
petrochenkov
Feb 13, 2017
Author
Contributor
Maybe the wording is bad? This
I would expect Eq to be derivable just fine for union that implements PartialEq manually.
is exactly the case.
| more detail). | ||
|
|
||
| Write to a union field can potentially overwrite contents of its other fields. | ||
| Write to a union field should not modify any bits outside of the modified field |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
petrochenkov
Feb 13, 2017
Author
Contributor
To support this.
(This guarantee was in the original RFC as well.)
This comment has been minimized.
This comment has been minimized.
| u.a = NoisyDrop; // BOOM! | ||
| u = U { a: NoisyDrop }; // Silence | ||
| } | ||
| } |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
|
||
| ### Borrow checking | ||
|
|
||
| If struct field is borrowed (mutably or immutably), then the struct as a whole |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
petrochenkov
Feb 13, 2017
Author
Contributor
The next sentence is about unions, this one is actually about structs.
This comment has been minimized.
This comment has been minimized.
burdges
commented
Feb 13, 2017
|
Your
|
aturon
added
the
T-lang
label
Feb 13, 2017
This comment has been minimized.
This comment has been minimized.
sgrif
reviewed
Feb 13, 2017
|
|
||
| Either of the following: | ||
|
|
||
| - primitive scalar type (including references) or `str`, or |
This comment has been minimized.
This comment has been minimized.
sgrif
Feb 13, 2017
Contributor
Probably don't need the , or at the end of each item here since it's a list. I though there was a markdown rendering error at first.
This comment has been minimized.
This comment has been minimized.
|
Re the technical details, this is wonderful. I didn't follow the original thread very closely, and had no idea this much static analysis was in the works. There's a lot of "unions are like structs" phrasing however, which makes me worried. Don't get me wrong, I'm perfectly happy with the current syntax of unions as a intentional legacy homage / enabler of basic "copy-paste" FFI, but I fail to see how the semantics of union are at all like structs, except in that they are both "aggregate things" (and so are enums). To anyone learning Rust before C, I think this would be quite confusing. |
This comment has been minimized.
This comment has been minimized.
|
This feels... weird, as an RFC. There's a lot of language that makes me feel like it is supposed to be written in a standardese-y sort of way, and then it says stuff like
which... okay, I get what you're trying to go for here, but it's not well specified at all. I'd much rather go the C++ route of saying you can read the common initial subset, but otherwise you get unspecified behavior. |
This comment has been minimized.
This comment has been minimized.
|
And I realize most of that is due to the unsafe code guidelines team not having finished very much, but... this feels a bit like overstepping the bounds of guarantees. At most, we should guarantee what C or C++ guarantee, and we may add more guarantees later as part of the unsafe code guidelines team. |
This comment has been minimized.
This comment has been minimized.
|
And I realize it may have been in the original, but in the cases where the unsafe {
match u {
MyUnion { f1: 10 } => { println!("ten"); }
MyUnion { f2 } => { println!("{}", f2); }
}
}will always read an invalid member in cases where I'm also not certain of the tagged union case. I'd assume that matchers (conceptually) read the whole value, and only then test whether the thing holds. Therefore, if you match on a tagged union like the following: unsafe {
match v {
Value { tag: I, u: U { i: 0 } } => true,
Value { tag: F, u: U { f: 0.0 } } => true,
_ => false,
}
}In each branch, you're reading the entire thing, and then checking whether tag is |
This comment has been minimized.
This comment has been minimized.
|
The section "Active fragment and active field" absolutely squicks me out. This is nowhere near the rules that I would write: There should be one active variant of a union. There are two ways to change the currently active variant; The general case: A specific case, for trivially destructible types: Writes into a non-active variant (like |
This comment has been minimized.
This comment has been minimized.
If I understand what you mean by that correctly (which I may well not have): making that UB would break well-established use cases for unions. |
This comment has been minimized.
This comment has been minimized.
|
@joshtriplett If the current active variant is |
This comment has been minimized.
This comment has been minimized.
|
Also, this RFC requires an editor to go over the language used. It's unclear and/or ambiguous in some parts. |
This comment has been minimized.
This comment has been minimized.
|
@ubsan I agree about the "trivially destructible" part. However, I think this needs to work: #![feature(untagged_unions)]
struct X {
a: u32,
b: u32,
}
union U {
full: u64,
halves: X
}
fn main() {
let mut value = U { full: 0x0123456789ABCDEF };
unsafe { value.halves.a = 42; }
println!("{:#x} {:#x}", unsafe { value.halves.b }, unsafe { value.full });
}Runnable playground version: https://is.gd/GNHHgL Rough sketch of what makes that safe: every field impls Copy, none implement Drop, nothing reads padding, and all the types involved are "total" (every possible bit value has a safe interpretation without UB). |
This comment has been minimized.
This comment has been minimized.
comex
commented
Feb 13, 2017
|
@ubsan Both the C and C++ standards allow reading from the "wrong" field of a union as long as the memory contains a valid bit pattern for the type in question. (Whether it does depends on type representation, which is implementation defined, but that's no different from the case in this RFC.) There's no good reason for Rust to be stricter than C on the matter, especially given that the feature's main use case is FFI. The common initial subset rule in C that you allude to is separate, and kind of a mess / not actually implemented by compilers, but has to do with aliasing pointers to the field types in the union, not accesses through the union type itself. Though even there, I can't imagine a world where the Rust unsafe code guidelines end up forbidding pointer transmutes, given the amount of existing code that relies on them. Implied destructor calls should be treated no differently from any other call: valid if the memory has a valid bit pattern for the type being destructed. However, your point about match patterns is well taken. Also, there is one area where this RFC would make stronger guarantees than C: the effect of writes to union fields on unused bits (i.e. bits that correspond to other fields but not the field in question). This RFC says they're unaffected; C says they take unspecified values. It may be worth weakening the guarantee to match C for now. |
This comment has been minimized.
This comment has been minimized.
comex
commented
Feb 13, 2017
|
Sigh, again I have to correct myself about the C++ standard. It does arguably prohibit reading anything other than the common initial subsequence, a concept it uses in a totally different way from the C standard… Though IMO only arguably. In any case, C explicitly allows it, and in practice it's common for code written in both language to depend on it as one of the "blessed" ways to perform bitcasts (the other is memcpy). |
This comment has been minimized.
This comment has been minimized.
|
@comex I'll be honest, I care very little about C. My interest is in C++. Therefore, I write from a C++ mindset. |
This comment has been minimized.
This comment has been minimized.
|
@joshtriplett mmmmmmmmhhhhheeehghhgiheghe I can... kinda see how that might be useful, but it at least should be implementation defined. I'm made uncomfortable by it, but I can see the use for things like microsoft's // note: https://msdn.microsoft.com/en-us/library/windows/desktop/aa383713(v=vs.85).aspx |
This comment has been minimized.
This comment has been minimized.
Specifically for people with C++ mindset I wrote the Type punning section and listed examples that should work and should not be UB. This kind of stuff is used a lot in low-level code (emulation of hardware or interaction with it). |
This comment has been minimized.
This comment has been minimized.
|
Discussing stuff like compatible initial sequences and other layout compatibility at type level is the dead end IMO. |
This comment has been minimized.
This comment has been minimized.
comex
commented
Feb 13, 2017
|
@ubsan Put it another way: The main reason not to allow it would be if a compiler can make optimizations based on treating it as UB, which matters if either (a) it can improve performance of Rust code, or (b) Rust may want to use existing compiler backends that bake in those optimizations. The fact that ~every C++ compiler is also a C compiler, along with the fact that union type punning is common in C++ code too (even if possibly unsanctioned by the standard), strongly suggests that (b) does not apply. (a) is dubious to start with for a FFI- and low-level-focused feature, and is even less likely given that the C standard is mostly designed to maximize optimization potential (with exceptions that don't apply here). Further, as I said, allowing type punning through unions pretty clearly seems to be 'easier' than allowing it through pointer transmutes, yet the latter are effectively guaranteed in Rust at this point due to existing code depending on them. I suppose another reason to forbid it could be to allow sanitizers to complain about it, which could help code that uses unions for general storage purposes (not FFI) and doesn't intentionally type pun. But for that purpose, accesses of inactive variants should be prohibited altogether, without exceptions for initial common subsequences and such; and such a rule should be possible to opt out of, or more probably should be opt-in. |
This comment has been minimized.
This comment has been minimized.
|
“Trivial destructible” relies on
All point to that the restriction will be dropped if the implementation is fixed. I think we should spell out the assumption that |
This comment has been minimized.
This comment has been minimized.
|
Type punning through unions has to be legal enough for stuff like |
This comment has been minimized.
This comment has been minimized.
Mark-Simulacrum
commented on text/1444-union.md in 8f1a960
Feb 14, 2017
|
This feels somewhat inconsistent to me. Is there a specific reason for it? It also feels like this is hard to track, since passing a reference to a constant is reasonable, and IIRC, we eventually will lose the 'constness' of the reference. Either way it seems like an unnecessary limitation unless I'm missing something. |
This comment has been minimized.
This comment has been minimized.
Unsafety in pattern matching, unlike unsafety in expressions, is a new thing (unions are recently implemeted and unsafe borrowing of packed struct fields is still unimplemented) so it doesn't have any conveniences like |
This comment has been minimized.
This comment has been minimized.
Mostly limitations of current const eval, it cannot reinterpret memory. |
SimonSapin
referenced this pull request
May 20, 2017
Open
Unions interacting with Enum layout optimization #36394
This comment has been minimized.
This comment has been minimized.
gbutler69
commented
Jun 4, 2017
•
|
IGNORE: Missed the copy impl. |
petrochenkov
referenced this pull request
Aug 3, 2017
Closed
Rust mutable union allows write to field w/o needing unsafe block in stable ( Documentation issue? ) #2095
This comment has been minimized.
This comment has been minimized.
|
I feel bad about how little attention this has gotten from the lang team so far. Could use some help, so I'm raising the bat signal: @joshtriplett, @whitequark, @solson, @scottmcm, @Ixrec, any of you have time to catch up on this thread and summarize the status? |
This comment has been minimized.
This comment has been minimized.
|
I have absolutely no experience with anything FFI-related or union-related, but here goes nothing... With regards to making type punning legal, the high-level design presented here seems like the correct one. @joshtriplett's and @retep998's comments make it pretty clear that we need to "make unions like The problems I see are:
To expand on 2 a bit:
Currently, I think the type punning issue would be better off as its own self-contained RFC. In particular, I really liked the "Requirements and desirable properties" section of this PR, but the fact that it's 2/3rds of the way down means that a lot of the RFC feels unmotivated or doesn't make much sense until you get there. I'd rather see an RFC where "Motivation" describes the LARGE_INTEGER use case, then "Guide-Level Explanation" is mostly the same text as "Requirements and desirable properties", then "Reference-Level Explanation" lists exactly the requirements/guarantees we're adding to support these use cases and nothing else. Right now I think I know what those requirements are, but I'm not entirely sure because they're scattered throughout the RFC, and I'm not at all clear on whether they're supposed to apply to all unions or just #[repr(C)] unions. If we do that, the smaller changes like allowing empty unions, .. syntax, making trivially-destructible assignments safe, the clarification that any union field becoming (un)initialized makes all sibling fields (un)initialized, and the clarification that dynamically-sized types are not allowed in unions (those are all the smaller changes in this PR I'm confident are intentional) seem like they'd only be a few dozen lines in total, so we can probably handle that all of that as a single amendment. As far as I know, the only objection to any of them is the semver compatibility hazard pointed out on the union tracking issue, which seems like something we can hash out here. Smaller questions, some of which may become irrelevant if we do break this up:
|
This comment has been minimized.
This comment has been minimized.
I feel bad about how little attention this RFC has gotten from me in the recent months. I still need to incorporate the feedback given so far and address unresolved questions. |
This comment has been minimized.
This comment has been minimized.
|
@petrochenkov Honestly, I do feel like the feedback about this feeling like a full rewrite in a way that makes it hard to compare to the original is accurate. It's difficult to walk through this RFC and figure out what has changed, what is the same but clarified (e.g. things you discovered while implementing, or nuances that came up later), and what is just rewritten/reworded without any clear rationale for doing so. I feel like those are three independent goals, and doing them all in one giant change/rewrite makes it a challenge to review. I do absolutely think there's value in clarifying/strengthening the specification to match the code; that's one goal. I also think there's value in adding additional union features that haven't yet been implemented; that seems like it should be a separate step, and reviewed separately. |
Ixrec
referenced this pull request
Aug 10, 2017
Merged
Unnamed fields of struct and union type #2102
mystor
referenced this pull request
Oct 31, 2017
Merged
Formally define repr(u32, i8, etc...) and repr(C) on enums with payloads #2195
nikomatsakis
added
the
I-nominated
label
Jan 25, 2018
This comment has been minimized.
This comment has been minimized.
|
I'm nominating this for discussion in the @rust-lang/lang meeting. I propose that we assign someone to do a deep read of this document and summarize the pros/cons. @petrochenkov, I'm sorry for the lack of attention this RFC has gotten. If it helps, I think the primary problem is that you did too good of a job! I printed it out once and found that it was going to take me some time to digest and never got back to it. |
This comment has been minimized.
This comment has been minimized.
|
@rfcbot fcp postpone We talked this over in the @rust-lang/lang meeting, and the sense was that this RFC is a great start, but it's been long enough that we ought to revisit the topic. In particular, the presentation of the RFC as edits and so forth makes it hard to understand precisely what is being proposed, so it might make sense to try and write a "from scratch" version that lays out the underlying philosophical approach to unions (about which there was some debate, iirc) and then digs more down into the details. |
This comment has been minimized.
This comment has been minimized.
rfcbot
commented
Jan 26, 2018
•
|
Team member @nikomatsakis has proposed to postpone this. The next step is review by the rest of the tagged teams: No concerns currently listed. Once these reviewers reach consensus, this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up! See this document for info about what commands tagged team members can give me. |
rfcbot
added
the
proposed-final-comment-period
label
Jan 26, 2018
This comment has been minimized.
This comment has been minimized.
This text is mostly "from scratch" already, but I guess I'll just drop all the fragments from the old text and make it a new numbered RFC. |
This comment has been minimized.
This comment has been minimized.
rfcbot
commented
Jan 31, 2018
|
|
2 similar comments
This comment has been minimized.
This comment has been minimized.
rfcbot
commented
Jan 31, 2018
|
|
This comment has been minimized.
This comment has been minimized.
rfcbot
commented
Jan 31, 2018
|
|
rfcbot
added
final-comment-period
and removed
proposed-final-comment-period
labels
Jan 31, 2018
This comment has been minimized.
This comment has been minimized.
|
Oh wow, I had no idea there's a discussion about type punning happening for, like, a year now! This is clearly unsafe-code-guidelines related but somehow it didn't appear on my radar, sorry for that. My two cents: AFAIK, union type punning restrictions in C(++) are closely tied to type-based alias analysis. Essentially, there are two ways to get a "bad" pointer (pointing to data "of the wrong type") In C: Pointer manipulations (like casting or arithmetic), and unions. Both have to be restricted to make type-based alias analysis legal. Rust doesn't (want to) use TBAA, so we shouldn't need any magic type punning rules. |
This comment has been minimized.
This comment has been minimized.
|
@RalfJung that seems entirely reasonable to me - although it's more equivalent to |
aturon
removed
the
I-nominated
label
Feb 8, 2018
This comment has been minimized.
This comment has been minimized.
rfcbot
commented
Feb 10, 2018
|
The final comment period is now complete. |
This comment has been minimized.
This comment has been minimized.
|
Closing as postponed. Thanks again @petrochenkov for your extensive work here; we look forward to a revised RFC! |
petrochenkov commentedFeb 12, 2017
•
edited
The existing implementation, guarantees and tradeoffs are described in more detail.
Future directions are outlined.
Few corner cases are permitted (empty unions, union patterns with zero fields and
..).This RFC also provides documentation necessary for stabilization of the feature.
The "Overview" section can be copy-pasted into the book and "Detailed design" into the reference.
Rendered