Untagged unions (tracking issue for RFC 1444) #32836
nikomatsakis added the B-RFC-approved, T-lang, and B-unstable labels on Apr 8, 2016
I may have missed it in the discussion on the RFC, but am I correct in thinking that destructors of union variants are never run? Would the destructor for the
@sfackler My current understanding is that
So an assignment to a variant is like an assertion that the field was previously "valid"?
@sfackler For
ohAitch commented on Apr 8, 2016
Should a &mut to a union with Drop variants trigger a lint?
I should have covered that case explicitly. I think both behaviors are defensible, but I think it'd be far less surprising to never implicitly drop a field. The RFC already recommends a lint for union fields with types that implement Drop. I don't think assigning to a field implies that field was previously valid.
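For comparison, here is a minimal sketch of the "never implicitly drop a field" behavior as it looks in current Rust. It assumes today's rules, which postdate this comment: union fields cannot have drop glue, so a Drop type has to be wrapped in std::mem::ManuallyDrop. The Noisy type, the drop counter, and the function names are illustrative only.

```rust
use std::cell::Cell;
use std::mem::ManuallyDrop;

thread_local! {
    // Counts how many `Noisy` values have been dropped on the current thread.
    static DROPS: Cell<u32> = Cell::new(0);
}

/// Illustrative type with a destructor, so drops are observable.
pub struct Noisy;

impl Drop for Noisy {
    fn drop(&mut self) {
        DROPS.with(|d| d.set(d.get() + 1));
    }
}

/// A union holding a Drop type; ManuallyDrop suppresses automatic drops.
pub union Slot {
    pub noisy: ManuallyDrop<Noisy>,
    pub raw: u32,
}

fn drops() -> u32 {
    DROPS.with(|d| d.get())
}

/// Returns the drop count after overwriting the field and after an explicit drop.
pub fn overwrite_then_drop() -> (u32, u32) {
    let base = drops();
    let mut s = Slot { noisy: ManuallyDrop::new(Noisy) };
    // Safe write: the previous value is not dropped (it is leaked).
    s.noisy = ManuallyDrop::new(Noisy);
    let after_write = drops() - base;
    // Dropping is unsafe: the caller must know that `noisy` is the live field.
    unsafe { ManuallyDrop::drop(&mut s.noisy) };
    let after_drop = drops() - base;
    (after_write, after_drop)
}
```

Here the write leaves the drop count at 0, and only the explicit, unsafe ManuallyDrop::drop brings it to 1.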
Yeah, that approach seems a bit less dangerous to me as well.
Not dropping when assigning to a union field would make It's not a new problem, either;
I personally don't plan to use Drop types with unions at all. So I'll defer entirely to people who have worked with analogous unsafe code on the semantics of doing so.
I also don't intend to use Drop types in unions, so either way doesn't matter to me as long as it is consistent.
ohAitch commented on Apr 9, 2016
I don't intend to use mutable references to unions, and probably
Seems like this is a good issue to raise as an unresolved question. I'm not sure yet which approach I prefer.
@nikomatsakis As much as I find it awkward for assigning to a union field of a type with Drop to require previous validity of that field, the reference case @tsion mentioned seems almost unavoidable. I think this might just be a gotcha associated with code that intentionally disables the lint for putting a type with Drop in a union. (And a short explanation of it should be in the explanatory text for that lint.)
And I'd like to reiterate that (NB: the drop doesn't happen when But I support having a default warning against
@tsion this is not true for

```rust
fn main() {
    let mut x: (i32, i32);
    x.0 = 2;
    x.1 = 3;
}
```

(though trying to print
@nikomatsakis That example is new to me. Given my previous experience, I would have considered it a bug that that example compiles. But I'm not sure I see the relevance of that example. Why is what I said not true for

Say, if

```rust
fn main() {
    let mut x: (Box<i32>, i32);
    x.0 = Box::new(2); // x.0 statically known to be uninit, destructor not called
    x.0 = Box::new(3); // x.0 destructor is called before writing new value
}
```
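The behavior described in the comments of that example can be observed with a small sketch. Current Rust rejects assigning into a tuple that was never initialized, so this hypothetical variant (not the code from the thread) first moves a field out of an initialized tuple and then reassigns it twice: the first write needs no destructor, the second drops the old value. The Noisy type and the counter are illustrative.

```rust
use std::cell::Cell;

thread_local! {
    // Counts drops of `Noisy` on the current thread.
    static DROPS: Cell<u32> = Cell::new(0);
}

/// Illustrative type whose destructor bumps the counter.
struct Noisy;

impl Drop for Noisy {
    fn drop(&mut self) {
        DROPS.with(|d| d.set(d.get() + 1));
    }
}

fn drops() -> u32 {
    DROPS.with(|d| d.get())
}

/// Returns the cumulative drop count after each step.
pub fn drop_flag_steps() -> [u32; 4] {
    let base = drops();
    let mut pair = (Noisy, 0u8);
    let moved = pair.0; // pair.0 is now statically known to be uninitialized
    drop(moved); // explicit drop
    let a = drops() - base; // 1
    pair.0 = Noisy; // pair.0 was uninit: no destructor runs
    let b = drops() - base; // still 1
    pair.0 = Noisy; // pair.0 is init: the old value is dropped before the write
    let c = drops() - base; // 2
    drop(pair); // drops the remaining Noisy
    let d = drops() - base; // 3
    [a, b, c, d]
}
```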
Maybe just lint against that kind of write?
My point is only that
It runs the destructor if the drop flag is set. But I think that kind of write is confusing anyway, so why not just forbid it? You can always do
@nikomatsakis I already mentioned that:
But I didn't account for dynamic checking of drop flags, so this is definitely more complicated than I considered.
Drop flags are only semi-dynamic: once zeroing drop is gone, they are part of codegen. I say we forbid that kind of write because it causes more confusion than it does good.
Daggerbot commented on Apr 27, 2016
Should
There is a valid use case for using a union to implement a
As well as invoking such code manually via
RumataEstor commented on Jun 21, 2016
To me, dropping a field's value while writing to it is definitely wrong, because the type of the previously stored variant is unknown. Would it be possible to prohibit field setters and instead require replacing the whole union? In that case, if the union implements Drop, the whole union's drop would be called for the replaced value, as expected.
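A rough sketch of the whole-union replacement semantics suggested here, written against current Rust, where a union may itself implement Drop. Assigning to a field does not run the union's destructor, while assigning a whole new union value drops the old one first. The Bits union and the counter are hypothetical.

```rust
use std::cell::Cell;

thread_local! {
    // Counts how many times `Bits::drop` has run on this thread.
    static DROPS: Cell<u32> = Cell::new(0);
}

/// Hypothetical union with its own destructor.
#[allow(dead_code)]
union Bits {
    int: u32,
    float: f32,
}

impl Drop for Bits {
    fn drop(&mut self) {
        DROPS.with(|d| d.set(d.get() + 1));
    }
}

fn drops() -> u32 {
    DROPS.with(|d| d.get())
}

/// Returns the drop count after a field write and after a whole-union write.
pub fn field_vs_whole() -> (u32, u32) {
    let base = drops();
    let mut u = Bits { int: 1 };
    u.float = 2.0; // field write: Bits::drop does not run
    let after_field = drops() - base;
    u = Bits { int: 3 }; // whole replacement: the old value is dropped first
    let after_whole = drops() - base;
    (after_field, after_whole)
}
```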
I don't think it makes sense to prohibit field setters; most uses of unions should have no problem using those, and fields without a Drop implementation will likely remain the common case. Unions with fields that implement Drop will produce a warning by default, making it even less likely to hit this case accidentally.
We don't think we have to pay that cost for unions, since the bag-of-bits model doesn't give any new opportunities compared to the enum-with-unknown-variant model.
The property in question here is at least as much of a burden for unsafe code to uphold as it is a safeguard. No static analysis can prevent every mistake that could break this property, since we do want to use unions for unsafe type punning[1]. So "enum with unknown variant" really means that code handling unions has to be super careful with how it writes to the union or risk instant UB, without really reducing the unsafety involved in reading from the union, since reading already requires knowing (through channels the compiler doesn't understand) that the bits are valid for the variant you're reading. The only time we can actually warn users about a union that isn't valid for any of its variants is when running under miri, not in the vast majority of cases where it happens at runtime.

[1] For example, assuming tuples are repr(C) for simplicity,
Hey, that's my example :)
That's the point of RFC 1897's model: static checking ensures that no safe operation (like assignment or partial assignment) can put the union into an invalid state, so you don't need to be super careful all the time and don't get instant UB. On the other hand, without move checking, a union can be put into an invalid state very easily:

```rust
let u: Union;
let x = u.field; // UB
```
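As an illustration of the type punning mentioned above, here is a minimal sketch (the Pun union and the function name are hypothetical) that reinterprets an f32 as its bit pattern. The read is unsafe, and its soundness rests on knowledge the compiler does not track: here, that every bit pattern is a valid u32.

```rust
/// Hypothetical union used to reinterpret an f32's bits as a u32.
union Pun {
    float: f32,
    int: u32,
}

/// Returns the IEEE 754 bit pattern of `x` via the union.
pub fn f32_bits(x: f32) -> u32 {
    let p = Pun { float: x };
    // SAFETY: both fields are 4 bytes, and every bit pattern is a valid u32.
    unsafe { p.int }
}
```

The standard library's f32::to_bits returns the same value without a union.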
You can automatically recognize some kinds of writes as not violating the extra invariants imposed on unions, but those are still extra invariants that writers have to uphold. Since reading is still unsafe and requires manually ensuring that the bits will be valid for the variant that's read, this doesn't actually help readers; it just makes writers' lives harder. Neither "bag of bits" nor "enum with unknown variant" helps solve the hard problem of unions: how to ensure that a union actually stores the kind of data you want to read.
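A common way to handle the "which field is live?" problem in practice is to keep the answer outside the union, in a separate tag. This is a sketch under that assumption (all names are hypothetical), not a proposal from the thread: the compiler does not connect the tag to the payload, so only the constructors and accessors keep the two in sync.

```rust
/// Which payload field was written last.
#[derive(Clone, Copy, PartialEq)]
enum Tag {
    Int,
    Float,
}

/// Untagged storage; on its own it does not record which field is live.
#[derive(Clone, Copy)]
union Payload {
    int: i64,
    float: f64,
}

/// Hand-written tagged union: the tag is maintained by this type's API.
pub struct Tagged {
    tag: Tag,
    payload: Payload,
}

impl Tagged {
    pub fn int(v: i64) -> Self {
        Tagged { tag: Tag::Int, payload: Payload { int: v } }
    }

    pub fn float(v: f64) -> Self {
        Tagged { tag: Tag::Float, payload: Payload { float: v } }
    }

    pub fn as_int(&self) -> Option<i64> {
        // The tag says `int` was written, so the read returns the stored value.
        (self.tag == Tag::Int).then(|| unsafe { self.payload.int })
    }

    pub fn as_float(&self) -> Option<f64> {
        // The tag says `float` was written, so the read returns the stored value.
        (self.tag == Tag::Float).then(|| unsafe { self.payload.float })
    }
}
```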
How would the fancier type-checking affect dropping? If you create a union and then pass it to C, which takes ownership, will Rust try to free the data, perhaps causing a double free? Or would you always implement

Edit: it would be way cool if unions were like "enums where the variant is checked statically at compile time", if I've understood the suggestion.

Edit 2: could unions start off as a bag of bits and then later allow safe access while remaining backwards-compatible?
If we decide we want this to be valid, I think @oli-obk should update miri's checks to reflect that: with #51361 merged, it would be rejected by miri.

@petrochenkov The part I do not understand is what this buys us. We get extra complexity, in terms of implementation (static analysis) and usage (users still need to be aware of the exact rules). This extra complexity adds to the fact that when unions are used, we are already in an unsafe context, so things are naturally more complex. I think we should have a clear motivation for why this extra complexity is worth it. I do not consider "it violates the spirit of the language somewhat" to be a clear motivation.

The one thing I can think of is layout optimizations. In a "bag of bits" model, a union never has a niche. However, I feel that is better addressed by giving the programmer more manual control over the niche, which would also be useful in other cases.
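The niche point can be observed directly. In this sketch (the NonZeroUnion name is made up), Option<NonZeroU32> reuses the forbidden zero value to represent None, while wrapping the same field in a union leaves Option nothing to reuse, so it needs a separate discriminant. The union having no niche reflects current rustc layout behavior, consistent with the bag-of-bits view; the exact size of the wrapped Option is an observation, not a documented guarantee.

```rust
use std::num::NonZeroU32;

/// Hypothetical union whose only field has a niche (zero is not a valid NonZeroU32).
#[allow(dead_code)]
union NonZeroUnion {
    n: NonZeroU32,
}

/// Returns (size of Option<NonZeroU32>, size of Option<NonZeroUnion>).
pub fn option_sizes() -> (usize, usize) {
    (
        std::mem::size_of::<Option<NonZeroU32>>(),
        std::mem::size_of::<Option<NonZeroUnion>>(),
    )
}
```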
I think I am missing something fundamental here. I agree with @rkruppe that the hard problem with unions is making sure that the union currently stores the data that the program wants to read.

But AFAIK this problem cannot be solved “locally” by static analysis. We would at least need whole-program analysis, and even then it would still be a hard problem to solve.

So... is there a solution for this problem on the table? Or, what do the exact solutions being proposed actually buy us? Say I get a union from C: without analyzing the whole Rust and C program, what can the proposed static analyses actually guarantee for readers?
@gnzlbg I think the only guarantee we'd get is what @petrochenkov wrote above
Your proposal does not protect against bad reads either; I don't think that's possible. Also, I imagined some very basic "initialized" tracking along the lines of "writing to any field initializes the union". We'd need something anyway when
I think there's value in the bare-minimum move checking to see if a union is initialized. The original RFC explicitly specified that initializing or assigning to any union field makes the whole union initialized. Beyond that, though, rustc should not try to infer anything about the value in a union that the user doesn't explicitly specify; a union may contain any value at all, including a value that isn't valid for any of its fields. One use case for that, for instance: consider a C-style tagged union that's explicitly extensible with more tags in the future. C and Rust code reading that union must not assume it knows every possible field type.
Perhaps I should start from the other direction. Should this code work 1) for unions, 2) for non-unions?

```rust
let x: T;
let y = x.field;
```

For me the answer is an obvious "no" in both cases, because this is a whole class of errors that Rust can and wants to prevent, regardless of "union"-ness of This means the move checker should have some kind of scheme according to which it implements that support. Given that the move checker (and borrow checker) generally works in a per-field fashion, the simplest scheme for unions would be "same rules as for structs + (de)initialization/borrow of a field also (de)initializes/borrows its sibling fields". Then, the enum model is simply a consequence of the static checking described above + one more condition. This case from @joshtriplett, for example
would be much clearer for people reading code if the union explicitly had an extra field for "possible future extensions". Of course, we can keep the basic static initialization checking, but reject the second condition and allow writing arbitrary, possibly invalid data to the union through some unsafe "third party" means without it being instant UB. Then we wouldn't have that dynamic people-targeted guarantee anymore; I just think that would be a net loss.
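One way to see why a per-field scheme has to treat sibling fields together: union fields share storage, so a write through one field changes what every sibling reads. A minimal sketch (the Overlap union is hypothetical):

```rust
/// Hypothetical union: both fields occupy the same four bytes.
union Overlap {
    word: u32,
    bytes: [u8; 4],
}

/// Writes through `bytes`, then reads the sibling field `word`.
pub fn write_bytes_read_word() -> [u8; 4] {
    let mut o = Overlap { word: 0 };
    o.bytes = [1, 2, 3, 4]; // a write to one field...
    // SAFETY: all four bytes are initialized and any bit pattern is a valid u32.
    let w = unsafe { o.word }; // ...is visible through its sibling
    w.to_ne_bytes() // native-endian: returns the bytes exactly as stored
}
```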
Agreed, this level of checking for uninitialized values seems reasonable, and quite feasible.
Agreed so far, assuming I understand the rules for structs.
That additional condition isn't valid for unions.
That's not how C unions work, nor how Rust unions were specified to work. (And I'd question whether it'd be clearer, or simply whether it matches a different set of expectations.) Changing this would make Rust unions no longer fit for some of the purposes for which they were designed and proposed.
Those 'unsafe "third party" means' include "getting a union from FFI", which is a completely valid use case. Here's a concrete example:

```rust
union Event {
    event_id: u32,
    event1: Event1,
    event2: Event2,
    event3: Event3,
}

struct Event1 {
    event_id: u32, // always EVENT1
    // ... more fields ...
}

// ... more event structs ...

match u.event_id {
    EVENT1 => { /* ... */ }
    EVENT2 => { /* ... */ }
    EVENT3 => { /* ... */ }
    _ => { /* unknown event */ }
}
```

That's completely valid code that people can and will write using unions.
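A self-contained variant of the example above, as a sketch: the tag constants, the extra fields, and the describe function are hypothetical, Event3 is omitted, and the structs are #[repr(C)] and Copy so they can be union fields in current Rust. Because every variant begins with the same u32 tag, reading event_id is meaningful even for an event kind this code has never heard of.

```rust
/// Hypothetical tag values; a real C header would define these.
pub const EVENT1: u32 = 1;
pub const EVENT2: u32 = 2;

#[repr(C)]
#[derive(Clone, Copy)]
pub struct Event1 {
    pub event_id: u32, // always EVENT1
    pub x: i32,
}

#[repr(C)]
#[derive(Clone, Copy)]
pub struct Event2 {
    pub event_id: u32, // always EVENT2
    pub flag: u8,
}

/// Every variant starts with the same u32 tag at offset 0.
#[repr(C)]
pub union Event {
    pub event_id: u32,
    pub event1: Event1,
    pub event2: Event2,
}

pub fn describe(e: &Event) -> String {
    // SAFETY: all variants begin with an initialized u32 tag.
    match unsafe { e.event_id } {
        // SAFETY: the tag says which struct was written.
        EVENT1 => format!("event1 x={}", unsafe { e.event1.x }),
        EVENT2 => format!("event2 flag={}", unsafe { e.event2.flag }),
        other => format!("unknown event {}", other), // e.g. from a newer C library
    }
}
```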
Fine for me.
Woah. The struct rules make sense because they are all based on the fact that different fields are disjoint. You can't just invalidate that basic assumption and still use the same rules. The fact that you need an addendum to the rules shows that. I would never expect unions to be checked similarly to structs. If anything, one might expect them to be checked similarly to enums, but of course that cannot work, because enums can only be accessed via match.
I think it is extremely desirable for the basic validity assumptions to be dynamically checkable (given type information). Then we can check them during CTFE in miri, we can even check them during "full" miri runs (e.g. of a test suite), we can eventually have some kind of sanitizer or maybe a mode where Rust emits

I can hardly overstate how important I think it is to have dynamically checkable rules. I think we should aim to have zero uncheckable cases of UB. (We're not there yet, but it's the goal we should have.) That is the only responsible way to have UB in your language; everything else is a case of compiler/language authors making their life easier at the expense of everyone who has to live with the consequences. (I am currently working on dynamically checkable rules for aliasing and raw pointer accesses.)

That said, I see no fundamental reason why this should not be checkable: for every byte in the union, go over all variants to see which values are allowed for that byte in this variant, and take the union (heh ;) ) of all of those sets. A sequence of bytes is valid for a union if every byte is valid according to this definition.
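The per-byte definition above can be written down as a small model. This is a hypothetical sketch, not miri's implementation: each variant is described by one rule per byte offset (None meaning "any value", Some listing the allowed values), bytes past a variant's end are treated as unconstrained padding, and uninitialized memory is not modeled.

```rust
/// Allowed values for one byte: None = any value, Some(list) = only those values.
pub type ByteRule = Option<&'static [u8]>;

/// The only valid byte values for a `bool`.
pub const BOOL_VALUES: &[u8] = &[0, 1];
pub const BOOL: ByteRule = Some(BOOL_VALUES);

/// Per-byte rules for two single-byte variants.
pub const BOOL_LAYOUT: &[ByteRule] = &[BOOL];
pub const U8_LAYOUT: &[ByteRule] = &[None];

/// Does the rule at offset `i` of `variant` accept byte `b`?
fn allows(variant: &[ByteRule], i: usize, b: u8) -> bool {
    match variant.get(i) {
        None => true,       // past this variant's end: padding
        Some(None) => true, // any value allowed
        Some(Some(values)) => values.contains(&b),
    }
}

/// A byte sequence is valid for the union if, at every offset, at least one
/// variant accepts the byte found there (the union of the per-variant sets).
pub fn valid_bytewise(variants: &[&[ByteRule]], bytes: &[u8]) -> bool {
    bytes
        .iter()
        .enumerate()
        .all(|(i, &b)| variants.iter().any(|v| allows(v, i, b)))
}
```

For example, with a bool variant and a u8 variant every byte value is accepted, whereas a union of two bool variants still rejects the byte 2.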
What does that guarantee buy us? Where does it actually help? Right now, all I see is that everyone has to work hard and be careful to uphold it. I don't see the benefit we, the people, get out of that.
The model proposed by @petrochenkov allows those use cases, by adding a
A clarification: I meant uncheckable "by default"/"in release mode". Of course it can be checkable in "slow mode" with some extra instrumentation, but you already wrote about this better than I could.
Yes, I understood that that was the proposal.
They could, but they'd have to systematically add it to every single union. I have yet to see an argument for why it makes sense to break primary use cases of unions in favor of some unspecified use case that depends on limiting what bit patterns they can contain.
RalfJung referenced this issue on Jul 23, 2018: Do a basic sanity check for all constant values #51361 (Merged)
petrochenkov referenced this issue on Jul 29, 2018: dropck unsoundness: unions are ignored #52786 (Closed)
It's not obvious to me at all why this is the primary use case.
@petrochenkov I didn't say "break the primary use case", I said "break primary use cases". FFI is one of the primary use cases of unions.
There's certainly an attractive obviousness to a statement that "the possible values of a union are the union of the possible values of all its possible variants"... |
True. However, that's not the proposal: we all agree that the following should be legal:

```rust
union F {
    x: (u8, bool),
    y: (bool, u8),
}

fn foo() -> F {
    let mut f = F { x: (5, false) };
    unsafe { f.y.1 = 17; }
    f
}
```

Actually I think it is a bug that this even requires

So, the union has to be taken bytewise, at least.
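Using the same kind of hypothetical per-byte model as a sketch (it assumes each tuple's fields sit at offsets 0 and 1 in declaration order, which rustc does not guarantee for tuples), the bytes [5, 17] that foo leaves behind fail a "valid for some variant" check but pass the bytewise one:

```rust
/// Allowed values for one byte: None = any value, Some(list) = only those values.
pub type ByteRule = Option<&'static [u8]>;

const BOOL_VALUES: &[u8] = &[0, 1];
const BOOL: ByteRule = Some(BOOL_VALUES);

/// Assumed per-byte layouts of F's variants: x = (u8, bool), y = (bool, u8).
pub const X: &[ByteRule] = &[None, BOOL];
pub const Y: &[ByteRule] = &[BOOL, None];

/// Does the rule at offset `i` of `variant` accept byte `b`?
fn allows(variant: &[ByteRule], i: usize, b: u8) -> bool {
    match variant.get(i) {
        None | Some(None) => true, // padding or unconstrained byte
        Some(Some(values)) => values.contains(&b),
    }
}

/// "Union of variants": the whole byte sequence must fit a single variant.
pub fn valid_for_some_variant(variants: &[&[ByteRule]], bytes: &[u8]) -> bool {
    variants
        .iter()
        .any(|v| bytes.iter().enumerate().all(|(i, &b)| allows(v, i, b)))
}

/// Bytewise: each byte only has to fit some variant at its own offset.
pub fn valid_bytewise(variants: &[&[ByteRule]], bytes: &[u8]) -> bool {
    bytes
        .iter()
        .enumerate()
        .all(|(i, &b)| variants.iter().any(|v| allows(v, i, b)))
}
```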
I don't know about the new MIR-based unsafety-checker implementation, but in the old HIR-based one it was certainly a checker limitation/simplification: only expressions of the form
Answering the comment in #52786 (comment): So the idea is that the compiler still doesn't know anything about the

I'm not sure though how exactly the part of

First of all, regardless of fields being private or public, unexpected values cannot be written directly through those fields. You need something like a raw pointer, or code on the other side of FFI, to do it, and it can be done without any field access, just by having a pointer to the whole union. So we need to approach this from some direction other than restricting access to a field.

As I interpret your comment, the approach is to say that a private field (in a union or a struct, doesn't matter) implies an arbitrary invariant unknown to the user, so any operations changing that field (directly or through wild pointers, doesn't matter) result in UB because they can potentially break that unspecified invariant.

This means that if a union has a single private field, then its implementer (but not the compiler) can assume that no third party will write an unexpected value into that union. If some union wants to prohibit unexpected values while still providing

@RalfJung How are scenarios like this treated?

```rust
mod m {
    union MyPrivateUnion { /* private fields */ }
    extern {
        fn my_private_ffi_function() -> MyPrivateUnion; // Can return garbage (?)
    }
}
```
No, that is not what I meant. There are multiple invariants. I do not know how many we will need, but there will be at least two (and I don't have great names for them):
It would be nice to align these two concepts, but I do not think it is practical. First of all, for some types (function pointers, dyn traits), the definition of the custom, semantic invariant actually uses the definition of UB in the language. This definition would be circular if we wanted to say that it is UB to ever violate the custom, semantic invariant. Secondly, I'd prefer if the definition of our language, and whether a certain execution trace exhibits UB, was a decidable property. Semantic, custom invariants are frequently not decidable.
Essentially, when a type chooses its custom invariant, it has to make sure that anything that safe code can do preserves the invariant. After all, the promise is that just using this type's safe API can never lead to UB. This applies to both structs and unions. One of the things safe code can do is access public fields, which is where this connection comes from.

For example, a public field of a struct cannot have a custom invariant that is different from the custom invariant of the field type: after all, any safe user could write arbitrary data into that field, or read from the field and expect "good" data. A struct where all fields are public can be safely constructed, which places further restrictions on its custom invariant. A union with a public field... well, that's somewhat interesting. Reading union fields is unsafe anyway, so nothing changes there. Writing union fields is safe, so a union with a public field has to be able to handle arbitrary data that satisfies that field's type's custom invariant being put into the field. I doubt this will be very useful...

So, to recap: when you choose a custom invariant, it is your responsibility to make sure that foreign safe code cannot break this invariant (and you have tools like private fields to help you achieve this). It is the responsibility of foreign unsafe code not to violate your invariant when that code does something safe code could not do.
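A small sketch of a custom invariant protected by private fields, along the lines described above. The module, the NonNanF32 type, and its "never NaN" invariant are all hypothetical. Outside code can only build values through new, so the module may rely on its invariant; unsafe code elsewhere that forged a NaN into the union would break only this library-level invariant, not the layout invariant, since every bit pattern is a valid f32.

```rust
pub mod m {
    /// Union with private fields. Custom invariant (hypothetical): `float` is never NaN.
    #[derive(Clone, Copy)]
    pub union NonNanF32 {
        float: f32,
        bits: u32,
    }

    impl NonNanF32 {
        /// The only safe way to construct a value; it enforces the invariant.
        pub fn new(x: f32) -> Option<Self> {
            if x.is_nan() {
                None
            } else {
                Some(NonNanF32 { float: x })
            }
        }

        /// Never NaN, because safe code outside `m` cannot write the private fields.
        pub fn get(self) -> f32 {
            // SAFETY: any bit pattern is a valid f32; the non-NaN guarantee comes from `new`.
            unsafe { self.float }
        }

        /// Reads the same storage through the sibling field.
        pub fn to_bits(self) -> u32 {
            // SAFETY: any bit pattern is a valid u32.
            unsafe { self.bits }
        }
    }
}
```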
Correct (panic safety is a concern here, but you are probably aware of that). This is just like, in

```rust
let sz = self.size;
self.size = 1337;
self.size = sz;
```

and there is no UB.

```rust
mod m {
    union MyPrivateUnion { /* private fields */ }
    extern {
        fn my_private_ffi_function() -> MyPrivateUnion; // Can return garbage (?)
    }
}
```

In terms of the syntactic layout invariant,
I finally wrote that blog post about whether and when
SimonSapin referenced this issue on Oct 17, 2018: Tracking issue for RFC 2514, "Union initialization and Drop" #55149 (Open)
SimonSapin referenced this issue on Nov 25, 2018: Use a union to avoid UB with uninitialized &mut T #6 (Merged)
jacobrosenthal referenced this issue on Dec 14, 2018: When to go 1.0 and what edition should 1.0 target? #6 (Open)
Is there anything left to track here that’s not already covered by #55149, or should we close?
nikomatsakis commented on Apr 8, 2016 (edited)
Tracking issue for rust-lang/rfcs#1444.
Unresolved questions:
Copy for a union? For example, what if some variants are of non-Copy type? All variants?
Open issues of high import: