Join GitHub today
GitHub is home to over 31 million developers working together to host and review code, manage projects, and build software together.
Sign upICE when assigning to a union with fields implementing Drop #41073
Comments
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
Should this drop |
This comment has been minimized.
This comment has been minimized.
|
That println is distracting, so I reworded the example a bit: {
let mut test = Test { a: A(3) }; // test.a is initialized -> test.b is initialized, test is initialized
{
let tmp_b = test.b; // test.b is uninitialized (moved out) -> test.a is uninitialized, test is uninitialized
// tmp_b is initialized
} // tmp_b is dropped, B is printed
test.b = B(0.5); // test.b is initialized -> test.a is initialized, test is initialized
// test.b is not dropped on assignment because it was uninitialized
} // test is dropped (noop) |
This comment has been minimized.
This comment has been minimized.
|
And if you move out a subfield? #![feature(untagged_unions)]
union U {
x: (Box<u32>, Box<u32>),
y: Vec<u32>,
raw: [u64; 4]
}
fn main() {
let mut u = U { raw: [0; 4] };
unsafe {
drop(u.x.1);
if true {
u.y = Vec::new(); // what does this drop?
} else if true {
u.x = (Box::new(0), Box::new(1));
} else {
u.x.0 = Box::new(0); // and this?
}
}
} |
This comment has been minimized.
This comment has been minimized.
#![feature(untagged_unions)]
union U {
x: (Box<u32>, Box<u32>),
y: Vec<u32>,
raw: [u64; 4]
}
fn main() {
// init u.raw -> init (u.x, u.y) -> init u
let mut u = U { raw: [0; 4] };
unsafe {
// uninit u.x.1 -> uninit u.x -> uninit (u.y, u.raw) -> uninit u
// u.x.0 is still initialized
drop(u.x.1);
if true {
// init u.y -> init (u.x, u.raw) -> init u
// u.y was uninitialized, not dropped on assignment
u.y = Vec::new(); // what does this drop?
} else if true {
// init u.x -> init (u.y, u.raw) -> init u
// u.x was uninitialized, not dropped on assignment
u.x = (Box::new(0), Box::new(1));
} else {
// assign u.x.0
// u.x.0 was initialized, dropped on assignment
u.x.0 = Box::new(0); // and this?
}
}
}I tried to describe the scheme in https://github.com/petrochenkov/rfcs/blob/e5266bd105f592f7408b8592c5c3deaccba7f1ec/text/1444-union.md#initialization-state and https://github.com/petrochenkov/rfcs/blob/e5266bd105f592f7408b8592c5c3deaccba7f1ec/text/1444-union.md#move-and-initialization-checking |
This comment has been minimized.
This comment has been minimized.
|
That... sounds like something that would be fun to implement. |
arielb1
added
I-ICE
T-compiler
labels
Apr 6, 2017
This comment has been minimized.
This comment has been minimized.
|
I think people will be confused by these run-time semantics. Should we prohibit replacing into partially-initialized unions? |
Mark-Simulacrum
added
the
C-bug
label
Jul 22, 2017
shepmaster
referenced this issue
Mar 12, 2018
Closed
internal compiler error: librustc_mir/transform/elaborate_drops.rs — drop of untracked, uninitialized value #48962
nikomatsakis
added
the
NLL-sound
label
Mar 19, 2018
This comment has been minimized.
This comment has been minimized.
|
Quite likely a duplicate of #48962 |
pnkfelix
added
the
WG-compiler-nll
label
Apr 3, 2018
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
@nikomatsakis It seems not to be related to #48962 . I could take a look at this |
This comment has been minimized.
This comment has been minimized.
|
When I tried to minimise example I found out that if either of |
This comment has been minimized.
This comment has been minimized.
|
What is the backtrace where the ICE occurs? |
This comment has been minimized.
This comment has been minimized.
|
Though reading the back-and-forth between @petrochenkov and @arielb1, it sounds like there is some confusion also as to what we should do upon assignment to a union field. I guess this has to do with whether to drop the pre-existing data or not (in part because the compiler doesn't have visibility into whether or not the union is initialized). I don't know that we can blame this ICE on NLL per se. |
This comment has been minimized.
This comment has been minimized.
|
This is not an NLL problem - it occurs even without Note that the move semantics were removed from the newest version of the union spec. However, I'm not sure that implementing the specification as written is the best idea - these move semantics interact in a very weird and inconsistent-seeming way with assignments. union U {
a: (Box<u32>, Box<u32>),
b: (Box<u32>, Box<u32>),
}
let mut u = U { a: (Box::new(0), Box::new(1)) };
// u.a & u.b are both initialized
u.b.0 = Box::new(2); // this would call a destructor
drop(u.a.0);
// u.a.0 is uninitialized, u.b is also now uninitialized,
println!("{:?}", u.a.1); // prints `1`
u.a.1 = Box::new(3); // this would call a destructor
// u.b is *still* uninitialized, same as before.
println!("{:?}", u.b.1); //~ ERROR use of moved value
u.b.1 = Box::new(4); // this would *not* call a destructor, and would leak a `Box`I feel that implementing this "as specified" will lead to a situation where it will be hard to predict whether writing to a union field will call a destructor or not. I can think of 2 ways of fixing this inconsistency:
|
arielb1
removed
WG-compiler-nll
NLL-sound
labels
Apr 10, 2018
This comment has been minimized.
This comment has been minimized.
|
cc #32836 - this has to resolved one way or another before |
arielb1
referenced this issue
Apr 10, 2018
Open
Untagged unions (tracking issue for RFC 1444) #32836
This comment has been minimized.
This comment has been minimized.
|
@arielb1 I'm looking at the example you gave, and that seems like unsafe code doing something wrong, by accessing That said: this seems like it only applies to union fields that implement |
This comment has been minimized.
This comment has been minimized.
I'd rather prefer something like 1 with more errors to something like 2 with double frees, to keep "union newtypes" with a single field like |
This comment has been minimized.
This comment has been minimized.
|
FWIW, I still see the behavior of code in #41073 (comment) as logical and "following from the first principles", but I agree that nobody would want to deal with such puzzles in practice, so I'm all for generating more errors for sanity's sake even if they are "artificial" in some sense. |
This comment has been minimized.
This comment has been minimized.
I don't think union "newtypes" with a single field should behave like newtypes at all. Unions are inherently unsafe and only when accessed can we make any assumptions about what they contain. I think an "enum interpretation" is incorrect. Unions were added primarily for FFI purposes, and in C they most certainly do not have an "enum interpretation". For example, there are libraries that use unions as a way to be extensible -- as the library evolves, new union variants are added, and this is binary compatible with old clients that know about fewer variants. (IIRC it was @joshtriplett who brought this up.) A single-variant union just means that this is the only variant we know of, but unless we actually use that variant, we should not assume that this is the variant we are in. TBH the most reasonable behavior for assigning to union fields that I can come up with is to never drop anything. That's at least predictable, and assignment is unsafe after all. I think any scheme that attempts to track union initialization state is bound to backfire. And I am certainly extremely surprised that initializing a field makes its siblings initialized. I would have expected the exact opposite---initializing a union field deinitializes its siblings. Why does the RFC make this strange choice? That's even more surprising if I consider that you want to follow an "enum interpretation", where certainly initializing one variant deinitializes the others. One point I am missing from the RFC: What is the initialization state when a union goes into scope, e.g., when it is passed to a function? |
This comment has been minimized.
This comment has been minimized.
Well, to support transmuting with unions
rust-lang/rfcs#1897 contains a number of examples that serve as motivation for the currently implemented rules, but I never got enough time to push it over finish line.
To pass something (not necessarily a union) to a function you have to borrow or move/copy it and you can't borrow/move/copy something uninitialized, so all function parameters are in initialized state initially. (EDIT: here and above I'm talking about initialization state as it's tracked by the compiler statically (with move checker) and dynamically (with drop flags), not stuff like |
This comment has been minimized.
This comment has been minimized.
Ah, that's a good point. I hadn't thought of this because I did not really accept the "writing to a union field initializes the siblings" rule, but with that rule, this actually makes sense. However, doesn't that mean if I have two types that implement union Foo {
x: String,
y: Box<i32>
}and then I do something like let u = Foo { x: "abc".to_owned() };
u.y = Box::new(42);then everything explodes because this will cause The more I look at this, the more I think that maybe having union Foo {
x: ManuallyDrop<String>,
y: ManuallyDrop<Box<i32>>
}
Yeah, it's a huge mess :/ Also, I wasn't even ware of the extent of this RFC. Clearly I didn't do my research. Sorry for that! I hope to find some time to read it soon, that's lots of text. |
This comment has been minimized.
This comment has been minimized.
Yes, that's why assignments to u.y = Box::new(42); //~ ERROR assignment to non-`Copy` union field requires unsafe function or block |
This comment has been minimized.
This comment has been minimized.
Fair enough. This is still a footgun though, e.g. all this may happen in an |
This comment has been minimized.
This comment has been minimized.
|
I do think that unions with Drop fields are a massive footgun regardless. They do require |
This comment has been minimized.
This comment has been minimized.
|
I am reminded of what I recently said in the context of references to fields of a packed struct:
Notice that assigning to a union field is always safe in C; only reading it can yield UB. |
This comment has been minimized.
This comment has been minimized.
Unions in (standard) C/C++ have much stronger enum interpretation than in Rust, C and C++ have a notion of "active field" of a union and you can't even access non-active fields (i.e. transmute with unions) in general. (Of course, C/C++ have much looser static analysis so you can create and pass around uninitialized unions, but accessing them is still UB.) |
This comment has been minimized.
This comment has been minimized.
AFAIK the following is legal:
You cannot access non-active fields through pointers, but you are generally allowed to access it "through the union type". (The standard is somewhat ambiguous about this, but Wikipedia agrees and GCC explicitly says it is okay.) Above, we were talking about accessing the union directly, i.e. through the union type, in which case C/C++ does not apply restrictions based on the "active variant". The restriction for accessing the "wrong" field through a pointer is necessary to make type-based alias analysis work. I didn't check the old standards, but I'd guess before the aliasing restrictions were introduced, unions were purely access-based as well. |
This comment has been minimized.
This comment has been minimized.
|
@RalfJung rust-lang/rfcs#1897 proposes roughly the same model, i.e. "enum" interpretation is the base, so there's always an active field that's guaranteed to be valid, but you can also access inactive fields at your own risk as an extension. |
This comment has been minimized.
This comment has been minimized.
I actually disagree. I think the enum one exists solely to enable type-based alias analysis (which we decisively do not want in Rust); if it weren't for that, C would be just fine without any notion of an "active variant".
I'm not at all happy about this proposal. This extra hidden state makes things much more complicated. I think a model where unions are just bytes, and then on every read/write we decide what to do with these bytes, is much simpler. For example, imagine you'd want to extend miri to actually be able to fully check violations of these rules for accessing union fields. I think that is a very desirable feature to have---it's not just practically useful (we could use miri to test for UB violations in unsafe code), it also means that it is even possible to have such a checker. IMO the fact that such a checker is likely impossible for C is one of the biggest faults of the language, and a mistake I'd like to not repeat in Rust. Given that we don't want TBAA, I see no reason to adapt an "enum interpretation". It's a much more complicated model, and I don't see any tangible benefits spelled out in the RFC. |
Thinkofname commentedApr 4, 2017
Full backtrace can be seen on the playground
https://play.rust-lang.org/?gist=bc93f5b222b09b31d3fd286cfc10858a&version=nightly&backtrace=2