Matching on uninhabited unsafe places (union fields, raw pointer dereferences, etc.) allowed in safe code. #47412
Comments
nagisa added the T-lang label Jan 13, 2018
Seems like union elements ought to be inhabited.
nikomatsakis added the I-unsound 💥, I-nominated, and T-compiler labels Jan 13, 2018
nikomatsakis referenced this issue Jan 13, 2018: Untagged unions (tracking issue for RFC 1444) #32836 (Open)
nikomatsakis changed the title from "Safe accesses to copy union fields allow invoking UB in safe code" to "Safe accesses to uninhabited (but Copy) union fields allow invoking UB in safe code" Jan 13, 2018
@nagisa points out that we can't tell if generic types are inhabited, so maybe such a fix is not viable.
cc @rust-lang/lang -- it's a bit tricky to decide what we should disallow here, though I'm leaning towards "safe access to union fields", or at least restricting the cases further (e.g., to those cases where we know the field type is not only Copy but also inhabited)
Also: @eternaleye points out that uninitialized values (e.g., on Itanium) may trap if you read from them, and the union RFC would seem to allow access to them from safe code. Leaning more and more towards "union fields should never be safe to access". =)
We've known for a long time that a bitcast between two types (with the same number of bits) is safe iff all possible bit patterns of each of the two types are valid and they correspond to distinct values.
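To make the bit-pattern criterion concrete, here is a minimal sketch (not code from the thread; the function names `bits_to_float` and `byte_to_bool` are made up for illustration). Every 32-bit pattern is both a valid `u32` and a valid `f32`, so the standard library can offer that reinterpretation as the safe function `f32::from_bits`. A `u8` to `bool` conversion, by contrast, has to be checked, because only the patterns 0 and 1 are valid for `bool`.

```rust
/// Reinterpret the bits of a `u32` as an `f32`.
/// Safe because every 32-bit pattern is a valid `f32`.
fn bits_to_float(bits: u32) -> f32 {
    f32::from_bits(bits)
}

/// Convert a byte to a `bool` only when its bit pattern is valid for `bool`.
fn byte_to_bool(byte: u8) -> Option<bool> {
    match byte {
        0 => Some(false),
        1 => Some(true),
        // Any other pattern would not be a valid `bool`, so refuse it.
        _ => None,
    }
}

fn main() {
    println!("{}", bits_to_float(0x3f80_0000));
    println!("{:?}", byte_to_bool(2));
}
```

An unchecked reinterpretation in the second case is exactly the kind of operation that has to stay behind `unsafe`.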
Is this actually UB, and should it be?
EDIT: Ah, never mind, I understand. I tried this on the playground, and the entire function compiles down to a single undefined instruction (
Also, I don't recall us ever making union field reads safe. Does this somehow work outside an unsafe block because the match doesn't actually do any pattern-matching? It seems like the reference to
(Also, good catch, @nagisa!)
@joshtriplett: It's not that "accessing" it is UB; it's that accessing it produces a value whose mere existence is UB, because
@eternaleye Thanks; yeah, it took staring at it for a while to realize that was the problem here.
Note that you can also get UB with
fn main() {
    union A { a: u8, v: u16 }
    let a = A { a: 1 };
    match a.v {
        _ => println!("Congrats, it's a u16!")
    }
}
as this reads one of the
@eternaleye That code won't compile; it'll generate
After experimenting with this a fair bit, it looks like
Attempting to compile @nagisa's original example should produce the same error E0133 at compile time.
Wait, I'm getting errors for reading, but @nagisa's example doesn't do this (pattern-matching isn't by-value unless there is a pattern that needs it to be).
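As an illustration of that point (a sketch, not code from the thread; `match_without_moving` is a made-up name): a match whose only arm is a wildcard binds nothing, so the scrutinee is not moved out of and remains usable afterwards.

```rust
/// Matches on a `Box` with only a wildcard arm, then uses the box afterwards.
fn match_without_moving() -> i32 {
    let x = Box::new(22);
    match x {
        // `_` binds nothing, so `x` is not moved by this match.
        _ => {}
    }
    // `x` is still owned here and can be dereferenced.
    *x
}

fn main() {
    println!("{}", match_without_moving());
}
```

Replacing `_` with a binding such as `y` would move the box into `y`, and the final `*x` would then be rejected by the compiler.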
@joshtriplett Yeah, I just noticed that - I'd been taking at face value that @nagisa's example was reading out.
OK, so perhaps the problem is more narrow.
In particular, matches with empty arms are somewhat special -- they act as an "assertion" of sorts that the path in question is valid. This probably means we forgot to account for that as a kind of read.
UPDATE: Some discussion on IRC where I spell out a bit more of the background
I think this is just a corner case that we didn't catch (or have a test for) in the union implementation, namely, that a match on a union field with no patterns wasn't treated as unsafe. (That said, on the off chance someone was relying on this, such as via some kind of generic code and macros, when we fix it we should probably do a crater run.)
Here's a test case that should not compile:
It currently compiles and prints "should not be allowed"; it should not compile at all.
The
#![feature(untagged_unions)]
fn main() {
    enum Void {}
    union A { a: (), v: Void }
    let a = A { a: () };
    match a.v {
    }
}
It looks like this was changed between nightly-2017-09-23 and nightly-2017-09-29. Maybe caused by #44700?
@joshtriplett Not just "no arms", but "no by-value arms" - this should also require
fn main() {
    union A { a: u8, v: u16 }
    let a = A { a: 1 };
    match a.v {
        _ => println!("Congrats, it's a u16!")
    }
}
@eternaleye I'm fine with making that unsafe temporarily, however, it's not clear that it will eventually have to be. In particular, I think that
fn main() {
    let x = Box::new(22);
    match x {
        _ => { }
    }
}
As I wrote on IRC, I believe matches with no arms have to be considered somewhat special here. Admittedly this needs to be written up more formally and documented.
Another way to look at it: with a match with no arms, there is nowhere to branch to! So if that code is ever reached, that is UB. But with a single
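The "nowhere to branch to" observation is what makes an empty match legitimate on a value that can never exist. A small sketch in safe code, where the names `Void` and `unwrap_infallible` are chosen here for illustration and are not taken from the thread:

```rust
/// An uninhabited type: no value of `Void` can ever be constructed.
enum Void {}

/// Unwraps a result whose error type is uninhabited.
fn unwrap_infallible(r: Result<u32, Void>) -> u32 {
    match r {
        Ok(v) => v,
        // No arms are needed: there is no `Void` value to branch on,
        // so this arm can never actually execute.
        Err(never) => match never {},
    }
}

fn main() {
    println!("{}", unwrap_infallible(Ok(7)));
}
```

This is fine because `never` can only come from safe code that could not have produced it. The issue under discussion is the same empty match applied to a place, such as a union field, where the type system has no such guarantee.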
Oh, wait, my example is bogus =) and actually the example I thought would compile didn't:
fn main() {
    let x = Box::new(22);
    drop(x);
    match x {
        _ => { }
    }
}
Nonetheless, I think this can come up. I'll play around some more. =)
OK, so, in the MIR-based borrowck, these examples do work as I expected:
#![feature(nll)]
fn main() {
    let pair = (Box::new(22), Box::new(22));
    drop(pair);
    match pair {
        _ => { }
    }
}
I will try to write up a more thorough "proposal" of some kind regarding these validity predicates. I've tried in the past but each time I get stuck trying to figure out how much background to give.
Reflecting some discussion from IRC back here: my proposal to address this is that naming a union field in a match should always require an unsafe block, even if the match doesn't name the field value or apply any patterns to the field value. That includes only having a
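For user code, the proposal amounts to writing any match that names a union field inside `unsafe`. A minimal sketch, assuming a hypothetical union `IntOrBytes` and helper `bytes_of` that do not appear in the thread:

```rust
/// A union whose two fields share the same four bytes.
union IntOrBytes {
    int: u32,
    bytes: [u8; 4],
}

/// Returns the native-endian bytes of `value` by reading the other union field.
fn bytes_of(value: u32) -> [u8; 4] {
    // Constructing a union is safe.
    let u = IntOrBytes { int: value };
    // Naming the union field as the match scrutinee goes inside `unsafe`.
    // The read is sound here because every bit pattern is a valid `[u8; 4]`.
    unsafe {
        match u.bytes {
            b => b,
        }
    }
}

fn main() {
    println!("{:?}", bytes_of(0x0102_0304));
}
```

The `unsafe` block marks the place where the programmer takes responsibility for the field holding a valid value, which is the obligation an uninhabited field can never meet.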
So, to clarify something that @joshtriplett alluded to but didn't make explicit: There are two interesting questions to clarify. At what point do we have UB, and when is unsafety required? Clearly, unsafety must be required for any case that could cause UB, but it may also be required more broadly. I think it's reasonable to require unsafe more broadly, especially to start. But I think we should also write up and nail down the cases where UB could occur. And I think we may find value in helping the user identify the intersection and calling special attention to those cases where UB could actually occur.
@petrochenkov Does regression-from-stable-to-nightly apply here? The bug now exists in current stable.
petrochenkov added the regression-from-stable-to-stable label and removed the regression-from-stable-to-nightly label Jan 13, 2018
Oops, wrong label.
There is no MIR, even dead code, to generate for
@eddyb wrote:
As part of fixing #27282 I am currently experimenting with adding MIR constructs that represent "start a (pseudo-)borrow of the discriminant for a
It's possible we might leverage that work to represent the accesses in question here.
triage: P-high
Well, this is a regression. We ought to fix it. Assigning to pnkfelix and myself to figure out how to get this fixed.
rust-highfive added the P-high label and removed the I-nominated label Jan 25, 2018
nikomatsakis assigned nikomatsakis and pnkfelix Jan 25, 2018
@nox just demonstrated this by using
EDIT: can't we just always add just a dummy
Lines 182 to 188 in 29c8276
And the new
(Wait, why would that raw pointer dereference be allowed in safe code? Or is this an analogy for the effect that was achieved, and not itself the code that was used to do it?)
@glaebhoerl Because the check is done on MIR, and this entire issue is about the dereference/union field access not ending up in MIR because it's never read from/written to by
(Ah I see, I was looking for the union field access in there and wasn't following the details closely enough to see the analogy.)
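For reference, the rule the raw-pointer case in this exchange is supposed to obey can be sketched as follows (an illustration only; `read_through_raw` is a made-up name, not code from the thread): creating a raw pointer is safe, while dereferencing it is an unsafe operation.

```rust
/// Reads an `i32` through a raw pointer derived from a live reference.
fn read_through_raw(value: &i32) -> i32 {
    // Creating a raw pointer from a reference is safe.
    let ptr: *const i32 = value;
    // Dereferencing it is an unsafe operation. It is sound here because
    // `ptr` comes from a reference that is still live.
    unsafe { *ptr }
}

fn main() {
    println!("{}", read_through_raw(&5));
}
```

The bug discussed above is that a match which never reads or writes the dereferenced place left no trace in MIR, so a check that runs on MIR had nothing to flag.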
I think that's basically what we need to add, yes. At least, it'd be good to do this for now, and maybe revisit later if we want to think about a more "elegant" fix.
nikomatsakis assigned eddyb and unassigned nikomatsakis and pnkfelix Feb 8, 2018
Assigning to @eddyb to do something for now to close the gaping hole.
@nikomatsakis Small problem, that approach also doubles up some MIR borrowck errors, e.g.:
let mut x = 0;
let r = &mut x;
match *x { ... }
*r += 1;
Because of the added dummy access,
eddyb changed the title from "Safe accesses to uninhabited (but Copy) union fields allow invoking UB in safe code" to "Matching on uninhabited unsafe places (union fields, raw pointer dereferences, etc.) allowed in safe code." Feb 9, 2018
eddyb referenced this issue Feb 9, 2018: rustc_mir: insert a dummy access to places being matched on, when building MIR. #48092 (Merged)
@eddyb Seems OK for now.
nagisa commented Jan 13, 2018
With the following code
it is possible to invoke undefined behaviour in safe code without using unstable features.