-
Notifications
You must be signed in to change notification settings - Fork 57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
What are the values of a union type? (in particular, what is the validity invariant of a union) #438
Comments
As I was trying to figure out how to evaluate the possibilities, I started thinking that |
So... basically, let's imagine what could be done if we didn't have to support code like this? I first thought "not much". While validity might be defined as "any of the variants is valid", it becomes harder to define what happens on a typed copy. Consider a union of But there's actually a way to achieve that: we could say that a value of union type is a non-empty set of values of possible variants. So in Another option would be to angelically pick which variant to do the copy at, but I'd rather avoid angelic non-determinism if possible. That definition should give us maximal niche power. However, we also probably don't want it for Also, implementing this in Miri will be quite terrible... |
Could we have different |
I think one goal is for |
What optimizations can we get from For For niches, doing layout optimizations on unions does sound like it would have some plausible value in general, but I expect a significant fraction of union use either doesn't have niches or the union is implementing tag packing itself in which case the user doesn't mind if the union never exposes a niche. Is there more value to exposing niches than permitting enum layout optimizations? |
Is it? I don't think so. It's not my goal, anyway. ;)
When and how do you want safe code to be able to read union fields?
Hm, good question. That would depend on how smart LLVM is about these attributes, but of course we can always look to make it smarter. The problem @scottmcm ran into that made him ask for union |
XD I've mentioned this to you before Ralf. I thought that's what you were already talking about when you said "there is some desire to make unions have validity invariants". Many people want a future rust to support safe union field access when the fields of the union satisfy some sort of specific case that the compiler can automatically check by itself. Whatever is decided for unions now should hopefully allow for that to happen some day. Being intentionally vague, and specifically expecting the reader to interpret the following words in the most positive light, the assumption is usually something like: "if the safe-transmute project completes, and safe-transmute could convert all field types to each other, then that union should obviously be fine to use from safe code."; So something like |
I tried to explore this a bit, but I don't know how successful I was. https://llvm.godbolt.org/z/4K9q9d3Yq If I put a typed load in a function, define i32 @_ZN7example3foo17h02b60a93fb12ffb0E(i32 %0) {
start:
%x = alloca i32, align 4
store i32 %0, ptr %x, align 4
%_2 = load i32, ptr %x, align 4, !noundef !1
ret i32 %_2
} That just gets completely optimized out today, as far as I can tell define i32 @_ZN7example3foo17h02b60a93fb12ffb0E(i32 returned %0) {
ret i32 %0
} Though Alive says it would be legal for it to mark the return type as So I don't know if the typed load would be sufficient in practice to get |
@Lokathor IIRC my stance in that discussion was that that isn't even a validity invariant question, it is primarily a safety invariant question. Or at least, it is certainly the case that safety invariants are sufficient to achieve what you are asking for, since it's all about "how can we soundly expose operation X to safe code". So to make this a validity invariants requires stronger motivation.
I was mostly thinking of @scottmcm, and of people asking for niches in unions. @scottmcm that's not quite how the code would look, would it? If this is a getter Was there a specific optimization you were looking for in the iterator code that went missing, or was it more a general concern without concrete examples? |
When loading data at union type from one place and storing it in another -- what exactly is being required about the bytes, and what is preserved? In other words, what is the representation relation of a union type, and what is even the set of values that can be generated by a union-typed load?
Originally I intended the answer to be "the values are just lists of raw bytes as large as the union". IOW, a union has no validity requirements whatsoever, and all data inside it is perfectly preserved. However, reality disagrees. With the C ABI, some of the bytes in a union value do not always get preserved -- there can be padding. Also there is some desire to make unions have validity invariants, so that e.g. a
union { usize, *const T }
isnoundef
.On the other hand, it is certainly not the case that the value of a union is "a value of one of its fields". Apart from the problem about writing
decode
for such a specification, it can be violated in safe code.Note that for the purpose of this, we do assume that the LLVM type
iN
will perfectly preserve N bytes of memory. LLVM code generated from Rust can never put poison in memory (as that would not be preserved on a per-byte level, the entireiN
gets "infected" if any byte is poison), which is currently accurate (Rust code has UB for any situation where the generated LLVM code might produce poison). LLVMiN
values carrying provenance is incoherent but mostly works in practice and LLVM does not offer us a better option. We need a "byte" type to properly express our intent to LLVM; until that exist, we make do with what we have. If you want to discuss LLVM issues, please open a new thread -- this one here is solely about the Rust AM semantics.Possible designs
MiniRust decides that a value of union type is a list of "chunks" where the bytes are perfectly preserved, but there can be gaps between those chunks where data is lost of a copy. The extent of these chunks can be computed from a Rust union type layout. That is sufficient to model the reality of union padding. The effective validity invariant here is still "any list of bytes allowed".
The
noundef
property could be achieved if we refine this "chunk" idea a bit further: every byte in a union is either "discarded", "preserved", or "initialized". Bytes which are padding in all variants get "discarded"; bytes which are padding only in some variants get "preserved"; bytes which are "in the data part" for all variants get "initialized". A value of union type is a list of bytes the size of the union with the constraint that "discarded" bytes must beUninit
and "initialized" bytes must not beUninit
. The validity invariant requires that all "initialized" bytes must be initialized -- and that's it; "discarded" bytes get flushed toUninit
on adecode
. Then the Rust typeunion { usize, *const T }
would translate to a union type where all bytes are "initialized", and we could emitnoundef
attributes for such a type.(I don't want multiple distinct AM values to be equivalent, hence the 'value of union type' predicate demands that all "uninit" bytes are truly
Uninit
, but of course the validity invariant allows those bytes to be anything. Note that the validity invariant is a predicate over "list of bytes", whereas "what are the values of this type" is a predicate overValue
. The chunks idea achieves this by just not having the padding bytes in theValue
; we could do the same here but that would be formally more awkward to express I think. Maybe I should just link to a MiniRust patch here, that might be more clear than trying to say this in English.^^)We could in principle go further and tro to achieve something along the lines of "if a byte is non-null in each variant, it is also non-null in the union", but that becomes increasingly complicated -- in particular it requires the "generate union type description from Rust type" logic to encode more and more knowledge about validity invariants, when the entire intent of this "value representation" business was that the validity invariant is implicitly encoded by the representation relation.
What other designs should we consider? And are they worth the complexity?
The text was updated successfully, but these errors were encountered: