Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Transparent Unions and Enums #2645

Open
wants to merge 26 commits into
base: master
from

Conversation

Projects
None yet
@mjbshaw
Copy link

mjbshaw commented Feb 25, 2019

Let's allow #[repr(transparent)] on unions and univariant enums that have exactly one non-zero-sized field (just like structs).

Rendered.

Pre-RFC Discourse discussion on internals.rust-lang.org.

@mjbshaw mjbshaw changed the title Transparent Unions RFC Transparent Unions Feb 25, 2019

Show resolved Hide resolved text/0000-transparent-unions.md
Show resolved Hide resolved text/0000-transparent-unions.md Outdated
Show resolved Hide resolved text/0000-transparent-unions.md Outdated
Show resolved Hide resolved text/0000-transparent-unions.md Outdated
Show resolved Hide resolved text/0000-transparent-unions.md Outdated
Show resolved Hide resolved text/0000-transparent-unions.md Outdated
Show resolved Hide resolved text/0000-transparent-unions.md Outdated
Show resolved Hide resolved text/0000-transparent-unions.md Outdated
Show resolved Hide resolved text/0000-transparent-unions.md Outdated
Show resolved Hide resolved text/0000-transparent-unions.md Outdated

Centril and others added some commits Feb 25, 2019

Update text/0000-transparent-unions.md
Co-Authored-By: mjbshaw <github@mjb.io>
Update text/0000-transparent-unions.md
Co-Authored-By: mjbshaw <github@mjb.io>
Update text/0000-transparent-unions.md
Co-Authored-By: mjbshaw <github@mjb.io>
Update text/0000-transparent-unions.md
Co-Authored-By: mjbshaw <github@mjb.io>
Update text/0000-transparent-unions.md
Co-Authored-By: mjbshaw <github@mjb.io>
Update text/0000-transparent-unions.md
Co-Authored-By: mjbshaw <github@mjb.io>
Update text/0000-transparent-unions.md
Co-Authored-By: mjbshaw <github@mjb.io>
Update text/0000-transparent-unions.md
Co-Authored-By: mjbshaw <github@mjb.io>
Update text/0000-transparent-unions.md
Co-Authored-By: mjbshaw <github@mjb.io>
Update text/0000-transparent-unions.md
Co-Authored-By: mjbshaw <github@mjb.io>
Update text/0000-transparent-unions.md
Co-Authored-By: mjbshaw <github@mjb.io>
Update text/0000-transparent-unions.md
Co-Authored-By: mjbshaw <github@mjb.io>
Update text/0000-transparent-unions.md
Co-Authored-By: mjbshaw <github@mjb.io>
Update text/0000-transparent-unions.md
Co-Authored-By: mjbshaw <github@mjb.io>
@mjbshaw

This comment has been minimized.

Copy link
Author

mjbshaw commented Feb 26, 2019

Thanks for the review, @Centril! I'm still working on getting to those last remaining items you mentioned, but in the meantime I implemented the feature (including for univariant enums). It probably needs some fixing, and definitely needs some tests, but it can serve as a starting point for what the feature might look like in real life.

@mjbshaw mjbshaw changed the title Transparent Unions Transparent Unions and Enums Feb 27, 2019

@Centril Centril added the A-enum label Feb 27, 2019

@gnzlbg

This comment has been minimized.

Copy link
Contributor

gnzlbg commented Mar 17, 2019

Thanks, it's good to know what LLVM currently does.

@mjbshaw

This comment has been minimized.

Copy link
Author

mjbshaw commented Mar 17, 2019

When an indeterminate value participates in an lvalue-expression, the behavior is undefined (EDIT: C11 6.2.6.1p5). Calling a function with an indeterminate value makes it participate in an lvalue expression.

That's the C standard, not the ABI. Hand-written assembly can follow the C ABI of a platform and still exhibit well-defined behavior when using an indeterminate value. But yes, passing an indeterminate value across FFI to code that isn't prepared for it can easily result in UB.

@gnzlbg

This comment has been minimized.

Copy link
Contributor

gnzlbg commented Mar 17, 2019

@mjbshaw

Hand-written assembly can follow the C ABI of a platform and still exhibit well-defined behavior when using an indeterminate value.

I believe this is correct.

passing an indeterminate value across FFI to code that isn't prepared for it can easily result in UB.

I'm not sure.

The question is whether the Rust toolchain can assume that the code being interfaced with via extern "C" is actually C, or not.

I think Rust should be able to assume that. If the assumption does not hold, e.g., because the code was Rust or C++, then we should write extern "Rust" or extern "C++" instead.

That is, I believe the Rust toolchain should be able to optimize extern "C" FFI code as if the code being interfaced with is C.

Otherwise we have a big problem today, because we already support -C cross-lang-lto, which allows at link-time const propagation across FFI boundaries. That is, if we call a C function with an indeterminate value from Rust, the C optimizer is today, already able to see it and optimize accordingly.

Now, @RalfJung has mentioned that LLVM does not perform this optimization today, but it is a sound optimization for C code, so a future version of LLVM or a different backend is allowed to perform it.

@gnzlbg

This comment has been minimized.

Copy link
Contributor

gnzlbg commented Mar 17, 2019

I therefore think that this code unconditionally has undefined behavior (as opposed to being easily able to trigger it). Otherwise, when using cross-lang LTO, we'd need to optimize C code not according to C rules, but according to some other rules, e.g., where calling a function with an indeterminate value, or where returning an indeterminate value from a function, would not be UB.

@RalfJung

This comment has been minimized.

Copy link
Member

RalfJung commented Mar 17, 2019

The points I raised IMO show that this optimization is NOT sound today -- LLVM can produce code that passes indeterminate values as argument e.g. during uninlining.

Also, extern "C" is used to interface with all sorts of C-compatible languages. I don't think we can assume there is C code on the other side. extern "C" juts defines the calling convention, not the programming language.

I'd be very cautious about optimizations of this style. There's no consensus for how indeterminate values behave inside a C program, let alone how they pass across linking.

@RalfJung

This comment has been minimized.

Copy link
Member

RalfJung commented Mar 17, 2019

So, to be clear: IMO poison should be treated like any other value for FFI. We don't ask "is it UB to pass 0 to an extern function", because the answer is "it depends": if the function is defined as int foo(int x) { return 100 / x; }, then it is UB, but for other definitions it is not. The same is true, in general, for poison.

Now, you say it is impossible to write a function in C that actually can take a poison parameter. I think this is far from clear, because the status of poison in C is rather indeterminate (heh ;). Every implementation of memcpy relies on indeterminate lvalues, and hence is UB under a strict reading of that clause you quoted. But, also, I don't even think it matters here -- other languages that we talk to with the C ABI could make a different choice, or maybe we are talking to C code that was compiled with a compiler that does allow indeterminate values. We don't get to dictate the language or language dialect used by the other side of this FFI interface.

@rkruppe

This comment has been minimized.

Copy link
Member

rkruppe commented Mar 17, 2019

re: passing uninitalized data through FFI, I agree that it is not automatically UB (@gnzlbg when we were talking on Discord earlier today, I believe I said otherwise, sorry for the brainfart). However, it is an extremely niche and minor optimizations, and it depends on a ton of detailed and brittle knowledge of the C side of things to be able to justify it. Plus, even in other languages where certain uses of uninitalized data are not immediate UB it is still most likely very easy to get UB from them, so you still need to audit exactly what the callee is doing with the argument. So if this is the only reason why someone would want to pass a MaybeUninit by value in FFI, I have to say: please don't do that for your own sake and also please don't change the language to be able to do it.

That leaves the question of memory layout compatibility. @mjbshaw has explained sufficiently why they have a preference for guaranteeing this layout compatibility by first adding repr(transparent) to the language and then marking MaybeUninit as such. I have written some about why I think using the existing tools is not just sufficient but preferable, but that is not the important part.

Instead I only want to highlight that repr(transparent) unions are not needed to achieve that: by now it seems uncontroversial that we can enable it without changing anything about the language or making MaybeUninit in any way special or magic. This is in contrast to the picture presented when the decision to merge was made. I therefore think the ongoing FCP should be cancelled and the RFC should be evaluated anew under this light.

Disclaimer: it's obvious that I personally do not want this RFC merged, but of course I would accept if the consensus was to merge. I just think that it's very likely some people ticked their boxes under a wrong impression of the benefits of this RFC.

@gnzlbg

This comment has been minimized.

Copy link
Contributor

gnzlbg commented Mar 17, 2019

So I talked with @RalfJung for a while about this on Zulip.

I apologize. I was incorrectly reading extern "C" literally as "interfacing with something that behaves exactly like C" not only ABI-wise, but also at the semantic level (e.g. such code never returns an indeterminate value). @RalfJung (and I suppose @mjbshaw) correctly read it as interfacing via the C ABI of the platform, period. The ABI doesn't care about indeterminate values (bits are bits).

So basically, when I said above that something is UB, it is only UB if the code at the other side of the FFI is actually C. That is, code like this

extern "C" { fn foo() -> MaybeUninit<u32>; } 
let x = unsafe { foo() };

is useful and has defined behavior, it is just not code that one would write to interface with actual C (sorry @mjbshaw I think I was talking past you on this issue in some of the other comments).

@gnzlbg

This comment has been minimized.

Copy link
Contributor

gnzlbg commented Mar 17, 2019

The discussion about optimizations are a bit orthogonal to the rest, but when @RalfJung mentioned:

The points I raised IMO show that this optimization is NOT sound today -- LLVM can produce code that passes indeterminate values as argument e.g. during uninlining.

I think it is worth mentioning that while this optimization is not sound for all FFI code, it is sound for C as well as for Rust code interfacing with C. This is a major use case of Rust FFI.

@mjbshaw

This comment has been minimized.

Copy link
Author

mjbshaw commented Mar 17, 2019

@rkruppe

without [...] making MaybeUninit in any way special or magic.

Either some guarantees would have to be made about all unions, or they would have to be made about MaybeUninit in particular (which would make it special).

This is in contrast to the picture presented when the decision to merge was made.

I don't think the picture has changed. @RalfJung even called this out explicitly before this RFC was written (bold emphasis mine):

Once that attribute exists, adding it to MaybeUninit would be a no-brainer. And in fact the logic for this has already been implemented in rustc (MaybeUninit<T> de-facto is ABI-compatible with T, but we do not guarantee that.)

People already knew that MaybeUninit<T> and T are de-facto ABI-compatible even without repr(transparent). However, this is part of the problem. That ABI compatibility isn't guaranteed. This could be guaranteed without repr(transparent), but like I said, this would either have to be done for all union types, or for MaybeUninit in particular (again, making it special).

One advantage I see to repr(transparent) (even if MaybeUninit is the only union in all of eternity to ever use it) is that it's machine-readable. Tools like RLS can inform an IDE that MaybeUninit is transparent with T, which can be a useful bit of information to surface to developers. These tools won't have to hard-code MaybeUninit's ABI-compatibility with T.

@rkruppe

This comment has been minimized.

Copy link
Member

rkruppe commented Mar 18, 2019

Either some guarantees would have to be made about all unions, or they would have to be made about MaybeUninit in particular (which would make it special).

As I said before, we can achieve memory layout compatibility without any special casing or language features by marking MaybeUninit as repr(C). Admittedly, I hadn't considered that it goes quite far in the other direction by ruling out ABI compatibility.

People already knew that MaybeUninit and T are de-facto ABI-compatible even without repr(transparent). However, this is part of the problem. That ABI compatibility isn't guaranteed. This could be guaranteed without repr(transparent), but like I said, this would either have to be done for all union types, or for MaybeUninit in particular (again, making it special).

That is not the change I was referring to (indeed it's not a change as you say). Instead, the aspect that was new in my opinion is that the use cases people actively argue for don't need ABI compatibility at all. If ABI compatibility was needed for something, repr(transparent) would be the natural way to guarantee it, but it turns out the code patterns that seem like good ideas and that people are willing to defend all only need layout compatibility.

... at least, until @gnzlbg pointed out to me on Discord that Rust-Rust communication can quite plausibly also happen through a C ABI (e.g., to get dynamic linking). While we still don't have a concrete example of real code needing this, it's a plausible enough sketch that I no longer want to claim that there are no use cases for transparent unions. Consider my objection to the FCP withdrawn. I'm still unconvinced by the arguments laid out in the RFC text and in this discussion, but it's good enough for me that I don't want to die on this hill.

@nikomatsakis

This comment has been minimized.

Copy link
Contributor

nikomatsakis commented Mar 20, 2019

@rfcbot concern how-important-is-this

I am raising this concern to echo @rkruppe's post here. I've not been able to follow this discussion in detail, but they make a fairly convincing case to me that we should think twice here!

(@rkruppe, if you feel this is resolved, let me know and I will lift the concern.)

@nikomatsakis

This comment has been minimized.

Copy link
Contributor

nikomatsakis commented Mar 20, 2019

Er, heh, I see the last paragraph of @rkruppe's post says this:

... at least, until @gnzlbg pointed out to me on Discord that Rust-Rust communication can quite plausibly also happen through a C ABI (e.g., to get dynamic linking). While we still don't have a concrete example of real code needing this, it's a plausible enough sketch that I no longer want to claim that there are no use cases for transparent unions. Consider my objection to the FCP withdrawn. I'm still unconvinced by the arguments laid out in the RFC text and in this discussion, but it's good enough for me that I don't want to die on this hill.

I'm going to go ahead and resolve the concern for now, but I .. hmm. Yeah, I think I still have some concerns, but it's probably about me not understanding deeply enough. I think I'll probably wind up coming to understand this better though as we incorporate some of this text into the UCG guidelines.

@rfcbot resolve how-important-is-this

This raises an interesting procedural question: I think that, going forward, RFCs like this should really be "executed" as part of the @rust-lang/wg-unsafe-code-guidelines process, and we should think about how that should work. But it's not a topic we have to discuss at this moment.

@rfcbot

This comment has been minimized.

Copy link

rfcbot commented Mar 20, 2019

🔔 This is now entering its final comment period, as per the review above. 🔔

@gnzlbg

This comment has been minimized.

Copy link
Contributor

gnzlbg commented Mar 20, 2019

FWIW I consider that use case a very long stretch of our imagination about what this RFC could be useful for.

We explored many use-cases for a long time on Discord, and all of them could be solved by just guaranteeing the layout of these unions.

The Rust-Rust C FFI dynamic library case was what we invented to justify this, but you need to control Rust code at both sides to make it work, and if that's the case, why aren't you using the Rust ABI ? There might be good arguments for that, but I would prefer if this RFC was put on hold until we merge an RFC guaranteeing layout for these types first, and then see what value would this add on top of that.

@rkruppe

This comment has been minimized.

Copy link
Member

rkruppe commented Mar 20, 2019

@gnzlbg The Rust calling convention is unstable, so the C calling convention (or others not defined by rustc) are your only choice if you want to link together two pieces of Rust compiled with different compiler versions. Most Rust types don't have a stable ABI either, so you can't use them in those interfaces, but MaybeUninit could be one of the types that work just fine there. (Of course, making is repr(C) would also achieve that, but as we discussed there will be cases where ABI transparency is desirable.)

@Centril

This comment has been minimized.

Copy link
Contributor

Centril commented Mar 20, 2019

There might be good arguments for that, but I would prefer if this RFC was put on hold until we merge an RFC guaranteeing layout for these types first, and then see what value would this add on top of that.

@gnzlbg Note that if/when this RFC is merged the actual implementation will go through the usual stabilization process. If you want to file an RFC you will have plenty of time to do so.

@gnzlbg

This comment has been minimized.

Copy link
Contributor

gnzlbg commented Mar 20, 2019

two pieces of Rust compiled with different compiler versions.

@rkruppe my point was that for the Rust-Rust dynamic lib over C FFI to work you kind of have to been in control of the Rust code on both places to make sure they make use over the right Rust types (if they only assume the code at the other side of the ABI is C then none of this applies).

That is, if you are in control of the Rust code at both sides, you can often compile it with the same compiler version. Sure, this is not always the case, but it feels like if.. if.. if.. if.. then this is useful, which is what i mean by a stretch.

@vi

This comment has been minimized.

Copy link

vi commented Mar 21, 2019

Shall this be allowed?:

#[repr(transparent)]
enum StaringIntoTheVoid {
   Empty(!),
   Lifeline(i32),
   DoublyNothing(!,!),
}

What is interaction between #[repr(transparent)] and the never_type?

@mjbshaw

This comment has been minimized.

Copy link
Author

mjbshaw commented Mar 21, 2019

@vi Interesting, I hadn't thought of that case. Currently the plan is to allow only single-variant enums to be transparent, so that wouldn't be allowed in the current plan. I think we could safely allow that in the future if desired (though I wouldn't pursue it without a good answer to how we should handle it if we changed the code to DoublyNothing(u8,!)).

@vi

This comment has been minimized.

Copy link

vi commented Mar 21, 2019

though I wouldn't pursue it without a good answer to how we should handle it if we changed the code to DoublyNothing(u8,!)

Obviously, the same as DoublyNothing(!,!) or std::convert::Infallible or other uninhabited type.

@Centril

This comment has been minimized.

Copy link
Contributor

Centril commented Mar 21, 2019

I think a consistent handling would look for singleton enums rather than "univariant" (i.e. semantic rather than syntactic approach). A semantic approach would make StaringIntoTheVoid legal. However, no one is going to realistically write StaringIntoTheVoid so I can't say I care unless consistency simplifies the definition/implementation.

@vi

This comment has been minimized.

Copy link

vi commented Mar 21, 2019

However, no one is going to realistically write ...

Macros and code generation may write such and other redundant things. Both #[repr(transparent)] and the uninhabited type may arrive though some macros or have associated #[cfg] trickery. Sometimes the enum would have two actual, inhabited variants, sometimes it would morph into a transparent wrapper around someting else. Using ! to delete enum's variant from within may be useful for macros/codegen.

@mjbshaw

This comment has been minimized.

Copy link
Author

mjbshaw commented Mar 22, 2019

My issue with enumerations like StaringIntoTheVoid are that:

I'm not opposed to these types of enums being transparent-able, but I think there are some open questions about how much work it is to do it and exactly what should and shouldn't be allowed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.