Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rustc: don't use Abi::ScalarPair for unions when padding is involved. #62298

Open
wants to merge 1 commit into
base: master
from

Conversation

@eddyb
Copy link
Member

commented Jul 2, 2019

See #60405 (comment) for the discussion (no issue was opened for that bug).
Assuming that we want to treat unions as opaque bags of bytes and preserve all of them uniformly, this PR disables the "scalar pair" optimization when it would cause some of those bytes to be ignored.

In the example given by @gnzlbg, (u8, u32), bytes 1, 2 and 3 are interior padding and Abi::ScalarPair, even if treated as aggregate by non-Rust calling conventions, will cause Rust copies of the whole value to only copy the u8 and u32 components separately, losing the original values in those 3 padding bytes (effectively replacing them with undef).

r? @rkruppe or @nagisa cc @RalfJung

@nagisa

This comment has been minimized.

Copy link
Contributor

commented Jul 2, 2019

The implementation seems fine but I’m not too pumped about the test. It seems like an extremely convulted way to test what’s being tested. I’m partial to codegen tests and there should probably also be a test which checks that ScalarPair still gets used where we want it to be used.

@eddyb eddyb force-pushed the eddyb:unpair-unions branch from 708178b to 6b6f3da Jul 2, 2019

@eddyb

This comment has been minimized.

Copy link
Member Author

commented Jul 2, 2019

@nagisa The annoying thing about a codegen test is that the calling conventions can make the code wildly different, and if you rely on optimizations to normalize things, the bug can disappear.

FWIW, I've updated the test since, to be more thorough (it now tests both Rust and C ABIs, with more types, and every byte of every type is checked).

Also, ideally, I'd use something like the proposed LLVM instruction freeze, instead of black_box, since what I want is to be able to read any byte of the union without triggering UB if it's undef.

@Centril

This comment has been minimized.

Copy link
Member

commented Jul 2, 2019

Assuming that we want to treat unions as opaque bags of bytes and preserve all of them uniformly, this PR disables the "scalar pair" optimization when it would cause some of those bytes to be ignored.

Note that this is by no means a settled matter and people shouldn't start assuming it is.

It seems to me that having the optimizations rather than disabling them is good in terms of having people not assume things because they just "seems to work". (In general, I think where rustc can be aggressive while preserving soundness, it should -- but that is not to say that this is easy.)

@RalfJung

This comment has been minimized.

Copy link
Member

commented Jul 2, 2019

It seems to me that having the optimizations rather than disabling them is good in terms of having people not assume things because they just "seems to work". (In general, I think where rustc can be aggressive while preserving soundness, it should -- but that is not to say that this is easy.)

The way I see it, this is part of an experiment to see if the semantics of unions being a bag of bytes is even achievable.


fn main() {
// NOTE(eddyb) we can't error for the `extern "C"` ABI in these cases, as
// the x86_64 SysV calling convention will drop padding bytes inside unions.

This comment has been minimized.

Copy link
@RalfJung

RalfJung Jul 2, 2019

Member

Why is there no .stderr file then if we expect warnings to be shown?

Also, would be nice to still test the first N bytes that we assume not to be part of the padding.

This comment has been minimized.

Copy link
@eddyb

eddyb Jul 3, 2019

Author Member

I don't think we capture runtime stderr, only rustc stderr.

This comment has been minimized.

Copy link
@eddyb

eddyb Jul 3, 2019

Author Member

Not to mention that only some targets will hit that.

@Centril

This comment has been minimized.

Copy link
Member

commented Jul 2, 2019

The way I see it, this is part of an experiment to see if the semantics of unions being a bag of bytes is even achievable.

@RalfJung I do like experimentation and data gathering! :) Could we perhaps add some explicit disclaimers around the code and the tests however to make the experimental nature clearer?

@RalfJung

This comment has been minimized.

Copy link
Member

commented Jul 2, 2019

Sure, that seems fair.

@rkruppe

This comment has been minimized.

Copy link
Member

commented Jul 7, 2019

So, considering rust-lang/unsafe-code-guidelines#156, do we still want to do this?

@RalfJung

This comment has been minimized.

Copy link
Member

commented Jul 16, 2019

I honestly don't know.^^ @eddyb @gnzlbg what difference does this make for rust-lang/unsafe-code-guidelines#156 ?

@gnzlbg

This comment has been minimized.

Copy link
Contributor

commented Jul 16, 2019

Without this PR, #[repr(Rust)] union U { x: (u8, u32) } has 3 bytes of padding that are not copied when the union is copied / moved.

This means that code that assumes that these bytes are copied on move is broken.

This is more restrictive than copying these bytes, and I think it is the conservative thing to do until rust-lang/unsafe-code-guidelines#156 is resolved. That is, I'd rather not merge this.


For #[repr(C)] union U { x: (u8, u32) }, the issue was trickier. C does not copy the padding, so in a round trip Rust->C->Rust it cannot be preserved. The open question is whether it is even possible to try to copy the padding here in the Rust side, e.g., when dealing with extern "C" fn foo(U) -> U { ... } definitions, that are potentially only ever called from Rust code. I recall @eddyb mentioning that the calling convention should not interact with this, and that maybe we could give these the same layout as a [u8; size_of::<U>()] to work around that, but I don't recall the details of why this was hard to achieve.

I think all of this should inform rust-lang/unsafe-code-guidelines#156 , but we should pick there the semantics that make most sense for Rust first, and worry about how to implement them later.

@RalfJung

This comment has been minimized.

Copy link
Member

commented Jul 16, 2019

but we should pick there the semantics that make most sense for Rust first, and worry about how to implement them later.

I'd love to do that, and fight fiercely for "bag of bits no matter the repr", but that seems to me like ignoring reality if we come up with an unimplementable semantics...

@JohnTitor

This comment has been minimized.

Copy link
Member

commented Jul 28, 2019

Ping from triage: any updates? @eddyb

@JohnTitor

This comment has been minimized.

Copy link
Member

commented Aug 4, 2019

Ping from triage: @eddyb and @rkruppe, any updates on this?

@rkruppe

This comment has been minimized.

Copy link
Member

commented Aug 4, 2019

This PR is sort-of blocked on the discussion of unions and paddings, and that discussion is currently stalled. We should probably close this PR and revisit when that discussion is settled. @RalfJung what do you think?

@RalfJung

This comment has been minimized.

Copy link
Member

commented Aug 4, 2019

I don't have a good handle on the union situation. Does rust-lang/unsafe-code-guidelines#156 mean we already know the result of the experiment?

But anyway this PR only affects MaybeUninit-like unions, if I understood the code correctly? Seems fair to close then. We first have to figure out some way to describe what happens for general unions (or lobby with LLVM to make our desired semantics implementable).

IIRC @eddyb mentioned another experiment where we basically add a [u8; size_of::<T>()] variant to the union, to "teach" LLVM the semantics we want. That sounds like a more interesting experiment to me. And maybe that needs something like this PR to make MaybeUninit-like unions not "slip through"?

@rkruppe

This comment has been minimized.

Copy link
Member

commented Aug 4, 2019

My understanding was that the main concern is that the desired "unions are bags of bits without padding" is not implementable at all, not just because C code may stomp over parts it considers padding, but also because some C ABIs (which pure Rust programs can use without any FFI or unsafe) may not even have a way to pass all the bytes of certain unions. The question to answer, then, is whether we want to give up on the dream entirely and have a less satisfying but reasonable uniform semantics of unions, or try to salvage unions-as-bags-of-bits in some subset of cases (presumably at the cost of more complicated semantics).

@RalfJung

This comment has been minimized.

Copy link
Member

commented Aug 4, 2019

Ah, right. Yes. So an option might be to make repr(Rust) unions bags-of-bits and do something more messy for others, or so. But that's indeed a topic for the UCG thread.

I think we also nee more data on what exactly does (not) get preserved for unions with more than one variant in C.

@joelpalmer

This comment has been minimized.

Copy link

commented Aug 12, 2019

Ping from Triage. Looks like this PR is still "sort-of blocked". I am going to change the label to waiting-on-author since a reviewer was the last to respond here. @RalfJung @rkruppe

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
9 participants
You can’t perform that action at this time.