Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validity of unions #73

Open
RalfJung opened this Issue Jan 10, 2019 · 11 comments

Comments

Projects
None yet
5 participants
@RalfJung
Copy link
Member

RalfJung commented Jan 10, 2019

Discussing the validity invariant of unions.

One possible choice here is "none, any bit pattern is allowed no matter which types the fields have, and including uninitialized bits".

We could also decide that e.g. a

union Foo { a: bool, b: (bool, u8) }

must start with the first byte being either the bit-pattern of false or the bit-pattern of true, because all fields agree on that invariant.

Notice that we cannot require the union to be valid for some field: for a union like

union Mix {
  f1: (bool, u8),
  f2: (u8, bool),
}

we want to allow a bit pattern like 0x3 0x3, which can occur from code like

let m = Mix { f1: (false, 3) };
m.f2.0 = 3;

There is no demonstrated benefit from disallowing such code, and this kind of code seems perfectly reasonable around unions.

Given that, any validity invariant that wants to restrict the set of allowed bit patterns will be rather complicated. However, such an invariant would enable us to e.g. layout-optimize Option<Foo>, whereas the "anything goes"-invariant would prohibit any kind of layout optimization around unions.

@RalfJung

This comment has been minimized.

Copy link
Member Author

RalfJung commented Jan 10, 2019

My personal preference is to allow any bit pattern, mostly to keep things simple. Unions are already complex enough, and they only occur in unsafe code, we should make this as simple to use for the programmer as we can.

If we ever desparately need to layout-optimize Option<Foo>, I propose we add attributes that let us control this -- something like a stable version of rustc_layout_scalar_valid_range_start.

@nikomatsakis

This comment has been minimized.

Copy link
Collaborator

nikomatsakis commented Jan 31, 2019

My personal preference is to allow any bit pattern, mostly to keep things simple. Unions are already complex enough, and they only occur in unsafe code, we should make this as simple to use for the programmer as we can.

I feel like @joshtriplett and @cramertj have both, in the past, expressed strong opinions about this.

I think I agree regarding allowing any bit pattern. That said, I can definitely imagine wanting to be able to create a union that is "as optimized" as the equivalent enum, but that it has no discriminant (because you know you can figure that out from other means).

@nikomatsakis

This comment has been minimized.

Copy link
Collaborator

nikomatsakis commented Jan 31, 2019

@RalfJung

We could also decide ... must start with the first byte being either the bit-pattern of false or the bit-pattern of true, because all fields agree on that invariant.

It seems like this would be true only if the Rust compiler decided to lay the fields out at offset zero, right? Personally, I sort of think we should just guarantee that the Rust compiler will do so. Particularly if we decide that unions are an opaque "bag of bits" from the perspective of the compiler, what is the motivation for the compiler to add extra padding into that bag?

(The same applies to your second example.)

@RalfJung

This comment has been minimized.

Copy link
Member Author

RalfJung commented Jan 31, 2019

It seems like this would be true only if the Rust compiler decided to lay the fields out at offset zero, right?

Yes, I've been assuming that to be the base.

I feel like @joshtriplett and @cramertj have both, in the past, expressed strong opinions about this.

Yeah I remember that as well. Also @petrochenkov expressed the opposite opinion, namely that unions should have a non-trivial invariant.

I can definitely imagine wanting to be able to create a union that is "as optimized" as the equivalent enum, but that it has no discriminant (because you know you can figure that out from other means).

I can imagine wanting to do many things. :) But I feel such needs are better served by an opt-in attribute, than enabled per default.

@cramertj

This comment has been minimized.

Copy link
Member

cramertj commented Feb 1, 2019

I think I agree regarding allowing any bit pattern. That said, I can definitely imagine wanting to be able to create a union that is "as optimized" as the equivalent enum, but that it has no discriminant (because you know you can figure that out from other means).

Yep, I would prefer this for non-repr(C) unions. IMO the common case (the FFI stuff) should always use repr(C), so this should be a non-issue for most folks. We could even lint on non-repr(C) unions just as a "be sure you know what you're doing, and that this is uncommon" hint.

@RalfJung

This comment has been minimized.

Copy link
Member Author

RalfJung commented Feb 1, 2019

Yep, I would prefer this for non-repr(C) unions.

Could you spell out what you mean by "this"?

@gnzlbg

This comment has been minimized.

Copy link
Collaborator

gnzlbg commented Feb 4, 2019

It seems like this would be true only if the Rust compiler decided to lay the fields out at offset zero, right? Personally, I sort of think we should just guarantee that the Rust compiler will do so.

I also think that we should guarantee this, but @joshtriplett mentioned some reasons about why we might not want to do that in the discussion about the layout of unions (#13 (comment)). It's unclear to me whether that interchange achieved some consensus, but maybe we should open a different issue to discuss whether we might want to guarantee this particular thing ? That would need amending the layout of unions in the repo.

EDIT: for repr(C) unions, the fields always start at offset 0 AFAIK.

@rkruppe

This comment has been minimized.

Copy link
Member

rkruppe commented Feb 4, 2019

I also think that we should guarantee this, but @joshtriplett mentioned some reasons about why we might not want to do that in the discussion about the layout of unions (#13 (comment)). It's unclear to me whether that interchange achieved some consensus, but maybe we should open a different issue to discuss whether we might want to guarantee this particular thing ?

That question can't really be separated from what we're discussing here: these kinds of layout optimizations are only possible if unions have a non-trivial validity invariant relating to the validity invariants of the fields. If we achieve consensus here that unions are just bags of bits, then it should be uncontroversial that there's no reason to place union fields at nonzero offsets. Conversely, the desire for layout optimizations of unions is a reason to want some non-trivial validity invariant for unions. I believe that's why nothing was settled during the previous discussion of union layout.

@cramertj

This comment has been minimized.

Copy link
Member

cramertj commented Feb 5, 2019

@RalfJung

Yep, I would prefer this for non-repr(C) unions.
Could you spell out what you mean by "this"?

Yeah-- I'd like the opportunity to optimize based on known-invalid bitpatterns for repr(Rust) unions.

@RalfJung

This comment has been minimized.

Copy link
Member Author

RalfJung commented Feb 5, 2019

@cramertj So what do you think about code like this, which violates the principle that at any time, some variant of the union is valid?

union Mix {
  f1: (bool, u8),
  f2: (u8, bool),
}

let m = Mix { f1: (false, 3) };
m.f2.0 = 3;

From what I recall, this is something we explicitly want to support. It is also rather hard to argue that this is UB because it never actually performs an operation that "sees" a bad value.

Given that unions are basically sugar for transmutes, I am really worried about automatically assuming that any of their bit patterns are invalid.

@cramertj

This comment has been minimized.

Copy link
Member

cramertj commented Feb 5, 2019

oh wow what a mess, good point! I'd still argue that the niches left over from overlapping all variants (e.g. union Mix { f1: NonZeroU32, f2: NonZeroU8 }) could be supported, but that is more complicated than what I originally imagined.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.