Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More Exotic Enum Layout Optimizations #1230

Closed
Gankra opened this issue Jul 31, 2015 · 69 comments
Closed

More Exotic Enum Layout Optimizations #1230

Gankra opened this issue Jul 31, 2015 · 69 comments
Labels
T-compiler Relevant to the compiler team, which will review and decide on the RFC.

Comments

@Gankra
Copy link
Contributor

Gankra commented Jul 31, 2015

There are several things we currently don't do that we could. It's not clear that these would have practical effects for any real program (or wouldn't negatively affect real programs by making llvm super confused), so all I can say is that these would be super cool.

Use Undefined Values in Other Primitives

  • bool is a byte but can only contain 0 or 1
  • char is 4 bytes but can only contain values in the ranges [0x0, 0xD7FF] and [0xE000, 0x10FFFF]

Use Multiple Invalid-Value Fields

(&u8, &u8) can be the same size as Option<Option<(&u8, &u8)>>

Support More than a Single Bit

  • bool can support 254 enum variants
  • char can support tons

Use all other fields as free bits when there exists another forbidden field

(&u8, u64) supports 2^64 variants via the encoding (0, x)

Use the Fact that Void is Statically Unreachable

enum Void { } cannot be instantiated, so any variant the contains Void is statically unreachable, and therefore can be removed. So Option<Void> can only be None, and therefore can be zero-sized.

Support Enums that have More than One Non-C-Like Variant

enum Foo {
  A(&u8, &u8),
  B(&u8),
}

Can be represented without any tag as (&u8, *const u8) via the following packing:

  • self.1 is not null: A
  • self.1 is null: B

Treat any True Discriminant Tags as a Type with Undefined Values

Option<Option<u32>> can be "only" 8 bytes if the second Option stores its discriminant in the first Option's u32 tag.

@Kimundi
Copy link
Member

Kimundi commented Jul 31, 2015

👍 to any of these, even if just as an experiment to see if it would be worth it in practice.

@bluss
Copy link
Member

bluss commented Aug 3, 2015

rust issue: rust-lang/rust#5977

@bluss
Copy link
Member

bluss commented Aug 3, 2015

Be wary of optimizations that are incompatible with pointers to the payload.

For example

let mut x: Option<Option<u32>> = Some(None);
let y: &mut Option<u32> = x.as_mut().unwrap();

y is a reference to just an Option<u32>, so we can't fudge its discriminant from the outer layer.

@Gankra
Copy link
Contributor Author

Gankra commented Aug 3, 2015

Option<u32> is isomorphic to (bool, u32) which should be null-pointer optimizable (well - 2'd pointer optimizable).

@bluss
Copy link
Member

bluss commented Aug 3, 2015

(bool, u32) is a nice way to think about it, it should work.

@Diggsey
Copy link
Contributor

Diggsey commented Aug 13, 2015

A general-purpose "ForbiddenValues" wrapper as you described in rust-lang/rust#27573 would be nice, but how exactly would you tell the compiler which values are forbidden? Even just for primitive types it's difficult, considering that as yet the type system has no support for variadics or constant parameters.

Supporting compound types is even trickier. To be fully general, you have to be able to describe any possible subset of any possible compound type, and do so in a concise way which the compiler can easily reason about, and do it all without relying on any particular layout choice. Short of using a compiler plugin, I have no idea how you do that.

Basically: maybe it's worth stabilizing simple cases like NonZero now - it may well be that in future it will simply be a type alias to a ForbiddenValues type, but that should be a backwards compatible change.

@rkjnsn
Copy link
Contributor

rkjnsn commented Aug 22, 2015

(&u8, &u8) can support 3 variants

It seems like one could also null the first field and use the second as a tag, allowing 2^64 (or 2^32) variants.

@Gankra
Copy link
Contributor Author

Gankra commented Aug 22, 2015

@rkjnsn Oh yeah, great point!

@Kimundi
Copy link
Member

Kimundi commented Aug 22, 2015

Another idea: Make use of padding bytes in nested enums:

enum A {
    Aa(u32, B),
    Ab(u32, B),
}
enum B {
    Ba(u32),
    Bb(u32)
}

Today, this leads to this kind of memory layout:

A [u8,  u8,  u8,  u8,  u8,  u8,  u8,  u8,  u8,  u8,  u8,  u8,  u8,  u8,  u8,  u8]
B                                         [u8,  u8,  u8,  u8,  u8,  u8,  u8,  u8]
  [  Apadding  ] Atag [       u32       ] [  Bpadding  ] Btag [       u32       ]

Now, we can't just combine those two tag ranges and padding areas into one because we need to support interior references. But what we can do is store the A tag in the B padding area:

A [u8,  u8,  u8,  u8,  u8,  u8,  u8,  u8,  u8,  u8,  u8,  u8]
B                     [u8,  u8,  u8,  u8,  u8,  u8,  u8,  u8]
  [       u32       ] [ ABpad ] Atag Btag [       u32       ]

The idea here being that a B value never reads or writes its padding area, so its in principle free to be used by outer enums tags.

This would require that the compiler ensures that padding in a enum doesn't get overwritten with junk bytes though, which might be tricky or inefficient in combination with mutable references to such an enum (You couldn't just overwrite all bytes on reassignment).

@rustonaut
Copy link

You could also use the bit0/1 in pointers witch are always zero under some aligment conditions like
if bit0 is 1 then it's not a pointer and bit7-1 are the actual tag. This is a bit more efficient then just having a null pointer as additional marker. (e.g. in the (&u8, &u8) case)

@solson
Copy link
Member

solson commented Aug 28, 2015

@Naicode That may be more space efficient, but it's less time efficient because of the bit shifting and masking required. Rust probably shouldn't make that kind of change automatically.

It's a neat idea, though. Reminds me of Ruby's internal VALUE type which if the LSB is 0 then it's a pointer to an object and if the LSB is 1 then the rest of the bits encode an immediate value like true, false, nil, a symbol, or an integer (there is a further way to differentiate those). So Ruby values are like a super-compact Rust enum in a way.

@rustonaut
Copy link

It's true that such bit-fiddling slows down the program at runtime and therefor should never be applied implicitly.

If more exotic layout optimisations are implemented it might still be possible to opt-in to such optimisations on per-struct-basis e.g. a #[repr(compact_rust)] Attribute witch uses rust representation + space optimisations witch have a runtime cost.

(e.g. for usage of rust in embedded environment with small sized RAM, or a enum's allocated in massive amounts on the heap).

@huonw
Copy link
Member

huonw commented Aug 31, 2015

cc rust-lang/rust#14540

@mbrubeck
Copy link
Contributor

cc rust-lang/rust#24041

@soltanmm
Copy link

Another idea: compress nested enum variant ids (thanks Mutabah over IRC!).

enum A { A1, A2 }
enum B { BA(A), B1, B2 }

This currently requires space in B for the variant id of A, but one could assign BA+A1 = 0, BA+A2 = 1, B1 = 2, B2 = 3, and both save space and maintain the ability to take a reference to the stored A (because its payload may presumably be stored immediately following its variant id [which is also B's] as usual).

@Diggsey
Copy link
Contributor

Diggsey commented Jan 13, 2016

@soltanmm Those kind of optimizations are not generally possible, because 'A' must have the same layout regardless of whether it is used within another enum (otherwise references to the 'A' within 'BA' would have a different layout from normal references to 'A'.

Furthermore, you can't adjust the layout of all 'A's in advance because 'B' might be in a different crate.

What you can do is treat the determinant in 'A' as a restricted value (either 0 or 1 for A1/A2) in which case the 'B' is free to use the disallowed values as IDs for the other variants (B1, B2) but that kind of optimization has already been covered in this issue.

@huonw
Copy link
Member

huonw commented Jan 13, 2016

@Diggsey note that @soltanmm's suggestion doesn't adjust the layout of A (in both standalone and optimised form A1 is 0, and A2 is 1, although the particular values don't matter). AFAICT, your last paragraph is exactly what @soltanmm is saying.

@soltanmm
Copy link

@huonw EDIT: I just realized I might have interpreted that incorrectly.
Almost exactly, I think, but not quite (specifically the bit about this being already covered).

@Diggsey I believe the optimizations discussed most similar were using forbidden values in payloads or placing determinants in padding, as opposed to outright summing the determinants in the same location. Having the determinant be a value for which forbidden values might be used wasn't mentioned, and this has slightly different space-time trade-offs to @Kimundi's example with the determinants being split into different sequences of bit positions.

Arguably, though, everything in this discussion is some variant of, "Use forbidden values to encode variants," so as long as everyone's kicking the horse I guess I concede that to @Diggsey. :-)

@arthurprs
Copy link

arthurprs commented Oct 14, 2016

Would it be possible to enable NonZero for the common Rust enum? Like

enum Status {
    Alive, // discriminant would start at 1 instead of 0
    Suspect,
    Dead,
}

size_of::<Status>() == size_of::<Option<Status>>()

@eddyb
Copy link
Member

eddyb commented Oct 14, 2016

@arthurprs No, that's not a good choice, first off you can't depend on knowing whether Option<Status> is used in the program, and secondly what if I have Option<Result<Status, ()>>?
The general niche optimization would use values above those of the original enum, which scales.

@solson
Copy link
Member

solson commented Oct 14, 2016

The general optimization scales better, but @arthurprs's suggestion seems like it would technically work fine. You don't need to depend on knowing about Option<Status>, you can just always start from 1, unless we have existing reasons to prefer starting from 0.

@arthurprs
Copy link

Rust rely so much on these patterns that any gain would be huge.

Option<Result<Status, ()>>

Can't it be the usual 2 discriminants + Status?

@cuviper
Copy link
Member

cuviper commented Oct 14, 2016

If you manually wrote enum Status { Alive=1, ... }, could it decide this is NonZero? But generally speaking, I would agree that broader Forbidden Value analysis is more promising.

@ahicks92
Copy link

ahicks92 commented Jul 9, 2017

@SimonSapin
I think doing this automatically may be difficult because you are effectively changing the type by removing a level of pointer indirection only in some highly specific cases.

It feels like this would be on the order of struct field reordering in terms of the kinds of weird bugs it'd end up with and the effort required, with not nearly as much impact.

A more general optimization here would be to realize that the refcount of Rc is never zero and that the first variant never extends to cover it, but this relies on Rc not being reordered which isn't guaranteed anymore.

@SimonSapin
Copy link
Contributor

(FWIW the refcount is not stored in Rc<T> directly, it’s in the heap-allocated RcBox<T> that Rc<T> points to.)

@ahicks92
Copy link

Yeah, you're right. My bad. I still think my point stands, though.

@eddyb
Copy link
Member

eddyb commented Nov 18, 2017

So in rust-lang/rust#45225 (comment) I mentioned some hard cases, but @trentj on IRC brought it up again and I've realized there's a simpler way to look at things, that is not type-based, as

enum E {
    A(ALeft..., Niche, ARight...),
    B(B...),
    C(C...)
}

can be seen as a generalization of the case where sizeof B and sizeof C are 0.

The challenge is to fit B and C in ALeft and/or ARight, which doesn't have to be too clever if Niche has either the smallest (e.g. bool, Option<ZST>, etc.) or largest (e.g. &T, Option<usize>, etc.) alignment in E, because we can hint the field reordering to place it at one end of the variant, giving B and C only one run of bytes to fit, instead of two.

EDIT: now over at rust-lang/rust#46213.

@fstirlitz
Copy link

One possible optimisation that hasn't been mentioned yet: NaN tagging. It's impossible to implement with current Rust types, however; floating-point types treat all possible NaN bit patterns as valid. Doing this would require adding NonZero-like wrapper types or some equivalent mechanism to restrict valid representations of floating-point values.

I've actually got something written up on this topic, I might submit it as an RFC soon...

@eddyb
Copy link
Member

eddyb commented Nov 19, 2017

@fstirlitz With const generics we could have a generalization of NonZero and use it to implement a type like NonNaN, by removing a specific range of values for f32 and another for f64.

@fstirlitz
Copy link

@eddyb Will it be possible to use const generics to disallow NaNs with certain payloads, but not others? You'd have to have to_bits() be const fn, which is currently impossible, because it's implemented with transmute. (Use case for this: an interpreter for a dynamically-typed language which supports NaNs, but doesn't expose payloads.)

And even if sufficiently expressive const generics do arrive, it will be beneficial to have a canonical form of this feature in the standard library.

@eddyb
Copy link
Member

eddyb commented Nov 20, 2017

I'm not talking about CTFE to compute the layout, but a wrapper type with two integer parameters expressing a range of values that the inner type can't use but optimizations can.

@fstirlitz
Copy link

You mean, like...

// possibly wrapped in an opaque struct;
// could probably be generic over float types
// by means of associated consts and types,
// but using f64 for readability
enum NaNaN64 {
	Positive(IntInRange<u64, 0x0000000000000000, 0x7ff0000000000000>),
	Negative(IntInRange<u64, 0x8000000000000000, 0xfff0000000000000>),
}

impl From<NaNaN64> for f64 { /* f64::from_bits */ }
/* other impls */

That's... not especially convenient, is it?

Plus, without compiler support it won't be able to additionally take advantage of LLVM's fast-math annotations.

@eddyb
Copy link
Member

eddyb commented Nov 20, 2017

More like:

struct NaNaN64 {
    float: WithInvalidRange<WithInvalidRange<f64,
        0x7ff0000000000000, 0x7fffffffffffffff>,
        0xfff0000000000000, 0xffffffffffffffff>
}

But internally we can't represent the two ranges anyway for now, so for NaN-boxing you'd have to choose only one of them, in the near future.
As for fast-math... It could work if LLVM actually understood range metadata as "can't possibly be a NaN" and optimizing based on that.

It's much more straightforward when everything is offsets and bitpatterns and ranges than "types".

@Gankra
Copy link
Contributor Author

Gankra commented Nov 20, 2017

I'm pretty sure that a safe "forbidden NaN" float type would undermine its own benefits with NaN-masking-cmov's (or worse) everywhere.

@Yoric
Copy link

Yoric commented Nov 20, 2017

Note: If someone is willing to mentor me on this bug, I'm interested in tackling it.

@eddyb
Copy link
Member

eddyb commented Nov 20, 2017

@gankro How are we going to track that most of the optimizations mentioned this have been implemented, and the various tricks required to make any further changes?

@Gankra
Copy link
Contributor Author

Gankra commented Nov 20, 2017

@eddyb are any of the optimizations in the OP not implemented? It looks like at least most are, and if so we might want to turn this into a metabug that points at sub-issues for the remainder, or just close it.

@fstirlitz
Copy link

@gankro: re NaNs, you mean checking after arithmetic operations whether the result was NaN? Maybe. On the other hand, such types could at least implement Ord and Eq; and a 'designated canonical NaN' type might even elide most of these checks (if the canonical NaN is chosen wisely), thanks to NaN propagation semantics of IEEE 754.

@SimonSapin
Copy link
Contributor

@eddyb Could Cow<str> be represented on three words? Cow::Owned(String) would be String (with a non-zero pointer, assuming it is first) and Cow::Borrowed(&str) would be (0, &str). I half expected rust-lang/rust#45225 to do this but it doesn’t.

@eddyb
Copy link
Member

eddyb commented Nov 21, 2017

Yeah, I only figured out how to later. Also, it requires rustc_trans to generate LLVM constants from miri allocations, for us to be able to combine the niche with additional data.

@Kixunil
Copy link

Kixunil commented Nov 23, 2017

@SimonSapin another alternative: since capacity > len, &str can be represented exactly as String with capacity == 0. The con is we can't distinguish between zero-sized &str or zero-sized String. The great thing is for all practical purposes we don't have to. My suggestion also keeps pointer non-zero, so even Option<MyCow> has the same layout.

I've actually started writing such crate for fun.

Same holds for Vec

@SimonSapin
Copy link
Contributor

But the compiler doesn’t know about capacity > len. It does already know that NonZero<*mut u8> inside String is non-null, though.

@Kixunil
Copy link

Kixunil commented Nov 23, 2017

Yes. The compiler will probably never be able to do the optimization using capacity >= len because of the weird empty case, which we know doesn't matter but explaining that to the compiler would be a nightmare.

@eddyb
Copy link
Member

eddyb commented Nov 23, 2017

Since a part of the original desired optimizations have been implemented in rust-lang/rust#45225, and most of the remaining ones have various trade-offs that need to be explored by RFC for each category separately (with the exception of rust-lang/rust#46213), I'm going to close this central issue.

@eddyb eddyb closed this as completed Nov 23, 2017
djrenren pushed a commit to djrenren/compiletest that referenced this issue Aug 26, 2019
Mark enums with non-zero discriminant as non-zero

cc rust-lang/rfcs#1230
r? @eddyb
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-compiler Relevant to the compiler team, which will review and decide on the RFC.
Projects
None yet
Development

No branches or pull requests