
Optimize enums by niche-filling even when the smaller variants also have data. #46213

Open
eddyb opened this Issue Nov 23, 2017 · 28 comments

eddyb commented Nov 23, 2017

The current implementation only handles enums of roughly the form:

enum E {
    A(ALeft..., Niche, ARight...),
    B,
    C
}

This can be seen as a special case, in which B and C occupy 0 bytes, of:

enum E {
    A(ALeft..., Niche, ARight...),
    B(B...),
    C(C...)
}

As long as B and C can fit before or after A's Niche, we can still apply the optimization.
Also see rust-lang/rfcs#1230 (comment) for the initial description.

Gankro commented Nov 23, 2017

cc me

est31 commented Nov 23, 2017

what is "niche" in this case?

Gankro commented Nov 23, 2017

A niche is a location in the type where some bit patterns aren't valid.

For instance struct Foo(u32, bool, u32) has a niche where the bool is, because only 0 and 1 are valid bools. Option<Foo> can therefore use Foo.1 == 2 to represent None.
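A minimal demonstration of this, using the hypothetical Foo above (sizes assume the usual layout on current rustc):

```rust
use std::mem::size_of;

// The bool field leaves 254 invalid bit patterns (2..=255) — a niche.
#[allow(dead_code)]
struct Foo(u32, bool, u32);

fn main() {
    // Foo is 12 bytes: two u32s, the bool, and padding up to 4-byte alignment.
    assert_eq!(size_of::<Foo>(), 12);
    // Option<Foo> is also 12 bytes: None is stored as an invalid bool value,
    // so no separate tag is needed.
    assert_eq!(size_of::<Option<Foo>>(), 12);
}
```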

est31 commented Nov 23, 2017

@Gankro thanks, that makes sense.

eddyb commented Nov 23, 2017

We probably need better terminology, "invalid" values and "invalid reuse" optimization?

Gankro commented Nov 23, 2017

Here is a test case we currently fail to optimize, but this issue would fix:

Result<Result<u32, u32>, u32>

The compiler sees:

enum Result<Result<u32, u32>, u32>  {
  Ok({ tag: 0u32 | 1u32, payload: u32 }),
  Err(u32),
}

which should lower to:

{ tag: 0u32 | 1u32 | 2u32, payload: u32 }

because the Err payload fits after the niche (Ok.tag).
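The cost is easy to measure (a sketch; the outer size is printed rather than asserted, since it depends on whether this issue has been implemented in the compiler being used):

```rust
use std::mem::size_of;

fn main() {
    // The inner Result packs its tag and payload into 8 bytes.
    assert_eq!(size_of::<Result<u32, u32>>(), 8);
    // With this optimization the outer Result could also be 8 bytes,
    // reusing tag values 2.. of the inner Result; without it, the outer
    // enum pays for a second tag plus padding.
    println!("outer: {} bytes", size_of::<Result<Result<u32, u32>, u32>>());
}
```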

leonardo-m commented Nov 23, 2017

Before merging this PR I think we should see benchmarks of both memory saved and run-time saved for some real program, like the rust compiler and Servo.

leonardo-m commented Nov 23, 2017

I have yet to see similar benchmarks for #45225.

Gankro commented Nov 23, 2017

Some slightly more complex cases from webrender that would also benefit from this:

// Use the Opacity.0.tag niche
pub enum FilterOp {
    Blur(f32),
    Brightness(f32),
    Contrast(f32),
    Grayscale(f32),
    HueRotate(f32),
    Invert(f32),
    Opacity(PropertyBinding<f32>, f32),
    Saturate(f32),
    Sepia(f32),
}

pub enum PropertyBinding<T> {
    Value(T),
    Binding(PropertyBindingKey<T>),
}

pub struct PropertyBindingKey<T> {
    pub id: PropertyBindingId,
    _phantom: PhantomData<T>,
}

pub struct PropertyBindingId {
    namespace: IdNamespace,
    uid: u32,
}

pub struct IdNamespace(pub u32);
// use the RoundedRect.1.mode.tag niche
pub enum LocalClip {
    Rect(LayoutRect),
    RoundedRect(LayoutRect, ComplexClipRegion),
}

#[repr(C)]
pub struct ComplexClipRegion {
    pub rect: LayoutRect,
    pub radii: BorderRadius,
    pub mode: ClipMode,
}

#[repr(C)]
pub struct BorderRadius {
    pub top_left: LayoutSize,
    pub top_right: LayoutSize,
    pub bottom_left: LayoutSize,
    pub bottom_right: LayoutSize,
}

#[repr(C)]
pub enum ClipMode {
    Clip,
    ClipOut,
}

#[repr(C)]
struct LayoutRect(f32, f32, f32, f32);

#[repr(C)]
struct LayoutSize(f32, f32);

abonander commented Dec 17, 2017

A simpler case that I think fits this optimization (or a specialization of it as a starting point):

// my use-case, a single value is most common
enum OccupiedSmallVec {
     Single(Foo),
     Multiple(Vec<Foo>),
}

If Foo is 2 usizes or smaller, then OccupiedSmallVec should be the same size as Vec<Foo>, as if it were implemented via this union:

#![feature(untagged_unions)]
union OccupiedSmallVec {
    // where `single.0 == 0`
    single: (usize, Foo),
    multiple: Vec<Foo>,
}

However, it currently requires a separate tag and padding to align the pointers, wasting basically another usize.
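The wasted word can be measured directly (a sketch; Foo here is an assumed two-word stand-in, since the comment above leaves it unspecified):

```rust
use std::mem::size_of;

// Assumed stand-in for Foo: exactly two words.
#[allow(dead_code)]
struct Foo(usize, usize);

#[allow(dead_code)]
enum OccupiedSmallVec {
    Single(Foo),
    Multiple(Vec<Foo>),
}

fn main() {
    // Vec is three words (pointer, capacity, length)...
    assert_eq!(size_of::<Vec<Foo>>(), 3 * size_of::<usize>());
    // ...but the enum cannot currently use the non-null pointer as its tag,
    // because the Single variant also carries data, so it is at least as
    // large and (today) pays an extra tag word plus alignment padding.
    assert!(size_of::<OccupiedSmallVec>() >= size_of::<Vec<Foo>>());
    println!("enum: {} bytes", size_of::<OccupiedSmallVec>());
}
```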

eddyb commented Dec 17, 2017

@abonander Not really, that's the entire optimization, assuming Vec<Foo> starts with a non-null pointer, and you're packing Foo in the leftover space after that pointer.

eddyb commented Mar 20, 2018

From IRC (cc @nox): we could extend this by always computing the size of the tagged layout (which would likely be larger than A itself) and allowing the other variants to use the leftover space around A's fields, up to that size, if we generally want to avoid overlapping variant data where possible.

Then Result<&T, E> and similar would have the layout of (Option<&T>, E).
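Concretely (a sketch using &u8 and u32; sizes are expressed in words so it holds on both 32- and 64-bit targets): the proposed layout is the tuple below, where the pointer's null niche acts as the tag and the Err payload sits beside the pointer.

```rust
use std::mem::size_of;

fn main() {
    let w = size_of::<usize>();
    // Proposed layout: the null-pointer niche encodes Err, and the u32
    // payload lives in the space next to the pointer.
    assert_eq!(size_of::<(Option<&u8>, u32)>(), 2 * w);
    // Today Result<&u8, u32> happens to have the same total size, but it
    // spends a separate tag instead of reusing the pointer's niche.
    assert_eq!(size_of::<Result<&u8, u32>>(), 2 * w);
}
```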

killercup commented Jun 4, 2018

IIUC, this optimization could also work to make Cow<str> and Cow<[T]> 3 words, right? (cf. discussion on reddit)

nox commented Jun 4, 2018

Yes.

newpavlov commented Aug 24, 2018

Another example which can be optimized is:

pub enum Bar {
    A = 1,
    B = 2,
    C = 3,
}

pub enum Baz {
    D = 4,
    E = 5,
    F = 6,
}

pub enum Foo {
    Bar(Bar),
    Baz(Baz),
}

Currently Foo takes 2 bytes, while it can be trivially represented as 1.
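This is straightforward to verify (a sketch; Foo's size is printed rather than asserted, since it would change if this were ever implemented):

```rust
use std::mem::size_of;

#[allow(dead_code)]
pub enum Bar { A = 1, B = 2, C = 3 }
#[allow(dead_code)]
pub enum Baz { D = 4, E = 5, F = 6 }
#[allow(dead_code)]
pub enum Foo { Bar(Bar), Baz(Baz) }

fn main() {
    // Each field-less enum fits in one byte...
    assert_eq!(size_of::<Bar>(), 1);
    assert_eq!(size_of::<Baz>(), 1);
    // ...but Foo adds its own tag byte, even though the discriminant
    // ranges 1..=3 and 4..=6 never overlap.
    println!("Foo: {} bytes", size_of::<Foo>());
}
```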

nagisa commented Aug 24, 2018

Another example which can be optimized is:

This cannot happen unless we make Bar and Baz types with a size of less than a byte, which is not something we can do (for a variety of reasons). Consider that it is possible to take an internal reference into the data stored within some variant of Foo (i.e. either &Bar or &Baz), and this reference, when dereferenced, must yield the original data. The only way to achieve that here would be to use some of the "unused" bits to store the active variant of the enum, but that would then require bit twiddling when dereferencing any &Bar or &Baz, even when those weren't stored within the enum.

RalfJung commented Aug 24, 2018

@nagisa Bar and Baz have disjoint sets of valid values, though. So it could be done.

It doesn't fit the "niche" scheme, though.

newpavlov commented Aug 24, 2018

@nox I think it can be handled as a special case: if all of the enum's fields are themselves field-less enums, check whether their tags are disjoint and fit into a minimal repr. Making it more generic would be significantly more involved, but even with the field-less constraint this optimization would help a lot for FFI and parsing use-cases (currently you often have to maintain a huge number of consts or unwieldy enums, or sacrifice performance).

@nagisa Ah, I should have mentioned explicitly that I want this foremost for field-less enums (see the issue linked before my first message). More generic handling would be nice, but is less important.

nox commented Aug 26, 2018

It makes mem::discriminant way more complex. Currently, computing the discriminant of any enum value is at most a comparison and a subtraction. Such optimisations would make the computation more complex; for example, what would if let Foo::Bar(_) = foo { ... } compile to?

alercah commented Sep 11, 2018

Optimizing for discriminant extraction seems, to me, far less valuable on average than optimizing for space. Optimizing for the ease of writing an intrinsic for it seems even less valuable. In the provided example, if let Foo::Bar(_) = foo { .. } is roughly unsafe { *std::mem::transmute::<&Foo, &u8>(&foo) } <= 3, which doesn't even have a subtraction.

alercah commented Sep 11, 2018

I just now realized that std::mem::discriminant is, in general, not required to correspond to the specified discriminant. The Discriminant type does not directly expose the actual value of the discriminant, and like all intrinsics, the underlying intrinsic is unstable. Only the Debug and Hash instances expose it indirectly, and I don't think those should be considered guaranteed to be stable.

In fact, I believe that the only way you can observe the discriminant of a variant of a #[repr(Rust)] enum is via numeric cast, and you can't do that for an enum with fields. So I would propose that, at least for now, explicit discriminants only be permitted on enums with other representations.

oli-obk commented Sep 11, 2018

You need to distinguish between the tag and the discriminant. The tag is an arbitrary bit pattern that stores a value that can be used to extract the discriminant. The discriminant exists even for types like Option<&u32>, which have no tag.

tag extraction really is just a field access. Getting the correct discriminant from that can technically be arbitrarily complex. In practice it is desirable to keep the complexity low though.
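Option<&u32> makes the distinction concrete (a small sketch): the discriminant is observable through mem::discriminant even though there is no tag field at all.

```rust
use std::mem::{discriminant, size_of};

fn main() {
    // No tag field: None is encoded as the null pointer value.
    assert_eq!(size_of::<Option<&u32>>(), size_of::<&u32>());
    // The discriminant still exists; it is computed from the pointer bits.
    let x = 5u32;
    assert_ne!(discriminant(&Some(&x)), discriminant(&None::<&u32>));
}
```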

alercah commented Sep 11, 2018

Oh, I put that message in the wrong thread, oops.

SimonSapin commented Nov 21, 2018

As long as B and C can fit before or after A's Niche, we can still apply the optimization.

Should struct field reordering be tweaked to try to push a niche to the start or end of a struct layout (if possible without increasing the struct size) in order to make this optimization more effective when that struct is used in an enum?

eddyb commented Nov 21, 2018

Optimizing for discriminant extraction seems, to me, far less valuable on average than optimizing for space.

Note that complicated discriminant extraction can have second-order effects on LLVM optimizations, and can slow code down by more than the speedup gained from slightly smaller type sizes.

rhendric commented Dec 21, 2018

I think frunk's Coproducts would also benefit from this optimization; currently, a Coprod!(i32, bool, i64) is bulkier than the equivalent enum (24 bytes vs. 16). If Coproducts can have performance parity with enums, that would make it easier to sell them as a way to experiment with ad-hoc sum types before any of the relevant RFCs land.
