
Use the niche optimization if other variants are small enough #70477

Closed
wants to merge 7 commits into from

Conversation

ogoffart
Contributor

Reduce the size of enums which have one bigger variant with a niche

I guess this should maybe be behind a compiler flag at first?
cc @eddyb
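For context, a minimal sketch (not part of this PR) of the niche optimization being extended here, using the standard `NonZero` types, whose `Option` layout optimization is documented: a niche is a set of bit patterns a type can never hold, which the layout code can reuse to encode discriminants.

```rust
use std::mem::size_of;
use std::num::NonZeroU16;

fn main() {
    // NonZeroU16 can never be 0, so that bit pattern is a niche.
    assert_eq!(size_of::<NonZeroU16>(), 2);
    // The niche encodes None: no separate tag byte is needed.
    assert_eq!(size_of::<Option<NonZeroU16>>(), 2);
    // u16 has no niche, so a tag (plus alignment padding) is added.
    assert_eq!(size_of::<Option<u16>>(), 4);
}
```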

@rust-highfive
Collaborator

r? @matthewjasper

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Mar 27, 2020
@eddyb
Member

eddyb commented Mar 27, 2020

r? @eddyb

@rust-highfive rust-highfive assigned eddyb and unassigned matthewjasper Mar 27, 2020
Comment on lines 325 to 327
// Rotate the index array to put the largest niche first.
// Since it is already the first among the types with the same alignment,
// this will just move some of the potential padding within the structure.
Member

Heh this is pretty clever.

@eddyb
Member

eddyb commented Mar 27, 2020

This seems to address #46213, or at least partially?
I'm really happy someone is taking this on, I keep not getting to it 😞.


Something I've been meaning to do since last year is to solve #63866 by first computing the unoptimized enum layout, and only then, the niche one, and only using the niche one if it's strictly smaller or is equal in size but has strictly more available values in its largest niche.

That way we don't have to come up with clever conditions for when we should use a niche layout, as we will simply not use it if it's not helpful in size or in terms of its largest niche.

Something else to keep in mind is that a niche layout has more complex codegen than a tagged one, so it might optimize worse, even if they are otherwise equal on size and largest niche.


@ogoffart So would you be interested in changing the code to work like that (probably in a separate PR, that we can test on its own for perf regressions), i.e. only use an optimized (niche) layout if it's a benefit over the unoptimized (tagged) one?

I think it might allow this PR to be simpler, or at the very least, reduce the risk of accidentally increasing some types' sizes, to 0.
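The selection rule described above can be modeled with a small sketch (names and types are illustrative, not rustc's actual layout code): compute both candidates, then prefer the niche layout only if it is strictly smaller, or equal in size with strictly more available values in its largest niche.

```rust
// Hypothetical, simplified model of "compute both layouts, pick the best".
#[derive(Clone, Copy, Debug, PartialEq)]
struct Candidate {
    size: u64,             // total size in bytes
    niche_available: u128, // available values left in the largest niche
}

fn pick(tagged: Candidate, niche: Option<Candidate>) -> Candidate {
    match niche {
        Some(n)
            if n.size < tagged.size
                || (n.size == tagged.size && n.niche_available > tagged.niche_available) =>
        {
            n
        }
        // No niche candidate, or it is not a strict improvement: keep the tagged layout.
        _ => tagged,
    }
}

fn main() {
    let tagged = Candidate { size: 4, niche_available: 0 };
    // Same size but a larger remaining niche: the niche layout wins.
    let niche = Candidate { size: 4, niche_available: 253 };
    assert_eq!(pick(tagged, Some(niche)), niche);
    // Strictly bigger niche layout: the tagged layout wins.
    let bigger = Candidate { size: 6, niche_available: 253 };
    assert_eq!(pick(tagged, Some(bigger)), tagged);
}
```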

dataful_variant = None;
break 'variants;
if !max_size.is_aligned(align.abi) {
// Don't perform niche optimisation if there is padding anyway
Contributor Author

I believe this fixes #63866. If it detects that the variant would contain padding, the padding will be used for the discriminant.

Member

@eddyb eddyb Mar 27, 2020

Perhaps, but it seems harder to reason about than having two candidates for a layout and picking the best one out of them.

Contributor Author

I'm thinking computing the second candidate would be a waste of time.

@ogoffart
Contributor Author

Right, this fixes #63866 and #46213; I did not look for these issues before.

So would you be interested in changing the code to work like that (probably in a separate PR, that we can test on its own for perf regressions), i.e. only use an optimized (niche) layout if it's a benefit over the unoptimized (tagged) one?

I am not sure I understand this. I believe the current implementation will only use the Niche if it reduces the size of the enum.


Anyway, I think there is more to do in the codegen because many tests are failing. They pass in stage 1 (which I was running locally) but fail in stage 2 (on the CI): valgrind reports some uses of uninitialized values. I need to investigate.

@ogoffart
Contributor Author

I adjusted the ABI to fix the miscompilation issue, and added a test for issue #63866

@bors
Contributor

bors commented Mar 30, 2020

☔ The latest upstream changes (presumably #70536) made this pull request unmergeable. Please resolve the merge conflicts.

@bors
Contributor

bors commented Apr 1, 2020

☔ The latest upstream changes (presumably #70672) made this pull request unmergeable. Please resolve the merge conflicts.

@eddyb
Member

eddyb commented Apr 2, 2020

I am not sure I understand this. I believe the current implementation will only use the Niche if it reduces the size of the enum.

Perhaps, but there's no formal proof of this and it's impossible to exhaustively test, so we'd have to rely on convincing ourselves, and that's the main reason I kept putting this off: the conditions were too complex and hard to reason about, and they were also very easy to invalidate by other changes.

I want to land this, but I'm not comfortable with it unless one of these two things happen:

  • the implementation gets a formal model, where we can prove things about all possible cases
  • we check instead of assuming the conditions are sufficient

I'm thinking computing the second candidate would be a waste of time.

We can measure this; it's not uncommon in rustc to check all possibilities.
Bonus: it would remove the need for complex logic that tries to approximate the comparison between the two candidates, and we could focus instead on things like using SmallVec instead of Vec.

@ogoffart
Contributor Author

ogoffart commented Apr 2, 2020

With this patch (when struct_reordering_opt is true), we only perform the niche optimization if the sum of the sizes of all fields of the bigger variant is a multiple of the alignment. The ordering of the members makes sure there is not going to be any padding between them, so there is no padding in which to place the discriminant without using the niche. So using the niche will always lead to a smaller size for the enum.

Maybe I should add a debug_assert that this is the case?
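The complementary case can be checked with a small sketch (type names are illustrative; the sizes are what current rustc produces, since repr(Rust) layout is not guaranteed): when the biggest variant's fields do not fill up to its alignment, the trailing padding can absorb the tag, so the tagged layout costs nothing extra.

```rust
use std::mem::size_of;

// 5 bytes of fields with alignment 4: 3 bytes of trailing padding.
#[allow(dead_code)]
enum Padded {
    A(u32, u8),
    B,
}

fn main() {
    // The payload alone already occupies 8 bytes due to padding.
    assert_eq!(size_of::<(u32, u8)>(), 8);
    // The tag fits into that padding: the enum is no bigger than its payload.
    assert_eq!(size_of::<Padded>(), 8);
}
```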

@ogoffart
Contributor Author

ogoffart commented Apr 2, 2020

(I added an assert that there is no padding in the layout, and so that the niche layout must be smaller than not using a niche.)

@eddyb
Member

eddyb commented Apr 3, 2020

Ah so the logic here is that you're being conservative about niches and only using them when it would be impossible to add a tag without increasing the size of the type?

I think this makes sense in the common case of byte tags, but what about this example?

Am I right that this PR makes both Enum<u16> and Enum<NicheU16> have 6 bytes?

@ogoffart
Contributor Author

ogoffart commented Apr 3, 2020

Am I right that this PR makes both Enum<u16> and Enum<NicheU16> have 6 bytes?

Yes, that was right. I fixed that now.
I indeed forgot to consider cases where there are more than 256 variants.

Now I understand better what you mean by "impossible to exhaustively test". But I think it is fine to leave some corner cases, such as enums with a huge number of variants.

For example, an Option<(NicheU16, u8)> would still not use the niche, so its bigger niche would "only" have 254 entries available.

@eddyb
Member

eddyb commented Apr 3, 2020

For example, an Option<(NicheU16, u8)> would still not use the niche so its the bigger niche would "only" have 254 entries available.

(NicheU16, u8) is the single field of Option<(NicheU16, u8)>'s Some variant and it has size 4, alignment 2, meaning it would use the niche, not add a tag, with your logic (so same outcome as today).
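The figures above can be checked with the standard `NonZeroU16` standing in for `NicheU16` (sizes as observed with current rustc; repr(Rust) layout is not guaranteed):

```rust
use std::mem::{align_of, size_of};
use std::num::NonZeroU16;

fn main() {
    // (NonZeroU16, u8): size 4, alignment 2, as stated above.
    assert_eq!(size_of::<(NonZeroU16, u8)>(), 4);
    assert_eq!(align_of::<(NonZeroU16, u8)>(), 2);
    // The pair's niche (the NonZeroU16 field's zero) encodes None,
    // so wrapping it in Option adds no tag.
    assert_eq!(size_of::<Option<(NonZeroU16, u8)>>(), 4);
}
```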

Comment on lines 942 to 919
if padding_bits >= 32 || count >> padding_bits == 0 {
// Don't perform niche optimisation if there are enough padding bits for the discriminant
dataful_variant = None;
}
Member

So this is a more correct version of the "only use niche if it's physically impossible to fit a tag anywhere", right?

It's still conservative in that it uses the tag layout unless it knows for sure that the niche layout would be beneficial, right?

Contributor Author

Yes, that's right.
I changed it again to be sure that the padding is actually bigger than the biggest niche, so we'd still use the niche in Option2<NicheU16, u8> (which I meant instead of Option<(NicheU16, u8)> in my previous comment, which is its own type as you said).

@eddyb
Member

eddyb commented Apr 3, 2020

@ogoffart Okay, can you try this example?

I think this is the first example that actually tackles the differences in alignment sorting between structs and enum variants (note that with Dataful(u8, u8, X), both are 8).

This is the kind of thing that I would have a hard time reasoning about in my head when I was looking into this, and why I believe it to be much saner to just try both layouts and pick the best.

@ogoffart
Contributor Author

ogoffart commented Apr 3, 2020

size_of::<Enum<u32>>() = 12
size_of::<Enum<NicheU32>>() = 8

Same as before the patch. We use the niche because 32 bits is bigger than the 16 bits of padding.

Note that size_of::<Enum<u32>>() is not 8 because we indeed disable the reordering optimisation if the discriminant needs alignment. Why is that? That's a problem.

@eddyb
Member

eddyb commented Apr 3, 2020

If you want to keep things simple to reason about, we could limit niche optimizations to situations in which the tag would be a single byte, because we know the fields are sorted ascending in that case, and so any padding would "absorb" the tag, so it's easier to reason about than the general case.

It seems like a practical compromise to say that we don't optimize enums with too many variants (specifically >= 256), although I worry it might result in downstream (e.g. codegen or miri) hidden assumptions that will cause bugs if we ever generalize it again.

cc @rust-lang/compiler we might want to discuss this more broadly.

@eddyb
Member

eddyb commented Apr 3, 2020

Same as before the patch. We use the niche because 32 bit is bigger than the 16 bits padding.

Why "32-bit"? Are you saying it's because the niche has a lot more invalid values?
In that case, try this example; I only wanted to increase the size of that field, not change the niche.

@eddyb eddyb added I-nominated T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Apr 3, 2020
@nikomatsakis
Contributor

@eddyb can you summarize a bit what you'd like input on? There's a lot of backscroll here =) It sounds like it would make things easier to limit niche optimizations to cases with <256 variants, and you're asking if that's ok?

We don't currently make many guarantees about the niche optimization that I'm aware of, but the unsafe-code-optimizations draft did require that "Option-like" enums would be optimized, but that's it. Option-like means exactly 2 variants.

@ogoffart
Contributor Author

Note that PR #71045 only computes the two layouts for enums with one single data variant, while this computes them also for enums with several data variants. Also, it first needs to perform the estimate.

@ogoffart
Contributor Author

There are some performance regressions. The difference could be attributed to:

  1. the extra cost of estimating which variant is the largest and second largest.
  2. the extra cost of computing the second layout, including for the case where there is a second largest variant.
  3. generally, the fact that using this niche optimization for more structures makes the whole compiler slower.

If the reason is step 1 or 2, it can be mitigated or lived with.
But if this is because the niche optimization actually makes things slower, it is perhaps not worth it, unless there is something really wrong in the generated code that could be improved.

@ogoffart
Contributor Author

Could also be:

  4. the niche optimization causes the use of different code paths in LLVM (or rustc) that enable a different set of optimizations, which may make compile time slower but may improve the speed of the resulting binary (or not).

Trying some benchmarks on the generated code would give us more information about which of these hypotheses is right.

@eddyb
Member

eddyb commented Apr 12, 2020

So the results on that separate PR are in, and they're neutral: #71045 (comment).

@ogoffart I did do something you didn't, to make it cheaper: I avoided calling tcx.intern_layout at all until the very end. Can you try the same, maybe that alleviates some of the difference?

@ogoffart
Contributor Author

@ogoffart I did do something you didn't, to make it cheaper: I avoided calling tcx.intern_layout at all until the very end

Since I do an early return, the intern_layout is only done at the very end (on the actual return line, in both cases).
So I don't see how that can make any difference.

What could make a difference is that I compute the niche layout for more enums, while your test PR only computes it for the enums for which we were already doing niche optimization before (the ones with a single data variant).

Reduce the size of enums which have one bigger variant with a niche,
since we do not care much about the case where we can use the niche but
the fields cannot be re-ordered.
Instead of using an estimation heuristic to find out whether we should do one or
the other layout, always perform the non-niche layout and, when possible,
the niche layout, and compare the resulting layouts.

In my opinion, this is less efficient, but it was requested in the review
because it is hard to prove that the estimation is correct in every case.
@ogoffart
Contributor Author

I did some micro-benchmarks to understand the difference between the niche and the tag when matching an enum with a niche. When there are only two variants, there is practically no difference, as LLVM is able to optimize the extra check away.
But if there are more than two match arms, LLVM won't optimize the code that computes the discriminant, and that has an impact on performance.

The code to get the discriminant from a bool niche has this prologue:

mov    (%rdi),%al
add    $0xfe,%al
movzbl %al,%ecx
lea    0x1(%rcx),%rdx
xor    %eax,%eax
cmp    $0x2,%cl
cmovb  %rdx,%rax

compared to simply movzbl (%rdi),%eax when not using the niche.

But the cmovb and the "normalization" of the discriminant are useless.

This could be something LLVM could be improved on.
But I'm not sure this alone could explain the few percent regression we observe on some benchmarks.
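A hypothetical Rust rendition of what that prologue computes, for a three-variant enum whose bool niche stores the two dataless variants at byte values 2 and 3 (the function name and exact encoding are illustrative, not rustc's):

```rust
// Mirrors the asm above: `add $0xfe` is a wrapping subtraction of 2, and the
// cmovb selects relative + 1 when the byte fell in the niche range, otherwise
// 0 (the dataful variant's discriminant). The range check and the
// "normalization" are exactly the extra work a plain tag read would not need.
fn discriminant_from_bool_niche(byte: u8) -> u8 {
    let relative = byte.wrapping_add(0xfe); // byte - 2, wrapping
    if relative < 2 { relative + 1 } else { 0 }
}

fn main() {
    assert_eq!(discriminant_from_bool_niche(0), 0); // false: dataful variant
    assert_eq!(discriminant_from_bool_niche(1), 0); // true: dataful variant
    assert_eq!(discriminant_from_bool_niche(2), 1); // first dataless variant
    assert_eq!(discriminant_from_bool_niche(3), 2); // second dataless variant
}
```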

Maybe the best would be to not generate a "get discriminant" when doing a match. This would also give LLVM simpler code with fewer instructions to optimize (which may also explain some of the slowdown, although the results do not seem to be worse for optimized builds than for check builds).
This should probably be changed in librustc_mir_build/build/matches/test.rs in Builder::perform_test, but at this point I don't know if we already have the layout there and can generate different code for the niche.

@crlf0710 crlf0710 added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 24, 2020
chrysn added a commit to chrysn-pull-requests/Slots that referenced this pull request Apr 27, 2020
This does not yet realize the space savings it promises in the test case
due to missing optimizations[1][2] in Rust, but should eventually manage
to be barely larger than the payload arrays.

[1]: rust-lang/rust#46213
[2]: rust-lang/rust#70477
@eddyb
Member

eddyb commented May 1, 2020

But I'm not sure this alone could explain the few percent regression we observe on some benchmarks.

The benchmarks are compile times, not runtime. So any slowdowns they detect are the compiler doing more work, not necessarily any codegen'd code being inefficient.

(I'm really sorry I haven't come back to this already, the amount of different things happening in Rust got really overwhelming over the past few weeks)


Maybe the best would be to not generate a "get discriminant" when doing a match.

We used to have a variant-based Switch terminator in MIR, before we replaced it with a SwitchInt one. Maybe codegen could be clever and special-case SwitchInt(Discriminant(...)).
But I don't see how switching over a niche can be simplified except for a 2-case SwitchInt, anything else couldn't use a single comparison.

@ogoffart
Contributor Author

ogoffart commented May 1, 2020

The benchmarks are compile times, not runtime. So any slowdowns they detect are the compiler doing more work, not necessarily any codegen'd code being inefficient.

I beg to differ. I believe it is both. AFAIK, the compiler is compiled with itself, so if the change includes some big regression (or improvement) in code generation, it would therefore also be slower (or faster) to compile.

Maybe codegen could be clever and special-case SwitchInt(Discriminant(...)).
But I don't see how switching over a niche can be simplified except for a 2-case SwitchInt, anything else couldn't use a single comparison.

Yes, having a SwitchInt(Discriminant(...)) would allow the codegen to be smarter:

currently, this is generated (pseudo-code):

let discriminant = if (niche in [begin..end]) { niche - offset } else { data_variant };
match (discriminant) {
   data_variant => ...,
   other_variant1 => ...,
   other_variant2 => ...,
}

And LLVM is not good enough to remove the if or the -offset in all cases.

What could be generated is directly

match(niche) {
   other_variant1+offset => ...
   other_variant2+offset => ...
    _ => ...   // (for the data_variant)
   // begin..end  =>  ...    // alternative if the switch is not exhaustive
}

(where other_variantx+offset is a constant)

Then the switch looks almost exactly the same as for the tag variant (and maybe would also be easier for LLVM to process).
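The two lowerings can be sketched as ordinary Rust (names and the niche encoding are illustrative: dataful variant D, dataless variants V1 and V2 at niche values 2 and 3); both compute the same result, but the second hands LLVM a plain switch with no normalization step:

```rust
// Current lowering: normalize the niche into a discriminant, then switch.
fn via_discriminant(niche: u8) -> &'static str {
    let disc = if (2..=3).contains(&niche) { niche - 1 } else { 0 };
    match disc {
        1 => "V1",
        2 => "V2",
        _ => "D",
    }
}

// Proposed lowering: switch directly on the raw niche value; the dataful
// variant becomes the fallthrough arm.
fn direct_switch(niche: u8) -> &'static str {
    match niche {
        2 => "V1",
        3 => "V2",
        _ => "D",
    }
}

fn main() {
    // Both lowerings agree on every input byte in and around the niche range.
    for b in 0..=4u8 {
        assert_eq!(via_discriminant(b), direct_switch(b));
    }
}
```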


Anyway, my free time is over and I won't have time to continue on this patch.
Feel free to close this PR.
