Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resurrect #70477: "Use the niche optimization if other variant are small enough" #75866

Closed
wants to merge 2 commits into from

Conversation

erikdesjardins
Copy link
Contributor

Fixes #46213, fixes #66029.

r? @eddyb

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Aug 24, 2020
@erikdesjardins
Copy link
Contributor Author

This needs a perf run, since results in the original PR were negative.

@ogoffart
Copy link
Contributor

Last time i had a look at this issue, the benchmark showed performances regressions.
As explained here: #70477 (comment) , my hypotheses is that this change caused a runtime regression because the generated MIR code for match on enum with niche was not as optimal as it could be.

@bors
Copy link
Contributor

bors commented Aug 30, 2020

☔ The latest upstream changes (presumably #74862) made this pull request unmergeable. Please resolve the merge conflicts.

@jyn514 jyn514 added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. A-layout Area: Memory layout of types labels Sep 8, 2020
@crlf0710 crlf0710 added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 25, 2020
@Dylan-DPC-zz
Copy link

let's do a perf run first:

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion

@bors
Copy link
Contributor

bors commented Sep 25, 2020

⌛ Trying commit ef282bb1cd28a6a4ce0b8dda66dc4a3250930a77 with merge d85bea00451112d262ee8d42afe2819e1153be8e...

@bors
Copy link
Contributor

bors commented Sep 25, 2020

☀️ Try build successful - checks-actions, checks-azure
Build commit: d85bea00451112d262ee8d42afe2819e1153be8e (d85bea00451112d262ee8d42afe2819e1153be8e)

@rust-timer
Copy link
Collaborator

Queued d85bea00451112d262ee8d42afe2819e1153be8e with parent 10ef7f9, future comparison URL.

@rust-timer
Copy link
Collaborator

Finished benchmarking try commit (d85bea00451112d262ee8d42afe2819e1153be8e): comparison url.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup- to bors.

Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.

@bors rollup=never

ogoffart and others added 2 commits September 26, 2020 20:02
* Put the largest niche first

* Add test from issue 63866

* Add test for enum of sized/unsized

* Prefer fields that are already earlier in the struct
@erikdesjardins
Copy link
Contributor Author

erikdesjardins commented Sep 27, 2020

Perf is about the same as the original PR, regressions up to 4-5% on a lot of benchmarks.

@erikdesjardins
Copy link
Contributor Author

erikdesjardins commented Sep 27, 2020

Pushed another commit to disable the optimization and just reorder fields (tests will fail but the build should pass), let's do another perf run to see if that's the problem.

@Dylan-DPC-zz
Copy link

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion

@bors
Copy link
Contributor

bors commented Oct 1, 2020

⌛ Trying commit d5ef4b6 with merge 4cdfe129f8f8acf6532f8252997ece91030dd135...

@bors
Copy link
Contributor

bors commented Oct 1, 2020

☀️ Try build successful - checks-actions, checks-azure
Build commit: 4cdfe129f8f8acf6532f8252997ece91030dd135 (4cdfe129f8f8acf6532f8252997ece91030dd135)

@rust-timer
Copy link
Collaborator

Queued 4cdfe129f8f8acf6532f8252997ece91030dd135 with parent ef663a8, future comparison URL.

@rust-timer
Copy link
Collaborator

Finished benchmarking try commit (4cdfe129f8f8acf6532f8252997ece91030dd135): comparison url.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup- to bors.

Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.

@bors rollup=never

@ogoffart
Copy link
Contributor

ogoffart commented Oct 2, 2020

The result seems to indicate that the problem is not caused by the re-ordering if the field and the change in the padding.
This seems to confirm what i wrote #70477 (comment) that the problem is because match when using the niche optimization is slower. It could either be slower for LLVM to compile, or the resulting code might be slower as well as there are more branches.

Either way, we could verify this hypotheses if we try to disable completely the niche optimization optimization (or only for type bigger than 8 bytes or so) and see if that improve the compile time performance.

I guess the fix would be to improve the code generation for the match.
I foresee a quite big improvement (even without the niche optimization) if doing what i try to explain in #70477 (comment)

In the perform_test function, we should not create a TeminatorKind::SwitchInt over Rvalue::Discriminant because the getting the discriminant will do a few branching and subtractions, before doing the next switch.

self.cfg.push_assign(block, source_info, discr, Rvalue::Discriminant(place));
assert_eq!(values.len() + 1, targets.len());
self.cfg.terminate(
block,
source_info,
TerminatorKind::SwitchInt {
discr: Operand::Move(discr),
switch_ty: discr_ty,
values: From::from(values),
targets,
},
);

Instead, there should probably be a new TerminatorKind::Switch which maps the TestKind::Switch and keep the information about the variant, So codegen has more information to produce optimal code.

Currently, codegen generates something that looks like this (pseudo_code):

// this condition produces quite some instruction that llvm then need to try optimize. 
let discriminent = if (niche in [begin...end]) { niche - offset } else { data_variant };
match(discriminent) {
   data_variant => ...,
   other_variant1 => ...
   other_variant2 => ...
}

Instead, the codegen should do this:

match(niche) {
   other_variant1+offset => ...
   other_variant2+offset => ...
    _ => ...   // (for the data_variant)
   // begin..end  =>  ...    // alternative if the switch is not exhaustive
}

@ogoffart
Copy link
Contributor

#77816 did a performence test to disable the niche optimization, showing small performence improvements. So I would conclude that the problem is indeed that the code generated for the niche enum is not optimal, and we should improve it before enabling the niche optimization in more cases.

@camelid camelid added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 30, 2020
@camelid camelid added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 20, 2020
@Dylan-DPC-zz
Copy link

Dylan-DPC-zz commented Dec 8, 2020

@erikdesjardins any updates? (on the CI failure)

@Dylan-DPC-zz Dylan-DPC-zz added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Dec 8, 2020
@erikdesjardins
Copy link
Contributor Author

This can't be merged in its current state anyways, it's too much of a perf regression. It's likely that a codegen change (as described in #75866 (comment)) will be necessary in order to land this. I won't have time to look into that personally in the near future, so I'll close this for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-layout Area: Memory layout of types S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet