Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use try_fold instead of try_for_each to reduce compile time #64885

Merged
merged 1 commit into from Oct 2, 2019

Conversation

@andjo403
Copy link
Contributor

commented Sep 28, 2019

as it was stated in #64572 that the biggest gain was due to less code was generated I tried to reduce the number of functions to inline by using try_fold direct instead of calling try_for_each that calls try_fold.

as there is some gains with using the try_fold function this is maybe a way forward.
when I tried to compile the clap-rs benchmark I get times gains only some % from #64572

there is more function that use eg. fold that calls try_fold that also can be changed but the question is how mush "duplication" that is tolerated in std to give faster compile times

can someone start a perf run?

cc @nnethercote @scottmcm @bluss
r? @ghost

@Centril

This comment has been minimized.

Copy link
Member

commented Sep 28, 2019

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

Copy link

commented Sep 28, 2019

Awaiting bors try build completion

@bors

This comment has been minimized.

Copy link
Contributor

commented Sep 28, 2019

⌛️ Trying commit fc87c00 with merge 4a55e7b...

bors added a commit that referenced this pull request Sep 28, 2019
[WIP] use try_fold instead of try_for_each to reduce compile time

as it was stated in #64572 that the biggest gain was due to less code was generated I tried to reduce the number of functions to inline by using try_fold direct instead of calling try_for_each that calls try_fold.

as there is some gains with using the try_fold function this is maybe a way forward.
when I tried to compile the clap-rs benchmark I get times gains only some % from #64572

there is more function that use eg. fold that calls try_fold that also can be changed but the question is how mush "duplication" that is tolerated in std to give faster compile times

this PR contains the changes from #64600 to make the compare fair with #64572

can someone start a perf run?

cc @nnethercote @scottmcm @bluss
r? @ghost
@bors

This comment has been minimized.

Copy link
Contributor

commented Sep 29, 2019

☀️ Try build successful - checks-azure
Build commit: 4a55e7b (4a55e7b6a6a7beddaf5a2f71ee4d06f3a829524e)

@rust-timer

This comment has been minimized.

Copy link

commented Sep 29, 2019

Queued 4a55e7b with parent 488381c, future comparison URL.

@rust-timer

This comment has been minimized.

Copy link

commented Sep 29, 2019

Finished benchmarking try commit 4a55e7b, comparison URL.

@andjo403

This comment has been minimized.

Copy link
Contributor Author

commented Sep 29, 2019

some procent better than only #64600 but still some way to go until the diffs from #64572 .
do not know how god it is to compare the procent changes for all the perf runs but as there is multiple bases for the PRs it is not possible to compare directly.

@nnethercote

This comment has been minimized.

Copy link
Contributor

commented Sep 30, 2019

@andjo403: #64600 has now landed. Could you rebase and update the code here? I think only the first commit will be necessary now, and we can get a fair comparison. Thanks.

removes two functions to inline by combining the check functions and extra call to try_for_each
@andjo403 andjo403 force-pushed the andjo403:iter branch from fc87c00 to 8737061 Oct 1, 2019
@andjo403

This comment has been minimized.

Copy link
Contributor Author

commented Oct 1, 2019

@nnethercote rebased and removed the second commit

@nnethercote

This comment has been minimized.

Copy link
Contributor

commented Oct 1, 2019

Thanks!

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

Copy link

commented Oct 1, 2019

Awaiting bors try build completion

@bors

This comment has been minimized.

Copy link
Contributor

commented Oct 1, 2019

⌛️ Trying commit 8737061 with merge 40a3c41...

bors added a commit that referenced this pull request Oct 1, 2019
[WIP] use try_fold instead of try_for_each to reduce compile time

as it was stated in #64572 that the biggest gain was due to less code was generated I tried to reduce the number of functions to inline by using try_fold direct instead of calling try_for_each that calls try_fold.

as there is some gains with using the try_fold function this is maybe a way forward.
when I tried to compile the clap-rs benchmark I get times gains only some % from #64572

there is more function that use eg. fold that calls try_fold that also can be changed but the question is how mush "duplication" that is tolerated in std to give faster compile times

this PR contains the changes from #64600 to make the compare fair with #64572

can someone start a perf run?

cc @nnethercote @scottmcm @bluss
r? @ghost
@bluss

This comment has been minimized.

Copy link
Member

commented Oct 1, 2019

Assuming we are using try_fold etc everywhere, we can still manually desugar to structs implementing FnMut instead of using closures.

Not the best abstraction level, but doesn't it look like we could save one generic item per iterator method then? Where we currently have the check functions.

@bors

This comment has been minimized.

Copy link
Contributor

commented Oct 1, 2019

☀️ Try build successful - checks-azure
Build commit: 40a3c41 (40a3c41fdfde051926f256564c247e2ce94a667e)

if f(x) { LoopState::Continue(()) }
else { LoopState::Break(()) }
}
}

self.try_for_each(check(f)) == LoopState::Continue(())
self.try_fold((), check(f)) == LoopState::Continue(())

This comment has been minimized.

Copy link
@bluss

bluss Oct 1, 2019

Member

Thoughts on equality check vs pattern matching here, can it have an effect or none at all?

This comment has been minimized.

Copy link
@andjo403

andjo403 Oct 1, 2019

Author Contributor

made a quick diff in godbolt and there is less code to inline so that is something that I can do
ZN72$LT$example..LoopState$LT$C$C$B$GT$$u20$as$u20$core..cmp..PartialEq$GT$2eq17h37dbcaf2df999e09E is a lot to inline

This comment has been minimized.

Copy link
@scottmcm

scottmcm Oct 1, 2019

Member

I would hope it has no effect, since LoopState<(),()> is an i1 in LLVM...

...and it is in -O, but very different in debug: https://rust.godbolt.org/z/LKOpZ7

Looks like the PartialEq::eq that gets generated is pretty bad, and it's still bad removing the generics: https://rust.godbolt.org/z/o6Nuaw Could there be a "this is a field-less enum so just compare the discriminants" path in the derive? It looks, unfortunately, like as u8 == 1 is the shortest-emitted-IR way to do these checks. And we're avoiding the derives in other places too, like

rust/src/libcore/cmp.rs

Lines 632 to 638 in 702b45e

#[stable(feature = "rust1", since = "1.0.0")]
impl Ord for Ordering {
#[inline]
fn cmp(&self, other: &Ordering) -> Ordering {
(*self as i32).cmp(&(*other as i32))
}
}

This comment has been minimized.

Copy link
@bluss

bluss Oct 1, 2019

Member

Oh interesting. So we could improve here just by implementing PartialEq manually, or even adding a separate method for just discriminant comparison. But then pattern matching works well too. Like, just a method for ".is_continue()"

This comment has been minimized.

Copy link
@andjo403

andjo403 Oct 1, 2019

Author Contributor

but the pattern match avoids having a function to inline completely as long as a function is used the llvm-ir will contain a call and a function

@nnethercote

This comment has been minimized.

Copy link
Contributor

commented Oct 1, 2019

My rust-timer command above didn't work. Let's try doing it a different way:

@rust-timer build 40a3c41

@rust-timer

This comment has been minimized.

Copy link

commented Oct 1, 2019

Queued 40a3c41 with parent 42ec683, future comparison URL.

@andjo403

This comment has been minimized.

Copy link
Contributor Author

commented Oct 1, 2019

Assuming we are using try_fold etc everywhere, we can still manually desugar to structs implementing FnMut instead of using closures.
Not the best abstraction level, but doesn't it look like we could save one generic item per iterator method then? Where we currently have the check functions.

I do not understand can you show some example?

@bluss

This comment has been minimized.

Copy link
Member

commented Oct 1, 2019

@andjo403 I have an example of a before-after change like that, that I made as PoC. Rust -Zprint-mono-items=lazy tells me this uses 1 less generic function (Before we use check<T> and the closure in the check body, after we use only Fun::call_mut (call_once is never used).) Regrettably it's from a smaller similar iterator, not the exact code in libcore.

Code here https://gist.github.com/b94c565bc5ba37206112c150b8b1cc20

It doesn't look great - maybe a macro could improve that? In fact the code looks so bad, I'm unsure we'd want to do that. 🙂

@andjo403

This comment has been minimized.

Copy link
Contributor Author

commented Oct 1, 2019

thanks @bluss for the example and yes that code was hard to understand

@bluss

This comment has been minimized.

Copy link
Member

commented Oct 1, 2019

It is equivalent to desugaring the original closure, without the "check(f) hack", but also without capturing extraneous type parameters. So a regular closure would be the same, when #46477 is fixed.

@rust-timer

This comment has been minimized.

Copy link

commented Oct 1, 2019

Finished benchmarking try commit 40a3c41, comparison URL.

@Mark-Simulacrum

This comment has been minimized.

Copy link
Member

commented Oct 2, 2019

Crazy bots! I think I know what's wrong though, will try and fix in a bit, and silence bot for now.

@rust-lang rust-lang deleted a comment from rust-timer Oct 2, 2019
@rust-lang rust-lang deleted a comment from rust-timer Oct 2, 2019
@rust-lang rust-lang deleted a comment from rust-timer Oct 2, 2019
@rust-lang rust-lang deleted a comment from rust-timer Oct 2, 2019
@rust-lang rust-lang deleted a comment from rust-timer Oct 2, 2019
@rust-lang rust-lang deleted a comment from rust-timer Oct 2, 2019
@rust-lang rust-lang deleted a comment from rust-timer Oct 2, 2019
@rust-lang rust-lang deleted a comment from rust-timer Oct 2, 2019
@rust-lang rust-lang deleted a comment from rust-timer Oct 2, 2019
@rust-lang rust-lang deleted a comment from rust-timer Oct 2, 2019
@rust-lang rust-lang deleted a comment from rust-timer Oct 2, 2019
@rust-lang rust-lang deleted a comment from rust-timer Oct 2, 2019
@rust-lang rust-lang deleted a comment from rust-timer Oct 2, 2019
@rust-lang rust-lang deleted a comment from rust-timer Oct 2, 2019
@rust-lang rust-lang deleted a comment from rust-timer Oct 2, 2019
@rust-lang rust-lang deleted a comment from rust-timer Oct 2, 2019
@rust-lang rust-lang deleted a comment from rust-timer Oct 2, 2019
@rust-lang rust-lang deleted a comment from rust-timer Oct 2, 2019
@rust-lang rust-lang deleted a comment from rust-timer Oct 2, 2019
@rust-lang rust-lang deleted a comment from rust-timer Oct 2, 2019
@rust-timer

This comment has been minimized.

Copy link

commented Oct 2, 2019

Finished benchmarking try commit 40a3c41, comparison URL.

@nnethercote

This comment has been minimized.

Copy link
Contributor

commented Oct 2, 2019

The results are good: up to 7.5% win for clap, and lots of sub-1% wins. Really good for such a simple change!

@scottmcm scottmcm self-assigned this Oct 2, 2019
@scottmcm scottmcm changed the title [WIP] use try_fold instead of try_for_each to reduce compile time use try_fold instead of try_for_each to reduce compile time Oct 2, 2019
@scottmcm

This comment has been minimized.

Copy link
Member

commented Oct 2, 2019

I'm happy with this as-is (we can explore other things like #64885 (comment) in a follow-up PR), so

@bors r+

@bors

This comment has been minimized.

Copy link
Contributor

commented Oct 2, 2019

📌 Commit 8737061 has been approved by scottmcm

tmandry added a commit to tmandry/rust that referenced this pull request Oct 2, 2019
use try_fold instead of try_for_each to reduce compile time

as it was stated in rust-lang#64572 that the biggest gain was due to less code was generated I tried to reduce the number of functions to inline by using try_fold direct instead of calling try_for_each that calls try_fold.

as there is some gains with using the try_fold function this is maybe a way forward.
when I tried to compile the clap-rs benchmark I get times gains only some % from rust-lang#64572

there is more function that use eg. fold that calls try_fold that also can be changed but the question is how mush "duplication" that is tolerated in std to give faster compile times

can someone start a perf run?

cc @nnethercote @scottmcm @bluss
r? @ghost
bors added a commit that referenced this pull request Oct 2, 2019
Rollup of 11 pull requests

Successful merges:

 - #64649 (Avoid ICE on return outside of fn with literal array)
 - #64722 (Make all alt builders produce parallel-enabled compilers)
 - #64801 (Avoid `chain()` in `find_constraint_paths_between_regions()`.)
 - #64805 (Still more `ObligationForest` improvements.)
 - #64840 (SelfProfiler API refactoring and part one of event review)
 - #64885 (use try_fold instead of try_for_each to reduce compile time)
 - #64942 (Fix clippy warnings)
 - #64952 (Update cargo.)
 - #64974 (Fix zebra-striping in generic dataflow visualization)
 - #64978 (Fully clear `HandlerInner` in `Handler::reset_err_count`)
 - #64979 (Update books)

Failed merges:

 - #64959 (syntax: improve parameter without type suggestions)

r? @ghost
@bors bors merged commit 8737061 into rust-lang:master Oct 2, 2019
5 checks passed
5 checks passed
homu Test successful
Details
pr Build #20191001.15 succeeded
Details
pr (Linux mingw-check) Linux mingw-check succeeded
Details
pr (Linux x86_64-gnu-llvm-6.0) Linux x86_64-gnu-llvm-6.0 succeeded
Details
pr (LinuxTools) LinuxTools succeeded
Details
@andjo403 andjo403 deleted the andjo403:iter branch Oct 2, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
8 participants
You can’t perform that action at this time.