clarify effects of lto, thinlto and codegen-units #48518

Open
matthiaskrgr opened this issue Feb 24, 2018 · 6 comments
Labels
A-docs Area: documentation for any part of the project, including the compiler, standard library, and tools C-enhancement Category: An issue proposing an enhancement or a PR with one. E-hard Call for participation: Hard difficulty. Experience needed to fix: A lot. P-medium Medium priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@matthiaskrgr
Member

There seems to be a lot of confusion about performance implications of lto, thinlto, codegen-units and default optimizations of build targets, maybe we can clarify this somehow.

Where would be the best place for this?

@estebank estebank added the A-docs Area: documentation for any part of the project, including the compiler, standard library, and tools label Feb 25, 2018
@teiesti
Contributor

teiesti commented Feb 28, 2018

I really appreciate the idea to improve documentation on this front. My current sources of information are

Maybe we can use this issue to collect all the places where information about optimizing Rust executables can be found. In the end we could probably write a section (an appendix?) for TRPL.

We could also collect questions, e.g.

  1. What is the difference between ThinLTO and LTO? How do they interact?
  2. Do I need to enable ThinLTO or is it done by default?
  3. How does ThinLTO work?

(I've actually long wanted to find an answer to these questions.)

Edit: I've just found the answers. See also.

@teiesti
Contributor

teiesti commented Mar 1, 2018

After rethinking it, I don't believe a section in TRPL would be a good idea!

@matthiaskrgr
Member Author

I think the important items (and their relations) to explain are

lto
thinlto
codegen-units = 1
codegen-units = n
opt-level = {0-3, ("s", "z")}

It's also notable that releases (1.24, 1.25, 1.26) have different behaviours/bugs (for example #48163 sped up monolithic lto link time (compiletime) significantly).

From what I understand, monolithic lto merges all the object files into one huge translation unit and then just pretends we have the entire program inlined into a single file, running its optimizations on everything at once, sequentially.

Thinlto, while compiling, writes interesting metadata for modules/functions (let's call them snippets) into an index.
While doing the link time optimizations, it optimizes snippets in parallel while loading only the metadata of related snippets into RAM (and not everything), which makes it use less memory than monolithic lto (no need to load everything at once) and makes it scalable (optimize N snippets at a time).
In the future we might even get incremental thinlto (only reoptimize snippets that changed or whose dependencies changed, see #47660 ).
During a talk on thinlto it was said that thinlto only performs a subset of the optimizations that monolithic lto does; however, since it is lean in memory usage and optimizes in parallel, it can do its optimizations more aggressively without a noticeable increase in compile time (or out of memory exceptions :P )
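
For reference, a minimal sketch of how the two modes can be selected in a Cargo profile. It assumes a Cargo version that accepts string values for `lto`, which is newer than the releases discussed in this thread, so treat it as an illustration rather than what was available at the time:

```toml
# Cargo.toml (sketch; assumes a Cargo version that accepts string values for `lto`)
[profile.release]
# "thin" selects ThinLTO; `true` or "fat" would select monolithic LTO instead.
lto = "thin"
```

Switching only this one line between "thin" and "fat" while leaving the rest of the profile untouched is probably the simplest way to compare the two modes on a given project.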

By default, cargo build --release builds with opt-level=3. However, lto may also be desirable when we want very small binaries, in which case we can combine opt-level="z" with lto=true.
We should probably mention this as well.
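
A minimal sketch of such a size-oriented profile, using only the settings named above. Whether "z" or "s" actually produces the smaller binary is an assumption that should be measured per project:

```toml
# Cargo.toml (sketch of a size-oriented release profile)
[profile.release]
opt-level = "z"  # optimize for size; "s" is the other size-oriented level
lto = true       # monolithic LTO
```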

[profile.release]
lto=x
codegen-units=1
opt-level=y

Size of the cargo binary in bytes:

| opt-level | monolithic lto | thinlto  |
|-----------|----------------|----------|
| 3         | 11709976       | 12637656 |
| "s"       | 10059336       | 11355552 |
| "z"       | 10315192       | 11501104 |

(iirc "z" should actually optimize for size even more aggressively than "s" so looks like something is a bit weird. :/ )

Last but not least we have the codegen units, which split up a crate into parts that are compiled in parallel (before (thin)lto).
I guess codegen-units = 1 is a bit like monolithic lto and codegen-units > 1 is a bit like thinlto.
There are tickets out there which seem to indicate that several codegen-units worsen performance (#47665 , #47745 ..)
Currently it seems that by default we split up every crate into several codegen units and compile them in parallel while at the same time compiling several crates in parallel.
(This is kind of unnecessary parallelism in my opinion, but choosing a more reasonable number of codegen-units by taking the host machine's cpu core count into account would mean we only get reproducible builds on machines with identical core counts, which is also bad.... :( )
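
As a sketch of the alternative, the number of codegen units can be pinned explicitly in the profile, so that it does not depend on the machine doing the build. The value 4 below is a hypothetical choice for illustration, not a recommendation:

```toml
# Cargo.toml (sketch; the value 4 is hypothetical)
[profile.release]
# A fixed value is the same on every host, unlike a number derived from the core count.
# Setting it to 1 disables the split entirely, at the cost of slower compilation.
codegen-units = 4
```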

Please correct me if I'm wrong!!

Interesting links:
thinlto: https://www.youtube.com/watch?v=p9nH2vZ2mNo
thinlto: http://blog.llvm.org/2016/06/thinlto-scalable-and-incremental-lto.html
lto: https://llvm.org/docs/LinkTimeOptimization.html

@jkordish jkordish added the C-enhancement Category: An issue proposing an enhancement or a PR with one. label Mar 7, 2018
@frewsxcv
Member

The documentation team has been talking about creating a new guide for the rustc CLI, similar to the CLI section in the Rustdoc book. Tracking issue: rust-docs/team#11. This might be a good place to talk about lto and codegen-units.

@steveklabnik
Member

Update: this has now been merged, and lives here: https://github.com/rust-lang/rust/tree/master/src/doc/rustc

@steveklabnik steveklabnik added the E-hard Call for participation: Hard difficulty. Experience needed to fix: A lot. label May 28, 2018
@frewsxcv
Member

frewsxcv commented Jun 2, 2018

Update: this has now been merged, and lives here: https://github.com/rust-lang/rust/tree/master/src/doc/rustc

In fact, the codegen-units and lto options are already mentioned in the rustc book:

But after skimming through the previous comments in this issue, seems like there's room to expand the descriptions
