codegen-units + ThinLTO is not as good as codegen-units = 1 #47745
Comments
nagisa
added
I-slow
C-tracking-issue
labels
Jan 25, 2018
Thanks for filing this @nagisa =)
nikomatsakis
added
the
T-compiler
label
Jan 25, 2018
Indeed, thanks! I'll try to take a closer look at this when we've upgraded LLVM.
japaric
referenced this issue
Jan 26, 2018
Closed
ThinLTO bloats size of bare metal programs by up to 1200% #47770
alexcrichton
referenced this issue
Feb 12, 2018
Open
Performance regressions in nightly btwn nightly-2017-12-25 and nightly-2017-12-26 #48155
By the way, there is a great talk about how ThinLTO is designed here: https://www.youtube.com/watch?v=p9nH2vZ2mNo in case people are curious. :)
pietroalbini
referenced this issue
Feb 20, 2018
Open
Slower performance caused only by using LTO #48371
This was referenced Feb 22, 2018
matthiaskrgr
referenced this issue
Mar 1, 2018
Open
clarify effects of lto, thinlto and codegen-units #48518
robsmith11
commented
Mar 3, 2018
Matrix multiplication is slower with ThinLTO + multiple codegen-units using https://github.com/bluss/matrixmultiply. I can create a minimal example if needed.
I've always thought that there should be another Cargo profile, something like:

    # The publish profile, used for `cargo build --publish`.
    [profile.publish]
    # (...) everything else the same as profile.release except:
    lto = true # Enable full link-time optimization.
    codegen-units = 1 # Use only 1 codegen-unit to enable full optimizations.

I say this because I feel there should be a distinction between release builds the developer compiles on their local machine during development (not debug builds, but "fast" release builds) and truly publishable builds (for example, the version of Firefox that is released for public consumption), where sacrificing build time once is more acceptable. I realize the status quo for C/C++ is also to not enable LTO by default, but it seems strange to me to have to opt into these kinds of performance enhancements when the cost (for published binaries) is a one-time compile-time cost.
HadrienG2
commented
Dec 15, 2018
I think "publish" is uncomfortably close to "release". But I could get behind a "debug/optimize/release" terminology proposal.
killercup
referenced this issue
Jan 8, 2019
Closed
Vec<u8> clone in rustc 1.33.0 is 3 times slower than rustc 1.29.0 #57437
forrestthewoods
commented
Jan 29, 2019
Historically I've used debug, internal, release, retail. Plus a few variations with "add-ons" such as "Retail-Logging" or "Retail-Instrumented". For Rust instead of 'Retail' I'd propose MaxSpeed. Whatever it's called, a profile with lto=true and codegen-units=1 is definitely a good idea!
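For readers who want the effect of the proposed profile today, the two settings can be applied to the existing release profile. This is a minimal sketch, assuming a Cargo.toml at the crate or workspace root. It changes every `cargo build --release` rather than adding a separate "publish" or "MaxSpeed" profile, and the extra build time depends on the project:

```toml
# Cargo.toml (sketch): trade compile time for runtime performance.
[profile.release]
lto = true         # full ("fat") LTO; `lto = "thin"` would select ThinLTO instead
codegen-units = 1  # one codegen unit per crate; slower to build
```

Whether this beats the default settings is workload-dependent, so it is worth benchmarking both configurations before adopting it.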
brson
referenced this issue
Jan 31, 2019
Open
Consider ThinLTO vs LTO vs no LTO with respect to compile time and runtime performance #4163
@johnthagen I agree that today's 'release' profile seems to have two use cases that want different configurations. Is it possible to create custom cargo profiles? Is there an upstream cargo issue for this?
@brson It looks like it's not yet implemented, but it has been discussed for several years.
Perhaps @matklad has some more up-to-date information on this?
My understanding is that "custom profiles" are pretty far away at the moment (we need to do profile overrides first). However, we do have the config profiles nightly feature, which allows overriding a profile via
nagisa commented Jan 25, 2018
We recently had a fair number of reports about a drop in code generation quality. One of the recent causes of the drop is the enablement of codegen-units and ThinLTO.
It seems that ThinLTO is not capable of producing results matching those obtained by compiling without codegen-units in the first place.
The list of known reports follows:
Improvements to ThinLTO quality are inbound with the soon-to-happen LLVM upgrade(s); however, those do not help sufficiently. It would be nice to figure out why ThinLTO is not doing a good enough job.
cc @alexcrichton @nikomatsakis