codegen-units + ThinLTO is not as good as codegen-units = 1 #47745

nagisa · 2018-01-25T17:27:22Z

We have recently received a fair number of reports about a drop in code generation quality. One of the recent causes of that drop is the enablement of codegen-units together with ThinLTO.

It seems that ThinLTO is not capable of producing results matching those obtained by compiling without codegen-units in the first place.

The list of known reports follows:

Improvements to ThinLTO quality are inbound with the soon-to-happen LLVM upgrade(s). However, those do not help sufficiently, so it would be nice to figure out why ThinLTO is not doing a good enough job.

cc @alexcrichton @nikomatsakis

nikomatsakis · 2018-01-25T17:34:37Z

Thanks for filing this @nagisa =)

alexcrichton · 2018-01-26T04:13:46Z

Indeed thanks! I'll try to take a closer look at this when we've upgraded LLVM

matthiaskrgr · 2018-02-15T21:45:07Z

By the way, there is a great talk about how ThinLTO is designed here: https://www.youtube.com/watch?v=p9nH2vZ2mNo in case people are curious. :)

robsmith11 · 2018-03-03T03:52:56Z

Matrix multiplication using https://github.com/bluss/matrixmultiply is slower with ThinLTO + multiple codegen-units.

I can create a minimal example if needed.

johnthagen · 2018-12-15T02:41:44Z

I've always thought that there should be another Cargo profile, something like:

# The publish profile, used for `cargo build --publish`.
[profile.publish]
# (...) everything else the same as profile.release except:
lto = true        # Enable full link-time optimization.
codegen-units = 1 # Use only 1 codegen-unit to enable full optimizations.

I feel like there should be a distinction between release builds that developers compile on their local machines during development (not debug builds, but "fast" release builds) and truly publishable builds (for example, the version of Firefox that is released for public consumption), where sacrificing build time once is more acceptable.

I realize the status quo for C/C++ is to also not enable LTO by default, but it just seems strange to me to have to opt into these kinds of performance enhancements when the cost (for published binaries) is a one-time compile-time cost.

HadrienG2 · 2018-12-15T11:00:31Z

I think "publish" is uncomfortably close to "release". But I could get behind a "debug/optimize/release" terminology proposal.

forrestthewoods · 2019-01-29T19:06:58Z

Historically I've used debug, internal, release, retail.

Plus a few variations with "add-ons" such as "Retail-Logging" or "Retail-Instrumented".

For Rust instead of 'Retail' I'd propose MaxSpeed. Whatever it's called, a profile with lto=true and codegen-units=1 is definitely a good idea!

brson · 2019-01-31T02:11:56Z

@johnthagen I agree that today's 'release' profile seems to have two use cases that want different configurations. Is it possible to create custom cargo profiles? Is there an upstream cargo issue for this?

johnthagen · 2019-01-31T11:53:32Z

@brson It looks like it's not yet implemented, but it has been discussed for several years.

Issue Request (from 2015): Support custom profiles cargo#2007
@aturon wrote in Feb 2018 about deprecating profiles and using "workflows" instead: Cargo profile dependencies rfcs#2282 (comment)
Another related blog post by @aturon: https://aturon.github.io/2018/04/05/workflows/

Perhaps @matklad has some more up-to-date information on this?

matklad · 2019-01-31T12:01:27Z

My understanding is that "custom profiles" are pretty far away at the moment (we need to do profile overrides first). However, we do have the nightly "config profiles" feature, which allows overriding a profile via .cargo/config. This might be used, for example, to specify codegen-units=1 on the build server that produces release artifacts.
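
As a rough sketch of that build-server setup: assuming a Cargo version that accepts `[profile]` tables in its configuration file (this was nightly-only when the comment was written), a config file on the build machine could look like the following. The file location and the choice to leave `lto` untouched are illustrative assumptions, not something taken from this thread.

```toml
# .cargo/config.toml on the build server (location is an assumption;
# older Cargo versions read the same keys from `.cargo/config`).
# Overrides the release profile for every build run on this machine,
# without touching the project's Cargo.toml.
[profile.release]
codegen-units = 1   # one codegen unit per crate: slower builds, more room for optimization
```

The same override can also be expressed per invocation through the `CARGO_PROFILE_RELEASE_CODEGEN_UNITS=1` environment variable, which avoids keeping a config file on the machine at all.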

brson · 2019-02-10T02:08:23Z

Thanks @johnthagen @matklad for the leads!

pnkfelix · 2022-05-27T14:19:20Z

Visiting for T-compiler backlog bonanza, since it was tagged as C-tracking-issue (perhaps erroneously)

I think at this point the disparity between codegen-units + ThinLTO vs codegen-units=1 is, to some degree, something that we are accepting as a "fact of life".
We do have a problem in that people are surprised that --release does not produce the most optimal code possible while benchmarking. But, again, assuming that the aforementioned disparity is a "fact of life", we would then have to address the "--release surprise" via other means; we cannot make --release imply codegen-units=1 without severely regressing compilation performance for many users.
It would probably be good if we had benchmark data tracking the disparity between codegen-units + ThinLTO and codegen-units=1, just so we have some idea of how big the problem is and whether it is getting better or worse.
It would also be good to have official documentation on how to tune your settings for "best object performance" vs "usable compilation times".
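
A sketch of what such tuning guidance might show, assuming a Cargo version with custom named profiles and the `inherits` key. The profile name `maxspeed` is hypothetical (borrowed from a suggestion earlier in this thread), and the values are illustrative rather than an official recommendation.

```toml
# Cargo.toml

# Day-to-day optimized builds: keep Cargo's release defaults
# (multiple codegen units) so compile times stay usable.
[profile.release]
opt-level = 3

# Hypothetical profile for published artifacts: trade compile time
# for the best generated code.
[profile.maxspeed]
inherits = "release"
lto = "fat"          # full cross-crate LTO ("thin" is the cheaper alternative)
codegen-units = 1    # do not split crates into parallel codegen units
```

With this in place, `cargo build --release` would keep the faster configuration, while `cargo build --profile maxspeed` would opt into the slower, more thorough one. Measuring both on a representative workload is still needed to confirm how large the difference is for a given project.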

pnkfelix · 2022-05-27T14:20:44Z

@rustbot label: -C-tracking-issue

pnkfelix · 2022-05-27T14:22:16Z

Also, given that the original point of the issue was to determine why ThinLTO didn't seem to do a good enough job, that seems like a question that is well-suited for wg-llvm.

@rustbot label: A-llvm

nagisa added I-slow Issue: Problems and improvements with respect to performance of generated code. C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. labels Jan 25, 2018

nikomatsakis added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Jan 25, 2018

japaric mentioned this issue Jan 26, 2018

ThinLTO bloats size of bare metal programs by up to 1200% #47770

Closed

ollie27 mentioned this issue Feb 4, 2018

Update RELEASES.md for 1.24.0 #47286

Merged

alexcrichton mentioned this issue Feb 12, 2018

Performance regressions in nightly btwn nightly-2017-12-25 and nightly-2017-12-26 #48155

Closed

pietroalbini mentioned this issue Feb 20, 2018

Slower performance caused only by using LTO #48371

Open

This was referenced Feb 22, 2018

CPU usage regression for ripgrep #48257

Closed

codegen-units=16 doubles compile times for Firefox #48233

Closed

matthiaskrgr mentioned this issue Mar 1, 2018

clarify effects of lto, thinlto and codegen-units #48518

Open

est31 mentioned this issue Aug 31, 2018

35% performance regression in generated code since 1.24 #53833

Open

est31 mentioned this issue Oct 28, 2018

38x performance regression in lewton since rust 1.26 #55446

Closed

killercup mentioned this issue Jan 8, 2019

Vec<u8> clone in rustc 1.33.0 is 3 times slower than rustc 1.29.0 #57437

Closed

brson mentioned this issue Jan 31, 2019

Consider ThinLTO vs LTO vs no LTO with respect to compile time and runtime performance tikv/tikv#4163

Closed

johnthagen mentioned this issue Jan 31, 2019

Support custom profiles rust-lang/cargo#2007

Closed

brson mentioned this issue Feb 10, 2019

Figure out how to add a second "release" profile for "dev+optimized" builds tikv/tikv#4189

Closed

mati865 mentioned this issue Jan 30, 2020

Strange perforamnce drops with const literals in closures #68632

Closed

benvanik mentioned this issue Nov 20, 2020

Use LTO for LLVM AOT to avoid the need for code generation in iree-opt/iree-translate iree-org/iree#3736

Open

Absolucy mentioned this issue Jun 6, 2021

Improve optimization in release profile Putnam3145/auxmos#9

Merged

This was referenced Mar 13, 2022

limit codegen-units to get the maximum optimization openethereum/openethereum#628

Open

limit codegen-units to get the maximum optimization use-ink/ink#1179

Closed

limit codegen-units to get the maximum optimization paritytech/smoldot#2138

Merged

rustbot removed the C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. label May 27, 2022

rustbot added the A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. label May 27, 2022

hussein-aitlahcen mentioned this issue Dec 6, 2022

CU-3bfmmke - XCVM Gateway ComposableFi/composable#2396

Merged

link2xt mentioned this issue Jun 8, 2023

size increase of core116 deltachat/deltachat-core-rust#4463

Closed
