Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

codegen-units + ThinLTO is not as good as codegen-units = 1 #47745

Open
nagisa opened this issue Jan 25, 2018 · 14 comments
Open

codegen-units + ThinLTO is not as good as codegen-units = 1 #47745

nagisa opened this issue Jan 25, 2018 · 14 comments
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@nagisa
Copy link
Member

nagisa commented Jan 25, 2018

We recently had a fair amount of reports about code generation quality drop. One of the recent causes for the quality drop is the enablement of codegen-units and ThinLTO.

It seems that ThinLTO is not capable of producing results matching those obtained by compiling without codegen-units in the first place.

The list of known reports follows:

Improvements to ThinLTO quality are inbound with the soon-to-happen LLVM upgrade(s), however those do not help sufficiently, it would be nice to figure out why ThinLTO is not doing good enough job.

cc @alexcrichton @nikomatsakis

@nagisa nagisa added I-slow Issue: Problems and improvements with respect to performance of generated code. C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. labels Jan 25, 2018
@nikomatsakis
Copy link
Contributor

Thanks for filing this @nagisa =)

@nikomatsakis nikomatsakis added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Jan 25, 2018
@alexcrichton
Copy link
Member

Indeed thanks! I'll try to take a closer look at this when we've upgraded LLVM

@matthiaskrgr
Copy link
Member

By the way there is a great talk about how thinlto is designed here: https://www.youtube.com/watch?v=p9nH2vZ2mNo in case people are curious. :)

@robsmith11
Copy link

Matrix multiplication is slower with thinlto + multiple codege-units using https://github.com/bluss/matrixmultiply .

I can create a minimal example if needed.

@johnthagen
Copy link
Contributor

johnthagen commented Dec 15, 2018

I've always thought that there should be another Cargo profile, something like:

# The publish profile, used for `cargo build --publish`.
[profile.publish]
# (...) everything else the same as profile.release except:
lto = true        # Enable full link-time optimization.
codegen-units = 1 # Use only 1 codegen-unit to enable full optimizations.

Because I feel like there should be a distinction between release builds the developer compiles on their local machine during development (not debug builds, but "fast" release builds) and truly publishable builds (like, for example the version of Firefox that is released for public consumption) in which case sacrificing build time once is more acceptable.

I realize the status-quo for C/C++ is to also not enable LTO by default, but it just seems strange to me to have to opt into these kinds of performance enhancements when the cost (for published binaries) is a one-time compile time cost.

@HadrienG2
Copy link

HadrienG2 commented Dec 15, 2018

I think "publish" is uncomfortably close to "release". But I could get behind a "debug/optimize/release" terminology proposal.

@forrestthewoods
Copy link

Historically I've used debug, internal, release, retail.

Plus a few variations with "add-ons" such as "Retail-Logging" or "Retail-Instrumented".

For Rust instead of 'Retail' I'd propose MaxSpeed. Whatever it's called, a profile with lto=true and codegen-units=1 is definitely a good idea!

@brson
Copy link
Contributor

brson commented Jan 31, 2019

@johnthagen I agree that today's 'release' profile seems to have two use cases that want different configurations. Is it possible to create custom cargo profiles? Is there an upstream cargo issue for this?

@johnthagen
Copy link
Contributor

johnthagen commented Jan 31, 2019

@brson It looks like it's not yet implemented, but it is has been discussed for several years.

Perhaps @matklad has some more up-to-date information on this?

@matklad
Copy link
Member

matklad commented Jan 31, 2019

My understanding is that "custom profiles" are pretty far-away at this moment (we need to do profile overrides first), however we do have config profiles nightly features, which allows overriding profile via .cargo/config. This might be used, for example, to specify codegen-units=1 on the build-server which produces release artifacts.

@brson
Copy link
Contributor

brson commented Feb 10, 2019

Thanks @johnthagen @matklad for the leads!

@pnkfelix
Copy link
Member

Visiting for T-compiler backlog bonanza, since it was tagged as C-tracking-issue (perhaps erroneously)

  • I think at this point the disparity between codegen-units + ThinLTO vs codegen-units=1 is, to some degree, something that we are accepting as a "fact of life"
  • We do have a problem in that people are surprising that --release does not produce the most optimal code possible while benchmarking. But, again, assuming that the aforementioned disparity is "fact of life", we then would have to address the "--release surprise" via other means; we cannot make --release imply codegen-units=1 without severely regressing compilation performance for many users.
  • It would probably be good if we had benchmark data that tracking the disparity between codegen-units+ThinLTO vs codegen-units=1, just so we have some idea of how big the problem is, and whether it is getting better or worse
  • It would also be good to have official documentation on how to tune your settings for "best object performance" vs "usable compilation times"

@pnkfelix
Copy link
Member

@rustbot label: -C-tracking-issue

@rustbot rustbot removed the C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. label May 27, 2022
@pnkfelix
Copy link
Member

pnkfelix commented May 27, 2022

Also, given that the original point of the issue was to determine why ThinLTO didn't seem to do a good enough job, that seems like a question that is well-suited for wg-llvm.

@rustbot label: A-llvm

@rustbot rustbot added the A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. label May 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests