Moving function into a separate crate results in a more effective code for some reason #41894
Citing doener:
dotdash added the A-LLVM, A-optimization, O-x86_64 and I-slow labels May 11, 2017
samlh commented Jun 1, 2017
Possible LLVM fix: https://reviews.llvm.org/rL303333
Mark-Simulacrum removed the A-optimization label Jun 23, 2017
Mark-Simulacrum added the C-enhancement label Jul 26, 2017
The issue is still reproducible on Nightly 2018-10-28.
newpavlov commented May 10, 2017 (edited)
While working on optimizations for crypto-hashes I've noticed the very strange behaviour described in the title. I've isolated the relevant code into this repository, so you can run it yourself.
Enabling LTO produces the same slow result for the separate-crate case as for the in-crate one. Optimal code is generated only if #[inline] or #[inline(always)] is used for the compress function. Also, the generated assembly for the two cases is quite different despite the identical code.
Probably it's due to some mis-optimization which gets turned off when the function is in a different crate but available for inlining.
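To make the setup concrete, here is a minimal sketch of what marking the function for inlining looks like. The body of compress below is a hypothetical stand-in, not the code from the repository, and the signature is an assumption made purely for illustration. In the actual comparison the function would sit either in the benchmark crate itself or in a separate library crate. The #[inline] attribute is what makes a non-generic function's body available for inlining across crate boundaries.

```rust
// Hypothetical stand-in for a hash compression function. The real
// function and its signature are in the linked repository; this body
// only illustrates where the attribute goes.
#[inline] // alternatively #[inline(always)]
pub fn compress(state: &mut [u32; 4], block: &[u8; 16]) {
    // Mix each little-endian 32-bit word of the block into the state.
    for (s, chunk) in state.iter_mut().zip(block.chunks_exact(4)) {
        let w = u32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
        *s = s.rotate_left(5).wrapping_add(w);
    }
}

fn main() {
    let mut state = [0u32; 4];
    let block = [1u8; 16];
    compress(&mut state, &block);
    println!("{:?}", state);
}
```

To compare the two cases, the same function would be placed once in the calling crate and once in a dependency crate, with and without the attribute, and the release-mode assembly inspected for each.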
UPD: It was reported on Reddit that 64-bit ARM shows the same performance for both cases.