Closed
Labels
A-LLVM: Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.
C-enhancement: Category: An issue proposing an enhancement or a PR with one.
I-slow: Issue: Problems and improvements with respect to performance of generated code.
O-x86_64: Target: x86-64 processors (like x86_64-*) (also known as amd64 and x64)
Description
While working on optimizations for cryptographic hashes, I've noticed the very strange behaviour described in the title. I've isolated the relevant code into this repository, so you can run it yourself.
Enabling LTO makes the separate-crate case just as slow as the in-crate one. Optimal code is generated only if `#[inline]` or `#[inline(always)]` is applied to the `compress` function. The generated assembly for the two cases is also quite different, despite the code being identical.
Probably this is due to some mis-optimization that gets turned off when the function is in a different crate but is still available for inlining.
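For readers unfamiliar with the workaround, here is a minimal sketch of where the attribute goes. The body of `compress` below is a hypothetical toy mixing loop standing in for the real hash compression function from the linked repository (which is not reproduced here); only the placement of `#[inline]` reflects the report. In the reported setup, `compress` lives in a separate library crate, and LTO would be enabled with `lto = true` under `[profile.release]` in `Cargo.toml`.

```rust
/// Hypothetical stand-in for the hash's `compress` function.
/// Per the report, marking the real function `#[inline]` (or
/// `#[inline(always)]`) is what produced the fast code.
#[inline]
pub fn compress(state: &mut [u32; 8], block: &[u8; 64]) {
    // Split the 64-byte block into sixteen little-endian 32-bit words.
    for (i, chunk) in block.chunks_exact(4).enumerate() {
        let w = u32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
        // Toy mixing step (not a real hash round): each state word
        // absorbs two input words.
        let s = &mut state[i % 8];
        *s = s.wrapping_add(w).rotate_left(7) ^ (i as u32);
    }
}
```

Calling it is the same whether the function lives in the same crate or in a dependency, e.g. `compress(&mut state, &block)` with `state: [u32; 8]` and `block: [u8; 64]`; the attribute only changes whether the compiler may inline the body at the call site.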
UPD: It was reported on Reddit that 64-bit ARM shows the same performance in both cases.