Moving a function into a separate crate results in more efficient code for some reason #41894

@newpavlov

While working on optimizations for crypto-hashes I've noticed the very strange behaviour described in the title. I've isolated the relevant code into this repository, so you can run it yourself.

Enabling LTO produces the same slow result for the separate-crate case as for the in-crate one. Optimal code is generated only if #[inline] or #[inline(always)] is used on the compress function. Also, the generated assembly for the two cases is quite different despite the identical source code.
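To make the setup concrete, here is a minimal sketch of what marking compress with #[inline] looks like. Everything in it is an assumption for illustration: the module `hash_impl` stands in for the separate crate, `digest_blocks` stands in for the calling code, and the body of `compress` is a placeholder mixing step, not the code from the repository and not a real hash function.

```rust
// Hypothetical stand-in for the separate crate. In the setup described above
// this module would be its own crate that the benchmark crate depends on.
pub mod hash_impl {
    /// Placeholder mixing step (NOT a real compression function).
    /// `#[inline]` is a hint that also makes the function body available
    /// to downstream crates, so the caller's crate can inline it.
    #[inline]
    pub fn compress(state: &mut [u32; 4], block: &[u8; 16]) {
        for (i, chunk) in block.chunks_exact(4).enumerate() {
            let word = u32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
            state[i] = state[i].wrapping_add(word).rotate_left(7) ^ state[(i + 1) % 4];
        }
    }
}

/// Hypothetical caller, standing in for the crate that runs the benchmark.
pub fn digest_blocks(blocks: &[[u8; 16]]) -> [u32; 4] {
    let mut state: [u32; 4] = [0x6745_2301, 0xefcd_ab89, 0x98ba_dcfe, 0x1032_5476];
    for block in blocks {
        hash_impl::compress(&mut state, block);
    }
    state
}
```

Swapping #[inline] for #[inline(always)], or removing the attribute, changes only the hint given to the compiler. The function's results stay the same, so the variants can be benchmarked against each other directly.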

It's probably due to some mis-optimization that gets turned off when the function is in a different crate but still available for inlining.

UPD: It was reported on Reddit that 64-bit ARM shows the same performance in both cases.

Metadata

    Labels

    A-LLVM - Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.
    C-enhancement - Category: An issue proposing an enhancement or a PR with one.
    I-slow - Issue: Problems and improvements with respect to performance of generated code.
    O-x86_64 - Target: x86-64 processors (like x86_64-*) (also known as amd64 and x64)
