Codegen significantly worse when using u128 rather than two u64 #123627
Labels
A-codegen (Area: Code generation)
A-LLVM (Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.)
C-bug (Category: This is a bug.)
I-slow (Issue: Problems and improvements with respect to performance of generated code.)
S-has-mcve (Status: A Minimal Complete and Verifiable Example has been found for this issue)
T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue.)
I tried this code:
This replaces the original code which was emulating the same operation using two u64:
I expected to see this happen: no conditional code should be generated, resulting in the same performance as the u64 code.
Instead, this happened: the codegen is significantly worse. For a BC7 decoding crate, blocks are decoded roughly 20 ns slower when using u128 instead of the existing high and low u64 (163.48 ns per block → 145.83 ns per block, -10.749% including randomness overhead). The same happens with an ASTC decoding crate. This is a follow-up to #122252.
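For readers without the snippets at hand, here is a minimal sketch of the kind of comparison described above. It is not the code from this report: the type names `Bits128` and `BitsPair`, the `read` method, and the restriction of `n` to 1..=63 are all assumptions made for illustration. It only shows the same "read the low n bits, then shift them out" operation written once against a single u128 and once against a high and low u64, so the two can be pasted into a compiler explorer and their assembly compared.

```rust
/// Hypothetical bit reader over a 128-bit block held in a single u128.
/// (Illustrative only; not the code from the decoding crates.)
pub struct Bits128(pub u128);

impl Bits128 {
    /// Returns the low `n` bits and shifts them out of the block.
    /// Assumes 1 <= n <= 63.
    pub fn read(&mut self, n: u32) -> u64 {
        debug_assert!(n >= 1 && n < 64);
        let value = (self.0 as u64) & ((1u64 << n) - 1);
        self.0 >>= n;
        value
    }
}

/// The same block held as two u64 halves (hypothetical equivalent).
pub struct BitsPair {
    pub low: u64,
    pub high: u64,
}

impl BitsPair {
    /// Returns the low `n` bits and shifts them out of the block.
    /// Assumes 1 <= n <= 63, so both `n` and `64 - n` are valid
    /// shift amounts for a u64 and no branch is required.
    pub fn read(&mut self, n: u32) -> u64 {
        debug_assert!(n >= 1 && n < 64);
        let value = self.low & ((1u64 << n) - 1);
        // Bits leaving the high half enter the top of the low half.
        self.low = (self.low >> n) | (self.high << (64 - n));
        self.high >>= n;
        value
    }
}
```

Both readers return the same values and end in the same state for any sequence of reads within that range, so any difference in the generated code comes from how the shift is lowered rather than from the logic itself.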
Meta
rustc --version --verbose