Codegen significantly worse when using u128 rather than two u64 #123627

Open · linkmauve opened this issue Apr 8, 2024 · 3 comments
Labels
A-codegen (Area: Code generation)
A-LLVM (Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.)
C-bug (Category: This is a bug.)
I-slow (Issue: Problems and improvements with respect to performance of generated code.)
S-has-mcve (Status: A Minimal Complete and Verifiable Example has been found for this issue.)
T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue.)

Comments

linkmauve (Contributor) commented Apr 8, 2024

I tried this code:

fn read_bits(data: &mut u128, n_bits: u32) -> u32 {
    let ret = *data & ((1 << n_bits) - 1);
    *data >>= n_bits;
    ret as u32
}

This replaces the original code, which emulated the same operation using two u64 fields:

struct Bitstream {
    low: u64,
    high: u64,
}

impl Bitstream {
    fn read_bits(&mut self, num_bits: u32) -> u32 {
        let mask = (1 << num_bits) - 1;
        let bits = self.low & mask;
        self.low >>= num_bits;
        // Refill the vacated top bits of `low` from the bottom of `high`.
        self.low |= (self.high & mask) << (u64::BITS as u64 - num_bits as u64);
        self.high >>= num_bits;
        bits as u32
    }
}

I expected to see this happen: no conditional code should be generated, resulting in the same performance as the two-u64 code.

Instead, this happened: the codegen is significantly worse. For a BC7 decoding crate, blocks decode about 18 ns slower when using u128 instead of the existing high and low u64 (163.48 ns per block with u128 vs 145.83 ns per block with two u64; reverting to u64 measured as a -10.749% change, including randomness overhead). The same happens with an ASTC decoding crate. This is a follow-up of #122252.

Meta

rustc --version --verbose:

rustc 1.79.0-nightly (9d5cdf75a 2024-04-07)
joboet (Contributor) commented Apr 8, 2024

The operation is not actually the same. The behaviour of the u128 version differs from the custom version when n_bits is 64 or more: the u128 version correctly shifts bits from the upper 64 bits into the lower ones, whereas the custom version's u64 shift amounts overflow, producing some very weird behaviour.
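For illustration, a minimal sketch of that divergence, using hypothetical test values with n_bits = 65:

fn main() {
    // u128 version: a shift by 65 is well-defined on a 128-bit integer.
    let mut data: u128 = (7u128 << 64) | 1;
    let ret = data & ((1u128 << 65) - 1);
    data >>= 65;
    println!("ret = {ret:#x}, data = {data:#x}"); // the expected low 65 bits

    // Two-u64 version with num_bits = 65: `1u64 << 65`, `self.low >>= 65`,
    // and `u64::BITS as u64 - 65` all overflow/underflow, panicking in
    // debug builds and producing masked shift counts in release builds.
}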

linkmauve (Contributor, Author) commented:

This function is always inlined into functions where num_bits is far less than 64, usually at most 4, and codegen actually gets worse when I add assert!(num_bits < 64);. I would expect the compiler to be able to optimize based on the constant the function gets called with, since it is always known at compile time.
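The asserted variant presumably looks something like this (a sketch reconstructed from the comment; the real call sites are not shown in the issue):

fn read_bits(data: &mut u128, n_bits: u32) -> u32 {
    // n_bits is a small compile-time constant at every inlined call site,
    // yet spelling the bound out reportedly makes codegen worse, not better.
    assert!(n_bits < 64);
    let ret = *data & ((1 << n_bits) - 1);
    *data >>= n_bits;
    ret as u32
}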

tylanphear commented Apr 18, 2024

@linkmauve at least for obvious examples where n_bits is a compile-time constant, the compiler does appear to avoid cmov sequences when it can also prove n_bits < 64. Even when n_bits is not known at compile time, explicitly masking it with 63 also allows the compiler to avoid the cmov sequences (godbolt); see the sketch below.
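A sketch of that masking workaround (the function name is hypothetical, and the mask assumes callers never pass n_bits >= 64, so it does not change behavior):

fn read_bits_masked(data: &mut u128, n_bits: u32) -> u32 {
    // Bounding the shift amount to 0..=63 lets LLVM drop the branch/cmov
    // pair it otherwise emits to handle shifts of 64 or more on a u128.
    let n = n_bits & 63;
    let ret = *data & ((1u128 << n) - 1);
    *data >>= n;
    ret as u32
}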

The fact that assert!() doesn't help here is unfortunate (I even tried using unreachable_unchecked() to force the generation of an llvm.assume, but that also didn't help) -- most likely there is room for improvement in LLVM's lowering of shr on types with bitwidth > 64.
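One way that hint might have been written (a hypothetical sketch; the issue does not show the exact code that was tried):

fn read_bits_hinted(data: &mut u128, n_bits: u32) -> u32 {
    if n_bits >= 64 {
        // SAFETY: callers are required to pass n_bits < 64. This emits an
        // llvm.assume-style hint, but per the comment above it still does
        // not improve the lowering of the 128-bit shifts.
        unsafe { std::hint::unreachable_unchecked() }
    }
    let ret = *data & ((1 << n_bits) - 1);
    *data >>= n_bits;
    ret as u32
}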
