Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More instructions generated for Ord::clamp than manual max(X).min(Y) for saturating truncating cast from i32 to u8 #125738

Open
okaneco opened this issue May 29, 2024 · 1 comment
Labels
A-autovectorization Area: Autovectorization, which can impact perf or code size A-codegen Area: Code generation C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such S-has-mcve Status: A Minimal Complete and Verifiable Example has been found for this issue T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@okaneco
Copy link
Contributor

okaneco commented May 29, 2024

Originally reported here
rust-lang/rust-clippy#12826
Related PR spawned from that issue
#125455

cc @blyxyas

Clamping and casting from i32 to u8, using clamp(0, 255) as u8 produces unnecessary instructions compared to .max(0).min(255) as u8 when a loop is autovectorized.

clippy's manual_clamp lint in the beta toolchain warns on this pattern to use clamp instead which can regress performance.

Minimal example

#[inline(never)]
pub fn manual_clamp(input: &[i32], output: &mut [u8]) {
    for (&i, o) in input.iter().zip(output.iter_mut()) {
        *o = i.max(0).min(255) as u8;
    }
}

#[inline(never)]
pub fn clamp(input: &[i32], output: &mut [u8]) {
    for (&i, o) in input.iter().zip(output.iter_mut()) {
        *o = i.clamp(0, 255) as u8;
    }
}

https://rust.godbolt.org/z/zf73jsqjq

Manual clamp
.LBB0_4:
        movdqu  xmm0, xmmword ptr [rdi + 4*r8]
        packssdw        xmm0, xmm0
        packuswb        xmm0, xmm0
        movdqu  xmm1, xmmword ptr [rdi + 4*r8 + 16]
        packssdw        xmm1, xmm1
        packuswb        xmm1, xmm1
        movd    dword ptr [rdx + r8], xmm0
        movd    dword ptr [rdx + r8 + 4], xmm1
        add     r8, 8
        cmp     rsi, r8
        jne     .LBB0_4
`Ord::clamp`
.LBB0_4:
        movdqu  xmm6, xmmword ptr [rdi + 4*r8]
        movdqu  xmm5, xmmword ptr [rdi + 4*r8 + 16]
        pxor    xmm3, xmm3
        pcmpgtd xmm3, xmm6
        packssdw        xmm3, xmm3
        packsswb        xmm3, xmm3
        pxor    xmm4, xmm4
        pcmpgtd xmm4, xmm5
        packssdw        xmm4, xmm4
        packsswb        xmm4, xmm4
        movdqa  xmm7, xmm6
        pxor    xmm7, xmm0
        movdqa  xmm8, xmm1
        pcmpgtd xmm8, xmm7
        pand    xmm6, xmm8
        pandn   xmm8, xmm2
        por     xmm8, xmm6
        packuswb        xmm8, xmm8
        packuswb        xmm8, xmm8
        pandn   xmm3, xmm8
        movdqa  xmm6, xmm5
        pxor    xmm6, xmm0
        movdqa  xmm7, xmm1
        pcmpgtd xmm7, xmm6
        pand    xmm5, xmm7
        pandn   xmm7, xmm2
        por     xmm7, xmm5
        packuswb        xmm7, xmm7
        packuswb        xmm7, xmm7
        pandn   xmm4, xmm7
        movd    dword ptr [rdx + r8], xmm3
        movd    dword ptr [rdx + r8 + 4], xmm4
        add     r8, 8
        cmp     rsi, r8
        jne     .LBB0_4

Real code examples from functions in the image-webp crate
https://rust.godbolt.org/z/3rnY8d94v
https://rust.godbolt.org/z/53T7n9PGx

@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label May 29, 2024
@okaneco
Copy link
Contributor Author

okaneco commented May 31, 2024

@rustbot label A-autovectorization C-optimization

@rustbot rustbot added A-autovectorization Area: Autovectorization, which can impact perf or code size C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such labels May 31, 2024
@jieyouxu jieyouxu added A-codegen Area: Code generation T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. S-has-mcve Status: A Minimal Complete and Verifiable Example has been found for this issue and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Jun 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-autovectorization Area: Autovectorization, which can impact perf or code size A-codegen Area: Code generation C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such S-has-mcve Status: A Minimal Complete and Verifiable Example has been found for this issue T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

3 participants