[aarch64] Double vbsl intrinsics with literal masks are converted to and/orr #62642

easyaspi314 · 2023-05-10T20:14:18Z

In certain scenarios, passing the result of bsl/bit/bif to another bsl/bit/bif when using literal masks will result in expansion to and and orr. It seems to be adjusting the masks (as seen in the ARMv7-A equivalent) but it ends up being too aggressive.

Example code:

#include <arm_neon.h>

uint32x4_t foo(uint32x4_t a, uint32x4_t b, uint32x4_t c)
{
    uint32x4_t x = vbslq_u32(vdupq_n_u32(0xFF000000), a, b);
    uint32x4_t y = vbslq_u32(vdupq_n_u32(0xFFFF0000), x, c);
    return y;
}

Expected code: something like

foo:
    movi    v3.2d, #0xFF000000FF000000
    bsl     v3.16b, v0.16b, v1.16b
    movi    v0.2d, #0xFFFF0000FFFF0000
    bsl     v0.16b, v3.16b, v2.16b
    ret

Or based on the masked transformation it optimized it to:

foo:
    movi    v3.2d, #0xFF000000FF000000
    bsl     v3.16b, v0.16b, v1.16b
    movi    v0.2d, #0x0000FFFF0000FFFF
    bsl     v0.16b, v2.16b, v3.16b
    ret

Actual (clang 16.0.2 -O3)

foo:
    movi    v3.2d, #0xff000000ff000000
    movi    v4.2d, #0x00ff000000ff0000
    movi    v5.2d, #0x0000ffff0000ffff
    and     v0.16b, v0.16b, v3.16b
    and     v1.16b, v1.16b, v4.16b
    orr     v0.16b, v1.16b, v0.16b
    and     v1.16b, v2.16b, v5.16b
    orr     v0.16b, v0.16b, v1.16b
    ret

This only seems to affect AArch64 mode. ARMv7-a still emits bsl despite the mask transformation, but that seems to be because unlike aarch64 which is expanded to bitwise operations in LLVM, ARMv7 uses llvm.arm.neon.vbsl.v16i8. With the expanded LLVM IR ARMv7-a emits the same thing albeit slightly reordered.

The text was updated successfully, but these errors were encountered:

llvmbot · 2023-05-10T21:26:25Z

@llvm/issue-subscribers-backend-aarch64

v01dXYZ · 2023-05-11T07:26:55Z

Using the optimisation level 0 emits the bsl instruction. It seems the emitted LLVM IR doesn't use arm intrinsics but rather and, not, or LLVM IR instructions. bsl %dst, %src0, %src1 <=> %dst := (or (and %dst, %src0) (and (not %dst), %src1)) See https://github.com/llvm/llvm-project/blob/2713781b0cdc1af647048ec97d40101664673dee/clang/lib/CodeGen/CGBuiltin.cpp#LL11172C1-L11172C1

Which means this is an issue with the instruction lowering which doesn't want to combine (I've got to find where the DAG pattern is defined in the tbl gen sorry only found the cpp stuff for now) :

llvm-project/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

Line 16554 in 2713781

// (or (and a b) (and (not a) c)) => (bsl a b c)

Anyway, when using -O3 and looking at the generated llvm IR, the xor dst, src, -1 disappears. Maybe some constant folding passes destroy the expected pattern.

easyaspi314 · 2023-05-11T15:04:06Z

That would explain the last mask being inverted.

pranavk · 2023-05-17T22:19:42Z

AArch64 instruction lowering can recognize or(and(a, mask), and(b, ~mask)) and successfully convert it into bitselect instructions. The problem here is that InstCombine (eg: this file https://github.com/llvm/llvm-project/blob/691927c904ede183461610387402f5c19dbb3de0/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp) changes the sequence and makes it unrecognizable to AArch64 backend.

I think the potential fix will go into InstCombine to prevent it from messing with the sequence.

github-actions bot added the new issue label May 10, 2023

easyaspi314 changed the title ~~[aarch64] Double vbsl/vbit/vbif intrinsics with literal masks are converted to and/orr~~ [aarch64] Double vbsl intrinsics with literal masks are converted to and/orr May 10, 2023

EugeneZelenko added backend:AArch64 and removed new issue labels May 10, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[aarch64] Double vbsl intrinsics with literal masks are converted to and/orr #62642

[aarch64] Double vbsl intrinsics with literal masks are converted to and/orr #62642

easyaspi314 commented May 10, 2023 •

edited

llvmbot commented May 10, 2023

v01dXYZ commented May 11, 2023 •

edited

easyaspi314 commented May 11, 2023

pranavk commented May 17, 2023

[aarch64] Double vbsl intrinsics with literal masks are converted to and/orr #62642

[aarch64] Double vbsl intrinsics with literal masks are converted to and/orr #62642

Comments

easyaspi314 commented May 10, 2023 • edited

llvmbot commented May 10, 2023

v01dXYZ commented May 11, 2023 • edited

easyaspi314 commented May 11, 2023

pranavk commented May 17, 2023

easyaspi314 commented May 10, 2023 •

edited

v01dXYZ commented May 11, 2023 •

edited