Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[aarch64] Double vbsl intrinsics with literal masks are converted to and/orr #62642

Open
easyaspi314 opened this issue May 10, 2023 · 4 comments

Comments

@easyaspi314
Copy link

easyaspi314 commented May 10, 2023

In certain scenarios, passing the result of bsl/bit/bif to another bsl/bit/bif when using literal masks will result in expansion to and and orr. It seems to be adjusting the masks (as seen in the ARMv7-A equivalent) but it ends up being too aggressive.

Example code:

#include <arm_neon.h>

uint32x4_t foo(uint32x4_t a, uint32x4_t b, uint32x4_t c)
{
    uint32x4_t x = vbslq_u32(vdupq_n_u32(0xFF000000), a, b);
    uint32x4_t y = vbslq_u32(vdupq_n_u32(0xFFFF0000), x, c);
    return y;
}

Expected code: something like

foo:
    movi    v3.2d, #0xFF000000FF000000
    bsl     v3.16b, v0.16b, v1.16b
    movi    v0.2d, #0xFFFF0000FFFF0000
    bsl     v0.16b, v3.16b, v2.16b
    ret

Or based on the masked transformation it optimized it to:

foo:
    movi    v3.2d, #0xFF000000FF000000
    bsl     v3.16b, v0.16b, v1.16b
    movi    v0.2d, #0x0000FFFF0000FFFF
    bsl     v0.16b, v2.16b, v3.16b
    ret

Actual (clang 16.0.2 -O3)

foo:
    movi    v3.2d, #0xff000000ff000000
    movi    v4.2d, #0x00ff000000ff0000
    movi    v5.2d, #0x0000ffff0000ffff
    and     v0.16b, v0.16b, v3.16b
    and     v1.16b, v1.16b, v4.16b
    orr     v0.16b, v1.16b, v0.16b
    and     v1.16b, v2.16b, v5.16b
    orr     v0.16b, v0.16b, v1.16b
    ret

This only seems to affect AArch64 mode. ARMv7-a still emits bsl despite the mask transformation, but that seems to be because unlike aarch64 which is expanded to bitwise operations in LLVM, ARMv7 uses llvm.arm.neon.vbsl.v16i8. With the expanded LLVM IR ARMv7-a emits the same thing albeit slightly reordered.

@easyaspi314 easyaspi314 changed the title [aarch64] Double vbsl/vbit/vbif intrinsics with literal masks are converted to and/orr [aarch64] Double vbsl intrinsics with literal masks are converted to and/orr May 10, 2023
@llvmbot
Copy link
Collaborator

llvmbot commented May 10, 2023

@llvm/issue-subscribers-backend-aarch64

@v01dXYZ
Copy link

v01dXYZ commented May 11, 2023

Using the optimisation level 0 emits the bsl instruction. It seems the emitted LLVM IR doesn't use arm intrinsics but rather and, not, or LLVM IR instructions. bsl %dst, %src0, %src1 <=> %dst := (or (and %dst, %src0) (and (not %dst), %src1)) See https://github.com/llvm/llvm-project/blob/2713781b0cdc1af647048ec97d40101664673dee/clang/lib/CodeGen/CGBuiltin.cpp#LL11172C1-L11172C1

Which means this is an issue with the instruction lowering which doesn't want to combine (I've got to find where the DAG pattern is defined in the tbl gen sorry only found the cpp stuff for now) :

// (or (and a b) (and (not a) c)) => (bsl a b c)

Anyway, when using -O3 and looking at the generated llvm IR, the xor dst, src, -1 disappears. Maybe some constant folding passes destroy the expected pattern.

@easyaspi314
Copy link
Author

That would explain the last mask being inverted.

@pranavk
Copy link
Contributor

pranavk commented May 17, 2023

AArch64 instruction lowering can recognize or(and(a, mask), and(b, ~mask)) and successfully convert it into bitselect instructions. The problem here is that InstCombine (eg: this file https://github.com/llvm/llvm-project/blob/691927c904ede183461610387402f5c19dbb3de0/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp) changes the sequence and makes it unrecognizable to AArch64 backend.

I think the potential fix will go into InstCombine to prevent it from messing with the sequence.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants