[X86] combinePTESTCC - fold PTESTZ(X,SIGNMASK) -> VTESTPD/PSZ(X,X) on AVX targets #165676

RKSimon · 2025-10-30T08:25:02Z

If the PTEST only uses the ZF result and one of the operands is an i32/i64 sign mask, we can use the TESTPD/PS instructions instead and avoid materializing an extra constant.

Fixes some codegen identified in #156233
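The fold relies on how ZF is defined for each instruction: VPTEST sets ZF when `X & Mask` is entirely zero, while VTESTPD/VTESTPS set ZF when only the 64-bit/32-bit sign bits of `X & Y` are all zero. With `Mask` being a per-lane sign mask, `PTESTZ(X, SIGNMASK)` and `VTESTPD/PSZ(X, X)` therefore answer the same question. Below is a minimal bit-level sketch of that equivalence for a 256-bit register, assuming the Intel SDM's ZF definitions and ignoring CF (which the fold does not use). The names `Ymm`, `ptestZF`, `vtestpdZF`, `vtestpsZF` and `foldAgreesOnRandomInputs` are hypothetical and not LLVM APIs.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <random>

// Bit-level model of a 256-bit YMM register as four 64-bit lanes
// (lane 0 holds the least significant bits, as on x86).
using Ymm = std::array<uint64_t, 4>;

// OR together the lanes of (A & B) so one 64-bit word summarises the AND.
static uint64_t andAcrossLanes(const Ymm &A, const Ymm &B) {
  uint64_t Acc = 0;
  for (int I = 0; I < 4; ++I)
    Acc |= A[I] & B[I];
  return Acc;
}

// VPTEST: ZF is set when (A & B) is entirely zero.
bool ptestZF(const Ymm &A, const Ymm &B) { return andAcrossLanes(A, B) == 0; }

// VTESTPD: ZF is set when every 64-bit sign bit of (A & B) is zero.
bool vtestpdZF(const Ymm &A, const Ymm &B) {
  return (andAcrossLanes(A, B) & 0x8000000000000000ULL) == 0;
}

// VTESTPS: ZF is set when every 32-bit sign bit of (A & B) is zero; within
// each 64-bit lane those are bits 31 and 63.
bool vtestpsZF(const Ymm &A, const Ymm &B) {
  return (andAcrossLanes(A, B) & 0x8000000080000000ULL) == 0;
}

// Compare PTESTZ(X, SIGNMASK) against VTESTPD/PSZ(X, X) on random inputs.
bool foldAgreesOnRandomInputs(uint64_t Seed, int Iterations) {
  const Ymm SignMask64 = {0x8000000000000000ULL, 0x8000000000000000ULL,
                          0x8000000000000000ULL, 0x8000000000000000ULL};
  const Ymm SignMask32 = {0x8000000080000000ULL, 0x8000000080000000ULL,
                          0x8000000080000000ULL, 0x8000000080000000ULL};
  std::mt19937_64 Rng(Seed);
  for (int It = 0; It < Iterations; ++It) {
    Ymm X;
    for (uint64_t &Lane : X) {
      Lane = Rng();
      // Clear the sign bits of roughly half the lanes so ZF=1 also occurs.
      if (Rng() & 1)
        Lane &= 0x7FFFFFFF7FFFFFFFULL;
    }
    if (ptestZF(X, SignMask64) != vtestpdZF(X, X) ||
        ptestZF(X, SignMask32) != vtestpsZF(X, X))
      return false;
  }
  return true;
}
```

Note that a set bit 31 in a lane is a sign bit for VTESTPS but not for VTESTPD, which is why the patch must pick the PS or PD form based on the width of the sign mask.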

llvmbot · 2025-10-30T08:25:35Z

@llvm/pr-subscribers-backend-x86

Author: Simon Pilgrim (RKSimon)

Full diff: https://github.com/llvm/llvm-project/pull/165676.diff

2 Files Affected:

(modified) llvm/lib/Target/X86/X86ISelLowering.cpp (+20)
(modified) llvm/test/CodeGen/X86/vector-reduce-or-cmp.ll (+18-66)

diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp
index 624cff24ddf03..60ee2291d47e5 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -48859,6 +48859,26 @@ static SDValue combinePTESTCC(SDValue EFLAGS, X86::CondCode &CC,
     if (ISD::isBuildVectorAllOnes(Op1.getNode()))
       return DAG.getNode(EFLAGS.getOpcode(), SDLoc(EFLAGS), VT, Op0, Op0);
 
+    // Attempt to convert PTESTZ(X,SIGNMASK) -> VTESTPD/PSZ(X,X) on AVX targets.
+    if (EFLAGS.getOpcode() == X86ISD::PTEST && Subtarget.hasAVX()) {
+      KnownBits KnownOp1 = DAG.computeKnownBits(Op1);
+      assert(KnownOp1.getBitWidth() == 64 &&
+             "Illegal PTEST vector element width");
+      if (KnownOp1.isConstant()) {
+        const APInt &Mask = KnownOp1.getConstant();
+        if (Mask.isSignMask()) {
+          MVT FpVT = MVT::getVectorVT(MVT::f64, OpVT.getSizeInBits() / 64);
+          Op0 = DAG.getBitcast(FpVT, Op0);
+          return DAG.getNode(X86ISD::TESTP, SDLoc(EFLAGS), VT, Op0, Op0);
+        }
+        if (Mask.isSplat(32) && Mask.trunc(32).isSignMask()) {
+          MVT FpVT = MVT::getVectorVT(MVT::f32, OpVT.getSizeInBits() / 32);
+          Op0 = DAG.getBitcast(FpVT, Op0);
+          return DAG.getNode(X86ISD::TESTP, SDLoc(EFLAGS), VT, Op0, Op0);
+        }
+      }
+    }
+
     // TESTZ(OR(LO(X),HI(X)),OR(LO(Y),HI(Y))) -> TESTZ(X,Y)
     // TODO: Add COND_NE handling?
     if (CC == X86::COND_E && OpVT.is128BitVector() && Subtarget.hasAVX()) {
diff --git a/llvm/test/CodeGen/X86/vector-reduce-or-cmp.ll b/llvm/test/CodeGen/X86/vector-reduce-or-cmp.ll
index 9816fa7c83560..044327d94c0ef 100644
--- a/llvm/test/CodeGen/X86/vector-reduce-or-cmp.ll
+++ b/llvm/test/CodeGen/X86/vector-reduce-or-cmp.ll
@@ -875,28 +875,12 @@ define i1 @mask_v8i32(<8 x i32> %a0) {
 ; SSE41-NEXT:    sete %al
 ; SSE41-NEXT:    retq
 ;
-; AVX1-LABEL: mask_v8i32:
-; AVX1:       # %bb.0:
-; AVX1-NEXT:    vptest {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0
-; AVX1-NEXT:    sete %al
-; AVX1-NEXT:    vzeroupper
-; AVX1-NEXT:    retq
-;
-; AVX2-LABEL: mask_v8i32:
-; AVX2:       # %bb.0:
-; AVX2-NEXT:    vpbroadcastq {{.*#+}} ymm1 = [9223372039002259456,9223372039002259456,9223372039002259456,9223372039002259456]
-; AVX2-NEXT:    vptest %ymm1, %ymm0
-; AVX2-NEXT:    sete %al
-; AVX2-NEXT:    vzeroupper
-; AVX2-NEXT:    retq
-;
-; AVX512-LABEL: mask_v8i32:
-; AVX512:       # %bb.0:
-; AVX512-NEXT:    vpbroadcastq {{.*#+}} ymm1 = [9223372039002259456,9223372039002259456,9223372039002259456,9223372039002259456]
-; AVX512-NEXT:    vptest %ymm1, %ymm0
-; AVX512-NEXT:    sete %al
-; AVX512-NEXT:    vzeroupper
-; AVX512-NEXT:    retq
+; AVX-LABEL: mask_v8i32:
+; AVX:       # %bb.0:
+; AVX-NEXT:    vtestps %ymm0, %ymm0
+; AVX-NEXT:    sete %al
+; AVX-NEXT:    vzeroupper
+; AVX-NEXT:    retq
   %1 = call i32 @llvm.vector.reduce.or.v8i32(<8 x i32> %a0)
   %2 = and i32 %1, 2147483648
   %3 = icmp eq i32 %2, 0
@@ -965,28 +949,12 @@ define i1 @signtest_v8i32(<8 x i32> %a0) {
 ; SSE41-NEXT:    sete %al
 ; SSE41-NEXT:    retq
 ;
-; AVX1-LABEL: signtest_v8i32:
-; AVX1:       # %bb.0:
-; AVX1-NEXT:    vptest {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0
-; AVX1-NEXT:    sete %al
-; AVX1-NEXT:    vzeroupper
-; AVX1-NEXT:    retq
-;
-; AVX2-LABEL: signtest_v8i32:
-; AVX2:       # %bb.0:
-; AVX2-NEXT:    vpbroadcastq {{.*#+}} ymm1 = [9223372039002259456,9223372039002259456,9223372039002259456,9223372039002259456]
-; AVX2-NEXT:    vptest %ymm1, %ymm0
-; AVX2-NEXT:    sete %al
-; AVX2-NEXT:    vzeroupper
-; AVX2-NEXT:    retq
-;
-; AVX512-LABEL: signtest_v8i32:
-; AVX512:       # %bb.0:
-; AVX512-NEXT:    vpbroadcastq {{.*#+}} ymm1 = [9223372039002259456,9223372039002259456,9223372039002259456,9223372039002259456]
-; AVX512-NEXT:    vptest %ymm1, %ymm0
-; AVX512-NEXT:    sete %al
-; AVX512-NEXT:    vzeroupper
-; AVX512-NEXT:    retq
+; AVX-LABEL: signtest_v8i32:
+; AVX:       # %bb.0:
+; AVX-NEXT:    vtestps %ymm0, %ymm0
+; AVX-NEXT:    sete %al
+; AVX-NEXT:    vzeroupper
+; AVX-NEXT:    retq
   %1 = call i32 @llvm.vector.reduce.or.v8i32(<8 x i32> %a0)
   %2 = icmp sgt i32 %1, -1
   ret i1 %2
@@ -1010,28 +978,12 @@ define i1 @signtest_v4i64(<4 x i64> %a0) {
 ; SSE41-NEXT:    sete %al
 ; SSE41-NEXT:    retq
 ;
-; AVX1-LABEL: signtest_v4i64:
-; AVX1:       # %bb.0:
-; AVX1-NEXT:    vptest {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0
-; AVX1-NEXT:    sete %al
-; AVX1-NEXT:    vzeroupper
-; AVX1-NEXT:    retq
-;
-; AVX2-LABEL: signtest_v4i64:
-; AVX2:       # %bb.0:
-; AVX2-NEXT:    vpbroadcastq {{.*#+}} ymm1 = [9223372036854775808,9223372036854775808,9223372036854775808,9223372036854775808]
-; AVX2-NEXT:    vptest %ymm1, %ymm0
-; AVX2-NEXT:    sete %al
-; AVX2-NEXT:    vzeroupper
-; AVX2-NEXT:    retq
-;
-; AVX512-LABEL: signtest_v4i64:
-; AVX512:       # %bb.0:
-; AVX512-NEXT:    vpbroadcastq {{.*#+}} ymm1 = [9223372036854775808,9223372036854775808,9223372036854775808,9223372036854775808]
-; AVX512-NEXT:    vptest %ymm1, %ymm0
-; AVX512-NEXT:    sete %al
-; AVX512-NEXT:    vzeroupper
-; AVX512-NEXT:    retq
+; AVX-LABEL: signtest_v4i64:
+; AVX:       # %bb.0:
+; AVX-NEXT:    vtestpd %ymm0, %ymm0
+; AVX-NEXT:    sete %al
+; AVX-NEXT:    vzeroupper
+; AVX-NEXT:    retq
   %1 = call i64 @llvm.vector.reduce.or.v4i64(<4 x i64> %a0)
   %2 = icmp sgt i64 %1, -1
   ret i1 %2
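All three updated tests reduce to the same question: does any lane of `%a0` have its sign bit set? `sete` then reads the ZF produced by `vtestps`/`vtestpd %ymm0, %ymm0`. The sketch below gives scalar models of what each IR function computes, next to the ZF that the new self-test leaves behind. The helper names are hypothetical and are not part of the test file.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// mask_v8i32: (reduce.or(a0) & 0x80000000) == 0
bool maskV8i32(const std::array<uint32_t, 8> &A) {
  uint32_t Or = 0;
  for (uint32_t V : A)
    Or |= V;
  return (Or & 0x80000000u) == 0;
}

// signtest_v8i32: reduce.or(a0) > -1 as a signed i32 compare
bool signtestV8i32(const std::array<uint32_t, 8> &A) {
  uint32_t Or = 0;
  for (uint32_t V : A)
    Or |= V;
  return static_cast<int32_t>(Or) > -1;
}

// signtest_v4i64: reduce.or(a0) > -1 as a signed i64 compare
bool signtestV4i64(const std::array<uint64_t, 4> &A) {
  uint64_t Or = 0;
  for (uint64_t V : A)
    Or |= V;
  return static_cast<int64_t>(Or) > -1;
}

// ZF after `vtestps %ymm0, %ymm0`: no 32-bit lane has its sign bit set.
bool vtestpsSelfZF(const std::array<uint32_t, 8> &A) {
  for (uint32_t V : A)
    if (V & 0x80000000u)
      return false;
  return true;
}

// ZF after `vtestpd %ymm0, %ymm0`: no 64-bit lane has its sign bit set.
bool vtestpdSelfZF(const std::array<uint64_t, 4> &A) {
  for (uint64_t V : A)
    if (V >> 63)
      return false;
  return true;
}
```

The OR reduction has its sign bit set exactly when some lane does, so each IR function agrees with the corresponding TESTP self-test on every input.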

phoebewang · 2025-10-30T09:01:08Z

llvm/lib/Target/X86/X86ISelLowering.cpp

    if (ISD::isBuildVectorAllOnes(Op1.getNode()))
      return DAG.getNode(EFLAGS.getOpcode(), SDLoc(EFLAGS), VT, Op0, Op0);

+    // Attempt to convert PTESTZ(X,SIGNMASK) -> VTESTPD/PSZ(X,X) on AVX targets.


Do we need to freeze X?

Nice catch! Yes, we should - all the similar folds in combinePTESTCC seem to be missing that.

phoebewang · 2025-10-30T09:08:02Z

llvm/lib/Target/X86/X86ISelLowering.cpp

+    // Attempt to convert PTESTZ(X,SIGNMASK) -> VTESTPD/PSZ(X,X) on AVX targets.
+    if (EFLAGS.getOpcode() == X86ISD::PTEST && Subtarget.hasAVX()) {
+      KnownBits KnownOp1 = DAG.computeKnownBits(Op1);
+      assert(KnownOp1.getBitWidth() == 64 &&


The SDM doesn't define an element width, while X86ptest defines it as an i32 vector. Why do we check for 64 here?

Isn't the i32 the (scalar) result type (for X86SETCC handling)? The isel patterns all use v2i64/v4i64 vectors for the input operands. But I agree: technically PTEST shouldn't care about the vector element type, but there's never been any need to make it work generically.

Yeah, I took it as an element type 🤦‍♀️
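Because the PTEST input operands are always v2i64/v4i64, `computeKnownBits(Op1)` reports 64-bit elements, so a v8i32 sign-mask splat shows up as the 64-bit constant `0x8000000080000000` (9223372039002259456, the value the old AVX2/AVX512 checks broadcast). The sketch below mirrors the patch's two checks, `Mask.isSignMask()` and `Mask.isSplat(32) && Mask.trunc(32).isSignMask()`, using a plain `uint64_t` instead of `APInt`. `TestPForm` and `classifyPTestMask` are hypothetical names for illustration only.

```cpp
#include <cassert>
#include <cstdint>

// Which TESTP form a 64-bit PTEST mask element permits (hypothetical enum).
enum class TestPForm { None, PD, PS };

// Plain-integer analogue of the two APInt checks in the patch:
//   Mask.isSignMask()                                -> VTESTPD
//   Mask.isSplat(32) && Mask.trunc(32).isSignMask()  -> VTESTPS
TestPForm classifyPTestMask(uint64_t Mask) {
  if (Mask == 0x8000000000000000ULL) // only bit 63 set
    return TestPForm::PD;
  uint32_t Lo = static_cast<uint32_t>(Mask);
  uint32_t Hi = static_cast<uint32_t>(Mask >> 32);
  if (Lo == Hi && Lo == 0x80000000u) // both 32-bit halves are 0x80000000
    return TestPForm::PS;
  return TestPForm::None;
}
```

Any other constant (for example a mask with extra low bits set) keeps the original VPTEST, since TESTP only inspects sign bits.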

As noticed on llvm#165676 - if we're increasing the use of an operand we should freeze it

phoebewang

LGTM.

RKSimon requested review from abhishek-kaushik22 and phoebewang October 30, 2025 08:25

llvmbot added the backend:X86 label Oct 30, 2025

phoebewang reviewed Oct 30, 2025

Freeze duplicated operand

547cfae

RKSimon added a commit to RKSimon/llvm-project that referenced this pull request Oct 30, 2025

[X86] combinePTESTCC - ensure repeated operands are frozen

59a52a7

As noticed on llvm#165676 - if we're increasing the use of an operand we should freeze it

RKSimon mentioned this pull request Oct 30, 2025

[X86] combinePTESTCC - ensure repeated operands are frozen #165697

Merged

phoebewang approved these changes Oct 30, 2025

RKSimon added a commit that referenced this pull request Oct 30, 2025

[X86] combinePTESTCC - ensure repeated operands are frozen (#165697)

8c8bead

Merge branch 'main' into x86-ptest-signmask-testp

5ab72dc

RKSimon enabled auto-merge (squash) October 30, 2025 12:39

RKSimon merged commit da709f5 into llvm:main Oct 30, 2025
9 of 10 checks passed

RKSimon deleted the x86-ptest-signmask-testp branch October 30, 2025 13:17

aokblast pushed a commit to aokblast/llvm-project that referenced this pull request Oct 30, 2025

[X86] combinePTESTCC - ensure repeated operands are frozen (llvm#165697)

248fe96

luciechoi pushed a commit to luciechoi/llvm-project that referenced this pull request Nov 1, 2025

[X86] combinePTESTCC - ensure repeated operands are frozen (llvm#165697)

0cff0e6
