@yichi170
Contributor

This patch extends VPTERNLOG pattern matching to handle cases where an outer NOT wraps a pure logic tree, such as `~(A | B | C)`. By recognizing these negated logic trees, the instruction selector can now emit a single VPTERNLOG instruction.
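
As a rough illustration of why a negated three-input tree fits in one instruction: the VPTERNLOG immediate is an 8-bit truth table of a three-input boolean function. The sketch below assumes the conventional truth-table columns 0xF0, 0xCC and 0xAA for the three inputs (the diff only shows the name `TernlogMagicA`; the constant names and the helper `norImm` here are made up for illustration and are not taken from the patch).

```cpp
#include <cstdint>

// Truth-table columns for inputs A, B and C (assumed conventional values).
constexpr std::uint8_t MagicA = 0xF0;
constexpr std::uint8_t MagicB = 0xCC;
constexpr std::uint8_t MagicC = 0xAA;

// Hypothetical helper: fold A | B | C on the columns, then complement the
// result and truncate it back to 8 bits to get the table for ~(A | B | C).
constexpr std::uint8_t norImm() {
  return static_cast<std::uint8_t>(~(MagicA | MagicB | MagicC));
}

// The OR of all three columns is 0xFE, so its 8-bit complement is 0x01.
static_assert((MagicA | MagicB | MagicC) == 0xFE, "OR of the three columns");
static_assert(norImm() == 0x01, "immediate for ~(A | B | C)");
```

Under these assumptions the whole `~(A | B | C)` tree collapses to the single immediate 0x01.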

The change preserves the existing match for patterns like `(x != C1) & (x != C2)`, which also have an outer XOR-with-all-ones. The patch conservatively peels the outer XOR-with-all-ones only when it directly wraps a logical operator (AND, OR, XOR, or ANDNP) that itself has a foldable logic operand.
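
The peeling condition can be sketched on a toy expression tree. This is a simplified model, not the SelectionDAG code: `Node`, `Opc`, `isFoldableLogic` and `peelOuterNot` are hypothetical names, NOT is modeled as a unary node instead of XOR with all-ones, and the one-use and bitcast checks from the real lambda are left out.

```cpp
// Toy opcodes; Not stands in for "XOR with an all-ones vector".
enum class Opc { Leaf, And, Or, Xor, AndNot, Not };

struct Node {
  Opc Op;
  const Node *LHS;
  const Node *RHS;
};

// A node counts as foldable logic if it is one of the four binary ops.
constexpr bool isFoldableLogic(const Node *N) {
  return N && (N->Op == Opc::And || N->Op == Opc::Or || N->Op == Opc::Xor ||
               N->Op == Opc::AndNot);
}

// Return the inner node only when the NOT directly wraps a logic op that has
// another logic op as an operand; otherwise return the node unchanged.
constexpr const Node *peelOuterNot(const Node *N) {
  return (N->Op == Opc::Not && isFoldableLogic(N->LHS) &&
          (isFoldableLogic(N->LHS->LHS) || isFoldableLogic(N->LHS->RHS)))
             ? N->LHS
             : N;
}

constexpr Node LeafA{Opc::Leaf, nullptr, nullptr};
constexpr Node LeafB{Opc::Leaf, nullptr, nullptr};
constexpr Node LeafC{Opc::Leaf, nullptr, nullptr};
constexpr Node AorB{Opc::Or, &LeafA, &LeafB};
constexpr Node Tree{Opc::Or, &AorB, &LeafC};         // (A | B) | C
constexpr Node NotTree{Opc::Not, &Tree, nullptr};    // ~((A | B) | C)
constexpr Node NotAorB{Opc::Not, &AorB, nullptr};    // ~(A | B)

static_assert(peelOuterNot(&NotTree) == &Tree, "three-input tree is peeled");
static_assert(peelOuterNot(&NotAorB) == &NotAorB, "two-leaf op is left alone");
```

In this sketch a NOT around a two-leaf operator is returned unchanged, which mirrors the conservative fallback of returning the original node.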

Resolves #163738

@llvmbot
Member

llvmbot commented Oct 23, 2025

@llvm/pr-subscribers-backend-x86

Author: Yi-Chi Lee (yichi170)

Full diff: https://github.com/llvm/llvm-project/pull/164863.diff

2 Files Affected:

  • (modified) llvm/lib/Target/X86/X86ISelDAGToDAG.cpp (+35-4)
  • (added) llvm/test/CodeGen/X86/issue163738.ll (+13)
diff --git a/llvm/lib/Target/X86/X86ISelDAGToDAG.cpp b/llvm/lib/Target/X86/X86ISelDAGToDAG.cpp
index 62073ec125e8f..06a691d73c749 100644
--- a/llvm/lib/Target/X86/X86ISelDAGToDAG.cpp
+++ b/llvm/lib/Target/X86/X86ISelDAGToDAG.cpp
@@ -4721,9 +4721,6 @@ bool X86DAGToDAGISel::tryVPTERNLOG(SDNode *N) {
   if (!(Subtarget->hasVLX() || NVT.is512BitVector()))
     return false;
 
-  SDValue N0 = N->getOperand(0);
-  SDValue N1 = N->getOperand(1);
-
   auto getFoldableLogicOp = [](SDValue Op) {
     // Peek through single use bitcast.
     if (Op.getOpcode() == ISD::BITCAST && Op.hasOneUse())
@@ -4740,6 +4737,37 @@ bool X86DAGToDAGISel::tryVPTERNLOG(SDNode *N) {
     return SDValue();
   };
 
+  // Identify and (optionally) peel an outer NOT that wraps a pure logic tree
+  auto tryPeelOuterNotWrappingLogic = [&](SDNode *Op) {
+    if (Op->getOpcode() == ISD::XOR && Op->hasOneUse() &&
+        ISD::isBuildVectorAllOnes(Op->getOperand(1).getNode())) {
+      SDNode *InnerN = Op->getOperand(0).getNode();
+
+      unsigned InnerOpc = InnerN->getOpcode();
+      if (InnerOpc != ISD::AND && InnerOpc != ISD::OR &&
+          InnerOpc != ISD::XOR && InnerOpc != X86ISD::ANDNP) {
+        return Op;
+      }
+
+      SDValue InnerN0 = InnerN->getOperand(0);
+      SDValue InnerN1 = InnerN->getOperand(1);
+      SDValue FoldableOp;
+      if (getFoldableLogicOp(InnerN1) || getFoldableLogicOp(InnerN0))
+        return InnerN;
+    }
+    return Op;
+  };
+
+  SDNode *OriN = N;
+  bool PeeledOuterNot = false;
+  N = tryPeelOuterNotWrappingLogic(N);
+  if (N != OriN)
+    PeeledOuterNot = true;
+    
+
+  SDValue N0 = N->getOperand(0);
+  SDValue N1 = N->getOperand(1);
+
   SDValue A, FoldableOp;
   if ((FoldableOp = getFoldableLogicOp(N1))) {
     A = N0;
@@ -4798,7 +4826,10 @@ bool X86DAGToDAGISel::tryVPTERNLOG(SDNode *N) {
   case ISD::XOR: Imm ^= TernlogMagicA; break;
   }
 
-  return matchVPTERNLOG(N, ParentA, ParentB, ParentC, A, B, C, Imm);
+  if (PeeledOuterNot)
+    Imm = ~Imm;
+
+  return matchVPTERNLOG(OriN, ParentA, ParentB, ParentC, A, B, C, Imm);
 }
 
 /// If the high bits of an 'and' operand are known zero, try setting the
diff --git a/llvm/test/CodeGen/X86/issue163738.ll b/llvm/test/CodeGen/X86/issue163738.ll
new file mode 100644
index 0000000000000..61fe043a970dd
--- /dev/null
+++ b/llvm/test/CodeGen/X86/issue163738.ll
@@ -0,0 +1,13 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 6
+; RUN: llc < %s -mtriple=x86_64-- -mattr=+avx512f,+avx512vl | FileCheck %s --check-prefixes=CHECK
+
+define <8 x i64> @foo(<8 x i64> %a, <8 x i64> %b, <8 x i64> %c) {
+; CHECK-LABEL: foo:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vpternlogq {{.*#+}} zmm0 = ~(zmm0 | zmm2 | zmm1)
+; CHECK-NEXT:    retq
+  %and.demorgan = or <8 x i64> %b, %a
+  %and3.demorgan = or <8 x i64> %and.demorgan, %c
+  %and3 = xor <8 x i64> %and3.demorgan, splat (i64 -1)
+  ret <8 x i64> %and3
+}

@github-actions

github-actions bot commented Oct 23, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

Comment on lines 4752 to 4755
    SDValue InnerN0 = InnerN->getOperand(0);
    SDValue InnerN1 = InnerN->getOperand(1);
    if (getFoldableLogicOp(InnerN1) || getFoldableLogicOp(InnerN0))
      return InnerN;
Contributor

This duplicates the logic at line 4766 and later. It would be better to see if we can reuse it.

Contributor Author

I'll think about how to better handle this

    SDNode *OriN = N;
    if (SDValue InnerOp = tryPeelOuterNotWrappingLogic(N)) {
      PeeledOuterNot = true;
      N = InnerOp.getNode();
Contributor

Suggested change
-   N = InnerOp.getNode();
+   ParentA = InnerOp.getNode();

Contributor

Unfortunately, we still need N in line 4813...

Contributor Author

If we peel off the NOT, the immediate (Imm) should be computed from the inner node (InnerOp), since the negation is applied afterward. Therefore, we should not change this line.
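
A small sanity check of the "negate afterward" argument, as a sketch: it assumes the immediate is the truth table of the function with bit index `(a << 2) | (b << 1) | c`, which is consistent with the 0xF0 column for the first input. `innerFn`, `tableBit` and `table` are hypothetical names, and the inner function is fixed to `a | b | c` for illustration.

```cpp
#include <cstdint>

// Inner (un-negated) logic tree, evaluated on single bits.
constexpr bool innerFn(bool a, bool b, bool c) { return a || b || c; }

// One bit of the truth table: entry idx holds f(a, b, c), optionally negated.
constexpr std::uint8_t tableBit(bool negate, unsigned idx) {
  return static_cast<std::uint8_t>(
      ((innerFn(idx & 4, idx & 2, idx & 1) != negate) ? 1u : 0u) << idx);
}

// Build the 8-entry table recursively (kept to single-return constexpr).
constexpr std::uint8_t table(bool negate, unsigned idx = 0) {
  return idx == 8 ? std::uint8_t{0}
                  : static_cast<std::uint8_t>(tableBit(negate, idx) |
                                              table(negate, idx + 1));
}

// Computing the immediate on the inner tree and complementing it afterward
// gives the same table as evaluating the negated function directly.
static_assert(table(false) == 0xFE, "table of a | b | c");
static_assert(table(true) == 0x01, "table of ~(a | b | c)");
static_assert(static_cast<std::uint8_t>(~table(false)) == table(true),
              "complementing the inner immediate matches the negated table");
```

So, at least in this model, building Imm from the inner node and flipping it at the end is equivalent to matching the negated tree directly.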

Contributor

@phoebewang phoebewang left a comment

LGTM.

@yichi170
Contributor Author

Thanks for the review! I don't have commit access. Can you please help me merge it?

@phoebewang phoebewang merged commit cc4f462 into llvm:main Oct 25, 2025
10 checks passed
dvbuka pushed a commit to dvbuka/llvm-project that referenced this pull request Oct 27, 2025
…164863)
Lukacma pushed a commit to Lukacma/llvm-project that referenced this pull request Oct 29, 2025
…164863)
aokblast pushed a commit to aokblast/llvm-project that referenced this pull request Oct 30, 2025
…164863)
Successfully merging this pull request may close these issues.

[X86] Failure to merge VPTERNLOG patterns