[X86][ISel] Improve VPTERNLOG matching for negated logic trees #164863
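@llvm/pr-subscribers-backend-x86

Author: Yi-Chi Lee (yichi170)

Changes

This patch extends VPTERNLOG pattern matching to handle cases where an outer NOT wraps a pure logical tree, such as `~(A | B | C)`. By recognizing these negated logic trees, the instruction selector can now emit a single vpternlog instruction.

The change preserves the match for patterns like `(x != C1) & (x != C2)`, which also have the xor-with-all-ones pattern outside. The patch conservatively peels the outer XOR-with-all-ones only when it directly wraps a foldable logical operator (AND, OR, XOR, or ANDNP).

Resolves #163738

Full diff: https://github.com/llvm/llvm-project/pull/164863.diff

2 Files Affected: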
diff --git a/llvm/lib/Target/X86/X86ISelDAGToDAG.cpp b/llvm/lib/Target/X86/X86ISelDAGToDAG.cpp
index 62073ec125e8f..06a691d73c749 100644
--- a/llvm/lib/Target/X86/X86ISelDAGToDAG.cpp
+++ b/llvm/lib/Target/X86/X86ISelDAGToDAG.cpp
@@ -4721,9 +4721,6 @@ bool X86DAGToDAGISel::tryVPTERNLOG(SDNode *N) {
if (!(Subtarget->hasVLX() || NVT.is512BitVector()))
return false;
- SDValue N0 = N->getOperand(0);
- SDValue N1 = N->getOperand(1);
-
auto getFoldableLogicOp = [](SDValue Op) {
// Peek through single use bitcast.
if (Op.getOpcode() == ISD::BITCAST && Op.hasOneUse())
@@ -4740,6 +4737,37 @@ bool X86DAGToDAGISel::tryVPTERNLOG(SDNode *N) {
return SDValue();
};
+ // Identify and (optionally) peel an outer NOT that wraps a pure logic tree
+ auto tryPeelOuterNotWrappingLogic = [&](SDNode *Op) {
+ if (Op->getOpcode() == ISD::XOR && Op->hasOneUse() &&
+ ISD::isBuildVectorAllOnes(Op->getOperand(1).getNode())) {
+ SDNode *InnerN = Op->getOperand(0).getNode();
+
+ unsigned InnerOpc = InnerN->getOpcode();
+ if (InnerOpc != ISD::AND && InnerOpc != ISD::OR &&
+ InnerOpc != ISD::XOR && InnerOpc != X86ISD::ANDNP) {
+ return Op;
+ }
+
+ SDValue InnerN0 = InnerN->getOperand(0);
+ SDValue InnerN1 = InnerN->getOperand(1);
+ SDValue FoldableOp;
+ if (getFoldableLogicOp(InnerN1) || getFoldableLogicOp(InnerN0))
+ return InnerN;
+ }
+ return Op;
+ };
+
+ SDNode *OriN = N;
+ bool PeeledOuterNot = false;
+ N = tryPeelOuterNotWrappingLogic(N);
+ if (N != OriN)
+ PeeledOuterNot = true;
+
+
+ SDValue N0 = N->getOperand(0);
+ SDValue N1 = N->getOperand(1);
+
SDValue A, FoldableOp;
if ((FoldableOp = getFoldableLogicOp(N1))) {
A = N0;
@@ -4798,7 +4826,10 @@ bool X86DAGToDAGISel::tryVPTERNLOG(SDNode *N) {
case ISD::XOR: Imm ^= TernlogMagicA; break;
}
- return matchVPTERNLOG(N, ParentA, ParentB, ParentC, A, B, C, Imm);
+ if (PeeledOuterNot)
+ Imm = ~Imm;
+
+ return matchVPTERNLOG(OriN, ParentA, ParentB, ParentC, A, B, C, Imm);
}
/// If the high bits of an 'and' operand are known zero, try setting the
diff --git a/llvm/test/CodeGen/X86/issue163738.ll b/llvm/test/CodeGen/X86/issue163738.ll
new file mode 100644
index 0000000000000..61fe043a970dd
--- /dev/null
+++ b/llvm/test/CodeGen/X86/issue163738.ll
@@ -0,0 +1,13 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 6
+; RUN: llc < %s -mtriple=x86_64-- -mattr=+avx512f,+avx512vl | FileCheck %s --check-prefixes=CHECK
+
+define <8 x i64> @foo(<8 x i64> %a, <8 x i64> %b, <8 x i64> %c) {
+; CHECK-LABEL: foo:
+; CHECK: # %bb.0:
+; CHECK-NEXT: vpternlogq {{.*#+}} zmm0 = ~(zmm0 | zmm2 | zmm1)
+; CHECK-NEXT: retq
+ %and.demorgan = or <8 x i64> %b, %a
+ %and3.demorgan = or <8 x i64> %and.demorgan, %c
+ %and3 = xor <8 x i64> %and3.demorgan, splat (i64 -1)
+ ret <8 x i64> %and3
+}
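The `Imm = ~Imm` step in the diff above can be illustrated with a small, standalone sketch of how a VPTERNLOG immediate encodes a three-input truth table. The names `MagicA`, `MagicB`, and `MagicC` are hypothetical stand-ins modelled on the `TernlogMagicA` identifier visible in the patch; the values 0xF0, 0xCC, and 0xAA are the usual truth-table columns for three inputs and are an assumption here, not a quotation from the LLVM source.

```cpp
#include <cstdint>

// Truth-table columns for the three inputs (assumed values, see lead-in).
// Bit i of each constant is that input's value when the bits (A, B, C)
// spell the number i, with A as the most significant bit.
constexpr uint8_t MagicA = 0xF0;
constexpr uint8_t MagicB = 0xCC;
constexpr uint8_t MagicC = 0xAA;

// Immediate for A | B | C: apply the logic op to the columns themselves.
constexpr uint8_t immOr3() {
  return static_cast<uint8_t>(MagicA | MagicB | MagicC);
}

// Immediate for ~(A | B | C): complement the whole 8-bit truth table,
// which is what negating the immediate after peeling the NOT amounts to.
constexpr uint8_t immNor3() { return static_cast<uint8_t>(~immOr3()); }

static_assert(immOr3() == 0xFE, "only the all-zero input row yields 0");
static_assert(immNor3() == 0x01, "only the all-zero input row yields 1");
```

Under these assumptions, the test function `foo` in the new `.ll` file would correspond to the immediate 0x01, matching the `~(zmm0 | zmm2 | zmm1)` comment in the CHECK line.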
✅ With the latest revision this PR passed the C/C++ code formatter. |
SDValue InnerN0 = InnerN->getOperand(0);
SDValue InnerN1 = InnerN->getOperand(1);
if (getFoldableLogicOp(InnerN1) || getFoldableLogicOp(InnerN0))
  return InnerN;
This duplicates the logic at line 4766 and later. It would be better to see whether we can reuse it.
I'll think about how to better handle this
SDNode *OriN = N;
if (SDValue InnerOp = tryPeelOuterNotWrappingLogic(N)) {
  PeeledOuterNot = true;
  N = InnerOp.getNode();
Suggested change:
- N = InnerOp.getNode();
+ ParentA = InnerOp.getNode();
Unfortunately, we still need N in line 4813...
If we peel the not out, the immediate (Imm) should be computed using the inner node (InnerOp), since the negation will be applied afterward. Therefore, we should not update this line here.
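The reasoning in this comment (compute the immediate from the inner node, then apply the negation) can be checked with a small software model of the instruction. This is a minimal sketch, not LLVM code: `ternlog` and `checkNegationCommutes` are hypothetical helper names, and the bit-index convention (A as the high index bit, C as the low one) is an assumption chosen to match the 0xF0/0xCC/0xAA column encoding.

```cpp
#include <cassert>
#include <cstdint>

// Software model of VPTERNLOG on one 64-bit lane (illustrative only).
// At each bit position the three input bits form an index (A is the high
// bit, C the low bit) that selects one bit of the 8-bit immediate.
inline uint64_t ternlog(uint8_t imm, uint64_t a, uint64_t b, uint64_t c) {
  uint64_t result = 0;
  for (unsigned i = 0; i < 64; ++i) {
    unsigned idx = static_cast<unsigned>((((a >> i) & 1) << 2) |
                                         (((b >> i) & 1) << 1) |
                                         ((c >> i) & 1));
    result |= static_cast<uint64_t>((imm >> idx) & 1) << i;
  }
  return result;
}

// Complementing the immediate of the inner tree gives the same result as
// complementing the inner tree's output, so the NOT can be applied last.
inline void checkNegationCommutes() {
  const uint64_t a = 0x0123456789ABCDEFULL;
  const uint64_t b = 0xFEDCBA9876543210ULL;
  const uint64_t c = 0x00FF00FF00FF00FFULL;
  const uint8_t innerImm = 0xFE; // truth table of A | B | C
  const uint8_t outerImm = static_cast<uint8_t>(~innerImm);
  assert(ternlog(innerImm, a, b, c) == (a | b | c));
  assert(ternlog(outerImm, a, b, c) == ~(a | b | c));
}
```

Because every output bit is a lookup into the immediate, flipping all eight immediate bits flips every output bit, which is why the immediate can be built from the inner node and negated afterward.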
LGTM.
Thanks for the review! I don't have commit access. Can you please help me merge it?
…164863) This patch extends VPTERNLOG pattern matching to handle cases where an outer NOT wraps a pure logical tree, such as `~(A | B | C)`. By recognizing these negated logic trees, the instruction selector can now emit a single vpternlog instruction. The change preserves the match for patterns like `(x != C1) & (x != C2)`, which also have the xor-with-all-ones pattern outside. The patch conservatively peels the outer XOR-with-all-ones only when it directly wraps a foldable logical operator (AND, OR, XOR, or ANDNP). Resolves llvm#163738
This patch extends VPTERNLOG pattern matching to handle cases where an outer NOT wraps a pure logical tree, such as `~(A | B | C)`. By recognizing these negated logic trees, the instruction selector can now emit a single vpternlog instruction.

The change preserves the match for patterns like `(x != C1) & (x != C2)`, which also have the xor-with-all-ones pattern outside. The patch conservatively peels the outer XOR-with-all-ones only when it directly wraps a foldable logical operator (AND, OR, XOR, or ANDNP).

Resolves #163738
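The conservative peeling rule described above can be sketched on a toy expression tree. Everything here is hypothetical and deliberately simplified: `Op`, `Node`, `isLogicOp`, and `peelOuterNot` are illustration-only names, and the sketch ignores the one-use and bitcast checks that the real patch performs through `hasOneUse` and `getFoldableLogicOp`.

```cpp
#include <cassert>

// Toy node kinds; AllOnes stands in for an all-ones build vector.
enum class Op { Leaf, AllOnes, And, Or, Xor, AndNot };

struct Node {
  Op op;
  const Node *lhs;
  const Node *rhs;
};

constexpr bool isLogicOp(Op op) {
  return op == Op::And || op == Op::Or || op == Op::Xor || op == Op::AndNot;
}

// Returns the inner logic node when `n` is NOT(logic tree); otherwise
// returns `n` unchanged, so callers can compare pointers to detect a peel.
inline const Node *peelOuterNot(const Node *n) {
  // An outer NOT is modelled as xor(x, all-ones).
  if (n->op != Op::Xor || n->rhs->op != Op::AllOnes)
    return n;
  const Node *inner = n->lhs;
  if (!isLogicOp(inner->op))
    return n;
  // Peel only when the inner node itself has a logic-op operand, i.e. the
  // tree is deep enough to supply three inputs.
  if (isLogicOp(inner->lhs->op) || isLogicOp(inner->rhs->op))
    return inner;
  return n;
}

inline void checkPeelExamples() {
  Node a{Op::Leaf, nullptr, nullptr}, b{Op::Leaf, nullptr, nullptr};
  Node c{Op::Leaf, nullptr, nullptr}, ones{Op::AllOnes, nullptr, nullptr};
  Node ab{Op::Or, &a, &b};
  Node abc{Op::Or, &ab, &c};
  Node notAbc{Op::Xor, &abc, &ones}; // ~((a | b) | c): peeled
  Node notAb{Op::Xor, &ab, &ones};   // ~(a | b): left alone
  assert(peelOuterNot(&notAbc) == &abc);
  assert(peelOuterNot(&notAb) == &notAb);
}
```

In this toy model a two-input `~(a | b)` is left untouched, while the three-input `~((a | b) | c)` is peeled so the remaining matcher sees the inner OR tree.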