[AArch64][SVE] Match (add_like x (lsr y, c)) -> usra x, y, c#197657
[AArch64][SVE] Match (add_like x (lsr y, c)) -> usra x, y, c#197657Harry-Ramsey wants to merge 1 commit into
Conversation
|
@llvm/pr-subscribers-backend-aarch64 @llvm/pr-subscribers-llvm-selectiondag Author: Harry Ramsey (Harry-Ramsey) ChangesModify SVEE USRA pattern to accept add_like, so both add and or disjoint forms can select usra. Add known-bits support for predicated SVE logical shifts, allowing or_disjoint matching to prove disjointness for plain ORs where possible. Full diff: https://github.com/llvm/llvm-project/pull/197657.diff 4 Files Affected:
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index 9af378ec28868..29bad5c513c55 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -4525,11 +4525,6 @@ KnownBits SelectionDAG::computeKnownBits(SDValue Op, const APInt &DemandedElts,
case ISD::INTRINSIC_WO_CHAIN:
case ISD::INTRINSIC_W_CHAIN:
case ISD::INTRINSIC_VOID:
- // TODO: Probably okay to remove after audit; here to reduce change size
- // in initial enablement patch for scalable vectors
- if (Op.getValueType().isScalableVector())
- break;
-
// Allow the target to implement this method for its nodes.
TLI->computeKnownBitsForTargetNode(Op, Known, DemandedElts, *this, Depth);
break;
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 9de0b461a216c..e78fd18f6604a 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -2902,6 +2902,18 @@ void AArch64TargetLowering::computeKnownBitsForTargetNode(
}
break;
}
+ case AArch64ISD::SHL_PRED:
+ case AArch64ISD::SRL_PRED: {
+ KnownBits KnownVal =
+ DAG.computeKnownBits(Op->getOperand(1), DemandedElts, Depth + 1);
+ KnownBits KnownAmt =
+ DAG.computeKnownBits(Op->getOperand(2), DemandedElts, Depth + 1);
+
+ Known = Op.getOpcode() == AArch64ISD::SHL_PRED
+ ? KnownBits::shl(KnownVal, KnownAmt)
+ : KnownBits::lshr(KnownVal, KnownAmt);
+ break;
+ }
case ISD::INTRINSIC_WO_CHAIN:
case ISD::INTRINSIC_VOID: {
unsigned IntNo = Op.getConstantOperandVal(0);
diff --git a/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td b/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
index 9df77f8e93c64..e4c281a84d72b 100644
--- a/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
@@ -362,7 +362,7 @@ def AArch64uaba : PatFrags<(ops node:$op1, node:$op2, node:$op3),
def AArch64usra : PatFrags<(ops node:$op1, node:$op2, node:$op3),
[(int_aarch64_sve_usra node:$op1, node:$op2, node:$op3),
- (add node:$op1, (AArch64lsr_p (SVEAnyPredicate), node:$op2, (SVEShiftSplatImmR (i32 node:$op3))))]>;
+ (add_like node:$op1, (AArch64lsr_p (SVEAnyPredicate), node:$op2, (SVEShiftSplatImmR (i32 node:$op3))))]>;
def AArch64ssra : PatFrags<(ops node:$op1, node:$op2, node:$op3),
[(int_aarch64_sve_ssra node:$op1, node:$op2, node:$op3),
diff --git a/llvm/test/CodeGen/AArch64/sve2-sli-sri.ll b/llvm/test/CodeGen/AArch64/sve2-sli-sri.ll
index 80999fb1f4864..a7048731850d3 100644
--- a/llvm/test/CodeGen/AArch64/sve2-sli-sri.ll
+++ b/llvm/test/CodeGen/AArch64/sve2-sli-sri.ll
@@ -118,14 +118,22 @@ define <vscale x 8 x i16> @testRightGood8x16(<vscale x 8 x i16> %src1, <vscale x
}
define <vscale x 8 x i16> @testRightBad8x16(<vscale x 8 x i16> %src1, <vscale x 8 x i16> %src2) {
-; CHECK-LABEL: testRightBad8x16:
-; CHECK: // %bb.0:
-; CHECK-NEXT: mov w8, #16500 // =0x4074
-; CHECK-NEXT: lsr z1.h, z1.h, #14
-; CHECK-NEXT: mov z2.h, w8
-; CHECK-NEXT: and z0.d, z0.d, z2.d
-; CHECK-NEXT: orr z0.d, z0.d, z1.d
-; CHECK-NEXT: ret
+; SVE-LABEL: testRightBad8x16:
+; SVE: // %bb.0:
+; SVE-NEXT: mov w8, #16500 // =0x4074
+; SVE-NEXT: lsr z1.h, z1.h, #14
+; SVE-NEXT: mov z2.h, w8
+; SVE-NEXT: and z0.d, z0.d, z2.d
+; SVE-NEXT: orr z0.d, z0.d, z1.d
+; SVE-NEXT: ret
+;
+; SVE2-LABEL: testRightBad8x16:
+; SVE2: // %bb.0:
+; SVE2-NEXT: mov w8, #16500 // =0x4074
+; SVE2-NEXT: mov z2.h, w8
+; SVE2-NEXT: and z0.d, z0.d, z2.d
+; SVE2-NEXT: usra z0.h, z1.h, #14
+; SVE2-NEXT: ret
%and.i = and <vscale x 8 x i16> %src1, splat(i16 16500)
%vshl_n = lshr <vscale x 8 x i16> %src2, splat(i16 14)
%result = or <vscale x 8 x i16> %and.i, %vshl_n
|
| case ISD::INTRINSIC_WO_CHAIN: | ||
| case ISD::INTRINSIC_W_CHAIN: | ||
| case ISD::INTRINSIC_VOID: | ||
| // TODO: Probably okay to remove after audit; here to reduce change size |
There was a problem hiding this comment.
Please let me know if the audit concluded and this code can be removed?
There was a problem hiding this comment.
I think it looks OK, looking at the existing nodes in architectures that have scalable vectors and how they use types.
| case ISD::INTRINSIC_WO_CHAIN: | ||
| case ISD::INTRINSIC_W_CHAIN: | ||
| case ISD::INTRINSIC_VOID: | ||
| // TODO: Probably okay to remove after audit; here to reduce change size |
There was a problem hiding this comment.
I think it looks OK, looking at the existing nodes in architectures that have scalable vectors and how they use types.
|
I have taken a moment to additionally create tests for SSRA. Let me know if the test names should be renamed. |
🪟 Windows x64 Test Results
✅ The build succeeded and all tests passed. |
🐧 Linux x64 Test Results
✅ The build succeeded and all tests passed. |
21e077d to
4e9e66c
Compare
| KnownBits KnownAmt = | ||
| DAG.computeKnownBits(Op->getOperand(2), DemandedElts, Depth + 1); | ||
|
|
||
| Known = Op.getOpcode() == AArch64ISD::SHL_PRED |
There was a problem hiding this comment.
From what I can tell this isn't tested any more, or isn't triggering where it was before? The add_like would be enough to trigger the or disjoint cases IIUC. It might be better to separate this known-bits part out so that it can be tested on it's own.
There was a problem hiding this comment.
Sorry, just a bit confused by what is meant here. The usra/ssra tests are still going down this code path with SVE2 so this should be tested?
There was a problem hiding this comment.
The add_like tablegen change you have added will pick up "add like" from it being an add or a or_disjoint. or_disjoint in turn can either be an or with the disjoint flag or where we look at the known-bits for the two inputs having haveNoCommonBitsSet.
This bit is helping with the second known-bits check, but the new tests are going through the first path I believe, with a disjoint or flag. And as no other tests are changing, they are likely not tested any more or not working in some way. Some of the sri tests used to trigger IIRC.
The add_like change alone is enough to fix the issue, so if we can separate this known-bits part out and make sure it is tested separately, we can get the other part in and each half is simpler.
There was a problem hiding this comment.
Oh - When I say different patch I should say different PR, so they can be reviewed and submitted separately.
| ret <vscale x 16 x i8> %result | ||
| } | ||
|
|
||
| define <vscale x 16 x i8> @usra_disjoint_or16xi8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b) { |
There was a problem hiding this comment.
Can you separate these out into a new test file for usra+ssra or add it to an existing file like llvm/test/CodeGen/AArch64/sve2-sra.ll
| @@ -1,5 +1,5 @@ | |||
| ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py | |||
| ; RUN: llc < %s | FileCheck %s | |||
| ; RUN: llc -mattr=+sve2 < %s | FileCheck %s | |||
There was a problem hiding this comment.
To fit in with the other tests here, the #0 attributes can be added to the new functions to pick up +sve2.
| KnownBits KnownAmt = | ||
| DAG.computeKnownBits(Op->getOperand(2), DemandedElts, Depth + 1); | ||
|
|
||
| Known = Op.getOpcode() == AArch64ISD::SHL_PRED |
There was a problem hiding this comment.
The add_like tablegen change you have added will pick up "add like" from it being an add or a or_disjoint. or_disjoint in turn can either be an or with the disjoint flag or where we look at the known-bits for the two inputs having haveNoCommonBitsSet.
This bit is helping with the second known-bits check, but the new tests are going through the first path I believe, with a disjoint or flag. And as no other tests are changing, they are likely not tested any more or not working in some way. Some of the sri tests used to trigger IIRC.
The add_like change alone is enough to fix the issue, so if we can separate this known-bits part out and make sure it is tested separately, we can get the other part in and each half is simpler.
Modify SVEE USRA pattern to accept add_like, so both add and or disjoint forms can select usra. Add known-bits support for predicated SVE logical shifts, allowing or_disjoint matching to prove disjointness for plain ORs where possible.
72d39e9 to
321f9ea
Compare
Modify SVEE USRA pattern to accept add_like, so both add and or disjoint forms can select usra.
Add known-bits support for predicated SVE logical shifts, allowing or_disjoint matching to prove disjointness for plain ORs where possible.