[AArch64][SVE] Match (add_like x (lsr y, c)) -> usra x, y, c by Harry-Ramsey · Pull Request #197657 · llvm/llvm-project

Harry-Ramsey · 2026-05-14T11:45:27Z

Modify SVEE USRA pattern to accept add_like, so both add and or disjoint forms can select usra.

Add known-bits support for predicated SVE logical shifts, allowing or_disjoint matching to prove disjointness for plain ORs where possible.

llvmorg-github-actions · 2026-05-14T11:46:07Z

@llvm/pr-subscribers-backend-aarch64

@llvm/pr-subscribers-llvm-selectiondag

Author: Harry Ramsey (Harry-Ramsey)

Changes

Modify SVEE USRA pattern to accept add_like, so both add and or disjoint forms can select usra.

Add known-bits support for predicated SVE logical shifts, allowing or_disjoint matching to prove disjointness for plain ORs where possible.

Full diff: https://github.com/llvm/llvm-project/pull/197657.diff

4 Files Affected:

(modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp (-5)
(modified) llvm/lib/Target/AArch64/AArch64ISelLowering.cpp (+12)
(modified) llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td (+1-1)
(modified) llvm/test/CodeGen/AArch64/sve2-sli-sri.ll (+16-8)

diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index 9af378ec28868..29bad5c513c55 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -4525,11 +4525,6 @@ KnownBits SelectionDAG::computeKnownBits(SDValue Op, const APInt &DemandedElts,
   case ISD::INTRINSIC_WO_CHAIN:
   case ISD::INTRINSIC_W_CHAIN:
   case ISD::INTRINSIC_VOID:
-    // TODO: Probably okay to remove after audit; here to reduce change size
-    // in initial enablement patch for scalable vectors
-    if (Op.getValueType().isScalableVector())
-      break;
-
     // Allow the target to implement this method for its nodes.
     TLI->computeKnownBitsForTargetNode(Op, Known, DemandedElts, *this, Depth);
     break;
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 9de0b461a216c..e78fd18f6604a 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -2902,6 +2902,18 @@ void AArch64TargetLowering::computeKnownBitsForTargetNode(
     }
     break;
   }
+  case AArch64ISD::SHL_PRED:
+  case AArch64ISD::SRL_PRED: {
+    KnownBits KnownVal =
+        DAG.computeKnownBits(Op->getOperand(1), DemandedElts, Depth + 1);
+    KnownBits KnownAmt =
+        DAG.computeKnownBits(Op->getOperand(2), DemandedElts, Depth + 1);
+
+    Known = Op.getOpcode() == AArch64ISD::SHL_PRED
+                ? KnownBits::shl(KnownVal, KnownAmt)
+                : KnownBits::lshr(KnownVal, KnownAmt);
+    break;
+  }
   case ISD::INTRINSIC_WO_CHAIN:
   case ISD::INTRINSIC_VOID: {
     unsigned IntNo = Op.getConstantOperandVal(0);
diff --git a/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td b/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
index 9df77f8e93c64..e4c281a84d72b 100644
--- a/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
@@ -362,7 +362,7 @@ def AArch64uaba : PatFrags<(ops node:$op1, node:$op2, node:$op3),
 
 def AArch64usra : PatFrags<(ops node:$op1, node:$op2, node:$op3),
                            [(int_aarch64_sve_usra node:$op1, node:$op2, node:$op3),
-                            (add node:$op1, (AArch64lsr_p (SVEAnyPredicate), node:$op2, (SVEShiftSplatImmR (i32 node:$op3))))]>;
+                            (add_like node:$op1, (AArch64lsr_p (SVEAnyPredicate), node:$op2, (SVEShiftSplatImmR (i32 node:$op3))))]>;
 
 def AArch64ssra : PatFrags<(ops node:$op1, node:$op2, node:$op3),
                            [(int_aarch64_sve_ssra node:$op1, node:$op2, node:$op3),
diff --git a/llvm/test/CodeGen/AArch64/sve2-sli-sri.ll b/llvm/test/CodeGen/AArch64/sve2-sli-sri.ll
index 80999fb1f4864..a7048731850d3 100644
--- a/llvm/test/CodeGen/AArch64/sve2-sli-sri.ll
+++ b/llvm/test/CodeGen/AArch64/sve2-sli-sri.ll
@@ -118,14 +118,22 @@ define <vscale x 8 x i16> @testRightGood8x16(<vscale x 8 x i16> %src1, <vscale x
 }
 
 define <vscale x 8 x i16> @testRightBad8x16(<vscale x 8 x i16> %src1, <vscale x 8 x i16> %src2) {
-; CHECK-LABEL: testRightBad8x16:
-; CHECK:       // %bb.0:
-; CHECK-NEXT:    mov w8, #16500 // =0x4074
-; CHECK-NEXT:    lsr z1.h, z1.h, #14
-; CHECK-NEXT:    mov z2.h, w8
-; CHECK-NEXT:    and z0.d, z0.d, z2.d
-; CHECK-NEXT:    orr z0.d, z0.d, z1.d
-; CHECK-NEXT:    ret
+; SVE-LABEL: testRightBad8x16:
+; SVE:       // %bb.0:
+; SVE-NEXT:    mov w8, #16500 // =0x4074
+; SVE-NEXT:    lsr z1.h, z1.h, #14
+; SVE-NEXT:    mov z2.h, w8
+; SVE-NEXT:    and z0.d, z0.d, z2.d
+; SVE-NEXT:    orr z0.d, z0.d, z1.d
+; SVE-NEXT:    ret
+;
+; SVE2-LABEL: testRightBad8x16:
+; SVE2:       // %bb.0:
+; SVE2-NEXT:    mov w8, #16500 // =0x4074
+; SVE2-NEXT:    mov z2.h, w8
+; SVE2-NEXT:    and z0.d, z0.d, z2.d
+; SVE2-NEXT:    usra z0.h, z1.h, #14
+; SVE2-NEXT:    ret
   %and.i = and <vscale x 8 x i16> %src1, splat(i16 16500)
   %vshl_n = lshr <vscale x 8 x i16> %src2, splat(i16 14)
   %result = or <vscale x 8 x i16> %and.i, %vshl_n

Harry-Ramsey · 2026-05-14T11:54:14Z

  case ISD::INTRINSIC_WO_CHAIN:
  case ISD::INTRINSIC_W_CHAIN:
  case ISD::INTRINSIC_VOID:
-    // TODO: Probably okay to remove after audit; here to reduce change size


Please let me know if the audit concluded and this code can be removed?

I think it looks OK, looking at the existing nodes in architectures that have scalable vectors and how they use types.

davemgreen · 2026-05-15T10:28:50Z

  case ISD::INTRINSIC_WO_CHAIN:
  case ISD::INTRINSIC_W_CHAIN:
  case ISD::INTRINSIC_VOID:
-    // TODO: Probably okay to remove after audit; here to reduce change size


I think it looks OK, looking at the existing nodes in architectures that have scalable vectors and how they use types.

Harry-Ramsey · 2026-05-18T08:27:26Z

I have taken a moment to additionally create tests for SSRA. Let me know if the test names should be renamed.

github-actions · 2026-05-18T09:00:37Z

🪟 Windows x64 Test Results

135232 tests passed
3335 tests skipped

✅ The build succeeded and all tests passed.

github-actions · 2026-05-18T09:00:37Z

🐧 Linux x64 Test Results

195946 tests passed
5281 tests skipped

✅ The build succeeded and all tests passed.

davemgreen · 2026-05-21T15:32:46Z

+    KnownBits KnownAmt =
+        DAG.computeKnownBits(Op->getOperand(2), DemandedElts, Depth + 1);
+
+    Known = Op.getOpcode() == AArch64ISD::SHL_PRED


From what I can tell this isn't tested any more, or isn't triggering where it was before? The add_like would be enough to trigger the or disjoint cases IIUC. It might be better to separate this known-bits part out so that it can be tested on it's own.

Sorry, just a bit confused by what is meant here. The usra/ssra tests are still going down this code path with SVE2 so this should be tested?

The add_like tablegen change you have added will pick up "add like" from it being an add or a or_disjoint. or_disjoint in turn can either be an or with the disjoint flag or where we look at the known-bits for the two inputs having haveNoCommonBitsSet.

This bit is helping with the second known-bits check, but the new tests are going through the first path I believe, with a disjoint or flag. And as no other tests are changing, they are likely not tested any more or not working in some way. Some of the sri tests used to trigger IIRC.

The add_like change alone is enough to fix the issue, so if we can separate this known-bits part out and make sure it is tested separately, we can get the other part in and each half is simpler.

Oh - When I say different patch I should say different PR, so they can be reviewed and submitted separately.

davemgreen · 2026-05-21T15:34:32Z

  ret <vscale x 16 x i8> %result
 }

+define <vscale x 16 x i8> @usra_disjoint_or16xi8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b) {


Can you separate these out into a new test file for usra+ssra or add it to an existing file like llvm/test/CodeGen/AArch64/sve2-sra.ll

davemgreen · 2026-05-27T14:34:17Z

@@ -1,5 +1,5 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc < %s | FileCheck %s
+; RUN: llc -mattr=+sve2 < %s | FileCheck %s


To fit in with the other tests here, the #0 attributes can be added to the new functions to pick up +sve2.

davemgreen · 2026-05-27T14:49:34Z

+    KnownBits KnownAmt =
+        DAG.computeKnownBits(Op->getOperand(2), DemandedElts, Depth + 1);
+
+    Known = Op.getOpcode() == AArch64ISD::SHL_PRED


The add_like tablegen change you have added will pick up "add like" from it being an add or a or_disjoint. or_disjoint in turn can either be an or with the disjoint flag or where we look at the known-bits for the two inputs having haveNoCommonBitsSet.

This bit is helping with the second known-bits check, but the new tests are going through the first path I believe, with a disjoint or flag. And as no other tests are changing, they are likely not tested any more or not working in some way. Some of the sri tests used to trigger IIRC.

The add_like change alone is enough to fix the issue, so if we can separate this known-bits part out and make sure it is tested separately, we can get the other part in and each half is simpler.

Modify SVEE USRA pattern to accept add_like, so both add and or disjoint forms can select usra. Add known-bits support for predicated SVE logical shifts, allowing or_disjoint matching to prove disjointness for plain ORs where possible.

davemgreen

LGTM - thanks.

Harry-Ramsey self-assigned this May 14, 2026

llvmorg-github-actions Bot added backend:AArch64 llvm:SelectionDAG SelectionDAGISel as well labels May 14, 2026

Harry-Ramsey requested review from davemgreen and efriedma-quic May 14, 2026 11:45

Harry-Ramsey commented May 14, 2026

View reviewed changes

davemgreen reviewed May 15, 2026

View reviewed changes

davemgreen requested a review from david-arm May 15, 2026 10:32

Harry-Ramsey force-pushed the ramsey/sve-generate-ursa branch 2 times, most recently from 21e077d to 4e9e66c Compare May 20, 2026 11:03

Harry-Ramsey requested a review from davemgreen May 20, 2026 12:51

davemgreen reviewed May 21, 2026

View reviewed changes

davemgreen reviewed May 27, 2026

View reviewed changes

Harry-Ramsey force-pushed the ramsey/sve-generate-ursa branch 3 times, most recently from 72d39e9 to 321f9ea Compare May 29, 2026 08:27

Harry-Ramsey mentioned this pull request May 29, 2026

[AArch64][SVE] Enable known bits for predicated shifts #200347

Open

davemgreen approved these changes May 31, 2026

View reviewed changes

Conversation

Harry-Ramsey commented May 14, 2026

Uh oh!

llvmorg-github-actions Bot commented May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Harry-Ramsey commented May 18, 2026

Uh oh!

github-actions Bot commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🪟 Windows x64 Test Results

Uh oh!

github-actions Bot commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🐧 Linux x64 Test Results

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

davemgreen left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

llvmorg-github-actions Bot commented May 14, 2026 •

edited

Loading

github-actions Bot commented May 18, 2026 •

edited

Loading

github-actions Bot commented May 18, 2026 •

edited

Loading