
[DAG][AArch64] Handle vscale addressing modes in reassociationCanBreakAddressingModePattern #89908

Merged 1 commit into llvm:main on May 10, 2024

Conversation

@davemgreen (Collaborator)

reassociationCanBreakAddressingModePattern tries to prevent bad add reassociations that would break addressing mode patterns. This adds support for vscale offset addressing modes, making sure we don't break patterns that already exist. It does not optimize to the correct addressing modes yet, but prevents us from optimizing away from them.
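
In effect, the new check asks the target whether an addressing mode of the form [base + C * vscale] is legal for the memory access that uses the add. A minimal sketch of that query, assuming a MemSDNode user LoadStore and a scalable byte offset ScalableOffset have already been extracted (illustrative only; see the full diff below):

  // Ask the target whether [BaseReg + ScalableOffset * vscale] is a legal
  // addressing mode for this access before letting reassociation rewrite it.
  TargetLoweringBase::AddrMode AM;
  AM.HasBaseReg = true;               // a register base is always present
  AM.ScalableOffset = ScalableOffset; // byte offset, scaled by vscale at runtime
  EVT MemVT = LoadStore->getMemoryVT();
  Type *AccessTy = MemVT.getTypeForEVT(*DAG.getContext());
  unsigned AddrSpace = LoadStore->getAddressSpace();
  bool KeepExistingForm =
      TLI.isLegalAddressingMode(DAG.getDataLayout(), AM, AccessTy, AddrSpace);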

@llvmbot (Collaborator) commented Apr 24, 2024

@llvm/pr-subscribers-llvm-selectiondag

@llvm/pr-subscribers-backend-aarch64

Author: David Green (davemgreen)

Changes

reassociationCanBreakAddressingModePattern tries to prevent bad add reassociations that would break addressing mode patterns. This adds support for vscale offset addressing modes, making sure we don't break patterns that already exist. It does not optimize to the correct addressing modes yet, but prevents us from optimizing away from them.


Full diff: https://github.com/llvm/llvm-project/pull/89908.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (+31)
  • (modified) llvm/test/CodeGen/AArch64/sve-reassocadd.ll (+10-20)
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index fd265b12d73ca4..0ea5f58bb53270 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -1085,6 +1085,37 @@ bool DAGCombiner::reassociationCanBreakAddressingModePattern(unsigned Opc,
   if (Opc != ISD::ADD || N0.getOpcode() != ISD::ADD)
     return false;
 
+  // Check for vscale addressing modes.
+  // (load/store (add (add x, y), vscale))
+  // (load/store (add (add x, y), (lsl vscale, C)))
+  // (load/store (add (add x, y), (mul vscale, C)))
+  if ((N1.getOpcode() == ISD::VSCALE ||
+       ((N1.getOpcode() == ISD::SHL || N1.getOpcode() == ISD::MUL) &&
+        N1.getOperand(0).getOpcode() == ISD::VSCALE &&
+        isa<ConstantSDNode>(N1.getOperand(1)))) &&
+      N1.getValueSizeInBits() <= 64) {
+    unsigned ScalableOffset =
+        N1.getOpcode() == ISD::VSCALE
+            ? N1.getConstantOperandVal(0)
+            : (N1.getOperand(0).getConstantOperandVal(0) *
+               (N1.getOpcode() == ISD::SHL ? (1 << N1.getConstantOperandVal(1))
+                                           : N1.getConstantOperandVal(1)));
+    if (all_of(N->uses(), [&](SDNode *Node) {
+          if (auto *LoadStore = dyn_cast<MemSDNode>(Node)) {
+            TargetLoweringBase::AddrMode AM;
+            AM.HasBaseReg = true;
+            AM.ScalableOffset = ScalableOffset;
+            EVT VT = LoadStore->getMemoryVT();
+            unsigned AS = LoadStore->getAddressSpace();
+            Type *AccessTy = VT.getTypeForEVT(*DAG.getContext());
+            return TLI.isLegalAddressingMode(DAG.getDataLayout(), AM, AccessTy,
+                                             AS);
+          }
+          return false;
+        }))
+      return true;
+  }
+
   auto *C2 = dyn_cast<ConstantSDNode>(N1);
   if (!C2)
     return false;
diff --git a/llvm/test/CodeGen/AArch64/sve-reassocadd.ll b/llvm/test/CodeGen/AArch64/sve-reassocadd.ll
index 47ddab8e296478..9ae0a396d07929 100644
--- a/llvm/test/CodeGen/AArch64/sve-reassocadd.ll
+++ b/llvm/test/CodeGen/AArch64/sve-reassocadd.ll
@@ -22,11 +22,9 @@ entry:
 define <vscale x 16 x i8> @i8_4s_1v(ptr %b) {
 ; CHECK-LABEL: i8_4s_1v:
 ; CHECK:       // %bb.0: // %entry
-; CHECK-NEXT:    rdvl x8, #1
 ; CHECK-NEXT:    ptrue p0.b
-; CHECK-NEXT:    mov w9, #4 // =0x4
-; CHECK-NEXT:    add x8, x0, x8
-; CHECK-NEXT:    ld1b { z0.b }, p0/z, [x8, x9]
+; CHECK-NEXT:    add x8, x0, #4
+; CHECK-NEXT:    ld1b { z0.b }, p0/z, [x8, #1, mul vl]
 ; CHECK-NEXT:    ret
 entry:
   %add.ptr = getelementptr inbounds i8, ptr %b, i64 4
@@ -58,11 +56,9 @@ entry:
 define <vscale x 8 x i16> @i16_8s_1v(ptr %b) {
 ; CHECK-LABEL: i16_8s_1v:
 ; CHECK:       // %bb.0: // %entry
-; CHECK-NEXT:    rdvl x8, #1
 ; CHECK-NEXT:    ptrue p0.h
-; CHECK-NEXT:    mov x9, #4 // =0x4
-; CHECK-NEXT:    add x8, x0, x8
-; CHECK-NEXT:    ld1h { z0.h }, p0/z, [x8, x9, lsl #1]
+; CHECK-NEXT:    add x8, x0, #8
+; CHECK-NEXT:    ld1h { z0.h }, p0/z, [x8, #1, mul vl]
 ; CHECK-NEXT:    ret
 entry:
   %add.ptr = getelementptr inbounds i8, ptr %b, i64 8
@@ -94,11 +90,9 @@ entry:
 define <vscale x 8 x i16> @i16_8s_2v(ptr %b) {
 ; CHECK-LABEL: i16_8s_2v:
 ; CHECK:       // %bb.0: // %entry
-; CHECK-NEXT:    rdvl x8, #2
 ; CHECK-NEXT:    ptrue p0.h
-; CHECK-NEXT:    mov x9, #4 // =0x4
-; CHECK-NEXT:    add x8, x0, x8
-; CHECK-NEXT:    ld1h { z0.h }, p0/z, [x8, x9, lsl #1]
+; CHECK-NEXT:    add x8, x0, #8
+; CHECK-NEXT:    ld1h { z0.h }, p0/z, [x8, #2, mul vl]
 ; CHECK-NEXT:    ret
 entry:
   %add.ptr = getelementptr inbounds i8, ptr %b, i64 8
@@ -130,11 +124,9 @@ entry:
 define <vscale x 4 x i32> @i32_16s_2v(ptr %b) {
 ; CHECK-LABEL: i32_16s_2v:
 ; CHECK:       // %bb.0: // %entry
-; CHECK-NEXT:    rdvl x8, #1
 ; CHECK-NEXT:    ptrue p0.s
-; CHECK-NEXT:    mov x9, #4 // =0x4
-; CHECK-NEXT:    add x8, x0, x8
-; CHECK-NEXT:    ld1w { z0.s }, p0/z, [x8, x9, lsl #2]
+; CHECK-NEXT:    add x8, x0, #16
+; CHECK-NEXT:    ld1w { z0.s }, p0/z, [x8, #1, mul vl]
 ; CHECK-NEXT:    ret
 entry:
   %add.ptr = getelementptr inbounds i8, ptr %b, i64 16
@@ -166,11 +158,9 @@ entry:
 define <vscale x 2 x i64> @i64_32s_2v(ptr %b) {
 ; CHECK-LABEL: i64_32s_2v:
 ; CHECK:       // %bb.0: // %entry
-; CHECK-NEXT:    rdvl x8, #1
 ; CHECK-NEXT:    ptrue p0.d
-; CHECK-NEXT:    mov x9, #4 // =0x4
-; CHECK-NEXT:    add x8, x0, x8
-; CHECK-NEXT:    ld1d { z0.d }, p0/z, [x8, x9, lsl #3]
+; CHECK-NEXT:    add x8, x0, #32
+; CHECK-NEXT:    ld1d { z0.d }, p0/z, [x8, #1, mul vl]
 ; CHECK-NEXT:    ret
 entry:
   %add.ptr = getelementptr inbounds i8, ptr %b, i64 32


@david-arm david-arm left a comment


Looks like a useful patch to me! I just have a few comments ...

((N1.getOpcode() == ISD::SHL || N1.getOpcode() == ISD::MUL) &&
N1.getOperand(0).getOpcode() == ISD::VSCALE &&
isa<ConstantSDNode>(N1.getOperand(1)))) &&
N1.getValueSizeInBits() <= 64) {

Can you use getScalarValueSizeInBits here instead to avoid the implicit TypeSize->uint64_t conversion?
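
For illustration only (names taken from the surrounding code), the suggested form of the width check would be:

  // getValueSizeInBits() returns a TypeSize and relies on an implicit
  // conversion to uint64_t in the comparison; getScalarValueSizeInBits()
  // returns a plain unsigned, which is equivalent for the scalar N1 here.
  bool FitsIn64Bits = N1.getScalarValueSizeInBits() <= 64;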

N1.getOperand(0).getOpcode() == ISD::VSCALE &&
isa<ConstantSDNode>(N1.getOperand(1)))) &&
N1.getValueSizeInBits() <= 64) {
unsigned ScalableOffset =

A couple of things I've noticed here:

  1. The AddrMode struct's ScalableOffset field is actually int64_t and the offset can be negative, e.g. (add (add x, y), (vscale -3)). I think we need to use int64_t here (see the sketch after this list).
  2. This code potentially accepts ISD::VSCALE nodes with result types of 8 or 16 bits, where there is a chance of overflow for large values of vscale. Is it worth just restricting this to integer types that are suitable for pointer arithmetic?
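
A rough sketch of how both points could be addressed, using a signed extraction (illustrative, not the code that was eventually committed):

  // Point 1: keep the offset signed so negative multiples of vscale,
  // e.g. (add (add x, y), (vscale -3)), are represented correctly.
  int64_t ScalableOffset;
  if (N1.getOpcode() == ISD::VSCALE) {
    ScalableOffset = N1.getConstantOperandAPInt(0).getSExtValue();
  } else {
    int64_t VScaleMul = N1.getOperand(0).getConstantOperandAPInt(0).getSExtValue();
    int64_t Amt = N1.getConstantOperandVal(1);
    ScalableOffset = VScaleMul * (N1.getOpcode() == ISD::SHL
                                      ? (int64_t(1) << Amt)
                                      : Amt);
  }
  // Point 2 could be handled by only accepting N1 types suitable for pointer
  // arithmetic (e.g. matching the index width from DAG.getDataLayout()),
  // rather than any integer type of 64 bits or fewer.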


Is it possible to add some tests for negative vscale offsets too?

…kAddressingModePattern.

reassociationCanBreakAddressingModePattern tries to prevent bad add
reassociations that would break addressing mode patterns. This adds support for
vscale offset addressing modes, making sure we don't break patterns that
already exist. It does not optimize _to_ the correct addressing modes yet, but
prevents us from optimizing _away_ from them.
@davemgreen (Collaborator, Author)

I've added some negative tests and attempted to extend this to handle them well. In the long run we will want to do the opposite transform too, combining towards legal addressing modes, which might involve more changes.


@david-arm david-arm left a comment


LGTM! Thanks for making the changes.


@huntergr-arm huntergr-arm left a comment


LGTM

@davemgreen davemgreen merged commit 23b673e into llvm:main May 10, 2024
3 of 4 checks passed
@davemgreen davemgreen deleted the gh-a64-reassocvscale branch May 10, 2024 08:27
Labels: backend:AArch64, llvm:SelectionDAG
4 participants