Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ARMISelDAGToDAG] use MO_FrameIndex to represent FrameIndex rather than MO_Register for INLINEASM #69654

Closed
wants to merge 1 commit into from

Conversation

nickdesaulniers
Copy link
Member

Many other targets use stack slot indexes to represent memory locations.
This patch normalizes ARM to also use these kind of MachineOperands so
that register selection can find which stack slot is necessary to fold
memory operands when "rm" is used.

A nice side effect of this change is that we're less likely to exhaust
registers for inline asm with many clobbers.

Link: #20571 (comment)

@llvmbot
Copy link
Collaborator

llvmbot commented Oct 19, 2023

@llvm/pr-subscribers-backend-arm

Author: Nick Desaulniers (nickdesaulniers)

Changes

Many other targets use stack slot indexes to represent memory locations.
This patch normalizes ARM to also use these kind of MachineOperands so
that register selection can find which stack slot is necessary to fold
memory operands when "rm" is used.

A nice side effect of this change is that we're less likely to exhaust
registers for inline asm with many clobbers.

Link: #20571 (comment)


Full diff: https://github.com/llvm/llvm-project/pull/69654.diff

6 Files Affected:

  • (modified) llvm/lib/Target/ARM/ARMAsmPrinter.cpp (+16-1)
  • (modified) llvm/lib/Target/ARM/ARMBaseInstrInfo.cpp (+2-2)
  • (modified) llvm/lib/Target/ARM/ARMFrameLowering.cpp (+1-1)
  • (modified) llvm/lib/Target/ARM/ARMISelDAGToDAG.cpp (+8-1)
  • (added) llvm/test/CodeGen/ARM/inlineasm-m-constraint.ll (+42)
  • (modified) llvm/test/CodeGen/ARM/inlineasm3.ll (+1-1)
diff --git a/llvm/lib/Target/ARM/ARMAsmPrinter.cpp b/llvm/lib/Target/ARM/ARMAsmPrinter.cpp
index 15cda9b9432d5f2..fbe6fb7521a1ff4 100644
--- a/llvm/lib/Target/ARM/ARMAsmPrinter.cpp
+++ b/llvm/lib/Target/ARM/ARMAsmPrinter.cpp
@@ -478,7 +478,22 @@ bool ARMAsmPrinter::PrintAsmMemoryOperand(const MachineInstr *MI,
 
   const MachineOperand &MO = MI->getOperand(OpNum);
   assert(MO.isReg() && "unexpected inline asm memory operand");
-  O << "[" << ARMInstPrinter::getRegisterName(MO.getReg()) << "]";
+  O << "[" << ARMInstPrinter::getRegisterName(MO.getReg());
+  if (MI->isInlineAsm()) {
+    assert(OpNum > 0 && "unexpected offset");
+    const MachineOperand &MD = MI->getOperand(OpNum - 1);
+    assert(MD.isImm() && "unexpected meta operand");
+    const InlineAsm::Flag F(MD.getImm());
+    const unsigned NumOps = F.getNumOperandRegisters();
+    if (NumOps == 2) {
+      assert(OpNum + 1 < MI->getNumOperands() && "missing offset operand");
+      const MachineOperand &OffsetOp = MI->getOperand(OpNum + 1);
+      assert(OffsetOp.isImm() && "unexpected inline asm memory operand");
+      if (uint64_t Offset = OffsetOp.getImm())
+        O << ", #" << Offset;
+    }
+  }
+  O << "]";
   return false;
 }
 
diff --git a/llvm/lib/Target/ARM/ARMBaseInstrInfo.cpp b/llvm/lib/Target/ARM/ARMBaseInstrInfo.cpp
index 1ffdde0360cf623..d815b2288f46242 100644
--- a/llvm/lib/Target/ARM/ARMBaseInstrInfo.cpp
+++ b/llvm/lib/Target/ARM/ARMBaseInstrInfo.cpp
@@ -2649,9 +2649,9 @@ bool llvm::rewriteARMFrameIndex(MachineInstr &MI, unsigned FrameRegIdx,
   unsigned AddrMode = (Desc.TSFlags & ARMII::AddrModeMask);
   bool isSub = false;
 
-  // Memory operands in inline assembly always use AddrMode2.
+  // Memory operands in inline assembly always use AddrMode_i12.
   if (Opcode == ARM::INLINEASM || Opcode == ARM::INLINEASM_BR)
-    AddrMode = ARMII::AddrMode2;
+    AddrMode = ARMII::AddrMode_i12;
 
   if (Opcode == ARM::ADDri) {
     Offset += MI.getOperand(FrameRegIdx+1).getImm();
diff --git a/llvm/lib/Target/ARM/ARMFrameLowering.cpp b/llvm/lib/Target/ARM/ARMFrameLowering.cpp
index a3a71a8ec09a450..3d919179e5c64e8 100644
--- a/llvm/lib/Target/ARM/ARMFrameLowering.cpp
+++ b/llvm/lib/Target/ARM/ARMFrameLowering.cpp
@@ -2097,7 +2097,7 @@ static unsigned estimateRSStackSizeLimit(MachineFunction &MF,
   unsigned Limit = (1 << 12) - 1;
   for (auto &MBB : MF) {
     for (auto &MI : MBB) {
-      if (MI.isDebugInstr())
+      if (MI.isDebugInstr() || MI.isInlineAsm())
         continue;
       for (unsigned i = 0, e = MI.getNumOperands(); i != e; ++i) {
         if (!MI.getOperand(i).isFI())
diff --git a/llvm/lib/Target/ARM/ARMISelDAGToDAG.cpp b/llvm/lib/Target/ARM/ARMISelDAGToDAG.cpp
index 984d8d3e0b08c4b..9c147cdd91734c1 100644
--- a/llvm/lib/Target/ARM/ARMISelDAGToDAG.cpp
+++ b/llvm/lib/Target/ARM/ARMISelDAGToDAG.cpp
@@ -5880,13 +5880,20 @@ bool ARMDAGToDAGISel::SelectInlineAsmMemoryOperand(
   case InlineAsm::ConstraintCode::Us:
   case InlineAsm::ConstraintCode::Ut:
   case InlineAsm::ConstraintCode::Uv:
-  case InlineAsm::ConstraintCode::Uy:
+  case InlineAsm::ConstraintCode::Uy: {
+    SDValue Base, OffImm;
+    if (Op.getOpcode() == ISD::FrameIndex && SelectAddrModeImm12(Op, Base, OffImm)) {
+      OutOps.push_back(Base);
+      OutOps.push_back(OffImm);
+      return false;
+    }
     // Require the address to be in a register.  That is safe for all ARM
     // variants and it is hard to do anything much smarter without knowing
     // how the operand is used.
     OutOps.push_back(Op);
     return false;
   }
+  }
   return true;
 }
 
diff --git a/llvm/test/CodeGen/ARM/inlineasm-m-constraint.ll b/llvm/test/CodeGen/ARM/inlineasm-m-constraint.ll
new file mode 100644
index 000000000000000..1e680fb4c03ee29
--- /dev/null
+++ b/llvm/test/CodeGen/ARM/inlineasm-m-constraint.ll
@@ -0,0 +1,42 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 3
+; RUN: llc -mtriple=arm-linux-gnueabihf -o - %s 2>&1 | FileCheck %s
+
+; The intent of this test is 3 fold:
+; 1. The first asm demonstrates a change in behavior of using sp.
+; 2. The second asm is a case the previously would cause register exhaustion
+; due to not using sp.
+; 3. The third asm demonstrates ARMAsmPrinter::PrintAsmMemoryOperand not
+; needing to print an offset when the offset is zero, and that we get the other
+; alloca's offset correct.
+define void @fail() {
+; CHECK-LABEL: fail:
+; CHECK:       @ %bb.0:
+; CHECK-NEXT:    .save {r4, r5, r6, r7, r8, r9, r10, r11, lr}
+; CHECK-NEXT:    push {r4, r5, r6, r7, r8, r9, r10, r11, lr}
+; CHECK-NEXT:    .pad #8
+; CHECK-NEXT:    sub sp, sp, #8
+; CHECK-NEXT:    mov r0, #42
+; CHECK-NEXT:    str r0, [sp, #4]
+; CHECK-NEXT:    @APP
+; CHECK-NEXT:    @ [sp, #4]
+; CHECK-NEXT:    @NO_APP
+; CHECK-NEXT:    @APP
+; CHECK-NEXT:    @ [sp, #4]
+; CHECK-NEXT:    @NO_APP
+; CHECK-NEXT:    mov r0, #99
+; CHECK-NEXT:    str r0, [sp]
+; CHECK-NEXT:    @APP
+; CHECK-NEXT:    @ [sp]
+; CHECK-NEXT:    @NO_APP
+; CHECK-NEXT:    add sp, sp, #8
+; CHECK-NEXT:    pop {r4, r5, r6, r7, r8, r9, r10, r11, lr}
+; CHECK-NEXT:    mov pc, lr
+  %x = alloca i32, align 4
+  %y = alloca i32, align 4
+  store i32 42, ptr %x
+  call void asm sideeffect "# $0", "*m,~{r1},~{r2},~{r3},~{r12},~{lr},~{r4},~{r5},~{r6},~{r7},~{r8},~{r9},~{r10},~{r11}"(ptr elementtype(i32) %x)
+  call void asm sideeffect "# $0", "*m,~{r0},~{r1},~{r2},~{r3},~{r12},~{lr},~{r4},~{r5},~{r6},~{r7},~{r8},~{r9},~{r10},~{r11}"(ptr elementtype(i32) %x)
+  store i32 99, ptr %y
+  call void asm sideeffect "# $0", "*m,~{r1},~{r2},~{r3},~{r12},~{lr},~{r4},~{r5},~{r6},~{r7},~{r8},~{r9},~{r10},~{r11}"(ptr elementtype(i32) %y)
+  ret void
+}
diff --git a/llvm/test/CodeGen/ARM/inlineasm3.ll b/llvm/test/CodeGen/ARM/inlineasm3.ll
index 5589885bd1745e3..fd9bbe64a85a82f 100644
--- a/llvm/test/CodeGen/ARM/inlineasm3.ll
+++ b/llvm/test/CodeGen/ARM/inlineasm3.ll
@@ -105,7 +105,7 @@ entry:
 define void @t10(ptr %f, i32 %g) nounwind {
 entry:
 ; CHECK: t10
-; CHECK: str r1, [r0]
+; CHECK: str r1, [sp]
   %f.addr = alloca ptr, align 4
   store ptr %f, ptr %f.addr, align 4
   call void asm "str $1, $0", "=*Q,r"(ptr elementtype(ptr) %f.addr, i32 %g) nounwind

@github-actions
Copy link

github-actions bot commented Oct 19, 2023

✅ With the latest revision this PR passed the C/C++ code formatter.

…an MO_Register

Many other targets use stack slot indexes to represent memory locations.
This patch normalizes ARM to also use these kind of MachineOperands so
that register selection can find which stack slot is necessary to fold
memory operands when "rm" is used.

A nice side effect of this change is that we're less likely to exhaust
registers for inline asm with many clobbers.

Link: llvm#20571 (comment)
@efriedma-quic
Copy link
Collaborator

I suspect this will trigger assembler errors in some cases. ARM has a variety of addressing modes, and most of them aren't compatible with SelectAddrModeImm12. Particularly for the constraints that are not "m"; "Uv" is explicitly documented to be an operand for a "vldr", which only has a 10 bit offset.

And it looks like the handling for the case where the the stack frame is actually larger than 4096 bytes is missing?

@nickdesaulniers
Copy link
Member Author

nickdesaulniers commented Oct 20, 2023

Particularly for the constraints that are not "m"; "Uv" is explicitly documented to be an operand for a "vldr", which only has a 10 bit offset.

Yeah, I was waffling about splitting the cases in ARMDAGToDAGISel::SelectInlineAsmMemoryOperand to just do this change for the non-Ux constraints. I could change it back.

And it looks like the handling for the case where the the stack frame is actually larger than 4096 bytes is missing?

Is that possible to know (the current stack depth) during SelectionDAGISel? Or should that be handled elsewhere later? GCC just errors once the stack gets too large. (TODO(Nick): reproduce example; was this only under register pressure? hmm...)

also, this PR doesn't handle cases like:

void v (int *);

void foo (int *x) {
    asm ("# %0 %1"::"m"(x[1023]), "m"(x[1024]));
    v(x);
}

I should fix that, too.

@nickdesaulniers nickdesaulniers changed the title [ARMISelDAGToDAG] use MO_FrameIndex to represent FrameIndex rather than MO_Register [ARMISelDAGToDAG] use MO_FrameIndex to represent FrameIndex rather than MO_Register for INLINEASM Oct 20, 2023
@nickdesaulniers nickdesaulniers marked this pull request as draft October 20, 2023 23:04
@nickdesaulniers
Copy link
Member Author

I'll be reusing pieces of this change in a follow up PR, but for now, I won't change over arm in general to do this.

@efriedma-quic
Copy link
Collaborator

Is that possible to know (the current stack depth) during SelectionDAGISel? Or should that be handled elsewhere later?

You can sort of tell some things; for example, I think we track whether the function has dynamic allocation. But the stack layout isn't finalized until FrameLowering.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants