[ARMISelDAGToDAG] use MO_FrameIndex to represent FrameIndex rather than MO_Register for INLINEASM #69654
Conversation
@llvm/pr-subscribers-backend-arm

Author: Nick Desaulniers (nickdesaulniers)

Changes

Many other targets use stack slot indexes to represent memory locations. This patch normalizes ARM to also use this kind of MachineOperand so that register selection can find which stack slot is needed to fold memory operands when "rm" is used.

A nice side effect of this change is that we're less likely to exhaust registers for inline asm with many clobbers.

Link: #20571 (comment)

Full diff: https://github.com/llvm/llvm-project/pull/69654.diff

6 Files Affected:
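As an illustration (hand-written C, not part of the patch; the function name is made up), the new test inlineasm-m-constraint.ll in the diff below exercises the same situation at the IR level: an "m"-constrained operand survives even though nearly every general-purpose register is clobbered, because the operand can now be printed from its frame index as [sp, #offset] rather than needing a register to hold the address.

void many_clobbers(void) {
    int x = 42;
    /* with a register-based memory operand there would be no GPR left to
       hold the address; a frame-index operand can still be emitted as
       "[sp, #offset]" */
    __asm__ volatile("@ %0" :: "m"(x)
                     : "r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7",
                       "r8", "r9", "r10", "r12", "lr", "memory");
}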
diff --git a/llvm/lib/Target/ARM/ARMAsmPrinter.cpp b/llvm/lib/Target/ARM/ARMAsmPrinter.cpp
index 15cda9b9432d5f2..fbe6fb7521a1ff4 100644
--- a/llvm/lib/Target/ARM/ARMAsmPrinter.cpp
+++ b/llvm/lib/Target/ARM/ARMAsmPrinter.cpp
@@ -478,7 +478,22 @@ bool ARMAsmPrinter::PrintAsmMemoryOperand(const MachineInstr *MI,
const MachineOperand &MO = MI->getOperand(OpNum);
assert(MO.isReg() && "unexpected inline asm memory operand");
- O << "[" << ARMInstPrinter::getRegisterName(MO.getReg()) << "]";
+ O << "[" << ARMInstPrinter::getRegisterName(MO.getReg());
+ if (MI->isInlineAsm()) {
+ assert(OpNum > 0 && "unexpected offset");
+ const MachineOperand &MD = MI->getOperand(OpNum - 1);
+ assert(MD.isImm() && "unexpected meta operand");
+ const InlineAsm::Flag F(MD.getImm());
+ const unsigned NumOps = F.getNumOperandRegisters();
+ if (NumOps == 2) {
+ assert(OpNum + 1 < MI->getNumOperands() && "missing offset operand");
+ const MachineOperand &OffsetOp = MI->getOperand(OpNum + 1);
+ assert(OffsetOp.isImm() && "unexpected inline asm memory operand");
+ if (uint64_t Offset = OffsetOp.getImm())
+ O << ", #" << Offset;
+ }
+ }
+ O << "]";
return false;
}
diff --git a/llvm/lib/Target/ARM/ARMBaseInstrInfo.cpp b/llvm/lib/Target/ARM/ARMBaseInstrInfo.cpp
index 1ffdde0360cf623..d815b2288f46242 100644
--- a/llvm/lib/Target/ARM/ARMBaseInstrInfo.cpp
+++ b/llvm/lib/Target/ARM/ARMBaseInstrInfo.cpp
@@ -2649,9 +2649,9 @@ bool llvm::rewriteARMFrameIndex(MachineInstr &MI, unsigned FrameRegIdx,
unsigned AddrMode = (Desc.TSFlags & ARMII::AddrModeMask);
bool isSub = false;
- // Memory operands in inline assembly always use AddrMode2.
+ // Memory operands in inline assembly always use AddrMode_i12.
if (Opcode == ARM::INLINEASM || Opcode == ARM::INLINEASM_BR)
- AddrMode = ARMII::AddrMode2;
+ AddrMode = ARMII::AddrMode_i12;
if (Opcode == ARM::ADDri) {
Offset += MI.getOperand(FrameRegIdx+1).getImm();
diff --git a/llvm/lib/Target/ARM/ARMFrameLowering.cpp b/llvm/lib/Target/ARM/ARMFrameLowering.cpp
index a3a71a8ec09a450..3d919179e5c64e8 100644
--- a/llvm/lib/Target/ARM/ARMFrameLowering.cpp
+++ b/llvm/lib/Target/ARM/ARMFrameLowering.cpp
@@ -2097,7 +2097,7 @@ static unsigned estimateRSStackSizeLimit(MachineFunction &MF,
unsigned Limit = (1 << 12) - 1;
for (auto &MBB : MF) {
for (auto &MI : MBB) {
- if (MI.isDebugInstr())
+ if (MI.isDebugInstr() || MI.isInlineAsm())
continue;
for (unsigned i = 0, e = MI.getNumOperands(); i != e; ++i) {
if (!MI.getOperand(i).isFI())
diff --git a/llvm/lib/Target/ARM/ARMISelDAGToDAG.cpp b/llvm/lib/Target/ARM/ARMISelDAGToDAG.cpp
index 984d8d3e0b08c4b..9c147cdd91734c1 100644
--- a/llvm/lib/Target/ARM/ARMISelDAGToDAG.cpp
+++ b/llvm/lib/Target/ARM/ARMISelDAGToDAG.cpp
@@ -5880,13 +5880,20 @@ bool ARMDAGToDAGISel::SelectInlineAsmMemoryOperand(
case InlineAsm::ConstraintCode::Us:
case InlineAsm::ConstraintCode::Ut:
case InlineAsm::ConstraintCode::Uv:
- case InlineAsm::ConstraintCode::Uy:
+ case InlineAsm::ConstraintCode::Uy: {
+ SDValue Base, OffImm;
+ if (Op.getOpcode() == ISD::FrameIndex && SelectAddrModeImm12(Op, Base, OffImm)) {
+ OutOps.push_back(Base);
+ OutOps.push_back(OffImm);
+ return false;
+ }
// Require the address to be in a register. That is safe for all ARM
// variants and it is hard to do anything much smarter without knowing
// how the operand is used.
OutOps.push_back(Op);
return false;
}
+ }
return true;
}
diff --git a/llvm/test/CodeGen/ARM/inlineasm-m-constraint.ll b/llvm/test/CodeGen/ARM/inlineasm-m-constraint.ll
new file mode 100644
index 000000000000000..1e680fb4c03ee29
--- /dev/null
+++ b/llvm/test/CodeGen/ARM/inlineasm-m-constraint.ll
@@ -0,0 +1,42 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 3
+; RUN: llc -mtriple=arm-linux-gnueabihf -o - %s 2>&1 | FileCheck %s
+
+; The intent of this test is threefold:
+; 1. The first asm demonstrates a change in behavior of using sp.
+; 2. The second asm is a case that previously would cause register exhaustion
+; due to not using sp.
+; 3. The third asm demonstrates ARMAsmPrinter::PrintAsmMemoryOperand not
+; needing to print an offset when the offset is zero, and that we get the other
+; alloca's offset correct.
+define void @fail() {
+; CHECK-LABEL: fail:
+; CHECK: @ %bb.0:
+; CHECK-NEXT: .save {r4, r5, r6, r7, r8, r9, r10, r11, lr}
+; CHECK-NEXT: push {r4, r5, r6, r7, r8, r9, r10, r11, lr}
+; CHECK-NEXT: .pad #8
+; CHECK-NEXT: sub sp, sp, #8
+; CHECK-NEXT: mov r0, #42
+; CHECK-NEXT: str r0, [sp, #4]
+; CHECK-NEXT: @APP
+; CHECK-NEXT: @ [sp, #4]
+; CHECK-NEXT: @NO_APP
+; CHECK-NEXT: @APP
+; CHECK-NEXT: @ [sp, #4]
+; CHECK-NEXT: @NO_APP
+; CHECK-NEXT: mov r0, #99
+; CHECK-NEXT: str r0, [sp]
+; CHECK-NEXT: @APP
+; CHECK-NEXT: @ [sp]
+; CHECK-NEXT: @NO_APP
+; CHECK-NEXT: add sp, sp, #8
+; CHECK-NEXT: pop {r4, r5, r6, r7, r8, r9, r10, r11, lr}
+; CHECK-NEXT: mov pc, lr
+  %x = alloca i32, align 4
+  %y = alloca i32, align 4
+  store i32 42, ptr %x
+  call void asm sideeffect "# $0", "*m,~{r1},~{r2},~{r3},~{r12},~{lr},~{r4},~{r5},~{r6},~{r7},~{r8},~{r9},~{r10},~{r11}"(ptr elementtype(i32) %x)
+  call void asm sideeffect "# $0", "*m,~{r0},~{r1},~{r2},~{r3},~{r12},~{lr},~{r4},~{r5},~{r6},~{r7},~{r8},~{r9},~{r10},~{r11}"(ptr elementtype(i32) %x)
+  store i32 99, ptr %y
+  call void asm sideeffect "# $0", "*m,~{r1},~{r2},~{r3},~{r12},~{lr},~{r4},~{r5},~{r6},~{r7},~{r8},~{r9},~{r10},~{r11}"(ptr elementtype(i32) %y)
+  ret void
+}
diff --git a/llvm/test/CodeGen/ARM/inlineasm3.ll b/llvm/test/CodeGen/ARM/inlineasm3.ll
index 5589885bd1745e3..fd9bbe64a85a82f 100644
--- a/llvm/test/CodeGen/ARM/inlineasm3.ll
+++ b/llvm/test/CodeGen/ARM/inlineasm3.ll
@@ -105,7 +105,7 @@ entry:
define void @t10(ptr %f, i32 %g) nounwind {
entry:
; CHECK: t10
-; CHECK: str r1, [r0]
+; CHECK: str r1, [sp]
%f.addr = alloca ptr, align 4
store ptr %f, ptr %f.addr, align 4
call void asm "str $1, $0", "=*Q,r"(ptr elementtype(ptr) %f.addr, i32 %g) nounwind
✅ With the latest revision this PR passed the C/C++ code formatter.
I suspect this will trigger assembler errors in some cases. ARM has a variety of addressing modes, and most of them aren't compatible with SelectAddrModeImm12. Particularly for the constraints that are not "m": "Uv" is explicitly documented to be an operand for a "vldr", which only has a 10-bit offset. And it looks like the handling for the case where the stack frame is actually larger than 4096 bytes is missing?
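To make the frame-size concern concrete (an illustrative C sketch, not taken from the review; names and sizes are made up): once the frame grows past the 4095-byte reach of an imm12 offset, some slots can no longer be addressed as [sp, #offset] and selection would have to fall back to materializing the address in a register.

void big_frame(void) {
    char pad[4200];   /* frame larger than the 4095-byte imm12 range */
    int x = 42;
    /* depending on layout, x's slot can sit more than 4095 bytes above sp,
       which the "[sp, #offset]" form selected via AddrMode_i12 cannot encode */
    __asm__ volatile("@ %0 %1" :: "m"(x), "m"(pad[4096]));
}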
Yeah, I was waffling about splitting the cases in SelectInlineAsmMemoryOperand.

Is that possible to know (the current stack depth) during SelectionDAGISel? Or should that be handled elsewhere later? GCC just errors once the stack gets too large. (TODO(Nick): reproduce example; was this only under register pressure? hmm...)

Also, this PR doesn't handle cases like:

void v (int *);
void foo (int *x) {
    asm ("# %0 %1"::"m"(x[1023]), "m"(x[1024]));
    v(x);
}

I should fix that, too.
I'll be reusing pieces of this change in a follow-up PR, but for now I won't change over ARM in general to do this.
You can sort of tell some things; for example, I think we track whether the function has dynamic allocation. But the stack layout isn't finalized until FrameLowering.
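For example (an illustrative C sketch, not from the thread): a function that uses a variable-length array has a dynamically sized frame, so the final sp-relative offset of a fixed slot isn't known at instruction selection time.

void uses_vla(int n) {
    int buf[n];   /* dynamic allocation: frame size depends on n */
    int x = 42;
    __asm__ volatile("@ %0 %1" :: "m"(x), "m"(buf[0]));
}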
Many other targets use stack slot indexes to represent memory locations.
This patch normalizes ARM to also use this kind of MachineOperand so
that register selection can find which stack slot is necessary to fold
memory operands when "rm" is used.
A nice side effect of this change is that we're less likely to exhaust
registers for inline asm with many clobbers.
Link: #20571 (comment)
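A rough sketch of the "rm" case the commit message refers to (hand-written C; the function name is illustrative and clang's current constraint handling may differ): with the operand carried as a frame index, the memory alternative of "rm" can be satisfied directly from the value's stack slot instead of first moving its address into a register.

int fold_rm(int a, int b) {
    int sum = a + b;
    /* "rm" lets the compiler choose a register or a memory operand; a
       frame-index MachineOperand makes the memory form "[sp, #offset]"
       usable when registers are scarce */
    __asm__ volatile("@ %0" :: "rm"(sum));
    return sum;
}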