This is something I noticed while implementing the global stack protector feature for SystemZ.
During frame index elimination, the SystemZ backend may find that an instruction with a 12-bit displacement reads from a stack slot that is out of reach. In that case it may insert a `lay` instruction to materialize the address into a register, or an `llilh` to prepare an index register, and then make the read relative to that. This introduces new virtual registers post-RA, though: one in the operand slot that used to contain the frame index, and one as the def of the `lay`/`llilh` inserted above it.
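As a rough illustration of that rewrite, here is a minimal sketch (not the actual SystemZ code; `ScratchReg`, `Offset`, and `FIOperandNum` are placeholder names, and the register class is an assumption):

```cpp
// Sketch: during frame index elimination, materialize an out-of-range
// frame address into a fresh virtual register, even though we are post-RA.
Register ScratchReg =
    MF.getRegInfo().createVirtualRegister(&SystemZ::ADDR64BitRegClass);
// ScratchReg = LAY $r15d, Offset, $noreg
BuildMI(MBB, MI, DL, TII->get(SystemZ::LAY), ScratchReg)
    .addReg(SystemZ::R15D)
    .addImm(Offset)
    .addReg(0);
// Rewrite the former frame-index operand to read via the scratch register.
MI.getOperand(FIOperandNum)
    .ChangeToRegister(ScratchReg, /*isDef=*/false, /*isImp=*/false,
                      /*isKill=*/true);
```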
`scavengeFrameVirtualRegsInBlock` (RegisterScavenging.cpp:402) is then called to scavenge physical registers for the new VRegs thus introduced. If the original instruction with the 12-bit displacement, which now has the new VReg as an operand, also defines a physical register `rX` that is marked early-clobber, then the VReg must not be assigned to `rX`, in order to avoid the overlap between input and output that early-clobber forbids.
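Expressed as code, the constraint that ends up being violated looks roughly like this (a minimal sketch; `overlapsEarlyClobberDef` is a hypothetical helper, not an existing LLVM function):

```cpp
// Sketch: a physical register chosen for the scavenged VReg must not
// overlap any early-clobber def on the instruction that uses the VReg.
static bool overlapsEarlyClobberDef(const MachineInstr &UseMI,
                                    MCPhysReg Candidate,
                                    const TargetRegisterInfo &TRI) {
  for (const MachineOperand &MO : UseMI.operands())
    if (MO.isReg() && MO.isDef() && MO.isEarlyClobber() &&
        MO.getReg().isPhysical() && TRI.regsOverlap(MO.getReg(), Candidate))
      return true; // Candidate would alias an early-clobber output.
  return false;
}
```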
However, exactly that happens. As stated above, I found this while expanding new pseudo-instructions post-RA for the purpose of implementing the global stack protector, but the same effect can be demonstrated, e.g., with inline asm:
```c
#include <stdint.h>

__attribute__((noinline)) uint64_t repro(uint64_t in) {
  // Big char buffer to trigger lay insertion
  volatile char big[16384];
  uint64_t out;
  // Use a memory operand accessing the big buffer
  asm volatile(
      " lg %0, %1\n\t"
      : "=&r"(out)     // <-- early-clobber output GPR
      : "m"(big[5000]) // <-- memory operand -> far FI
      : // Broad clobber to force the problematic situation.
        // Correct response would be to spill, but this does not happen.
        "r2", "r3", "r4", "r5", "r6", "r7", "r8",
        "r9", "r10", "r11", "r12", "r13", "r14",
        "cc", "memory");
  return out + in;
}
```

This will, during Prologue-Epilogue Insertion, insert a LAY instruction before the inline asm block that targets exactly the physical register that is also marked early-clobber, i.e. the MIR goes from
```
INLINEASM &" lg $0, $1\0A\09", [...], def early-clobber renamable $r1d, 262174 /* mem:m */, %stack.0, [...]
```
to
```
$r1d = LAY $r15d, 4096, $noreg
INLINEASM &" lg $0, $1\0A\09", [...] def early-clobber renamable $r1d, 262174 /* mem:m */, killed $r1d, [...]
```
Looking at RegisterScavenging.cpp, the core of the search for a register to assign appears to be the call to `findSurvivorBackwards` in `RegScavenger::scavengeRegisterBackwards`:
```cpp
std::pair<MCPhysReg, MachineBasicBlock::iterator> P = findSurvivorBackwards(
    *MRI, std::prev(MBBI), To, LiveUnits, AllocationOrder, RestoreAfter);
```

Here `To` is the instruction defining the VReg (the LAY above), while `MBBI` stands on the instruction using the VReg (the INLINEASM above). Note that the search starts at `std::prev(MBBI)`, i.e. it does not include the using instruction, and thus cannot take the early-clobber constraint associated with it into account. The `LiveRegUnits` instance `LiveUnits` does not contain this information either.
I have found that both of the following solve the problem (a sketch of the second option follows below):

- extending the search to include the using instruction (i.e. using `MBBI` instead of `std::prev(MBBI)`), and
- temporarily adding any early-clobber def to `LiveUnits`.

The former breaks an x86 test case in ways I am not sure are actually problematic, and also needs additional safeguards (e.g., to account for cases like `MBBI == MBB.end()`). I am not sure what the best way to proceed here is.
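For concreteness, here is a minimal sketch of the second option, assuming it is inserted in `RegScavenger::scavengeRegisterBackwards` just before the call to `findSurvivorBackwards` (the exact placement, and restoring `LiveUnits` afterwards, would still need care):

```cpp
// Sketch: exclude early-clobber defs of the using instruction from the
// backward search by temporarily marking them live. LiveUnits would have
// to be restored (or a scratch copy used) once the search is done.
if (MBBI != MBB.end())
  for (const MachineOperand &MO : MBBI->operands())
    if (MO.isReg() && MO.isDef() && MO.isEarlyClobber() &&
        MO.getReg().isPhysical())
      LiveUnits.addReg(MO.getReg());
```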
A repro that demonstrates the inline-asm case outlined above is attached to this issue:
repro.tar.gz