Skip to content

Commit

Permalink
[RISCV] Use NoReg in place of IMPLICIT_DEF for undefined passthru ope…
Browse files Browse the repository at this point in the history
…rands

In a recent series of refactorings (described here: https://discourse.llvm.org/t/riscv-transition-in-vector-pseudo-structure-policy-variants/71295), I greatly increased the number of IMPLICIT_DEF operands to our vector instructions. This has turned out to have an unexpected negative impact because MachineCSE does not CSE IMPLICIT_DEFs, and thus does not CSE any instruction with an IMPLICIT_DEF operand. SelectionDAG *does* CSE the same case, but that only covers the same block case, not the cross block case. This lead to the performance regression reported in #64282.

This change is a slightly ugly hack to side step the issue. Instead of fixing the root cause (lack of CSE for IMPLICIT_DEF) or undoing the operand changes, we leave the extra operand in place, and use NoReg in place of IMPLICIT_DEF. I then convert back to IMPLICIT_DEF just before register allocation so that ProcessImplicitDefs and TwoAddressInstructions can do the normal transforms to Undef tied registers.

We may end up backporting this into the 17.x release branch.  Given how late in the release cycle this is landing, that's much less likely now, but still a possibility.

Differential Revision: https://reviews.llvm.org/D156909
  • Loading branch information
preames committed Aug 14, 2023
1 parent d9c2c6f commit a63bd7e
Show file tree
Hide file tree
Showing 123 changed files with 5,790 additions and 5,573 deletions.
46 changes: 46 additions & 0 deletions llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -149,6 +149,14 @@ void RISCVDAGToDAGISel::PostprocessISelDAG() {

MadeChange |= doPeepholeMergeVVMFold();

// After we're done with everything else, convert IMPLICIT_DEF
// passthru operands to NoRegister. This is required to workaround
// an optimization deficiency in MachineCSE. This really should
// be merged back into each of the patterns (i.e. there's no good
// reason not to go directly to NoReg), but is being done this way
// to allow easy backporting.
MadeChange |= doPeepholeNoRegPassThru();

if (MadeChange)
CurDAG->RemoveDeadNodes();
}
Expand Down Expand Up @@ -3593,6 +3601,44 @@ bool RISCVDAGToDAGISel::doPeepholeMergeVVMFold() {
return MadeChange;
}

/// If our passthru is an implicit_def, use noreg instead. This side
/// steps issues with MachineCSE not being able to CSE expressions with
/// IMPLICIT_DEF operands while preserving the semantic intent. See
/// pr64282 for context. Note that this transform is the last one
/// performed at ISEL DAG to DAG.
bool RISCVDAGToDAGISel::doPeepholeNoRegPassThru() {
bool MadeChange = false;
SelectionDAG::allnodes_iterator Position = CurDAG->allnodes_end();

while (Position != CurDAG->allnodes_begin()) {
SDNode *N = &*--Position;
if (N->use_empty() || !N->isMachineOpcode())
continue;

const unsigned Opc = N->getMachineOpcode();
if (!RISCVVPseudosTable::getPseudoInfo(Opc) ||
!RISCVII::isFirstDefTiedToFirstUse(TII->get(Opc)) ||
!isImplicitDef(N->getOperand(0)))
continue;

SmallVector<SDValue> Ops;
Ops.push_back(CurDAG->getRegister(RISCV::NoRegister, N->getValueType(0)));
for (unsigned I = 1, E = N->getNumOperands(); I != E; I++) {
SDValue Op = N->getOperand(I);
Ops.push_back(Op);
}

MachineSDNode *Result =
CurDAG->getMachineNode(Opc, SDLoc(N), N->getVTList(), Ops);
Result->setFlags(N->getFlags());
CurDAG->setNodeMemRefs(Result, cast<MachineSDNode>(N)->memoperands());
ReplaceUses(N, Result);
MadeChange = true;
}
return MadeChange;
}


// This pass converts a legalized DAG into a RISCV-specific DAG, ready
// for instruction scheduling.
FunctionPass *llvm::createRISCVISelDag(RISCVTargetMachine &TM,
Expand Down
1 change: 1 addition & 0 deletions llvm/lib/Target/RISCV/RISCVISelDAGToDAG.h
Original file line number Diff line number Diff line change
Expand Up @@ -187,6 +187,7 @@ class RISCVDAGToDAGISel : public SelectionDAGISel {
bool doPeepholeSExtW(SDNode *Node);
bool doPeepholeMaskedRVV(MachineSDNode *Node);
bool doPeepholeMergeVVMFold();
bool doPeepholeNoRegPassThru();
bool performVMergeToVMv(SDNode *N);
bool performCombineVMergeAndVOps(SDNode *N);
};
Expand Down
8 changes: 6 additions & 2 deletions llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -160,9 +160,13 @@ static bool hasUndefinedMergeOp(const MachineInstr &MI,
// lanes are undefined.
return true;

// If the tied operand is an IMPLICIT_DEF (or a REG_SEQUENCE whose operands
// are solely IMPLICIT_DEFS), the pass through lanes are undefined.
// If the tied operand is NoReg, an IMPLICIT_DEF, or a REG_SEQEUENCE whose
// operands are solely IMPLICIT_DEFS, then the pass through lanes are
// undefined.
const MachineOperand &UseMO = MI.getOperand(UseOpIdx);
if (UseMO.getReg() == RISCV::NoRegister)
return true;

if (MachineInstr *UseMI = MRI.getVRegDef(UseMO.getReg())) {
if (UseMI->isImplicitDef())
return true;
Expand Down
22 changes: 11 additions & 11 deletions llvm/lib/Target/RISCV/RISCVInstrInfoVPseudos.td
Original file line number Diff line number Diff line change
Expand Up @@ -52,16 +52,16 @@
///
/// Currently, the policy is represented via the following instrinsic families:
/// * _MASK - Can represent all three policy states for both tail and mask. If
/// passthrough is IMPLICIT_DEF, then represents "undefined". Otherwise,
/// policy operand and tablegen flags drive the interpretation. (If policy
/// operand is not present - there are a couple, thought we're rapidly
/// removing them - a non-undefined policy defaults to "tail agnostic", and
/// "mask undisturbed". Since this is the only variant with a mask, all
/// other variants are "mask undefined".
/// passthrough is IMPLICIT_DEF (or NoReg), then represents "undefined".
/// Otherwise, policy operand and tablegen flags drive the interpretation.
/// (If policy operand is not present - there are a couple, though we're
/// rapidly removing them - a non-undefined policy defaults to "tail
/// agnostic", and "mask undisturbed". Since this is the only variant with
/// a mask, all other variants are "mask undefined".
/// * Unsuffixed w/ both passthrough and policy operand. Can represent all
/// three policy states. If passthrough is IMPLICIT_DEF, then represents
/// "undefined". Otherwise, policy operand and tablegen flags drive the
/// interpretation.
/// three policy states. If passthrough is IMPLICIT_DEF (or NoReg), then
/// represents "undefined". Otherwise, policy operand and tablegen flags
/// drive the interpretation.
/// * Unsuffixed w/o passthrough or policy operand -- Does not have a
/// passthrough operand, and thus represents the "undefined" state. Note
/// that terminology in code frequently refers to these as "TA" which is
Expand All @@ -70,8 +70,8 @@
/// * _TU w/o policy operand -- Has a passthrough operand, and always
/// represents the tail undisturbed state.
/// * _TU w/policy operand - Can represent all three policy states. If
/// passthrough is IMPLICIT_DEF, then represents "undefined". Otherwise,
/// policy operand and tablegen flags drive the interpretation.
/// passthrough is IMPLICIT_DEF (or NoReg), then represents "undefined".
/// Otherwise, policy operand and tablegen flags drive the interpretation.
///
//===----------------------------------------------------------------------===//

Expand Down
29 changes: 28 additions & 1 deletion llvm/lib/Target/RISCV/RISCVRVVInitUndef.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,8 @@
// This file implements a function pass that initializes undef vector value to
// temporary pseudo instruction and remove it in expandpseudo pass to prevent
// register allocation resulting in a constraint violated result for vector
// instruction.
// instruction. It also rewrites the NoReg tied operand back to an
// IMPLICIT_DEF.
//
// RISC-V vector instruction has register overlapping constraint for certain
// instructions, and will cause illegal instruction trap if violated, we use
Expand All @@ -30,6 +31,12 @@
//
// See also: https://github.com/llvm/llvm-project/issues/50157
//
// Additionally, this pass rewrites tied operands of vector instructions
// from NoReg to IMPLICIT_DEF. (Not that this is a non-overlapping set of
// operands to the above.) We use NoReg to side step a MachineCSE
// optimization quality problem but need to convert back before
// TwoAddressInstruction. See pr64282 for context.
//
//===----------------------------------------------------------------------===//

#include "RISCV.h"
Expand Down Expand Up @@ -244,6 +251,26 @@ bool RISCVInitUndef::processBasicBlock(MachineFunction &MF,
bool Changed = false;
for (MachineBasicBlock::iterator I = MBB.begin(); I != MBB.end(); ++I) {
MachineInstr &MI = *I;

// If we used NoReg to represent the passthru, switch this back to being
// an IMPLICIT_DEF before TwoAddressInstructions.
unsigned UseOpIdx;
if (MI.getNumDefs() != 0 && MI.isRegTiedToUseOperand(0, &UseOpIdx)) {
MachineOperand &UseMO = MI.getOperand(UseOpIdx);
if (UseMO.getReg() == RISCV::NoRegister) {
const TargetRegisterClass *RC =
TII->getRegClass(MI.getDesc(), UseOpIdx, TRI, MF);
Register NewDest = MRI->createVirtualRegister(RC);
// We don't have a way to update dead lanes, so keep track of the
// new register so that we avoid querying it later.
NewRegs.insert(NewDest);
BuildMI(MBB, I, I->getDebugLoc(),
TII->get(TargetOpcode::IMPLICIT_DEF), NewDest);
UseMO.setReg(NewDest);
Changed = true;
}
}

if (ST->enableSubRegLiveness() && isEarlyClobberMI(MI))
Changed |= handleSubReg(MF, MI, DLD);
if (MI.isImplicitDef()) {
Expand Down
10 changes: 8 additions & 2 deletions llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -273,6 +273,7 @@ class RISCVPassConfig : public TargetPassConfig {
void addPreRegAlloc() override;
void addPostRegAlloc() override;
void addOptimizedRegAlloc() override;
void addFastRegAlloc() override;
};
} // namespace

Expand Down Expand Up @@ -392,12 +393,17 @@ void RISCVPassConfig::addPreRegAlloc() {
}

void RISCVPassConfig::addOptimizedRegAlloc() {
if (getOptimizeRegAlloc())
insertPass(&DetectDeadLanesID, &RISCVInitUndefID);
insertPass(&DetectDeadLanesID, &RISCVInitUndefID);

TargetPassConfig::addOptimizedRegAlloc();
}

void RISCVPassConfig::addFastRegAlloc() {
addPass(createRISCVInitUndefPass());
TargetPassConfig::addFastRegAlloc();
}


void RISCVPassConfig::addPostRegAlloc() {
if (TM->getOptLevel() != CodeGenOpt::None && EnableRedundantCopyElimination)
addPass(createRISCVRedundantCopyEliminationPass());
Expand Down
1 change: 1 addition & 0 deletions llvm/test/CodeGen/RISCV/O0-pipeline.ll
Original file line number Diff line number Diff line change
Expand Up @@ -42,6 +42,7 @@
; CHECK-NEXT: RISC-V Pre-RA pseudo instruction expansion pass
; CHECK-NEXT: RISC-V Insert VSETVLI pass
; CHECK-NEXT: RISC-V Insert Read/Write CSR Pass
; CHECK-NEXT: RISC-V init undef pass
; CHECK-NEXT: Eliminate PHI nodes for register allocation
; CHECK-NEXT: Two-Address instruction pass
; CHECK-NEXT: Fast Register Allocator
Expand Down
6 changes: 3 additions & 3 deletions llvm/test/CodeGen/RISCV/calling-conv-vector-on-stack.ll
Original file line number Diff line number Diff line change
Expand Up @@ -17,11 +17,11 @@ define void @bar() nounwind {
; CHECK-NEXT: andi sp, sp, -64
; CHECK-NEXT: mv s1, sp
; CHECK-NEXT: addi sp, sp, -16
; CHECK-NEXT: addi a0, s1, 64
; CHECK-NEXT: sd a0, 0(sp)
; CHECK-NEXT: vsetvli a1, zero, e32, m8, ta, ma
; CHECK-NEXT: vsetvli a0, zero, e32, m8, ta, ma
; CHECK-NEXT: vmv.v.i v8, 0
; CHECK-NEXT: addi a0, s1, 64
; CHECK-NEXT: vs8r.v v8, (a0)
; CHECK-NEXT: sd a0, 0(sp)
; CHECK-NEXT: li a0, 0
; CHECK-NEXT: li a1, 0
; CHECK-NEXT: li a2, 0
Expand Down
24 changes: 12 additions & 12 deletions llvm/test/CodeGen/RISCV/rvv/active_lane_mask.ll
Original file line number Diff line number Diff line change
Expand Up @@ -103,15 +103,15 @@ define <8 x i1> @fv8(ptr %p, i64 %index, i64 %tc) {
define <32 x i1> @fv32(ptr %p, i64 %index, i64 %tc) {
; CHECK-LABEL: fv32:
; CHECK: # %bb.0:
; CHECK-NEXT: vsetivli zero, 16, e64, m8, ta, ma
; CHECK-NEXT: lui a0, %hi(.LCPI8_0)
; CHECK-NEXT: addi a0, a0, %lo(.LCPI8_0)
; CHECK-NEXT: vsetivli zero, 16, e64, m8, ta, ma
; CHECK-NEXT: vle64.v v8, (a0)
; CHECK-NEXT: vid.v v16
; CHECK-NEXT: vsaddu.vx v16, v16, a1
; CHECK-NEXT: vmsltu.vx v0, v16, a2
; CHECK-NEXT: vsaddu.vx v8, v8, a1
; CHECK-NEXT: vmsltu.vx v16, v8, a2
; CHECK-NEXT: vid.v v8
; CHECK-NEXT: vsaddu.vx v8, v8, a1
; CHECK-NEXT: vmsltu.vx v0, v8, a2
; CHECK-NEXT: vsetivli zero, 4, e8, mf4, ta, ma
; CHECK-NEXT: vslideup.vi v0, v16, 2
; CHECK-NEXT: ret
Expand All @@ -122,15 +122,15 @@ define <32 x i1> @fv32(ptr %p, i64 %index, i64 %tc) {
define <64 x i1> @fv64(ptr %p, i64 %index, i64 %tc) {
; CHECK-LABEL: fv64:
; CHECK: # %bb.0:
; CHECK-NEXT: vsetivli zero, 16, e64, m8, ta, ma
; CHECK-NEXT: lui a0, %hi(.LCPI9_0)
; CHECK-NEXT: addi a0, a0, %lo(.LCPI9_0)
; CHECK-NEXT: vsetivli zero, 16, e64, m8, ta, ma
; CHECK-NEXT: vle64.v v8, (a0)
; CHECK-NEXT: vid.v v16
; CHECK-NEXT: vsaddu.vx v16, v16, a1
; CHECK-NEXT: vmsltu.vx v0, v16, a2
; CHECK-NEXT: vsaddu.vx v8, v8, a1
; CHECK-NEXT: vmsltu.vx v16, v8, a2
; CHECK-NEXT: vid.v v8
; CHECK-NEXT: vsaddu.vx v8, v8, a1
; CHECK-NEXT: vmsltu.vx v0, v8, a2
; CHECK-NEXT: vsetivli zero, 4, e8, mf2, tu, ma
; CHECK-NEXT: vslideup.vi v0, v16, 2
; CHECK-NEXT: lui a0, %hi(.LCPI9_1)
Expand All @@ -157,15 +157,15 @@ define <64 x i1> @fv64(ptr %p, i64 %index, i64 %tc) {
define <128 x i1> @fv128(ptr %p, i64 %index, i64 %tc) {
; CHECK-LABEL: fv128:
; CHECK: # %bb.0:
; CHECK-NEXT: vsetivli zero, 16, e64, m8, ta, ma
; CHECK-NEXT: lui a0, %hi(.LCPI10_0)
; CHECK-NEXT: addi a0, a0, %lo(.LCPI10_0)
; CHECK-NEXT: vsetivli zero, 16, e64, m8, ta, ma
; CHECK-NEXT: vle64.v v8, (a0)
; CHECK-NEXT: vid.v v16
; CHECK-NEXT: vsaddu.vx v16, v16, a1
; CHECK-NEXT: vmsltu.vx v0, v16, a2
; CHECK-NEXT: vsaddu.vx v8, v8, a1
; CHECK-NEXT: vmsltu.vx v16, v8, a2
; CHECK-NEXT: vid.v v8
; CHECK-NEXT: vsaddu.vx v8, v8, a1
; CHECK-NEXT: vmsltu.vx v0, v8, a2
; CHECK-NEXT: vsetivli zero, 4, e8, m1, tu, ma
; CHECK-NEXT: vslideup.vi v0, v16, 2
; CHECK-NEXT: lui a0, %hi(.LCPI10_1)
Expand Down
Loading

0 comments on commit a63bd7e

Please sign in to comment.