AMDGPU/GlobalISel: lane masks merging #73337
Conversation
@llvm/pr-subscribers-backend-amdgpu

Author: Petar Avramovic (petar-avramovic)

Changes: Basic implementation of lane mask merging for GlobalISel.

Patch is 279.74 KiB, truncated to 20.00 KiB below; full version: https://github.com/llvm/llvm-project/pull/73337.diff

21 Files Affected:
diff --git a/llvm/include/llvm/CodeGen/MachineSSAUpdater.h b/llvm/include/llvm/CodeGen/MachineSSAUpdater.h
index bbd09d7d151ba07..765cabdb313097b 100644
--- a/llvm/include/llvm/CodeGen/MachineSSAUpdater.h
+++ b/llvm/include/llvm/CodeGen/MachineSSAUpdater.h
@@ -40,8 +40,8 @@ class MachineSSAUpdater {
//typedef DenseMap<MachineBasicBlock*, Register> AvailableValsTy;
void *AV = nullptr;
- /// VRC - Register class of the current virtual register.
- const TargetRegisterClass *VRC = nullptr;
+ /// RegAttrs - current virtual register, new registers copy its attributes.
+ Register RegAttrs;
/// InsertedPHIs - If this is non-null, the MachineSSAUpdater adds all PHI
/// nodes that it creates to the vector.
@@ -62,7 +62,6 @@ class MachineSSAUpdater {
/// Initialize - Reset this object to get ready for a new set of SSA
/// updates.
void Initialize(Register V);
- void Initialize(const TargetRegisterClass *RC);
/// AddAvailableValue - Indicate that a rewritten value is available at the
/// end of the specified block with the specified value.
diff --git a/llvm/lib/CodeGen/MachineSSAUpdater.cpp b/llvm/lib/CodeGen/MachineSSAUpdater.cpp
index 48076663ddf5382..48537057e2031a8 100644
--- a/llvm/lib/CodeGen/MachineSSAUpdater.cpp
+++ b/llvm/lib/CodeGen/MachineSSAUpdater.cpp
@@ -51,17 +51,13 @@ MachineSSAUpdater::~MachineSSAUpdater() {
/// Initialize - Reset this object to get ready for a new set of SSA
/// updates.
-void MachineSSAUpdater::Initialize(const TargetRegisterClass *RC) {
+void MachineSSAUpdater::Initialize(Register V) {
if (!AV)
AV = new AvailableValsTy();
else
getAvailableVals(AV).clear();
- VRC = RC;
-}
-
-void MachineSSAUpdater::Initialize(Register V) {
- Initialize(MRI->getRegClass(V));
+ RegAttrs = V;
}
/// HasValueForBlock - Return true if the MachineSSAUpdater already has a value for
@@ -115,13 +111,12 @@ Register LookForIdenticalPHI(MachineBasicBlock *BB,
/// InsertNewDef - Insert an empty PHI or IMPLICIT_DEF instruction which define
/// a value of the given register class at the start of the specified basic
/// block. It returns the virtual register defined by the instruction.
-static
-MachineInstrBuilder InsertNewDef(unsigned Opcode,
- MachineBasicBlock *BB, MachineBasicBlock::iterator I,
- const TargetRegisterClass *RC,
- MachineRegisterInfo *MRI,
- const TargetInstrInfo *TII) {
- Register NewVR = MRI->createVirtualRegister(RC);
+static MachineInstrBuilder InsertNewDef(unsigned Opcode, MachineBasicBlock *BB,
+ MachineBasicBlock::iterator I,
+ Register RegAttrs,
+ MachineRegisterInfo *MRI,
+ const TargetInstrInfo *TII) {
+ Register NewVR = MRI->cloneVirtualRegister(RegAttrs);
return BuildMI(*BB, I, DebugLoc(), TII->get(Opcode), NewVR);
}
@@ -158,9 +153,9 @@ Register MachineSSAUpdater::GetValueInMiddleOfBlock(MachineBasicBlock *BB,
if (ExistingValueOnly)
return Register();
// Insert an implicit_def to represent an undef value.
- MachineInstr *NewDef = InsertNewDef(TargetOpcode::IMPLICIT_DEF,
- BB, BB->getFirstTerminator(),
- VRC, MRI, TII);
+ MachineInstr *NewDef =
+ InsertNewDef(TargetOpcode::IMPLICIT_DEF, BB, BB->getFirstTerminator(),
+ RegAttrs, MRI, TII);
return NewDef->getOperand(0).getReg();
}
@@ -197,8 +192,8 @@ Register MachineSSAUpdater::GetValueInMiddleOfBlock(MachineBasicBlock *BB,
// Otherwise, we do need a PHI: insert one now.
MachineBasicBlock::iterator Loc = BB->empty() ? BB->end() : BB->begin();
- MachineInstrBuilder InsertedPHI = InsertNewDef(TargetOpcode::PHI, BB,
- Loc, VRC, MRI, TII);
+ MachineInstrBuilder InsertedPHI =
+ InsertNewDef(TargetOpcode::PHI, BB, Loc, RegAttrs, MRI, TII);
// Fill in all the predecessors of the PHI.
for (unsigned i = 0, e = PredValues.size(); i != e; ++i)
@@ -300,10 +295,9 @@ class SSAUpdaterTraits<MachineSSAUpdater> {
static Register GetUndefVal(MachineBasicBlock *BB,
MachineSSAUpdater *Updater) {
// Insert an implicit_def to represent an undef value.
- MachineInstr *NewDef = InsertNewDef(TargetOpcode::IMPLICIT_DEF,
- BB, BB->getFirstNonPHI(),
- Updater->VRC, Updater->MRI,
- Updater->TII);
+ MachineInstr *NewDef =
+ InsertNewDef(TargetOpcode::IMPLICIT_DEF, BB, BB->getFirstNonPHI(),
+ Updater->RegAttrs, Updater->MRI, Updater->TII);
return NewDef->getOperand(0).getReg();
}
@@ -312,9 +306,9 @@ class SSAUpdaterTraits<MachineSSAUpdater> {
static Register CreateEmptyPHI(MachineBasicBlock *BB, unsigned NumPreds,
MachineSSAUpdater *Updater) {
MachineBasicBlock::iterator Loc = BB->empty() ? BB->end() : BB->begin();
- MachineInstr *PHI = InsertNewDef(TargetOpcode::PHI, BB, Loc,
- Updater->VRC, Updater->MRI,
- Updater->TII);
+ MachineInstr *PHI =
+ InsertNewDef(TargetOpcode::PHI, BB, Loc, Updater->RegAttrs,
+ Updater->MRI, Updater->TII);
return PHI->getOperand(0).getReg();
}
diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.h b/llvm/lib/Target/AMDGPU/AMDGPU.h
index 323560a46f31de2..007d64944244a09 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.h
@@ -36,6 +36,7 @@ FunctionPass *createSIAnnotateControlFlowPass();
FunctionPass *createSIFoldOperandsPass();
FunctionPass *createSIPeepholeSDWAPass();
FunctionPass *createSILowerI1CopiesPass();
+FunctionPass *createAMDGPUGlobalISelDivergenceLoweringPass();
FunctionPass *createSIShrinkInstructionsPass();
FunctionPass *createSILoadStoreOptimizerPass();
FunctionPass *createSIWholeQuadModePass();
@@ -162,6 +163,9 @@ extern char &SILowerWWMCopiesID;
void initializeSILowerI1CopiesPass(PassRegistry &);
extern char &SILowerI1CopiesID;
+void initializeAMDGPUGlobalISelDivergenceLoweringPass(PassRegistry &);
+extern char &AMDGPUGlobalISelDivergenceLoweringID;
+
void initializeSILowerSGPRSpillsPass(PassRegistry &);
extern char &SILowerSGPRSpillsID;
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUGlobalISelDivergenceLowering.cpp b/llvm/lib/Target/AMDGPU/AMDGPUGlobalISelDivergenceLowering.cpp
new file mode 100644
index 000000000000000..a955a08b0a3776e
--- /dev/null
+++ b/llvm/lib/Target/AMDGPU/AMDGPUGlobalISelDivergenceLowering.cpp
@@ -0,0 +1,213 @@
+//===-- AMDGPUGlobalISelDivergenceLowering.cpp ----------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+// GlobalISel pass that selects divergent i1 phis as lane mask phis.
+// Lane mask merging uses the same algorithm as SDAG in SILowerI1Copies.
+// Handles all cases of temporal divergence.
+//
+// For divergent non-phi i1 values and uniform i1 uses outside of the cycle,
+// this pass currently depends on LCSSA to insert phis with one incoming value.
+//===----------------------------------------------------------------------===//
+
+#include "AMDGPU.h"
+#include "SILowerI1Copies.h"
+#include "llvm/CodeGen/GlobalISel/MachineIRBuilder.h"
+#include "llvm/CodeGen/MachineFunctionPass.h"
+#include "llvm/CodeGen/MachineUniformityAnalysis.h"
+#include "llvm/InitializePasses.h"
+
+#define DEBUG_TYPE "global-isel-divergence-lowering"
+
+using namespace llvm;
+
+namespace {
+
+class AMDGPUGlobalISelDivergenceLowering : public MachineFunctionPass {
+public:
+ static char ID;
+
+public:
+ AMDGPUGlobalISelDivergenceLowering() : MachineFunctionPass(ID) {
+ initializeAMDGPUGlobalISelDivergenceLoweringPass(
+ *PassRegistry::getPassRegistry());
+ }
+
+ bool runOnMachineFunction(MachineFunction &MF) override;
+
+ StringRef getPassName() const override {
+ return "GlobalISel divergence lowering";
+ }
+
+ void getAnalysisUsage(AnalysisUsage &AU) const override {
+ AU.setPreservesCFG();
+ AU.addRequired<MachineCycleInfoWrapperPass>();
+ AU.addRequired<MachineDominatorTree>();
+ AU.addRequired<MachinePostDominatorTree>();
+ MachineFunctionPass::getAnalysisUsage(AU);
+ }
+};
+
+class DivergenceLoweringHelper : public PhiLoweringHelper {
+public:
+ DivergenceLoweringHelper(MachineFunction *MF, MachineDominatorTree *DT,
+ MachinePostDominatorTree *PDT,
+ MachineUniformityInfo *MUI);
+
+private:
+ MachineUniformityInfo *MUI = nullptr;
+
+public:
+ void markAsLaneMask(Register DstReg) const override;
+ void getCandidatesForLowering(
+ SmallVectorImpl<MachineInstr *> &Vreg1Phis) const override;
+ void collectIncomingValuesFromPhi(
+ const MachineInstr *MI,
+ SmallVectorImpl<Incoming> &Incomings) const override;
+ void replaceDstReg(Register NewReg, Register OldReg,
+ MachineBasicBlock *MBB) override;
+ void buildMergeLaneMasks(MachineBasicBlock &MBB,
+ MachineBasicBlock::iterator I, const DebugLoc &DL,
+ Register DstReg, Register PrevReg,
+ Register CurReg) override;
+ void constrainIncomingRegisterTakenAsIs(Incoming &In) override;
+};
+
+DivergenceLoweringHelper::DivergenceLoweringHelper(
+ MachineFunction *MF, MachineDominatorTree *DT,
+ MachinePostDominatorTree *PDT, MachineUniformityInfo *MUI)
+ : PhiLoweringHelper(MF, DT, PDT), MUI(MUI) {}
+
+// _(s1) -> SReg_32/64(s1)
+void DivergenceLoweringHelper::markAsLaneMask(Register DstReg) const {
+ assert(MRI->getType(DstReg) == LLT::scalar(1));
+
+ // Can't blindly set a register class on phi, users could have reg class
+ // constraints (e.g. sreg_32/64..._xexec classes for control flow intrinsics).
+ if (MRI->getRegClassOrNull(DstReg))
+ return;
+
+ MRI->setRegClass(DstReg, ST->getBoolRC());
+ return;
+}
+
+void DivergenceLoweringHelper::getCandidatesForLowering(
+ SmallVectorImpl<MachineInstr *> &Vreg1Phis) const {
+ LLT S1 = LLT::scalar(1);
+
+ // Add divergent i1 phis to the list
+ for (MachineBasicBlock &MBB : *MF) {
+ for (MachineInstr &MI : MBB.phis()) {
+ Register Dst = MI.getOperand(0).getReg();
+ if (MRI->getType(Dst) == S1 && MUI->isDivergent(Dst))
+ Vreg1Phis.push_back(&MI);
+ }
+ }
+
+ return;
+}
+
+void DivergenceLoweringHelper::collectIncomingValuesFromPhi(
+ const MachineInstr *MI, SmallVectorImpl<Incoming> &Incomings) const {
+ for (unsigned i = 1; i < MI->getNumOperands(); i += 2) {
+ Incomings.emplace_back(MI->getOperand(i).getReg(),
+ MI->getOperand(i + 1).getMBB(), Register());
+ }
+}
+
+void DivergenceLoweringHelper::replaceDstReg(Register NewReg, Register OldReg,
+ MachineBasicBlock *MBB) {
+ BuildMI(*MBB, MBB->getFirstNonPHI(), {}, TII->get(AMDGPU::COPY), OldReg)
+ .addReg(NewReg);
+}
+
+// Get pointers to build instruction just after MI (skips phis if needed)
+static std::pair<MachineBasicBlock *, MachineBasicBlock::iterator>
+getInsertAfterPtrs(MachineInstr *MI) {
+ MachineBasicBlock *InsertMBB = MI->getParent();
+ return std::make_pair(
+ InsertMBB, InsertMBB->SkipPHIsAndLabels(std::next(MI->getIterator())));
+}
+
+void DivergenceLoweringHelper::buildMergeLaneMasks(
+ MachineBasicBlock &MBB, MachineBasicBlock::iterator I, const DebugLoc &DL,
+ Register DstReg, Register PrevReg, Register CurReg) {
+ // TODO: check if inputs are constants or results of a compare.
+
+ Register PrevRegCopy = createLaneMaskReg(MRI, LaneMaskRegAttrs);
+ auto [PrevMBB, AfterPrevReg] = getInsertAfterPtrs(MRI->getVRegDef(PrevReg));
+ BuildMI(*PrevMBB, AfterPrevReg, DL, TII->get(AMDGPU::COPY), PrevRegCopy)
+ .addReg(PrevReg);
+ Register PrevMaskedReg = createLaneMaskReg(MRI, LaneMaskRegAttrs);
+ BuildMI(MBB, I, DL, TII->get(AndN2Op), PrevMaskedReg)
+ .addReg(PrevRegCopy)
+ .addReg(ExecReg);
+
+ Register CurRegCopy = createLaneMaskReg(MRI, LaneMaskRegAttrs);
+ auto [CurMBB, AfterCurReg] = getInsertAfterPtrs(MRI->getVRegDef(CurReg));
+ BuildMI(*CurMBB, AfterCurReg, DL, TII->get(AMDGPU::COPY), CurRegCopy)
+ .addReg(CurReg);
+ Register CurMaskedReg = createLaneMaskReg(MRI, LaneMaskRegAttrs);
+ BuildMI(MBB, I, DL, TII->get(AndOp), CurMaskedReg)
+ .addReg(ExecReg)
+ .addReg(CurRegCopy);
+
+ BuildMI(MBB, I, DL, TII->get(OrOp), DstReg)
+ .addReg(PrevMaskedReg)
+ .addReg(CurMaskedReg);
+
+ return;
+}
+
+// GlobalISel has to constrain an S1 incoming value taken as-is to a lane mask
+// register class. Insert a copy of Incoming.Reg to a new lane mask inside
+// Incoming.Block; Incoming.Reg becomes that new lane mask.
+void DivergenceLoweringHelper::constrainIncomingRegisterTakenAsIs(
+ Incoming &In) {
+ MachineIRBuilder B(*MF);
+ B.setInsertPt(*In.Block, In.Block->getFirstTerminator());
+
+ auto Copy = B.buildCopy(LLT::scalar(1), In.Reg);
+ MRI->setRegClass(Copy.getReg(0), ST->getBoolRC());
+ In.Reg = Copy.getReg(0);
+
+ return;
+}
+
+} // End anonymous namespace.
+
+INITIALIZE_PASS_BEGIN(AMDGPUGlobalISelDivergenceLowering, DEBUG_TYPE,
+ "GlobalISel divergence lowering", false, false)
+INITIALIZE_PASS_DEPENDENCY(MachineCycleInfoWrapperPass)
+INITIALIZE_PASS_DEPENDENCY(MachineDominatorTree)
+INITIALIZE_PASS_DEPENDENCY(MachinePostDominatorTree)
+INITIALIZE_PASS_END(AMDGPUGlobalISelDivergenceLowering, DEBUG_TYPE,
+ "GlobalISel divergence lowering", false, false)
+
+char AMDGPUGlobalISelDivergenceLowering::ID = 0;
+
+char &llvm::AMDGPUGlobalISelDivergenceLoweringID =
+ AMDGPUGlobalISelDivergenceLowering::ID;
+
+FunctionPass *llvm::createAMDGPUGlobalISelDivergenceLoweringPass() {
+ return new AMDGPUGlobalISelDivergenceLowering();
+}
+
+bool AMDGPUGlobalISelDivergenceLowering::runOnMachineFunction(
+ MachineFunction &MF) {
+ MachineCycleInfo &CycleInfo =
+ getAnalysis<MachineCycleInfoWrapperPass>().getCycleInfo();
+ MachineDominatorTree &DomTree = getAnalysis<MachineDominatorTree>();
+
+ MachineUniformityInfo MUI =
+ computeMachineUniformityInfo(MF, CycleInfo, DomTree.getBase(), true);
+
+ DivergenceLoweringHelper Helper(
+ &MF, &DomTree, &getAnalysis<MachinePostDominatorTree>(), &MUI);
+
+ Helper.lowerPhis();
+ return true;
+}
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp b/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
index b772efe04c7141d..3146f7f5a218566 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
@@ -207,7 +207,39 @@ bool AMDGPUInstructionSelector::selectCOPY(MachineInstr &I) const {
return true;
}
+bool isLaneMask(Register Reg, MachineRegisterInfo *MRI,
+ const SIRegisterInfo &TRI) {
+ if (MRI->getType(Reg) != LLT::scalar(1))
+ return false;
+ const TargetRegisterClass *RC = MRI->getRegClassOrNull(Reg);
+ if (!RC || !TRI.isSGPRClass(RC))
+ return false;
+
+ return true;
+}
+
+// PHI where all register operands are sgpr(register class) with S1 LLT.
+bool isLaneMaskPhi(MachineInstr &I, MachineRegisterInfo *MRI,
+ const SIRegisterInfo &TRI) {
+ if (I.getOpcode() != AMDGPU::PHI)
+ return false;
+
+ if (!isLaneMask(I.getOperand(0).getReg(), MRI, TRI))
+ return false;
+
+ for (unsigned i = 1, e = I.getNumOperands(); i != e; i += 2) {
+ if (!isLaneMask(I.getOperand(i).getReg(), MRI, TRI))
+ return false;
+ }
+
+ return true;
+}
+
bool AMDGPUInstructionSelector::selectPHI(MachineInstr &I) const {
+ // Already selected in divergence lowering pass
+ if (isLaneMaskPhi(I, MRI, TRI))
+ return true;
+
const Register DefReg = I.getOperand(0).getReg();
const LLT DefTy = MRI->getType(DefReg);
if (DefTy == LLT::scalar(1)) {
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
index 0c38fa32c6f33a8..1d0be8984604da2 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
@@ -375,6 +375,7 @@ extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializeAMDGPUTarget() {
initializeAMDGPUDAGToDAGISelPass(*PR);
initializeGCNDPPCombinePass(*PR);
initializeSILowerI1CopiesPass(*PR);
+ initializeAMDGPUGlobalISelDivergenceLoweringPass(*PR);
initializeSILowerWWMCopiesPass(*PR);
initializeSILowerSGPRSpillsPass(*PR);
initializeSIFixSGPRCopiesPass(*PR);
@@ -1255,6 +1256,7 @@ bool GCNPassConfig::addLegalizeMachineIR() {
void GCNPassConfig::addPreRegBankSelect() {
bool IsOptNone = getOptLevel() == CodeGenOptLevel::None;
addPass(createAMDGPUPostLegalizeCombiner(IsOptNone));
+ addPass(createAMDGPUGlobalISelDivergenceLoweringPass());
}
bool GCNPassConfig::addRegBankSelect() {
diff --git a/llvm/lib/Target/AMDGPU/CMakeLists.txt b/llvm/lib/Target/AMDGPU/CMakeLists.txt
index 53a33f8210d2a84..2c92e7a07388553 100644
--- a/llvm/lib/Target/AMDGPU/CMakeLists.txt
+++ b/llvm/lib/Target/AMDGPU/CMakeLists.txt
@@ -55,6 +55,7 @@ add_llvm_target(AMDGPUCodeGen
AMDGPUCtorDtorLowering.cpp
AMDGPUExportClustering.cpp
AMDGPUFrameLowering.cpp
+ AMDGPUGlobalISelDivergenceLowering.cpp
AMDGPUGlobalISelUtils.cpp
AMDGPUHSAMetadataStreamer.cpp
AMDGPUInsertDelayAlu.cpp
diff --git a/llvm/lib/Target/AMDGPU/SILowerI1Copies.cpp b/llvm/lib/Target/AMDGPU/SILowerI1Copies.cpp
index 68c8f4024e73007..59c05bab531f4b5 100644
--- a/llvm/lib/Target/AMDGPU/SILowerI1Copies.cpp
+++ b/llvm/lib/Target/AMDGPU/SILowerI1Copies.cpp
@@ -21,57 +21,26 @@
//
//===----------------------------------------------------------------------===//
+#include "SILowerI1Copies.h"
#include "AMDGPU.h"
-#include "GCNSubtarget.h"
-#include "MCTargetDesc/AMDGPUMCTargetDesc.h"
-#include "llvm/CodeGen/MachineDominators.h"
-#include "llvm/CodeGen/MachineFunctionPass.h"
-#include "llvm/CodeGen/MachinePostDominators.h"
#include "llvm/CodeGen/MachineSSAUpdater.h"
#include "llvm/InitializePasses.h"
+#include "llvm/Target/CGPassBuilderOption.h"
#define DEBUG_TYPE "si-i1-copies"
using namespace llvm;
-static unsigned createLaneMaskReg(MachineFunction &MF);
-static unsigned insertUndefLaneMask(MachineBasicBlock &MBB);
+static Register insertUndefLaneMask(MachineBasicBlock *MBB,
+ MachineRegisterInfo *MRI,
+ Register *LaneMaskRegAttrs);
namespace {
-struct Incoming {
- Register Reg;
- MachineBasicBlock *Block;
- Register UpdatedReg;
-
- Incoming(Register Reg, MachineBasicBlock *Block, Register UpdatedReg)
- : Reg(Reg), Block(Block), UpdatedReg(UpdatedReg) {}
-};
-
class SILowerI1Copies : public MachineFunctionPass {
public:
static char ID;
-private:
- bool IsWave32 = false;
- MachineFunction *MF = nullptr;
- MachineDominatorTree *DT = nullptr;
- MachinePostDominatorTree *PDT = nullptr;
- MachineRegisterInfo *MRI = nullptr;
- const GCNSubtarget *ST = nullptr;
- const SIInstrInfo *TII = nullptr;
-
- unsigned ExecReg;
- unsigned MovOp;
- unsigned AndOp;
- unsigned OrOp;
- unsigned XorOp;
- unsigned AndN2Op;
- unsigned OrN2Op;
-
- DenseSet<unsigned> ConstrainRegs;
-
-public:
SILowerI1Copies() : MachineFunctionPass(ID) {
initializeSILowerI1CopiesPass(*PassRegistry::getPassRegistry());
}
@@ -86,29 +55,53 @@ class SILowerI1Copies : public MachineFunctionPass {
AU.addRequired<MachinePostDominatorTree>();
MachineFunctionPass::getAnalysisUsage(AU);
}
+};
+
+class Vreg1LoweringHelper : public PhiLoweringHelper {
+public:
+ Vreg1LoweringHelper(MachineFunction *MF, MachineDominatorTree *DT,
+ MachinePostDominatorTree *PDT);
private:
- bool lowerCopiesFromI1();
- bool lowerPhis();
- bool lowerCopiesToI1();
- bool isConstantLaneMask(Register Reg, bool &Val) const;
+ DenseSet<Register> ConstrainRegs;
+
+public:
+ void markAsLaneMask(Register DstReg) const...
[truncated]
@llvm/pr-subscribers-llvm-globalisel
+bool isLaneMaskPhi(MachineInstr &I, MachineRegisterInfo *MRI,
+ const SIRegisterInfo &TRI) {
+ if (I.getOpcode() != AMDGPU::PHI)
+ return false;
+
+ if (!isLaneMask(I.getOperand(0).getReg(), MRI, TRI))
+ return false;
+
+ for (unsigned i = 1, e = I.getNumOperands(); i != e; i += 2) {
+ if (!isLaneMask(I.getOperand(i).getReg(), MRI, TRI))
+ return false;
+ }
+
+ return true;
+}
+
bool AMDGPUInstructionSelector::selectPHI(MachineInstr &I) const {
+ // Already selected in divergence lowering pass
+ if (isLaneMaskPhi(I, MRI, TRI))
+ return true;
+
const Register DefReg = I.getOperand(0).getReg();
const LLT DefTy = MRI->getType(DefReg);
if (DefTy == LLT::scalar(1)) {
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
index 0c38fa32c6f33a8..1d0be8984604da2 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
@@ -375,6 +375,7 @@ extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializeAMDGPUTarget() {
initializeAMDGPUDAGToDAGISelPass(*PR);
initializeGCNDPPCombinePass(*PR);
initializeSILowerI1CopiesPass(*PR);
+ initializeAMDGPUGlobalISelDivergenceLoweringPass(*PR);
initializeSILowerWWMCopiesPass(*PR);
initializeSILowerSGPRSpillsPass(*PR);
initializeSIFixSGPRCopiesPass(*PR);
@@ -1255,6 +1256,7 @@ bool GCNPassConfig::addLegalizeMachineIR() {
void GCNPassConfig::addPreRegBankSelect() {
bool IsOptNone = getOptLevel() == CodeGenOptLevel::None;
addPass(createAMDGPUPostLegalizeCombiner(IsOptNone));
+ addPass(createAMDGPUGlobalISelDivergenceLoweringPass());
}
bool GCNPassConfig::addRegBankSelect() {
diff --git a/llvm/lib/Target/AMDGPU/CMakeLists.txt b/llvm/lib/Target/AMDGPU/CMakeLists.txt
index 53a33f8210d2a84..2c92e7a07388553 100644
--- a/llvm/lib/Target/AMDGPU/CMakeLists.txt
+++ b/llvm/lib/Target/AMDGPU/CMakeLists.txt
@@ -55,6 +55,7 @@ add_llvm_target(AMDGPUCodeGen
AMDGPUCtorDtorLowering.cpp
AMDGPUExportClustering.cpp
AMDGPUFrameLowering.cpp
+ AMDGPUGlobalISelDivergenceLowering.cpp
AMDGPUGlobalISelUtils.cpp
AMDGPUHSAMetadataStreamer.cpp
AMDGPUInsertDelayAlu.cpp
diff --git a/llvm/lib/Target/AMDGPU/SILowerI1Copies.cpp b/llvm/lib/Target/AMDGPU/SILowerI1Copies.cpp
index 68c8f4024e73007..59c05bab531f4b5 100644
--- a/llvm/lib/Target/AMDGPU/SILowerI1Copies.cpp
+++ b/llvm/lib/Target/AMDGPU/SILowerI1Copies.cpp
@@ -21,57 +21,26 @@
//
//===----------------------------------------------------------------------===//
+#include "SILowerI1Copies.h"
#include "AMDGPU.h"
-#include "GCNSubtarget.h"
-#include "MCTargetDesc/AMDGPUMCTargetDesc.h"
-#include "llvm/CodeGen/MachineDominators.h"
-#include "llvm/CodeGen/MachineFunctionPass.h"
-#include "llvm/CodeGen/MachinePostDominators.h"
#include "llvm/CodeGen/MachineSSAUpdater.h"
#include "llvm/InitializePasses.h"
+#include "llvm/Target/CGPassBuilderOption.h"
#define DEBUG_TYPE "si-i1-copies"
using namespace llvm;
-static unsigned createLaneMaskReg(MachineFunction &MF);
-static unsigned insertUndefLaneMask(MachineBasicBlock &MBB);
+static Register insertUndefLaneMask(MachineBasicBlock *MBB,
+ MachineRegisterInfo *MRI,
+ Register *LaneMaskRegAttrs);
namespace {
-struct Incoming {
- Register Reg;
- MachineBasicBlock *Block;
- Register UpdatedReg;
-
- Incoming(Register Reg, MachineBasicBlock *Block, Register UpdatedReg)
- : Reg(Reg), Block(Block), UpdatedReg(UpdatedReg) {}
-};
-
class SILowerI1Copies : public MachineFunctionPass {
public:
static char ID;
-private:
- bool IsWave32 = false;
- MachineFunction *MF = nullptr;
- MachineDominatorTree *DT = nullptr;
- MachinePostDominatorTree *PDT = nullptr;
- MachineRegisterInfo *MRI = nullptr;
- const GCNSubtarget *ST = nullptr;
- const SIInstrInfo *TII = nullptr;
-
- unsigned ExecReg;
- unsigned MovOp;
- unsigned AndOp;
- unsigned OrOp;
- unsigned XorOp;
- unsigned AndN2Op;
- unsigned OrN2Op;
-
- DenseSet<unsigned> ConstrainRegs;
-
-public:
SILowerI1Copies() : MachineFunctionPass(ID) {
initializeSILowerI1CopiesPass(*PassRegistry::getPassRegistry());
}
@@ -86,29 +55,53 @@ class SILowerI1Copies : public MachineFunctionPass {
AU.addRequired<MachinePostDominatorTree>();
MachineFunctionPass::getAnalysisUsage(AU);
}
+};
+
+class Vreg1LoweringHelper : public PhiLoweringHelper {
+public:
+ Vreg1LoweringHelper(MachineFunction *MF, MachineDominatorTree *DT,
+ MachinePostDominatorTree *PDT);
private:
- bool lowerCopiesFromI1();
- bool lowerPhis();
- bool lowerCopiesToI1();
- bool isConstantLaneMask(Register Reg, bool &Val) const;
+ DenseSet<Register> ConstrainRegs;
+
+public:
+ void markAsLaneMask(Register DstReg) const...
[truncated]
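For context on the truncated diff above: the heart of `buildMergeLaneMasks` is a three-instruction bitwise merge, Dst = (Prev & ~EXEC) | (Cur & EXEC). Below is a minimal sketch of that formula in plain C++ (illustrative only — the real pass emits S_ANDN2/S_AND/S_OR variants on SGPR lane masks, and `mergeLaneMasks` is an invented name):

```cpp
#include <cstdint>

// Sketch of the merge buildMergeLaneMasks emits:
//   Dst = (Prev & ~EXEC) | (Cur & EXEC)
// Lanes that are active (set in EXEC) take their bit from the current value;
// inactive lanes keep their previous bit. A 64-bit value models a wave64
// lane mask; wave32 is the same with 32 bits.
uint64_t mergeLaneMasks(uint64_t Prev, uint64_t Cur, uint64_t Exec) {
  uint64_t PrevMasked = Prev & ~Exec; // AndN2Op: keep inactive lanes of Prev
  uint64_t CurMasked = Exec & Cur;    // AndOp: take active lanes of Cur
  return PrevMasked | CurMasked;      // OrOp
}
```

The TODO in the diff (checking whether inputs are constants or compare results) would let the pass skip the copies and one of the AND operations when an input is known to be all-zeros, all-ones, or already masked by EXEC, as the SelectionDAG version of `buildMergeLaneMasks` does.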
#include "AMDGPU.h"
#include "llvm/CodeGen/MachineFunctionPass.h"

#define DEBUG_TYPE "global-isel-divergence-lowering"
Do you have better suggestions for pass name?
Should get an amdgpu- prefix
amdgpu-gi-divergence-lowering to keep it short?
// Incoming.Reg becomes that new lane mask.
void DivergenceLoweringHelper::constrainIncomingRegisterTakenAsIs(
    Incoming &In) {
  MachineIRBuilder B(*MF);
Don't construct a fresh MIR builder, use the same one throughout the pass
// PHI where all register operands are sgpr(register class) with S1 LLT.
bool isLaneMaskPhi(MachineInstr &I, MachineRegisterInfo *MRI,
                   const SIRegisterInfo &TRI) {
  if (I.getOpcode() != AMDGPU::PHI)
G_PHI. Not sure we should bother handling non-G_* phi at all, but isPHI also works
This was meant to verify that lane mask PHIs are correctly selected if they were selected before instruction-selection. Idea was to not split selection of lane mask PHIs across multiple passes
So the point is that if we see a PHI here (as opposed to a G_PHI), that's because it has already been selected.
Having this as a separate helper is confusing IMHO. Better to inline a check against PHI into selectPHI, and add an assert there that the defined register is a lane mask in that case.
So the point is that if we see a PHI here (as opposed to a G_PHI), that's because it has already been selected.

Yes at the moment, but I did not want to be that restrictive and put an assert.
It was meant to report a selected lane mask PHI as already selected.
If it was not properly instruction-selected, leave it (ideally to fail selection).
There will be changes to how we select phis later, so maybe then we can add some assert.
Can we inline it for now with a return?
; There is a divergent, according to machine uniformity info, g_brcond branch
; here, not lowered to si_if because of "amdgpu-flat-work-group-size"="1,1".
define dllexport amdgpu_cs void @_amdgpu_cs_main( i32 inreg noundef %.userdata0, <3 x i32> inreg noundef %.WorkgroupId, <3 x i32> noundef %.LocalInvocationId) #0 {
Don't need all the dllexports
Some more docs in AMDGPUGlobalISelDivergenceLowering.cpp would be nice.
Seems like it's going to be a core part of the GlobalISel pipeline so IMO it should be well documented so it's not too hard to understand and contribute to.
  bool runOnMachineFunction(MachineFunction &MF) override;

  StringRef getPassName() const override {
    return "GlobalISel divergence lowering";
Needs AMDGPU prefix too
#include "AMDGPU.h"
#include "llvm/CodeGen/MachineFunctionPass.h"

#define DEBUG_TYPE "global-isel-divergence-lowering"
amdgpu-gi-divergence-lowering to keep it short?
// GlobalISel pass that selects divergent i1 phis as lane mask phis.
// Lane mask merging uses same algorithm as SDAG in SILowerI1Copies.
// Handles all cases of temporal divergence.
//
// For divergent non-phi i1 and uniform i1 uses outside of the cycle this pass
// currently depends on LCSSA to insert phis with one incoming.
really small nit: Use three slashes + start with \file; first line should be /// \file
    return;

  MRI->setRegClass(DstReg, ST->getBoolRC());
  return;
extra return

void DivergenceLoweringHelper::collectIncomingValuesFromPhi(
    const MachineInstr *MI, SmallVectorImpl<Incoming> &Incomings) const {
  for (unsigned i = 1; i < MI->getNumOperands(); i += 2) {
i -> I (uppercase)
  }
}

  return;
extra return
  MRI->setRegClass(Copy.getReg(0), ST->getBoolRC());
  In.Reg = Copy.getReg(0);

  return;
extra return
This extra return in constrainIncomingRegisterTakenAsIs should still be removed.
      getAnalysis<MachineCycleInfoWrapperPass>().getCycleInfo();
  MachineDominatorTree &DomTree = getAnalysis<MachineDominatorTree>();

  MachineUniformityInfo MUI =
Can't you depend on MachineUniformityAnalysis instead?
#include "llvm/CodeGen/MachinePostDominators.h"
#include "llvm/CodeGen/MachineSSAUpdater.h"

using namespace llvm;
Avoid using namespace in headers
✅ With the latest revision this PR passed the C/C++ code formatter.
Ping. Updated for comments about code quality. Unchanged for comments about functional changes:

define amdgpu_ps void @divergent_i1_phi_uniform_branch(i32 addrspace(1)* %out, i32 %tid, i32 inreg %cond, i32 addrspace(1)* %dummyaddr) {
Test should be updated to use opaque pointers
; There is a divergent, according to machine uniformity info, g_brcond branch
; here, not lowered to si_if because of "amdgpu-flat-work-group-size"="1,1".
define amdgpu_cs void @_amdgpu_cs_main( i32 inreg noundef %.userdata0, <3 x i32> inreg noundef %.WorkgroupId, <3 x i32> noundef %.LocalInvocationId) #0 {
Name the function to indicate it's the 1,1 case
  return MRI.createVirtualRegister(ST.isWave32() ? &AMDGPU::SReg_32RegClass
                                                 : &AMDGPU::SReg_64RegClass);
Register llvm::createLaneMaskReg(MachineRegisterInfo *MRI,
                                 Register *LaneMaskRegAttrs) {
What's the point of these parameters being pointers?
The general idea was to crash if Register *LaneMaskRegAttrs was not initialized; it might not be that obvious if an uninitialized Register(0) was used and bad code was generated.
A Register() used in just about anything will fail the verifier so that's fine
@@ -895,3 +910,7 @@ void SILowerI1Copies::buildMergeLaneMasks(MachineBasicBlock &MBB,
       .addReg(CurMaskedReg ? CurMaskedReg : ExecReg);
   }
 }

+void Vreg1LoweringHelper::constrainIncomingRegisterTakenAsIs(Incoming &In) {
What's the point of this function? Don't really understand the name
; GFX10-NEXT: [[S_OR_B32_:%[0-9]+]]:sreg_32(s1) = S_OR_B32 [[S_ANDN2_B32_]](s1), [[S_AND_B32_]](s1), implicit-def $scc
; GFX10-NEXT: {{ $}}
; GFX10-NEXT: bb.2:
; GFX10-NEXT: [[PHI:%[0-9]+]]:sreg_32(s1) = PHI [[ICMP]](s1), %bb.0, [[S_OR_B32_]](s1), %bb.1
; GFX10-NEXT: [[COPY6:%[0-9]+]]:sreg_32(s1) = COPY [[PHI]](s1)
; GFX10-NEXT: [[PHI:%[0-9]+]]:sreg_32(s1) = PHI [[COPY4]](s1), %bb.0, [[S_OR_B32_]](s1), %bb.1
What's the point of this function? Don't really understand the name constrainIncomingRegisterTakenAsIs
But here the register operand named COPY4 has a lane-mask register class, and after this patch the lane mask PHI is fully selected. I wanted to avoid having to split selection of lane mask phis across multiple passes.
; GFX10-NEXT: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[COPY2]](s32), [[C]]
; GFX10-NEXT: [[COPY4:%[0-9]+]]:sreg_32(s1) = COPY [[ICMP]](s1)
; GFX10-NEXT: [[PHI:%[0-9]+]]:sreg_32(s1) = PHI [[COPY4]](s1), %bb.0, [[S_OR_B32_]](s1), %bb.1
; GFX10-NEXT: [[S_OR_B32_:%[0-9]+]]:sreg_32(s1) = S_OR_B32 [[S_ANDN2_B32_]](s1), [[S_AND_B32_]](s1), implicit-def $scc
; GFX10-NEXT: {{ $}}
; GFX10-NEXT: bb.2:
; GFX10-NEXT: [[PHI:%[0-9]+]]:sreg_32(s1) = PHI [[ICMP]](s1), %bb.0, [[S_OR_B32_]](s1), %bb.1
What's the point of this function? Don't really understand the name constrainIncomingRegisterTakenAsIs
That Incoming is from the block that is visited first in RPOT, so it is taken as-is. But it does not have a register class (the register operand named ICMP), so the PHI instruction is only partially selected:
; GFX10-NEXT: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(uge), [[COPY2]](s32), [[C]]
; GFX10-NEXT: [[PHI:%[0-9]+]]:sreg_32(s1) = PHI [[ICMP]](s1), %bb.0, [[S_OR_B32_]](s1), %bb.1
Ping. Nicolai and Jay, do you have comments?
It's probably for the best to land this PR soon. I have a few detail comments still inline, but hopefully they're not too complex.
I am somewhat surprised about the order in which the changes appear. Though you did update the tests, so did you actually run a check (e.g. using git rebase -x "ninja -C build check-llvm-codegen-amdgpu" ...
) that passes after each individual commit? If yes, then great. That would make me feel a bit more confident about the structure of this.
} // End anonymous namespace.

INITIALIZE_PASS_BEGIN(AMDGPUGlobalISelDivergenceLowering, DEBUG_TYPE,
                      "GlobalISel divergence lowering", false, false)
This comment was addressed in the wrong commit. This makes it harder to review the PR :(
Each individual patch passes check-llvm; IR is correct after the last patch, but the MIR updates are relevant and give a good overview of the changes (PHIs go from partially selected to fully selected). I wanted to open a separate PR for each of the patches if they are OK together (don't know if there is a better way).
Thanks, LGTM except for one nitty nit.
bool AMDGPUInstructionSelector::selectPHI(MachineInstr &I) const {
  const Register DefReg = I.getOperand(0).getReg();
  const LLT DefTy = MRI->getType(DefReg);
  // Lane mask PHIs, PHI where all register operands have sgpr register class
  // with S1 LLT, are already selected in divergence lowering pass.
  if (I.getOpcode() == AMDGPU::PHI && isLaneMask(DefReg, MRI, TRI)) {
Perhaps I should be clearer. I don't think we ever have a PHI here other than for lane masks. So it would be better for understanding the overall context to say something like:

if (I.getOpcode() == AMDGPU::PHI) {
  assert(isLaneMask(DefReg, MRI, TRI));
  return true;
}
There is one regression test with PHI in phi_s32_ss_sbranch - GlobalISel/inst-select-phi.mir. It can be disabled. Other than that, the assert is fine.
I wanted to check isLaneMask for all operands; can I bring the helper function back then? It would look like this without the helper:

if (I.getOpcode() == AMDGPU::PHI) {
  assert(isLaneMask(DefReg, MRI, TRI));
  for (unsigned i = 1, e = I.getNumOperands(); i != e; i += 2)
    assert(isLaneMask(I.getOperand(i).getReg(), MRI, TRI));
  return true;
}
Huh, I wonder if @arsenm knows where that test comes from, whether that's on purpose or perhaps a mistake and the test should be removed.
If my understanding is wrong and other PHIs are possible, it would be good to understand why. If they aren't, then the test should be removed and it's not so important to check all the operands IMHO.
Okay then, we can merge the first two patches in the meantime.
Add empty AMDGPUGlobalISelDivergenceLowering pass. This pass will implement - selection of divergent i1 phis as lane mask phis, requires lane mask merging in some cases - lower uses of divergent i1 values outside of the cycle using lane mask merging - lowering of all cases of temporal divergence: - lower uses of uniform i1 values outside of the cycle using lane mask merging - lower uses of uniform non-i1 values outside of the cycle using a copy to vgpr inside of the cycle Add very detailed set of regression tests for cases mentioned above. patch 1 from: #73337
llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-temporal-divergent-i1.ll (outdated review comment, resolved)
Make abstract class PhiLoweringHelper and expose it for use in GlobalISel path. SILowerI1Copies implements PhiLoweringHelper as Vreg1LoweringHelper and it is equivalent to SILowerI1Copies. Notable change that createLaneMaskReg now clones attributes from register that has lane mask attributes instead of creating register with lane mask register class. This is because lane masks have different(more) attributes in GlobalISel. patch 2 from: #73337
Implement PhiLoweringHelper for GlobalISel in DivergenceLoweringHelper. Use machine uniformity analysis to find divergent i1 phis and select them as lane mask phis in same way SILowerI1Copies select VReg_1 phis. Note that divergent i1 phis include phis created by LCSSA and all cases of uses outside of cycle are actually covered by "lowering LCSSA phis". GlobalISel lane masks are registers with sgpr register class and S1 LLT. TODO: General goal is that instructions created in this pass are fully instruction-selected so that selection of lane mask phis is not split across multiple passes. patch 3 from: llvm#73337
GlobalISel works with registers that could have register class, register bank and LLT as attributes. When initializing MachineSSAUpdater save all attributes of register and create new registers with same attributes instead of only using register class. patch 4 from: llvm#73337
When initializing MachineSSAUpdater save all attributes of current virtual register and create new virtual registers with same attributes. Now new virtual registers have same both register class or bank and LLT. Previously new virtual registers had same register class but LLT was not set (LLT was set to default/empty LLT). Required by GlobalISel for AMDGPU, new 'lane mask' virtual registers created by MachineSSAUpdater need to have both register class and LLT. patch 4 from: llvm#73337
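The attribute-cloning change described in this commit message can be illustrated with a toy mock (not LLVM code; `VRegAttrs` and both clone functions are invented for the example). The old updater copied only the register class of the template register, so a new virtual register's LLT came back empty; the new updater copies every attribute:

```cpp
#include <string>

// Mock of a virtual register's attributes: a register class or bank plus a
// low-level type (LLT). An empty Type models "no LLT set".
struct VRegAttrs {
  std::string RegClassOrBank; // e.g. "sreg_32" or a register bank name
  std::string Type;           // e.g. "s1"
};

// Old behavior: new vregs inherited only the register class, losing the LLT.
VRegAttrs cloneClassOnly(const VRegAttrs &Template) {
  return {Template.RegClassOrBank, ""};
}

// New behavior: new vregs copy all attributes of the template register, so a
// GlobalISel lane mask keeps both its sgpr class and its S1 LLT.
VRegAttrs cloneAllAttrs(const VRegAttrs &Template) { return Template; }
```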
Needs update to resolve conflict
Basic implementation of lane mask merging for GlobalISel.
Lane masks on GlobalISel are registers with the sgpr register class
and S1 LLT - required by machine uniformity analysis.
Implements the equivalent of lowerPhis from SILowerI1Copies.cpp in:
patch 1: llvm#75340
patch 2: llvm#75349
patch 3: llvm#80003
patch 4: llvm#78431
patch 5: is in this commit:
AMDGPU/GlobalISelDivergenceLowering: constrain incoming registers
Previously, in PHIs that represent lane masks, incoming registers
taken as-is were not selected as lane masks. Such registers are not
being merged with another lane mask and most often only have S1 LLT.
Implement constrainAsLaneMask by constraining incoming registers
taken as-is with lane mask attributes, essentially transforming them
to lane masks. This is the final step in having PHI instructions created
in this pass be fully instruction-selected.
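As a sketch of what "merging" a lane mask computes, the snippet below models a wave's lane mask and EXEC as plain 64-bit values. SILowerI1Copies lowers this to an S_ANDN2 / S_AND / S_OR sequence: lanes active in EXEC take the incoming value, inactive lanes keep their previous bits. The function name is hypothetical and the code is a standalone model, not backend code.

```cpp
#include <cassert>
#include <cstdint>

// Merge a new per-lane boolean value into an existing lane mask under
// the current execution mask. Mirrors the S_ANDN2_B64/S_AND_B64/S_OR_B64
// pattern used when lowering lane mask phis.
uint64_t mergeLaneMask(uint64_t PrevMask, uint64_t CurVal, uint64_t Exec) {
  uint64_t Kept = PrevMask & ~Exec; // S_ANDN2: preserve inactive lanes
  uint64_t Taken = CurVal & Exec;   // S_AND:   take active lanes' value
  return Kept | Taken;              // S_OR:    combine both halves
}
```

For example, with EXEC = 0b0101, bits 0 and 2 come from the incoming value while bits 1 and 3 are preserved from the previous mask.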
Updated with final patch from original PR
Can the amdgpu-global-isel-risky-select flag be deleted?
Yes, will do it in another patch from the upcoming patches:
Make an abstract class PhiLoweringHelper and expose it for use in the GlobalISel path. SILowerI1Copies implements PhiLoweringHelper as Vreg1LoweringHelper, which is equivalent to SILowerI1Copies. A notable change is that createLaneMaskReg now clones attributes from a register that has lane mask attributes, instead of creating a register with the lane mask register class. This is because lane masks have different (more) attributes in GlobalISel. patch 2 from: llvm/llvm-project#73337