Make SILowerSGPRSpills::determineRegsForWWMAllocation Accept Already Reserved Whole Wave Mode Registers from llvm::SIMachineFunctionInfo #167203

@matinraayai


Hi,

I want to pass a physical WWM register as a live-in to an llvm::MachineFunction that uses the AMDGPU C calling convention. The physical WWM register is copied into a virtual register marked WWM in the entry basic block, and copied back to its original location at the end of each return block. For example, in a single-MBB function that takes $vgpr2 as a WWM argument, the MF's contents should look something like this:

Function Live Ins: $vgpr2
bb.0:
  liveins: $vgpr2
  %2:vgpr_32 = WWM_COPY $vgpr2
  ...
  $vgpr2 = WWM_COPY %2:vgpr_32
  SI_RETURN implicit-def $vgpr2

The machine function comes from my modified codegen pipeline, which can be found here; I add the copies between the virtual and physical WWM registers in one of the very early passes after ISel.

IIUC, before WWM allocation happens in the AMDGPU codegen pipeline, the SILowerSGPRSpills pass selects a set of physical VGPRs that are not used anywhere in the function, starting from the highest one available on the subtarget, as seen here:

void SILowerSGPRSpills::determineRegsForWWMAllocation(MachineFunction &MF,
                                                      BitVector &RegMask) {
  // Determine an optimal number of VGPRs for WWM allocation. The complement
  // list will be available for allocating other VGPR virtual registers.
  SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();
  MachineRegisterInfo &MRI = MF.getRegInfo();
  BitVector ReservedRegs = TRI->getReservedRegs(MF);
  BitVector NonWwmAllocMask(TRI->getNumRegs());
  const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
  // FIXME: MaxNumVGPRsForWwmAllocation might need to be adjusted in the future
  // to have a balanced allocation between WWM values and per-thread vector
  // register operands.
  unsigned NumRegs = MaxNumVGPRsForWwmAllocation;
  NumRegs =
      std::min(static_cast<unsigned>(MFI->getSGPRSpillVGPRs().size()), NumRegs);
  auto [MaxNumVGPRs, MaxNumAGPRs] = ST.getMaxNumVectorRegs(MF.getFunction());
  // Try to use the highest available registers for now. Later after
  // vgpr-regalloc, they can be shifted to the lowest range.
  unsigned I = 0;
  for (unsigned Reg = AMDGPU::VGPR0 + MaxNumVGPRs - 1;
       (I < NumRegs) && (Reg >= AMDGPU::VGPR0); --Reg) {
    if (!ReservedRegs.test(Reg) &&
        !MRI.isPhysRegUsed(Reg, /*SkipRegMaskTest=*/true)) {
      TRI->markSuperRegs(RegMask, Reg);
      ++I;
    }
  }
  if (I != NumRegs) {
    // Reserve an arbitrary register and report the error.
    TRI->markSuperRegs(RegMask, AMDGPU::VGPR0);
    MF.getFunction().getContext().emitError(
        "cannot find enough VGPRs for wwm-regalloc");
  }
}

When my machine function uses all of the physical V/AGPRs available to it, this function fails to notice that at least one WWM register is still usable for WWM register allocation. This happens even though I copy those physical registers into equivalent virtual registers to signal to the register allocator that they can be spilled.

One possible fix is to make this function also consider any physical registers already in the WWMReservedRegs field of llvm::SIMachineFunctionInfo. That way I can reserve my physical WWM register early in the pipeline, at the point where I also emit the WWM copies, and set --amdgpu-num-vgprs-for-wwm-alloc to 1 if I run into a VGPR shortage.
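
To make the idea concrete, here is a rough standalone model of the selection logic with the proposed change. It is only a sketch, not LLVM code: `pickWwmRegs`, `WwmPick`, and the plain `std::vector<bool>` masks are hypothetical stand-ins for the real `BitVector`/`MachineRegisterInfo` queries, and how pre-reserved registers should interact with `getReservedRegs` in the actual pass is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type: which VGPR indices were picked, and whether
// enough of them were found.
struct WwmPick {
  std::vector<std::size_t> Regs;
  bool Ok;
};

// Toy model of the selection loop. Index i stands for VGPR i.
//   Used           - register is used somewhere in the function
//   Reserved       - register is reserved for non-WWM reasons
//   PreReservedWwm - register was reserved for WWM earlier in the pipeline
//                    (models the WWMReservedRegs idea from this issue)
WwmPick pickWwmRegs(const std::vector<bool> &Used,
                    const std::vector<bool> &Reserved,
                    const std::vector<bool> &PreReservedWwm,
                    std::size_t NumRegs) {
  assert(Used.size() == Reserved.size() &&
         Used.size() == PreReservedWwm.size());
  WwmPick Result{{}, false};

  // Proposed addition: registers already reserved for WWM count first,
  // even though they show up as "used" in the function.
  for (std::size_t Reg = 0;
       Reg < Used.size() && Result.Regs.size() < NumRegs; ++Reg)
    if (PreReservedWwm[Reg])
      Result.Regs.push_back(Reg);

  // Existing behaviour: scan from the highest VGPR down and take the
  // registers that are neither reserved nor used.
  for (std::size_t Reg = Used.size();
       Reg > 0 && Result.Regs.size() < NumRegs; --Reg) {
    std::size_t Idx = Reg - 1;
    if (!Reserved[Idx] && !Used[Idx] && !PreReservedWwm[Idx])
      Result.Regs.push_back(Idx);
  }

  Result.Ok = Result.Regs.size() == NumRegs;
  return Result;
}
```

In this model, a function that uses every VGPR fails to find a WWM register unless one of them was pre-reserved, which mirrors the situation described above.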

@shiltian @arsenm
