[AMDGPU] SDWA peephole optimization pass.
Summary:
First iteration of SDWA peephole.

This pass tries to combine several instructions into one SDWA instruction. E.g. it converts:
'''
    V_LSHRREV_B32_e32 %vreg0, 16, %vreg1
    V_ADD_I32_e32 %vreg2, %vreg0, %vreg3
    V_LSHLREV_B32_e32 %vreg4, 16, %vreg2
'''
Into:
'''
   V_ADD_I32_sdwa %vreg4, %vreg1, %vreg3 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
'''
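
The equivalence of the two forms can be spot-checked with plain 32-bit arithmetic. The sketch below is only an illustrative model, not code from this patch: the function names are hypothetical, it assumes that `UNUSED_PAD` zero-fills the unselected destination bits, and it ignores the carry output of the add.

```cpp
#include <cassert>
#include <cstdint>

// Original three-instruction sequence, modelled on unsigned 32-bit values.
uint32_t shiftAddShift(uint32_t vreg1, uint32_t vreg3) {
  uint32_t vreg0 = vreg1 >> 16;    // V_LSHRREV_B32_e32 %vreg0, 16, %vreg1
  uint32_t vreg2 = vreg0 + vreg3;  // V_ADD_I32_e32 %vreg2, %vreg0, %vreg3
  return vreg2 << 16;              // V_LSHLREV_B32_e32 %vreg4, 16, %vreg2
}

// Assumed model of the single SDWA add (hypothetical helper, not an LLVM API).
uint32_t sdwaAdd(uint32_t vreg1, uint32_t vreg3) {
  uint32_t src0 = (vreg1 >> 16) & 0xFFFFu;  // src0_sel:WORD_1
  uint32_t src1 = vreg3;                    // src1_sel:DWORD
  uint32_t sum = src0 + src1;               // wraps modulo 2^32
  // dst_sel:WORD_1 writes the low 16 bits of the result into the upper word;
  // dst_unused:UNUSED_PAD is assumed here to zero the remaining bits.
  return (sum & 0xFFFFu) << 16;
}

// Returns true when both models agree for the given inputs.
bool sameResult(uint32_t vreg1, uint32_t vreg3) {
  return shiftAddShift(vreg1, vreg3) == sdwaAdd(vreg1, vreg3);
}

// Small self-check over a few sample values, including a wrapping case.
void checkSamples() {
  assert(sameResult(0x12345678u, 0x9u));
  assert(sameResult(0xFFFFFFFFu, 0xFFFFFFFFu));
  assert(sameResult(0u, 0u));
}
```

Both models agree because shifting a 32-bit sum left by 16 discards its upper half, which is the same as keeping only the low word and placing it in the upper word.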

Pass structure:
    1. Iterate over the machine instructions in a basic block and try to apply "SDWA patterns" to each of them. An SDWA pattern matches a machine instruction to either a source or a destination SDWA operand. E.g. ''' V_LSHRREV_B32_e32 %vreg0, 16, %vreg1''' is matched to the source SDWA operand '''%vreg1 src_sel:WORD_1'''.
    2. Iterate over the SDWA operands found and find the instructions that could potentially be converted into SDWA. E.g. for a source SDWA operand, the potential instructions are all instructions in this basic block that use '''%vreg0'''.
    3. Iterate over all potential instructions and check whether they can be converted into SDWA.
    4. Convert the instructions to SDWA.
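
The four steps above can be sketched on a toy instruction representation. This is a simplified illustration, not the code added in SIPeepholeSDWA.cpp: `ToyInst`, `ToySrcOperand`, `hasToySDWAForm` and `runToySDWAPeephole` are hypothetical names, only the source-operand pattern for a right shift by 16 is modelled, and the now-dead shift is left in place for a later dead-code elimination pass to remove.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy instruction; registers are plain integers, -1 means unused.
struct ToyInst {
  std::string Opcode;
  int Dst = -1;
  int Src0 = -1;
  int Src1 = -1;
  int Imm = 0;                     // shift amount for shift instructions
  std::string Src0Sel = "DWORD";
  std::string Src1Sel = "DWORD";
};

// A matched source SDWA operand: uses of Replaced can read Target with Sel.
struct ToySrcOperand {
  int Replaced;
  int Target;
  std::string Sel;
};

// Step 3 helper: in this toy model only the add has (or already is) an SDWA form.
static bool hasToySDWAForm(const ToyInst &I) {
  return I.Opcode == "V_ADD_I32_e32" || I.Opcode == "V_ADD_I32_sdwa";
}

// Runs the toy peephole over one basic block; returns how many instructions
// were converted to their SDWA form.
int runToySDWAPeephole(std::vector<ToyInst> &Block) {
  // Step 1: match SDWA patterns. A right shift by 16 selects WORD_1 of its source.
  std::vector<ToySrcOperand> Operands;
  for (const ToyInst &I : Block)
    if (I.Opcode == "V_LSHRREV_B32_e32" && I.Imm == 16)
      Operands.push_back({I.Dst, I.Src0, "WORD_1"});

  int Converted = 0;
  for (const ToySrcOperand &Op : Operands) {
    for (ToyInst &I : Block) {
      // Step 2: potential instructions are those that use the shifted register.
      bool UsesSrc0 = I.Src0 == Op.Replaced;
      bool UsesSrc1 = I.Src1 == Op.Replaced;
      if (!UsesSrc0 && !UsesSrc1)
        continue;
      // Step 3: check that the instruction can be converted.
      if (!hasToySDWAForm(I))
        continue;
      // Step 4: convert, rewriting the operand and recording its selection.
      if (I.Opcode != "V_ADD_I32_sdwa") {
        I.Opcode = "V_ADD_I32_sdwa";
        ++Converted;
      }
      if (UsesSrc0) { I.Src0 = Op.Target; I.Src0Sel = Op.Sel; }
      if (UsesSrc1) { I.Src1 = Op.Target; I.Src1Sel = Op.Sel; }
    }
  }
  return Converted;
}

// Example: the shift and add from the summary collapse into one SDWA add.
void toyExample() {
  std::vector<ToyInst> Block(2);
  Block[0].Opcode = "V_LSHRREV_B32_e32";
  Block[0].Dst = 0; Block[0].Src0 = 1; Block[0].Imm = 16;
  Block[1].Opcode = "V_ADD_I32_e32";
  Block[1].Dst = 2; Block[1].Src0 = 0; Block[1].Src1 = 3;
  int N = runToySDWAPeephole(Block);
  assert(N == 1 && Block[1].Src0 == 1 && Block[1].Src0Sel == "WORD_1");
}
```

Separating matching (step 1) from conversion (steps 2 to 4) means each pattern only has to describe one operand, and the legality check is made once per candidate instruction.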

This review contains a basic implementation of the SDWA peephole pass. The pass requires additional testing for both correctness and performance (no performance testing has been done).
There are several ways this pass can be improved:
    1. Make this pass work on the whole function, not only on a basic block. As far as I can see, this can be done right now without changes to the pass.
    2. Introduce more SDWA patterns
    3. Introduce mnemonics to limit when SDWA patterns should apply

Reviewers: vpykhtin, alex-t, arsenm, rampitec

Subscribers: wdng, nhaehnle, mgorny

Differential Revision: https://reviews.llvm.org/D30038

llvm-svn: 298365
SamWot committed Mar 21, 2017
1 parent 60e9249 commit f60ad58
Showing 8 changed files with 1,092 additions and 1 deletion.
4 changes: 4 additions & 0 deletions llvm/lib/Target/AMDGPU/AMDGPU.h
@@ -37,6 +37,7 @@ FunctionPass *createAMDGPUCFGStructurizerPass();
FunctionPass *createSITypeRewriter();
FunctionPass *createSIAnnotateControlFlowPass();
FunctionPass *createSIFoldOperandsPass();
FunctionPass *createSIPeepholeSDWAPass();
FunctionPass *createSILowerI1CopiesPass();
FunctionPass *createSIShrinkInstructionsPass();
FunctionPass *createSILoadStoreOptimizerPass(TargetMachine &tm);
@@ -58,6 +59,9 @@ extern char &AMDGPULowerIntrinsicsID;
void initializeSIFoldOperandsPass(PassRegistry &);
extern char &SIFoldOperandsID;

void initializeSIPeepholeSDWAPass(PassRegistry &);
extern char &SIPeepholeSDWAID;

void initializeSIShrinkInstructionsPass(PassRegistry&);
extern char &SIShrinkInstructionsID;

10 changes: 10 additions & 0 deletions llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
@@ -94,6 +94,11 @@ static cl::opt<bool> InternalizeSymbols(
  cl::init(false),
  cl::Hidden);

static cl::opt<bool> EnableSDWAPeephole(
  "amdgpu-sdwa-peephole",
  cl::desc("Enable SDWA peepholer"),
  cl::init(false));

// Enable address space based alias analysis
static cl::opt<bool> EnableAMDGPUAliasAnalysis("enable-amdgpu-aa", cl::Hidden,
  cl::desc("Enable AMDGPU Alias Analysis"),
@@ -109,6 +114,7 @@ extern "C" void LLVMInitializeAMDGPUTarget() {
  initializeSIFixSGPRCopiesPass(*PR);
  initializeSIFixVGPRCopiesPass(*PR);
  initializeSIFoldOperandsPass(*PR);
  initializeSIPeepholeSDWAPass(*PR);
  initializeSIShrinkInstructionsPass(*PR);
  initializeSIFixControlFlowLiveIntervalsPass(*PR);
  initializeSILoadStoreOptimizerPass(*PR);
@@ -683,6 +689,10 @@ bool GCNPassConfig::addGlobalInstructionSelect() {

void GCNPassConfig::addPreRegAlloc() {
  addPass(createSIShrinkInstructionsPass());
  if (EnableSDWAPeephole) {
    addPass(&SIPeepholeSDWAID);
    addPass(&DeadMachineInstructionElimID);
  }
  addPass(createSIWholeQuadModePass());
}

1 change: 1 addition & 0 deletions llvm/lib/Target/AMDGPU/CMakeLists.txt
@@ -89,6 +89,7 @@ add_llvm_target(AMDGPUCodeGen
  SIMachineFunctionInfo.cpp
  SIMachineScheduler.cpp
  SIOptimizeExecMasking.cpp
  SIPeepholeSDWA.cpp
  SIRegisterInfo.cpp
  SIShrinkInstructions.cpp
  SITypeRewriter.cpp
3 changes: 3 additions & 0 deletions llvm/lib/Target/AMDGPU/SIInstrInfo.h
@@ -768,6 +768,9 @@ namespace AMDGPU {
  LLVM_READONLY
  int getVOPe32(uint16_t Opcode);

  LLVM_READONLY
  int getSDWAOp(uint16_t Opcode);

  LLVM_READONLY
  int getCommuteRev(uint16_t Opcode);

9 changes: 9 additions & 0 deletions llvm/lib/Target/AMDGPU/SIInstrInfo.td
@@ -1441,6 +1441,15 @@ def getVOPe32 : InstrMapping {
  let ValueCols = [["4", "0"]];
}

// Maps ordinary instructions to their SDWA counterparts
def getSDWAOp : InstrMapping {
  let FilterClass = "VOP";
  let RowFields = ["OpName"];
  let ColFields = ["AsmVariantName"];
  let KeyCol = ["Default"];
  let ValueCols = [["SDWA"]];
}

def getMaskedMIMGOp : InstrMapping {
  let FilterClass = "MIMG_Mask";
  let RowFields = ["Op"];