-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[AMDGPU][True16][CodeGen] fix v_mov_b16_t16 index in folding pass #161764
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-backend-amdgpu Author: Brox Chen (broxigarchen) ChangesWith true16 mode v_mov_b16_t16 is added as new foldable copy inst, but the src operand is in different index. There is a bug in folding pass that are not using the correct src index for v_mov_b16_t16. Full diff: https://github.com/llvm/llvm-project/pull/161764.diff 3 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
index fed37788802b9..c0eee325b9114 100644
--- a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
+++ b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
@@ -931,7 +931,9 @@ static MachineOperand *lookUpCopyChain(const SIInstrInfo &TII,
for (MachineInstr *SubDef = MRI.getVRegDef(SrcReg);
SubDef && TII.isFoldableCopy(*SubDef);
SubDef = MRI.getVRegDef(Sub->getReg())) {
- MachineOperand &SrcOp = SubDef->getOperand(1);
+ unsigned SrcIdx = TII.getFoldableCopySrcIdx(*SubDef);
+ MachineOperand &SrcOp = SubDef->getOperand(SrcIdx);
+
if (SrcOp.isImm())
return &SrcOp;
if (!SrcOp.isReg() || SrcOp.getReg().isPhysical())
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
index 56435a50c87ad..28dfae5c116cb 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
@@ -3435,6 +3435,32 @@ bool SIInstrInfo::isFoldableCopy(const MachineInstr &MI) {
}
}
+unsigned SIInstrInfo::getFoldableCopySrcIdx(const MachineInstr &MI) {
+ switch (MI.getOpcode()) {
+ case AMDGPU::V_MOV_B16_t16_e32:
+ case AMDGPU::V_MOV_B16_t16_e64:
+ return 2;
+ case AMDGPU::V_MOV_B32_e32:
+ case AMDGPU::V_MOV_B32_e64:
+ case AMDGPU::V_MOV_B64_PSEUDO:
+ case AMDGPU::V_MOV_B64_e32:
+ case AMDGPU::V_MOV_B64_e64:
+ case AMDGPU::S_MOV_B32:
+ case AMDGPU::S_MOV_B64:
+ case AMDGPU::S_MOV_B64_IMM_PSEUDO:
+ case AMDGPU::COPY:
+ case AMDGPU::WWM_COPY:
+ case AMDGPU::V_ACCVGPR_WRITE_B32_e64:
+ case AMDGPU::V_ACCVGPR_READ_B32_e64:
+ case AMDGPU::V_ACCVGPR_MOV_B32:
+ case AMDGPU::AV_MOV_B32_IMM_PSEUDO:
+ case AMDGPU::AV_MOV_B64_IMM_PSEUDO:
+ return 1;
+ default:
+ assert(0 && "MI is not a foldable copy");
+ }
+}
+
static constexpr AMDGPU::OpName ModifierOpNames[] = {
AMDGPU::OpName::src0_modifiers, AMDGPU::OpName::src1_modifiers,
AMDGPU::OpName::src2_modifiers, AMDGPU::OpName::clamp,
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.h b/llvm/lib/Target/AMDGPU/SIInstrInfo.h
index a21089f8e0fcc..cc59acf1ebd94 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.h
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.h
@@ -417,6 +417,7 @@ class SIInstrInfo final : public AMDGPUGenInstrInfo {
const MachineInstr &MIb) const override;
static bool isFoldableCopy(const MachineInstr &MI);
+ static unsigned getFoldableCopySrcIdx(const MachineInstr &MI);
void removeModOperands(MachineInstr &MI) const;
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing test
@@ -931,7 +931,9 @@ static MachineOperand *lookUpCopyChain(const SIInstrInfo &TII, | |||
for (MachineInstr *SubDef = MRI.getVRegDef(SrcReg); | |||
SubDef && TII.isFoldableCopy(*SubDef); | |||
SubDef = MRI.getVRegDef(Sub->getReg())) { | |||
MachineOperand &SrcOp = SubDef->getOperand(1); | |||
unsigned SrcIdx = TII.getFoldableCopySrcIdx(*SubDef); | |||
MachineOperand &SrcOp = SubDef->getOperand(SrcIdx); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use getNetNamedOperandIdx? Also probably should just fix the instruction to keep this in index 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems not all foldable copy has named operand src0
the v_mov_b16_t16 has opsel thus the operand 1 is src0_mod
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix the mismatched operand name then, I don't want to maintain this switch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The case that will fall is COPY but you can just special case that one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought a helper is better since people might forget about this special case when they are adding new code. But if you think maintaining the switch is more costly, I'll create a follow up patch to remove it.
3b37d78
to
eb27475
Compare
added a test |
eb27475
to
c3a4a1e
Compare
c3a4a1e
to
921341f
Compare
CI error is unrelated. windows ci is passing |
ping! This is blocking ROCm/rocMLIR#2011 so would be a bit urgent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -932,7 +931,9 @@ static MachineOperand *lookUpCopyChain(const SIInstrInfo &TII, | |||
for (MachineInstr *SubDef = MRI.getVRegDef(SrcReg); | |||
SubDef && TII.isFoldableCopy(*SubDef); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't checked the whole call chain if this property is already checked, but we should probably check that src_modifiers are 0 on the v_mov_b16 inside isFoldableCopy. I don't think we will set them, but safer to check. That can be a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. I think this FoldableCopy need to updated as well. I'll create another patch for the clean up, might also replace the if else
check with the helper
With true16 mode v_mov_b16_t16 is added as new foldable copy inst, but the src operand is in different index.
Use the correct src index for v_mov_b16_t16.