broxigarchen
Contributor

@broxigarchen broxigarchen commented Oct 3, 2025

In true16 mode, v_mov_b16_t16 was added as a new foldable copy instruction, but its source operand is at a different operand index than in the other foldable copies.

Use the correct source operand index for v_mov_b16_t16.

@broxigarchen broxigarchen marked this pull request as ready for review October 3, 2025 02:23
@llvmbot
Member

llvmbot commented Oct 3, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Brox Chen (broxigarchen)

Changes

In true16 mode, v_mov_b16_t16 was added as a new foldable copy instruction, but its source operand is at a different operand index than in the other foldable copies.

The folding pass has a bug: it does not use the correct source operand index for v_mov_b16_t16.


Full diff: https://github.com/llvm/llvm-project/pull/161764.diff

3 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/SIFoldOperands.cpp (+3-1)
  • (modified) llvm/lib/Target/AMDGPU/SIInstrInfo.cpp (+26)
  • (modified) llvm/lib/Target/AMDGPU/SIInstrInfo.h (+1)
diff --git a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
index fed37788802b9..c0eee325b9114 100644
--- a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
+++ b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
@@ -931,7 +931,9 @@ static MachineOperand *lookUpCopyChain(const SIInstrInfo &TII,
   for (MachineInstr *SubDef = MRI.getVRegDef(SrcReg);
        SubDef && TII.isFoldableCopy(*SubDef);
        SubDef = MRI.getVRegDef(Sub->getReg())) {
-    MachineOperand &SrcOp = SubDef->getOperand(1);
+    unsigned SrcIdx = TII.getFoldableCopySrcIdx(*SubDef);
+    MachineOperand &SrcOp = SubDef->getOperand(SrcIdx);
+
     if (SrcOp.isImm())
       return &SrcOp;
     if (!SrcOp.isReg() || SrcOp.getReg().isPhysical())
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
index 56435a50c87ad..28dfae5c116cb 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
@@ -3435,6 +3435,32 @@ bool SIInstrInfo::isFoldableCopy(const MachineInstr &MI) {
   }
 }
 
+unsigned SIInstrInfo::getFoldableCopySrcIdx(const MachineInstr &MI) {
+  switch (MI.getOpcode()) {
+  case AMDGPU::V_MOV_B16_t16_e32:
+  case AMDGPU::V_MOV_B16_t16_e64:
+    return 2;
+  case AMDGPU::V_MOV_B32_e32:
+  case AMDGPU::V_MOV_B32_e64:
+  case AMDGPU::V_MOV_B64_PSEUDO:
+  case AMDGPU::V_MOV_B64_e32:
+  case AMDGPU::V_MOV_B64_e64:
+  case AMDGPU::S_MOV_B32:
+  case AMDGPU::S_MOV_B64:
+  case AMDGPU::S_MOV_B64_IMM_PSEUDO:
+  case AMDGPU::COPY:
+  case AMDGPU::WWM_COPY:
+  case AMDGPU::V_ACCVGPR_WRITE_B32_e64:
+  case AMDGPU::V_ACCVGPR_READ_B32_e64:
+  case AMDGPU::V_ACCVGPR_MOV_B32:
+  case AMDGPU::AV_MOV_B32_IMM_PSEUDO:
+  case AMDGPU::AV_MOV_B64_IMM_PSEUDO:
+    return 1;
+  default:
+    assert(0 && "MI is not a foldable copy");
+  }
+}
+
 static constexpr AMDGPU::OpName ModifierOpNames[] = {
     AMDGPU::OpName::src0_modifiers, AMDGPU::OpName::src1_modifiers,
     AMDGPU::OpName::src2_modifiers, AMDGPU::OpName::clamp,
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.h b/llvm/lib/Target/AMDGPU/SIInstrInfo.h
index a21089f8e0fcc..cc59acf1ebd94 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.h
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.h
@@ -417,6 +417,7 @@ class SIInstrInfo final : public AMDGPUGenInstrInfo {
                                   const MachineInstr &MIb) const override;
 
   static bool isFoldableCopy(const MachineInstr &MI);
+  static unsigned getFoldableCopySrcIdx(const MachineInstr &MI);
 
   void removeModOperands(MachineInstr &MI) const;
 

Contributor

@arsenm arsenm left a comment

Missing test

@@ -931,7 +931,9 @@ static MachineOperand *lookUpCopyChain(const SIInstrInfo &TII,
   for (MachineInstr *SubDef = MRI.getVRegDef(SrcReg);
        SubDef && TII.isFoldableCopy(*SubDef);
        SubDef = MRI.getVRegDef(Sub->getReg())) {
-    MachineOperand &SrcOp = SubDef->getOperand(1);
+    unsigned SrcIdx = TII.getFoldableCopySrcIdx(*SubDef);
+    MachineOperand &SrcOp = SubDef->getOperand(SrcIdx);
Contributor

Use getNamedOperandIdx? Also, we should probably just fix the instruction so that the source operand stays at index 1.

Contributor Author

@broxigarchen broxigarchen Oct 3, 2025

It seems that not all foldable copies have a named src0 operand.

v_mov_b16_t16 has opsel, so its operand 1 is src0_modifiers rather than src0.
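
To make the index mismatch concrete, here is a minimal standalone sketch. It does not use LLVM: `ToyInst`, `makeVMovB32`, `makeVMovB16T16`, and `toyFoldableCopySrcIdx` are hypothetical names, and the operand lists are simplified assumptions based on the comment above (trailing operands such as op_sel are omitted).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a machine instruction: an opcode name plus the ordered
// names of its operands. Operand 0 is the destination in both layouts.
struct ToyInst {
  std::string Opcode;
  std::vector<std::string> Operands;
};

// Assumed layout: the source directly follows the destination.
inline ToyInst makeVMovB32() { return {"V_MOV_B32_e32", {"vdst", "src0"}}; }

// Assumed layout: src0_modifiers sits between the destination and src0.
inline ToyInst makeVMovB16T16() {
  return {"V_MOV_B16_t16_e64", {"vdst", "src0_modifiers", "src0"}};
}

// Same shape as the helper in the diff: index 2 for the true16 move,
// index 1 for everything else this toy model knows about.
inline unsigned toyFoldableCopySrcIdx(const ToyInst &MI) {
  if (MI.Opcode == "V_MOV_B16_t16_e32" || MI.Opcode == "V_MOV_B16_t16_e64")
    return 2;
  assert(MI.Operands.size() > 1 && "expected a copy-like instruction");
  return 1;
}
```

In this model, indexing a true16 move with a hard-coded 1 yields `src0_modifiers`, which is the operand the folding pass was reading by mistake.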

Contributor

Fix the mismatched operand name, then; I don't want to maintain this switch.

Contributor

The case that will fail is COPY, but you can just special-case that one.
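
The alternative suggested in this thread (look the source up by operand name and special-case COPY) might look roughly like the standalone sketch below. It is modeled loosely on the idea behind `AMDGPU::getNamedOperandIdx` but does not call it; `NamedInst`, `namedOperandIdx`, and `srcIdxByName` are hypothetical names, and the operand lists are simplified assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy instruction: opcode name plus per-operand names. An empty string
// stands for an operand that has no name, as assumed here for COPY.
struct NamedInst {
  std::string Opcode;
  std::vector<std::string> OperandNames;
};

// Returns the index of the operand called Name, or -1 if there is none.
inline int namedOperandIdx(const NamedInst &MI, const std::string &Name) {
  for (std::size_t I = 0; I < MI.OperandNames.size(); ++I)
    if (MI.OperandNames[I] == Name)
      return static_cast<int>(I);
  return -1;
}

// Name-based lookup with one special case: COPY has no named src0 in this
// model, so its source is taken to be operand 1.
inline int srcIdxByName(const NamedInst &MI) {
  if (MI.Opcode == "COPY")
    return 1;
  int Idx = namedOperandIdx(MI, "src0");
  assert(Idx >= 0 && "instruction has no operand named src0");
  return Idx;
}
```

The trade-off discussed above is visible here: no per-opcode switch has to be kept in sync, but every foldable copy other than COPY must actually name its source operand src0.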

Contributor Author

@broxigarchen broxigarchen Oct 4, 2025

I thought a helper would be better, since people might forget about this special case when adding new code. But if you think maintaining the switch is more costly, I'll create a follow-up patch to remove it.

@broxigarchen
Contributor Author

Missing test

Added a test.

@broxigarchen broxigarchen force-pushed the main-fix-true16-si-fold branch from eb27475 to c3a4a1e Compare October 3, 2025 14:28
@broxigarchen broxigarchen force-pushed the main-fix-true16-si-fold branch from c3a4a1e to 921341f Compare October 3, 2025 14:33
@broxigarchen
Contributor Author

The CI error is unrelated. The Windows CI is passing.

@broxigarchen
Contributor Author

Ping! This is blocking ROCm/rocMLIR#2011, so it is somewhat urgent.

Contributor

@Sisyph Sisyph left a comment

LGTM

@@ -932,7 +931,9 @@ static MachineOperand *lookUpCopyChain(const SIInstrInfo &TII,
   for (MachineInstr *SubDef = MRI.getVRegDef(SrcReg);
        SubDef && TII.isFoldableCopy(*SubDef);
Contributor

I haven't checked the whole call chain if this property is already checked, but we should probably check that src_modifiers are 0 on the v_mov_b16 inside isFoldableCopy. I don't think we will set them, but safer to check. That can be a separate PR.
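
The extra safety check proposed here could take roughly the following form. This is a standalone sketch, not the LLVM implementation: `ToyCopy` and `toyIsFoldableCopy` are hypothetical names, and the instruction is reduced to the two facts the check needs.

```cpp
#include <cassert>
#include <cstdint>

// Toy view of an instruction that is already known to be copy-like.
struct ToyCopy {
  bool IsTrue16Mov;           // true for the v_mov_b16_t16 variants
  std::int64_t Src0Modifiers; // immediate value of src0_modifiers, if any
};

// A true16 move only behaves like a plain copy when no source modifier bits
// are set; other copy-like instructions in this model carry no modifiers.
inline bool toyIsFoldableCopy(const ToyCopy &MI) {
  if (MI.IsTrue16Mov)
    return MI.Src0Modifiers == 0;
  return true;
}
```

With a check of this kind, a v_mov_b16_t16 that carries a nonzero modifier would be rejected before the fold looks at its source operand.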

Contributor Author

Right. I think isFoldableCopy needs to be updated as well. I'll create another patch for the cleanup, and might also replace the if-else check with the helper.

@broxigarchen broxigarchen enabled auto-merge (squash) October 3, 2025 21:34
@broxigarchen broxigarchen disabled auto-merge October 3, 2025 21:34
@broxigarchen broxigarchen merged commit b8127cc into llvm:main Oct 3, 2025
8 of 9 checks passed