2 changes: 2 additions & 0 deletions llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
@@ -868,6 +868,8 @@ bool SIFoldOperandsImpl::tryAddToFoldList(
// Make sure to get the 32-bit version of the commuted opcode.
unsigned MaybeCommutedOpc = MI->getOpcode();
Op32 = AMDGPU::getVOPe32(MaybeCommutedOpc);
if (TII->pseudoToMCOpcode(Op32) == -1)
Contributor

SIInstrInfo::commuteOpcode already checks this; is this being skipped somehow?

Collaborator Author

I think the difference here is that this code is trying to find the VOPe32 variant, which might not exist even if the commuted opcode does (this is part of the shrink).

Contributor

I think this check should be pulled into a helper function in SIInstrInfo instead of directly using pseudoToMCOpcode. We already have hasVALU32BitEncoding, but it returns a bool instead of the new opcode. Can you generalize that to return the shrunk opcode if valid?

return false;
}

appendFoldCandidate(FoldList, MI, CommuteOpNo, OpToFold, /*Commuted=*/true,
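
The review suggestion in this thread (one helper that returns the shrunk opcode only when it is valid, instead of a bare bool) could look roughly like the sketch below. This is a self-contained toy model, not LLVM code: the `Toy*` opcodes, the two lookup functions, and `toyGetShrunkOpcode` are all hypothetical stand-ins for the e64-to-e32 lookup and the `pseudoToMCOpcode` check, and the choice of which opcode lacks an encoding is an assumption made purely for illustration.

```cpp
#include <optional>
#include <unordered_map>

// All names below are hypothetical stand-ins, not LLVM APIs.
enum ToyOpcode : int {
  TOY_ADD_CO_e64 = 1,
  TOY_ADD_CO_e32 = 2,
  TOY_MUL_e64 = 3,
  TOY_MUL_e32 = 4,
  TOY_FMA_e64 = 5, // has no e32 form at all in this toy table
};

// Plays the role of an "e64 -> e32" lookup; returns -1 when no e32 form exists.
inline int toyGetVOPe32(int Opc) {
  static const std::unordered_map<int, int> Table = {
      {TOY_ADD_CO_e64, TOY_ADD_CO_e32},
      {TOY_MUL_e64, TOY_MUL_e32},
  };
  auto It = Table.find(Opc);
  return It == Table.end() ? -1 : It->second;
}

// Plays the role of a "pseudo -> encodable opcode" lookup for one subtarget;
// returns -1 when the pseudo has no encoding there. For illustration only,
// this toy subtarget is assumed to lack TOY_ADD_CO_e32.
inline int toyPseudoToMCOpcode(int Opc) {
  return Opc == TOY_ADD_CO_e32 ? -1 : Opc + 100;
}

// Combined helper: the shrunk opcode if it both exists and is encodable,
// otherwise std::nullopt.
inline std::optional<int> toyGetShrunkOpcode(int Opc) {
  int Op32 = toyGetVOPe32(Opc);
  if (Op32 == -1 || toyPseudoToMCOpcode(Op32) == -1)
    return std::nullopt;
  return Op32;
}
```

With a helper of this shape, a caller would bail out on `std::nullopt` rather than checking the lookup and the encoding separately, which is the duplication the reviewer wants to avoid.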
39 changes: 39 additions & 0 deletions llvm/test/CodeGen/AMDGPU/fold-abs64.mir
@@ -0,0 +1,39 @@
# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
# RUN: llc -mtriple=amdgcn -mcpu=gfx1250 -run-pass=si-fold-operands %s -o - | FileCheck %s

--- |
  @sym = external constant i32
  define void @fn() { ret void }
  define void @fn2() { ret void }
...

---
name: fn
tracksRegLiveness: true
body: |
  bb.0:
    ; CHECK-LABEL: name: fn
    ; CHECK: [[DEF:%[0-9]+]]:vreg_64 = IMPLICIT_DEF
    ; CHECK-NEXT: [[S_MOV_B64_:%[0-9]+]]:sreg_64 = S_MOV_B64 target-flags(amdgpu-abs64) @sym
    ; CHECK-NEXT: [[V_ADD_CO_U32_e64_:%[0-9]+]]:vgpr_32, [[V_ADD_CO_U32_e64_1:%[0-9]+]]:sreg_32_xm0_xexec = V_ADD_CO_U32_e64 undef [[DEF]].sub0, undef [[S_MOV_B64_]].sub0, 0, implicit $exec
    ; CHECK-NEXT: S_ENDPGM 0, implicit [[V_ADD_CO_U32_e64_]]
    %0:vreg_64 = IMPLICIT_DEF
    %1:sreg_64 = S_MOV_B64 target-flags(amdgpu-abs64) @sym
    %2:vgpr_32, %3:sreg_32_xm0_xexec = V_ADD_CO_U32_e64 undef %1.sub0, undef %0.sub0, 0, implicit $exec
    S_ENDPGM 0, implicit %2
...

---
name: fn2
tracksRegLiveness: true
body: |
  bb.0:
    ; CHECK-LABEL: name: fn2
    ; CHECK: [[DEF:%[0-9]+]]:vreg_64 = IMPLICIT_DEF
    ; CHECK-NEXT: [[V_ADD_CO_U32_e64_:%[0-9]+]]:vgpr_32, [[V_ADD_CO_U32_e64_1:%[0-9]+]]:sreg_32_xm0_xexec = V_ADD_CO_U32_e64 591751049, undef [[DEF]].sub0, 0, implicit $exec
    ; CHECK-NEXT: S_ENDPGM 0, implicit [[V_ADD_CO_U32_e64_]]
    %0:vreg_64 = IMPLICIT_DEF
    %1:sreg_64 = S_MOV_B64 4886718345
    %2:vgpr_32, %3:sreg_32_xm0_xexec = V_ADD_CO_U32_e64 undef %1.sub0, undef %0.sub0, 0, implicit $exec
    S_ENDPGM 0, implicit %2
...
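
The immediate expected in the `fn2` CHECK line follows from the `S_MOV_B64` constant: 4886718345 is 0x123456789, and reading `.sub0` selects its low 32 bits, 0x23456789, which is 591751049. The arithmetic can be checked with a small standalone sketch; `lowHalf` and `highHalf` are hypothetical helper names, not functions from the patch.

```cpp
#include <cstdint>

// Low 32 bits of a 64-bit immediate: the value a sub0 read corresponds to.
constexpr std::uint32_t lowHalf(std::uint64_t Imm) {
  return static_cast<std::uint32_t>(Imm & 0xFFFFFFFFull);
}

// High 32 bits of a 64-bit immediate: the value a sub1 read corresponds to.
constexpr std::uint32_t highHalf(std::uint64_t Imm) {
  return static_cast<std::uint32_t>(Imm >> 32);
}

// 4886718345 == 0x1'23456789, so the halves are 0x23456789 and 0x1.
static_assert(lowHalf(4886718345ull) == 591751049u, "sub0 of the test constant");
static_assert(highHalf(4886718345ull) == 1u, "sub1 of the test constant");
```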