AMDGPU: Simplify foldImmediate with register class based checks #154682
Conversation
This stack of pull requests is managed by Graphite.
@llvm/pr-subscribers-backend-amdgpu

Author: Matt Arsenault (arsenm)

Changes: Generalize the code over the properties of the mov instruction, rather than maintaining parallel logic to figure out the type of mov to use. This is NFC-ish. It now does a better job with imm pseudos, which practically won't reach here. I added a couple of new tests with 16-bit extract of 64-bit sources.

Full diff: https://github.com/llvm/llvm-project/pull/154682.diff

2 Files Affected: llvm/lib/Target/AMDGPU/SIInstrInfo.cpp, llvm/test/CodeGen/AMDGPU/peephole-fold-imm.mir
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
index df638bd65bdaa..75b303086163b 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
@@ -3573,54 +3573,93 @@ bool SIInstrInfo::foldImmediate(MachineInstr &UseMI, MachineInstr &DefMI,
assert(!UseMI.getOperand(0).getSubReg() && "Expected SSA form");
Register DstReg = UseMI.getOperand(0).getReg();
- unsigned OpSize = getOpSize(UseMI, 0);
- bool Is16Bit = OpSize == 2;
- bool Is64Bit = OpSize == 8;
- bool isVGPRCopy = RI.isVGPR(*MRI, DstReg);
- unsigned NewOpc = isVGPRCopy ? Is64Bit ? AMDGPU::V_MOV_B64_PSEUDO
- : AMDGPU::V_MOV_B32_e32
- : Is64Bit ? AMDGPU::S_MOV_B64_IMM_PSEUDO
- : AMDGPU::S_MOV_B32;
-
- std::optional<int64_t> SubRegImm =
- extractSubregFromImm(Imm, UseMI.getOperand(1).getSubReg());
-
- APInt Imm(Is64Bit ? 64 : 32, *SubRegImm,
- /*isSigned=*/true, /*implicitTrunc=*/true);
-
- if (RI.isAGPR(*MRI, DstReg)) {
- if (Is64Bit || !isInlineConstant(Imm))
- return false;
- NewOpc = AMDGPU::V_ACCVGPR_WRITE_B32_e64;
- }
+ Register UseSubReg = UseMI.getOperand(1).getSubReg();
+
+ const TargetRegisterClass *DstRC = RI.getRegClassForReg(*MRI, DstReg);
+
+ bool Is16Bit = UseSubReg != AMDGPU::NoSubRegister &&
+ RI.getSubRegIdxSize(UseSubReg) == 16;
if (Is16Bit) {
- if (isVGPRCopy)
+ if (RI.hasVGPRs(DstRC))
return false; // Do not clobber vgpr_hi16
- if (DstReg.isVirtual() && UseMI.getOperand(0).getSubReg() != AMDGPU::lo16)
+ if (DstReg.isVirtual() && UseSubReg != AMDGPU::lo16)
return false;
-
- UseMI.getOperand(0).setSubReg(0);
- if (DstReg.isPhysical()) {
- DstReg = RI.get32BitRegister(DstReg);
- UseMI.getOperand(0).setReg(DstReg);
- }
- assert(UseMI.getOperand(1).getReg().isVirtual());
}
MachineFunction *MF = UseMI.getMF();
- const MCInstrDesc &NewMCID = get(NewOpc);
- const TargetRegisterClass *NewDefRC = getRegClass(NewMCID, 0, &RI, *MF);
- if (DstReg.isPhysical()) {
- if (!NewDefRC->contains(DstReg))
- return false;
- } else if (!MRI->constrainRegClass(DstReg, NewDefRC))
+ unsigned NewOpc = AMDGPU::INSTRUCTION_LIST_END;
+ MCRegister MovDstPhysReg =
+ DstReg.isPhysical() ? MCRegister(DstReg) : MCRegister();
+
+ std::optional<int64_t> SubRegImm = extractSubregFromImm(Imm, UseSubReg);
+
+ // TODO: Try to fold with AMDGPU::V_MOV_B16_t16_e64
+ for (unsigned MovOp :
+ {AMDGPU::S_MOV_B32, AMDGPU::V_MOV_B32_e32, AMDGPU::S_MOV_B64,
+ AMDGPU::V_MOV_B64_PSEUDO, AMDGPU::V_ACCVGPR_WRITE_B32_e64}) {
+ const MCInstrDesc &MovDesc = get(MovOp);
+
+ const TargetRegisterClass *MovDstRC = getRegClass(MovDesc, 0, &RI, *MF);
+ if (Is16Bit) {
+ // We just need to find a correctly sized register class, so the
+ // subregister index compatibility doesn't matter since we're statically
+ // extracting the immediate value.
+ MovDstRC = RI.getMatchingSuperRegClass(MovDstRC, DstRC, AMDGPU::lo16);
+ if (!MovDstRC)
+ continue;
+
+ if (MovDstPhysReg) {
+ // FIXME: We probably should not do this. If there is a live value in
+ // the high half of the register, it will be corrupted.
+ MovDstPhysReg = RI.getMatchingSuperReg(MCRegister(DstReg),
+ AMDGPU::lo16, MovDstRC);
+ if (!MovDstPhysReg)
+ continue;
+ }
+ }
+
+ // Result class isn't the right size, try the next instruction.
+ if (MovDstPhysReg) {
+ if (!MovDstRC->contains(MovDstPhysReg))
+ return false;
+ } else if (!MRI->constrainRegClass(DstReg, MovDstRC)) {
+ // TODO: This will be overly conservative in the case of 16-bit virtual
+ // SGPRs. We could hack up the virtual register uses to use a compatible
+ // 32-bit class.
+ continue;
+ }
+
+ const MCOperandInfo &OpInfo = MovDesc.operands()[1];
+
+ // Ensure the interpreted immediate value is a valid operand in the new
+ // mov.
+ //
+ // FIXME: isImmOperandLegal should have form that doesn't require existing
+ // MachineInstr or MachineOperand
+ if (!RI.opCanUseLiteralConstant(OpInfo.OperandType) &&
+ !isInlineConstant(*SubRegImm, OpInfo.OperandType))
+ break;
+
+ NewOpc = MovOp;
+ break;
+ }
+
+ if (NewOpc == AMDGPU::INSTRUCTION_LIST_END)
return false;
+ if (Is16Bit) {
+ UseMI.getOperand(0).setSubReg(AMDGPU::NoSubRegister);
+ if (MovDstPhysReg)
+ UseMI.getOperand(0).setReg(MovDstPhysReg);
+ assert(UseMI.getOperand(1).getReg().isVirtual());
+ }
+
+ const MCInstrDesc &NewMCID = get(NewOpc);
UseMI.setDesc(NewMCID);
- UseMI.getOperand(1).ChangeToImmediate(Imm.getSExtValue());
+ UseMI.getOperand(1).ChangeToImmediate(*SubRegImm);
UseMI.addImplicitDefUseOperands(*MF);
return true;
}
diff --git a/llvm/test/CodeGen/AMDGPU/peephole-fold-imm.mir b/llvm/test/CodeGen/AMDGPU/peephole-fold-imm.mir
index 770e7c048620d..7b3f924d32e89 100644
--- a/llvm/test/CodeGen/AMDGPU/peephole-fold-imm.mir
+++ b/llvm/test/CodeGen/AMDGPU/peephole-fold-imm.mir
@@ -128,7 +128,7 @@ body: |
; GCN-LABEL: name: fold_sreg_64_sub0_to_vgpr_32
; GCN: [[S_MOV_B:%[0-9]+]]:sreg_64 = S_MOV_B64_IMM_PSEUDO 1311768467750121200
- ; GCN-NEXT: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 -1412567312, implicit $exec
+ ; GCN-NEXT: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 2882399984, implicit $exec
; GCN-NEXT: SI_RETURN_TO_EPILOG [[V_MOV_B32_e32_]]
%0:sreg_64 = S_MOV_B64_IMM_PSEUDO 1311768467750121200
%1:vgpr_32 = COPY killed %0.sub0
@@ -188,8 +188,7 @@ body: |
; GCN-LABEL: name: fold_sreg_64_to_sreg_64
; GCN: [[S_MOV_B64_:%[0-9]+]]:sreg_64 = S_MOV_B64 1311768467750121200
- ; GCN-NEXT: [[S_MOV_B:%[0-9]+]]:sreg_64 = S_MOV_B64_IMM_PSEUDO 1311768467750121200
- ; GCN-NEXT: SI_RETURN_TO_EPILOG [[S_MOV_B]]
+ ; GCN-NEXT: SI_RETURN_TO_EPILOG [[S_MOV_B64_]]
%0:sreg_64 = S_MOV_B64 1311768467750121200
%1:sreg_64 = COPY killed %0
SI_RETURN_TO_EPILOG %1
@@ -382,7 +381,7 @@ body: |
; GCN-LABEL: name: fold_v_mov_b64_64_sub0_to_vgpr_32
; GCN: [[V_MOV_B64_e32_:%[0-9]+]]:vreg_64_align2 = V_MOV_B64_e32 1311768467750121200, implicit $exec
- ; GCN-NEXT: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 -1412567312, implicit $exec
+ ; GCN-NEXT: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 2882399984, implicit $exec
; GCN-NEXT: SI_RETURN_TO_EPILOG [[V_MOV_B32_e32_]]
%0:vreg_64_align2 = V_MOV_B64_e32 1311768467750121200, implicit $exec
%1:vgpr_32 = COPY killed %0.sub0
@@ -761,8 +760,8 @@ body: |
bb.0:
; GCN-LABEL: name: fold_av_mov_b32_imm_pseudo_inlineimm_to_av
; GCN: [[AV_MOV_:%[0-9]+]]:av_32 = AV_MOV_B32_IMM_PSEUDO 64, implicit $exec
- ; GCN-NEXT: [[COPY:%[0-9]+]]:av_32 = COPY killed [[AV_MOV_]]
- ; GCN-NEXT: SI_RETURN_TO_EPILOG implicit [[COPY]]
+ ; GCN-NEXT: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 64, implicit $exec
+ ; GCN-NEXT: SI_RETURN_TO_EPILOG implicit [[V_MOV_B32_e32_]]
%0:av_32 = AV_MOV_B32_IMM_PSEUDO 64, implicit $exec
%1:av_32 = COPY killed %0
SI_RETURN_TO_EPILOG implicit %1
@@ -800,9 +799,67 @@ body: |
bb.0:
; GCN-LABEL: name: fold_av_mov_b64_imm_pseudo_inlineimm_to_av
; GCN: [[AV_MOV_:%[0-9]+]]:av_64_align2 = AV_MOV_B64_IMM_PSEUDO 64, implicit $exec
- ; GCN-NEXT: [[COPY:%[0-9]+]]:av_64_align2 = COPY killed [[AV_MOV_]]
- ; GCN-NEXT: SI_RETURN_TO_EPILOG implicit [[COPY]]
+ ; GCN-NEXT: [[V_MOV_B:%[0-9]+]]:vreg_64_align2 = V_MOV_B64_PSEUDO 64, implicit $exec
+ ; GCN-NEXT: SI_RETURN_TO_EPILOG implicit [[V_MOV_B]]
%0:av_64_align2 = AV_MOV_B64_IMM_PSEUDO 64, implicit $exec
%1:av_64_align2 = COPY killed %0
SI_RETURN_TO_EPILOG implicit %1
...
+
+---
+name: fold_simm_16_sub_to_lo_from_mov_64_virt_sgpr16
+body: |
+ bb.0:
+
+ ; GCN-LABEL: name: fold_simm_16_sub_to_lo_from_mov_64_virt_sgpr16
+ ; GCN: [[S_MOV_B64_:%[0-9]+]]:sreg_64 = S_MOV_B64 64
+ ; GCN-NEXT: [[COPY:%[0-9]+]]:sgpr_lo16 = COPY killed [[S_MOV_B64_]].lo16
+ ; GCN-NEXT: SI_RETURN_TO_EPILOG [[COPY]]
+ %0:sreg_64 = S_MOV_B64 64
+ %1:sgpr_lo16 = COPY killed %0.lo16
+ SI_RETURN_TO_EPILOG %1
+
+...
+---
+name: fold_simm_16_sub_to_hi_from_mov_64_inline_imm_virt_sgpr16
+body: |
+ bb.0:
+
+ ; GCN-LABEL: name: fold_simm_16_sub_to_hi_from_mov_64_inline_imm_virt_sgpr16
+ ; GCN: [[S_MOV_B64_:%[0-9]+]]:sreg_64 = S_MOV_B64 64
+ ; GCN-NEXT: [[COPY:%[0-9]+]]:sgpr_lo16 = COPY killed [[S_MOV_B64_]].hi16
+ ; GCN-NEXT: SI_RETURN_TO_EPILOG [[COPY]]
+ %0:sreg_64 = S_MOV_B64 64
+ %1:sgpr_lo16 = COPY killed %0.hi16
+ SI_RETURN_TO_EPILOG %1
+
+...
+
+---
+name: fold_simm_16_sub_to_lo_from_mov_64_phys_sgpr16_lo
+body: |
+ bb.0:
+
+ ; GCN-LABEL: name: fold_simm_16_sub_to_lo_from_mov_64_phys_sgpr16_lo
+ ; GCN: [[S_MOV_B64_:%[0-9]+]]:sreg_64 = S_MOV_B64 64
+ ; GCN-NEXT: $sgpr0 = S_MOV_B32 64
+ ; GCN-NEXT: SI_RETURN_TO_EPILOG $sgpr0_lo16
+ %0:sreg_64 = S_MOV_B64 64
+ $sgpr0_lo16 = COPY killed %0.lo16
+ SI_RETURN_TO_EPILOG $sgpr0_lo16
+
+...
+---
+name: fold_simm_16_sub_to_hi_from_mov_64_inline_imm_phys_sgpr16_lo
+body: |
+ bb.0:
+
+ ; GCN-LABEL: name: fold_simm_16_sub_to_hi_from_mov_64_inline_imm_phys_sgpr16_lo
+ ; GCN: [[S_MOV_B64_:%[0-9]+]]:sreg_64 = S_MOV_B64 64
+ ; GCN-NEXT: $sgpr0 = S_MOV_B32 0
+ ; GCN-NEXT: SI_RETURN_TO_EPILOG $sgpr0_lo16
+ %0:sreg_64 = S_MOV_B64 64
+ $sgpr0_lo16 = COPY killed %0.hi16
+ SI_RETURN_TO_EPILOG $sgpr0_lo16
+
+...
Force-pushed from 10ca969 to a5d4616.
  ; GCN-LABEL: name: fold_sreg_64_sub0_to_vgpr_32
  ; GCN: [[S_MOV_B:%[0-9]+]]:sreg_64 = S_MOV_B64_IMM_PSEUDO 1311768467750121200
- ; GCN-NEXT: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 -1412567312, implicit $exec
+ ; GCN-NEXT: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 2882399984, implicit $exec
I think this is the immediate rendering change, from sign extended to zero extended. Why is it OK for the transformation to choose which one to use?
Yes. The read value is only the low 32 bits, since it's a 32-bit operand.
I see, thanks. There are some implicit transformations later such that these are the same value.
Although I'd consider this a bug, we try to consistently keep immediates encoded as sign extended. The non-APInt forms of isInlineImmediate will miss cases that aren't properly sign extended.
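
As a quick check of the two renderings discussed above (a minimal standalone C++ snippet; the two constants come from the test diff, everything else is illustrative):

#include <cassert>
#include <cstdint>

int main() {
  // Both renderings name the same 32-bit pattern; only the printed
  // interpretation differs (sign extended vs. zero extended).
  int64_t SignExtended = -1412567312; // old rendering
  int64_t ZeroExtended = 2882399984;  // new rendering
  // Only the low 32 bits are read by the 32-bit mov operand,
  // and both values truncate to 0xABCDEEF0.
  assert(static_cast<uint32_t>(SignExtended) ==
         static_cast<uint32_t>(ZeroExtended));
  return 0;
}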
LGTM
Force-pushed from a5d4616 to a32d42e.
Force-pushed from dfd9b00 to c9ad541.
Force-pushed from c9ad541 to 0a06817.
LLVM Buildbot has detected a new failure on a builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/143/builds/10221
Generalize the code over the properties of the mov instruction,
rather than maintaining parallel logic to figure out the type
of mov to use. I've maintained the behavior with 16-bit physical
SGPRs, though I think the behavior here is broken and corrupting
any value that happens to be live in the high bits. It just happens
there's no way to separately write to those with a real instruction,
but I don't think we should be trying to make assumptions around
that property.
This is NFC-ish. It now does a better job with imm pseudos which
practically won't reach here. This also will make it easier
to support more folds in a future patch.
I added a couple of new tests with 16-bit extract of 64-bit sources.
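
For illustration, here is a minimal standalone sketch of the subregister-immediate extraction those tests exercise. The real code calls SIInstrInfo::extractSubregFromImm; the enum and function below are hypothetical stand-ins covering only the lo16/hi16 cases from the new tests:

#include <cstdint>
#include <optional>

// Hypothetical stand-in for AMDGPU's subregister indexes.
enum class SubRegIdx { None, Lo16, Hi16 };

// Hypothetical stand-in for SIInstrInfo::extractSubregFromImm: statically
// compute the value a subregister read would observe from an immediate.
std::optional<int64_t> extractSubregImm(int64_t Imm, SubRegIdx Idx) {
  switch (Idx) {
  case SubRegIdx::None:
    return Imm;                              // no subregister: whole value
  case SubRegIdx::Lo16:
    return static_cast<uint16_t>(Imm);       // bits [15:0]
  case SubRegIdx::Hi16:
    return static_cast<uint16_t>(Imm >> 16); // bits [31:16]
  }
  return std::nullopt;
}

// For S_MOV_B64 64: Lo16 yields 64 and Hi16 yields 0, matching the
// $sgpr0 = S_MOV_B32 64 and $sgpr0 = S_MOV_B32 0 results in the new tests.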