Greedy: Make eviction broken hint cost use CopyCost units #160084

Conversation
Change the eviction advisor heuristic cost based on the number of broken hints to work in units of copy cost, rather than a magic number 1. The intent is to allow breaking hints for cheap subregisters in favor of more expensive register tuples.

The llvm.amdgcn.image.dim.gfx90a.ll change shows a simple example of the case I am attempting to solve. Use of tuples in ABI contexts ends up looking like this:

  %argN = COPY $vgprN
  %tuple = inst %argN
  $vgpr0 = COPY %tuple.sub0
  $vgpr1 = COPY %tuple.sub1
  $vgpr2 = COPY %tuple.sub2
  $vgpr3 = COPY %tuple.sub3

Since there are physreg copies in the input and output sequences, both have hints to a physreg. The wider tuple hint on the output should win, though, since it satisfies 4 hints instead of 1.

This is the obvious part of a larger change to better handle subregister interference with register tuples, and is not sufficient to handle the original case I am looking at. There are several bugs here that are proving tricky to untangle. In particular, there is a double counting bug for all registers with multiple regunits: the cost of breaking the interfering hint is added for each interfering virtual register, and those registers are visited repeatedly across regunits. Fixing the double counting badly regresses a number of RISCV tests, which seem to rely on overestimating the cost in tryFindEvictionCandidate to avoid early-exiting the eviction candidate loop (RISCV is possibly underestimating the copy costs for vector registers).
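As a rough standalone model of the cost change (this is not LLVM code, and the CopyCost numbers are illustrative assumptions, not values taken from the AMDGPU backend):

#include <cstdio>

// Toy model: BrokenHints is now accumulated in CopyCost units instead
// of incrementing by 1 per broken hint.
struct EvictionCost {
  unsigned BrokenHints = 0;
};

int main() {
  // Assumed CopyCost values for illustration only: 1 for a 32-bit VGPR
  // class, 4 for a 128-bit tuple class.
  const unsigned SubRegCopyCost = 1;
  const unsigned TupleCopyCost = 4;

  EvictionCost BreakInputHint; // break the hint on %argN
  BreakInputHint.BrokenHints += SubRegCopyCost;

  EvictionCost BreakTupleHint; // break the hint on %tuple
  BreakTupleHint.BrokenHints += TupleCopyCost;

  // Previously both costs were 1, so the comparison could not prefer
  // keeping the wider hint; in CopyCost units the tuple hint is
  // visibly more expensive to break.
  printf("break input hint: %u, break tuple hint: %u\n",
         BreakInputHint.BrokenHints, BreakTupleHint.BrokenHints);
  return 0;
}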
@llvm/pr-subscribers-backend-risc-v @llvm/pr-subscribers-llvm-regalloc

Author: Matt Arsenault (arsenm)

Patch is 2.40 MiB, truncated to 20.00 KiB below; full version: https://github.com/llvm/llvm-project/pull/160084.diff

7 Files Affected:
diff --git a/llvm/lib/CodeGen/RegAllocEvictionAdvisor.cpp b/llvm/lib/CodeGen/RegAllocEvictionAdvisor.cpp
index 0d44ddc428570..f2c2f74755ace 100644
--- a/llvm/lib/CodeGen/RegAllocEvictionAdvisor.cpp
+++ b/llvm/lib/CodeGen/RegAllocEvictionAdvisor.cpp
@@ -224,7 +224,7 @@ bool DefaultEvictionAdvisor::canEvictHintInterference(
const LiveInterval &VirtReg, MCRegister PhysReg,
const SmallVirtRegSet &FixedRegisters) const {
EvictionCost MaxCost;
- MaxCost.setBrokenHints(1);
+ MaxCost.setBrokenHints(MRI->getRegClass(VirtReg.reg())->getCopyCost());
return canEvictInterferenceBasedOnCost(VirtReg, PhysReg, true, MaxCost,
FixedRegisters);
}
@@ -300,12 +300,14 @@ bool DefaultEvictionAdvisor::canEvictInterferenceBasedOnCost(
return false;
// We permit breaking cascades for urgent evictions. It should be the
// last resort, though, so make it really expensive.
- Cost.BrokenHints += 10;
+ Cost.BrokenHints += 10 * MRI->getRegClass(Intf->reg())->getCopyCost();
}
// Would this break a satisfied hint?
bool BreaksHint = VRM->hasPreferredPhys(Intf->reg());
// Update eviction cost.
- Cost.BrokenHints += BreaksHint;
+ if (BreaksHint)
+ Cost.BrokenHints += MRI->getRegClass(Intf->reg())->getCopyCost();
+
Cost.MaxWeight = std::max(Cost.MaxWeight, Intf->weight());
// Abort if this would be too expensive.
if (Cost >= MaxCost)
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.dim.gfx90a.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.dim.gfx90a.ll
index bb4a607fc62d0..44a4e8171ff33 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.dim.gfx90a.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.dim.gfx90a.ll
@@ -18,22 +18,19 @@ define amdgpu_ps <4 x float> @load_1d_lwe(<8 x i32> inreg %rsrc, ptr addrspace(1
; GCN-LABEL: load_1d_lwe:
; GCN: ; %bb.0: ; %main_body
; GCN-NEXT: v_mov_b32_e32 v8, 0
+; GCN-NEXT: v_mov_b32_e32 v6, v0
; GCN-NEXT: v_mov_b32_e32 v9, v8
; GCN-NEXT: v_mov_b32_e32 v10, v8
; GCN-NEXT: v_mov_b32_e32 v11, v8
; GCN-NEXT: v_mov_b32_e32 v12, v8
-; GCN-NEXT: v_mov_b32_e32 v2, v8
-; GCN-NEXT: v_mov_b32_e32 v3, v9
-; GCN-NEXT: v_mov_b32_e32 v4, v10
-; GCN-NEXT: v_mov_b32_e32 v5, v11
-; GCN-NEXT: v_mov_b32_e32 v6, v12
-; GCN-NEXT: image_load v[2:6], v0, s[0:7] dmask:0xf unorm lwe
+; GCN-NEXT: v_mov_b32_e32 v0, v8
+; GCN-NEXT: v_mov_b32_e32 v1, v9
+; GCN-NEXT: v_mov_b32_e32 v2, v10
+; GCN-NEXT: v_mov_b32_e32 v3, v11
+; GCN-NEXT: v_mov_b32_e32 v4, v12
+; GCN-NEXT: image_load v[0:4], v6, s[0:7] dmask:0xf unorm lwe
; GCN-NEXT: s_waitcnt vmcnt(0)
-; GCN-NEXT: v_mov_b32_e32 v0, v2
-; GCN-NEXT: v_mov_b32_e32 v1, v3
-; GCN-NEXT: v_mov_b32_e32 v2, v4
-; GCN-NEXT: v_mov_b32_e32 v3, v5
-; GCN-NEXT: global_store_dword v8, v6, s[8:9]
+; GCN-NEXT: global_store_dword v8, v4, s[8:9]
; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: ; return to shader part epilog
main_body:
diff --git a/llvm/test/CodeGen/RISCV/rvv/vloxseg-rv32.ll b/llvm/test/CodeGen/RISCV/rvv/vloxseg-rv32.ll
index 68d3f85cd1f23..6d70d191ba8b6 100644
--- a/llvm/test/CodeGen/RISCV/rvv/vloxseg-rv32.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/vloxseg-rv32.ll
@@ -9,8 +9,8 @@ define <vscale x 1 x i8> @test_vloxseg2_nxv1i8_triscv.vector.tuple_nxv1i8_2t_nxv
; CHECK-LABEL: test_vloxseg2_nxv1i8_triscv.vector.tuple_nxv1i8_2t_nxv1i8:
; CHECK: # %bb.0: # %entry
; CHECK-NEXT: vsetvli zero, a1, e8, mf8, ta, ma
-; CHECK-NEXT: vloxseg2ei8.v v9, (a0), v8
-; CHECK-NEXT: vmv1r.v v8, v10
+; CHECK-NEXT: vmv1r.v v9, v8
+; CHECK-NEXT: vloxseg2ei8.v v7, (a0), v9
; CHECK-NEXT: ret
entry:
%0 = tail call target("riscv.vector.tuple", <vscale x 1 x i8>, 2) @llvm.riscv.vloxseg2.triscv.vector.tuple_nxv1i8_2t.nxv1i8(target("riscv.vector.tuple", <vscale x 1 x i8>, 2) poison, ptr %base, <vscale x 1 x i8> %index, i32 %vl, i32 3)
@@ -22,8 +22,8 @@ define <vscale x 1 x i8> @test_vloxseg2_mask_nxv1i8_triscv.vector.tuple_nxv1i8_2
; CHECK-LABEL: test_vloxseg2_mask_nxv1i8_triscv.vector.tuple_nxv1i8_2t_nxv1i8:
; CHECK: # %bb.0: # %entry
; CHECK-NEXT: vsetvli zero, a1, e8, mf8, ta, ma
-; CHECK-NEXT: vloxseg2ei8.v v9, (a0), v8, v0.t
-; CHECK-NEXT: vmv1r.v v8, v10
+; CHECK-NEXT: vmv1r.v v9, v8
+; CHECK-NEXT: vloxseg2ei8.v v7, (a0), v9, v0.t
; CHECK-NEXT: ret
entry:
%0 = tail call target("riscv.vector.tuple", <vscale x 1 x i8>, 2) @llvm.riscv.vloxseg2.mask.triscv.vector.tuple_nxv1i8_2t.nxv1i1.nxv1i8(target("riscv.vector.tuple", <vscale x 1 x i8>, 2) poison, ptr %base, <vscale x 1 x i8> %index, <vscale x 1 x i1> %mask, i32 %vl, i32 1, i32 3)
@@ -38,8 +38,8 @@ define <vscale x 1 x i8> @test_vloxseg2_nxv1i8_triscv.vector.tuple_nxv1i8_2t_nxv
; CHECK-LABEL: test_vloxseg2_nxv1i8_triscv.vector.tuple_nxv1i8_2t_nxv1i16:
; CHECK: # %bb.0: # %entry
; CHECK-NEXT: vsetvli zero, a1, e8, mf8, ta, ma
-; CHECK-NEXT: vloxseg2ei16.v v9, (a0), v8
-; CHECK-NEXT: vmv1r.v v8, v10
+; CHECK-NEXT: vmv1r.v v9, v8
+; CHECK-NEXT: vloxseg2ei16.v v7, (a0), v9
; CHECK-NEXT: ret
entry:
%0 = tail call target("riscv.vector.tuple", <vscale x 1 x i8>, 2) @llvm.riscv.vloxseg2.triscv.vector.tuple_nxv1i8_2t.nxv1i16(target("riscv.vector.tuple", <vscale x 1 x i8>, 2) poison, ptr %base, <vscale x 1 x i16> %index, i32 %vl, i32 3)
@@ -51,8 +51,8 @@ define <vscale x 1 x i8> @test_vloxseg2_mask_nxv1i8_triscv.vector.tuple_nxv1i8_2
; CHECK-LABEL: test_vloxseg2_mask_nxv1i8_triscv.vector.tuple_nxv1i8_2t_nxv1i16:
; CHECK: # %bb.0: # %entry
; CHECK-NEXT: vsetvli zero, a1, e8, mf8, ta, ma
-; CHECK-NEXT: vloxseg2ei16.v v9, (a0), v8, v0.t
-; CHECK-NEXT: vmv1r.v v8, v10
+; CHECK-NEXT: vmv1r.v v9, v8
+; CHECK-NEXT: vloxseg2ei16.v v7, (a0), v9, v0.t
; CHECK-NEXT: ret
entry:
%0 = tail call target("riscv.vector.tuple", <vscale x 1 x i8>, 2) @llvm.riscv.vloxseg2.mask.triscv.vector.tuple_nxv1i8_2t.nxv1i1.nxv1i16(target("riscv.vector.tuple", <vscale x 1 x i8>, 2) poison, ptr %base, <vscale x 1 x i16> %index, <vscale x 1 x i1> %mask, i32 %vl, i32 1, i32 3)
@@ -67,8 +67,8 @@ define <vscale x 1 x i8> @test_vloxseg2_nxv1i8_triscv.vector.tuple_nxv1i8_2t_nxv
; CHECK-LABEL: test_vloxseg2_nxv1i8_triscv.vector.tuple_nxv1i8_2t_nxv1i32:
; CHECK: # %bb.0: # %entry
; CHECK-NEXT: vsetvli zero, a1, e8, mf8, ta, ma
-; CHECK-NEXT: vloxseg2ei32.v v9, (a0), v8
-; CHECK-NEXT: vmv1r.v v8, v10
+; CHECK-NEXT: vmv1r.v v9, v8
+; CHECK-NEXT: vloxseg2ei32.v v7, (a0), v9
; CHECK-NEXT: ret
entry:
%0 = tail call target("riscv.vector.tuple", <vscale x 1 x i8>, 2) @llvm.riscv.vloxseg2.triscv.vector.tuple_nxv1i8_2t.nxv1i32(target("riscv.vector.tuple", <vscale x 1 x i8>, 2) poison, ptr %base, <vscale x 1 x i32> %index, i32 %vl, i32 3)
@@ -80,8 +80,8 @@ define <vscale x 1 x i8> @test_vloxseg2_mask_nxv1i8_triscv.vector.tuple_nxv1i8_2
; CHECK-LABEL: test_vloxseg2_mask_nxv1i8_triscv.vector.tuple_nxv1i8_2t_nxv1i32:
; CHECK: # %bb.0: # %entry
; CHECK-NEXT: vsetvli zero, a1, e8, mf8, ta, ma
-; CHECK-NEXT: vloxseg2ei32.v v9, (a0), v8, v0.t
-; CHECK-NEXT: vmv1r.v v8, v10
+; CHECK-NEXT: vmv1r.v v9, v8
+; CHECK-NEXT: vloxseg2ei32.v v7, (a0), v9, v0.t
; CHECK-NEXT: ret
entry:
%0 = tail call target("riscv.vector.tuple", <vscale x 1 x i8>, 2) @llvm.riscv.vloxseg2.mask.triscv.vector.tuple_nxv1i8_2t.nxv1i1.nxv1i32(target("riscv.vector.tuple", <vscale x 1 x i8>, 2) poison, ptr %base, <vscale x 1 x i32> %index, <vscale x 1 x i1> %mask, i32 %vl, i32 1, i32 3)
@@ -96,8 +96,8 @@ define <vscale x 2 x i8> @test_vloxseg2_nxv2i8_triscv.vector.tuple_nxv2i8_2t_nxv
; CHECK-LABEL: test_vloxseg2_nxv2i8_triscv.vector.tuple_nxv2i8_2t_nxv2i8:
; CHECK: # %bb.0: # %entry
; CHECK-NEXT: vsetvli zero, a1, e8, mf4, ta, ma
-; CHECK-NEXT: vloxseg2ei8.v v9, (a0), v8
-; CHECK-NEXT: vmv1r.v v8, v10
+; CHECK-NEXT: vmv1r.v v9, v8
+; CHECK-NEXT: vloxseg2ei8.v v7, (a0), v9
; CHECK-NEXT: ret
entry:
%0 = tail call target("riscv.vector.tuple", <vscale x 2 x i8>, 2) @llvm.riscv.vloxseg2.triscv.vector.tuple_nxv2i8_2t.nxv2i8(target("riscv.vector.tuple", <vscale x 2 x i8>, 2) poison, ptr %base, <vscale x 2 x i8> %index, i32 %vl, i32 3)
@@ -109,8 +109,8 @@ define <vscale x 2 x i8> @test_vloxseg2_mask_nxv2i8_triscv.vector.tuple_nxv2i8_2
; CHECK-LABEL: test_vloxseg2_mask_nxv2i8_triscv.vector.tuple_nxv2i8_2t_nxv2i8:
; CHECK: # %bb.0: # %entry
; CHECK-NEXT: vsetvli zero, a1, e8, mf4, ta, ma
-; CHECK-NEXT: vloxseg2ei8.v v9, (a0), v8, v0.t
-; CHECK-NEXT: vmv1r.v v8, v10
+; CHECK-NEXT: vmv1r.v v9, v8
+; CHECK-NEXT: vloxseg2ei8.v v7, (a0), v9, v0.t
; CHECK-NEXT: ret
entry:
%0 = tail call target("riscv.vector.tuple", <vscale x 2 x i8>, 2) @llvm.riscv.vloxseg2.mask.triscv.vector.tuple_nxv2i8_2t.nxv2i1.nxv2i8(target("riscv.vector.tuple", <vscale x 2 x i8>, 2) poison, ptr %base, <vscale x 2 x i8> %index, <vscale x 2 x i1> %mask, i32 %vl, i32 1, i32 3)
@@ -125,8 +125,8 @@ define <vscale x 2 x i8> @test_vloxseg2_nxv2i8_triscv.vector.tuple_nxv2i8_2t_nxv
; CHECK-LABEL: test_vloxseg2_nxv2i8_triscv.vector.tuple_nxv2i8_2t_nxv2i16:
; CHECK: # %bb.0: # %entry
; CHECK-NEXT: vsetvli zero, a1, e8, mf4, ta, ma
-; CHECK-NEXT: vloxseg2ei16.v v9, (a0), v8
-; CHECK-NEXT: vmv1r.v v8, v10
+; CHECK-NEXT: vmv1r.v v9, v8
+; CHECK-NEXT: vloxseg2ei16.v v7, (a0), v9
; CHECK-NEXT: ret
entry:
%0 = tail call target("riscv.vector.tuple", <vscale x 2 x i8>, 2) @llvm.riscv.vloxseg2.triscv.vector.tuple_nxv2i8_2t.nxv2i16(target("riscv.vector.tuple", <vscale x 2 x i8>, 2) poison, ptr %base, <vscale x 2 x i16> %index, i32 %vl, i32 3)
@@ -138,8 +138,8 @@ define <vscale x 2 x i8> @test_vloxseg2_mask_nxv2i8_triscv.vector.tuple_nxv2i8_2
; CHECK-LABEL: test_vloxseg2_mask_nxv2i8_triscv.vector.tuple_nxv2i8_2t_nxv2i16:
; CHECK: # %bb.0: # %entry
; CHECK-NEXT: vsetvli zero, a1, e8, mf4, ta, ma
-; CHECK-NEXT: vloxseg2ei16.v v9, (a0), v8, v0.t
-; CHECK-NEXT: vmv1r.v v8, v10
+; CHECK-NEXT: vmv1r.v v9, v8
+; CHECK-NEXT: vloxseg2ei16.v v7, (a0), v9, v0.t
; CHECK-NEXT: ret
entry:
%0 = tail call target("riscv.vector.tuple", <vscale x 2 x i8>, 2) @llvm.riscv.vloxseg2.mask.triscv.vector.tuple_nxv2i8_2t.nxv2i1.nxv2i16(target("riscv.vector.tuple", <vscale x 2 x i8>, 2) poison, ptr %base, <vscale x 2 x i16> %index, <vscale x 2 x i1> %mask, i32 %vl, i32 1, i32 3)
@@ -154,8 +154,8 @@ define <vscale x 2 x i8> @test_vloxseg2_nxv2i8_triscv.vector.tuple_nxv2i8_2t_nxv
; CHECK-LABEL: test_vloxseg2_nxv2i8_triscv.vector.tuple_nxv2i8_2t_nxv2i32:
; CHECK: # %bb.0: # %entry
; CHECK-NEXT: vsetvli zero, a1, e8, mf4, ta, ma
-; CHECK-NEXT: vloxseg2ei32.v v9, (a0), v8
-; CHECK-NEXT: vmv1r.v v8, v10
+; CHECK-NEXT: vmv1r.v v9, v8
+; CHECK-NEXT: vloxseg2ei32.v v7, (a0), v9
; CHECK-NEXT: ret
entry:
%0 = tail call target("riscv.vector.tuple", <vscale x 2 x i8>, 2) @llvm.riscv.vloxseg2.triscv.vector.tuple_nxv2i8_2t.nxv2i32(target("riscv.vector.tuple", <vscale x 2 x i8>, 2) poison, ptr %base, <vscale x 2 x i32> %index, i32 %vl, i32 3)
@@ -167,8 +167,8 @@ define <vscale x 2 x i8> @test_vloxseg2_mask_nxv2i8_triscv.vector.tuple_nxv2i8_2
; CHECK-LABEL: test_vloxseg2_mask_nxv2i8_triscv.vector.tuple_nxv2i8_2t_nxv2i32:
; CHECK: # %bb.0: # %entry
; CHECK-NEXT: vsetvli zero, a1, e8, mf4, ta, ma
-; CHECK-NEXT: vloxseg2ei32.v v9, (a0), v8, v0.t
-; CHECK-NEXT: vmv1r.v v8, v10
+; CHECK-NEXT: vmv1r.v v9, v8
+; CHECK-NEXT: vloxseg2ei32.v v7, (a0), v9, v0.t
; CHECK-NEXT: ret
entry:
%0 = tail call target("riscv.vector.tuple", <vscale x 2 x i8>, 2) @llvm.riscv.vloxseg2.mask.triscv.vector.tuple_nxv2i8_2t.nxv2i1.nxv2i32(target("riscv.vector.tuple", <vscale x 2 x i8>, 2) poison, ptr %base, <vscale x 2 x i32> %index, <vscale x 2 x i1> %mask, i32 %vl, i32 1, i32 3)
@@ -183,8 +183,8 @@ define <vscale x 4 x i8> @test_vloxseg2_nxv4i8_triscv.vector.tuple_nxv4i8_2t_nxv
; CHECK-LABEL: test_vloxseg2_nxv4i8_triscv.vector.tuple_nxv4i8_2t_nxv4i8:
; CHECK: # %bb.0: # %entry
; CHECK-NEXT: vsetvli zero, a1, e8, mf2, ta, ma
-; CHECK-NEXT: vloxseg2ei8.v v9, (a0), v8
-; CHECK-NEXT: vmv1r.v v8, v10
+; CHECK-NEXT: vmv1r.v v9, v8
+; CHECK-NEXT: vloxseg2ei8.v v7, (a0), v9
; CHECK-NEXT: ret
entry:
%0 = tail call target("riscv.vector.tuple", <vscale x 4 x i8>, 2) @llvm.riscv.vloxseg2.triscv.vector.tuple_nxv4i8_2t.nxv4i8(target("riscv.vector.tuple", <vscale x 4 x i8>, 2) poison, ptr %base, <vscale x 4 x i8> %index, i32 %vl, i32 3)
@@ -196,8 +196,8 @@ define <vscale x 4 x i8> @test_vloxseg2_mask_nxv4i8_triscv.vector.tuple_nxv4i8_2
; CHECK-LABEL: test_vloxseg2_mask_nxv4i8_triscv.vector.tuple_nxv4i8_2t_nxv4i8:
; CHECK: # %bb.0: # %entry
; CHECK-NEXT: vsetvli zero, a1, e8, mf2, ta, ma
-; CHECK-NEXT: vloxseg2ei8.v v9, (a0), v8, v0.t
-; CHECK-NEXT: vmv1r.v v8, v10
+; CHECK-NEXT: vmv1r.v v9, v8
+; CHECK-NEXT: vloxseg2ei8.v v7, (a0), v9, v0.t
; CHECK-NEXT: ret
entry:
%0 = tail call target("riscv.vector.tuple", <vscale x 4 x i8>, 2) @llvm.riscv.vloxseg2.mask.triscv.vector.tuple_nxv4i8_2t.nxv4i1.nxv4i8(target("riscv.vector.tuple", <vscale x 4 x i8>, 2) poison, ptr %base, <vscale x 4 x i8> %index, <vscale x 4 x i1> %mask, i32 %vl, i32 1, i32 3)
@@ -212,8 +212,8 @@ define <vscale x 4 x i8> @test_vloxseg2_nxv4i8_triscv.vector.tuple_nxv4i8_2t_nxv
; CHECK-LABEL: test_vloxseg2_nxv4i8_triscv.vector.tuple_nxv4i8_2t_nxv4i16:
; CHECK: # %bb.0: # %entry
; CHECK-NEXT: vsetvli zero, a1, e8, mf2, ta, ma
-; CHECK-NEXT: vloxseg2ei16.v v9, (a0), v8
-; CHECK-NEXT: vmv1r.v v8, v10
+; CHECK-NEXT: vmv1r.v v9, v8
+; CHECK-NEXT: vloxseg2ei16.v v7, (a0), v9
; CHECK-NEXT: ret
entry:
%0 = tail call target("riscv.vector.tuple", <vscale x 4 x i8>, 2) @llvm.riscv.vloxseg2.triscv.vector.tuple_nxv4i8_2t.nxv4i16(target("riscv.vector.tuple", <vscale x 4 x i8>, 2) poison, ptr %base, <vscale x 4 x i16> %index, i32 %vl, i32 3)
@@ -225,8 +225,8 @@ define <vscale x 4 x i8> @test_vloxseg2_mask_nxv4i8_triscv.vector.tuple_nxv4i8_2
; CHECK-LABEL: test_vloxseg2_mask_nxv4i8_triscv.vector.tuple_nxv4i8_2t_nxv4i16:
; CHECK: # %bb.0: # %entry
; CHECK-NEXT: vsetvli zero, a1, e8, mf2, ta, ma
-; CHECK-NEXT: vloxseg2ei16.v v9, (a0), v8, v0.t
-; CHECK-NEXT: vmv1r.v v8, v10
+; CHECK-NEXT: vmv1r.v v9, v8
+; CHECK-NEXT: vloxseg2ei16.v v7, (a0), v9, v0.t
; CHECK-NEXT: ret
entry:
%0 = tail call target("riscv.vector.tuple", <vscale x 4 x i8>, 2) @llvm.riscv.vloxseg2.mask.triscv.vector.tuple_nxv4i8_2t.nxv4i1.nxv4i16(target("riscv.vector.tuple", <vscale x 4 x i8>, 2) poison, ptr %base, <vscale x 4 x i16> %index, <vscale x 4 x i1> %mask, i32 %vl, i32 1, i32 3)
@@ -270,8 +270,8 @@ define <vscale x 8 x i8> @test_vloxseg2_nxv8i8_triscv.vector.tuple_nxv8i8_2t_nxv
; CHECK-LABEL: test_vloxseg2_nxv8i8_triscv.vector.tuple_nxv8i8_2t_nxv8i8:
; CHECK: # %bb.0: # %entry
; CHECK-NEXT: vsetvli zero, a1, e8, m1, ta, ma
-; CHECK-NEXT: vloxseg2ei8.v v9, (a0), v8
-; CHECK-NEXT: vmv1r.v v8, v10
+; CHECK-NEXT: vmv1r.v v9, v8
+; CHECK-NEXT: vloxseg2ei8.v v7, (a0), v9
; CHECK-NEXT: ret
entry:
%0 = tail call target("riscv.vector.tuple", <vscale x 8 x i8>, 2) @llvm.riscv.vloxseg2.triscv.vector.tuple_nxv8i8_2t.nxv8i8(target("riscv.vector.tuple", <vscale x 8 x i8>, 2) poison, ptr %base, <vscale x 8 x i8> %index, i32 %vl, i32 3)
@@ -283,8 +283,8 @@ define <vscale x 8 x i8> @test_vloxseg2_mask_nxv8i8_triscv.vector.tuple_nxv8i8_2
; CHECK-LABEL: test_vloxseg2_mask_nxv8i8_triscv.vector.tuple_nxv8i8_2t_nxv8i8:
; CHECK: # %bb.0: # %entry
; CHECK-NEXT: vsetvli zero, a1, e8, m1, ta, ma
-; CHECK-NEXT: vloxseg2ei8.v v9, (a0), v8, v0.t
-; CHECK-NEXT: vmv1r.v v8, v10
+; CHECK-NEXT: vmv1r.v v9, v8
+; CHECK-NEXT: vloxseg2ei8.v v7, (a0), v9, v0.t
; CHECK-NEXT: ret
entry:
%0 = tail call target("riscv.vector.tuple", <vscale x 8 x i8>, 2) @llvm.riscv.vloxseg2.mask.triscv.vector.tuple_nxv8i8_2t.nxv8i1.nxv8i8(target("riscv.vector.tuple", <vscale x 8 x i8>, 2) poison, ptr %base, <vscale x 8 x i8> %index, <vscale x 8 x i1> %mask, i32 %vl, i32 1, i32 3)
@@ -502,8 +502,8 @@ define <vscale x 1 x i8> @test_vloxseg3_nxv1i8_triscv.vector.tuple_nxv1i8_3t_nxv
; CHECK-LABEL: test_vloxseg3_nxv1i8_triscv.vector.tuple_nxv1i8_3t_nxv1i8:
; CHECK: # %bb.0: # %entry
; CHECK-NEXT: vsetvli zero, a1, e8, mf8, ta, ma
-; CHECK-NEXT: vloxseg3ei8.v v9, (a0), v8
-; CHECK-NEXT: vmv1r.v v8, v10
+; CHECK-NEXT: vmv1r.v v10, v8
+; CHECK-NEXT: vloxseg3ei8.v v7, (a0), v10
; CHECK-NEXT: ret
entry:
%0 = tail call target("riscv.vector.tuple", <vscale x 1 x i8>, 3) @llvm.riscv.vloxseg3.triscv.vector.tuple_nxv1i8_3t.nxv1i8(target("riscv.vector.tuple", <vscale x 1 x i8>, 3) poison, ptr %base, <vscale x 1 x i8> %index, i32 %vl, i32 3)
@@ -515,8 +515,8 @@ define <vscale x 1 x i8> @test_vloxseg3_mask_nxv1i8_triscv.vector.tuple_nxv1i8_3
; CHECK-LABEL: test_vloxseg3_mask_nxv1i8_triscv.vector.tuple_nxv1i8_3t_nxv1i8:
; CHECK: # %bb.0: # %entry
; CHECK-NEXT: vsetvli zero, a1, e8, mf8, ta, ma
-; CHECK-NEXT: vloxseg3ei8.v v9, (a0), v8, v0.t
-; CHECK-NEXT: vmv1r.v v8, v10
+; CHECK-NEXT: vmv1r.v v10, v8
+; CHECK-NEXT: vloxseg3ei8.v v7, (a0), v10, v0.t
; CHECK-NEXT: ret
entry:
%0 = tail call target("riscv.vector.tuple", <vscale x 1 x i8>, 3) @llvm.riscv.vloxseg3.mask.triscv.vector.tuple_nxv1i8_3t.nxv1i1.nxv1i8(target("riscv.vector.tuple", <vscale x 1 x i8>, 3) poison, ptr %base, <vscale x 1 x i8> %index, <vscale x 1 x i1> %mask, i32 %vl, i32 1, i32 3)
@@ -531,8 +531,8 @@ define <vscale x 1 x i8> @test_vloxseg3_nxv1i8_triscv.vector.tuple_nxv1i8_3t_nxv
; CHECK-LABEL: test_vloxseg3_nxv1i8_triscv.vector.tuple_nxv1i8_3t_nxv1i16:
; CHECK: # %bb.0: # %entry
; CHECK-NEXT: vsetvli zero, a1, e8, mf8, ta, ma
-; CHECK-NEXT: vloxseg3ei16.v v9, (a0), v8
-; CHECK-NEXT: vmv1r.v v8, v10
+; CHECK-NEXT: vmv1r.v v10, v8
+; CHECK-NEXT: vloxseg3ei16.v v7, (a0), v10
; CHECK-NEXT: ret
entry:
%0 = tail call target("riscv.vector.tuple", <vscale x 1 x i8>, 3) @llvm.riscv.vloxseg3.triscv.vector.tuple_nxv1i8_3t.nxv1i16(target("riscv.vector.tuple", <vscale x 1 x i8>, 3) poison, ptr %base, <vscale x 1 x i16> %index, i32 %vl, i32 3)
@@ -544,8 +544,8 @@ define <vscale x 1 x i8> @test_vloxseg3_mask_nxv1i8_triscv.vector.tuple_nxv1i8_3
; CHECK-LABEL: test_vloxseg3_mask_nxv1i8_triscv.vector.tuple_nxv1i8_3t_nxv1i16:
; CHECK: # %bb.0: # %entry
; CHECK-NEXT: vsetvli zero, a1, e8, mf8, ta, ma
-; CHECK-NEXT: vloxseg3ei16.v v9, (a0), v8, v0.t
-; CHECK-NEXT: vmv1r.v v8, v10
+; CHECK-NEXT: vmv1r.v v10, v8
+; CHECK-NEXT: vloxseg3ei16.v v7, (a0), v10, v0.t
; CHECK-NEXT: ret
entry:
%0 = tail call target("riscv.vector.tuple", <vscale x 1 x i8>, 3) @llvm.riscv.vloxseg3.mask.triscv.vector.tuple_nxv1i8_3t.nxv1i1.nxv1i16(target("riscv.vector.tuple", <vscale x 1 x i8>, 3) poison, ptr %base, <vscale x 1 x i16> %index, <vscale x 1 x i1> %mask, i32 %vl, i32 1, i32 3)
@@ -560,8 +560,8 @@ define <vscale x 1 x i8> @test_vloxseg3_nxv1i8_triscv.vector.tuple_nxv1i8_3t_nxv
; CHECK-LABEL: test_vloxseg3_nxv1i8_triscv.vector.tuple_nxv1i8_3t_nxv1i32:
; CHECK: # %bb.0: # %entry
; CHECK-NEXT: vsetvli zero, a1, e8, mf8, ta, ma
-; CHECK-NEXT: vloxseg3ei32.v v9, (a0), v8
-; CHECK-NEXT: vmv1r.v v8, v10
+; CHECK-NEXT: vmv1r.v v10, v8
+; CHECK-NEXT: vloxseg3ei32.v v7, (a0), v10
; CHECK-NEXT: ret
entry:
%0 = tail call target("riscv.vector.tuple", <vscale x 1 x i8>, 3) @llvm.riscv.vloxseg3.triscv.vector.tuple_nxv1i8_3t.nxv1i32(target("riscv.vector.tuple", <vscale x 1 x i8>, 3) poison, ptr %base, <vscale x 1 x i32> %index, i32 %vl, i32 3)
@@ -573,8 +573,8 @@ define <vscale x 1 x i8> @test_vloxseg3_mask_nxv1i8_triscv.vector.tuple_nxv1i8_3
; CHECK-LABEL: test_vloxseg3_mask_nxv1i8_triscv.vector.tuple_nxv1i8_3t_nxv1i32:
; CHECK: # %bb.0: # %entry
; CHECK-NEXT: vsetvli zero, a1, e8, mf8, ta, ma
-; CHECK-NEXT: vloxseg3ei32.v v9, (a0), v8, v0.t
-; CHECK-NEXT: vmv1r.v v8, v10
+; CHECK-NEXT: vmv1r.v v10, v8
+; CHECK-NEXT: vloxseg3ei32.v v7, (a0), v10, v0.t
; CHECK-NEXT: ret
entry:
%0 = tail call target("riscv.vector.tuple", <vscale x 1 x i8>, 3) @llvm.riscv.vloxseg3.mask.triscv.vector.tuple_nxv1i8_3t.nxv1i1.nxv1i32(target("riscv....
[truncated]
Basic idea here seems entirely reasonable. Can you update the comment in the header?
Just to confirm, this part from the review description: "Fixing this badly regresses a number of RISCV tests, which seem to rely on overestimating the cost in tryFindEvictionCandidate to avoid early-exiting the eviction candidate loop (RISCV is possibly underestimating the copy costs for vector registers)." applies to the possible future change, not this one, right? I think so, but the wording got slightly confusing, and I don't want to get this backwards. If so, happy to help debug/fix that one.
LGTM w/prior comment addressed.
-; CHECK-NEXT: vloxseg2ei8.v v9, (a0), v8
-; CHECK-NEXT: vmv1r.v v8, v10
+; CHECK-NEXT: vmv1r.v v9, v8
+; CHECK-NEXT: vloxseg2ei8.v v7, (a0), v9
I realized something afterward when thinking about this diff. I think we're missing a sub-register check or a split heuristic somewhere, because in this case we wouldn't be copying the tuple register class. Only the second register in the tuple is actually returned, so the cost of the hint should be on that sub-register instead.
This is a non-blocking observation for this review. The codegen difference seems small and unimportant; it's just the realization that it might be a hint at something more interesting, which I wanted to note.
We definitely lack a proper subrange split somewhere
Yes, this is the minimum viable patch that shows something; the problems are in fixing the double counting bug.
Looks like this might break tests on macOS: http://45.33.8.238/macm1/114346/step_10.txt
No, that's a known flaky test |
LLVM Buildbot has detected a new failure. Full details are available at: https://lab.llvm.org/buildbot/#/builders/164/builds/13640
// We permit breaking cascades for urgent evictions. It should be the
// last resort, though, so make it really expensive.
- Cost.BrokenHints += 10;
+ Cost.BrokenHints += 10 * MRI->getRegClass(Intf->reg())->getCopyCost();
@arsenm, does this change consider that getCopyCost may return -1 to indicate "extremely high cost"?
IIUC, that would actually result in subtracting 10 instead of adding a huge penalty here.
(Context is that I'm debugging some downstream problems, as we've been seeing a number of "ran out of registers" failures after merging this PR. I haven't gotten that far in the debugging yet, but the question above popped up.)
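A minimal standalone check of that failure mode, using plain int arithmetic shaped like the patched expression:

#include <cstdio>

int main() {
  int BrokenHints = 0;
  int CopyCost = -1;            // the "extremely high cost" sentinel
  BrokenHints += 10 * CopyCost; // intended to be a large penalty
  printf("BrokenHints = %d\n", BrokenHints); // prints -10
  return 0;
}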
I tried a hack where I changed all three getCopyCost() calls added here to basically do this (pseudocode):
getCopyCost() < 0 ? 100 : getCopyCost()
and then at least the "ran out of registers" fault I was looking at disappeared (and I got the same result as before this patch).
Our target has some quad-registers that we define with CopyCost=-1, so I figure that is what caused the problem here.
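For reference, the hack written out as a helper; a sketch only, with 100 being the arbitrary stand-in value from the pseudocode above, not a tuned cost:

#include "llvm/CodeGen/TargetRegisterInfo.h"

// Clamp the negative "extremely high cost" sentinel to a large positive
// value before using it in cost arithmetic.
static unsigned getClampedCopyCost(const llvm::TargetRegisterClass *RC) {
  int Cost = RC->getCopyCost();
  return Cost < 0 ? 100u : unsigned(Cost);
}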
I believe -1 was supposed to mean impossible cost, not very high
But this should probably be an unsigned value
Or we should explicitly use saturating arithmetic here. Either way, probably worth a revert and reapply with a fix.
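If the cost becomes unsigned, the saturating variant could look something like the sketch below; SaturatingMultiplyAdd is the existing helper in llvm/Support/MathExtras.h, though the wrapper name here is invented:

#include "llvm/Support/MathExtras.h"

// Compute BrokenHints + 10 * CopyCost, clamping at the maximum unsigned
// value instead of wrapping around.
static unsigned addUrgentEvictionPenalty(unsigned BrokenHints,
                                         unsigned CopyCost) {
  return llvm::SaturatingMultiplyAdd(10u, CopyCost, BrokenHints);
}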
I'm leaning towards no and just rejecting using -1 as a CopyCost. I haven't found anywhere that does anything with that as an impossible value
All of the in-tree cases using -1 copy costs appear to be cargo-cult application to non-allocatable physical registers. The only place making use of the negative cost check is in InstrEmitter, which is probably better served by a non-allocatable check. I actually see a number of test improvements when I swap this out for the direct allocatability check
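A hypothetical shape of that InstrEmitter change; the function name is invented, but isInAllocatableClass is an existing TargetRegisterInfo query:

#include "llvm/CodeGen/TargetRegisterInfo.h"

// Old condition: RC->getCopyCost() < 0 meant "never copy through this".
// Direct condition: the physical register is not allocatable, so a copy
// through it cannot be register allocated anyway.
static bool shouldAvoidCopy(const llvm::TargetRegisterInfo &TRI,
                            llvm::MCRegister PhysReg) {
  return !TRI.isInAllocatableClass(PhysReg);
}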
Tolerate setting negative values in tablegen, and store them as a saturated uint8_t value. This will allow naive uses of the copy cost to directly add it as a cost without considering the degenerate negative case. The degenerate negative cases are only used in InstrEmitter / DAG scheduling, so leave the special-case processing there. There are also FIXMEs about this system already there. This is the expedient fix for an out-of-tree target regression after llvm#160084. Currently targets can set a negative copy cost to mark copies as "impossible". However, essentially all of the in-tree uses only use this for non-allocatable condition registers. We probably should replace the InstrEmitter/DAG scheduler uses with a more direct check for a copyable register, but that has test changes.
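A sketch of the saturation described there, assuming negative tablegen values clamp upward to preserve their "extremely expensive" meaning:

#include <algorithm>
#include <cstdint>

// Store a tablegen CopyCost as a saturated uint8_t: negative values
// (the old "impossible" sentinel) and oversized values both clamp to
// 255, so naive arithmetic treats them as very expensive.
static uint8_t saturateCopyCost(int64_t TablegenValue) {
  if (TablegenValue < 0)
    return UINT8_MAX;
  return uint8_t(std::min<int64_t>(TablegenValue, UINT8_MAX));
}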