
Conversation

arsenm
Contributor

@arsenm arsenm commented Sep 22, 2025

Change the eviction advisor heuristic cost based on the number of broken
hints to work in units of copy cost, rather than a magic number of 1.
The intent is to allow breaking hints for cheap subregisters in favor
of more expensive register tuples.

The llvm.amdgcn.image.dim.gfx90a.ll change shows a simple example of
the case I am attempting to solve. Use of tuples in ABI contexts ends up
looking like this:

  %argN = COPY $vgprN
  %tuple = inst %argN
  $vgpr0 = COPY %tuple.sub0
  $vgpr1 = COPY %tuple.sub1
  $vgpr2 = COPY %tuple.sub2
  $vgpr3 = COPY %tuple.sub3

Since there are physreg copies in the input and output sequence,
both have hints to a physreg. The wider tuple hint on the output
should win though, since this satisfies 4 hints instead of 1.
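
For concreteness, a standalone sketch of the arithmetic (the copy costs of 1
and 4 below are made-up illustrative values, not numbers taken from any
in-tree target description):

  #include <cstdio>

  int main() {
    // Hypothetical costs: one 32-bit VGPR copy vs. one 4-register tuple copy.
    const int SubRegCopyCost = 1;
    const int TupleCopyCost = 4;

    // Old heuristic: every broken hint counts as the magic number 1, so the
    // advisor cannot tell that breaking the tuple hint is worse.
    const int OldCostBreakInputHint = 1;
    const int OldCostBreakTupleHint = 1;

    // New heuristic: broken hints are weighed in copy-cost units, so breaking
    // the cheap sub-register input hint (1) is preferred over breaking the
    // tuple hint (4).
    const int NewCostBreakInputHint = SubRegCopyCost;
    const int NewCostBreakTupleHint = TupleCopyCost;

    std::printf("old: %d vs %d, new: %d vs %d\n", OldCostBreakInputHint,
                OldCostBreakTupleHint, NewCostBreakInputHint,
                NewCostBreakTupleHint);
  }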

This is the obvious part of a larger change to better handle
subregister interference with register tuples, and is not sufficient
to handle the original case I am looking at. There are several bugs here
that are proving tricky to untangle. In particular, there is a double
counting bug for all registers with multiple regunits; the cost of breaking
the interfering hint is added for each interfering virtual register, and
those registers are visited repeatedly across regunits. Fixing the double counting badly
regresses a number of RISCV tests, which seem to rely on overestimating
the cost in tryFindEvictionCandidate to avoid early-exiting the eviction
candidate loop (RISCV is possibly underestimating the copy costs for
vector registers).
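
A toy illustration of the double counting (made-up structures, not LLVM's
actual interference queries): a register covering four regunits revisits the
same interfering vreg once per unit, so its broken-hint cost is added four
times instead of once.

  #include <cstdio>

  int main() {
    const int NumRegUnits = 4;        // e.g. a 4-register tuple covers 4 regunits
    const int InterfererCopyCost = 1; // hypothetical cost of the interfering vreg

    int BrokenHints = 0;
    for (int Unit = 0; Unit < NumRegUnits; ++Unit) {
      // The same interfering virtual register shows up in every regunit's
      // interference list, so its cost is accumulated once per unit.
      BrokenHints += InterfererCopyCost;
    }
    std::printf("accumulated cost %d, intended cost %d\n", BrokenHints,
                InterfererCopyCost);
  }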

Contributor Author

arsenm commented Sep 22, 2025

This stack of pull requests is managed by Graphite. Learn more about stacking.

@llvmbot
Member

llvmbot commented Sep 22, 2025

@llvm/pr-subscribers-backend-risc-v
@llvm/pr-subscribers-backend-amdgpu

@llvm/pr-subscribers-llvm-regalloc

Author: Matt Arsenault (arsenm)

Changes

Change the eviction advisor heuristic cost based on number of broken
hints to work in units of copy cost, rather than a magic number 1.
The intent is to allow breaking hints for cheap subregisters in favor
of more expensive register tuples.

The llvm.amdgcn.image.dim.gfx90a.ll change shows a simple example of
the case I am attempting to solve. Use of tuples in ABI contexts ends up
looking like this:

%argN = COPY $vgprN
%tuple = inst %argN
$vgpr0 = COPY %tuple.sub0
$vgpr1 = COPY %tuple.sub1
$vgpr2 = COPY %tuple.sub2
$vgpr3 = COPY %tuple.sub3

Since there are physreg copies in the input and output sequence,
both have hints to a physreg. The wider tuple hint on the output
should win though, since this satisfies 4 hints instead of 1.

This is the obvious part of a larger change to better handle
subregister interference with register tuples, and is not sufficient
to handle the original case I am looking at. There are several bugs here
that are proving tricky to untangle. In particular, there is a double
counting bug for all registers with multiple regunits; the cost of breaking
the interfering hint is added for each interfering virtual register,
which have repeat visits across regunits. Fixing this badly regresses
a number of RISCV tests, which seem to rely on overestimating the cost
in tryFindEvictionCandidate to avoid early-exiting the eviction
candidate loop.


Patch is 2.40 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/160084.diff

7 Files Affected:

  • (modified) llvm/lib/CodeGen/RegAllocEvictionAdvisor.cpp (+5-3)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.dim.gfx90a.ll (+8-11)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vloxseg-rv32.ll (+1556-1556)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vloxseg-rv64.ll (+1892-1892)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vluxseg-rv32.ll (+1556-1556)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vluxseg-rv64.ll (+1906-1906)
  • (modified) llvm/test/CodeGen/RISCV/zilsd.ll (+3-4)
diff --git a/llvm/lib/CodeGen/RegAllocEvictionAdvisor.cpp b/llvm/lib/CodeGen/RegAllocEvictionAdvisor.cpp
index 0d44ddc428570..f2c2f74755ace 100644
--- a/llvm/lib/CodeGen/RegAllocEvictionAdvisor.cpp
+++ b/llvm/lib/CodeGen/RegAllocEvictionAdvisor.cpp
@@ -224,7 +224,7 @@ bool DefaultEvictionAdvisor::canEvictHintInterference(
     const LiveInterval &VirtReg, MCRegister PhysReg,
     const SmallVirtRegSet &FixedRegisters) const {
   EvictionCost MaxCost;
-  MaxCost.setBrokenHints(1);
+  MaxCost.setBrokenHints(MRI->getRegClass(VirtReg.reg())->getCopyCost());
   return canEvictInterferenceBasedOnCost(VirtReg, PhysReg, true, MaxCost,
                                          FixedRegisters);
 }
@@ -300,12 +300,14 @@ bool DefaultEvictionAdvisor::canEvictInterferenceBasedOnCost(
           return false;
         // We permit breaking cascades for urgent evictions. It should be the
         // last resort, though, so make it really expensive.
-        Cost.BrokenHints += 10;
+        Cost.BrokenHints += 10 * MRI->getRegClass(Intf->reg())->getCopyCost();
       }
       // Would this break a satisfied hint?
       bool BreaksHint = VRM->hasPreferredPhys(Intf->reg());
       // Update eviction cost.
-      Cost.BrokenHints += BreaksHint;
+      if (BreaksHint)
+        Cost.BrokenHints += MRI->getRegClass(Intf->reg())->getCopyCost();
+
       Cost.MaxWeight = std::max(Cost.MaxWeight, Intf->weight());
       // Abort if this would be too expensive.
       if (Cost >= MaxCost)
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.dim.gfx90a.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.dim.gfx90a.ll
index bb4a607fc62d0..44a4e8171ff33 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.dim.gfx90a.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.dim.gfx90a.ll
@@ -18,22 +18,19 @@ define amdgpu_ps <4 x float> @load_1d_lwe(<8 x i32> inreg %rsrc, ptr addrspace(1
 ; GCN-LABEL: load_1d_lwe:
 ; GCN:       ; %bb.0: ; %main_body
 ; GCN-NEXT:    v_mov_b32_e32 v8, 0
+; GCN-NEXT:    v_mov_b32_e32 v6, v0
 ; GCN-NEXT:    v_mov_b32_e32 v9, v8
 ; GCN-NEXT:    v_mov_b32_e32 v10, v8
 ; GCN-NEXT:    v_mov_b32_e32 v11, v8
 ; GCN-NEXT:    v_mov_b32_e32 v12, v8
-; GCN-NEXT:    v_mov_b32_e32 v2, v8
-; GCN-NEXT:    v_mov_b32_e32 v3, v9
-; GCN-NEXT:    v_mov_b32_e32 v4, v10
-; GCN-NEXT:    v_mov_b32_e32 v5, v11
-; GCN-NEXT:    v_mov_b32_e32 v6, v12
-; GCN-NEXT:    image_load v[2:6], v0, s[0:7] dmask:0xf unorm lwe
+; GCN-NEXT:    v_mov_b32_e32 v0, v8
+; GCN-NEXT:    v_mov_b32_e32 v1, v9
+; GCN-NEXT:    v_mov_b32_e32 v2, v10
+; GCN-NEXT:    v_mov_b32_e32 v3, v11
+; GCN-NEXT:    v_mov_b32_e32 v4, v12
+; GCN-NEXT:    image_load v[0:4], v6, s[0:7] dmask:0xf unorm lwe
 ; GCN-NEXT:    s_waitcnt vmcnt(0)
-; GCN-NEXT:    v_mov_b32_e32 v0, v2
-; GCN-NEXT:    v_mov_b32_e32 v1, v3
-; GCN-NEXT:    v_mov_b32_e32 v2, v4
-; GCN-NEXT:    v_mov_b32_e32 v3, v5
-; GCN-NEXT:    global_store_dword v8, v6, s[8:9]
+; GCN-NEXT:    global_store_dword v8, v4, s[8:9]
 ; GCN-NEXT:    s_waitcnt vmcnt(0)
 ; GCN-NEXT:    ; return to shader part epilog
 main_body:
diff --git a/llvm/test/CodeGen/RISCV/rvv/vloxseg-rv32.ll b/llvm/test/CodeGen/RISCV/rvv/vloxseg-rv32.ll
index 68d3f85cd1f23..6d70d191ba8b6 100644
--- a/llvm/test/CodeGen/RISCV/rvv/vloxseg-rv32.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/vloxseg-rv32.ll
@@ -9,8 +9,8 @@ define <vscale x 1 x i8> @test_vloxseg2_nxv1i8_triscv.vector.tuple_nxv1i8_2t_nxv
 ; CHECK-LABEL: test_vloxseg2_nxv1i8_triscv.vector.tuple_nxv1i8_2t_nxv1i8:
 ; CHECK:       # %bb.0: # %entry
 ; CHECK-NEXT:    vsetvli zero, a1, e8, mf8, ta, ma
-; CHECK-NEXT:    vloxseg2ei8.v v9, (a0), v8
-; CHECK-NEXT:    vmv1r.v v8, v10
+; CHECK-NEXT:    vmv1r.v v9, v8
+; CHECK-NEXT:    vloxseg2ei8.v v7, (a0), v9
 ; CHECK-NEXT:    ret
 entry:
   %0 = tail call target("riscv.vector.tuple", <vscale x 1 x i8>, 2) @llvm.riscv.vloxseg2.triscv.vector.tuple_nxv1i8_2t.nxv1i8(target("riscv.vector.tuple", <vscale x 1 x i8>, 2) poison, ptr %base, <vscale x 1 x i8> %index, i32 %vl, i32 3)
@@ -22,8 +22,8 @@ define <vscale x 1 x i8> @test_vloxseg2_mask_nxv1i8_triscv.vector.tuple_nxv1i8_2
 ; CHECK-LABEL: test_vloxseg2_mask_nxv1i8_triscv.vector.tuple_nxv1i8_2t_nxv1i8:
 ; CHECK:       # %bb.0: # %entry
 ; CHECK-NEXT:    vsetvli zero, a1, e8, mf8, ta, ma
-; CHECK-NEXT:    vloxseg2ei8.v v9, (a0), v8, v0.t
-; CHECK-NEXT:    vmv1r.v v8, v10
+; CHECK-NEXT:    vmv1r.v v9, v8
+; CHECK-NEXT:    vloxseg2ei8.v v7, (a0), v9, v0.t
 ; CHECK-NEXT:    ret
 entry:
   %0 = tail call target("riscv.vector.tuple", <vscale x 1 x i8>, 2) @llvm.riscv.vloxseg2.mask.triscv.vector.tuple_nxv1i8_2t.nxv1i1.nxv1i8(target("riscv.vector.tuple", <vscale x 1 x i8>, 2) poison, ptr %base, <vscale x 1 x i8> %index, <vscale x 1 x i1> %mask, i32 %vl, i32 1, i32 3)
@@ -38,8 +38,8 @@ define <vscale x 1 x i8> @test_vloxseg2_nxv1i8_triscv.vector.tuple_nxv1i8_2t_nxv
 ; CHECK-LABEL: test_vloxseg2_nxv1i8_triscv.vector.tuple_nxv1i8_2t_nxv1i16:
 ; CHECK:       # %bb.0: # %entry
 ; CHECK-NEXT:    vsetvli zero, a1, e8, mf8, ta, ma
-; CHECK-NEXT:    vloxseg2ei16.v v9, (a0), v8
-; CHECK-NEXT:    vmv1r.v v8, v10
+; CHECK-NEXT:    vmv1r.v v9, v8
+; CHECK-NEXT:    vloxseg2ei16.v v7, (a0), v9
 ; CHECK-NEXT:    ret
 entry:
   %0 = tail call target("riscv.vector.tuple", <vscale x 1 x i8>, 2) @llvm.riscv.vloxseg2.triscv.vector.tuple_nxv1i8_2t.nxv1i16(target("riscv.vector.tuple", <vscale x 1 x i8>, 2) poison, ptr %base, <vscale x 1 x i16> %index, i32 %vl, i32 3)
@@ -51,8 +51,8 @@ define <vscale x 1 x i8> @test_vloxseg2_mask_nxv1i8_triscv.vector.tuple_nxv1i8_2
 ; CHECK-LABEL: test_vloxseg2_mask_nxv1i8_triscv.vector.tuple_nxv1i8_2t_nxv1i16:
 ; CHECK:       # %bb.0: # %entry
 ; CHECK-NEXT:    vsetvli zero, a1, e8, mf8, ta, ma
-; CHECK-NEXT:    vloxseg2ei16.v v9, (a0), v8, v0.t
-; CHECK-NEXT:    vmv1r.v v8, v10
+; CHECK-NEXT:    vmv1r.v v9, v8
+; CHECK-NEXT:    vloxseg2ei16.v v7, (a0), v9, v0.t
 ; CHECK-NEXT:    ret
 entry:
   %0 = tail call target("riscv.vector.tuple", <vscale x 1 x i8>, 2) @llvm.riscv.vloxseg2.mask.triscv.vector.tuple_nxv1i8_2t.nxv1i1.nxv1i16(target("riscv.vector.tuple", <vscale x 1 x i8>, 2) poison, ptr %base, <vscale x 1 x i16> %index, <vscale x 1 x i1> %mask, i32 %vl, i32 1, i32 3)
@@ -67,8 +67,8 @@ define <vscale x 1 x i8> @test_vloxseg2_nxv1i8_triscv.vector.tuple_nxv1i8_2t_nxv
 ; CHECK-LABEL: test_vloxseg2_nxv1i8_triscv.vector.tuple_nxv1i8_2t_nxv1i32:
 ; CHECK:       # %bb.0: # %entry
 ; CHECK-NEXT:    vsetvli zero, a1, e8, mf8, ta, ma
-; CHECK-NEXT:    vloxseg2ei32.v v9, (a0), v8
-; CHECK-NEXT:    vmv1r.v v8, v10
+; CHECK-NEXT:    vmv1r.v v9, v8
+; CHECK-NEXT:    vloxseg2ei32.v v7, (a0), v9
 ; CHECK-NEXT:    ret
 entry:
   %0 = tail call target("riscv.vector.tuple", <vscale x 1 x i8>, 2) @llvm.riscv.vloxseg2.triscv.vector.tuple_nxv1i8_2t.nxv1i32(target("riscv.vector.tuple", <vscale x 1 x i8>, 2) poison, ptr %base, <vscale x 1 x i32> %index, i32 %vl, i32 3)
@@ -80,8 +80,8 @@ define <vscale x 1 x i8> @test_vloxseg2_mask_nxv1i8_triscv.vector.tuple_nxv1i8_2
 ; CHECK-LABEL: test_vloxseg2_mask_nxv1i8_triscv.vector.tuple_nxv1i8_2t_nxv1i32:
 ; CHECK:       # %bb.0: # %entry
 ; CHECK-NEXT:    vsetvli zero, a1, e8, mf8, ta, ma
-; CHECK-NEXT:    vloxseg2ei32.v v9, (a0), v8, v0.t
-; CHECK-NEXT:    vmv1r.v v8, v10
+; CHECK-NEXT:    vmv1r.v v9, v8
+; CHECK-NEXT:    vloxseg2ei32.v v7, (a0), v9, v0.t
 ; CHECK-NEXT:    ret
 entry:
   %0 = tail call target("riscv.vector.tuple", <vscale x 1 x i8>, 2) @llvm.riscv.vloxseg2.mask.triscv.vector.tuple_nxv1i8_2t.nxv1i1.nxv1i32(target("riscv.vector.tuple", <vscale x 1 x i8>, 2) poison, ptr %base, <vscale x 1 x i32> %index, <vscale x 1 x i1> %mask, i32 %vl, i32 1, i32 3)
@@ -96,8 +96,8 @@ define <vscale x 2 x i8> @test_vloxseg2_nxv2i8_triscv.vector.tuple_nxv2i8_2t_nxv
 ; CHECK-LABEL: test_vloxseg2_nxv2i8_triscv.vector.tuple_nxv2i8_2t_nxv2i8:
 ; CHECK:       # %bb.0: # %entry
 ; CHECK-NEXT:    vsetvli zero, a1, e8, mf4, ta, ma
-; CHECK-NEXT:    vloxseg2ei8.v v9, (a0), v8
-; CHECK-NEXT:    vmv1r.v v8, v10
+; CHECK-NEXT:    vmv1r.v v9, v8
+; CHECK-NEXT:    vloxseg2ei8.v v7, (a0), v9
 ; CHECK-NEXT:    ret
 entry:
   %0 = tail call target("riscv.vector.tuple", <vscale x 2 x i8>, 2) @llvm.riscv.vloxseg2.triscv.vector.tuple_nxv2i8_2t.nxv2i8(target("riscv.vector.tuple", <vscale x 2 x i8>, 2) poison, ptr %base, <vscale x 2 x i8> %index, i32 %vl, i32 3)
@@ -109,8 +109,8 @@ define <vscale x 2 x i8> @test_vloxseg2_mask_nxv2i8_triscv.vector.tuple_nxv2i8_2
 ; CHECK-LABEL: test_vloxseg2_mask_nxv2i8_triscv.vector.tuple_nxv2i8_2t_nxv2i8:
 ; CHECK:       # %bb.0: # %entry
 ; CHECK-NEXT:    vsetvli zero, a1, e8, mf4, ta, ma
-; CHECK-NEXT:    vloxseg2ei8.v v9, (a0), v8, v0.t
-; CHECK-NEXT:    vmv1r.v v8, v10
+; CHECK-NEXT:    vmv1r.v v9, v8
+; CHECK-NEXT:    vloxseg2ei8.v v7, (a0), v9, v0.t
 ; CHECK-NEXT:    ret
 entry:
   %0 = tail call target("riscv.vector.tuple", <vscale x 2 x i8>, 2) @llvm.riscv.vloxseg2.mask.triscv.vector.tuple_nxv2i8_2t.nxv2i1.nxv2i8(target("riscv.vector.tuple", <vscale x 2 x i8>, 2) poison, ptr %base, <vscale x 2 x i8> %index, <vscale x 2 x i1> %mask, i32 %vl, i32 1, i32 3)
@@ -125,8 +125,8 @@ define <vscale x 2 x i8> @test_vloxseg2_nxv2i8_triscv.vector.tuple_nxv2i8_2t_nxv
 ; CHECK-LABEL: test_vloxseg2_nxv2i8_triscv.vector.tuple_nxv2i8_2t_nxv2i16:
 ; CHECK:       # %bb.0: # %entry
 ; CHECK-NEXT:    vsetvli zero, a1, e8, mf4, ta, ma
-; CHECK-NEXT:    vloxseg2ei16.v v9, (a0), v8
-; CHECK-NEXT:    vmv1r.v v8, v10
+; CHECK-NEXT:    vmv1r.v v9, v8
+; CHECK-NEXT:    vloxseg2ei16.v v7, (a0), v9
 ; CHECK-NEXT:    ret
 entry:
   %0 = tail call target("riscv.vector.tuple", <vscale x 2 x i8>, 2) @llvm.riscv.vloxseg2.triscv.vector.tuple_nxv2i8_2t.nxv2i16(target("riscv.vector.tuple", <vscale x 2 x i8>, 2) poison, ptr %base, <vscale x 2 x i16> %index, i32 %vl, i32 3)
@@ -138,8 +138,8 @@ define <vscale x 2 x i8> @test_vloxseg2_mask_nxv2i8_triscv.vector.tuple_nxv2i8_2
 ; CHECK-LABEL: test_vloxseg2_mask_nxv2i8_triscv.vector.tuple_nxv2i8_2t_nxv2i16:
 ; CHECK:       # %bb.0: # %entry
 ; CHECK-NEXT:    vsetvli zero, a1, e8, mf4, ta, ma
-; CHECK-NEXT:    vloxseg2ei16.v v9, (a0), v8, v0.t
-; CHECK-NEXT:    vmv1r.v v8, v10
+; CHECK-NEXT:    vmv1r.v v9, v8
+; CHECK-NEXT:    vloxseg2ei16.v v7, (a0), v9, v0.t
 ; CHECK-NEXT:    ret
 entry:
   %0 = tail call target("riscv.vector.tuple", <vscale x 2 x i8>, 2) @llvm.riscv.vloxseg2.mask.triscv.vector.tuple_nxv2i8_2t.nxv2i1.nxv2i16(target("riscv.vector.tuple", <vscale x 2 x i8>, 2) poison, ptr %base, <vscale x 2 x i16> %index, <vscale x 2 x i1> %mask, i32 %vl, i32 1, i32 3)
@@ -154,8 +154,8 @@ define <vscale x 2 x i8> @test_vloxseg2_nxv2i8_triscv.vector.tuple_nxv2i8_2t_nxv
 ; CHECK-LABEL: test_vloxseg2_nxv2i8_triscv.vector.tuple_nxv2i8_2t_nxv2i32:
 ; CHECK:       # %bb.0: # %entry
 ; CHECK-NEXT:    vsetvli zero, a1, e8, mf4, ta, ma
-; CHECK-NEXT:    vloxseg2ei32.v v9, (a0), v8
-; CHECK-NEXT:    vmv1r.v v8, v10
+; CHECK-NEXT:    vmv1r.v v9, v8
+; CHECK-NEXT:    vloxseg2ei32.v v7, (a0), v9
 ; CHECK-NEXT:    ret
 entry:
   %0 = tail call target("riscv.vector.tuple", <vscale x 2 x i8>, 2) @llvm.riscv.vloxseg2.triscv.vector.tuple_nxv2i8_2t.nxv2i32(target("riscv.vector.tuple", <vscale x 2 x i8>, 2) poison, ptr %base, <vscale x 2 x i32> %index, i32 %vl, i32 3)
@@ -167,8 +167,8 @@ define <vscale x 2 x i8> @test_vloxseg2_mask_nxv2i8_triscv.vector.tuple_nxv2i8_2
 ; CHECK-LABEL: test_vloxseg2_mask_nxv2i8_triscv.vector.tuple_nxv2i8_2t_nxv2i32:
 ; CHECK:       # %bb.0: # %entry
 ; CHECK-NEXT:    vsetvli zero, a1, e8, mf4, ta, ma
-; CHECK-NEXT:    vloxseg2ei32.v v9, (a0), v8, v0.t
-; CHECK-NEXT:    vmv1r.v v8, v10
+; CHECK-NEXT:    vmv1r.v v9, v8
+; CHECK-NEXT:    vloxseg2ei32.v v7, (a0), v9, v0.t
 ; CHECK-NEXT:    ret
 entry:
   %0 = tail call target("riscv.vector.tuple", <vscale x 2 x i8>, 2) @llvm.riscv.vloxseg2.mask.triscv.vector.tuple_nxv2i8_2t.nxv2i1.nxv2i32(target("riscv.vector.tuple", <vscale x 2 x i8>, 2) poison, ptr %base, <vscale x 2 x i32> %index, <vscale x 2 x i1> %mask, i32 %vl, i32 1, i32 3)
@@ -183,8 +183,8 @@ define <vscale x 4 x i8> @test_vloxseg2_nxv4i8_triscv.vector.tuple_nxv4i8_2t_nxv
 ; CHECK-LABEL: test_vloxseg2_nxv4i8_triscv.vector.tuple_nxv4i8_2t_nxv4i8:
 ; CHECK:       # %bb.0: # %entry
 ; CHECK-NEXT:    vsetvli zero, a1, e8, mf2, ta, ma
-; CHECK-NEXT:    vloxseg2ei8.v v9, (a0), v8
-; CHECK-NEXT:    vmv1r.v v8, v10
+; CHECK-NEXT:    vmv1r.v v9, v8
+; CHECK-NEXT:    vloxseg2ei8.v v7, (a0), v9
 ; CHECK-NEXT:    ret
 entry:
   %0 = tail call target("riscv.vector.tuple", <vscale x 4 x i8>, 2) @llvm.riscv.vloxseg2.triscv.vector.tuple_nxv4i8_2t.nxv4i8(target("riscv.vector.tuple", <vscale x 4 x i8>, 2) poison, ptr %base, <vscale x 4 x i8> %index, i32 %vl, i32 3)
@@ -196,8 +196,8 @@ define <vscale x 4 x i8> @test_vloxseg2_mask_nxv4i8_triscv.vector.tuple_nxv4i8_2
 ; CHECK-LABEL: test_vloxseg2_mask_nxv4i8_triscv.vector.tuple_nxv4i8_2t_nxv4i8:
 ; CHECK:       # %bb.0: # %entry
 ; CHECK-NEXT:    vsetvli zero, a1, e8, mf2, ta, ma
-; CHECK-NEXT:    vloxseg2ei8.v v9, (a0), v8, v0.t
-; CHECK-NEXT:    vmv1r.v v8, v10
+; CHECK-NEXT:    vmv1r.v v9, v8
+; CHECK-NEXT:    vloxseg2ei8.v v7, (a0), v9, v0.t
 ; CHECK-NEXT:    ret
 entry:
   %0 = tail call target("riscv.vector.tuple", <vscale x 4 x i8>, 2) @llvm.riscv.vloxseg2.mask.triscv.vector.tuple_nxv4i8_2t.nxv4i1.nxv4i8(target("riscv.vector.tuple", <vscale x 4 x i8>, 2) poison, ptr %base, <vscale x 4 x i8> %index, <vscale x 4 x i1> %mask, i32 %vl, i32 1, i32 3)
@@ -212,8 +212,8 @@ define <vscale x 4 x i8> @test_vloxseg2_nxv4i8_triscv.vector.tuple_nxv4i8_2t_nxv
 ; CHECK-LABEL: test_vloxseg2_nxv4i8_triscv.vector.tuple_nxv4i8_2t_nxv4i16:
 ; CHECK:       # %bb.0: # %entry
 ; CHECK-NEXT:    vsetvli zero, a1, e8, mf2, ta, ma
-; CHECK-NEXT:    vloxseg2ei16.v v9, (a0), v8
-; CHECK-NEXT:    vmv1r.v v8, v10
+; CHECK-NEXT:    vmv1r.v v9, v8
+; CHECK-NEXT:    vloxseg2ei16.v v7, (a0), v9
 ; CHECK-NEXT:    ret
 entry:
   %0 = tail call target("riscv.vector.tuple", <vscale x 4 x i8>, 2) @llvm.riscv.vloxseg2.triscv.vector.tuple_nxv4i8_2t.nxv4i16(target("riscv.vector.tuple", <vscale x 4 x i8>, 2) poison, ptr %base, <vscale x 4 x i16> %index, i32 %vl, i32 3)
@@ -225,8 +225,8 @@ define <vscale x 4 x i8> @test_vloxseg2_mask_nxv4i8_triscv.vector.tuple_nxv4i8_2
 ; CHECK-LABEL: test_vloxseg2_mask_nxv4i8_triscv.vector.tuple_nxv4i8_2t_nxv4i16:
 ; CHECK:       # %bb.0: # %entry
 ; CHECK-NEXT:    vsetvli zero, a1, e8, mf2, ta, ma
-; CHECK-NEXT:    vloxseg2ei16.v v9, (a0), v8, v0.t
-; CHECK-NEXT:    vmv1r.v v8, v10
+; CHECK-NEXT:    vmv1r.v v9, v8
+; CHECK-NEXT:    vloxseg2ei16.v v7, (a0), v9, v0.t
 ; CHECK-NEXT:    ret
 entry:
   %0 = tail call target("riscv.vector.tuple", <vscale x 4 x i8>, 2) @llvm.riscv.vloxseg2.mask.triscv.vector.tuple_nxv4i8_2t.nxv4i1.nxv4i16(target("riscv.vector.tuple", <vscale x 4 x i8>, 2) poison, ptr %base, <vscale x 4 x i16> %index, <vscale x 4 x i1> %mask, i32 %vl, i32 1, i32 3)
@@ -270,8 +270,8 @@ define <vscale x 8 x i8> @test_vloxseg2_nxv8i8_triscv.vector.tuple_nxv8i8_2t_nxv
 ; CHECK-LABEL: test_vloxseg2_nxv8i8_triscv.vector.tuple_nxv8i8_2t_nxv8i8:
 ; CHECK:       # %bb.0: # %entry
 ; CHECK-NEXT:    vsetvli zero, a1, e8, m1, ta, ma
-; CHECK-NEXT:    vloxseg2ei8.v v9, (a0), v8
-; CHECK-NEXT:    vmv1r.v v8, v10
+; CHECK-NEXT:    vmv1r.v v9, v8
+; CHECK-NEXT:    vloxseg2ei8.v v7, (a0), v9
 ; CHECK-NEXT:    ret
 entry:
   %0 = tail call target("riscv.vector.tuple", <vscale x 8 x i8>, 2) @llvm.riscv.vloxseg2.triscv.vector.tuple_nxv8i8_2t.nxv8i8(target("riscv.vector.tuple", <vscale x 8 x i8>, 2) poison, ptr %base, <vscale x 8 x i8> %index, i32 %vl, i32 3)
@@ -283,8 +283,8 @@ define <vscale x 8 x i8> @test_vloxseg2_mask_nxv8i8_triscv.vector.tuple_nxv8i8_2
 ; CHECK-LABEL: test_vloxseg2_mask_nxv8i8_triscv.vector.tuple_nxv8i8_2t_nxv8i8:
 ; CHECK:       # %bb.0: # %entry
 ; CHECK-NEXT:    vsetvli zero, a1, e8, m1, ta, ma
-; CHECK-NEXT:    vloxseg2ei8.v v9, (a0), v8, v0.t
-; CHECK-NEXT:    vmv1r.v v8, v10
+; CHECK-NEXT:    vmv1r.v v9, v8
+; CHECK-NEXT:    vloxseg2ei8.v v7, (a0), v9, v0.t
 ; CHECK-NEXT:    ret
 entry:
   %0 = tail call target("riscv.vector.tuple", <vscale x 8 x i8>, 2) @llvm.riscv.vloxseg2.mask.triscv.vector.tuple_nxv8i8_2t.nxv8i1.nxv8i8(target("riscv.vector.tuple", <vscale x 8 x i8>, 2) poison, ptr %base, <vscale x 8 x i8> %index, <vscale x 8 x i1> %mask, i32 %vl, i32 1, i32 3)
@@ -502,8 +502,8 @@ define <vscale x 1 x i8> @test_vloxseg3_nxv1i8_triscv.vector.tuple_nxv1i8_3t_nxv
 ; CHECK-LABEL: test_vloxseg3_nxv1i8_triscv.vector.tuple_nxv1i8_3t_nxv1i8:
 ; CHECK:       # %bb.0: # %entry
 ; CHECK-NEXT:    vsetvli zero, a1, e8, mf8, ta, ma
-; CHECK-NEXT:    vloxseg3ei8.v v9, (a0), v8
-; CHECK-NEXT:    vmv1r.v v8, v10
+; CHECK-NEXT:    vmv1r.v v10, v8
+; CHECK-NEXT:    vloxseg3ei8.v v7, (a0), v10
 ; CHECK-NEXT:    ret
 entry:
   %0 = tail call target("riscv.vector.tuple", <vscale x 1 x i8>, 3) @llvm.riscv.vloxseg3.triscv.vector.tuple_nxv1i8_3t.nxv1i8(target("riscv.vector.tuple", <vscale x 1 x i8>, 3) poison, ptr %base, <vscale x 1 x i8> %index, i32 %vl, i32 3)
@@ -515,8 +515,8 @@ define <vscale x 1 x i8> @test_vloxseg3_mask_nxv1i8_triscv.vector.tuple_nxv1i8_3
 ; CHECK-LABEL: test_vloxseg3_mask_nxv1i8_triscv.vector.tuple_nxv1i8_3t_nxv1i8:
 ; CHECK:       # %bb.0: # %entry
 ; CHECK-NEXT:    vsetvli zero, a1, e8, mf8, ta, ma
-; CHECK-NEXT:    vloxseg3ei8.v v9, (a0), v8, v0.t
-; CHECK-NEXT:    vmv1r.v v8, v10
+; CHECK-NEXT:    vmv1r.v v10, v8
+; CHECK-NEXT:    vloxseg3ei8.v v7, (a0), v10, v0.t
 ; CHECK-NEXT:    ret
 entry:
   %0 = tail call target("riscv.vector.tuple", <vscale x 1 x i8>, 3) @llvm.riscv.vloxseg3.mask.triscv.vector.tuple_nxv1i8_3t.nxv1i1.nxv1i8(target("riscv.vector.tuple", <vscale x 1 x i8>, 3) poison, ptr %base, <vscale x 1 x i8> %index, <vscale x 1 x i1> %mask, i32 %vl, i32 1, i32 3)
@@ -531,8 +531,8 @@ define <vscale x 1 x i8> @test_vloxseg3_nxv1i8_triscv.vector.tuple_nxv1i8_3t_nxv
 ; CHECK-LABEL: test_vloxseg3_nxv1i8_triscv.vector.tuple_nxv1i8_3t_nxv1i16:
 ; CHECK:       # %bb.0: # %entry
 ; CHECK-NEXT:    vsetvli zero, a1, e8, mf8, ta, ma
-; CHECK-NEXT:    vloxseg3ei16.v v9, (a0), v8
-; CHECK-NEXT:    vmv1r.v v8, v10
+; CHECK-NEXT:    vmv1r.v v10, v8
+; CHECK-NEXT:    vloxseg3ei16.v v7, (a0), v10
 ; CHECK-NEXT:    ret
 entry:
   %0 = tail call target("riscv.vector.tuple", <vscale x 1 x i8>, 3) @llvm.riscv.vloxseg3.triscv.vector.tuple_nxv1i8_3t.nxv1i16(target("riscv.vector.tuple", <vscale x 1 x i8>, 3) poison, ptr %base, <vscale x 1 x i16> %index, i32 %vl, i32 3)
@@ -544,8 +544,8 @@ define <vscale x 1 x i8> @test_vloxseg3_mask_nxv1i8_triscv.vector.tuple_nxv1i8_3
 ; CHECK-LABEL: test_vloxseg3_mask_nxv1i8_triscv.vector.tuple_nxv1i8_3t_nxv1i16:
 ; CHECK:       # %bb.0: # %entry
 ; CHECK-NEXT:    vsetvli zero, a1, e8, mf8, ta, ma
-; CHECK-NEXT:    vloxseg3ei16.v v9, (a0), v8, v0.t
-; CHECK-NEXT:    vmv1r.v v8, v10
+; CHECK-NEXT:    vmv1r.v v10, v8
+; CHECK-NEXT:    vloxseg3ei16.v v7, (a0), v10, v0.t
 ; CHECK-NEXT:    ret
 entry:
   %0 = tail call target("riscv.vector.tuple", <vscale x 1 x i8>, 3) @llvm.riscv.vloxseg3.mask.triscv.vector.tuple_nxv1i8_3t.nxv1i1.nxv1i16(target("riscv.vector.tuple", <vscale x 1 x i8>, 3) poison, ptr %base, <vscale x 1 x i16> %index, <vscale x 1 x i1> %mask, i32 %vl, i32 1, i32 3)
@@ -560,8 +560,8 @@ define <vscale x 1 x i8> @test_vloxseg3_nxv1i8_triscv.vector.tuple_nxv1i8_3t_nxv
 ; CHECK-LABEL: test_vloxseg3_nxv1i8_triscv.vector.tuple_nxv1i8_3t_nxv1i32:
 ; CHECK:       # %bb.0: # %entry
 ; CHECK-NEXT:    vsetvli zero, a1, e8, mf8, ta, ma
-; CHECK-NEXT:    vloxseg3ei32.v v9, (a0), v8
-; CHECK-NEXT:    vmv1r.v v8, v10
+; CHECK-NEXT:    vmv1r.v v10, v8
+; CHECK-NEXT:    vloxseg3ei32.v v7, (a0), v10
 ; CHECK-NEXT:    ret
 entry:
   %0 = tail call target("riscv.vector.tuple", <vscale x 1 x i8>, 3) @llvm.riscv.vloxseg3.triscv.vector.tuple_nxv1i8_3t.nxv1i32(target("riscv.vector.tuple", <vscale x 1 x i8>, 3) poison, ptr %base, <vscale x 1 x i32> %index, i32 %vl, i32 3)
@@ -573,8 +573,8 @@ define <vscale x 1 x i8> @test_vloxseg3_mask_nxv1i8_triscv.vector.tuple_nxv1i8_3
 ; CHECK-LABEL: test_vloxseg3_mask_nxv1i8_triscv.vector.tuple_nxv1i8_3t_nxv1i32:
 ; CHECK:       # %bb.0: # %entry
 ; CHECK-NEXT:    vsetvli zero, a1, e8, mf8, ta, ma
-; CHECK-NEXT:    vloxseg3ei32.v v9, (a0), v8, v0.t
-; CHECK-NEXT:    vmv1r.v v8, v10
+; CHECK-NEXT:    vmv1r.v v10, v8
+; CHECK-NEXT:    vloxseg3ei32.v v7, (a0), v10, v0.t
 ; CHECK-NEXT:    ret
 entry:
   %0 = tail call target("riscv.vector.tuple", <vscale x 1 x i8>, 3) @llvm.riscv.vloxseg3.mask.triscv.vector.tuple_nxv1i8_3t.nxv1i1.nxv1i32(target("riscv....
[truncated]

Collaborator

@preames preames left a comment


Basic idea here seems entirely reasonable. Can you update the comment in the header?

Just to confirm, this part from the review description: "Fixing this badly regresses a number of RISCV tests, which seem to rely on overestimating the cost in tryFindEvictionCandidate to avoid early-exiting the eviction candidate loop (RISCV is possibly underestimating the copy costs for vector registers)." applies to the possible future change, not this one, right? I think so, but the wording got slightly confusing, and I don't want to get this backwards. If so, I'm happy to help debug/fix that one.

Collaborator

@preames preames left a comment


LGTM w/prior comment addressed.

; CHECK-NEXT: vloxseg2ei8.v v9, (a0), v8
; CHECK-NEXT: vmv1r.v v8, v10
; CHECK-NEXT: vmv1r.v v9, v8
; CHECK-NEXT: vloxseg2ei8.v v7, (a0), v9
Collaborator


I realized something afterward when thinking about this diff. I think we're missing a sub-register check or a split heuristic somewhere because in this case, we wouldn't be copying the tuple register class. Only the second register in the tuple is actually returned, so the cost of the hint should be on that sub-register instead.

This is a non-blocking observation for this review. The codegen difference seems small and unimportant; it's just the realization that it might be a hint of something more interesting that I wanted to note.

Contributor Author


We definitely lack a proper subrange split somewhere

@arsenm
Contributor Author

arsenm commented Sep 22, 2025

Basic idea here seems entirely reasonable. Can you update the comment in the header?

is applying to the possible future change, not this one right? I think so, but the wording got slightly confusing, and I don't want to get this backwards.

Yes, this is the minimum viable patch that shows something; the problems are in fixing the double counting bug.

@arsenm arsenm merged commit d08e445 into main Sep 22, 2025
16 checks passed
@arsenm arsenm deleted the users/arsenm/greedy/use-class-copy-cost-evict-hint-break branch September 22, 2025 23:03
@nico
Contributor

nico commented Sep 22, 2025

Looks like this might break tests on macOS: http://45.33.8.238/macm1/114346/step_10.txt

@arsenm
Contributor Author

arsenm commented Sep 22, 2025

Looks like this might break tests on macOS: http://45.33.8.238/macm1/114346/step_10.txt

No, that's a known flaky test

@llvm-ci
Collaborator

llvm-ci commented Sep 23, 2025

LLVM Buildbot has detected a new failure on builder sanitizer-x86_64-linux-bootstrap-msan running on sanitizer-buildbot5 while building llvm at step 2 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/164/builds/13640

Here is the relevant piece of the build log for the reference
Step 2 (annotate) failure: 'python ../sanitizer_buildbot/sanitizers/zorg/buildbot/builders/sanitizers/buildbot_selector.py' (failure)
...
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:530: note: using lld-link: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm_build2_msan/bin/lld-link
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:530: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm_build2_msan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:530: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm_build2_msan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:530: note: using ld.lld: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm_build2_msan/bin/ld.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:530: note: using lld-link: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm_build2_msan/bin/lld-link
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:530: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm_build2_msan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:530: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm_build2_msan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/llvm/utils/lit/lit/main.py:74: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 900 seconds was requested on the command line. Forcing timeout to be 900 seconds.
-- Testing: 88867 tests, 64 workers --
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90.
FAIL: lit :: shtest-readfile-external.py (85659 of 88867)
******************** TEST 'lit :: shtest-readfile-external.py' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 4
env LIT_USE_INTERNAL_SHELL=0 not env -u FILECHECK_OPTS "/usr/bin/python3" /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/llvm/utils/lit/lit.py -j1 --order=lexical -a -v Inputs/shtest-readfile | FileCheck -match-full-lines -DTEMP_PATH=/home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm_build2_msan/utils/lit/tests/Inputs/shtest-readfile/Output /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm_build2_msan/utils/lit/tests/shtest-readfile-external.py
# executed command: env LIT_USE_INTERNAL_SHELL=0 not env -u FILECHECK_OPTS /usr/bin/python3 /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/llvm/utils/lit/lit.py -j1 --order=lexical -a -v Inputs/shtest-readfile
# note: command had no output on stdout or stderr
# executed command: FileCheck -match-full-lines -DTEMP_PATH=/home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm_build2_msan/utils/lit/tests/Inputs/shtest-readfile/Output /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm_build2_msan/utils/lit/tests/shtest-readfile-external.py
# .---command stderr------------
# | /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm_build2_msan/utils/lit/tests/shtest-readfile-external.py:10:10: error: CHECK: expected string not found in input
# | # CHECK: + echo hello
# |          ^
# | <stdin>:16:316: note: scanning from here
# | echo $(cat /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm_build2_msan/utils/lit/tests/Inputs/shtest-readfile/Output/absolute-paths.txt.tmp) && test -e /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm_build2_msan/utils/lit/tests/Inputs/shtest-readfile/Output/absolute-paths.txt.tmp # RUN: at line 3
# |                                                                                                                                                                                                                                                                                                                            ^
# | <stdin>:21:1: note: possible intended match here
# | + not echo return
# | ^
# | 
# | Input file: <stdin>
# | Check file: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm_build2_msan/utils/lit/tests/shtest-readfile-external.py
# | 
# | -dump-input=help explains the following input dump.
# | 
# | Input was:
# | <<<<<<
# |             .
# |             .
# |             .
# |            11: -- 
# |            12: Command Output (stderr): 
# |            13: -- 
# |            14: echo -n "hello" > /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm_build2_msan/utils/lit/tests/Inputs/shtest-readfile/Output/absolute-paths.txt.tmp # RUN: at line 2 
# |            15: + echo -n hello 
# |            16: echo $(cat /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm_build2_msan/utils/lit/tests/Inputs/shtest-readfile/Output/absolute-paths.txt.tmp) && test -e /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm_build2_msan/utils/lit/tests/Inputs/shtest-readfile/Output/absolute-paths.txt.tmp # RUN: at line 3 
# | check:10'0                                                                                                                                                                                                                                                                                                                                X error: no match found
Step 14 (stage3/msan check) failure: stage3/msan check (failure)
...
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:530: note: using lld-link: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm_build2_msan/bin/lld-link
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:530: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm_build2_msan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:530: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm_build2_msan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:530: note: using ld.lld: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm_build2_msan/bin/ld.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:530: note: using lld-link: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm_build2_msan/bin/lld-link
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:530: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm_build2_msan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:530: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm_build2_msan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/llvm/utils/lit/lit/main.py:74: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 900 seconds was requested on the command line. Forcing timeout to be 900 seconds.
-- Testing: 88867 tests, 64 workers --
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90.
FAIL: lit :: shtest-readfile-external.py (85659 of 88867)
******************** TEST 'lit :: shtest-readfile-external.py' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 4
env LIT_USE_INTERNAL_SHELL=0 not env -u FILECHECK_OPTS "/usr/bin/python3" /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/llvm/utils/lit/lit.py -j1 --order=lexical -a -v Inputs/shtest-readfile | FileCheck -match-full-lines -DTEMP_PATH=/home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm_build2_msan/utils/lit/tests/Inputs/shtest-readfile/Output /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm_build2_msan/utils/lit/tests/shtest-readfile-external.py
# executed command: env LIT_USE_INTERNAL_SHELL=0 not env -u FILECHECK_OPTS /usr/bin/python3 /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/llvm/utils/lit/lit.py -j1 --order=lexical -a -v Inputs/shtest-readfile
# note: command had no output on stdout or stderr
# executed command: FileCheck -match-full-lines -DTEMP_PATH=/home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm_build2_msan/utils/lit/tests/Inputs/shtest-readfile/Output /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm_build2_msan/utils/lit/tests/shtest-readfile-external.py
# .---command stderr------------
# | /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm_build2_msan/utils/lit/tests/shtest-readfile-external.py:10:10: error: CHECK: expected string not found in input
# | # CHECK: + echo hello
# |          ^
# | <stdin>:16:316: note: scanning from here
# | echo $(cat /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm_build2_msan/utils/lit/tests/Inputs/shtest-readfile/Output/absolute-paths.txt.tmp) && test -e /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm_build2_msan/utils/lit/tests/Inputs/shtest-readfile/Output/absolute-paths.txt.tmp # RUN: at line 3
# |                                                                                                                                                                                                                                                                                                                            ^
# | <stdin>:21:1: note: possible intended match here
# | + not echo return
# | ^
# | 
# | Input file: <stdin>
# | Check file: /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm_build2_msan/utils/lit/tests/shtest-readfile-external.py
# | 
# | -dump-input=help explains the following input dump.
# | 
# | Input was:
# | <<<<<<
# |             .
# |             .
# |             .
# |            11: -- 
# |            12: Command Output (stderr): 
# |            13: -- 
# |            14: echo -n "hello" > /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm_build2_msan/utils/lit/tests/Inputs/shtest-readfile/Output/absolute-paths.txt.tmp # RUN: at line 2 
# |            15: + echo -n hello 
# |            16: echo $(cat /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm_build2_msan/utils/lit/tests/Inputs/shtest-readfile/Output/absolute-paths.txt.tmp) && test -e /home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm_build2_msan/utils/lit/tests/Inputs/shtest-readfile/Output/absolute-paths.txt.tmp # RUN: at line 3 
# | check:10'0                                                                                                                                                                                                                                                                                                                                X error: no match found

YixingZhang007 pushed a commit to YixingZhang007/llvm-project that referenced this pull request Sep 27, 2025
Change the eviction advisor heuristic cost based on number of broken
hints to work in units of copy cost, rather than a magic number 1.
The intent is to allow breaking hints for cheap subregisters in favor
of more expensive register tuples.

The llvm.amdgcn.image.dim.gfx90a.ll change shows a simple example of
the case I am attempting to solve. Use of tuples in ABI contexts ends up
looking like this:

  %argN = COPY $vgprN
  %tuple = inst %argN
  $vgpr0 = COPY %tuple.sub0
  $vgpr1 = COPY %tuple.sub1
  $vgpr2 = COPY %tuple.sub2
  $vgpr3 = COPY %tuple.sub3

Since there are physreg copies in the input and output sequence,
both have hints to a physreg. The wider tuple hint on the output
should win though, since this satisfies 4 hints instead of 1.

This is the obvious part of a larger change to better handle
subregister interference with register tuples, and is not sufficient
to handle the original case I am looking at. There are several bugs here
that are proving tricky to untangle. In particular, there is a double
counting bug for all registers with multiple regunits; the cost of breaking
the interfering hint is added for each interfering virtual register,
which have repeat visits across regunits. Fixing the double counting badly
regresses a number of RISCV tests, which seem to rely on overestimating
the cost in tryFindEvictionCandidate to avoid early-exiting the eviction
candidate loop (RISCV is possibly underestimating the copy costs for
vector registers).
// We permit breaking cascades for urgent evictions. It should be the
// last resort, though, so make it really expensive.
Cost.BrokenHints += 10;
Cost.BrokenHints += 10 * MRI->getRegClass(Intf->reg())->getCopyCost();
Collaborator


@arsenm, does this change consider that getCopyCost may return -1 to indicate "extremely high cost"?
IIUC that actually would result in subtracting 10 instead of adding a huge penalty here.

(Context is that I'm debugging some downstream problems, as we've been seeing a number of "ran out of registers" failures after merging this PR. Haven't gotten that far in the debugging yet. But the question above popped up.)
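
A standalone illustration of the concern (plain ints, not LLVM's actual cost
types): with a negative sentinel copy cost, the "really expensive" term lowers
the accumulated cost instead of raising it.

  #include <cstdio>

  int main() {
    const int CopyCost = -1;      // sentinel some targets use for "extremely high cost"
    int BrokenHints = 0;
    BrokenHints += 10 * CopyCost; // adds -10: a discount rather than a penalty
    std::printf("BrokenHints = %d\n", BrokenHints);
  }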

Collaborator


I tried a hack where I changed all three getCopyCost() calls added here to basically do this (pseudocode):
getCopyCost() < 0 ? 100 : getCopyCost()
and then at least the "ran out of registers" fault I was looking at disappeared (and I got the same result as before this patch).

Our target has some quad-registers that we define with CopyCost=-1. So I figure that is what caused the problem here.
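
The same workaround as a self-contained helper (the 100 is the arbitrary
stand-in from the pseudocode above, not a value taken from the patch):

  // Treat the -1 "extremely expensive" sentinel as a large positive cost so it
  // cannot reduce the eviction cost.
  int effectiveCopyCost(int RawCopyCost) {
    return RawCopyCost < 0 ? 100 : RawCopyCost;
  }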

Contributor Author


I believe -1 was supposed to mean impossible cost, not very high

Contributor Author


But this should probably be an unsigned value

Collaborator


Or we should explicitly use saturating arithmetic here. Either way, probably worth a revert and reapply with a fix.

Collaborator


Ok, so what do we do here? CopyCost=-1 is used in several places.

@preames suggested a revert and reapply with a fix. Will you do that, @arsenm?

Contributor Author


I'm leaning towards no and just rejecting using -1 as a CopyCost. I haven't found anywhere that does anything with that as an impossible value

Contributor Author


All of the in-tree cases using -1 copy costs appear to be cargo-cult application to non-allocatable physical registers. The only place making use of the negative cost check is in InstrEmitter, which is probably better served by a non-allocatable check. I actually see a number of test improvements when I swap this out for the direct allocatability check

kraj pushed a commit to kraj/llvm-project that referenced this pull request Oct 3, 2025
Tolerate setting negative values in tablegen, and store them as a
saturated uint8_t value. This will allow naive uses of the copy cost
to directly add it as a cost without considering the degenerate negative
case. The degenerate negative cases are only used in InstrEmitter / DAG
scheduling, so leave the special case processing there. There are also
fixmes about this system already there.

This is the expedient fix for an out of tree target regression
after llvm#160084. Currently targets can set a negative copy cost to mark
copies as "impossible". However essentially all the in-tree uses only
uses this for non-allocatable condition registers. We probably should
replace the InstrEmitter/DAG scheduler uses with a more direct check
for a copyable register but that has test changes.
arsenm added a commit that referenced this pull request Oct 3, 2025
…1786)

Tolerate setting negative values in tablegen, and store them as a
saturated uint8_t value. This will allow naive uses of the copy cost
to directly add it as a cost without considering the degenerate negative
case. The degenerate negative cases are only used in InstrEmitter / DAG
scheduling, so leave the special case processing there. There are also
fixmes about this system already there.

This is the expedient fix for an out of tree target regression
after #160084. Currently targets can set a negative copy cost to mark
copies as "impossible". However essentially all the in-tree uses only
uses this for non-allocatable condition registers. We probably should
replace the InstrEmitter/DAG scheduler uses with a more direct check
for a copyable register but that has test changes.
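
A rough sketch of what a saturated conversion could look like; mapping negative
tablegen values to the maximum cost is an assumption made here for
illustration, consistent with their "extremely high cost" meaning, and is not
code taken from that commit:

  #include <cstdint>

  // Assumption for illustration: negative ("impossible/very expensive") values
  // saturate to the maximum representable cost, and oversized values clamp too.
  std::uint8_t saturatedCopyCost(int TableGenValue) {
    if (TableGenValue < 0 || TableGenValue > UINT8_MAX)
      return UINT8_MAX;
    return static_cast<std::uint8_t>(TableGenValue);
  }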