[AMDGPU]: Do not provide AGPR for AV if !mayNeedAGPRs #151063


Open · wants to merge 2 commits into main

Conversation

jrbyrnes
Contributor

@jrbyrnes jrbyrnes commented Jul 29, 2025

This is aligned with the idea that we should not be using AGPRs unless absolutely necessary.

For the unified register file case, instead of providing an AllocationOrder with a 50/50 split of VGPRs and AGPRs for %av registers, provide a 100/0 split if !mayNeedAGPRs and we are not in the one-wave-per-EU case.

jrbyrnes added 2 commits July 28, 2025 17:08
Change-Id: I4f338577d5506fa74232e13746e40f50a1e83a9d
Change-Id: Ia7616b2054050ac092186c5a7c1e7fd1c81d1a77
@llvmbot
Member

llvmbot commented Jul 29, 2025

@llvm/pr-subscribers-llvm-regalloc

@llvm/pr-subscribers-backend-amdgpu

Author: Jeffrey Byrnes (jrbyrnes)

Changes

This is aligned with the idea that we should not be using AGPRs unless absolutely necessary.

For the unified register file case, instead of providing an AllocationOrder with a 50/50 split of VGPRs and AGPRs for %av registers, provide a 100/0 split if !mayNeedAGPRs


Patch is 28.04 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/151063.diff

4 Files Affected:

  • (modified) llvm/lib/CodeGen/RegisterClassInfo.cpp (+1)
  • (modified) llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp (+6-2)
  • (modified) llvm/test/CodeGen/AMDGPU/large-avgpr-assign-last.mir (+39-26)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.mfma.gfx950.ll (+92-92)
diff --git a/llvm/lib/CodeGen/RegisterClassInfo.cpp b/llvm/lib/CodeGen/RegisterClassInfo.cpp
index 8ead83302c337..921a732de1021 100644
--- a/llvm/lib/CodeGen/RegisterClassInfo.cpp
+++ b/llvm/lib/CodeGen/RegisterClassInfo.cpp
@@ -154,6 +154,7 @@ void RegisterClassInfo::compute(const TargetRegisterClass *RC) const {
     // Remove reserved registers from the allocation order.
     if (Reserved.test(PhysReg))
       continue;
+
     uint8_t Cost = RegCosts[PhysReg];
     MinCost = std::min(MinCost, Cost);
 
diff --git a/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp b/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
index 84cfa878276fd..197fa1094156c 100644
--- a/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
@@ -575,6 +575,7 @@ MCRegister SIRegisterInfo::reservedPrivateSegmentBufferReg(
 std::pair<unsigned, unsigned>
 SIRegisterInfo::getMaxNumVectorRegs(const MachineFunction &MF) const {
   const unsigned MaxVectorRegs = ST.getMaxNumVGPRs(MF);
+  const unsigned MinWavesPerEU = ST.getWavesPerEU(MF.getFunction()).first;
 
   unsigned MaxNumVGPRs = MaxVectorRegs;
   unsigned MaxNumAGPRs = 0;
@@ -588,6 +589,7 @@ SIRegisterInfo::getMaxNumVectorRegs(const MachineFunction &MF) const {
   // TODO: it shall be possible to estimate maximum AGPR/VGPR pressure and split
   //       register file accordingly.
   if (ST.hasGFX90AInsts()) {
+    const SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();
     unsigned MinNumAGPRs = 0;
     const unsigned TotalNumAGPRs = AMDGPU::AGPR_32RegClass.getNumRegs();
     const unsigned TotalNumVGPRs = AMDGPU::VGPR_32RegClass.getNumRegs();
@@ -604,7 +606,8 @@ SIRegisterInfo::getMaxNumVectorRegs(const MachineFunction &MF) const {
 
     if (MinNumAGPRs == DefaultNumAGPR.first) {
       // Default to splitting half the registers if AGPRs are required.
-      MinNumAGPRs = MaxNumAGPRs = MaxVectorRegs / 2;
+      MinNumAGPRs = MaxNumAGPRs =
+          (MFI->mayNeedAGPRs() || MinWavesPerEU == 1) ? MaxVectorRegs / 2 : 0;
     } else {
       // Align to accum_offset's allocation granularity.
       MinNumAGPRs = alignTo(MinNumAGPRs, 4);
@@ -756,7 +759,8 @@ BitVector SIRegisterInfo::getReservedRegs(const MachineFunction &MF) const {
   }
 
   // Reserve all the AGPRs if there are no instructions to use it.
-  if (!ST.hasMAIInsts())
+  if (!ST.hasMAIInsts() || (ST.hasGFX90AInsts() && !MFI->mayNeedAGPRs() &&
+                            ST.getWavesPerEU(MF.getFunction()).first > 1))
     MaxNumAGPRs = 0;
   for (const TargetRegisterClass *RC : regclasses()) {
     if (RC->isBaseClass() && isAGPRClass(RC)) {
diff --git a/llvm/test/CodeGen/AMDGPU/large-avgpr-assign-last.mir b/llvm/test/CodeGen/AMDGPU/large-avgpr-assign-last.mir
index 58e9b0a1aedd4..db0bf35b06069 100644
--- a/llvm/test/CodeGen/AMDGPU/large-avgpr-assign-last.mir
+++ b/llvm/test/CodeGen/AMDGPU/large-avgpr-assign-last.mir
@@ -1,18 +1,19 @@
 # NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
-# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx950 -verify-regalloc -greedy-regclass-priority-trumps-globalness=1 -start-after=machine-scheduler -stop-after=virtregrewriter,2 -o - %s | FileCheck %s
+
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx950 -verify-regalloc -amdgpu-mfma-vgpr-form=1 --greedy-regclass-priority-trumps-globalness=1  -start-after=machine-scheduler -stop-after=virtregrewriter,2 -o - %s | FileCheck %s
 
 --- |
-  define void @temp_vgpr_to_agpr_should_not_undo_split_with_remat() #0 {
+  define void @av_allocation_order() #0 {
   entry:
     unreachable
   }
 
-  attributes #0 = { "amdgpu-agpr-alloc"="0,0" }
+  attributes #0 = {"amdgpu-waves-per-eu"="8,8"}
 ...
 
 
 ---
-name:            temp_vgpr_to_agpr_should_not_undo_split_with_remat
+name:            av_allocation_order
 tracksRegLiveness: true
 machineFunctionInfo:
   isEntryFunction: true
@@ -28,24 +29,12 @@ machineFunctionInfo:
 body:             |
   bb.0:
    liveins: $vgpr0, $sgpr4_sgpr5
-    ; CHECK-LABEL: name: temp_vgpr_to_agpr_should_not_undo_split_with_remat
+    ; CHECK-LABEL: name: av_allocation_order
     ; CHECK: liveins: $vgpr0, $sgpr4_sgpr5
     ; CHECK-NEXT: {{  $}}
-    ; CHECK-NEXT: renamable $vgpr0 = IMPLICIT_DEF
-    ; CHECK-NEXT: dead renamable $vgpr1 = IMPLICIT_DEF
-    ; CHECK-NEXT: renamable $vgpr1 = IMPLICIT_DEF
-    ; CHECK-NEXT: renamable $vgpr2 = IMPLICIT_DEF
-    ; CHECK-NEXT: renamable $vgpr3 = IMPLICIT_DEF
-    ; CHECK-NEXT: renamable $vgpr4 = IMPLICIT_DEF
-    ; CHECK-NEXT: renamable $vgpr5 = IMPLICIT_DEF
-    ; CHECK-NEXT: renamable $vgpr6 = IMPLICIT_DEF
-    ; CHECK-NEXT: renamable $vgpr7 = IMPLICIT_DEF
-    ; CHECK-NEXT: renamable $vgpr8 = IMPLICIT_DEF
-    ; CHECK-NEXT: renamable $vgpr9 = IMPLICIT_DEF
-    ; CHECK-NEXT: renamable $vgpr10 = IMPLICIT_DEF
-    ; CHECK-NEXT: renamable $vgpr11 = IMPLICIT_DEF
-    ; CHECK-NEXT: renamable $vgpr12 = IMPLICIT_DEF
-    ; CHECK-NEXT: renamable $vgpr13 = IMPLICIT_DEF
+    ; CHECK-NEXT: renamable $vgpr51 = IMPLICIT_DEF
+    ; CHECK-NEXT: dead renamable $vgpr0 = IMPLICIT_DEF
+    ; CHECK-NEXT: renamable $vgpr52 = IMPLICIT_DEF
     ; CHECK-NEXT: renamable $vgpr14 = IMPLICIT_DEF
     ; CHECK-NEXT: renamable $vgpr15 = IMPLICIT_DEF
     ; CHECK-NEXT: renamable $vgpr16 = IMPLICIT_DEF
@@ -55,11 +44,29 @@ body:             |
     ; CHECK-NEXT: renamable $vgpr20 = IMPLICIT_DEF
     ; CHECK-NEXT: renamable $vgpr21 = IMPLICIT_DEF
     ; CHECK-NEXT: renamable $vgpr22 = IMPLICIT_DEF
-    ; CHECK-NEXT: KILL killed renamable $vgpr2, killed renamable $vgpr3, killed renamable $vgpr4, killed renamable $vgpr5, killed renamable $vgpr6, killed renamable $vgpr7, killed renamable $vgpr8, killed renamable $vgpr9, killed renamable $vgpr10, killed renamable $vgpr11, killed renamable $vgpr12, killed renamable $vgpr13, killed renamable $vgpr14, killed renamable $vgpr15, killed renamable $vgpr16
-    ; CHECK-NEXT: S_NOP 0, implicit-def renamable $vgpr23_vgpr24_vgpr25_vgpr26_vgpr27_vgpr28_vgpr29_vgpr30_vgpr31_vgpr32_vgpr33_vgpr34_vgpr35_vgpr36_vgpr37_vgpr38
-    ; CHECK-NEXT: S_NOP 0, implicit-def renamable $vgpr39_vgpr40_vgpr41_vgpr42_vgpr43_vgpr44_vgpr45_vgpr46_vgpr47_vgpr48_vgpr49_vgpr50_vgpr51_vgpr52_vgpr53_vgpr54
-    ; CHECK-NEXT: KILL killed renamable $vgpr0, killed renamable $vgpr1, killed renamable $vgpr17, killed renamable $vgpr18, killed renamable $vgpr19, killed renamable $vgpr20, killed renamable $vgpr21, killed renamable $vgpr22
-    ; CHECK-NEXT: S_NOP 0, implicit killed renamable $vgpr23_vgpr24_vgpr25_vgpr26_vgpr27_vgpr28_vgpr29_vgpr30_vgpr31_vgpr32_vgpr33_vgpr34_vgpr35_vgpr36_vgpr37_vgpr38, implicit killed renamable $vgpr39_vgpr40_vgpr41_vgpr42_vgpr43_vgpr44_vgpr45_vgpr46_vgpr47_vgpr48_vgpr49_vgpr50_vgpr51_vgpr52_vgpr53_vgpr54
+    ; CHECK-NEXT: renamable $vgpr23 = IMPLICIT_DEF
+    ; CHECK-NEXT: renamable $vgpr24 = IMPLICIT_DEF
+    ; CHECK-NEXT: renamable $vgpr25 = IMPLICIT_DEF
+    ; CHECK-NEXT: renamable $vgpr26 = IMPLICIT_DEF
+    ; CHECK-NEXT: renamable $vgpr27 = IMPLICIT_DEF
+    ; CHECK-NEXT: renamable $vgpr28 = IMPLICIT_DEF
+    ; CHECK-NEXT: renamable $vgpr29 = IMPLICIT_DEF
+    ; CHECK-NEXT: renamable $vgpr30 = IMPLICIT_DEF
+    ; CHECK-NEXT: renamable $vgpr31 = IMPLICIT_DEF
+    ; CHECK-NEXT: renamable $vgpr32 = IMPLICIT_DEF
+    ; CHECK-NEXT: renamable $vgpr33 = IMPLICIT_DEF
+    ; CHECK-NEXT: renamable $vgpr34 = IMPLICIT_DEF
+    ; CHECK-NEXT: renamable $vgpr0_vgpr1 = IMPLICIT_DEF
+    ; CHECK-NEXT: renamable $vgpr2_vgpr3 = IMPLICIT_DEF
+    ; CHECK-NEXT: renamable $vgpr4_vgpr5 = IMPLICIT_DEF
+    ; CHECK-NEXT: renamable $vgpr6_vgpr7 = IMPLICIT_DEF
+    ; CHECK-NEXT: renamable $vgpr8_vgpr9 = IMPLICIT_DEF
+    ; CHECK-NEXT: renamable $vgpr10_vgpr11 = IMPLICIT_DEF
+    ; CHECK-NEXT: KILL killed renamable $vgpr14, killed renamable $vgpr15, killed renamable $vgpr16, killed renamable $vgpr17, killed renamable $vgpr18, killed renamable $vgpr19, killed renamable $vgpr20, killed renamable $vgpr21, killed renamable $vgpr22, killed renamable $vgpr23, killed renamable $vgpr24, killed renamable $vgpr25, killed renamable $vgpr26, killed renamable $vgpr27, killed renamable $vgpr28
+    ; CHECK-NEXT: S_NOP 0, implicit-def renamable $vgpr35_vgpr36_vgpr37_vgpr38_vgpr39_vgpr40_vgpr41_vgpr42_vgpr43_vgpr44_vgpr45_vgpr46_vgpr47_vgpr48_vgpr49_vgpr50
+    ; CHECK-NEXT: S_NOP 0, implicit-def renamable $vgpr12_vgpr13_vgpr14_vgpr15_vgpr16_vgpr17_vgpr18_vgpr19_vgpr20_vgpr21_vgpr22_vgpr23_vgpr24_vgpr25_vgpr26_vgpr27
+    ; CHECK-NEXT: KILL killed renamable $vgpr51, killed renamable $vgpr52, killed renamable $vgpr29, killed renamable $vgpr30, killed renamable $vgpr31, killed renamable $vgpr32, killed renamable $vgpr33, killed renamable $vgpr34, killed renamable $vgpr0_vgpr1, killed renamable $vgpr2_vgpr3, killed renamable $vgpr4_vgpr5, killed renamable $vgpr6_vgpr7, killed renamable $vgpr8_vgpr9, killed renamable $vgpr10_vgpr11
+    ; CHECK-NEXT: S_NOP 0, implicit killed renamable $vgpr35_vgpr36_vgpr37_vgpr38_vgpr39_vgpr40_vgpr41_vgpr42_vgpr43_vgpr44_vgpr45_vgpr46_vgpr47_vgpr48_vgpr49_vgpr50, implicit killed renamable $vgpr12_vgpr13_vgpr14_vgpr15_vgpr16_vgpr17_vgpr18_vgpr19_vgpr20_vgpr21_vgpr22_vgpr23_vgpr24_vgpr25_vgpr26_vgpr27
     ; CHECK-NEXT: S_ENDPGM 0
     %1:vgpr_32 = IMPLICIT_DEF
     %2:vgpr_32 = IMPLICIT_DEF
@@ -85,10 +92,16 @@ body:             |
     %21:vgpr_32 = IMPLICIT_DEF
     %22:vgpr_32 = IMPLICIT_DEF
     %23:vgpr_32 = IMPLICIT_DEF
+    %24:vreg_64 = IMPLICIT_DEF
+    %25:vreg_64 = IMPLICIT_DEF
+    %26:vreg_64 = IMPLICIT_DEF
+    %27:vreg_64 = IMPLICIT_DEF
+    %28:vreg_64 = IMPLICIT_DEF
+    %29:vreg_64 = IMPLICIT_DEF
     KILL %3, %4, %5, %6, %7, %8, %9, %10, %11, %12, %13, %14, %15, %16, %17
     S_NOP 0, implicit-def %50:av_512
     S_NOP 0, implicit-def %51:av_512
-    KILL %1, %2, %18, %19, %20, %21, %22, %23
+    KILL %1, %2, %18, %19, %20, %21, %22, %23, %24, %25, %26, %27, %28, %29
     S_NOP 0, implicit %50, implicit %51
     S_ENDPGM 0
 ...
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.mfma.gfx950.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.mfma.gfx950.ll
index 21465beb21de7..d4a163472f48d 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.mfma.gfx950.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.mfma.gfx950.ll
@@ -1687,7 +1687,7 @@ define amdgpu_kernel void @test_mfma_f32_32x32x16_f16__vgprcd(<8 x half> %arg0,
 ; VGPRRC-NEXT:    s_load_dwordx8 s[24:31], s[4:5], 0x24
 ; VGPRRC-NEXT:    s_load_dwordx16 s[8:23], s[4:5], 0x64
 ; VGPRRC-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0xa4
-; VGPRRC-NEXT:    v_mov_b32_e32 v40, 0
+; VGPRRC-NEXT:    v_mov_b32_e32 v44, 0
 ; VGPRRC-NEXT:    s_waitcnt lgkmcnt(0)
 ; VGPRRC-NEXT:    v_mov_b64_e32 v[34:35], s[26:27]
 ; VGPRRC-NEXT:    v_mov_b64_e32 v[32:33], s[24:25]
@@ -1701,41 +1701,41 @@ define amdgpu_kernel void @test_mfma_f32_32x32x16_f16__vgprcd(<8 x half> %arg0,
 ; VGPRRC-NEXT:    v_mov_b64_e32 v[20:21], s[12:13]
 ; VGPRRC-NEXT:    v_mov_b64_e32 v[18:19], s[10:11]
 ; VGPRRC-NEXT:    v_mov_b64_e32 v[16:17], s[8:9]
-; VGPRRC-NEXT:    v_mov_b32_e32 v42, s20
-; VGPRRC-NEXT:    v_mov_b32_e32 v43, s21
+; VGPRRC-NEXT:    v_mov_b32_e32 v40, s20
+; VGPRRC-NEXT:    v_mov_b32_e32 v41, s21
 ; VGPRRC-NEXT:    v_mfma_f32_32x32x16_f16 v[0:15], v[32:35], v[36:39], v[16:31]
-; VGPRRC-NEXT:    v_mov_b32_e32 v44, s22
-; VGPRRC-NEXT:    v_mov_b32_e32 v45, s23
-; VGPRRC-NEXT:    global_store_dwordx4 v40, v[42:45], s[0:1] offset:48 sc0 sc1
+; VGPRRC-NEXT:    v_mov_b32_e32 v42, s22
+; VGPRRC-NEXT:    v_mov_b32_e32 v43, s23
+; VGPRRC-NEXT:    global_store_dwordx4 v44, v[40:43], s[0:1] offset:48 sc0 sc1
 ; VGPRRC-NEXT:    s_waitcnt vmcnt(0)
 ; VGPRRC-NEXT:    s_nop 2
 ; VGPRRC-NEXT:    v_mov_b32_e32 v16, s16
 ; VGPRRC-NEXT:    v_mov_b32_e32 v17, s17
 ; VGPRRC-NEXT:    v_mov_b32_e32 v18, s18
 ; VGPRRC-NEXT:    v_mov_b32_e32 v19, s19
-; VGPRRC-NEXT:    global_store_dwordx4 v40, v[16:19], s[0:1] offset:32 sc0 sc1
+; VGPRRC-NEXT:    global_store_dwordx4 v44, v[16:19], s[0:1] offset:32 sc0 sc1
 ; VGPRRC-NEXT:    s_waitcnt vmcnt(0)
 ; VGPRRC-NEXT:    s_nop 0
 ; VGPRRC-NEXT:    v_mov_b32_e32 v16, s12
 ; VGPRRC-NEXT:    v_mov_b32_e32 v17, s13
 ; VGPRRC-NEXT:    v_mov_b32_e32 v18, s14
 ; VGPRRC-NEXT:    v_mov_b32_e32 v19, s15
-; VGPRRC-NEXT:    global_store_dwordx4 v40, v[16:19], s[0:1] offset:16 sc0 sc1
+; VGPRRC-NEXT:    global_store_dwordx4 v44, v[16:19], s[0:1] offset:16 sc0 sc1
 ; VGPRRC-NEXT:    s_waitcnt vmcnt(0)
 ; VGPRRC-NEXT:    s_nop 0
 ; VGPRRC-NEXT:    v_mov_b32_e32 v16, s8
 ; VGPRRC-NEXT:    v_mov_b32_e32 v17, s9
 ; VGPRRC-NEXT:    v_mov_b32_e32 v18, s10
 ; VGPRRC-NEXT:    v_mov_b32_e32 v19, s11
-; VGPRRC-NEXT:    global_store_dwordx4 v40, v[16:19], s[0:1] sc0 sc1
+; VGPRRC-NEXT:    global_store_dwordx4 v44, v[16:19], s[0:1] sc0 sc1
 ; VGPRRC-NEXT:    s_waitcnt vmcnt(0)
-; VGPRRC-NEXT:    global_store_dwordx4 v40, v[8:11], s[0:1] offset:32 sc0 sc1
+; VGPRRC-NEXT:    global_store_dwordx4 v44, v[8:11], s[0:1] offset:32 sc0 sc1
 ; VGPRRC-NEXT:    s_waitcnt vmcnt(0)
-; VGPRRC-NEXT:    global_store_dwordx4 v40, v[12:15], s[0:1] offset:48 sc0 sc1
+; VGPRRC-NEXT:    global_store_dwordx4 v44, v[12:15], s[0:1] offset:48 sc0 sc1
 ; VGPRRC-NEXT:    s_waitcnt vmcnt(0)
-; VGPRRC-NEXT:    global_store_dwordx4 v40, v[0:3], s[0:1] sc0 sc1
+; VGPRRC-NEXT:    global_store_dwordx4 v44, v[0:3], s[0:1] sc0 sc1
 ; VGPRRC-NEXT:    s_waitcnt vmcnt(0)
-; VGPRRC-NEXT:    global_store_dwordx4 v40, v[4:7], s[0:1] offset:16 sc0 sc1
+; VGPRRC-NEXT:    global_store_dwordx4 v44, v[4:7], s[0:1] offset:16 sc0 sc1
 ; VGPRRC-NEXT:    s_waitcnt vmcnt(0)
 ; VGPRRC-NEXT:    s_endpgm
 ; AGPR-LABEL: test_mfma_f32_32x32x16_f16__vgprcd:
@@ -2051,7 +2051,7 @@ define amdgpu_kernel void @test_mfma_f32_32x32x16_f16__vgprcd__flags(<8 x half>
 ; VGPRRC-NEXT:    s_load_dwordx8 s[24:31], s[4:5], 0x24
 ; VGPRRC-NEXT:    s_load_dwordx16 s[8:23], s[4:5], 0x64
 ; VGPRRC-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0xa4
-; VGPRRC-NEXT:    v_mov_b32_e32 v40, 0
+; VGPRRC-NEXT:    v_mov_b32_e32 v44, 0
 ; VGPRRC-NEXT:    s_waitcnt lgkmcnt(0)
 ; VGPRRC-NEXT:    v_mov_b64_e32 v[34:35], s[26:27]
 ; VGPRRC-NEXT:    v_mov_b64_e32 v[32:33], s[24:25]
@@ -2065,41 +2065,41 @@ define amdgpu_kernel void @test_mfma_f32_32x32x16_f16__vgprcd__flags(<8 x half>
 ; VGPRRC-NEXT:    v_mov_b64_e32 v[20:21], s[12:13]
 ; VGPRRC-NEXT:    v_mov_b64_e32 v[18:19], s[10:11]
 ; VGPRRC-NEXT:    v_mov_b64_e32 v[16:17], s[8:9]
-; VGPRRC-NEXT:    v_mov_b32_e32 v42, s20
-; VGPRRC-NEXT:    v_mov_b32_e32 v43, s21
+; VGPRRC-NEXT:    v_mov_b32_e32 v40, s20
+; VGPRRC-NEXT:    v_mov_b32_e32 v41, s21
 ; VGPRRC-NEXT:    v_mfma_f32_32x32x16_f16 v[0:15], v[32:35], v[36:39], v[16:31] cbsz:1 abid:2 blgp:3
-; VGPRRC-NEXT:    v_mov_b32_e32 v44, s22
-; VGPRRC-NEXT:    v_mov_b32_e32 v45, s23
-; VGPRRC-NEXT:    global_store_dwordx4 v40, v[42:45], s[0:1] offset:48 sc0 sc1
+; VGPRRC-NEXT:    v_mov_b32_e32 v42, s22
+; VGPRRC-NEXT:    v_mov_b32_e32 v43, s23
+; VGPRRC-NEXT:    global_store_dwordx4 v44, v[40:43], s[0:1] offset:48 sc0 sc1
 ; VGPRRC-NEXT:    s_waitcnt vmcnt(0)
 ; VGPRRC-NEXT:    s_nop 2
 ; VGPRRC-NEXT:    v_mov_b32_e32 v16, s16
 ; VGPRRC-NEXT:    v_mov_b32_e32 v17, s17
 ; VGPRRC-NEXT:    v_mov_b32_e32 v18, s18
 ; VGPRRC-NEXT:    v_mov_b32_e32 v19, s19
-; VGPRRC-NEXT:    global_store_dwordx4 v40, v[16:19], s[0:1] offset:32 sc0 sc1
+; VGPRRC-NEXT:    global_store_dwordx4 v44, v[16:19], s[0:1] offset:32 sc0 sc1
 ; VGPRRC-NEXT:    s_waitcnt vmcnt(0)
 ; VGPRRC-NEXT:    s_nop 0
 ; VGPRRC-NEXT:    v_mov_b32_e32 v16, s12
 ; VGPRRC-NEXT:    v_mov_b32_e32 v17, s13
 ; VGPRRC-NEXT:    v_mov_b32_e32 v18, s14
 ; VGPRRC-NEXT:    v_mov_b32_e32 v19, s15
-; VGPRRC-NEXT:    global_store_dwordx4 v40, v[16:19], s[0:1] offset:16 sc0 sc1
+; VGPRRC-NEXT:    global_store_dwordx4 v44, v[16:19], s[0:1] offset:16 sc0 sc1
 ; VGPRRC-NEXT:    s_waitcnt vmcnt(0)
 ; VGPRRC-NEXT:    s_nop 0
 ; VGPRRC-NEXT:    v_mov_b32_e32 v16, s8
 ; VGPRRC-NEXT:    v_mov_b32_e32 v17, s9
 ; VGPRRC-NEXT:    v_mov_b32_e32 v18, s10
 ; VGPRRC-NEXT:    v_mov_b32_e32 v19, s11
-; VGPRRC-NEXT:    global_store_dwordx4 v40, v[16:19], s[0:1] sc0 sc1
+; VGPRRC-NEXT:    global_store_dwordx4 v44, v[16:19], s[0:1] sc0 sc1
 ; VGPRRC-NEXT:    s_waitcnt vmcnt(0)
-; VGPRRC-NEXT:    global_store_dwordx4 v40, v[8:11], s[0:1] offset:32 sc0 sc1
+; VGPRRC-NEXT:    global_store_dwordx4 v44, v[8:11], s[0:1] offset:32 sc0 sc1
 ; VGPRRC-NEXT:    s_waitcnt vmcnt(0)
-; VGPRRC-NEXT:    global_store_dwordx4 v40, v[12:15], s[0:1] offset:48 sc0 sc1
+; VGPRRC-NEXT:    global_store_dwordx4 v44, v[12:15], s[0:1] offset:48 sc0 sc1
 ; VGPRRC-NEXT:    s_waitcnt vmcnt(0)
-; VGPRRC-NEXT:    global_store_dwordx4 v40, v[0:3], s[0:1] sc0 sc1
+; VGPRRC-NEXT:    global_store_dwordx4 v44, v[0:3], s[0:1] sc0 sc1
 ; VGPRRC-NEXT:    s_waitcnt vmcnt(0)
-; VGPRRC-NEXT:    global_store_dwordx4 v40, v[4:7], s[0:1] offset:16 sc0 sc1
+; VGPRRC-NEXT:    global_store_dwordx4 v44, v[4:7], s[0:1] offset:16 sc0 sc1
 ; VGPRRC-NEXT:    s_waitcnt vmcnt(0)
 ; VGPRRC-NEXT:    s_endpgm
 ; AGPR-LABEL: test_mfma_f32_32x32x16_f16__vgprcd__flags:
@@ -2852,24 +2852,24 @@ define amdgpu_kernel void @test_mfma_i32_16x16x64_i8_no_agpr__vgprcd(ptr addrspa
 ; VGPRRC-NEXT:    s_load_dwordx8 s[8:15], s[4:5], 0x34
 ; VGPRRC-NEXT:    s_load_dwordx2 s[6:7], s[4:5], 0x24
 ; VGPRRC-NEXT:    s_load_dwordx4 s[0:3], s[4:5], 0x54
-; VGPRRC-NEXT:    v_mov_b32_e32 v4, 0
+; VGPRRC-NEXT:    v_mov_b32_e32 v12, 0
 ; VGPRRC-NEXT:    s_waitcnt lgkmcnt(0)
-; VGPRRC-NEXT:    v_mov_b32_e32 v6, s8
-; VGPRRC-NEXT:    v_mov_b32_e32 v7, s9
-; VGPRRC-NEXT:    v_mov_b32_e32 v8, s10
-; VGPRRC-NEXT:    v_mov_b32_e32 v9, s11
-; VGPRRC-NEXT:    v_mov_b32_e32 v10, s12
-; VGPRRC-NEXT:    v_mov_b32_e32 v11, s13
-; VGPRRC-NEXT:    v_mov_b32_e32 v12, s14
-; VGPRRC-NEXT:    v_mov_b32_e32 v13, s15
-; VGPRRC-NEXT:    v_mov_b32_e32 v0, s0
-; VGPRRC-NEXT:    v_mov_b32_e32 v1, s1
-; VGPRRC-NEXT:    v_mov_b32_e32 v2, s2
-; VGPRRC-NEXT:    v_mov_b32_e32 v3, s3
+; VGPRRC-NEXT:    v_mov_b32_e32 v0, s8
+; VGPRRC-NEXT:    v_mov_b32_e32 v1, s9
+; VGPRRC-NEXT:    v_mov_b32_e32 v2, s10
+; VGPRRC-NEXT:    v_mov_b32_e32 v3, s11
+; VGPRRC-NEXT:    v_mov_b32_e32 v4, s12
+; VGPRRC-NEXT:    v_mov_b32_e32 v5, s13
+; VGPRRC-NEXT:    v_mov_b32_e32 v6, s14
+; VGPRRC-NEXT:    v_mov_b32_e32 v7, s15
+; VGPRRC-NEXT:    v_mov_b32_e32 v8, s0
+; VGPRRC-NEXT:    v_mov_b32_e32 v9, s1
+; VGPRRC-NEXT:    v_mov_b32_e32 v10, s2
+; VGPRRC-NEXT:    v_mov_b32_e32 v11, s3
 ; VGPRRC-NEXT:    s_nop 1
-; VGPRRC-NEXT:    v_mfma_i32_16x16x64_i8 v[0:3], v[6:9], v[10:13], v[0:3]
+; VGPRRC-NEXT:    v_mfma_i32_16x16x64_i8 v[0:3], v[0:3], v[4:7], v[8:11]
 ; VGPRRC-NEXT:    s_nop 7
-; VGPRRC-NEXT:    global_store_dwordx4 v4, v[0:3], s[6:7]
+; VGPRRC-NEXT:    global_store_dwordx4 v12, v[0:3], s[6:7]
 ; VGPRRC-NEXT:    s_endpgm
 ; AGPR-LABEL: test_mfma_i32_16x16x64_i8_no_agpr__vgprcd:
 ; AGPR:       ; %bb.0:
@@ -3001,24 +3001,24 @@ define amdgpu_kernel void @test_mfma_i32_16x16x64_i8_no_agpr__vgprcd__flags(ptr
 ; VGPRRC-NEXT:    s_load_dwordx8 s[8:15], s[4:5], 0x34
 ; VGPRRC-NEXT:    s_load_dwordx2 s[6:7], s[4:5], 0x24
 ; VGPRRC-NEXT:    s_load_dwordx4 s[0:3], s[4:5], 0x54
-; VGPRRC-NEXT:    v_mov_b32_e32 v4, 0
+; VGPRRC-NEXT:    v_mov_b32_e32 v12, 0
 ; VGPRRC-NEXT:    s_waitcnt lgkmcnt(0)
-; VGPRRC-NEXT:    v_mov_b32_e32 v6, s8
-; VGPRRC-NEXT:    v_mov_b32_e32 v7, s9
-; VGPRRC-NEXT:    v_mov_b32_e32 v8, s10
-; VGPRRC-NEXT:    v_mov_b32_e32 v9, s11
-; VGPRRC-NEXT:    v_mov_b32_e32 v10, s12
-; VGPRRC-NEXT:    v_mov_b32_e32 v11, s13
-; VGPRRC-NEXT:    v_mov_b32_e32 v12, s14
-; VGPRRC-NEXT:    v_mov_b32_e32 v13, s15
-; VGPRRC-NEXT:    v_mov_b32_e32 v0, s0
-; VGPRRC-NEXT:    v_mov_b32_e32 v1, s1
-; VGPRRC-NEXT:    v_mov_b32_e32 v2, s2
-; VGPRRC-NEXT:    v_mov_b32_e32 v3, s3
+; VGPRRC-NEXT:    v_mov_b32_e32 v0, s8
+; VGPRRC-NEXT:    v_mov_b32_e32 v1, s9
+; VGPRRC-NEXT:    v_mov_b32_e32 v2, s10
+; VGPRRC-NEXT:    v_mov_b32_e32 v3, s11
+; VGPRRC-NEXT:    v_mov_b32_e32 v4, s12
+; VGPRRC-NEXT:    v_mov_b32_e32 v5, s13
+; VGPRRC-NEXT:    v_mov_b32_e32 v6, s14
+; VGPRRC-NEXT:    v_mov_b32_e32 v7, s15
+; VGPRRC-NEXT:    v_mov_b32_e32 v8, s0
+; VGPRRC-NEXT:    v_mov_b32_e32 v9, s1
+; VGPRRC-NEXT:    v_mov_b32_e32 v10, s2
+; VGPRRC-NEXT:    v_mov_b32_e32 v11, s3
 ; VGPRRC-NEXT:    s_nop 1
-; VGPRRC-NEXT:    v_mfma_i32_16x16x64_i8 v[0:3], v[6:9], v[10:13], v[0:3] cbsz:3 abid:2 blgp:1
+; VGPRRC-NEXT:    v_mfma_i32_16x16x64_i8 v[0:3], v[0:3], v[4:7], v[8:11] cbsz:3 abid:2 blgp:1
 ; VGPRRC-NEXT:    s_nop 7
-; VGPRRC-NEXT:    global_store_dwordx4 v4, v[0:3], s[6:7]
+; VGPRRC-NEXT:    global_store_dwo...
[truncated]

@@ -604,7 +606,8 @@ SIRegisterInfo::getMaxNumVectorRegs(const MachineFunction &MF) const {

     if (MinNumAGPRs == DefaultNumAGPR.first) {
       // Default to splitting half the registers if AGPRs are required.
-      MinNumAGPRs = MaxNumAGPRs = MaxVectorRegs / 2;
+      MinNumAGPRs = MaxNumAGPRs =
+          (MFI->mayNeedAGPRs() || MinWavesPerEU == 1) ? MaxVectorRegs / 2 : 0;
Contributor


This is used to compute mayNeedAGPRs, so it cannot be used here. I just moved this function so that it no longer depends on the MF.

@@ -154,6 +154,7 @@ void RegisterClassInfo::compute(const TargetRegisterClass *RC) const {
     // Remove reserved registers from the allocation order.
     if (Reserved.test(PhysReg))
       continue;
+
Contributor


Unrelated whitespace change

@@ -756,7 +759,8 @@ BitVector SIRegisterInfo::getReservedRegs(const MachineFunction &MF) const {
}

   // Reserve all the AGPRs if there are no instructions to use it.
-  if (!ST.hasMAIInsts())
+  if (!ST.hasMAIInsts() || (ST.hasGFX90AInsts() && !MFI->mayNeedAGPRs() &&
+                            ST.getWavesPerEU(MF.getFunction()).first > 1))
Contributor


This should be implicit based on the register count and not need explicit handling

entry:
unreachable
}

-  attributes #0 = { "amdgpu-agpr-alloc"="0,0" }
+  attributes #0 = {"amdgpu-waves-per-eu"="8,8"}
Contributor


Seems like this patch is working around MIR artifacts?
