[AMDGPU] Avoid hitting AMDGPUAsmPrinter related asserts for local functions at O0 #72129

JanekvO · 2023-11-13T16:03:48Z

Local functions will be ignored for (codegen) CGSCC order passes. However, they may still be referenced. This patch allows such local functions to exist in AMDGPUResourceUsageAnalysis and bypasses referencing the local function's MachineFunction in AMDGPUAsmPrinter's code emit by reinserting the CGSCC pass manager through DummyCGSCCPass.

Fixes #64863

…ctions at O0. Local functions will be ignored for (codegen) CGSCC order passes. However, they may still be referenced. This patch allows such local functions to exist in AMDGPUResourceUsageAnalysis and bypasses referencing the local function's MachineFunction in AMDGPUAsmPrinter's code emit by reinserting the CGSCC pass manager through DummyCGSCCPass.

llvmbot · 2023-11-13T16:04:13Z

@llvm/pr-subscribers-backend-amdgpu

Author: Janek van Oirschot (JanekvO)

Changes

Local functions will be ignored for (codegen) CGSCC order passes. However, they may still be referenced. This patch allows such local functions to exist in AMDGPUResourceUsageAnalysis and bypasses referencing the local function's MachineFunction in AMDGPUAsmPrinter's code emit by reinserting the CGSCC pass manager through DummyCGSCCPass.

Full diff: https://github.com/llvm/llvm-project/pull/72129.diff

8 Files Affected:

(modified) llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp (+8)
(modified) llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp (+10-3)
(modified) llvm/test/CodeGen/AMDGPU/insert-delay-alu-bug.ll (+12-12)
(modified) llvm/test/CodeGen/AMDGPU/ipra.ll (+7-7)
(modified) llvm/test/CodeGen/AMDGPU/llc-pipeline.ll (+40-25)
(modified) llvm/test/CodeGen/AMDGPU/lower-module-lds-offsets.ll (+13-13)
(modified) llvm/test/CodeGen/AMDGPU/module-lds-false-sharing.ll (+48-49)
(modified) llvm/test/CodeGen/AMDGPU/resource-usage-dead-function.ll (+1-1)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
index eb30f31af6d6b68..67cc858a182c1e8 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
@@ -1238,6 +1238,14 @@ bool AMDGPUAsmPrinter::PrintAsmOperand(const MachineInstr *MI, unsigned OpNo,
 void AMDGPUAsmPrinter::getAnalysisUsage(AnalysisUsage &AU) const {
   AU.addRequired<AMDGPUResourceUsageAnalysis>();
   AU.addPreserved<AMDGPUResourceUsageAnalysis>();
+
+  // The Dummy pass is necessary because AMDGPUResourceUsageAnalysis will pop
+  // the CGSCC pass manager off of the active pass managers stack. Adding the
+  // Dummy pass will re-insert the CGSCC pass manager into said stack again
+  // through CallGraphSCCPass::assignPassManager.
+  AU.addRequired<DummyCGSCCPass>();
+  AU.addPreserved<DummyCGSCCPass>();
+
   AsmPrinter::getAnalysisUsage(AU);
 }
 
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp b/llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp
index db5d2bbcf5bbc71..25bdeff7ff6742c 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp
@@ -151,9 +151,16 @@ bool AMDGPUResourceUsageAnalysis::runOnModule(Module &M) {
 
     SIFunctionResourceInfo &Info = CI.first->second;
     MachineFunction *MF = MMI.getMachineFunction(*F);
-    assert(MF && "function must have been generated already");
-    Info = analyzeResourceUsage(*MF, TM);
-    HasIndirectCall |= Info.HasIndirectCall;
+    // We can only analyze resource usage of functions for which there exists a
+    // machinefunction equivalent. These may not exist as the (codegen) passes
+    // prior to this one are run in CGSCC order which will bypass any local
+    // functions that aren't called.
+    assert((MF || !TPC->requiresCodeGenSCCOrder()) &&
+            "function must have been generated already");
+    if (MF) {
+      Info = analyzeResourceUsage(*MF, TM);
+      HasIndirectCall |= Info.HasIndirectCall;
+    }
   }
 
   if (HasIndirectCall)
diff --git a/llvm/test/CodeGen/AMDGPU/insert-delay-alu-bug.ll b/llvm/test/CodeGen/AMDGPU/insert-delay-alu-bug.ll
index 220ea962b9e1dca..83b02e5b47f0e7e 100644
--- a/llvm/test/CodeGen/AMDGPU/insert-delay-alu-bug.ll
+++ b/llvm/test/CodeGen/AMDGPU/insert-delay-alu-bug.ll
@@ -3,6 +3,18 @@
 
 declare i32 @llvm.amdgcn.workitem.id.x()
 
+define <2 x i64> @f1() #0 {
+; GFX11-LABEL: f1:
+; GFX11:       ; %bb.0:
+; GFX11-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT:    v_mov_b32_e32 v0, 0
+; GFX11-NEXT:    v_mov_b32_e32 v1, 0
+; GFX11-NEXT:    v_mov_b32_e32 v2, 0
+; GFX11-NEXT:    v_mov_b32_e32 v3, 0
+; GFX11-NEXT:    s_setpc_b64 s[30:31]
+  ret <2 x i64> zeroinitializer
+}
+
 define void @f0() {
 ; GFX11-LABEL: f0:
 ; GFX11:       ; %bb.0: ; %bb
@@ -36,18 +48,6 @@ bb:
   ret void
 }
 
-define <2 x i64> @f1() #0 {
-; GFX11-LABEL: f1:
-; GFX11:       ; %bb.0:
-; GFX11-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX11-NEXT:    v_mov_b32_e32 v0, 0
-; GFX11-NEXT:    v_mov_b32_e32 v1, 0
-; GFX11-NEXT:    v_mov_b32_e32 v2, 0
-; GFX11-NEXT:    v_mov_b32_e32 v3, 0
-; GFX11-NEXT:    s_setpc_b64 s[30:31]
-  ret <2 x i64> zeroinitializer
-}
-
 ; FIXME: This generates "instid1(/* invalid instid value */)".
 define amdgpu_kernel void @f2(i32 %arg, i32 %arg1, i32 %arg2, i1 %arg3, i32 %arg4, i1 %arg5, ptr %arg6, i32 %arg7, i32 %arg8, i32 %arg9, i32 %arg10, i1 %arg11) {
 ; GFX11-LABEL: f2:
diff --git a/llvm/test/CodeGen/AMDGPU/ipra.ll b/llvm/test/CodeGen/AMDGPU/ipra.ll
index 07693371278701c..766d8352b26f195 100644
--- a/llvm/test/CodeGen/AMDGPU/ipra.ll
+++ b/llvm/test/CodeGen/AMDGPU/ipra.ll
@@ -105,13 +105,6 @@ define void @test_funcx2() #0 {
   ret void
 }
 
-; GCN-LABEL: {{^}}wombat:
-define weak amdgpu_kernel void @wombat(ptr %arg, ptr %arg2) {
-bb:
-  call void @hoge() #0
-  ret void
-}
-
 ; Make sure we save/restore the return address around the call.
 ; Function Attrs: norecurse
 define internal void @hoge() #2 {
@@ -128,6 +121,13 @@ bb:
   ret void
 }
 
+; GCN-LABEL: {{^}}wombat:
+define weak amdgpu_kernel void @wombat(ptr %arg, ptr %arg2) {
+bb:
+  call void @hoge() #0
+  ret void
+}
+
 declare dso_local void @eggs()
 
 
diff --git a/llvm/test/CodeGen/AMDGPU/llc-pipeline.ll b/llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
index 7abb789019f1f0c..55242bf353b82a2 100644
--- a/llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
+++ b/llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
@@ -142,11 +142,14 @@
 ; GCN-O0-NEXT:        Machine Optimization Remark Emitter
 ; GCN-O0-NEXT:        Stack Frame Layout Analysis
 ; GCN-O0-NEXT:    Function register usage analysis
-; GCN-O0-NEXT:    FunctionPass Manager
-; GCN-O0-NEXT:      Lazy Machine Block Frequency Analysis
-; GCN-O0-NEXT:      Machine Optimization Remark Emitter
-; GCN-O0-NEXT:      AMDGPU Assembly Printer
-; GCN-O0-NEXT:      Free MachineFunction
+; GCN-O0-NEXT:    CallGraph Construction
+; GCN-O0-NEXT:    Call Graph SCC Pass Manager
+; GCN-O0-NEXT:      DummyCGSCCPass
+; GCN-O0-NEXT:      FunctionPass Manager
+; GCN-O0-NEXT:        Lazy Machine Block Frequency Analysis
+; GCN-O0-NEXT:        Machine Optimization Remark Emitter
+; GCN-O0-NEXT:        AMDGPU Assembly Printer
+; GCN-O0-NEXT:        Free MachineFunction
 
 ; GCN-O1:Target Library Information
 ; GCN-O1-NEXT:Target Pass Configuration
@@ -409,11 +412,14 @@
 ; GCN-O1-NEXT:        Machine Optimization Remark Emitter
 ; GCN-O1-NEXT:        Stack Frame Layout Analysis
 ; GCN-O1-NEXT:    Function register usage analysis
-; GCN-O1-NEXT:    FunctionPass Manager
-; GCN-O1-NEXT:      Lazy Machine Block Frequency Analysis
-; GCN-O1-NEXT:      Machine Optimization Remark Emitter
-; GCN-O1-NEXT:      AMDGPU Assembly Printer
-; GCN-O1-NEXT:      Free MachineFunction
+; GCN-O1-NEXT:    CallGraph Construction
+; GCN-O1-NEXT:    Call Graph SCC Pass Manager
+; GCN-O1-NEXT:      DummyCGSCCPass
+; GCN-O1-NEXT:      FunctionPass Manager
+; GCN-O1-NEXT:        Lazy Machine Block Frequency Analysis
+; GCN-O1-NEXT:        Machine Optimization Remark Emitter
+; GCN-O1-NEXT:        AMDGPU Assembly Printer
+; GCN-O1-NEXT:        Free MachineFunction
 
 ; GCN-O1-OPTS:Target Library Information
 ; GCN-O1-OPTS-NEXT:Target Pass Configuration
@@ -698,11 +704,14 @@
 ; GCN-O1-OPTS-NEXT:        Machine Optimization Remark Emitter
 ; GCN-O1-OPTS-NEXT:        Stack Frame Layout Analysis
 ; GCN-O1-OPTS-NEXT:    Function register usage analysis
-; GCN-O1-OPTS-NEXT:    FunctionPass Manager
-; GCN-O1-OPTS-NEXT:      Lazy Machine Block Frequency Analysis
-; GCN-O1-OPTS-NEXT:      Machine Optimization Remark Emitter
-; GCN-O1-OPTS-NEXT:      AMDGPU Assembly Printer
-; GCN-O1-OPTS-NEXT:      Free MachineFunction
+; GCN-O1-OPTS-NEXT:    CallGraph Construction
+; GCN-O1-OPTS-NEXT:    Call Graph SCC Pass Manager
+; GCN-O1-OPTS-NEXT:      DummyCGSCCPass
+; GCN-O1-OPTS-NEXT:      FunctionPass Manager
+; GCN-O1-OPTS-NEXT:        Lazy Machine Block Frequency Analysis
+; GCN-O1-OPTS-NEXT:        Machine Optimization Remark Emitter
+; GCN-O1-OPTS-NEXT:        AMDGPU Assembly Printer
+; GCN-O1-OPTS-NEXT:        Free MachineFunction
 
 ; GCN-O2:Target Library Information
 ; GCN-O2-NEXT:Target Pass Configuration
@@ -999,11 +1008,14 @@
 ; GCN-O2-NEXT:        Machine Optimization Remark Emitter
 ; GCN-O2-NEXT:        Stack Frame Layout Analysis
 ; GCN-O2-NEXT:    Function register usage analysis
-; GCN-O2-NEXT:    FunctionPass Manager
-; GCN-O2-NEXT:      Lazy Machine Block Frequency Analysis
-; GCN-O2-NEXT:      Machine Optimization Remark Emitter
-; GCN-O2-NEXT:      AMDGPU Assembly Printer
-; GCN-O2-NEXT:      Free MachineFunction
+; GCN-O2-NEXT:    CallGraph Construction
+; GCN-O2-NEXT:    Call Graph SCC Pass Manager
+; GCN-O2-NEXT:      DummyCGSCCPass
+; GCN-O2-NEXT:      FunctionPass Manager
+; GCN-O2-NEXT:        Lazy Machine Block Frequency Analysis
+; GCN-O2-NEXT:        Machine Optimization Remark Emitter
+; GCN-O2-NEXT:        AMDGPU Assembly Printer
+; GCN-O2-NEXT:        Free MachineFunction
 
 ; GCN-O3:Target Library Information
 ; GCN-O3-NEXT:Target Pass Configuration
@@ -1312,11 +1324,14 @@
 ; GCN-O3-NEXT:        Machine Optimization Remark Emitter
 ; GCN-O3-NEXT:        Stack Frame Layout Analysis
 ; GCN-O3-NEXT:    Function register usage analysis
-; GCN-O3-NEXT:    FunctionPass Manager
-; GCN-O3-NEXT:      Lazy Machine Block Frequency Analysis
-; GCN-O3-NEXT:      Machine Optimization Remark Emitter
-; GCN-O3-NEXT:      AMDGPU Assembly Printer
-; GCN-O3-NEXT:      Free MachineFunction
+; GCN-O3-NEXT:    CallGraph Construction
+; GCN-O3-NEXT:    Call Graph SCC Pass Manager
+; GCN-O3-NEXT:      DummyCGSCCPass
+; GCN-O3-NEXT:      FunctionPass Manager
+; GCN-O3-NEXT:        Lazy Machine Block Frequency Analysis
+; GCN-O3-NEXT:        Machine Optimization Remark Emitter
+; GCN-O3-NEXT:        AMDGPU Assembly Printer
+; GCN-O3-NEXT:        Free MachineFunction
 
 define void @empty() {
   ret void
diff --git a/llvm/test/CodeGen/AMDGPU/lower-module-lds-offsets.ll b/llvm/test/CodeGen/AMDGPU/lower-module-lds-offsets.ll
index c4449756949e521..44eff0d67399014 100644
--- a/llvm/test/CodeGen/AMDGPU/lower-module-lds-offsets.ll
+++ b/llvm/test/CodeGen/AMDGPU/lower-module-lds-offsets.ll
@@ -9,6 +9,19 @@
 @lds.size.1.align.1 = internal unnamed_addr addrspace(3) global [1 x i8] undef, align 1
 @lds.size.16.align.16 = internal unnamed_addr addrspace(3) global [16 x i8] undef, align 16
 
+; GCN-LABEL: {{^}}f0:
+; GCN-DAG: v_mov_b32_e32 [[NULL:v[0-9]+]], 0
+; GCN-DAG: v_mov_b32_e32 [[TREE:v[0-9]+]], 3
+; GCN:     ds_write_b8 [[NULL]], [[TREE]]
+define void @f0() {
+; OPT-LABEL: @f0(
+; OPT-NEXT:    store i8 3, ptr addrspace(3) @llvm.amdgcn.module.lds, align 1
+; OPT-NEXT:    ret void
+;
+  store i8 3, ptr addrspace(3) @lds.size.1.align.1, align 1
+  ret void
+}
+
 ; GCN-LABEL: {{^}}k0:
 ; GCN-DAG: v_mov_b32_e32 [[NULL:v[0-9]+]], 0
 ; GCN-DAG: v_mov_b32_e32 [[ONE:v[0-9]+]], 1
@@ -29,16 +42,3 @@ define amdgpu_kernel void @k0() {
   call void @f0()
   ret void
 }
-
-; GCN-LABEL: {{^}}f0:
-; GCN-DAG: v_mov_b32_e32 [[NULL:v[0-9]+]], 0
-; GCN-DAG: v_mov_b32_e32 [[TREE:v[0-9]+]], 3
-; GCN:     ds_write_b8 [[NULL]], [[TREE]]
-define void @f0() {
-; OPT-LABEL: @f0() {
-; OPT-NEXT:    store i8 3, ptr addrspace(3) @llvm.amdgcn.module.lds, align 1
-; OPT-NEXT:    ret void
-;
-  store i8 3, ptr addrspace(3) @lds.size.1.align.1, align 1
-  ret void
-}
diff --git a/llvm/test/CodeGen/AMDGPU/module-lds-false-sharing.ll b/llvm/test/CodeGen/AMDGPU/module-lds-false-sharing.ll
index e557e0ce9b1be5e..7458fccb9e9f68a 100644
--- a/llvm/test/CodeGen/AMDGPU/module-lds-false-sharing.ll
+++ b/llvm/test/CodeGen/AMDGPU/module-lds-false-sharing.ll
@@ -24,6 +24,54 @@ store i32 0, ptr addrspace(3) @used_by_kernel
 }
 ; CHECK: ; LDSByteSize: 4 bytes
 
+define void @nonkernel() {
+; GFX9-LABEL: nonkernel:
+; GFX9:       ; %bb.0:
+; GFX9-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT:    v_mov_b32_e32 v0, 0
+; GFX9-NEXT:    v_mov_b32_e32 v1, v0
+; GFX9-NEXT:    ds_write_b32 v0, v0 offset:8
+; GFX9-NEXT:    ds_write_b64 v0, v[0:1]
+; GFX9-NEXT:    s_waitcnt lgkmcnt(0)
+; GFX9-NEXT:    s_setpc_b64 s[30:31]
+;
+; GFX10-LABEL: nonkernel:
+; GFX10:       ; %bb.0:
+; GFX10-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX10-NEXT:    v_mov_b32_e32 v0, 0
+; GFX10-NEXT:    v_mov_b32_e32 v1, v0
+; GFX10-NEXT:    ds_write_b32 v0, v0 offset:8
+; GFX10-NEXT:    ds_write_b64 v0, v[0:1]
+; GFX10-NEXT:    s_waitcnt lgkmcnt(0)
+; GFX10-NEXT:    s_setpc_b64 s[30:31]
+;
+; G_GFX9-LABEL: nonkernel:
+; G_GFX9:       ; %bb.0:
+; G_GFX9-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; G_GFX9-NEXT:    v_mov_b32_e32 v2, 0
+; G_GFX9-NEXT:    v_mov_b32_e32 v3, 8
+; G_GFX9-NEXT:    v_mov_b32_e32 v0, 0
+; G_GFX9-NEXT:    v_mov_b32_e32 v1, 0
+; G_GFX9-NEXT:    ds_write_b32 v3, v2
+; G_GFX9-NEXT:    ds_write_b64 v2, v[0:1]
+; G_GFX9-NEXT:    s_waitcnt lgkmcnt(0)
+; G_GFX9-NEXT:    s_setpc_b64 s[30:31]
+;
+; G_GFX10-LABEL: nonkernel:
+; G_GFX10:       ; %bb.0:
+; G_GFX10-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; G_GFX10-NEXT:    v_mov_b32_e32 v2, 0
+; G_GFX10-NEXT:    v_mov_b32_e32 v3, 8
+; G_GFX10-NEXT:    v_mov_b32_e32 v0, 0
+; G_GFX10-NEXT:    v_mov_b32_e32 v1, 0
+; G_GFX10-NEXT:    ds_write_b32 v3, v2
+; G_GFX10-NEXT:    ds_write_b64 v2, v[0:1]
+; G_GFX10-NEXT:    s_waitcnt lgkmcnt(0)
+; G_GFX10-NEXT:    s_setpc_b64 s[30:31]
+  store i32 0, ptr addrspace(3) @used_by_both
+  store double 0.0, ptr addrspace(3) @used_by_function
+  ret void
+}
 ; Needs to allocate both variables, store to used_by_both is at sizeof(double)
 define amdgpu_kernel void @withcall() {
 ; GFX9-LABEL: withcall:
@@ -135,52 +183,3 @@ define amdgpu_kernel void @nocall_false_sharing() {
 }
 ; CHECK: ; LDSByteSize: 4 bytes
 
-
-define void @nonkernel() {
-; GFX9-LABEL: nonkernel:
-; GFX9:       ; %bb.0:
-; GFX9-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX9-NEXT:    v_mov_b32_e32 v0, 0
-; GFX9-NEXT:    v_mov_b32_e32 v1, v0
-; GFX9-NEXT:    ds_write_b32 v0, v0 offset:8
-; GFX9-NEXT:    ds_write_b64 v0, v[0:1]
-; GFX9-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX9-NEXT:    s_setpc_b64 s[30:31]
-;
-; GFX10-LABEL: nonkernel:
-; GFX10:       ; %bb.0:
-; GFX10-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX10-NEXT:    v_mov_b32_e32 v0, 0
-; GFX10-NEXT:    v_mov_b32_e32 v1, v0
-; GFX10-NEXT:    ds_write_b32 v0, v0 offset:8
-; GFX10-NEXT:    ds_write_b64 v0, v[0:1]
-; GFX10-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX10-NEXT:    s_setpc_b64 s[30:31]
-;
-; G_GFX9-LABEL: nonkernel:
-; G_GFX9:       ; %bb.0:
-; G_GFX9-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; G_GFX9-NEXT:    v_mov_b32_e32 v2, 0
-; G_GFX9-NEXT:    v_mov_b32_e32 v3, 8
-; G_GFX9-NEXT:    v_mov_b32_e32 v0, 0
-; G_GFX9-NEXT:    v_mov_b32_e32 v1, 0
-; G_GFX9-NEXT:    ds_write_b32 v3, v2
-; G_GFX9-NEXT:    ds_write_b64 v2, v[0:1]
-; G_GFX9-NEXT:    s_waitcnt lgkmcnt(0)
-; G_GFX9-NEXT:    s_setpc_b64 s[30:31]
-;
-; G_GFX10-LABEL: nonkernel:
-; G_GFX10:       ; %bb.0:
-; G_GFX10-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; G_GFX10-NEXT:    v_mov_b32_e32 v2, 0
-; G_GFX10-NEXT:    v_mov_b32_e32 v3, 8
-; G_GFX10-NEXT:    v_mov_b32_e32 v0, 0
-; G_GFX10-NEXT:    v_mov_b32_e32 v1, 0
-; G_GFX10-NEXT:    ds_write_b32 v3, v2
-; G_GFX10-NEXT:    ds_write_b64 v2, v[0:1]
-; G_GFX10-NEXT:    s_waitcnt lgkmcnt(0)
-; G_GFX10-NEXT:    s_setpc_b64 s[30:31]
-  store i32 0, ptr addrspace(3) @used_by_both
-  store double 0.0, ptr addrspace(3) @used_by_function
-  ret void
-}
diff --git a/llvm/test/CodeGen/AMDGPU/resource-usage-dead-function.ll b/llvm/test/CodeGen/AMDGPU/resource-usage-dead-function.ll
index c30089a8dd32a0f..b606f3c6209a37d 100644
--- a/llvm/test/CodeGen/AMDGPU/resource-usage-dead-function.ll
+++ b/llvm/test/CodeGen/AMDGPU/resource-usage-dead-function.ll
@@ -6,7 +6,7 @@
 
 @gv.fptr0 = external hidden unnamed_addr addrspace(4) constant ptr, align 4
 
-; GCN-LABEL: unreachable:
+; GCN-NOT: unreachable:
 ; Function info:
 ; codeLenInByte = 4
 define internal fastcc void @unreachable() {

github-actions · 2023-11-13T16:05:59Z

✅ With the latest revision this PR passed the C/C++ code formatter.

jhuber6 · 2023-11-13T17:02:02Z

Does this fix #64863? We can put it in the commit message if so to close it if this lands.

arsenm · 2023-11-14T01:23:13Z

Does this fix #64863? We can put it in the commit message if so to close it if this lands.

This is also part of #71594

I've debugged this issue a few times and I'm not sure this is sufficient to fix every case. It also produces different behavior from other targets / when CodeGenInSCCOrder is not enabled. Namely, other targets will codegen and emit the code for the dead functions and here you're not emitting anything.

The problem isn't uncalled functions per-se, but functions which do not appear in the CallGraphSCC order. Functions which are only indirectly called still should report sensible register usage, but will also not necessarily codegen in the correct order. I had a patch which I believe did fix all cases by inserting a createBarrierNoopPass, such that all codegen was forced to happen prior to the AsmPrinter but this is a heavy hammer. Ideally the AsmPrinter wouldn't have any ordering dependencies.

I think the long term solution is to move the handling of the resource computation into MC. The assembler/linker should be able to resolve some kind of custom expression built in terms of referenced symbols

arsenm · 2023-11-14T01:19:37Z

llvm/test/CodeGen/AMDGPU/uncalled-local-functions.ll

@@ -0,0 +1,11 @@
+; RUN: llc -O0 -march=amdgcn < %s | FileCheck %s
+


Also should have some tests for functions which are not dead, but forced alive with some form of address capture. We need to verify we produce sensible resource results in those cases

arsenm · 2023-11-14T04:16:35Z

Another related issue is CallGraphAnalysis fails to look through aliases, so we have similar bugs from relying on SCC order through those

JanekvO · 2023-11-14T13:59:31Z

Does this fix #64863? We can put it in the commit message if so to close it if this lands.

This is also part of #71594

I've debugged this issue a few times and I'm not sure this is sufficient to fix every case. It also produces different behavior from other targets / when CodeGenInSCCOrder is not enabled. Namely, other targets will codegen and emit the code for the dead functions and here you're not emitting anything.

I thought it'd be best to be consistent in the codegen pass pipeline with the CallGraphSCC ordering. If the CodeGenInSCCOrder is enabled for other targets (i.e., using --enable-ipra) it also emit nothing. I was hoping for there to be a target independent solution but couldn't find anything beyond trying to convince CallGraphSCC to run all SCCs.

The problem isn't uncalled functions per-se, but functions which do not appear in the CallGraphSCC order. Functions which are only indirectly called still should report sensible register usage, but will also not necessarily codegen in the correct order.

Ah, I wasn't aware of any issues with indirect called functions. I thought the order of non-local functions was still considered by CallGraphSCC, including indirect calls.

I had a patch which I believe did fix all cases by inserting a createBarrierNoopPass, such that all codegen was forced to happen prior to the AsmPrinter but this is a heavy hammer. Ideally the AsmPrinter wouldn't have any ordering dependencies.

I'm not sure at what point the BarrierNoopPass would be inserted but if put too early it'd undo the CallGraphSCC ordering and if put too late the AsmPrinter will assert on trying to emit code for a MachineFunction without basic blocks, right?

I think the long term solution is to move the handling of the resource computation into MC. The assembler/linker should be able to resolve some kind of custom expression built in terms of referenced symbols

AFAICT, that would avoid the asserts from being hit but wouldn't stop the codegen passes ignoring any functions not in the CallGraphSCC (i.e., emitting nothing for said functions).

krzysz00 · 2024-01-12T22:55:39Z

As a somewhat naive question, what would it take to turn off requiring codegen to be in SCC order? We seem to be the only target doing that. The comments on that line say something about function calls and noinline

jhuber6 · 2024-01-13T01:32:49Z

As a somewhat naive question, what would it take to turn off requiring codegen to be in SCC order? We seem to be the only target doing that. The comments on that line say something about function calls and noinline

I believe this is also the reason parallel codegen via --lto-partitions creates incorrect code, so if there were a way to avoid that it would be beneficial in other ways. I'm by no means an expert, but as far as I'm aware the SCC order is used for some resource scheduling.

JanekvO added the backend:AMDGPU label Nov 13, 2023

JanekvO requested review from jayfoad, arsenm and JonChesterfield November 13, 2023 16:03

Fix format, assert and add codegen test for uncalled local functions

14901a7

arsenm reviewed Nov 14, 2023

View reviewed changes

JanekvO and others added 2 commits November 17, 2023 16:47

Merge branch 'llvm:main' into pr-local-function-assert-fix

0ec28bc

Improve test

6b28cb2

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[AMDGPU] Avoid hitting AMDGPUAsmPrinter related asserts for local functions at O0 #72129

[AMDGPU] Avoid hitting AMDGPUAsmPrinter related asserts for local functions at O0 #72129

JanekvO commented Nov 13, 2023 •

edited

llvmbot commented Nov 13, 2023

github-actions bot commented Nov 13, 2023 •

edited

jhuber6 commented Nov 13, 2023

arsenm commented Nov 14, 2023

arsenm Nov 14, 2023

arsenm commented Nov 14, 2023

JanekvO commented Nov 14, 2023

krzysz00 commented Jan 12, 2024

jhuber6 commented Jan 13, 2024

		@@ -0,0 +1,11 @@
		; RUN: llc -O0 -march=amdgcn < %s \| FileCheck %s

[AMDGPU] Avoid hitting AMDGPUAsmPrinter related asserts for local functions at O0 #72129

Are you sure you want to change the base?

[AMDGPU] Avoid hitting AMDGPUAsmPrinter related asserts for local functions at O0 #72129

Conversation

JanekvO commented Nov 13, 2023 • edited

llvmbot commented Nov 13, 2023

github-actions bot commented Nov 13, 2023 • edited

jhuber6 commented Nov 13, 2023

arsenm commented Nov 14, 2023

arsenm Nov 14, 2023

Choose a reason for hiding this comment

arsenm commented Nov 14, 2023

JanekvO commented Nov 14, 2023

krzysz00 commented Jan 12, 2024

jhuber6 commented Jan 13, 2024

JanekvO commented Nov 13, 2023 •

edited

github-actions bot commented Nov 13, 2023 •

edited