AMDGPU: Move libcall simplify into PeepholeEP #88853

arsenm · 2024-04-16T08:15:01Z

We were running this immediately on the incoming IR, which
is still littered with temporary allocas obscuring trivial values.
This needs to run after initial SROA to handle sincos insertion.

llvmbot · 2024-04-16T08:15:20Z

@llvm/pr-subscribers-backend-amdgpu

Author: Matt Arsenault (arsenm)

Changes

We were running this immediately on the incoming IR, which
is still littered with temporary allocas obscuring trivial values.
This needs to run after initial SROA to handle sincos insertion.

Full diff: https://github.com/llvm/llvm-project/pull/88853.diff

2 Files Affected:

(modified) llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp (+10-3)
(added) llvm/test/CodeGen/AMDGPU/amdgpu-libcall-sincos-pass-ordering.ll (+77)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
index f7e552177d6f50..305a6c8c3b9262 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
@@ -655,9 +655,6 @@ void AMDGPUTargetMachine::registerPassBuilderCallbacks(
   PB.registerPipelineStartEPCallback(
       [](ModulePassManager &PM, OptimizationLevel Level) {
         FunctionPassManager FPM;
-        FPM.addPass(AMDGPUUseNativeCallsPass());
-        if (EnableLibCallSimplify && Level != OptimizationLevel::O0)
-          FPM.addPass(AMDGPUSimplifyLibCallsPass());
         PM.addPass(createModuleToFunctionPassAdaptor(std::move(FPM)));
         if (EnableHipStdPar)
           PM.addPass(HipStdParAcceleratorCodeSelectionPass());
@@ -681,6 +678,16 @@ void AMDGPUTargetMachine::registerPassBuilderCallbacks(
           PM.addPass(AMDGPUAlwaysInlinePass());
       });
 
+  PB.registerPeepholeEPCallback(
+      [](FunctionPassManager &FPM, OptimizationLevel Level) {
+        if (Level == OptimizationLevel::O0)
+          return;
+
+        FPM.addPass(AMDGPUUseNativeCallsPass());
+        if (EnableLibCallSimplify)
+          FPM.addPass(AMDGPUSimplifyLibCallsPass());
+      });
+
   PB.registerCGSCCOptimizerLateEPCallback(
       [this](CGSCCPassManager &PM, OptimizationLevel Level) {
         if (Level == OptimizationLevel::O0)
diff --git a/llvm/test/CodeGen/AMDGPU/amdgpu-libcall-sincos-pass-ordering.ll b/llvm/test/CodeGen/AMDGPU/amdgpu-libcall-sincos-pass-ordering.ll
new file mode 100644
index 00000000000000..4ae4b1031f47d9
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/amdgpu-libcall-sincos-pass-ordering.ll
@@ -0,0 +1,77 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4
+; RUN: opt -S -mtriple=amdgcn-amd-amdhsa -O1 -amdgpu-prelink %s | FileCheck %s
+
+; Make sure that sin+cos -> sincos simplification happens after
+; initial IR simplifications, otherwise we can't identify the common
+; argument value.
+
+@.str = private unnamed_addr addrspace(4) constant [21 x i8] c"x: %f, y: %f, z: %f\0A\00", align 1
+
+; Should have call to sincos declarations, not calls to the asm pseudo-libcalls
+define protected amdgpu_kernel void @sincosTest(ptr addrspace(1) %out0, ptr addrspace(1) %out1, ptr addrspace(1) %out2, float noundef %x) #0 {
+; CHECK-LABEL: define protected amdgpu_kernel void @sincosTest(
+; CHECK-SAME: ptr addrspace(1) nocapture writeonly [[OUT0:%.*]], ptr addrspace(1) nocapture writeonly [[OUT1:%.*]], ptr addrspace(1) nocapture writeonly [[OUT2:%.*]], float noundef [[X:%.*]]) local_unnamed_addr #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[__SINCOS_:%.*]] = alloca float, align 4, addrspace(5)
+; CHECK-NEXT:    [[I_I:%.*]] = call float @_Z6sincosfPU3AS5f(float [[X]], ptr addrspace(5) [[__SINCOS_]]) #[[ATTR1:[0-9]+]]
+; CHECK-NEXT:    [[I_I2:%.*]] = load float, ptr addrspace(5) [[__SINCOS_]], align 4
+; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[I_I]], [[I_I2]]
+; CHECK-NEXT:    [[CONV:%.*]] = fpext float [[X]] to double
+; CHECK-NEXT:    [[CONV5:%.*]] = fpext float [[ADD]] to double
+; CHECK-NEXT:    store double [[CONV]], ptr addrspace(1) [[OUT0]], align 8
+; CHECK-NEXT:    store double [[CONV5]], ptr addrspace(1) [[OUT1]], align 8
+; CHECK-NEXT:    store double [[CONV5]], ptr addrspace(1) [[OUT2]], align 8
+; CHECK-NEXT:    ret void
+;
+entry:
+  %x.addr = alloca float, align 4, addrspace(5)
+  %y = alloca float, align 4, addrspace(5)
+  %z = alloca float, align 4, addrspace(5)
+  store float %x, ptr addrspace(5) %x.addr, align 4
+  call void @llvm.lifetime.start.p5(i64 4, ptr addrspace(5) %y)
+  %i = load float, ptr addrspace(5) %x.addr, align 4
+  %call = call float @_Z3sinf(float noundef %i) #3
+  %i1 = load float, ptr addrspace(5) %x.addr, align 4
+  %call1 = call float @_Z3cosf(float noundef %i1) #3
+  %add = fadd float %call, %call1
+  store float %add, ptr addrspace(5) %y, align 4
+  call void @llvm.lifetime.start.p5(i64 4, ptr addrspace(5) %z)
+  %i2 = load float, ptr addrspace(5) %x.addr, align 4
+  %call2 = call float @_Z3cosf(float noundef %i2) #3
+  %i3 = load float, ptr addrspace(5) %x.addr, align 4
+  %call3 = call float @_Z3sinf(float noundef %i3) #3
+  %add4 = fadd float %call2, %call3
+  store float %add4, ptr addrspace(5) %z, align 4
+  %i4 = load float, ptr addrspace(5) %x.addr, align 4
+  %conv = fpext float %i4 to double
+  %i5 = load float, ptr addrspace(5) %y, align 4
+  %conv5 = fpext float %i5 to double
+  %i6 = load float, ptr addrspace(5) %z, align 4
+  %conv6 = fpext float %i6 to double
+  store double %conv, ptr addrspace(1) %out0, align 8
+  store double %conv5, ptr addrspace(1) %out1, align 8
+  store double %conv6, ptr addrspace(1) %out2, align 8
+  call void @llvm.lifetime.end.p5(i64 4, ptr addrspace(5) %z)
+  call void @llvm.lifetime.end.p5(i64 4, ptr addrspace(5) %y)
+  ret void
+}
+
+declare void @llvm.lifetime.start.p5(i64 immarg, ptr addrspace(5) nocapture) #1
+declare void @llvm.lifetime.end.p5(i64 immarg, ptr addrspace(5) nocapture) #1
+
+define internal float @_Z3cosf(float noundef %arg) #2 {
+bb:
+  %i = tail call float asm "pseudo-libcall-cos %0, %1", "=v,v"(float noundef %arg) #2
+  ret float %i
+}
+
+define internal float @_Z3sinf(float noundef %arg) #2 {
+bb:
+  %i = tail call float asm "pseudo-libcall-sin %0, %1", "=v,v"(float noundef %arg) #2
+  ret float %i
+}
+
+attributes #0 = { norecurse nounwind }
+attributes #1 = { nocallback nofree nosync nounwind willreturn memory(argmem: readwrite) }
+attributes #2 = { mustprogress nofree norecurse nounwind willreturn memory(none) }
+attributes #3 = { nounwind willreturn memory(none) }

We were running this immediately on the incoming IR, which is still littered with temporary allocas obscuring trivial values. This needs to run after initial SROA to handle sincos insertion. Fixes: SWDEV-456865

Pierre-vh · 2024-04-16T08:29:16Z

llvm/test/CodeGen/AMDGPU/amdgpu-libcall-sincos-pass-ordering.ll

+declare void @llvm.lifetime.start.p5(i64 immarg, ptr addrspace(5) nocapture) #1
+declare void @llvm.lifetime.end.p5(i64 immarg, ptr addrspace(5) nocapture) #1


Do we still need intrinsic declarations?

Kind of no, but it's still canonically printed

Pierre-vh · 2024-04-16T08:29:30Z

llvm/test/CodeGen/AMDGPU/amdgpu-libcall-sincos-pass-ordering.ll

+attributes #0 = { norecurse nounwind }
+attributes #1 = { nocallback nofree nosync nounwind willreturn memory(argmem: readwrite) }
+attributes #2 = { mustprogress nofree norecurse nounwind willreturn memory(none) }
+attributes #3 = { nounwind willreturn memory(none) }


Do we need all these attributes?

We need enough to get the asm calls deleted

Pierre-vh · 2024-04-16T08:29:53Z

llvm/test/CodeGen/AMDGPU/amdgpu-libcall-sincos-pass-ordering.ll

+  call void @llvm.lifetime.end.p5(i64 4, ptr addrspace(5) %z)
+  call void @llvm.lifetime.end.p5(i64 4, ptr addrspace(5) %y)


are the lifetime intrinsics actually needed?

They're more representative of actual IR the pass will see, which is part of the point

lamb-j

This fixes the issue I was having with my sincos() example. Thanks!

We were running this immediately on the incoming IR, which is still littered with temporary allocas obscuring trivial values. This needs to run after initial SROA to handle sincos insertion. Cherry-pick bc3620d for SWDEV-436409 Change-Id: Ifa1945ff5b7eb2d3a60f68aca923f05882b03b4a

arsenm added the backend:AMDGPU label Apr 16, 2024

arsenm requested review from vpykhtin, rampitec, Pierre-vh and lamb-j April 16, 2024 08:15

arsenm added 2 commits April 16, 2024 10:16

AMDGPU: Add baseline test for sincos phase ordering

46e4256

AMDGPU: Move libcall simplify into PeepholeEP

0adb3eb

We were running this immediately on the incoming IR, which is still littered with temporary allocas obscuring trivial values. This needs to run after initial SROA to handle sincos insertion. Fixes: SWDEV-456865

arsenm force-pushed the amdgpu-libcall-move-to-peepholeep branch from 84aaed7 to 0adb3eb Compare April 16, 2024 08:17

Pierre-vh reviewed Apr 16, 2024

View reviewed changes

rampitec approved these changes Apr 16, 2024

View reviewed changes

Test updates

4d4f1f3

lamb-j approved these changes Apr 16, 2024

View reviewed changes

arsenm merged commit bc3620d into llvm:main Apr 17, 2024
3 of 4 checks passed

arsenm deleted the amdgpu-libcall-move-to-peepholeep branch April 17, 2024 06:50

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

AMDGPU: Move libcall simplify into PeepholeEP #88853

AMDGPU: Move libcall simplify into PeepholeEP #88853

arsenm commented Apr 16, 2024

llvmbot commented Apr 16, 2024

Pierre-vh Apr 16, 2024

arsenm Apr 16, 2024

Pierre-vh Apr 16, 2024

arsenm Apr 16, 2024

Pierre-vh Apr 16, 2024

arsenm Apr 16, 2024

lamb-j left a comment

		declare void @llvm.lifetime.start.p5(i64 immarg, ptr addrspace(5) nocapture) #1
		declare void @llvm.lifetime.end.p5(i64 immarg, ptr addrspace(5) nocapture) #1

		call void @llvm.lifetime.end.p5(i64 4, ptr addrspace(5) %z)
		call void @llvm.lifetime.end.p5(i64 4, ptr addrspace(5) %y)

AMDGPU: Move libcall simplify into PeepholeEP #88853

AMDGPU: Move libcall simplify into PeepholeEP #88853

Conversation

arsenm commented Apr 16, 2024

llvmbot commented Apr 16, 2024

Pierre-vh Apr 16, 2024

Choose a reason for hiding this comment

arsenm Apr 16, 2024

Choose a reason for hiding this comment

Pierre-vh Apr 16, 2024

Choose a reason for hiding this comment

arsenm Apr 16, 2024

Choose a reason for hiding this comment

Pierre-vh Apr 16, 2024

Choose a reason for hiding this comment

arsenm Apr 16, 2024

Choose a reason for hiding this comment

lamb-j left a comment

Choose a reason for hiding this comment