[AMDGPU] Make default AMDHSA Code Object Version to be 5 #65410

saiislam · 2023-09-05T20:02:07Z

Also update LIT tests and docs.
For more details, see
https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata

Differential Revision: https://reviews.llvm.org/D129818

jhuber6 · 2023-09-05T20:07:07Z

The libc tests will need to be updated as well. I'll need to look into that.

Summary: There is currently effort to change over the default AMDGPU code object version llvm#65410. However, this unfortunately causes problems in the LLVM LibC test suite that leads to a hang while executing. This is most likely a bug to do with indirect call optimization, as it can be avoided without optimizations or with manually preventing inlining in the AMDGPU startup code. This patch sets the AMDGPU code object version to be four explicitly on the LibC test suite. This should unblock the efforts to move the default to 5 without breaking the test suite. This isn't a great solution, but there is currently some time pressure to get COV5 landed and this seems to be the easiest solution.

Summary: There is currently effort to change over the default AMDGPU code object version #65410. However, this unfortunately causes problems in the LLVM LibC test suite that leads to a hang while executing. This is most likely a bug to do with indirect call optimization, as it can be avoided without optimizations or with manually preventing inlining in the AMDGPU startup code. This patch sets the AMDGPU code object version to be four explicitly on the LibC test suite. This should unblock the efforts to move the default to 5 without breaking the test suite. This isn't a great solution, but there is currently some time pressure to get COV5 landed and this seems to be the easiest solution.

jhuber6

I manually set the code object of the libc test suite to be 4 for the time being. That should unblock this.

Also update LIT tests and docs. For more details, see https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata Reviewed By: arsenm, jhuber6 Github PR: llvm#65410 Differential Revision: https://reviews.llvm.org/D129818

llvmbot · 2023-09-12T08:20:41Z

@llvm/pr-subscribers-lld-elf

Changes

Also update LIT tests and docs.
For more details, see
https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata

Differential Revision: https://reviews.llvm.org/D129818

Patch is 1.92 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/65410.diff

109 Files Affected:

(modified) clang/include/clang/Driver/Options.td (+2-2)
(modified) clang/lib/Driver/ToolChains/CommonArgs.cpp (+1-1)
(modified) clang/test/CodeGenCUDA/amdgpu-code-object-version.cu (+1-1)
(modified) clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu (+2-2)
(modified) clang/test/CodeGenHIP/default-attributes.hip (+2-2)
(modified) clang/test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl (+2-2)
(modified) clang/test/CodeGenOpenCL/builtins-amdgcn.cl (+5-5)
(modified) clang/test/Driver/hip-device-libs.hip (+8-8)
(modified) lld/test/ELF/emulation-amdgpu.s (+1-1)
(modified) lld/test/ELF/lto/amdgcn-oses.ll (+1-1)
(modified) llvm/docs/AMDGPUUsage.rst (+7-8)
(modified) llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp (+2-2)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/crash-stack-address-O0.ll (+1-1)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/dropped_debug_info_assert.ll (+28-29)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-assert-align.ll (+8-8)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-atomicrmw.ll (+2-2)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-call-abi-attribute-hints.ll (+53-55)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-call-implicit-args.ll (+468-484)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-call-return-values.ll (+1578-1623)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-call-sret.ll (+31-32)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-call.ll (+2013-2077)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-constant-fold-vector-op.ll (+2-2)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-indirect-call.ll (+31-32)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-inline-asm.ll (+51-51)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-sibling-call.ll (+18-18)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-tail-call.ll (+2-2)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-addrspacecast.mir (+28-39)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.is.private.ll (+5-6)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.is.shared.ll (+5-6)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/non-entry-alloca.ll (+7-4)
(modified) llvm/test/CodeGen/AMDGPU/abi-attribute-hints-undefined-behavior.ll (+75-35)
(modified) llvm/test/CodeGen/AMDGPU/addrspacecast-constantexpr.ll (+2-2)
(modified) llvm/test/CodeGen/AMDGPU/addrspacecast.gfx6.ll (+46-20)
(modified) llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-pow-codegen.ll (+30-30)
(modified) llvm/test/CodeGen/AMDGPU/annotate-kernel-features-hsa-call.ll (+12-12)
(modified) llvm/test/CodeGen/AMDGPU/annotate-kernel-features-hsa.ll (+8-7)
(modified) llvm/test/CodeGen/AMDGPU/attributor-noopt.ll (+1-1)
(modified) llvm/test/CodeGen/AMDGPU/blender-no-live-segment-at-def-implicit-def.ll (+22-24)
(modified) llvm/test/CodeGen/AMDGPU/branch-folding-implicit-def-subreg.ll (+384-366)
(modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage-agpr.ll (+3-3)
(modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage0.ll (+1-1)
(modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage1.ll (+1-1)
(modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage2.ll (+1-1)
(modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage3.ll (+1-1)
(modified) llvm/test/CodeGen/AMDGPU/call-argument-types.ll (+1219-1461)
(modified) llvm/test/CodeGen/AMDGPU/call-waitcnt.ll (+42-47)
(modified) llvm/test/CodeGen/AMDGPU/callee-special-input-sgprs-fixed-abi.ll (+8-12)
(modified) llvm/test/CodeGen/AMDGPU/cc-update.ll (+144-160)
(modified) llvm/test/CodeGen/AMDGPU/cf-loop-on-constant.ll (+45-45)
(modified) llvm/test/CodeGen/AMDGPU/codegen-internal-only-func.ll (+1-1)
(modified) llvm/test/CodeGen/AMDGPU/collapse-endcf.ll (+10-10)
(modified) llvm/test/CodeGen/AMDGPU/cross-block-use-is-not-abi-copy.ll (+28-32)
(modified) llvm/test/CodeGen/AMDGPU/cvt_f32_ubyte.ll (+20-20)
(modified) llvm/test/CodeGen/AMDGPU/dagcombine-lshr-and-cmp.ll (+2-4)
(modified) llvm/test/CodeGen/AMDGPU/ds_read2.ll (+28-32)
(modified) llvm/test/CodeGen/AMDGPU/dwarf-multi-register-use-crash.ll (+6-6)
(modified) llvm/test/CodeGen/AMDGPU/elf-header-osabi.ll (+1-1)
(modified) llvm/test/CodeGen/AMDGPU/flat-scratch-init.ll (+17-19)
(modified) llvm/test/CodeGen/AMDGPU/fneg-fabs.ll (+1-1)
(modified) llvm/test/CodeGen/AMDGPU/gfx11-user-sgpr-init16-bug.ll (+19-12)
(modified) llvm/test/CodeGen/AMDGPU/global_atomics_scan_fadd.ll (+550-605)
(modified) llvm/test/CodeGen/AMDGPU/global_atomics_scan_fmax.ll (+330-363)
(modified) llvm/test/CodeGen/AMDGPU/global_atomics_scan_fmin.ll (+330-363)
(modified) llvm/test/CodeGen/AMDGPU/global_atomics_scan_fsub.ll (+550-605)
(modified) llvm/test/CodeGen/AMDGPU/indirect-addressing-term.ll (+19-19)
(modified) llvm/test/CodeGen/AMDGPU/insert-delay-alu-bug.ll (+57-59)
(modified) llvm/test/CodeGen/AMDGPU/kernel-vgpr-spill-mubuf-with-voffset.ll (+25-32)
(modified) llvm/test/CodeGen/AMDGPU/lds-frame-extern.ll (+124-112)
(modified) llvm/test/CodeGen/AMDGPU/lds-global-non-entry-func.ll (+48-22)
(modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.is.private.ll (+8-7)
(modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.is.shared.ll (+8-7)
(modified) llvm/test/CodeGen/AMDGPU/llvm.dbg.value.ll (+1-1)
(modified) llvm/test/CodeGen/AMDGPU/lower-kernargs.ll (+118-118)
(modified) llvm/test/CodeGen/AMDGPU/lower-module-lds-via-hybrid.ll (+2-4)
(modified) llvm/test/CodeGen/AMDGPU/lower-module-lds-via-table.ll (+1-9)
(modified) llvm/test/CodeGen/AMDGPU/module-lds-false-sharing.ll (+48-44)
(modified) llvm/test/CodeGen/AMDGPU/need-fp-from-vgpr-spills.ll (+33-61)
(modified) llvm/test/CodeGen/AMDGPU/partial-sgpr-to-vgpr-spills.ll (+8-8)
(modified) llvm/test/CodeGen/AMDGPU/preserve-wwm-copy-dst-reg.ll (+72-72)
(modified) llvm/test/CodeGen/AMDGPU/promote-alloca-calling-conv.ll (+1-1)
(modified) llvm/test/CodeGen/AMDGPU/reqd-work-group-size.ll (+24-24)
(modified) llvm/test/CodeGen/AMDGPU/sgpr-spill-no-vgprs.ll (+2-2)
(modified) llvm/test/CodeGen/AMDGPU/sgpr-spill-update-only-slot-indexes.ll (+10-23)
(modified) llvm/test/CodeGen/AMDGPU/simple-indirect-call.ll (+3-3)
(modified) llvm/test/CodeGen/AMDGPU/sopk-no-literal.ll (+1-1)
(modified) llvm/test/CodeGen/AMDGPU/spill-m0.ll (+1-1)
(modified) llvm/test/CodeGen/AMDGPU/stacksave_stackrestore.ll (+30-30)
(modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-all-any.ll (+3-3)
(modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-all-not-supported.ll (+3-3)
(modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-all-off.ll (+3-3)
(modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-all-on.ll (+3-3)
(modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-any-off-1.ll (+3-3)
(modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-any-off-2.ll (+3-3)
(modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-any-on-1.ll (+3-3)
(modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-any-on-2.ll (+3-3)
(modified) llvm/test/CodeGen/AMDGPU/tid-one-func-xnack-any.ll (+3-3)
(modified) llvm/test/CodeGen/AMDGPU/tid-one-func-xnack-not-supported.ll (+3-3)
(modified) llvm/test/CodeGen/AMDGPU/tid-one-func-xnack-off.ll (+3-3)
(modified) llvm/test/CodeGen/AMDGPU/tid-one-func-xnack-on.ll (+1-1)
(modified) llvm/test/CodeGen/AMDGPU/tuple-allocation-failure.ll (+157-167)
(modified) llvm/test/CodeGen/AMDGPU/unstructured-cfg-def-use-issue.ll (+15-15)
(modified) llvm/test/CodeGen/AMDGPU/vgpr-spill-placement-issue61083.ll (+1-1)
(modified) llvm/test/CodeGen/AMDGPU/vgpr_constant_to_sgpr.ll (+25-36)
(modified) llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-gfx10.s (+4)
(modified) llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-gfx90a.s (+3)
(modified) llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-sgpr.s (+3)
(modified) llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-vgpr.s (+3)
(modified) llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-zeroed-gfx10.s (+1)
(modified) openmp/libomptarget/test/offloading/thread_limit.c (+1)

diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index a5f5ca29053b43b..0484575ca482717 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -4616,12 +4616,12 @@ defm amdgpu_ieee : BoolOption<"m", "amdgpu-ieee",
   NegFlag>, Group;
 
 def mcode_object_version_EQ : Joined<["-"], "mcode-object-version=">, Group,
-  HelpText<"Specify code object ABI version. Defaults to 4. (AMDGPU only)">,
+  HelpText<"Specify code object ABI version. Defaults to 5. (AMDGPU only)">,
   Visibility<[ClangOption, CC1Option]>,
   Values<"none,2,3,4,5">,
   NormalizedValuesScope<"TargetOptions">,
   NormalizedValues<["COV_None", "COV_2", "COV_3", "COV_4", "COV_5"]>,
-  MarshallingInfoEnum, "COV_4">;
+  MarshallingInfoEnum, "COV_5">;
 
 defm cumode : SimpleMFlag<"cumode",
   "Specify CU wavefront", "Specify WGP wavefront",
diff --git a/clang/lib/Driver/ToolChains/CommonArgs.cpp b/clang/lib/Driver/ToolChains/CommonArgs.cpp
index 1db05b47e11ba6a..471ff2541d355b6 100644
--- a/clang/lib/Driver/ToolChains/CommonArgs.cpp
+++ b/clang/lib/Driver/ToolChains/CommonArgs.cpp
@@ -2341,7 +2341,7 @@ void tools::checkAMDGPUCodeObjectVersion(const Driver &D,
 
 unsigned tools::getAMDGPUCodeObjectVersion(const Driver &D,
                                            const llvm::opt::ArgList &Args) {
-  unsigned CodeObjVer = 4; // default
+  unsigned CodeObjVer = 5; // default
   if (auto *CodeObjArg = getAMDGPUCodeObjectArgument(D, Args))
     StringRef(CodeObjArg->getValue()).getAsInteger(0, CodeObjVer);
   return CodeObjVer;
diff --git a/clang/test/CodeGenCUDA/amdgpu-code-object-version.cu b/clang/test/CodeGenCUDA/amdgpu-code-object-version.cu
index 16505b34c4a6e09..62ccc2bd4d05db3 100644
--- a/clang/test/CodeGenCUDA/amdgpu-code-object-version.cu
+++ b/clang/test/CodeGenCUDA/amdgpu-code-object-version.cu
@@ -1,7 +1,7 @@
 // Create module flag for code object version.
 
 // RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -emit-llvm \
-// RUN:   -o - %s | FileCheck %s -check-prefix=V4
+// RUN:   -o - %s | FileCheck %s -check-prefix=V5
 
 // RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -emit-llvm \
 // RUN:   -mcode-object-version=2 -o - %s | FileCheck -check-prefix=V2 %s
diff --git a/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu b/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu
index c661b06d57b78d7..b885b9991a9bd31 100644
--- a/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu
+++ b/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu
@@ -1,10 +1,10 @@
 // RUN: %clang_cc1 -triple amdgcn-amd-amdhsa \
-// RUN:     -fcuda-is-device -emit-llvm -o - -x hip %s \
+// RUN:     -fcuda-is-device -mcode-object-version=4 -emit-llvm -o - -x hip %s \
 // RUN:     | FileCheck -check-prefix=PRECOV5 %s
 
 
 // RUN: %clang_cc1 -triple amdgcn-amd-amdhsa \
-// RUN:     -fcuda-is-device -mcode-object-version=5 -emit-llvm -o - -x hip %s \
+// RUN:     -fcuda-is-device -emit-llvm -o - -x hip %s \
 // RUN:     | FileCheck -check-prefix=COV5 %s
 
 // RUN: %clang_cc1 -triple amdgcn-amd-amdhsa \
diff --git a/clang/test/CodeGenHIP/default-attributes.hip b/clang/test/CodeGenHIP/default-attributes.hip
index 80aa1ee0700628f..9c9ea521271b99b 100644
--- a/clang/test/CodeGenHIP/default-attributes.hip
+++ b/clang/test/CodeGenHIP/default-attributes.hip
@@ -46,11 +46,11 @@ __global__ void kernel() {
 // OPT: attributes #0 = { mustprogress nofree norecurse nosync nounwind willreturn memory(none) "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
 // OPT: attributes #1 = { mustprogress nofree norecurse nosync nounwind willreturn memory(none) "amdgpu-flat-work-group-size"="1,1024" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "uniform-work-group-size"="true" }
 //.
-// OPTNONE: !0 = !{i32 1, !"amdgpu_code_object_version", i32 400}
+// OPTNONE: !0 = !{i32 1, !"amdgpu_code_object_version", i32 500}
 // OPTNONE: !1 = !{i32 1, !"amdgpu_printf_kind", !"hostcall"}
 // OPTNONE: !2 = !{i32 1, !"wchar_size", i32 4}
 //.
-// OPT: !0 = !{i32 1, !"amdgpu_code_object_version", i32 400}
+// OPT: !0 = !{i32 1, !"amdgpu_code_object_version", i32 500}
 // OPT: !1 = !{i32 1, !"amdgpu_printf_kind", !"hostcall"}
 // OPT: !2 = !{i32 1, !"wchar_size", i32 4}
 //.
diff --git a/clang/test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl b/clang/test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl
index e574b1f64c499bd..2cf1286e2b54e8e 100644
--- a/clang/test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl
+++ b/clang/test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl
@@ -703,7 +703,7 @@ kernel void test_target_features_kernel(global int *i) {
 // GFX900: attributes #8 = { nounwind }
 // GFX900: attributes #9 = { convergent nounwind }
 //.
-// NOCPU: !0 = !{i32 1, !"amdgpu_code_object_version", i32 400}
+// NOCPU: !0 = !{i32 1, !"amdgpu_code_object_version", i32 500}
 // NOCPU: !1 = !{i32 1, !"wchar_size", i32 4}
 // NOCPU: !2 = !{i32 2, i32 0}
 // NOCPU: !3 = !{i32 1, i32 0, i32 1, i32 0}
@@ -721,7 +721,7 @@ kernel void test_target_features_kernel(global int *i) {
 // NOCPU: !15 = !{i32 1}
 // NOCPU: !16 = !{!"int*"}
 //.
-// GFX900: !0 = !{i32 1, !"amdgpu_code_object_version", i32 400}
+// GFX900: !0 = !{i32 1, !"amdgpu_code_object_version", i32 500}
 // GFX900: !1 = !{i32 1, !"wchar_size", i32 4}
 // GFX900: !2 = !{i32 2, i32 0}
 // GFX900: !3 = !{!4, !4, i64 0}
diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn.cl b/clang/test/CodeGenOpenCL/builtins-amdgcn.cl
index 8938642e3b19f8c..74b5ef03b52f21e 100644
--- a/clang/test/CodeGenOpenCL/builtins-amdgcn.cl
+++ b/clang/test/CodeGenOpenCL/builtins-amdgcn.cl
@@ -599,13 +599,13 @@ void test_get_local_id(int d, global int *out)
 }
 
 // CHECK-LABEL: @test_get_workgroup_size(
-// CHECK: call align 4 dereferenceable(64) ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
-// CHECK: getelementptr inbounds i8, ptr addrspace(4) %{{.*}}, i64 4
+// CHECK: call align 8 dereferenceable(256) ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
+// CHECK: getelementptr inbounds i8, ptr addrspace(4) %{{.*}}, i64 12
 // CHECK: load i16, ptr addrspace(4) %{{.*}}, align 4, !range [[$WS_RANGE:![0-9]*]], !invariant.load{{.*}}, !noundef
-// CHECK: getelementptr inbounds i8, ptr addrspace(4) %{{.*}}, i64 6
+// CHECK: getelementptr inbounds i8, ptr addrspace(4) %{{.*}}, i64 14
 // CHECK: load i16, ptr addrspace(4) %{{.*}}, align 2, !range [[$WS_RANGE:![0-9]*]], !invariant.load{{.*}}, !noundef
-// CHECK: getelementptr inbounds i8, ptr addrspace(4) %{{.*}}, i64 8
-// CHECK: load i16, ptr addrspace(4) %{{.*}}, align 4, !range [[$WS_RANGE:![0-9]*]], !invariant.load{{.*}}, !noundef
+// CHECK: getelementptr inbounds i8, ptr addrspace(4) %{{.*}}, i64 16
+// CHECK: load i16, ptr addrspace(4) %{{.*}}, align 8, !range [[$WS_RANGE:![0-9]*]], !invariant.load{{.*}}, !noundef
 void test_get_workgroup_size(int d, global int *out)
 {
 	switch (d) {
diff --git a/clang/test/Driver/hip-device-libs.hip b/clang/test/Driver/hip-device-libs.hip
index 71d9554da696b42..f049bf3467c611c 100644
--- a/clang/test/Driver/hip-device-libs.hip
+++ b/clang/test/Driver/hip-device-libs.hip
@@ -160,13 +160,13 @@
 // Test default code object version.
 // RUN: %clang -### --target=x86_64-linux-gnu --offload-arch=gfx900 \
 // RUN:   --rocm-path=%S/Inputs/rocm %S/Inputs/hip_multiple_inputs/b.hip \
-// RUN: 2>&1 | FileCheck %s --check-prefixes=ABI4
+// RUN: 2>&1 | FileCheck %s --check-prefixes=ABI5
 
-// Test default code object version with old device library without abi_version_400.bc
-// RUN: %clang -### --target=x86_64-linux-gnu --offload-arch=gfx900 \
+// Test default code object version with old device library without abi_version_500.bc
+// RUN: not %clang -### --target=x86_64-linux-gnu --offload-arch=gfx900 \
 // RUN:   --hip-device-lib-path=%S/Inputs/rocm/amdgcn/bitcode-no-abi-ver   \
 // RUN:   --rocm-path=%S/Inputs/rocm %S/Inputs/hip_multiple_inputs/b.hip \
-// RUN: 2>&1 | FileCheck %s --check-prefixes=NOABI4
+// RUN: 2>&1 | FileCheck %s --check-prefixes=NOABI5
 
 // Test -mcode-object-version=3
 // RUN: %clang -### --target=x86_64-linux-gnu --offload-arch=gfx900 \
@@ -193,12 +193,12 @@
 // RUN:   --rocm-path=%S/Inputs/rocm %S/Inputs/hip_multiple_inputs/b.hip \
 // RUN: 2>&1 | FileCheck %s --check-prefixes=ABI5
 
-// Test -mcode-object-version=5 with old device library without abi_version_400.bc
-// RUN: not %clang -### --target=x86_64-linux-gnu --offload-arch=gfx900 \
-// RUN:   -mcode-object-version=5 \
+// Test -mcode-object-version=4 with old device library without abi_version_400.bc
+// RUN: %clang -### --target=x86_64-linux-gnu --offload-arch=gfx900 \
+// RUN:   -mcode-object-version=4 \
 // RUN:   --hip-device-lib-path=%S/Inputs/rocm/amdgcn/bitcode-no-abi-ver   \
 // RUN:   --rocm-path=%S/Inputs/rocm %S/Inputs/hip_multiple_inputs/b.hip \
-// RUN: 2>&1 | FileCheck %s --check-prefixes=NOABI5
+// RUN: 2>&1 | FileCheck %s --check-prefixes=NOABI4
 
 // ALL-NOT: error:
 // ALL: {{"[^"]*clang[^"]*"}}
diff --git a/lld/test/ELF/emulation-amdgpu.s b/lld/test/ELF/emulation-amdgpu.s
index 707f0aeb909efae..329fb1c69b16665 100644
--- a/lld/test/ELF/emulation-amdgpu.s
+++ b/lld/test/ELF/emulation-amdgpu.s
@@ -13,7 +13,7 @@
 # CHECK-NEXT:     DataEncoding: LittleEndian (0x1)
 # CHECK-NEXT:     FileVersion: 1
 # CHECK-NEXT:     OS/ABI: AMDGPU_HSA (0x40)
-# CHECK-NEXT:     ABIVersion: 2
+# CHECK-NEXT:     ABIVersion: 3
 # CHECK-NEXT:     Unused: (00 00 00 00 00 00 00)
 # CHECK-NEXT:   }
 # CHECK-NEXT:   Type: Executable (0x2)
diff --git a/lld/test/ELF/lto/amdgcn-oses.ll b/lld/test/ELF/lto/amdgcn-oses.ll
index a2f25cdd57d87b5..a70b678ac25141c 100644
--- a/lld/test/ELF/lto/amdgcn-oses.ll
+++ b/lld/test/ELF/lto/amdgcn-oses.ll
@@ -15,7 +15,7 @@
 ; RUN: llvm-readobj --file-headers %t/mesa3d.so | FileCheck %s --check-prefixes=GCN,NON-AMDHSA,MESA3D
 
 ; AMDHSA: OS/ABI: AMDGPU_HSA (0x40)
-; AMDHSA: ABIVersion: 2
+; AMDHSA: ABIVersion: 3
 
 ; AMDPAL: OS/ABI: AMDGPU_PAL (0x41)
 ; MESA3D: OS/ABI: AMDGPU_MESA3D (0x42)
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index f733c514ffbee47..923b8e2e62ab9d3 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -1418,12 +1418,12 @@ The AMDGPU backend uses the following ELF header:
 
   * ``ELFABIVERSION_AMDGPU_HSA_V4`` is used to specify the version of AMD HSA
     runtime ABI for code object V4. Specify using the Clang option
-    ``-mcode-object-version=4``. This is the default code object
-    version if not specified.
+    ``-mcode-object-version=4``.
 
   * ``ELFABIVERSION_AMDGPU_HSA_V5`` is used to specify the version of AMD HSA
     runtime ABI for code object V5. Specify using the Clang option
-    ``-mcode-object-version=5``.
+    ``-mcode-object-version=5``. This is the default code object
+    version if not specified.
 
   * ``ELFABIVERSION_AMDGPU_PAL`` is used to specify the version of AMD PAL
     runtime ABI.
@@ -3852,6 +3852,10 @@ same *vendor-name*.
 Code Object V4 Metadata
 +++++++++++++++++++++++
 
+. warning::
+  Code object V4 is not the default code object version emitted by this version
+  of LLVM.
+
 Code object V4 metadata is the same as
 :ref:`amdgpu-amdhsa-code-object-metadata-v3` with the changes and additions
 defined in table :ref:`amdgpu-amdhsa-code-object-metadata-map-table-v4`.
@@ -3882,11 +3886,6 @@ defined in table :ref:`amdgpu-amdhsa-code-object-metadata-map-table-v4`.
 Code Object V5 Metadata
 +++++++++++++++++++++++
 
-.. warning::
-  Code object V5 is not the default code object version emitted by this version
-  of LLVM.
-
-
 Code object V5 metadata is the same as
 :ref:`amdgpu-amdhsa-code-object-metadata-v4` with the changes defined in table
 :ref:`amdgpu-amdhsa-code-object-metadata-map-table-v5`, table
diff --git a/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp b/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
index 63da86391e5c650..d0abd08cc78b8ec 100644
--- a/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
@@ -34,7 +34,7 @@
 static llvm::cl::opt
     AmdhsaCodeObjectVersion("amdhsa-code-object-version", llvm::cl::Hidde...

llvmbot · 2023-09-12T08:20:44Z

@llvm/pr-subscribers-clang

Changes

Also update LIT tests and docs.
For more details, see
https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata

Differential Revision: https://reviews.llvm.org/D129818

Patch is 1.92 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/65410.diff

109 Files Affected:

(modified) clang/include/clang/Driver/Options.td (+2-2)
(modified) clang/lib/Driver/ToolChains/CommonArgs.cpp (+1-1)
(modified) clang/test/CodeGenCUDA/amdgpu-code-object-version.cu (+1-1)
(modified) clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu (+2-2)
(modified) clang/test/CodeGenHIP/default-attributes.hip (+2-2)
(modified) clang/test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl (+2-2)
(modified) clang/test/CodeGenOpenCL/builtins-amdgcn.cl (+5-5)
(modified) clang/test/Driver/hip-device-libs.hip (+8-8)
(modified) lld/test/ELF/emulation-amdgpu.s (+1-1)
(modified) lld/test/ELF/lto/amdgcn-oses.ll (+1-1)
(modified) llvm/docs/AMDGPUUsage.rst (+7-8)
(modified) llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp (+2-2)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/crash-stack-address-O0.ll (+1-1)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/dropped_debug_info_assert.ll (+28-29)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-assert-align.ll (+8-8)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-atomicrmw.ll (+2-2)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-call-abi-attribute-hints.ll (+53-55)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-call-implicit-args.ll (+468-484)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-call-return-values.ll (+1578-1623)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-call-sret.ll (+31-32)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-call.ll (+2013-2077)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-constant-fold-vector-op.ll (+2-2)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-indirect-call.ll (+31-32)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-inline-asm.ll (+51-51)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-sibling-call.ll (+18-18)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-tail-call.ll (+2-2)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-addrspacecast.mir (+28-39)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.is.private.ll (+5-6)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.is.shared.ll (+5-6)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/non-entry-alloca.ll (+7-4)
(modified) llvm/test/CodeGen/AMDGPU/abi-attribute-hints-undefined-behavior.ll (+75-35)
(modified) llvm/test/CodeGen/AMDGPU/addrspacecast-constantexpr.ll (+2-2)
(modified) llvm/test/CodeGen/AMDGPU/addrspacecast.gfx6.ll (+46-20)
(modified) llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-pow-codegen.ll (+30-30)
(modified) llvm/test/CodeGen/AMDGPU/annotate-kernel-features-hsa-call.ll (+12-12)
(modified) llvm/test/CodeGen/AMDGPU/annotate-kernel-features-hsa.ll (+8-7)
(modified) llvm/test/CodeGen/AMDGPU/attributor-noopt.ll (+1-1)
(modified) llvm/test/CodeGen/AMDGPU/blender-no-live-segment-at-def-implicit-def.ll (+22-24)
(modified) llvm/test/CodeGen/AMDGPU/branch-folding-implicit-def-subreg.ll (+384-366)
(modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage-agpr.ll (+3-3)
(modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage0.ll (+1-1)
(modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage1.ll (+1-1)
(modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage2.ll (+1-1)
(modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage3.ll (+1-1)
(modified) llvm/test/CodeGen/AMDGPU/call-argument-types.ll (+1219-1461)
(modified) llvm/test/CodeGen/AMDGPU/call-waitcnt.ll (+42-47)
(modified) llvm/test/CodeGen/AMDGPU/callee-special-input-sgprs-fixed-abi.ll (+8-12)
(modified) llvm/test/CodeGen/AMDGPU/cc-update.ll (+144-160)
(modified) llvm/test/CodeGen/AMDGPU/cf-loop-on-constant.ll (+45-45)
(modified) llvm/test/CodeGen/AMDGPU/codegen-internal-only-func.ll (+1-1)
(modified) llvm/test/CodeGen/AMDGPU/collapse-endcf.ll (+10-10)
(modified) llvm/test/CodeGen/AMDGPU/cross-block-use-is-not-abi-copy.ll (+28-32)
(modified) llvm/test/CodeGen/AMDGPU/cvt_f32_ubyte.ll (+20-20)
(modified) llvm/test/CodeGen/AMDGPU/dagcombine-lshr-and-cmp.ll (+2-4)
(modified) llvm/test/CodeGen/AMDGPU/ds_read2.ll (+28-32)
(modified) llvm/test/CodeGen/AMDGPU/dwarf-multi-register-use-crash.ll (+6-6)
(modified) llvm/test/CodeGen/AMDGPU/elf-header-osabi.ll (+1-1)
(modified) llvm/test/CodeGen/AMDGPU/flat-scratch-init.ll (+17-19)
(modified) llvm/test/CodeGen/AMDGPU/fneg-fabs.ll (+1-1)
(modified) llvm/test/CodeGen/AMDGPU/gfx11-user-sgpr-init16-bug.ll (+19-12)
(modified) llvm/test/CodeGen/AMDGPU/global_atomics_scan_fadd.ll (+550-605)
(modified) llvm/test/CodeGen/AMDGPU/global_atomics_scan_fmax.ll (+330-363)
(modified) llvm/test/CodeGen/AMDGPU/global_atomics_scan_fmin.ll (+330-363)
(modified) llvm/test/CodeGen/AMDGPU/global_atomics_scan_fsub.ll (+550-605)
(modified) llvm/test/CodeGen/AMDGPU/indirect-addressing-term.ll (+19-19)
(modified) llvm/test/CodeGen/AMDGPU/insert-delay-alu-bug.ll (+57-59)
(modified) llvm/test/CodeGen/AMDGPU/kernel-vgpr-spill-mubuf-with-voffset.ll (+25-32)
(modified) llvm/test/CodeGen/AMDGPU/lds-frame-extern.ll (+124-112)
(modified) llvm/test/CodeGen/AMDGPU/lds-global-non-entry-func.ll (+48-22)
(modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.is.private.ll (+8-7)
(modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.is.shared.ll (+8-7)
(modified) llvm/test/CodeGen/AMDGPU/llvm.dbg.value.ll (+1-1)
(modified) llvm/test/CodeGen/AMDGPU/lower-kernargs.ll (+118-118)
(modified) llvm/test/CodeGen/AMDGPU/lower-module-lds-via-hybrid.ll (+2-4)
(modified) llvm/test/CodeGen/AMDGPU/lower-module-lds-via-table.ll (+1-9)
(modified) llvm/test/CodeGen/AMDGPU/module-lds-false-sharing.ll (+48-44)
(modified) llvm/test/CodeGen/AMDGPU/need-fp-from-vgpr-spills.ll (+33-61)
(modified) llvm/test/CodeGen/AMDGPU/partial-sgpr-to-vgpr-spills.ll (+8-8)
(modified) llvm/test/CodeGen/AMDGPU/preserve-wwm-copy-dst-reg.ll (+72-72)
(modified) llvm/test/CodeGen/AMDGPU/promote-alloca-calling-conv.ll (+1-1)
(modified) llvm/test/CodeGen/AMDGPU/reqd-work-group-size.ll (+24-24)
(modified) llvm/test/CodeGen/AMDGPU/sgpr-spill-no-vgprs.ll (+2-2)
(modified) llvm/test/CodeGen/AMDGPU/sgpr-spill-update-only-slot-indexes.ll (+10-23)
(modified) llvm/test/CodeGen/AMDGPU/simple-indirect-call.ll (+3-3)
(modified) llvm/test/CodeGen/AMDGPU/sopk-no-literal.ll (+1-1)
(modified) llvm/test/CodeGen/AMDGPU/spill-m0.ll (+1-1)
(modified) llvm/test/CodeGen/AMDGPU/stacksave_stackrestore.ll (+30-30)
(modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-all-any.ll (+3-3)
(modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-all-not-supported.ll (+3-3)
(modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-all-off.ll (+3-3)
(modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-all-on.ll (+3-3)
(modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-any-off-1.ll (+3-3)
(modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-any-off-2.ll (+3-3)
(modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-any-on-1.ll (+3-3)
(modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-any-on-2.ll (+3-3)
(modified) llvm/test/CodeGen/AMDGPU/tid-one-func-xnack-any.ll (+3-3)
(modified) llvm/test/CodeGen/AMDGPU/tid-one-func-xnack-not-supported.ll (+3-3)
(modified) llvm/test/CodeGen/AMDGPU/tid-one-func-xnack-off.ll (+3-3)
(modified) llvm/test/CodeGen/AMDGPU/tid-one-func-xnack-on.ll (+1-1)
(modified) llvm/test/CodeGen/AMDGPU/tuple-allocation-failure.ll (+157-167)
(modified) llvm/test/CodeGen/AMDGPU/unstructured-cfg-def-use-issue.ll (+15-15)
(modified) llvm/test/CodeGen/AMDGPU/vgpr-spill-placement-issue61083.ll (+1-1)
(modified) llvm/test/CodeGen/AMDGPU/vgpr_constant_to_sgpr.ll (+25-36)
(modified) llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-gfx10.s (+4)
(modified) llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-gfx90a.s (+3)
(modified) llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-sgpr.s (+3)
(modified) llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-vgpr.s (+3)
(modified) llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-zeroed-gfx10.s (+1)
(modified) openmp/libomptarget/test/offloading/thread_limit.c (+1)

diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index a5f5ca29053b43b..0484575ca482717 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -4616,12 +4616,12 @@ defm amdgpu_ieee : BoolOption<"m", "amdgpu-ieee",
   NegFlag>, Group;
 
 def mcode_object_version_EQ : Joined<["-"], "mcode-object-version=">, Group,
-  HelpText<"Specify code object ABI version. Defaults to 4. (AMDGPU only)">,
+  HelpText<"Specify code object ABI version. Defaults to 5. (AMDGPU only)">,
   Visibility<[ClangOption, CC1Option]>,
   Values<"none,2,3,4,5">,
   NormalizedValuesScope<"TargetOptions">,
   NormalizedValues<["COV_None", "COV_2", "COV_3", "COV_4", "COV_5"]>,
-  MarshallingInfoEnum, "COV_4">;
+  MarshallingInfoEnum, "COV_5">;
 
 defm cumode : SimpleMFlag<"cumode",
   "Specify CU wavefront", "Specify WGP wavefront",
diff --git a/clang/lib/Driver/ToolChains/CommonArgs.cpp b/clang/lib/Driver/ToolChains/CommonArgs.cpp
index 1db05b47e11ba6a..471ff2541d355b6 100644
--- a/clang/lib/Driver/ToolChains/CommonArgs.cpp
+++ b/clang/lib/Driver/ToolChains/CommonArgs.cpp
@@ -2341,7 +2341,7 @@ void tools::checkAMDGPUCodeObjectVersion(const Driver &D,
 
 unsigned tools::getAMDGPUCodeObjectVersion(const Driver &D,
                                            const llvm::opt::ArgList &Args) {
-  unsigned CodeObjVer = 4; // default
+  unsigned CodeObjVer = 5; // default
   if (auto *CodeObjArg = getAMDGPUCodeObjectArgument(D, Args))
     StringRef(CodeObjArg->getValue()).getAsInteger(0, CodeObjVer);
   return CodeObjVer;
diff --git a/clang/test/CodeGenCUDA/amdgpu-code-object-version.cu b/clang/test/CodeGenCUDA/amdgpu-code-object-version.cu
index 16505b34c4a6e09..62ccc2bd4d05db3 100644
--- a/clang/test/CodeGenCUDA/amdgpu-code-object-version.cu
+++ b/clang/test/CodeGenCUDA/amdgpu-code-object-version.cu
@@ -1,7 +1,7 @@
 // Create module flag for code object version.
 
 // RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -emit-llvm \
-// RUN:   -o - %s | FileCheck %s -check-prefix=V4
+// RUN:   -o - %s | FileCheck %s -check-prefix=V5
 
 // RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -emit-llvm \
 // RUN:   -mcode-object-version=2 -o - %s | FileCheck -check-prefix=V2 %s
diff --git a/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu b/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu
index c661b06d57b78d7..b885b9991a9bd31 100644
--- a/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu
+++ b/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu
@@ -1,10 +1,10 @@
 // RUN: %clang_cc1 -triple amdgcn-amd-amdhsa \
-// RUN:     -fcuda-is-device -emit-llvm -o - -x hip %s \
+// RUN:     -fcuda-is-device -mcode-object-version=4 -emit-llvm -o - -x hip %s \
 // RUN:     | FileCheck -check-prefix=PRECOV5 %s
 
 
 // RUN: %clang_cc1 -triple amdgcn-amd-amdhsa \
-// RUN:     -fcuda-is-device -mcode-object-version=5 -emit-llvm -o - -x hip %s \
+// RUN:     -fcuda-is-device -emit-llvm -o - -x hip %s \
 // RUN:     | FileCheck -check-prefix=COV5 %s
 
 // RUN: %clang_cc1 -triple amdgcn-amd-amdhsa \
diff --git a/clang/test/CodeGenHIP/default-attributes.hip b/clang/test/CodeGenHIP/default-attributes.hip
index 80aa1ee0700628f..9c9ea521271b99b 100644
--- a/clang/test/CodeGenHIP/default-attributes.hip
+++ b/clang/test/CodeGenHIP/default-attributes.hip
@@ -46,11 +46,11 @@ __global__ void kernel() {
 // OPT: attributes #0 = { mustprogress nofree norecurse nosync nounwind willreturn memory(none) "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
 // OPT: attributes #1 = { mustprogress nofree norecurse nosync nounwind willreturn memory(none) "amdgpu-flat-work-group-size"="1,1024" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "uniform-work-group-size"="true" }
 //.
-// OPTNONE: !0 = !{i32 1, !"amdgpu_code_object_version", i32 400}
+// OPTNONE: !0 = !{i32 1, !"amdgpu_code_object_version", i32 500}
 // OPTNONE: !1 = !{i32 1, !"amdgpu_printf_kind", !"hostcall"}
 // OPTNONE: !2 = !{i32 1, !"wchar_size", i32 4}
 //.
-// OPT: !0 = !{i32 1, !"amdgpu_code_object_version", i32 400}
+// OPT: !0 = !{i32 1, !"amdgpu_code_object_version", i32 500}
 // OPT: !1 = !{i32 1, !"amdgpu_printf_kind", !"hostcall"}
 // OPT: !2 = !{i32 1, !"wchar_size", i32 4}
 //.
diff --git a/clang/test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl b/clang/test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl
index e574b1f64c499bd..2cf1286e2b54e8e 100644
--- a/clang/test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl
+++ b/clang/test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl
@@ -703,7 +703,7 @@ kernel void test_target_features_kernel(global int *i) {
 // GFX900: attributes #8 = { nounwind }
 // GFX900: attributes #9 = { convergent nounwind }
 //.
-// NOCPU: !0 = !{i32 1, !"amdgpu_code_object_version", i32 400}
+// NOCPU: !0 = !{i32 1, !"amdgpu_code_object_version", i32 500}
 // NOCPU: !1 = !{i32 1, !"wchar_size", i32 4}
 // NOCPU: !2 = !{i32 2, i32 0}
 // NOCPU: !3 = !{i32 1, i32 0, i32 1, i32 0}
@@ -721,7 +721,7 @@ kernel void test_target_features_kernel(global int *i) {
 // NOCPU: !15 = !{i32 1}
 // NOCPU: !16 = !{!"int*"}
 //.
-// GFX900: !0 = !{i32 1, !"amdgpu_code_object_version", i32 400}
+// GFX900: !0 = !{i32 1, !"amdgpu_code_object_version", i32 500}
 // GFX900: !1 = !{i32 1, !"wchar_size", i32 4}
 // GFX900: !2 = !{i32 2, i32 0}
 // GFX900: !3 = !{!4, !4, i64 0}
diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn.cl b/clang/test/CodeGenOpenCL/builtins-amdgcn.cl
index 8938642e3b19f8c..74b5ef03b52f21e 100644
--- a/clang/test/CodeGenOpenCL/builtins-amdgcn.cl
+++ b/clang/test/CodeGenOpenCL/builtins-amdgcn.cl
@@ -599,13 +599,13 @@ void test_get_local_id(int d, global int *out)
 }
 
 // CHECK-LABEL: @test_get_workgroup_size(
-// CHECK: call align 4 dereferenceable(64) ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
-// CHECK: getelementptr inbounds i8, ptr addrspace(4) %{{.*}}, i64 4
+// CHECK: call align 8 dereferenceable(256) ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
+// CHECK: getelementptr inbounds i8, ptr addrspace(4) %{{.*}}, i64 12
 // CHECK: load i16, ptr addrspace(4) %{{.*}}, align 4, !range [[$WS_RANGE:![0-9]*]], !invariant.load{{.*}}, !noundef
-// CHECK: getelementptr inbounds i8, ptr addrspace(4) %{{.*}}, i64 6
+// CHECK: getelementptr inbounds i8, ptr addrspace(4) %{{.*}}, i64 14
 // CHECK: load i16, ptr addrspace(4) %{{.*}}, align 2, !range [[$WS_RANGE:![0-9]*]], !invariant.load{{.*}}, !noundef
-// CHECK: getelementptr inbounds i8, ptr addrspace(4) %{{.*}}, i64 8
-// CHECK: load i16, ptr addrspace(4) %{{.*}}, align 4, !range [[$WS_RANGE:![0-9]*]], !invariant.load{{.*}}, !noundef
+// CHECK: getelementptr inbounds i8, ptr addrspace(4) %{{.*}}, i64 16
+// CHECK: load i16, ptr addrspace(4) %{{.*}}, align 8, !range [[$WS_RANGE:![0-9]*]], !invariant.load{{.*}}, !noundef
 void test_get_workgroup_size(int d, global int *out)
 {
 	switch (d) {
diff --git a/clang/test/Driver/hip-device-libs.hip b/clang/test/Driver/hip-device-libs.hip
index 71d9554da696b42..f049bf3467c611c 100644
--- a/clang/test/Driver/hip-device-libs.hip
+++ b/clang/test/Driver/hip-device-libs.hip
@@ -160,13 +160,13 @@
 // Test default code object version.
 // RUN: %clang -### --target=x86_64-linux-gnu --offload-arch=gfx900 \
 // RUN:   --rocm-path=%S/Inputs/rocm %S/Inputs/hip_multiple_inputs/b.hip \
-// RUN: 2>&1 | FileCheck %s --check-prefixes=ABI4
+// RUN: 2>&1 | FileCheck %s --check-prefixes=ABI5
 
-// Test default code object version with old device library without abi_version_400.bc
-// RUN: %clang -### --target=x86_64-linux-gnu --offload-arch=gfx900 \
+// Test default code object version with old device library without abi_version_500.bc
+// RUN: not %clang -### --target=x86_64-linux-gnu --offload-arch=gfx900 \
 // RUN:   --hip-device-lib-path=%S/Inputs/rocm/amdgcn/bitcode-no-abi-ver   \
 // RUN:   --rocm-path=%S/Inputs/rocm %S/Inputs/hip_multiple_inputs/b.hip \
-// RUN: 2>&1 | FileCheck %s --check-prefixes=NOABI4
+// RUN: 2>&1 | FileCheck %s --check-prefixes=NOABI5
 
 // Test -mcode-object-version=3
 // RUN: %clang -### --target=x86_64-linux-gnu --offload-arch=gfx900 \
@@ -193,12 +193,12 @@
 // RUN:   --rocm-path=%S/Inputs/rocm %S/Inputs/hip_multiple_inputs/b.hip \
 // RUN: 2>&1 | FileCheck %s --check-prefixes=ABI5
 
-// Test -mcode-object-version=5 with old device library without abi_version_400.bc
-// RUN: not %clang -### --target=x86_64-linux-gnu --offload-arch=gfx900 \
-// RUN:   -mcode-object-version=5 \
+// Test -mcode-object-version=4 with old device library without abi_version_400.bc
+// RUN: %clang -### --target=x86_64-linux-gnu --offload-arch=gfx900 \
+// RUN:   -mcode-object-version=4 \
 // RUN:   --hip-device-lib-path=%S/Inputs/rocm/amdgcn/bitcode-no-abi-ver   \
 // RUN:   --rocm-path=%S/Inputs/rocm %S/Inputs/hip_multiple_inputs/b.hip \
-// RUN: 2>&1 | FileCheck %s --check-prefixes=NOABI5
+// RUN: 2>&1 | FileCheck %s --check-prefixes=NOABI4
 
 // ALL-NOT: error:
 // ALL: {{"[^"]*clang[^"]*"}}
diff --git a/lld/test/ELF/emulation-amdgpu.s b/lld/test/ELF/emulation-amdgpu.s
index 707f0aeb909efae..329fb1c69b16665 100644
--- a/lld/test/ELF/emulation-amdgpu.s
+++ b/lld/test/ELF/emulation-amdgpu.s
@@ -13,7 +13,7 @@
 # CHECK-NEXT:     DataEncoding: LittleEndian (0x1)
 # CHECK-NEXT:     FileVersion: 1
 # CHECK-NEXT:     OS/ABI: AMDGPU_HSA (0x40)
-# CHECK-NEXT:     ABIVersion: 2
+# CHECK-NEXT:     ABIVersion: 3
 # CHECK-NEXT:     Unused: (00 00 00 00 00 00 00)
 # CHECK-NEXT:   }
 # CHECK-NEXT:   Type: Executable (0x2)
diff --git a/lld/test/ELF/lto/amdgcn-oses.ll b/lld/test/ELF/lto/amdgcn-oses.ll
index a2f25cdd57d87b5..a70b678ac25141c 100644
--- a/lld/test/ELF/lto/amdgcn-oses.ll
+++ b/lld/test/ELF/lto/amdgcn-oses.ll
@@ -15,7 +15,7 @@
 ; RUN: llvm-readobj --file-headers %t/mesa3d.so | FileCheck %s --check-prefixes=GCN,NON-AMDHSA,MESA3D
 
 ; AMDHSA: OS/ABI: AMDGPU_HSA (0x40)
-; AMDHSA: ABIVersion: 2
+; AMDHSA: ABIVersion: 3
 
 ; AMDPAL: OS/ABI: AMDGPU_PAL (0x41)
 ; MESA3D: OS/ABI: AMDGPU_MESA3D (0x42)
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index f733c514ffbee47..923b8e2e62ab9d3 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -1418,12 +1418,12 @@ The AMDGPU backend uses the following ELF header:
 
   * ``ELFABIVERSION_AMDGPU_HSA_V4`` is used to specify the version of AMD HSA
     runtime ABI for code object V4. Specify using the Clang option
-    ``-mcode-object-version=4``. This is the default code object
-    version if not specified.
+    ``-mcode-object-version=4``.
 
   * ``ELFABIVERSION_AMDGPU_HSA_V5`` is used to specify the version of AMD HSA
     runtime ABI for code object V5. Specify using the Clang option
-    ``-mcode-object-version=5``.
+    ``-mcode-object-version=5``. This is the default code object
+    version if not specified.
 
   * ``ELFABIVERSION_AMDGPU_PAL`` is used to specify the version of AMD PAL
     runtime ABI.
@@ -3852,6 +3852,10 @@ same *vendor-name*.
 Code Object V4 Metadata
 +++++++++++++++++++++++
 
+. warning::
+  Code object V4 is not the default code object version emitted by this version
+  of LLVM.
+
 Code object V4 metadata is the same as
 :ref:`amdgpu-amdhsa-code-object-metadata-v3` with the changes and additions
 defined in table :ref:`amdgpu-amdhsa-code-object-metadata-map-table-v4`.
@@ -3882,11 +3886,6 @@ defined in table :ref:`amdgpu-amdhsa-code-object-metadata-map-table-v4`.
 Code Object V5 Metadata
 +++++++++++++++++++++++
 
-.. warning::
-  Code object V5 is not the default code object version emitted by this version
-  of LLVM.
-
-
 Code object V5 metadata is the same as
 :ref:`amdgpu-amdhsa-code-object-metadata-v4` with the changes defined in table
 :ref:`amdgpu-amdhsa-code-object-metadata-map-table-v5`, table
diff --git a/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp b/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
index 63da86391e5c650..d0abd08cc78b8ec 100644
--- a/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
@@ -34,7 +34,7 @@
 static llvm::cl::opt
     AmdhsaCodeObjectVersion("amdhsa-code-object-version", llvm::cl::Hidde...

llvmbot · 2023-09-12T08:21:56Z

@llvm/pr-subscribers-lld-elf
Changes

Also update LIT tests and docs.
For more details, see
https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata

Differential Revision: https://reviews.llvm.org/D129818

Patch is 1.92 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/65410.diff

109 Files Affected:

(modified) clang/include/clang/Driver/Options.td (+2-2)

(modified) clang/lib/Driver/ToolChains/CommonArgs.cpp (+1-1)

(modified) clang/test/CodeGenCUDA/amdgpu-code-object-version.cu (+1-1)

(modified) clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu (+2-2)

(modified) clang/test/CodeGenHIP/default-attributes.hip (+2-2)

(modified) clang/test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl (+2-2)

(modified) clang/test/CodeGenOpenCL/builtins-amdgcn.cl (+5-5)

(modified) clang/test/Driver/hip-device-libs.hip (+8-8)

(modified) lld/test/ELF/emulation-amdgpu.s (+1-1)

(modified) lld/test/ELF/lto/amdgcn-oses.ll (+1-1)

(modified) llvm/docs/AMDGPUUsage.rst (+7-8)

(modified) llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp (+2-2)

(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/crash-stack-address-O0.ll (+1-1)

(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/dropped_debug_info_assert.ll (+28-29)

(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-assert-align.ll (+8-8)

(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-atomicrmw.ll (+2-2)

(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-call-abi-attribute-hints.ll (+53-55)

(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-call-implicit-args.ll (+468-484)

(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-call-return-values.ll (+1578-1623)

(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-call-sret.ll (+31-32)

(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-call.ll (+2013-2077)

(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-constant-fold-vector-op.ll (+2-2)

(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-indirect-call.ll (+31-32)

(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-inline-asm.ll (+51-51)

(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-sibling-call.ll (+18-18)

(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-tail-call.ll (+2-2)

(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-addrspacecast.mir (+28-39)

(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.is.private.ll (+5-6)

(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.is.shared.ll (+5-6)

(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/non-entry-alloca.ll (+7-4)

(modified) llvm/test/CodeGen/AMDGPU/abi-attribute-hints-undefined-behavior.ll (+75-35)

(modified) llvm/test/CodeGen/AMDGPU/addrspacecast-constantexpr.ll (+2-2)

(modified) llvm/test/CodeGen/AMDGPU/addrspacecast.gfx6.ll (+46-20)

(modified) llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-pow-codegen.ll (+30-30)

(modified) llvm/test/CodeGen/AMDGPU/annotate-kernel-features-hsa-call.ll (+12-12)

(modified) llvm/test/CodeGen/AMDGPU/annotate-kernel-features-hsa.ll (+8-7)

(modified) llvm/test/CodeGen/AMDGPU/attributor-noopt.ll (+1-1)

(modified) llvm/test/CodeGen/AMDGPU/blender-no-live-segment-at-def-implicit-def.ll (+22-24)

(modified) llvm/test/CodeGen/AMDGPU/branch-folding-implicit-def-subreg.ll (+384-366)

(modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage-agpr.ll (+3-3)

(modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage0.ll (+1-1)

(modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage1.ll (+1-1)

(modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage2.ll (+1-1)

(modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage3.ll (+1-1)

(modified) llvm/test/CodeGen/AMDGPU/call-argument-types.ll (+1219-1461)

(modified) llvm/test/CodeGen/AMDGPU/call-waitcnt.ll (+42-47)

(modified) llvm/test/CodeGen/AMDGPU/callee-special-input-sgprs-fixed-abi.ll (+8-12)

(modified) llvm/test/CodeGen/AMDGPU/cc-update.ll (+144-160)

(modified) llvm/test/CodeGen/AMDGPU/cf-loop-on-constant.ll (+45-45)

(modified) llvm/test/CodeGen/AMDGPU/codegen-internal-only-func.ll (+1-1)

(modified) llvm/test/CodeGen/AMDGPU/collapse-endcf.ll (+10-10)

(modified) llvm/test/CodeGen/AMDGPU/cross-block-use-is-not-abi-copy.ll (+28-32)

(modified) llvm/test/CodeGen/AMDGPU/cvt_f32_ubyte.ll (+20-20)

(modified) llvm/test/CodeGen/AMDGPU/dagcombine-lshr-and-cmp.ll (+2-4)

(modified) llvm/test/CodeGen/AMDGPU/ds_read2.ll (+28-32)

(modified) llvm/test/CodeGen/AMDGPU/dwarf-multi-register-use-crash.ll (+6-6)

(modified) llvm/test/CodeGen/AMDGPU/elf-header-osabi.ll (+1-1)

(modified) llvm/test/CodeGen/AMDGPU/flat-scratch-init.ll (+17-19)

(modified) llvm/test/CodeGen/AMDGPU/fneg-fabs.ll (+1-1)

(modified) llvm/test/CodeGen/AMDGPU/gfx11-user-sgpr-init16-bug.ll (+19-12)

(modified) llvm/test/CodeGen/AMDGPU/global_atomics_scan_fadd.ll (+550-605)

(modified) llvm/test/CodeGen/AMDGPU/global_atomics_scan_fmax.ll (+330-363)

(modified) llvm/test/CodeGen/AMDGPU/global_atomics_scan_fmin.ll (+330-363)

(modified) llvm/test/CodeGen/AMDGPU/global_atomics_scan_fsub.ll (+550-605)

(modified) llvm/test/CodeGen/AMDGPU/indirect-addressing-term.ll (+19-19)

(modified) llvm/test/CodeGen/AMDGPU/insert-delay-alu-bug.ll (+57-59)

(modified) llvm/test/CodeGen/AMDGPU/kernel-vgpr-spill-mubuf-with-voffset.ll (+25-32)

(modified) llvm/test/CodeGen/AMDGPU/lds-frame-extern.ll (+124-112)

(modified) llvm/test/CodeGen/AMDGPU/lds-global-non-entry-func.ll (+48-22)

(modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.is.private.ll (+8-7)

(modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.is.shared.ll (+8-7)

(modified) llvm/test/CodeGen/AMDGPU/llvm.dbg.value.ll (+1-1)

(modified) llvm/test/CodeGen/AMDGPU/lower-kernargs.ll (+118-118)

(modified) llvm/test/CodeGen/AMDGPU/lower-module-lds-via-hybrid.ll (+2-4)

(modified) llvm/test/CodeGen/AMDGPU/lower-module-lds-via-table.ll (+1-9)

(modified) llvm/test/CodeGen/AMDGPU/module-lds-false-sharing.ll (+48-44)

(modified) llvm/test/CodeGen/AMDGPU/need-fp-from-vgpr-spills.ll (+33-61)

(modified) llvm/test/CodeGen/AMDGPU/partial-sgpr-to-vgpr-spills.ll (+8-8)

(modified) llvm/test/CodeGen/AMDGPU/preserve-wwm-copy-dst-reg.ll (+72-72)

(modified) llvm/test/CodeGen/AMDGPU/promote-alloca-calling-conv.ll (+1-1)

(modified) llvm/test/CodeGen/AMDGPU/reqd-work-group-size.ll (+24-24)

(modified) llvm/test/CodeGen/AMDGPU/sgpr-spill-no-vgprs.ll (+2-2)

(modified) llvm/test/CodeGen/AMDGPU/sgpr-spill-update-only-slot-indexes.ll (+10-23)

(modified) llvm/test/CodeGen/AMDGPU/simple-indirect-call.ll (+3-3)

(modified) llvm/test/CodeGen/AMDGPU/sopk-no-literal.ll (+1-1)

(modified) llvm/test/CodeGen/AMDGPU/spill-m0.ll (+1-1)

(modified) llvm/test/CodeGen/AMDGPU/stacksave_stackrestore.ll (+30-30)

(modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-all-any.ll (+3-3)

(modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-all-not-supported.ll (+3-3)

(modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-all-off.ll (+3-3)

(modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-all-on.ll (+3-3)

(modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-any-off-1.ll (+3-3)

(modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-any-off-2.ll (+3-3)

(modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-any-on-1.ll (+3-3)

(modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-any-on-2.ll (+3-3)

(modified) llvm/test/CodeGen/AMDGPU/tid-one-func-xnack-any.ll (+3-3)

(modified) llvm/test/CodeGen/AMDGPU/tid-one-func-xnack-not-supported.ll (+3-3)

(modified) llvm/test/CodeGen/AMDGPU/tid-one-func-xnack-off.ll (+3-3)

(modified) llvm/test/CodeGen/AMDGPU/tid-one-func-xnack-on.ll (+1-1)

(modified) llvm/test/CodeGen/AMDGPU/tuple-allocation-failure.ll (+157-167)

(modified) llvm/test/CodeGen/AMDGPU/unstructured-cfg-def-use-issue.ll (+15-15)

(modified) llvm/test/CodeGen/AMDGPU/vgpr-spill-placement-issue61083.ll (+1-1)

(modified) llvm/test/CodeGen/AMDGPU/vgpr_constant_to_sgpr.ll (+25-36)

(modified) llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-gfx10.s (+4)

(modified) llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-gfx90a.s (+3)

(modified) llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-sgpr.s (+3)

(modified) llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-vgpr.s (+3)

(modified) llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-zeroed-gfx10.s (+1)

(modified) openmp/libomptarget/test/offloading/thread_limit.c (+1)
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index a5f5ca29053b43b..0484575ca482717 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -4616,12 +4616,12 @@ defm amdgpu_ieee : BoolOption<"m", "amdgpu-ieee",
  NegFlag>, Group;

def mcode_object_version_EQ : Joined<["-"], "mcode-object-version=">, Group,
-  HelpText<"Specify code object ABI version. Defaults to 4. (AMDGPU only)">,
+  HelpText<"Specify code object ABI version. Defaults to 5. (AMDGPU only)">,
  Visibility<[ClangOption, CC1Option]>,
  Values<"none,2,3,4,5">,
  NormalizedValuesScope<"TargetOptions">,
  NormalizedValues<["COV_None", "COV_2", "COV_3", "COV_4", "COV_5"]>,
-  MarshallingInfoEnum, "COV_4">;
+  MarshallingInfoEnum, "COV_5">;

defm cumode : SimpleMFlag<"cumode",
  "Specify CU wavefront", "Specify WGP wavefront",
diff --git a/clang/lib/Driver/ToolChains/CommonArgs.cpp b/clang/lib/Driver/ToolChains/CommonArgs.cpp
index 1db05b47e11ba6a..471ff2541d355b6 100644
--- a/clang/lib/Driver/ToolChains/CommonArgs.cpp
+++ b/clang/lib/Driver/ToolChains/CommonArgs.cpp
@@ -2341,7 +2341,7 @@ void tools::checkAMDGPUCodeObjectVersion(const Driver &D,

unsigned tools::getAMDGPUCodeObjectVersion(const Driver &D,
                                           const llvm::opt::ArgList &Args) {
-  unsigned CodeObjVer = 4; // default
+  unsigned CodeObjVer = 5; // default
  if (auto *CodeObjArg = getAMDGPUCodeObjectArgument(D, Args))
    StringRef(CodeObjArg->getValue()).getAsInteger(0, CodeObjVer);
  return CodeObjVer;
diff --git a/clang/test/CodeGenCUDA/amdgpu-code-object-version.cu b/clang/test/CodeGenCUDA/amdgpu-code-object-version.cu
index 16505b34c4a6e09..62ccc2bd4d05db3 100644
--- a/clang/test/CodeGenCUDA/amdgpu-code-object-version.cu
+++ b/clang/test/CodeGenCUDA/amdgpu-code-object-version.cu
@@ -1,7 +1,7 @@
// Create module flag for code object version.

// RUN: �lang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -emit-llvm \
-// RUN:   -o -  | FileCheck  -check-prefix=V4
+// RUN:   -o -  | FileCheck  -check-prefix=V5

// RUN: �lang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -emit-llvm \
// RUN:   -mcode-object-version=2 -o -  | FileCheck -check-prefix=V2 
diff --git a/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu b/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu
index c661b06d57b78d7..b885b9991a9bd31 100644
--- a/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu
+++ b/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu
@@ -1,10 +1,10 @@
// RUN: �lang_cc1 -triple amdgcn-amd-amdhsa \
-// RUN:     -fcuda-is-device -emit-llvm -o - -x hip  \
+// RUN:     -fcuda-is-device -mcode-object-version=4 -emit-llvm -o - -x hip  \
// RUN:     | FileCheck -check-prefix=PRECOV5 


// RUN: �lang_cc1 -triple amdgcn-amd-amdhsa \
-// RUN:     -fcuda-is-device -mcode-object-version=5 -emit-llvm -o - -x hip  \
+// RUN:     -fcuda-is-device -emit-llvm -o - -x hip  \
// RUN:     | FileCheck -check-prefix=COV5 

// RUN: �lang_cc1 -triple amdgcn-amd-amdhsa \
diff --git a/clang/test/CodeGenHIP/default-attributes.hip b/clang/test/CodeGenHIP/default-attributes.hip
index 80aa1ee0700628f..9c9ea521271b99b 100644
--- a/clang/test/CodeGenHIP/default-attributes.hip
+++ b/clang/test/CodeGenHIP/default-attributes.hip
@@ -46,11 +46,11 @@ __global__ void kernel() {
// OPT: attributes #0 = { mustprogress nofree norecurse nosync nounwind willreturn memory(none) "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
// OPT: attributes #1 = { mustprogress nofree norecurse nosync nounwind willreturn memory(none) "amdgpu-flat-work-group-size"="1,1024" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "uniform-work-group-size"="true" }
//.
-// OPTNONE: !0 = !{i32 1, !"amdgpu_code_object_version", i32 400}
+// OPTNONE: !0 = !{i32 1, !"amdgpu_code_object_version", i32 500}
// OPTNONE: !1 = !{i32 1, !"amdgpu_printf_kind", !"hostcall"}
// OPTNONE: !2 = !{i32 1, !"wchar_size", i32 4}
//.
-// OPT: !0 = !{i32 1, !"amdgpu_code_object_version", i32 400}
+// OPT: !0 = !{i32 1, !"amdgpu_code_object_version", i32 500}
// OPT: !1 = !{i32 1, !"amdgpu_printf_kind", !"hostcall"}
// OPT: !2 = !{i32 1, !"wchar_size", i32 4}
//.
diff --git a/clang/test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl b/clang/test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl
index e574b1f64c499bd..2cf1286e2b54e8e 100644
--- a/clang/test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl
+++ b/clang/test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl
@@ -703,7 +703,7 @@ kernel void test_target_features_kernel(global int *i) {
// GFX900: attributes #8 = { nounwind }
// GFX900: attributes #9 = { convergent nounwind }
//.
-// NOCPU: !0 = !{i32 1, !"amdgpu_code_object_version", i32 400}
+// NOCPU: !0 = !{i32 1, !"amdgpu_code_object_version", i32 500}
// NOCPU: !1 = !{i32 1, !"wchar_size", i32 4}
// NOCPU: !2 = !{i32 2, i32 0}
// NOCPU: !3 = !{i32 1, i32 0, i32 1, i32 0}
@@ -721,7 +721,7 @@ kernel void test_target_features_kernel(global int *i) {
// NOCPU: !15 = !{i32 1}
// NOCPU: !16 = !{!"int*"}
//.
-// GFX900: !0 = !{i32 1, !"amdgpu_code_object_version", i32 400}
+// GFX900: !0 = !{i32 1, !"amdgpu_code_object_version", i32 500}
// GFX900: !1 = !{i32 1, !"wchar_size", i32 4}
// GFX900: !2 = !{i32 2, i32 0}
// GFX900: !3 = !{!4, !4, i64 0}
diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn.cl b/clang/test/CodeGenOpenCL/builtins-amdgcn.cl
index 8938642e3b19f8c..74b5ef03b52f21e 100644
--- a/clang/test/CodeGenOpenCL/builtins-amdgcn.cl
+++ b/clang/test/CodeGenOpenCL/builtins-amdgcn.cl
@@ -599,13 +599,13 @@ void test_get_local_id(int d, global int *out)
}

// CHECK-LABEL: @test_get_workgroup_size(
-// CHECK: call align 4 dereferenceable(64) ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
-// CHECK: getelementptr inbounds i8, ptr addrspace(4) 

Error: Command failed due to missing milestone.

llvmbot · 2023-09-12T08:22:15Z

@llvm/pr-subscribers-clang
Changes

Also update LIT tests and docs.
For more details, see
https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata

Differential Revision: https://reviews.llvm.org/D129818

Patch is 1.92 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/65410.diff

109 Files Affected:

(modified) clang/include/clang/Driver/Options.td (+2-2)

(modified) clang/lib/Driver/ToolChains/CommonArgs.cpp (+1-1)

(modified) clang/test/CodeGenCUDA/amdgpu-code-object-version.cu (+1-1)

(modified) clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu (+2-2)

(modified) clang/test/CodeGenHIP/default-attributes.hip (+2-2)

(modified) clang/test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl (+2-2)

(modified) clang/test/CodeGenOpenCL/builtins-amdgcn.cl (+5-5)

(modified) clang/test/Driver/hip-device-libs.hip (+8-8)

(modified) lld/test/ELF/emulation-amdgpu.s (+1-1)

(modified) lld/test/ELF/lto/amdgcn-oses.ll (+1-1)

(modified) llvm/docs/AMDGPUUsage.rst (+7-8)

(modified) llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp (+2-2)

(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/crash-stack-address-O0.ll (+1-1)

(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/dropped_debug_info_assert.ll (+28-29)

(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-assert-align.ll (+8-8)

(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-atomicrmw.ll (+2-2)

(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-call-abi-attribute-hints.ll (+53-55)

(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-call-implicit-args.ll (+468-484)

(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-call-return-values.ll (+1578-1623)

(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-call-sret.ll (+31-32)

(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-call.ll (+2013-2077)

(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-constant-fold-vector-op.ll (+2-2)

(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-indirect-call.ll (+31-32)

(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-inline-asm.ll (+51-51)

(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-sibling-call.ll (+18-18)

(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-tail-call.ll (+2-2)

(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-addrspacecast.mir (+28-39)

(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.is.private.ll (+5-6)

(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.is.shared.ll (+5-6)

(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/non-entry-alloca.ll (+7-4)

(modified) llvm/test/CodeGen/AMDGPU/abi-attribute-hints-undefined-behavior.ll (+75-35)

(modified) llvm/test/CodeGen/AMDGPU/addrspacecast-constantexpr.ll (+2-2)

(modified) llvm/test/CodeGen/AMDGPU/addrspacecast.gfx6.ll (+46-20)

(modified) llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-pow-codegen.ll (+30-30)

(modified) llvm/test/CodeGen/AMDGPU/annotate-kernel-features-hsa-call.ll (+12-12)

(modified) llvm/test/CodeGen/AMDGPU/annotate-kernel-features-hsa.ll (+8-7)

(modified) llvm/test/CodeGen/AMDGPU/attributor-noopt.ll (+1-1)

(modified) llvm/test/CodeGen/AMDGPU/blender-no-live-segment-at-def-implicit-def.ll (+22-24)

(modified) llvm/test/CodeGen/AMDGPU/branch-folding-implicit-def-subreg.ll (+384-366)

(modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage-agpr.ll (+3-3)

(modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage0.ll (+1-1)

(modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage1.ll (+1-1)

(modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage2.ll (+1-1)

(modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage3.ll (+1-1)

(modified) llvm/test/CodeGen/AMDGPU/call-argument-types.ll (+1219-1461)

(modified) llvm/test/CodeGen/AMDGPU/call-waitcnt.ll (+42-47)

(modified) llvm/test/CodeGen/AMDGPU/callee-special-input-sgprs-fixed-abi.ll (+8-12)

(modified) llvm/test/CodeGen/AMDGPU/cc-update.ll (+144-160)

(modified) llvm/test/CodeGen/AMDGPU/cf-loop-on-constant.ll (+45-45)

(modified) llvm/test/CodeGen/AMDGPU/codegen-internal-only-func.ll (+1-1)

(modified) llvm/test/CodeGen/AMDGPU/collapse-endcf.ll (+10-10)

(modified) llvm/test/CodeGen/AMDGPU/cross-block-use-is-not-abi-copy.ll (+28-32)

(modified) llvm/test/CodeGen/AMDGPU/cvt_f32_ubyte.ll (+20-20)

(modified) llvm/test/CodeGen/AMDGPU/dagcombine-lshr-and-cmp.ll (+2-4)

(modified) llvm/test/CodeGen/AMDGPU/ds_read2.ll (+28-32)

(modified) llvm/test/CodeGen/AMDGPU/dwarf-multi-register-use-crash.ll (+6-6)

(modified) llvm/test/CodeGen/AMDGPU/elf-header-osabi.ll (+1-1)

(modified) llvm/test/CodeGen/AMDGPU/flat-scratch-init.ll (+17-19)

(modified) llvm/test/CodeGen/AMDGPU/fneg-fabs.ll (+1-1)

(modified) llvm/test/CodeGen/AMDGPU/gfx11-user-sgpr-init16-bug.ll (+19-12)

(modified) llvm/test/CodeGen/AMDGPU/global_atomics_scan_fadd.ll (+550-605)

(modified) llvm/test/CodeGen/AMDGPU/global_atomics_scan_fmax.ll (+330-363)

(modified) llvm/test/CodeGen/AMDGPU/global_atomics_scan_fmin.ll (+330-363)

(modified) llvm/test/CodeGen/AMDGPU/global_atomics_scan_fsub.ll (+550-605)

(modified) llvm/test/CodeGen/AMDGPU/indirect-addressing-term.ll (+19-19)

(modified) llvm/test/CodeGen/AMDGPU/insert-delay-alu-bug.ll (+57-59)

(modified) llvm/test/CodeGen/AMDGPU/kernel-vgpr-spill-mubuf-with-voffset.ll (+25-32)

(modified) llvm/test/CodeGen/AMDGPU/lds-frame-extern.ll (+124-112)

(modified) llvm/test/CodeGen/AMDGPU/lds-global-non-entry-func.ll (+48-22)

(modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.is.private.ll (+8-7)

(modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.is.shared.ll (+8-7)

(modified) llvm/test/CodeGen/AMDGPU/llvm.dbg.value.ll (+1-1)

(modified) llvm/test/CodeGen/AMDGPU/lower-kernargs.ll (+118-118)

(modified) llvm/test/CodeGen/AMDGPU/lower-module-lds-via-hybrid.ll (+2-4)

(modified) llvm/test/CodeGen/AMDGPU/lower-module-lds-via-table.ll (+1-9)

(modified) llvm/test/CodeGen/AMDGPU/module-lds-false-sharing.ll (+48-44)

(modified) llvm/test/CodeGen/AMDGPU/need-fp-from-vgpr-spills.ll (+33-61)

(modified) llvm/test/CodeGen/AMDGPU/partial-sgpr-to-vgpr-spills.ll (+8-8)

(modified) llvm/test/CodeGen/AMDGPU/preserve-wwm-copy-dst-reg.ll (+72-72)

(modified) llvm/test/CodeGen/AMDGPU/promote-alloca-calling-conv.ll (+1-1)

(modified) llvm/test/CodeGen/AMDGPU/reqd-work-group-size.ll (+24-24)

(modified) llvm/test/CodeGen/AMDGPU/sgpr-spill-no-vgprs.ll (+2-2)

(modified) llvm/test/CodeGen/AMDGPU/sgpr-spill-update-only-slot-indexes.ll (+10-23)

(modified) llvm/test/CodeGen/AMDGPU/simple-indirect-call.ll (+3-3)

(modified) llvm/test/CodeGen/AMDGPU/sopk-no-literal.ll (+1-1)

(modified) llvm/test/CodeGen/AMDGPU/spill-m0.ll (+1-1)

(modified) llvm/test/CodeGen/AMDGPU/stacksave_stackrestore.ll (+30-30)

(modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-all-any.ll (+3-3)

(modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-all-not-supported.ll (+3-3)

(modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-all-off.ll (+3-3)

(modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-all-on.ll (+3-3)

(modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-any-off-1.ll (+3-3)

(modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-any-off-2.ll (+3-3)

(modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-any-on-1.ll (+3-3)

(modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-any-on-2.ll (+3-3)

(modified) llvm/test/CodeGen/AMDGPU/tid-one-func-xnack-any.ll (+3-3)

(modified) llvm/test/CodeGen/AMDGPU/tid-one-func-xnack-not-supported.ll (+3-3)

(modified) llvm/test/CodeGen/AMDGPU/tid-one-func-xnack-off.ll (+3-3)

(modified) llvm/test/CodeGen/AMDGPU/tid-one-func-xnack-on.ll (+1-1)

(modified) llvm/test/CodeGen/AMDGPU/tuple-allocation-failure.ll (+157-167)

(modified) llvm/test/CodeGen/AMDGPU/unstructured-cfg-def-use-issue.ll (+15-15)

(modified) llvm/test/CodeGen/AMDGPU/vgpr-spill-placement-issue61083.ll (+1-1)

(modified) llvm/test/CodeGen/AMDGPU/vgpr_constant_to_sgpr.ll (+25-36)

(modified) llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-gfx10.s (+4)

(modified) llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-gfx90a.s (+3)

(modified) llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-sgpr.s (+3)

(modified) llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-vgpr.s (+3)

(modified) llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-zeroed-gfx10.s (+1)

(modified) openmp/libomptarget/test/offloading/thread_limit.c (+1)
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index a5f5ca29053b43b..0484575ca482717 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -4616,12 +4616,12 @@ defm amdgpu_ieee : BoolOption<"m", "amdgpu-ieee",
  NegFlag>, Group;

def mcode_object_version_EQ : Joined<["-"], "mcode-object-version=">, Group,
-  HelpText<"Specify code object ABI version. Defaults to 4. (AMDGPU only)">,
+  HelpText<"Specify code object ABI version. Defaults to 5. (AMDGPU only)">,
  Visibility<[ClangOption, CC1Option]>,
  Values<"none,2,3,4,5">,
  NormalizedValuesScope<"TargetOptions">,
  NormalizedValues<["COV_None", "COV_2", "COV_3", "COV_4", "COV_5"]>,
-  MarshallingInfoEnum, "COV_4">;
+  MarshallingInfoEnum, "COV_5">;

defm cumode : SimpleMFlag<"cumode",
  "Specify CU wavefront", "Specify WGP wavefront",
diff --git a/clang/lib/Driver/ToolChains/CommonArgs.cpp b/clang/lib/Driver/ToolChains/CommonArgs.cpp
index 1db05b47e11ba6a..471ff2541d355b6 100644
--- a/clang/lib/Driver/ToolChains/CommonArgs.cpp
+++ b/clang/lib/Driver/ToolChains/CommonArgs.cpp
@@ -2341,7 +2341,7 @@ void tools::checkAMDGPUCodeObjectVersion(const Driver &D,

unsigned tools::getAMDGPUCodeObjectVersion(const Driver &D,
                                           const llvm::opt::ArgList &Args) {
-  unsigned CodeObjVer = 4; // default
+  unsigned CodeObjVer = 5; // default
  if (auto *CodeObjArg = getAMDGPUCodeObjectArgument(D, Args))
    StringRef(CodeObjArg->getValue()).getAsInteger(0, CodeObjVer);
  return CodeObjVer;
diff --git a/clang/test/CodeGenCUDA/amdgpu-code-object-version.cu b/clang/test/CodeGenCUDA/amdgpu-code-object-version.cu
index 16505b34c4a6e09..62ccc2bd4d05db3 100644
--- a/clang/test/CodeGenCUDA/amdgpu-code-object-version.cu
+++ b/clang/test/CodeGenCUDA/amdgpu-code-object-version.cu
@@ -1,7 +1,7 @@
// Create module flag for code object version.

// RUN: �lang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -emit-llvm \
-// RUN:   -o -  | FileCheck  -check-prefix=V4
+// RUN:   -o -  | FileCheck  -check-prefix=V5

// RUN: �lang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -emit-llvm \
// RUN:   -mcode-object-version=2 -o -  | FileCheck -check-prefix=V2 
diff --git a/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu b/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu
index c661b06d57b78d7..b885b9991a9bd31 100644
--- a/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu
+++ b/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu
@@ -1,10 +1,10 @@
// RUN: �lang_cc1 -triple amdgcn-amd-amdhsa \
-// RUN:     -fcuda-is-device -emit-llvm -o - -x hip  \
+// RUN:     -fcuda-is-device -mcode-object-version=4 -emit-llvm -o - -x hip  \
// RUN:     | FileCheck -check-prefix=PRECOV5 


// RUN: �lang_cc1 -triple amdgcn-amd-amdhsa \
-// RUN:     -fcuda-is-device -mcode-object-version=5 -emit-llvm -o - -x hip  \
+// RUN:     -fcuda-is-device -emit-llvm -o - -x hip  \
// RUN:     | FileCheck -check-prefix=COV5 

// RUN: �lang_cc1 -triple amdgcn-amd-amdhsa \
diff --git a/clang/test/CodeGenHIP/default-attributes.hip b/clang/test/CodeGenHIP/default-attributes.hip
index 80aa1ee0700628f..9c9ea521271b99b 100644
--- a/clang/test/CodeGenHIP/default-attributes.hip
+++ b/clang/test/CodeGenHIP/default-attributes.hip
@@ -46,11 +46,11 @@ __global__ void kernel() {
// OPT: attributes #0 = { mustprogress nofree norecurse nosync nounwind willreturn memory(none) "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
// OPT: attributes #1 = { mustprogress nofree norecurse nosync nounwind willreturn memory(none) "amdgpu-flat-work-group-size"="1,1024" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "uniform-work-group-size"="true" }
//.
-// OPTNONE: !0 = !{i32 1, !"amdgpu_code_object_version", i32 400}
+// OPTNONE: !0 = !{i32 1, !"amdgpu_code_object_version", i32 500}
// OPTNONE: !1 = !{i32 1, !"amdgpu_printf_kind", !"hostcall"}
// OPTNONE: !2 = !{i32 1, !"wchar_size", i32 4}
//.
-// OPT: !0 = !{i32 1, !"amdgpu_code_object_version", i32 400}
+// OPT: !0 = !{i32 1, !"amdgpu_code_object_version", i32 500}
// OPT: !1 = !{i32 1, !"amdgpu_printf_kind", !"hostcall"}
// OPT: !2 = !{i32 1, !"wchar_size", i32 4}
//.
diff --git a/clang/test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl b/clang/test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl
index e574b1f64c499bd..2cf1286e2b54e8e 100644
--- a/clang/test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl
+++ b/clang/test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl
@@ -703,7 +703,7 @@ kernel void test_target_features_kernel(global int *i) {
// GFX900: attributes #8 = { nounwind }
// GFX900: attributes #9 = { convergent nounwind }
//.
-// NOCPU: !0 = !{i32 1, !"amdgpu_code_object_version", i32 400}
+// NOCPU: !0 = !{i32 1, !"amdgpu_code_object_version", i32 500}
// NOCPU: !1 = !{i32 1, !"wchar_size", i32 4}
// NOCPU: !2 = !{i32 2, i32 0}
// NOCPU: !3 = !{i32 1, i32 0, i32 1, i32 0}
@@ -721,7 +721,7 @@ kernel void test_target_features_kernel(global int *i) {
// NOCPU: !15 = !{i32 1}
// NOCPU: !16 = !{!"int*"}
//.
-// GFX900: !0 = !{i32 1, !"amdgpu_code_object_version", i32 400}
+// GFX900: !0 = !{i32 1, !"amdgpu_code_object_version", i32 500}
// GFX900: !1 = !{i32 1, !"wchar_size", i32 4}
// GFX900: !2 = !{i32 2, i32 0}
// GFX900: !3 = !{!4, !4, i64 0}
diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn.cl b/clang/test/CodeGenOpenCL/builtins-amdgcn.cl
index 8938642e3b19f8c..74b5ef03b52f21e 100644
--- a/clang/test/CodeGenOpenCL/builtins-amdgcn.cl
+++ b/clang/test/CodeGenOpenCL/builtins-amdgcn.cl
@@ -599,13 +599,13 @@ void test_get_local_id(int d, global int *out)
}

// CHECK-LABEL: @test_get_workgroup_size(
-// CHECK: call align 4 dereferenceable(64) ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
-// CHECK: getelementptr inbounds i8, ptr addrspace(4) 

Error: Command failed due to missing milestone.

jplehr · 2023-09-12T08:37:31Z

Hey @saiislam this broke the AMDGPU OpenMP buildbot https://lab.llvm.org/buildbot/#/builders/193/builds/38293
If you need assistance, let me know.

…m#65410)" This reverts commit 0a8d17e.

)" (#66060) This reverts commit 0a8d17e.

Summary: There is currently effort to change over the default AMDGPU code object version llvm#65410. However, this unfortunately causes problems in the LLVM LibC test suite that leads to a hang while executing. This is most likely a bug to do with indirect call optimization, as it can be avoided without optimizations or with manually preventing inlining in the AMDGPU startup code. This patch sets the AMDGPU code object version to be four explicitly on the LibC test suite. This should unblock the efforts to move the default to 5 without breaking the test suite. This isn't a great solution, but there is currently some time pressure to get COV5 landed and this seems to be the easiest solution.

Also update LIT tests and docs. For more details, see https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata Reviewed By: arsenm, jhuber6 Github PR: llvm#65410 Differential Revision: https://reviews.llvm.org/D129818

…m#65410)" (llvm#66060) This reverts commit 0a8d17e.

saiislam added the backend:AMDGPU label Sep 5, 2023

saiislam requested review from arsenm and jhuber6 September 5, 2023 20:02

saiislam requested review from a team as code owners September 5, 2023 20:02

github-actions bot added lld clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' llvm:globalisel labels Sep 5, 2023

saiislam requested a review from a team as a code owner September 7, 2023 20:24

saiislam force-pushed the cov5_default branch from e91bd0c to 201750e Compare September 7, 2023 20:42

arsenm approved these changes Sep 10, 2023

View reviewed changes

jhuber6 mentioned this pull request Sep 11, 2023

[libc] Manually set the AMDGPU code object version #65986

Merged

jhuber6 approved these changes Sep 11, 2023

View reviewed changes

[AMDGPU] Make default AMDHSA Code Object Version to be 5

857c6cd

Also update LIT tests and docs. For more details, see https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata Reviewed By: arsenm, jhuber6 Github PR: llvm#65410 Differential Revision: https://reviews.llvm.org/D129818

saiislam force-pushed the cov5_default branch from 7f4a051 to 857c6cd Compare September 12, 2023 08:19

llvmbot added clang Clang issues not falling into any other category lld:ELF labels Sep 12, 2023

saiislam merged commit 0a8d17e into llvm:main Sep 12, 2023

saiislam added a commit to saiislam/llvm-project that referenced this pull request Sep 12, 2023

Revert "[AMDGPU] Make default AMDHSA Code Object Version to be 5 (llv…

bdaf40c

…m#65410)" This reverts commit 0a8d17e.

saiislam added a commit that referenced this pull request Sep 12, 2023

Revert "[AMDGPU] Make default AMDHSA Code Object Version to be 5 (#65410

466a814

)" (#66060) This reverts commit 0a8d17e.

vzakhari mentioned this pull request Sep 12, 2023

internap proc trampolines #66156

Closed

ZijunZhaoCCK pushed a commit to ZijunZhaoCCK/llvm-project that referenced this pull request Sep 19, 2023

Revert "[AMDGPU] Make default AMDHSA Code Object Version to be 5 (llv…

8953ac5

…m#65410)" (llvm#66060) This reverts commit 0a8d17e.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[AMDGPU] Make default AMDHSA Code Object Version to be 5 #65410

[AMDGPU] Make default AMDHSA Code Object Version to be 5 #65410

Uh oh!

saiislam commented Sep 5, 2023

Uh oh!

jhuber6 commented Sep 5, 2023

Uh oh!

jhuber6 left a comment

Uh oh!

llvmbot commented Sep 12, 2023

Differential Revision: https://reviews.llvm.org/D129818

Uh oh!

llvmbot commented Sep 12, 2023

Differential Revision: https://reviews.llvm.org/D129818

Uh oh!

llvmbot commented Sep 12, 2023

Differential Revision: https://reviews.llvm.org/D129818

Uh oh!

llvmbot commented Sep 12, 2023

Differential Revision: https://reviews.llvm.org/D129818

Uh oh!

jplehr commented Sep 12, 2023

Uh oh!

Uh oh!

[AMDGPU] Make default AMDHSA Code Object Version to be 5 #65410

[AMDGPU] Make default AMDHSA Code Object Version to be 5 #65410

Uh oh!

Conversation

saiislam commented Sep 5, 2023

Uh oh!

jhuber6 commented Sep 5, 2023

Uh oh!

jhuber6 left a comment

Choose a reason for hiding this comment

Uh oh!

llvmbot commented Sep 12, 2023

Differential Revision: https://reviews.llvm.org/D129818

Uh oh!

llvmbot commented Sep 12, 2023

Differential Revision: https://reviews.llvm.org/D129818

Uh oh!

llvmbot commented Sep 12, 2023

Differential Revision: https://reviews.llvm.org/D129818

Uh oh!

llvmbot commented Sep 12, 2023

Differential Revision: https://reviews.llvm.org/D129818

Uh oh!

jplehr commented Sep 12, 2023

Uh oh!

Uh oh!