Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AMDGPU] Make default AMDHSA Code Object Version to be 5 #65410

Merged
merged 1 commit into from Sep 12, 2023

Conversation

saiislam
Copy link
Contributor

@saiislam saiislam commented Sep 5, 2023

Also update LIT tests and docs.
For more details, see
https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata

Differential Revision: https://reviews.llvm.org/D129818

@saiislam saiislam requested review from a team as code owners September 5, 2023 20:02
@github-actions github-actions bot added lld clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' llvm:globalisel labels Sep 5, 2023
@jhuber6
Copy link
Contributor

jhuber6 commented Sep 5, 2023

The libc tests will need to be updated as well. I'll need to look into that.

jhuber6 added a commit to jhuber6/llvm-project that referenced this pull request Sep 11, 2023
Summary:
There is currently effort to change over the default AMDGPU code object
version llvm#65410. However, this
unfortunately causes problems in the LLVM LibC test suite that leads to
a hang while executing. This is most likely a bug to do with indirect
call optimization, as it can be avoided without optimizations or with
manually preventing inlining in the AMDGPU startup code.

This patch sets the AMDGPU code object version to be four explicitly on
the LibC test suite. This should unblock the efforts to move the default
to 5 without breaking the test suite. This isn't a great solution, but
there is currently some time pressure to get COV5 landed and this seems
to be the easiest solution.
jhuber6 added a commit that referenced this pull request Sep 11, 2023
Summary:
There is currently effort to change over the default AMDGPU code object
version #65410. However, this
unfortunately causes problems in the LLVM LibC test suite that leads to
a hang while executing. This is most likely a bug to do with indirect
call optimization, as it can be avoided without optimizations or with
manually preventing inlining in the AMDGPU startup code.

This patch sets the AMDGPU code object version to be four explicitly on
the LibC test suite. This should unblock the efforts to move the default
to 5 without breaking the test suite. This isn't a great solution, but
there is currently some time pressure to get COV5 landed and this seems
to be the easiest solution.
Copy link
Contributor

@jhuber6 jhuber6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I manually set the code object of the libc test suite to be 4 for the time being. That should unblock this.

Also update LIT tests and docs.
For more details, see
https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata

Reviewed By: arsenm, jhuber6

Github PR: llvm#65410

Differential Revision: https://reviews.llvm.org/D129818
@llvmbot llvmbot added clang Clang issues not falling into any other category lld:ELF labels Sep 12, 2023
@llvmbot
Copy link
Collaborator

llvmbot commented Sep 12, 2023

@llvm/pr-subscribers-lld-elf

Changes

Also update LIT tests and docs.
For more details, see
https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata

Differential Revision: https://reviews.llvm.org/D129818

Patch is 1.92 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/65410.diff

109 Files Affected:

  • (modified) clang/include/clang/Driver/Options.td (+2-2)
  • (modified) clang/lib/Driver/ToolChains/CommonArgs.cpp (+1-1)
  • (modified) clang/test/CodeGenCUDA/amdgpu-code-object-version.cu (+1-1)
  • (modified) clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu (+2-2)
  • (modified) clang/test/CodeGenHIP/default-attributes.hip (+2-2)
  • (modified) clang/test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl (+2-2)
  • (modified) clang/test/CodeGenOpenCL/builtins-amdgcn.cl (+5-5)
  • (modified) clang/test/Driver/hip-device-libs.hip (+8-8)
  • (modified) lld/test/ELF/emulation-amdgpu.s (+1-1)
  • (modified) lld/test/ELF/lto/amdgcn-oses.ll (+1-1)
  • (modified) llvm/docs/AMDGPUUsage.rst (+7-8)
  • (modified) llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/crash-stack-address-O0.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/dropped_debug_info_assert.ll (+28-29)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-assert-align.ll (+8-8)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-atomicrmw.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-call-abi-attribute-hints.ll (+53-55)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-call-implicit-args.ll (+468-484)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-call-return-values.ll (+1578-1623)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-call-sret.ll (+31-32)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-call.ll (+2013-2077)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-constant-fold-vector-op.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-indirect-call.ll (+31-32)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-inline-asm.ll (+51-51)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-sibling-call.ll (+18-18)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-tail-call.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-addrspacecast.mir (+28-39)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.is.private.ll (+5-6)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.is.shared.ll (+5-6)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/non-entry-alloca.ll (+7-4)
  • (modified) llvm/test/CodeGen/AMDGPU/abi-attribute-hints-undefined-behavior.ll (+75-35)
  • (modified) llvm/test/CodeGen/AMDGPU/addrspacecast-constantexpr.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/addrspacecast.gfx6.ll (+46-20)
  • (modified) llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-pow-codegen.ll (+30-30)
  • (modified) llvm/test/CodeGen/AMDGPU/annotate-kernel-features-hsa-call.ll (+12-12)
  • (modified) llvm/test/CodeGen/AMDGPU/annotate-kernel-features-hsa.ll (+8-7)
  • (modified) llvm/test/CodeGen/AMDGPU/attributor-noopt.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/blender-no-live-segment-at-def-implicit-def.ll (+22-24)
  • (modified) llvm/test/CodeGen/AMDGPU/branch-folding-implicit-def-subreg.ll (+384-366)
  • (modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage-agpr.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage0.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage1.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage2.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage3.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/call-argument-types.ll (+1219-1461)
  • (modified) llvm/test/CodeGen/AMDGPU/call-waitcnt.ll (+42-47)
  • (modified) llvm/test/CodeGen/AMDGPU/callee-special-input-sgprs-fixed-abi.ll (+8-12)
  • (modified) llvm/test/CodeGen/AMDGPU/cc-update.ll (+144-160)
  • (modified) llvm/test/CodeGen/AMDGPU/cf-loop-on-constant.ll (+45-45)
  • (modified) llvm/test/CodeGen/AMDGPU/codegen-internal-only-func.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/collapse-endcf.ll (+10-10)
  • (modified) llvm/test/CodeGen/AMDGPU/cross-block-use-is-not-abi-copy.ll (+28-32)
  • (modified) llvm/test/CodeGen/AMDGPU/cvt_f32_ubyte.ll (+20-20)
  • (modified) llvm/test/CodeGen/AMDGPU/dagcombine-lshr-and-cmp.ll (+2-4)
  • (modified) llvm/test/CodeGen/AMDGPU/ds_read2.ll (+28-32)
  • (modified) llvm/test/CodeGen/AMDGPU/dwarf-multi-register-use-crash.ll (+6-6)
  • (modified) llvm/test/CodeGen/AMDGPU/elf-header-osabi.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/flat-scratch-init.ll (+17-19)
  • (modified) llvm/test/CodeGen/AMDGPU/fneg-fabs.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/gfx11-user-sgpr-init16-bug.ll (+19-12)
  • (modified) llvm/test/CodeGen/AMDGPU/global_atomics_scan_fadd.ll (+550-605)
  • (modified) llvm/test/CodeGen/AMDGPU/global_atomics_scan_fmax.ll (+330-363)
  • (modified) llvm/test/CodeGen/AMDGPU/global_atomics_scan_fmin.ll (+330-363)
  • (modified) llvm/test/CodeGen/AMDGPU/global_atomics_scan_fsub.ll (+550-605)
  • (modified) llvm/test/CodeGen/AMDGPU/indirect-addressing-term.ll (+19-19)
  • (modified) llvm/test/CodeGen/AMDGPU/insert-delay-alu-bug.ll (+57-59)
  • (modified) llvm/test/CodeGen/AMDGPU/kernel-vgpr-spill-mubuf-with-voffset.ll (+25-32)
  • (modified) llvm/test/CodeGen/AMDGPU/lds-frame-extern.ll (+124-112)
  • (modified) llvm/test/CodeGen/AMDGPU/lds-global-non-entry-func.ll (+48-22)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.is.private.ll (+8-7)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.is.shared.ll (+8-7)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.dbg.value.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/lower-kernargs.ll (+118-118)
  • (modified) llvm/test/CodeGen/AMDGPU/lower-module-lds-via-hybrid.ll (+2-4)
  • (modified) llvm/test/CodeGen/AMDGPU/lower-module-lds-via-table.ll (+1-9)
  • (modified) llvm/test/CodeGen/AMDGPU/module-lds-false-sharing.ll (+48-44)
  • (modified) llvm/test/CodeGen/AMDGPU/need-fp-from-vgpr-spills.ll (+33-61)
  • (modified) llvm/test/CodeGen/AMDGPU/partial-sgpr-to-vgpr-spills.ll (+8-8)
  • (modified) llvm/test/CodeGen/AMDGPU/preserve-wwm-copy-dst-reg.ll (+72-72)
  • (modified) llvm/test/CodeGen/AMDGPU/promote-alloca-calling-conv.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/reqd-work-group-size.ll (+24-24)
  • (modified) llvm/test/CodeGen/AMDGPU/sgpr-spill-no-vgprs.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/sgpr-spill-update-only-slot-indexes.ll (+10-23)
  • (modified) llvm/test/CodeGen/AMDGPU/simple-indirect-call.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/sopk-no-literal.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/spill-m0.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/stacksave_stackrestore.ll (+30-30)
  • (modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-all-any.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-all-not-supported.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-all-off.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-all-on.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-any-off-1.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-any-off-2.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-any-on-1.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-any-on-2.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/tid-one-func-xnack-any.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/tid-one-func-xnack-not-supported.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/tid-one-func-xnack-off.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/tid-one-func-xnack-on.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/tuple-allocation-failure.ll (+157-167)
  • (modified) llvm/test/CodeGen/AMDGPU/unstructured-cfg-def-use-issue.ll (+15-15)
  • (modified) llvm/test/CodeGen/AMDGPU/vgpr-spill-placement-issue61083.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/vgpr_constant_to_sgpr.ll (+25-36)
  • (modified) llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-gfx10.s (+4)
  • (modified) llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-gfx90a.s (+3)
  • (modified) llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-sgpr.s (+3)
  • (modified) llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-vgpr.s (+3)
  • (modified) llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-zeroed-gfx10.s (+1)
  • (modified) openmp/libomptarget/test/offloading/thread_limit.c (+1)
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index a5f5ca29053b43b..0484575ca482717 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -4616,12 +4616,12 @@ defm amdgpu_ieee : BoolOption<"m", "amdgpu-ieee",
   NegFlag>, Group;
 
 def mcode_object_version_EQ : Joined<["-"], "mcode-object-version=">, Group,
-  HelpText<"Specify code object ABI version. Defaults to 4. (AMDGPU only)">,
+  HelpText<"Specify code object ABI version. Defaults to 5. (AMDGPU only)">,
   Visibility<[ClangOption, CC1Option]>,
   Values<"none,2,3,4,5">,
   NormalizedValuesScope<"TargetOptions">,
   NormalizedValues<["COV_None", "COV_2", "COV_3", "COV_4", "COV_5"]>,
-  MarshallingInfoEnum, "COV_4">;
+  MarshallingInfoEnum, "COV_5">;
 
 defm cumode : SimpleMFlag<"cumode",
   "Specify CU wavefront", "Specify WGP wavefront",
diff --git a/clang/lib/Driver/ToolChains/CommonArgs.cpp b/clang/lib/Driver/ToolChains/CommonArgs.cpp
index 1db05b47e11ba6a..471ff2541d355b6 100644
--- a/clang/lib/Driver/ToolChains/CommonArgs.cpp
+++ b/clang/lib/Driver/ToolChains/CommonArgs.cpp
@@ -2341,7 +2341,7 @@ void tools::checkAMDGPUCodeObjectVersion(const Driver &D,
 
 unsigned tools::getAMDGPUCodeObjectVersion(const Driver &D,
                                            const llvm::opt::ArgList &Args) {
-  unsigned CodeObjVer = 4; // default
+  unsigned CodeObjVer = 5; // default
   if (auto *CodeObjArg = getAMDGPUCodeObjectArgument(D, Args))
     StringRef(CodeObjArg->getValue()).getAsInteger(0, CodeObjVer);
   return CodeObjVer;
diff --git a/clang/test/CodeGenCUDA/amdgpu-code-object-version.cu b/clang/test/CodeGenCUDA/amdgpu-code-object-version.cu
index 16505b34c4a6e09..62ccc2bd4d05db3 100644
--- a/clang/test/CodeGenCUDA/amdgpu-code-object-version.cu
+++ b/clang/test/CodeGenCUDA/amdgpu-code-object-version.cu
@@ -1,7 +1,7 @@
 // Create module flag for code object version.
 
 // RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -emit-llvm \
-// RUN:   -o - %s | FileCheck %s -check-prefix=V4
+// RUN:   -o - %s | FileCheck %s -check-prefix=V5
 
 // RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -emit-llvm \
 // RUN:   -mcode-object-version=2 -o - %s | FileCheck -check-prefix=V2 %s
diff --git a/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu b/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu
index c661b06d57b78d7..b885b9991a9bd31 100644
--- a/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu
+++ b/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu
@@ -1,10 +1,10 @@
 // RUN: %clang_cc1 -triple amdgcn-amd-amdhsa \
-// RUN:     -fcuda-is-device -emit-llvm -o - -x hip %s \
+// RUN:     -fcuda-is-device -mcode-object-version=4 -emit-llvm -o - -x hip %s \
 // RUN:     | FileCheck -check-prefix=PRECOV5 %s
 
 
 // RUN: %clang_cc1 -triple amdgcn-amd-amdhsa \
-// RUN:     -fcuda-is-device -mcode-object-version=5 -emit-llvm -o - -x hip %s \
+// RUN:     -fcuda-is-device -emit-llvm -o - -x hip %s \
 // RUN:     | FileCheck -check-prefix=COV5 %s
 
 // RUN: %clang_cc1 -triple amdgcn-amd-amdhsa \
diff --git a/clang/test/CodeGenHIP/default-attributes.hip b/clang/test/CodeGenHIP/default-attributes.hip
index 80aa1ee0700628f..9c9ea521271b99b 100644
--- a/clang/test/CodeGenHIP/default-attributes.hip
+++ b/clang/test/CodeGenHIP/default-attributes.hip
@@ -46,11 +46,11 @@ __global__ void kernel() {
 // OPT: attributes #0 = { mustprogress nofree norecurse nosync nounwind willreturn memory(none) "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
 // OPT: attributes #1 = { mustprogress nofree norecurse nosync nounwind willreturn memory(none) "amdgpu-flat-work-group-size"="1,1024" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "uniform-work-group-size"="true" }
 //.
-// OPTNONE: !0 = !{i32 1, !"amdgpu_code_object_version", i32 400}
+// OPTNONE: !0 = !{i32 1, !"amdgpu_code_object_version", i32 500}
 // OPTNONE: !1 = !{i32 1, !"amdgpu_printf_kind", !"hostcall"}
 // OPTNONE: !2 = !{i32 1, !"wchar_size", i32 4}
 //.
-// OPT: !0 = !{i32 1, !"amdgpu_code_object_version", i32 400}
+// OPT: !0 = !{i32 1, !"amdgpu_code_object_version", i32 500}
 // OPT: !1 = !{i32 1, !"amdgpu_printf_kind", !"hostcall"}
 // OPT: !2 = !{i32 1, !"wchar_size", i32 4}
 //.
diff --git a/clang/test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl b/clang/test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl
index e574b1f64c499bd..2cf1286e2b54e8e 100644
--- a/clang/test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl
+++ b/clang/test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl
@@ -703,7 +703,7 @@ kernel void test_target_features_kernel(global int *i) {
 // GFX900: attributes #8 = { nounwind }
 // GFX900: attributes #9 = { convergent nounwind }
 //.
-// NOCPU: !0 = !{i32 1, !"amdgpu_code_object_version", i32 400}
+// NOCPU: !0 = !{i32 1, !"amdgpu_code_object_version", i32 500}
 // NOCPU: !1 = !{i32 1, !"wchar_size", i32 4}
 // NOCPU: !2 = !{i32 2, i32 0}
 // NOCPU: !3 = !{i32 1, i32 0, i32 1, i32 0}
@@ -721,7 +721,7 @@ kernel void test_target_features_kernel(global int *i) {
 // NOCPU: !15 = !{i32 1}
 // NOCPU: !16 = !{!"int*"}
 //.
-// GFX900: !0 = !{i32 1, !"amdgpu_code_object_version", i32 400}
+// GFX900: !0 = !{i32 1, !"amdgpu_code_object_version", i32 500}
 // GFX900: !1 = !{i32 1, !"wchar_size", i32 4}
 // GFX900: !2 = !{i32 2, i32 0}
 // GFX900: !3 = !{!4, !4, i64 0}
diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn.cl b/clang/test/CodeGenOpenCL/builtins-amdgcn.cl
index 8938642e3b19f8c..74b5ef03b52f21e 100644
--- a/clang/test/CodeGenOpenCL/builtins-amdgcn.cl
+++ b/clang/test/CodeGenOpenCL/builtins-amdgcn.cl
@@ -599,13 +599,13 @@ void test_get_local_id(int d, global int *out)
 }
 
 // CHECK-LABEL: @test_get_workgroup_size(
-// CHECK: call align 4 dereferenceable(64) ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
-// CHECK: getelementptr inbounds i8, ptr addrspace(4) %{{.*}}, i64 4
+// CHECK: call align 8 dereferenceable(256) ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
+// CHECK: getelementptr inbounds i8, ptr addrspace(4) %{{.*}}, i64 12
 // CHECK: load i16, ptr addrspace(4) %{{.*}}, align 4, !range [[$WS_RANGE:![0-9]*]], !invariant.load{{.*}}, !noundef
-// CHECK: getelementptr inbounds i8, ptr addrspace(4) %{{.*}}, i64 6
+// CHECK: getelementptr inbounds i8, ptr addrspace(4) %{{.*}}, i64 14
 // CHECK: load i16, ptr addrspace(4) %{{.*}}, align 2, !range [[$WS_RANGE:![0-9]*]], !invariant.load{{.*}}, !noundef
-// CHECK: getelementptr inbounds i8, ptr addrspace(4) %{{.*}}, i64 8
-// CHECK: load i16, ptr addrspace(4) %{{.*}}, align 4, !range [[$WS_RANGE:![0-9]*]], !invariant.load{{.*}}, !noundef
+// CHECK: getelementptr inbounds i8, ptr addrspace(4) %{{.*}}, i64 16
+// CHECK: load i16, ptr addrspace(4) %{{.*}}, align 8, !range [[$WS_RANGE:![0-9]*]], !invariant.load{{.*}}, !noundef
 void test_get_workgroup_size(int d, global int *out)
 {
 	switch (d) {
diff --git a/clang/test/Driver/hip-device-libs.hip b/clang/test/Driver/hip-device-libs.hip
index 71d9554da696b42..f049bf3467c611c 100644
--- a/clang/test/Driver/hip-device-libs.hip
+++ b/clang/test/Driver/hip-device-libs.hip
@@ -160,13 +160,13 @@
 // Test default code object version.
 // RUN: %clang -### --target=x86_64-linux-gnu --offload-arch=gfx900 \
 // RUN:   --rocm-path=%S/Inputs/rocm %S/Inputs/hip_multiple_inputs/b.hip \
-// RUN: 2>&1 | FileCheck %s --check-prefixes=ABI4
+// RUN: 2>&1 | FileCheck %s --check-prefixes=ABI5
 
-// Test default code object version with old device library without abi_version_400.bc
-// RUN: %clang -### --target=x86_64-linux-gnu --offload-arch=gfx900 \
+// Test default code object version with old device library without abi_version_500.bc
+// RUN: not %clang -### --target=x86_64-linux-gnu --offload-arch=gfx900 \
 // RUN:   --hip-device-lib-path=%S/Inputs/rocm/amdgcn/bitcode-no-abi-ver   \
 // RUN:   --rocm-path=%S/Inputs/rocm %S/Inputs/hip_multiple_inputs/b.hip \
-// RUN: 2>&1 | FileCheck %s --check-prefixes=NOABI4
+// RUN: 2>&1 | FileCheck %s --check-prefixes=NOABI5
 
 // Test -mcode-object-version=3
 // RUN: %clang -### --target=x86_64-linux-gnu --offload-arch=gfx900 \
@@ -193,12 +193,12 @@
 // RUN:   --rocm-path=%S/Inputs/rocm %S/Inputs/hip_multiple_inputs/b.hip \
 // RUN: 2>&1 | FileCheck %s --check-prefixes=ABI5
 
-// Test -mcode-object-version=5 with old device library without abi_version_400.bc
-// RUN: not %clang -### --target=x86_64-linux-gnu --offload-arch=gfx900 \
-// RUN:   -mcode-object-version=5 \
+// Test -mcode-object-version=4 with old device library without abi_version_400.bc
+// RUN: %clang -### --target=x86_64-linux-gnu --offload-arch=gfx900 \
+// RUN:   -mcode-object-version=4 \
 // RUN:   --hip-device-lib-path=%S/Inputs/rocm/amdgcn/bitcode-no-abi-ver   \
 // RUN:   --rocm-path=%S/Inputs/rocm %S/Inputs/hip_multiple_inputs/b.hip \
-// RUN: 2>&1 | FileCheck %s --check-prefixes=NOABI5
+// RUN: 2>&1 | FileCheck %s --check-prefixes=NOABI4
 
 // ALL-NOT: error:
 // ALL: {{"[^"]*clang[^"]*"}}
diff --git a/lld/test/ELF/emulation-amdgpu.s b/lld/test/ELF/emulation-amdgpu.s
index 707f0aeb909efae..329fb1c69b16665 100644
--- a/lld/test/ELF/emulation-amdgpu.s
+++ b/lld/test/ELF/emulation-amdgpu.s
@@ -13,7 +13,7 @@
 # CHECK-NEXT:     DataEncoding: LittleEndian (0x1)
 # CHECK-NEXT:     FileVersion: 1
 # CHECK-NEXT:     OS/ABI: AMDGPU_HSA (0x40)
-# CHECK-NEXT:     ABIVersion: 2
+# CHECK-NEXT:     ABIVersion: 3
 # CHECK-NEXT:     Unused: (00 00 00 00 00 00 00)
 # CHECK-NEXT:   }
 # CHECK-NEXT:   Type: Executable (0x2)
diff --git a/lld/test/ELF/lto/amdgcn-oses.ll b/lld/test/ELF/lto/amdgcn-oses.ll
index a2f25cdd57d87b5..a70b678ac25141c 100644
--- a/lld/test/ELF/lto/amdgcn-oses.ll
+++ b/lld/test/ELF/lto/amdgcn-oses.ll
@@ -15,7 +15,7 @@
 ; RUN: llvm-readobj --file-headers %t/mesa3d.so | FileCheck %s --check-prefixes=GCN,NON-AMDHSA,MESA3D
 
 ; AMDHSA: OS/ABI: AMDGPU_HSA (0x40)
-; AMDHSA: ABIVersion: 2
+; AMDHSA: ABIVersion: 3
 
 ; AMDPAL: OS/ABI: AMDGPU_PAL (0x41)
 ; MESA3D: OS/ABI: AMDGPU_MESA3D (0x42)
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index f733c514ffbee47..923b8e2e62ab9d3 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -1418,12 +1418,12 @@ The AMDGPU backend uses the following ELF header:
 
   * ``ELFABIVERSION_AMDGPU_HSA_V4`` is used to specify the version of AMD HSA
     runtime ABI for code object V4. Specify using the Clang option
-    ``-mcode-object-version=4``. This is the default code object
-    version if not specified.
+    ``-mcode-object-version=4``.
 
   * ``ELFABIVERSION_AMDGPU_HSA_V5`` is used to specify the version of AMD HSA
     runtime ABI for code object V5. Specify using the Clang option
-    ``-mcode-object-version=5``.
+    ``-mcode-object-version=5``. This is the default code object
+    version if not specified.
 
   * ``ELFABIVERSION_AMDGPU_PAL`` is used to specify the version of AMD PAL
     runtime ABI.
@@ -3852,6 +3852,10 @@ same *vendor-name*.
 Code Object V4 Metadata
 +++++++++++++++++++++++
 
+. warning::
+  Code object V4 is not the default code object version emitted by this version
+  of LLVM.
+
 Code object V4 metadata is the same as
 :ref:`amdgpu-amdhsa-code-object-metadata-v3` with the changes and additions
 defined in table :ref:`amdgpu-amdhsa-code-object-metadata-map-table-v4`.
@@ -3882,11 +3886,6 @@ defined in table :ref:`amdgpu-amdhsa-code-object-metadata-map-table-v4`.
 Code Object V5 Metadata
 +++++++++++++++++++++++
 
-.. warning::
-  Code object V5 is not the default code object version emitted by this version
-  of LLVM.
-
-
 Code object V5 metadata is the same as
 :ref:`amdgpu-amdhsa-code-object-metadata-v4` with the changes defined in table
 :ref:`amdgpu-amdhsa-code-object-metadata-map-table-v5`, table
diff --git a/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp b/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
index 63da86391e5c650..d0abd08cc78b8ec 100644
--- a/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
@@ -34,7 +34,7 @@
 static llvm::cl::opt
     AmdhsaCodeObjectVersion("amdhsa-code-object-version", llvm::cl::Hidde...

@llvmbot
Copy link
Collaborator

llvmbot commented Sep 12, 2023

@llvm/pr-subscribers-clang

Changes

Also update LIT tests and docs.
For more details, see
https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata

Differential Revision: https://reviews.llvm.org/D129818

Patch is 1.92 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/65410.diff

109 Files Affected:

  • (modified) clang/include/clang/Driver/Options.td (+2-2)
  • (modified) clang/lib/Driver/ToolChains/CommonArgs.cpp (+1-1)
  • (modified) clang/test/CodeGenCUDA/amdgpu-code-object-version.cu (+1-1)
  • (modified) clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu (+2-2)
  • (modified) clang/test/CodeGenHIP/default-attributes.hip (+2-2)
  • (modified) clang/test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl (+2-2)
  • (modified) clang/test/CodeGenOpenCL/builtins-amdgcn.cl (+5-5)
  • (modified) clang/test/Driver/hip-device-libs.hip (+8-8)
  • (modified) lld/test/ELF/emulation-amdgpu.s (+1-1)
  • (modified) lld/test/ELF/lto/amdgcn-oses.ll (+1-1)
  • (modified) llvm/docs/AMDGPUUsage.rst (+7-8)
  • (modified) llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/crash-stack-address-O0.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/dropped_debug_info_assert.ll (+28-29)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-assert-align.ll (+8-8)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-atomicrmw.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-call-abi-attribute-hints.ll (+53-55)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-call-implicit-args.ll (+468-484)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-call-return-values.ll (+1578-1623)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-call-sret.ll (+31-32)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-call.ll (+2013-2077)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-constant-fold-vector-op.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-indirect-call.ll (+31-32)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-inline-asm.ll (+51-51)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-sibling-call.ll (+18-18)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-tail-call.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-addrspacecast.mir (+28-39)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.is.private.ll (+5-6)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.is.shared.ll (+5-6)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/non-entry-alloca.ll (+7-4)
  • (modified) llvm/test/CodeGen/AMDGPU/abi-attribute-hints-undefined-behavior.ll (+75-35)
  • (modified) llvm/test/CodeGen/AMDGPU/addrspacecast-constantexpr.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/addrspacecast.gfx6.ll (+46-20)
  • (modified) llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-pow-codegen.ll (+30-30)
  • (modified) llvm/test/CodeGen/AMDGPU/annotate-kernel-features-hsa-call.ll (+12-12)
  • (modified) llvm/test/CodeGen/AMDGPU/annotate-kernel-features-hsa.ll (+8-7)
  • (modified) llvm/test/CodeGen/AMDGPU/attributor-noopt.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/blender-no-live-segment-at-def-implicit-def.ll (+22-24)
  • (modified) llvm/test/CodeGen/AMDGPU/branch-folding-implicit-def-subreg.ll (+384-366)
  • (modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage-agpr.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage0.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage1.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage2.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage3.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/call-argument-types.ll (+1219-1461)
  • (modified) llvm/test/CodeGen/AMDGPU/call-waitcnt.ll (+42-47)
  • (modified) llvm/test/CodeGen/AMDGPU/callee-special-input-sgprs-fixed-abi.ll (+8-12)
  • (modified) llvm/test/CodeGen/AMDGPU/cc-update.ll (+144-160)
  • (modified) llvm/test/CodeGen/AMDGPU/cf-loop-on-constant.ll (+45-45)
  • (modified) llvm/test/CodeGen/AMDGPU/codegen-internal-only-func.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/collapse-endcf.ll (+10-10)
  • (modified) llvm/test/CodeGen/AMDGPU/cross-block-use-is-not-abi-copy.ll (+28-32)
  • (modified) llvm/test/CodeGen/AMDGPU/cvt_f32_ubyte.ll (+20-20)
  • (modified) llvm/test/CodeGen/AMDGPU/dagcombine-lshr-and-cmp.ll (+2-4)
  • (modified) llvm/test/CodeGen/AMDGPU/ds_read2.ll (+28-32)
  • (modified) llvm/test/CodeGen/AMDGPU/dwarf-multi-register-use-crash.ll (+6-6)
  • (modified) llvm/test/CodeGen/AMDGPU/elf-header-osabi.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/flat-scratch-init.ll (+17-19)
  • (modified) llvm/test/CodeGen/AMDGPU/fneg-fabs.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/gfx11-user-sgpr-init16-bug.ll (+19-12)
  • (modified) llvm/test/CodeGen/AMDGPU/global_atomics_scan_fadd.ll (+550-605)
  • (modified) llvm/test/CodeGen/AMDGPU/global_atomics_scan_fmax.ll (+330-363)
  • (modified) llvm/test/CodeGen/AMDGPU/global_atomics_scan_fmin.ll (+330-363)
  • (modified) llvm/test/CodeGen/AMDGPU/global_atomics_scan_fsub.ll (+550-605)
  • (modified) llvm/test/CodeGen/AMDGPU/indirect-addressing-term.ll (+19-19)
  • (modified) llvm/test/CodeGen/AMDGPU/insert-delay-alu-bug.ll (+57-59)
  • (modified) llvm/test/CodeGen/AMDGPU/kernel-vgpr-spill-mubuf-with-voffset.ll (+25-32)
  • (modified) llvm/test/CodeGen/AMDGPU/lds-frame-extern.ll (+124-112)
  • (modified) llvm/test/CodeGen/AMDGPU/lds-global-non-entry-func.ll (+48-22)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.is.private.ll (+8-7)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.is.shared.ll (+8-7)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.dbg.value.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/lower-kernargs.ll (+118-118)
  • (modified) llvm/test/CodeGen/AMDGPU/lower-module-lds-via-hybrid.ll (+2-4)
  • (modified) llvm/test/CodeGen/AMDGPU/lower-module-lds-via-table.ll (+1-9)
  • (modified) llvm/test/CodeGen/AMDGPU/module-lds-false-sharing.ll (+48-44)
  • (modified) llvm/test/CodeGen/AMDGPU/need-fp-from-vgpr-spills.ll (+33-61)
  • (modified) llvm/test/CodeGen/AMDGPU/partial-sgpr-to-vgpr-spills.ll (+8-8)
  • (modified) llvm/test/CodeGen/AMDGPU/preserve-wwm-copy-dst-reg.ll (+72-72)
  • (modified) llvm/test/CodeGen/AMDGPU/promote-alloca-calling-conv.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/reqd-work-group-size.ll (+24-24)
  • (modified) llvm/test/CodeGen/AMDGPU/sgpr-spill-no-vgprs.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/sgpr-spill-update-only-slot-indexes.ll (+10-23)
  • (modified) llvm/test/CodeGen/AMDGPU/simple-indirect-call.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/sopk-no-literal.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/spill-m0.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/stacksave_stackrestore.ll (+30-30)
  • (modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-all-any.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-all-not-supported.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-all-off.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-all-on.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-any-off-1.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-any-off-2.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-any-on-1.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-any-on-2.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/tid-one-func-xnack-any.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/tid-one-func-xnack-not-supported.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/tid-one-func-xnack-off.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/tid-one-func-xnack-on.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/tuple-allocation-failure.ll (+157-167)
  • (modified) llvm/test/CodeGen/AMDGPU/unstructured-cfg-def-use-issue.ll (+15-15)
  • (modified) llvm/test/CodeGen/AMDGPU/vgpr-spill-placement-issue61083.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/vgpr_constant_to_sgpr.ll (+25-36)
  • (modified) llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-gfx10.s (+4)
  • (modified) llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-gfx90a.s (+3)
  • (modified) llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-sgpr.s (+3)
  • (modified) llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-vgpr.s (+3)
  • (modified) llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-zeroed-gfx10.s (+1)
  • (modified) openmp/libomptarget/test/offloading/thread_limit.c (+1)
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index a5f5ca29053b43b..0484575ca482717 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -4616,12 +4616,12 @@ defm amdgpu_ieee : BoolOption<"m", "amdgpu-ieee",
   NegFlag>, Group;
 
 def mcode_object_version_EQ : Joined<["-"], "mcode-object-version=">, Group,
-  HelpText<"Specify code object ABI version. Defaults to 4. (AMDGPU only)">,
+  HelpText<"Specify code object ABI version. Defaults to 5. (AMDGPU only)">,
   Visibility<[ClangOption, CC1Option]>,
   Values<"none,2,3,4,5">,
   NormalizedValuesScope<"TargetOptions">,
   NormalizedValues<["COV_None", "COV_2", "COV_3", "COV_4", "COV_5"]>,
-  MarshallingInfoEnum, "COV_4">;
+  MarshallingInfoEnum, "COV_5">;
 
 defm cumode : SimpleMFlag<"cumode",
   "Specify CU wavefront", "Specify WGP wavefront",
diff --git a/clang/lib/Driver/ToolChains/CommonArgs.cpp b/clang/lib/Driver/ToolChains/CommonArgs.cpp
index 1db05b47e11ba6a..471ff2541d355b6 100644
--- a/clang/lib/Driver/ToolChains/CommonArgs.cpp
+++ b/clang/lib/Driver/ToolChains/CommonArgs.cpp
@@ -2341,7 +2341,7 @@ void tools::checkAMDGPUCodeObjectVersion(const Driver &D,
 
 unsigned tools::getAMDGPUCodeObjectVersion(const Driver &D,
                                            const llvm::opt::ArgList &Args) {
-  unsigned CodeObjVer = 4; // default
+  unsigned CodeObjVer = 5; // default
   if (auto *CodeObjArg = getAMDGPUCodeObjectArgument(D, Args))
     StringRef(CodeObjArg->getValue()).getAsInteger(0, CodeObjVer);
   return CodeObjVer;
diff --git a/clang/test/CodeGenCUDA/amdgpu-code-object-version.cu b/clang/test/CodeGenCUDA/amdgpu-code-object-version.cu
index 16505b34c4a6e09..62ccc2bd4d05db3 100644
--- a/clang/test/CodeGenCUDA/amdgpu-code-object-version.cu
+++ b/clang/test/CodeGenCUDA/amdgpu-code-object-version.cu
@@ -1,7 +1,7 @@
 // Create module flag for code object version.
 
 // RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -emit-llvm \
-// RUN:   -o - %s | FileCheck %s -check-prefix=V4
+// RUN:   -o - %s | FileCheck %s -check-prefix=V5
 
 // RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -emit-llvm \
 // RUN:   -mcode-object-version=2 -o - %s | FileCheck -check-prefix=V2 %s
diff --git a/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu b/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu
index c661b06d57b78d7..b885b9991a9bd31 100644
--- a/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu
+++ b/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu
@@ -1,10 +1,10 @@
 // RUN: %clang_cc1 -triple amdgcn-amd-amdhsa \
-// RUN:     -fcuda-is-device -emit-llvm -o - -x hip %s \
+// RUN:     -fcuda-is-device -mcode-object-version=4 -emit-llvm -o - -x hip %s \
 // RUN:     | FileCheck -check-prefix=PRECOV5 %s
 
 
 // RUN: %clang_cc1 -triple amdgcn-amd-amdhsa \
-// RUN:     -fcuda-is-device -mcode-object-version=5 -emit-llvm -o - -x hip %s \
+// RUN:     -fcuda-is-device -emit-llvm -o - -x hip %s \
 // RUN:     | FileCheck -check-prefix=COV5 %s
 
 // RUN: %clang_cc1 -triple amdgcn-amd-amdhsa \
diff --git a/clang/test/CodeGenHIP/default-attributes.hip b/clang/test/CodeGenHIP/default-attributes.hip
index 80aa1ee0700628f..9c9ea521271b99b 100644
--- a/clang/test/CodeGenHIP/default-attributes.hip
+++ b/clang/test/CodeGenHIP/default-attributes.hip
@@ -46,11 +46,11 @@ __global__ void kernel() {
 // OPT: attributes #0 = { mustprogress nofree norecurse nosync nounwind willreturn memory(none) "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
 // OPT: attributes #1 = { mustprogress nofree norecurse nosync nounwind willreturn memory(none) "amdgpu-flat-work-group-size"="1,1024" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "uniform-work-group-size"="true" }
 //.
-// OPTNONE: !0 = !{i32 1, !"amdgpu_code_object_version", i32 400}
+// OPTNONE: !0 = !{i32 1, !"amdgpu_code_object_version", i32 500}
 // OPTNONE: !1 = !{i32 1, !"amdgpu_printf_kind", !"hostcall"}
 // OPTNONE: !2 = !{i32 1, !"wchar_size", i32 4}
 //.
-// OPT: !0 = !{i32 1, !"amdgpu_code_object_version", i32 400}
+// OPT: !0 = !{i32 1, !"amdgpu_code_object_version", i32 500}
 // OPT: !1 = !{i32 1, !"amdgpu_printf_kind", !"hostcall"}
 // OPT: !2 = !{i32 1, !"wchar_size", i32 4}
 //.
diff --git a/clang/test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl b/clang/test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl
index e574b1f64c499bd..2cf1286e2b54e8e 100644
--- a/clang/test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl
+++ b/clang/test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl
@@ -703,7 +703,7 @@ kernel void test_target_features_kernel(global int *i) {
 // GFX900: attributes #8 = { nounwind }
 // GFX900: attributes #9 = { convergent nounwind }
 //.
-// NOCPU: !0 = !{i32 1, !"amdgpu_code_object_version", i32 400}
+// NOCPU: !0 = !{i32 1, !"amdgpu_code_object_version", i32 500}
 // NOCPU: !1 = !{i32 1, !"wchar_size", i32 4}
 // NOCPU: !2 = !{i32 2, i32 0}
 // NOCPU: !3 = !{i32 1, i32 0, i32 1, i32 0}
@@ -721,7 +721,7 @@ kernel void test_target_features_kernel(global int *i) {
 // NOCPU: !15 = !{i32 1}
 // NOCPU: !16 = !{!"int*"}
 //.
-// GFX900: !0 = !{i32 1, !"amdgpu_code_object_version", i32 400}
+// GFX900: !0 = !{i32 1, !"amdgpu_code_object_version", i32 500}
 // GFX900: !1 = !{i32 1, !"wchar_size", i32 4}
 // GFX900: !2 = !{i32 2, i32 0}
 // GFX900: !3 = !{!4, !4, i64 0}
diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn.cl b/clang/test/CodeGenOpenCL/builtins-amdgcn.cl
index 8938642e3b19f8c..74b5ef03b52f21e 100644
--- a/clang/test/CodeGenOpenCL/builtins-amdgcn.cl
+++ b/clang/test/CodeGenOpenCL/builtins-amdgcn.cl
@@ -599,13 +599,13 @@ void test_get_local_id(int d, global int *out)
 }
 
 // CHECK-LABEL: @test_get_workgroup_size(
-// CHECK: call align 4 dereferenceable(64) ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
-// CHECK: getelementptr inbounds i8, ptr addrspace(4) %{{.*}}, i64 4
+// CHECK: call align 8 dereferenceable(256) ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
+// CHECK: getelementptr inbounds i8, ptr addrspace(4) %{{.*}}, i64 12
 // CHECK: load i16, ptr addrspace(4) %{{.*}}, align 4, !range [[$WS_RANGE:![0-9]*]], !invariant.load{{.*}}, !noundef
-// CHECK: getelementptr inbounds i8, ptr addrspace(4) %{{.*}}, i64 6
+// CHECK: getelementptr inbounds i8, ptr addrspace(4) %{{.*}}, i64 14
 // CHECK: load i16, ptr addrspace(4) %{{.*}}, align 2, !range [[$WS_RANGE:![0-9]*]], !invariant.load{{.*}}, !noundef
-// CHECK: getelementptr inbounds i8, ptr addrspace(4) %{{.*}}, i64 8
-// CHECK: load i16, ptr addrspace(4) %{{.*}}, align 4, !range [[$WS_RANGE:![0-9]*]], !invariant.load{{.*}}, !noundef
+// CHECK: getelementptr inbounds i8, ptr addrspace(4) %{{.*}}, i64 16
+// CHECK: load i16, ptr addrspace(4) %{{.*}}, align 8, !range [[$WS_RANGE:![0-9]*]], !invariant.load{{.*}}, !noundef
 void test_get_workgroup_size(int d, global int *out)
 {
 	switch (d) {
diff --git a/clang/test/Driver/hip-device-libs.hip b/clang/test/Driver/hip-device-libs.hip
index 71d9554da696b42..f049bf3467c611c 100644
--- a/clang/test/Driver/hip-device-libs.hip
+++ b/clang/test/Driver/hip-device-libs.hip
@@ -160,13 +160,13 @@
 // Test default code object version.
 // RUN: %clang -### --target=x86_64-linux-gnu --offload-arch=gfx900 \
 // RUN:   --rocm-path=%S/Inputs/rocm %S/Inputs/hip_multiple_inputs/b.hip \
-// RUN: 2>&1 | FileCheck %s --check-prefixes=ABI4
+// RUN: 2>&1 | FileCheck %s --check-prefixes=ABI5
 
-// Test default code object version with old device library without abi_version_400.bc
-// RUN: %clang -### --target=x86_64-linux-gnu --offload-arch=gfx900 \
+// Test default code object version with old device library without abi_version_500.bc
+// RUN: not %clang -### --target=x86_64-linux-gnu --offload-arch=gfx900 \
 // RUN:   --hip-device-lib-path=%S/Inputs/rocm/amdgcn/bitcode-no-abi-ver   \
 // RUN:   --rocm-path=%S/Inputs/rocm %S/Inputs/hip_multiple_inputs/b.hip \
-// RUN: 2>&1 | FileCheck %s --check-prefixes=NOABI4
+// RUN: 2>&1 | FileCheck %s --check-prefixes=NOABI5
 
 // Test -mcode-object-version=3
 // RUN: %clang -### --target=x86_64-linux-gnu --offload-arch=gfx900 \
@@ -193,12 +193,12 @@
 // RUN:   --rocm-path=%S/Inputs/rocm %S/Inputs/hip_multiple_inputs/b.hip \
 // RUN: 2>&1 | FileCheck %s --check-prefixes=ABI5
 
-// Test -mcode-object-version=5 with old device library without abi_version_400.bc
-// RUN: not %clang -### --target=x86_64-linux-gnu --offload-arch=gfx900 \
-// RUN:   -mcode-object-version=5 \
+// Test -mcode-object-version=4 with old device library without abi_version_400.bc
+// RUN: %clang -### --target=x86_64-linux-gnu --offload-arch=gfx900 \
+// RUN:   -mcode-object-version=4 \
 // RUN:   --hip-device-lib-path=%S/Inputs/rocm/amdgcn/bitcode-no-abi-ver   \
 // RUN:   --rocm-path=%S/Inputs/rocm %S/Inputs/hip_multiple_inputs/b.hip \
-// RUN: 2>&1 | FileCheck %s --check-prefixes=NOABI5
+// RUN: 2>&1 | FileCheck %s --check-prefixes=NOABI4
 
 // ALL-NOT: error:
 // ALL: {{"[^"]*clang[^"]*"}}
diff --git a/lld/test/ELF/emulation-amdgpu.s b/lld/test/ELF/emulation-amdgpu.s
index 707f0aeb909efae..329fb1c69b16665 100644
--- a/lld/test/ELF/emulation-amdgpu.s
+++ b/lld/test/ELF/emulation-amdgpu.s
@@ -13,7 +13,7 @@
 # CHECK-NEXT:     DataEncoding: LittleEndian (0x1)
 # CHECK-NEXT:     FileVersion: 1
 # CHECK-NEXT:     OS/ABI: AMDGPU_HSA (0x40)
-# CHECK-NEXT:     ABIVersion: 2
+# CHECK-NEXT:     ABIVersion: 3
 # CHECK-NEXT:     Unused: (00 00 00 00 00 00 00)
 # CHECK-NEXT:   }
 # CHECK-NEXT:   Type: Executable (0x2)
diff --git a/lld/test/ELF/lto/amdgcn-oses.ll b/lld/test/ELF/lto/amdgcn-oses.ll
index a2f25cdd57d87b5..a70b678ac25141c 100644
--- a/lld/test/ELF/lto/amdgcn-oses.ll
+++ b/lld/test/ELF/lto/amdgcn-oses.ll
@@ -15,7 +15,7 @@
 ; RUN: llvm-readobj --file-headers %t/mesa3d.so | FileCheck %s --check-prefixes=GCN,NON-AMDHSA,MESA3D
 
 ; AMDHSA: OS/ABI: AMDGPU_HSA (0x40)
-; AMDHSA: ABIVersion: 2
+; AMDHSA: ABIVersion: 3
 
 ; AMDPAL: OS/ABI: AMDGPU_PAL (0x41)
 ; MESA3D: OS/ABI: AMDGPU_MESA3D (0x42)
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index f733c514ffbee47..923b8e2e62ab9d3 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -1418,12 +1418,12 @@ The AMDGPU backend uses the following ELF header:
 
   * ``ELFABIVERSION_AMDGPU_HSA_V4`` is used to specify the version of AMD HSA
     runtime ABI for code object V4. Specify using the Clang option
-    ``-mcode-object-version=4``. This is the default code object
-    version if not specified.
+    ``-mcode-object-version=4``.
 
   * ``ELFABIVERSION_AMDGPU_HSA_V5`` is used to specify the version of AMD HSA
     runtime ABI for code object V5. Specify using the Clang option
-    ``-mcode-object-version=5``.
+    ``-mcode-object-version=5``. This is the default code object
+    version if not specified.
 
   * ``ELFABIVERSION_AMDGPU_PAL`` is used to specify the version of AMD PAL
     runtime ABI.
@@ -3852,6 +3852,10 @@ same *vendor-name*.
 Code Object V4 Metadata
 +++++++++++++++++++++++
 
+. warning::
+  Code object V4 is not the default code object version emitted by this version
+  of LLVM.
+
 Code object V4 metadata is the same as
 :ref:`amdgpu-amdhsa-code-object-metadata-v3` with the changes and additions
 defined in table :ref:`amdgpu-amdhsa-code-object-metadata-map-table-v4`.
@@ -3882,11 +3886,6 @@ defined in table :ref:`amdgpu-amdhsa-code-object-metadata-map-table-v4`.
 Code Object V5 Metadata
 +++++++++++++++++++++++
 
-.. warning::
-  Code object V5 is not the default code object version emitted by this version
-  of LLVM.
-
-
 Code object V5 metadata is the same as
 :ref:`amdgpu-amdhsa-code-object-metadata-v4` with the changes defined in table
 :ref:`amdgpu-amdhsa-code-object-metadata-map-table-v5`, table
diff --git a/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp b/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
index 63da86391e5c650..d0abd08cc78b8ec 100644
--- a/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
@@ -34,7 +34,7 @@
 static llvm::cl::opt
     AmdhsaCodeObjectVersion("amdhsa-code-object-version", llvm::cl::Hidde...

@llvmbot
Copy link
Collaborator

llvmbot commented Sep 12, 2023

@llvm/pr-subscribers-lld-elf

Changes

Also update LIT tests and docs.
For more details, see
https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata

Differential Revision: https://reviews.llvm.org/D129818

Patch is 1.92 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/65410.diff

109 Files Affected:

  • (modified) clang/include/clang/Driver/Options.td (+2-2)
  • (modified) clang/lib/Driver/ToolChains/CommonArgs.cpp (+1-1)
  • (modified) clang/test/CodeGenCUDA/amdgpu-code-object-version.cu (+1-1)
  • (modified) clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu (+2-2)
  • (modified) clang/test/CodeGenHIP/default-attributes.hip (+2-2)
  • (modified) clang/test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl (+2-2)
  • (modified) clang/test/CodeGenOpenCL/builtins-amdgcn.cl (+5-5)
  • (modified) clang/test/Driver/hip-device-libs.hip (+8-8)
  • (modified) lld/test/ELF/emulation-amdgpu.s (+1-1)
  • (modified) lld/test/ELF/lto/amdgcn-oses.ll (+1-1)
  • (modified) llvm/docs/AMDGPUUsage.rst (+7-8)
  • (modified) llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/crash-stack-address-O0.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/dropped_debug_info_assert.ll (+28-29)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-assert-align.ll (+8-8)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-atomicrmw.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-call-abi-attribute-hints.ll (+53-55)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-call-implicit-args.ll (+468-484)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-call-return-values.ll (+1578-1623)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-call-sret.ll (+31-32)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-call.ll (+2013-2077)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-constant-fold-vector-op.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-indirect-call.ll (+31-32)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-inline-asm.ll (+51-51)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-sibling-call.ll (+18-18)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-tail-call.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-addrspacecast.mir (+28-39)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.is.private.ll (+5-6)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.is.shared.ll (+5-6)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/non-entry-alloca.ll (+7-4)
  • (modified) llvm/test/CodeGen/AMDGPU/abi-attribute-hints-undefined-behavior.ll (+75-35)
  • (modified) llvm/test/CodeGen/AMDGPU/addrspacecast-constantexpr.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/addrspacecast.gfx6.ll (+46-20)
  • (modified) llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-pow-codegen.ll (+30-30)
  • (modified) llvm/test/CodeGen/AMDGPU/annotate-kernel-features-hsa-call.ll (+12-12)
  • (modified) llvm/test/CodeGen/AMDGPU/annotate-kernel-features-hsa.ll (+8-7)
  • (modified) llvm/test/CodeGen/AMDGPU/attributor-noopt.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/blender-no-live-segment-at-def-implicit-def.ll (+22-24)
  • (modified) llvm/test/CodeGen/AMDGPU/branch-folding-implicit-def-subreg.ll (+384-366)
  • (modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage-agpr.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage0.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage1.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage2.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage3.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/call-argument-types.ll (+1219-1461)
  • (modified) llvm/test/CodeGen/AMDGPU/call-waitcnt.ll (+42-47)
  • (modified) llvm/test/CodeGen/AMDGPU/callee-special-input-sgprs-fixed-abi.ll (+8-12)
  • (modified) llvm/test/CodeGen/AMDGPU/cc-update.ll (+144-160)
  • (modified) llvm/test/CodeGen/AMDGPU/cf-loop-on-constant.ll (+45-45)
  • (modified) llvm/test/CodeGen/AMDGPU/codegen-internal-only-func.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/collapse-endcf.ll (+10-10)
  • (modified) llvm/test/CodeGen/AMDGPU/cross-block-use-is-not-abi-copy.ll (+28-32)
  • (modified) llvm/test/CodeGen/AMDGPU/cvt_f32_ubyte.ll (+20-20)
  • (modified) llvm/test/CodeGen/AMDGPU/dagcombine-lshr-and-cmp.ll (+2-4)
  • (modified) llvm/test/CodeGen/AMDGPU/ds_read2.ll (+28-32)
  • (modified) llvm/test/CodeGen/AMDGPU/dwarf-multi-register-use-crash.ll (+6-6)
  • (modified) llvm/test/CodeGen/AMDGPU/elf-header-osabi.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/flat-scratch-init.ll (+17-19)
  • (modified) llvm/test/CodeGen/AMDGPU/fneg-fabs.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/gfx11-user-sgpr-init16-bug.ll (+19-12)
  • (modified) llvm/test/CodeGen/AMDGPU/global_atomics_scan_fadd.ll (+550-605)
  • (modified) llvm/test/CodeGen/AMDGPU/global_atomics_scan_fmax.ll (+330-363)
  • (modified) llvm/test/CodeGen/AMDGPU/global_atomics_scan_fmin.ll (+330-363)
  • (modified) llvm/test/CodeGen/AMDGPU/global_atomics_scan_fsub.ll (+550-605)
  • (modified) llvm/test/CodeGen/AMDGPU/indirect-addressing-term.ll (+19-19)
  • (modified) llvm/test/CodeGen/AMDGPU/insert-delay-alu-bug.ll (+57-59)
  • (modified) llvm/test/CodeGen/AMDGPU/kernel-vgpr-spill-mubuf-with-voffset.ll (+25-32)
  • (modified) llvm/test/CodeGen/AMDGPU/lds-frame-extern.ll (+124-112)
  • (modified) llvm/test/CodeGen/AMDGPU/lds-global-non-entry-func.ll (+48-22)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.is.private.ll (+8-7)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.is.shared.ll (+8-7)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.dbg.value.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/lower-kernargs.ll (+118-118)
  • (modified) llvm/test/CodeGen/AMDGPU/lower-module-lds-via-hybrid.ll (+2-4)
  • (modified) llvm/test/CodeGen/AMDGPU/lower-module-lds-via-table.ll (+1-9)
  • (modified) llvm/test/CodeGen/AMDGPU/module-lds-false-sharing.ll (+48-44)
  • (modified) llvm/test/CodeGen/AMDGPU/need-fp-from-vgpr-spills.ll (+33-61)
  • (modified) llvm/test/CodeGen/AMDGPU/partial-sgpr-to-vgpr-spills.ll (+8-8)
  • (modified) llvm/test/CodeGen/AMDGPU/preserve-wwm-copy-dst-reg.ll (+72-72)
  • (modified) llvm/test/CodeGen/AMDGPU/promote-alloca-calling-conv.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/reqd-work-group-size.ll (+24-24)
  • (modified) llvm/test/CodeGen/AMDGPU/sgpr-spill-no-vgprs.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/sgpr-spill-update-only-slot-indexes.ll (+10-23)
  • (modified) llvm/test/CodeGen/AMDGPU/simple-indirect-call.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/sopk-no-literal.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/spill-m0.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/stacksave_stackrestore.ll (+30-30)
  • (modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-all-any.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-all-not-supported.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-all-off.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-all-on.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-any-off-1.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-any-off-2.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-any-on-1.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-any-on-2.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/tid-one-func-xnack-any.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/tid-one-func-xnack-not-supported.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/tid-one-func-xnack-off.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/tid-one-func-xnack-on.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/tuple-allocation-failure.ll (+157-167)
  • (modified) llvm/test/CodeGen/AMDGPU/unstructured-cfg-def-use-issue.ll (+15-15)
  • (modified) llvm/test/CodeGen/AMDGPU/vgpr-spill-placement-issue61083.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/vgpr_constant_to_sgpr.ll (+25-36)
  • (modified) llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-gfx10.s (+4)
  • (modified) llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-gfx90a.s (+3)
  • (modified) llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-sgpr.s (+3)
  • (modified) llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-vgpr.s (+3)
  • (modified) llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-zeroed-gfx10.s (+1)
  • (modified) openmp/libomptarget/test/offloading/thread_limit.c (+1)
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index a5f5ca29053b43b..0484575ca482717 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -4616,12 +4616,12 @@ defm amdgpu_ieee : BoolOption<"m", "amdgpu-ieee",
  NegFlag>, Group;

def mcode_object_version_EQ : Joined<["-"], "mcode-object-version=">, Group,
-  HelpText<"Specify code object ABI version. Defaults to 4. (AMDGPU only)">,
+  HelpText<"Specify code object ABI version. Defaults to 5. (AMDGPU only)">,
  Visibility<[ClangOption, CC1Option]>,
  Values<"none,2,3,4,5">,
  NormalizedValuesScope<"TargetOptions">,
  NormalizedValues<["COV_None", "COV_2", "COV_3", "COV_4", "COV_5"]>,
-  MarshallingInfoEnum, "COV_4">;
+  MarshallingInfoEnum, "COV_5">;

defm cumode : SimpleMFlag<"cumode",
  "Specify CU wavefront", "Specify WGP wavefront",
diff --git a/clang/lib/Driver/ToolChains/CommonArgs.cpp b/clang/lib/Driver/ToolChains/CommonArgs.cpp
index 1db05b47e11ba6a..471ff2541d355b6 100644
--- a/clang/lib/Driver/ToolChains/CommonArgs.cpp
+++ b/clang/lib/Driver/ToolChains/CommonArgs.cpp
@@ -2341,7 +2341,7 @@ void tools::checkAMDGPUCodeObjectVersion(const Driver &D,

unsigned tools::getAMDGPUCodeObjectVersion(const Driver &D,
                                           const llvm::opt::ArgList &Args) {
-  unsigned CodeObjVer = 4; // default
+  unsigned CodeObjVer = 5; // default
  if (auto *CodeObjArg = getAMDGPUCodeObjectArgument(D, Args))
    StringRef(CodeObjArg->getValue()).getAsInteger(0, CodeObjVer);
  return CodeObjVer;
diff --git a/clang/test/CodeGenCUDA/amdgpu-code-object-version.cu b/clang/test/CodeGenCUDA/amdgpu-code-object-version.cu
index 16505b34c4a6e09..62ccc2bd4d05db3 100644
--- a/clang/test/CodeGenCUDA/amdgpu-code-object-version.cu
+++ b/clang/test/CodeGenCUDA/amdgpu-code-object-version.cu
@@ -1,7 +1,7 @@
// Create module flag for code object version.

// RUN: �lang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -emit-llvm \
-// RUN:   -o -  | FileCheck  -check-prefix=V4
+// RUN:   -o -  | FileCheck  -check-prefix=V5

// RUN: �lang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -emit-llvm \
// RUN:   -mcode-object-version=2 -o -  | FileCheck -check-prefix=V2 
diff --git a/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu b/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu
index c661b06d57b78d7..b885b9991a9bd31 100644
--- a/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu
+++ b/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu
@@ -1,10 +1,10 @@
// RUN: �lang_cc1 -triple amdgcn-amd-amdhsa \
-// RUN:     -fcuda-is-device -emit-llvm -o - -x hip  \
+// RUN:     -fcuda-is-device -mcode-object-version=4 -emit-llvm -o - -x hip  \
// RUN:     | FileCheck -check-prefix=PRECOV5 


// RUN: �lang_cc1 -triple amdgcn-amd-amdhsa \
-// RUN:     -fcuda-is-device -mcode-object-version=5 -emit-llvm -o - -x hip  \
+// RUN:     -fcuda-is-device -emit-llvm -o - -x hip  \
// RUN:     | FileCheck -check-prefix=COV5 

// RUN: �lang_cc1 -triple amdgcn-amd-amdhsa \
diff --git a/clang/test/CodeGenHIP/default-attributes.hip b/clang/test/CodeGenHIP/default-attributes.hip
index 80aa1ee0700628f..9c9ea521271b99b 100644
--- a/clang/test/CodeGenHIP/default-attributes.hip
+++ b/clang/test/CodeGenHIP/default-attributes.hip
@@ -46,11 +46,11 @@ __global__ void kernel() {
// OPT: attributes #0 = { mustprogress nofree norecurse nosync nounwind willreturn memory(none) "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
// OPT: attributes #1 = { mustprogress nofree norecurse nosync nounwind willreturn memory(none) "amdgpu-flat-work-group-size"="1,1024" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "uniform-work-group-size"="true" }
//.
-// OPTNONE: !0 = !{i32 1, !"amdgpu_code_object_version", i32 400}
+// OPTNONE: !0 = !{i32 1, !"amdgpu_code_object_version", i32 500}
// OPTNONE: !1 = !{i32 1, !"amdgpu_printf_kind", !"hostcall"}
// OPTNONE: !2 = !{i32 1, !"wchar_size", i32 4}
//.
-// OPT: !0 = !{i32 1, !"amdgpu_code_object_version", i32 400}
+// OPT: !0 = !{i32 1, !"amdgpu_code_object_version", i32 500}
// OPT: !1 = !{i32 1, !"amdgpu_printf_kind", !"hostcall"}
// OPT: !2 = !{i32 1, !"wchar_size", i32 4}
//.
diff --git a/clang/test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl b/clang/test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl
index e574b1f64c499bd..2cf1286e2b54e8e 100644
--- a/clang/test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl
+++ b/clang/test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl
@@ -703,7 +703,7 @@ kernel void test_target_features_kernel(global int *i) {
// GFX900: attributes #8 = { nounwind }
// GFX900: attributes #9 = { convergent nounwind }
//.
-// NOCPU: !0 = !{i32 1, !"amdgpu_code_object_version", i32 400}
+// NOCPU: !0 = !{i32 1, !"amdgpu_code_object_version", i32 500}
// NOCPU: !1 = !{i32 1, !"wchar_size", i32 4}
// NOCPU: !2 = !{i32 2, i32 0}
// NOCPU: !3 = !{i32 1, i32 0, i32 1, i32 0}
@@ -721,7 +721,7 @@ kernel void test_target_features_kernel(global int *i) {
// NOCPU: !15 = !{i32 1}
// NOCPU: !16 = !{!"int*"}
//.
-// GFX900: !0 = !{i32 1, !"amdgpu_code_object_version", i32 400}
+// GFX900: !0 = !{i32 1, !"amdgpu_code_object_version", i32 500}
// GFX900: !1 = !{i32 1, !"wchar_size", i32 4}
// GFX900: !2 = !{i32 2, i32 0}
// GFX900: !3 = !{!4, !4, i64 0}
diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn.cl b/clang/test/CodeGenOpenCL/builtins-amdgcn.cl
index 8938642e3b19f8c..74b5ef03b52f21e 100644
--- a/clang/test/CodeGenOpenCL/builtins-amdgcn.cl
+++ b/clang/test/CodeGenOpenCL/builtins-amdgcn.cl
@@ -599,13 +599,13 @@ void test_get_local_id(int d, global int *out)
}

// CHECK-LABEL: @test_get_workgroup_size(
-// CHECK: call align 4 dereferenceable(64) ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
-// CHECK: getelementptr inbounds i8, ptr addrspace(4) 

Error: Command failed due to missing milestone.

@llvmbot
Copy link
Collaborator

llvmbot commented Sep 12, 2023

@llvm/pr-subscribers-clang

Changes

Also update LIT tests and docs.
For more details, see
https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata

Differential Revision: https://reviews.llvm.org/D129818

Patch is 1.92 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/65410.diff

109 Files Affected:

  • (modified) clang/include/clang/Driver/Options.td (+2-2)
  • (modified) clang/lib/Driver/ToolChains/CommonArgs.cpp (+1-1)
  • (modified) clang/test/CodeGenCUDA/amdgpu-code-object-version.cu (+1-1)
  • (modified) clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu (+2-2)
  • (modified) clang/test/CodeGenHIP/default-attributes.hip (+2-2)
  • (modified) clang/test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl (+2-2)
  • (modified) clang/test/CodeGenOpenCL/builtins-amdgcn.cl (+5-5)
  • (modified) clang/test/Driver/hip-device-libs.hip (+8-8)
  • (modified) lld/test/ELF/emulation-amdgpu.s (+1-1)
  • (modified) lld/test/ELF/lto/amdgcn-oses.ll (+1-1)
  • (modified) llvm/docs/AMDGPUUsage.rst (+7-8)
  • (modified) llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/crash-stack-address-O0.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/dropped_debug_info_assert.ll (+28-29)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-assert-align.ll (+8-8)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-atomicrmw.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-call-abi-attribute-hints.ll (+53-55)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-call-implicit-args.ll (+468-484)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-call-return-values.ll (+1578-1623)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-call-sret.ll (+31-32)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-call.ll (+2013-2077)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-constant-fold-vector-op.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-indirect-call.ll (+31-32)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-inline-asm.ll (+51-51)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-sibling-call.ll (+18-18)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-tail-call.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-addrspacecast.mir (+28-39)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.is.private.ll (+5-6)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.is.shared.ll (+5-6)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/non-entry-alloca.ll (+7-4)
  • (modified) llvm/test/CodeGen/AMDGPU/abi-attribute-hints-undefined-behavior.ll (+75-35)
  • (modified) llvm/test/CodeGen/AMDGPU/addrspacecast-constantexpr.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/addrspacecast.gfx6.ll (+46-20)
  • (modified) llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-pow-codegen.ll (+30-30)
  • (modified) llvm/test/CodeGen/AMDGPU/annotate-kernel-features-hsa-call.ll (+12-12)
  • (modified) llvm/test/CodeGen/AMDGPU/annotate-kernel-features-hsa.ll (+8-7)
  • (modified) llvm/test/CodeGen/AMDGPU/attributor-noopt.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/blender-no-live-segment-at-def-implicit-def.ll (+22-24)
  • (modified) llvm/test/CodeGen/AMDGPU/branch-folding-implicit-def-subreg.ll (+384-366)
  • (modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage-agpr.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage0.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage1.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage2.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/call-alias-register-usage3.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/call-argument-types.ll (+1219-1461)
  • (modified) llvm/test/CodeGen/AMDGPU/call-waitcnt.ll (+42-47)
  • (modified) llvm/test/CodeGen/AMDGPU/callee-special-input-sgprs-fixed-abi.ll (+8-12)
  • (modified) llvm/test/CodeGen/AMDGPU/cc-update.ll (+144-160)
  • (modified) llvm/test/CodeGen/AMDGPU/cf-loop-on-constant.ll (+45-45)
  • (modified) llvm/test/CodeGen/AMDGPU/codegen-internal-only-func.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/collapse-endcf.ll (+10-10)
  • (modified) llvm/test/CodeGen/AMDGPU/cross-block-use-is-not-abi-copy.ll (+28-32)
  • (modified) llvm/test/CodeGen/AMDGPU/cvt_f32_ubyte.ll (+20-20)
  • (modified) llvm/test/CodeGen/AMDGPU/dagcombine-lshr-and-cmp.ll (+2-4)
  • (modified) llvm/test/CodeGen/AMDGPU/ds_read2.ll (+28-32)
  • (modified) llvm/test/CodeGen/AMDGPU/dwarf-multi-register-use-crash.ll (+6-6)
  • (modified) llvm/test/CodeGen/AMDGPU/elf-header-osabi.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/flat-scratch-init.ll (+17-19)
  • (modified) llvm/test/CodeGen/AMDGPU/fneg-fabs.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/gfx11-user-sgpr-init16-bug.ll (+19-12)
  • (modified) llvm/test/CodeGen/AMDGPU/global_atomics_scan_fadd.ll (+550-605)
  • (modified) llvm/test/CodeGen/AMDGPU/global_atomics_scan_fmax.ll (+330-363)
  • (modified) llvm/test/CodeGen/AMDGPU/global_atomics_scan_fmin.ll (+330-363)
  • (modified) llvm/test/CodeGen/AMDGPU/global_atomics_scan_fsub.ll (+550-605)
  • (modified) llvm/test/CodeGen/AMDGPU/indirect-addressing-term.ll (+19-19)
  • (modified) llvm/test/CodeGen/AMDGPU/insert-delay-alu-bug.ll (+57-59)
  • (modified) llvm/test/CodeGen/AMDGPU/kernel-vgpr-spill-mubuf-with-voffset.ll (+25-32)
  • (modified) llvm/test/CodeGen/AMDGPU/lds-frame-extern.ll (+124-112)
  • (modified) llvm/test/CodeGen/AMDGPU/lds-global-non-entry-func.ll (+48-22)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.is.private.ll (+8-7)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.is.shared.ll (+8-7)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.dbg.value.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/lower-kernargs.ll (+118-118)
  • (modified) llvm/test/CodeGen/AMDGPU/lower-module-lds-via-hybrid.ll (+2-4)
  • (modified) llvm/test/CodeGen/AMDGPU/lower-module-lds-via-table.ll (+1-9)
  • (modified) llvm/test/CodeGen/AMDGPU/module-lds-false-sharing.ll (+48-44)
  • (modified) llvm/test/CodeGen/AMDGPU/need-fp-from-vgpr-spills.ll (+33-61)
  • (modified) llvm/test/CodeGen/AMDGPU/partial-sgpr-to-vgpr-spills.ll (+8-8)
  • (modified) llvm/test/CodeGen/AMDGPU/preserve-wwm-copy-dst-reg.ll (+72-72)
  • (modified) llvm/test/CodeGen/AMDGPU/promote-alloca-calling-conv.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/reqd-work-group-size.ll (+24-24)
  • (modified) llvm/test/CodeGen/AMDGPU/sgpr-spill-no-vgprs.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/sgpr-spill-update-only-slot-indexes.ll (+10-23)
  • (modified) llvm/test/CodeGen/AMDGPU/simple-indirect-call.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/sopk-no-literal.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/spill-m0.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/stacksave_stackrestore.ll (+30-30)
  • (modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-all-any.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-all-not-supported.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-all-off.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-all-on.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-any-off-1.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-any-off-2.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-any-on-1.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/tid-mul-func-xnack-any-on-2.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/tid-one-func-xnack-any.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/tid-one-func-xnack-not-supported.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/tid-one-func-xnack-off.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/tid-one-func-xnack-on.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/tuple-allocation-failure.ll (+157-167)
  • (modified) llvm/test/CodeGen/AMDGPU/unstructured-cfg-def-use-issue.ll (+15-15)
  • (modified) llvm/test/CodeGen/AMDGPU/vgpr-spill-placement-issue61083.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/vgpr_constant_to_sgpr.ll (+25-36)
  • (modified) llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-gfx10.s (+4)
  • (modified) llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-gfx90a.s (+3)
  • (modified) llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-sgpr.s (+3)
  • (modified) llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-vgpr.s (+3)
  • (modified) llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-zeroed-gfx10.s (+1)
  • (modified) openmp/libomptarget/test/offloading/thread_limit.c (+1)
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index a5f5ca29053b43b..0484575ca482717 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -4616,12 +4616,12 @@ defm amdgpu_ieee : BoolOption<"m", "amdgpu-ieee",
  NegFlag>, Group;

def mcode_object_version_EQ : Joined<["-"], "mcode-object-version=">, Group,
-  HelpText<"Specify code object ABI version. Defaults to 4. (AMDGPU only)">,
+  HelpText<"Specify code object ABI version. Defaults to 5. (AMDGPU only)">,
  Visibility<[ClangOption, CC1Option]>,
  Values<"none,2,3,4,5">,
  NormalizedValuesScope<"TargetOptions">,
  NormalizedValues<["COV_None", "COV_2", "COV_3", "COV_4", "COV_5"]>,
-  MarshallingInfoEnum, "COV_4">;
+  MarshallingInfoEnum, "COV_5">;

defm cumode : SimpleMFlag<"cumode",
  "Specify CU wavefront", "Specify WGP wavefront",
diff --git a/clang/lib/Driver/ToolChains/CommonArgs.cpp b/clang/lib/Driver/ToolChains/CommonArgs.cpp
index 1db05b47e11ba6a..471ff2541d355b6 100644
--- a/clang/lib/Driver/ToolChains/CommonArgs.cpp
+++ b/clang/lib/Driver/ToolChains/CommonArgs.cpp
@@ -2341,7 +2341,7 @@ void tools::checkAMDGPUCodeObjectVersion(const Driver &D,

unsigned tools::getAMDGPUCodeObjectVersion(const Driver &D,
                                           const llvm::opt::ArgList &Args) {
-  unsigned CodeObjVer = 4; // default
+  unsigned CodeObjVer = 5; // default
  if (auto *CodeObjArg = getAMDGPUCodeObjectArgument(D, Args))
    StringRef(CodeObjArg->getValue()).getAsInteger(0, CodeObjVer);
  return CodeObjVer;
diff --git a/clang/test/CodeGenCUDA/amdgpu-code-object-version.cu b/clang/test/CodeGenCUDA/amdgpu-code-object-version.cu
index 16505b34c4a6e09..62ccc2bd4d05db3 100644
--- a/clang/test/CodeGenCUDA/amdgpu-code-object-version.cu
+++ b/clang/test/CodeGenCUDA/amdgpu-code-object-version.cu
@@ -1,7 +1,7 @@
// Create module flag for code object version.

// RUN: �lang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -emit-llvm \
-// RUN:   -o -  | FileCheck  -check-prefix=V4
+// RUN:   -o -  | FileCheck  -check-prefix=V5

// RUN: �lang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -emit-llvm \
// RUN:   -mcode-object-version=2 -o -  | FileCheck -check-prefix=V2 
diff --git a/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu b/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu
index c661b06d57b78d7..b885b9991a9bd31 100644
--- a/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu
+++ b/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu
@@ -1,10 +1,10 @@
// RUN: �lang_cc1 -triple amdgcn-amd-amdhsa \
-// RUN:     -fcuda-is-device -emit-llvm -o - -x hip  \
+// RUN:     -fcuda-is-device -mcode-object-version=4 -emit-llvm -o - -x hip  \
// RUN:     | FileCheck -check-prefix=PRECOV5 


// RUN: �lang_cc1 -triple amdgcn-amd-amdhsa \
-// RUN:     -fcuda-is-device -mcode-object-version=5 -emit-llvm -o - -x hip  \
+// RUN:     -fcuda-is-device -emit-llvm -o - -x hip  \
// RUN:     | FileCheck -check-prefix=COV5 

// RUN: �lang_cc1 -triple amdgcn-amd-amdhsa \
diff --git a/clang/test/CodeGenHIP/default-attributes.hip b/clang/test/CodeGenHIP/default-attributes.hip
index 80aa1ee0700628f..9c9ea521271b99b 100644
--- a/clang/test/CodeGenHIP/default-attributes.hip
+++ b/clang/test/CodeGenHIP/default-attributes.hip
@@ -46,11 +46,11 @@ __global__ void kernel() {
// OPT: attributes #0 = { mustprogress nofree norecurse nosync nounwind willreturn memory(none) "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
// OPT: attributes #1 = { mustprogress nofree norecurse nosync nounwind willreturn memory(none) "amdgpu-flat-work-group-size"="1,1024" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "uniform-work-group-size"="true" }
//.
-// OPTNONE: !0 = !{i32 1, !"amdgpu_code_object_version", i32 400}
+// OPTNONE: !0 = !{i32 1, !"amdgpu_code_object_version", i32 500}
// OPTNONE: !1 = !{i32 1, !"amdgpu_printf_kind", !"hostcall"}
// OPTNONE: !2 = !{i32 1, !"wchar_size", i32 4}
//.
-// OPT: !0 = !{i32 1, !"amdgpu_code_object_version", i32 400}
+// OPT: !0 = !{i32 1, !"amdgpu_code_object_version", i32 500}
// OPT: !1 = !{i32 1, !"amdgpu_printf_kind", !"hostcall"}
// OPT: !2 = !{i32 1, !"wchar_size", i32 4}
//.
diff --git a/clang/test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl b/clang/test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl
index e574b1f64c499bd..2cf1286e2b54e8e 100644
--- a/clang/test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl
+++ b/clang/test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl
@@ -703,7 +703,7 @@ kernel void test_target_features_kernel(global int *i) {
// GFX900: attributes #8 = { nounwind }
// GFX900: attributes #9 = { convergent nounwind }
//.
-// NOCPU: !0 = !{i32 1, !"amdgpu_code_object_version", i32 400}
+// NOCPU: !0 = !{i32 1, !"amdgpu_code_object_version", i32 500}
// NOCPU: !1 = !{i32 1, !"wchar_size", i32 4}
// NOCPU: !2 = !{i32 2, i32 0}
// NOCPU: !3 = !{i32 1, i32 0, i32 1, i32 0}
@@ -721,7 +721,7 @@ kernel void test_target_features_kernel(global int *i) {
// NOCPU: !15 = !{i32 1}
// NOCPU: !16 = !{!"int*"}
//.
-// GFX900: !0 = !{i32 1, !"amdgpu_code_object_version", i32 400}
+// GFX900: !0 = !{i32 1, !"amdgpu_code_object_version", i32 500}
// GFX900: !1 = !{i32 1, !"wchar_size", i32 4}
// GFX900: !2 = !{i32 2, i32 0}
// GFX900: !3 = !{!4, !4, i64 0}
diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn.cl b/clang/test/CodeGenOpenCL/builtins-amdgcn.cl
index 8938642e3b19f8c..74b5ef03b52f21e 100644
--- a/clang/test/CodeGenOpenCL/builtins-amdgcn.cl
+++ b/clang/test/CodeGenOpenCL/builtins-amdgcn.cl
@@ -599,13 +599,13 @@ void test_get_local_id(int d, global int *out)
}

// CHECK-LABEL: @test_get_workgroup_size(
-// CHECK: call align 4 dereferenceable(64) ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
-// CHECK: getelementptr inbounds i8, ptr addrspace(4) 

Error: Command failed due to missing milestone.

@saiislam saiislam merged commit 0a8d17e into llvm:main Sep 12, 2023
3 of 4 checks passed
@jplehr
Copy link
Contributor

jplehr commented Sep 12, 2023

Hey @saiislam this broke the AMDGPU OpenMP buildbot https://lab.llvm.org/buildbot/#/builders/193/builds/38293
If you need assistance, let me know.

saiislam added a commit to saiislam/llvm-project that referenced this pull request Sep 12, 2023
saiislam added a commit that referenced this pull request Sep 12, 2023
mhalk pushed a commit to mhalk/llvm-project that referenced this pull request Sep 13, 2023
Also update LIT tests and docs.
For more details, see
https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata

Reviewed By: arsenm, jhuber6

Github PR: llvm#65410

Differential Revision: https://reviews.llvm.org/D129818
ZijunZhaoCCK pushed a commit to ZijunZhaoCCK/llvm-project that referenced this pull request Sep 19, 2023
Summary:
There is currently effort to change over the default AMDGPU code object
version llvm#65410. However, this
unfortunately causes problems in the LLVM LibC test suite that leads to
a hang while executing. This is most likely a bug to do with indirect
call optimization, as it can be avoided without optimizations or with
manually preventing inlining in the AMDGPU startup code.

This patch sets the AMDGPU code object version to be four explicitly on
the LibC test suite. This should unblock the efforts to move the default
to 5 without breaking the test suite. This isn't a great solution, but
there is currently some time pressure to get COV5 landed and this seems
to be the easiest solution.
ZijunZhaoCCK pushed a commit to ZijunZhaoCCK/llvm-project that referenced this pull request Sep 19, 2023
Also update LIT tests and docs.
For more details, see
https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata

Reviewed By: arsenm, jhuber6

Github PR: llvm#65410

Differential Revision: https://reviews.llvm.org/D129818
ZijunZhaoCCK pushed a commit to ZijunZhaoCCK/llvm-project that referenced this pull request Sep 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:AMDGPU clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang Clang issues not falling into any other category lld:ELF lld llvm:globalisel
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants