
[AMDGPU] Stop combining arbitrary offsets into PAL relocs #80034

Merged 2 commits into llvm:main on Jan 31, 2024

Conversation

@jayfoad (Contributor) commented Jan 30, 2024

PAL uses ELF REL (not RELA) relocations, which can only store a 32-bit
addend in the instruction, even for reloc types like R_AMDGPU_ABS32_HI
that require the upper 32 bits of a 64-bit address calculation to be
correct. This means it is not safe to fold an arbitrary offset into a
GlobalAddressSDNode, so stop doing that.

In practice this is mostly a problem for small negative offsets, which do
not work as expected because PAL treats the 32-bit addend as unsigned.
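
The second paragraph is easiest to see with some arithmetic. Below is a minimal standalone sketch (plain C++, not LLVM code; the symbol value S and the exact reloc resolution are illustrative assumptions) showing how a zero-extended 32-bit addend loses the borrow that a small negative 64-bit offset needs, so the hi half of the address comes out wrong:

```cpp
// Hypothetical example: symbol at S, compiler wants to fold offset A = -8.
#include <cstdint>
#include <cstdio>

int main() {
  uint64_t S = 0x0000000100000000; // hypothetical symbol address
  int64_t  A = -8;                 // offset the compiler wanted to fold

  // Upper 32 bits the hi half of the address actually needs:
  uint32_t WantHi = uint32_t(uint64_t(S + A) >> 32);          // 0x00000000

  // With REL, only a 32-bit addend fits in the instruction, and PAL treats
  // it as unsigned, so the hi half is effectively resolved as:
  uint32_t Addend32 = uint32_t(A);                            // 0xFFFFFFF8
  uint32_t GotHi = uint32_t((S + uint64_t(Addend32)) >> 32);  // 0x00000001

  printf("want hi32=0x%08x got hi32=0x%08x\n", WantHi, GotHi);
  return WantHi == GotHi ? 0 : 1;
}
```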

@llvmbot (Collaborator) commented Jan 30, 2024

@llvm/pr-subscribers-backend-amdgpu

Author: Jay Foad (jayfoad)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/80034.diff

4 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/SIISelLowering.cpp (+11)
  • (modified) llvm/test/CodeGen/AMDGPU/global-constant.ll (+10-10)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.memcpy.ll (+2-3)
  • (modified) llvm/test/CodeGen/AMDGPU/rel32.ll (+16-3)
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index 7ab062bcc4da7..e6ca3bd0d9f34 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -7133,6 +7133,17 @@ SDValue SITargetLowering::lowerBUILD_VECTOR(SDValue Op,
 
 bool
 SITargetLowering::isOffsetFoldingLegal(const GlobalAddressSDNode *GA) const {
+  // Subtargets that use ELF REL relocations (instead of RELA) can only store a
+  // 32-bit addend in the instruction, so it is not safe to allow offset folding
+  // which can create arbitrary 64-bit addends. (This is only a problem for
+  // R_AMDGPU_*32_HI relocations since other relocation types are unaffected by
+  // the high 32 bits of the addend.)
+  //
+  // This should be kept in sync with how HasRelocationAddend is initialized in
+  // the constructor of ELFAMDGPUAsmBackend.
+  if (!Subtarget->isAmdHsaOS())
+    return false;
+
   // We can fold offsets for anything that doesn't require a GOT relocation.
   return (GA->getAddressSpace() == AMDGPUAS::GLOBAL_ADDRESS ||
           GA->getAddressSpace() == AMDGPUAS::CONSTANT_ADDRESS ||
diff --git a/llvm/test/CodeGen/AMDGPU/global-constant.ll b/llvm/test/CodeGen/AMDGPU/global-constant.ll
index 336f012ec80e4..38b9c5df7faa1 100644
--- a/llvm/test/CodeGen/AMDGPU/global-constant.ll
+++ b/llvm/test/CodeGen/AMDGPU/global-constant.ll
@@ -19,14 +19,14 @@
 ; GCN-DEFAULT: s_addc_u32 s{{[0-9]+}}, s[[PC1_HI]], private2@rel32@hi+12
 
 ; MESA uses absolute relocations.
-; GCN-MESA: s_add_u32 s2, s4, private1@abs32@lo
-; GCN-MESA: s_addc_u32 s3, s5, private1@abs32@hi
+; GCN-MESA: s_add_u32 s2, private1@abs32@lo, s4
+; GCN-MESA: s_addc_u32 s3, private1@abs32@hi, s5
 
 ; PAL uses absolute relocations.
-; GCN-PAL:    s_add_u32 s2, s4, private1@abs32@lo
-; GCN-PAL:    s_addc_u32 s3, s5, private1@abs32@hi
-; GCN-PAL:    s_add_u32 s4, s4, private2@abs32@lo
-; GCN-PAL:    s_addc_u32 s5, s5, private2@abs32@hi
+; GCN-PAL:    s_add_u32 s2, private1@abs32@lo, s4
+; GCN-PAL:    s_addc_u32 s3, private1@abs32@hi, s5
+; GCN-PAL:    s_add_u32 s4, private2@abs32@lo, s4
+; GCN-PAL:    s_addc_u32 s5, private2@abs32@hi, s5
 
 ; R600-LABEL: private_test
 define amdgpu_kernel void @private_test(i32 %index, ptr addrspace(1) %out) {
@@ -44,13 +44,13 @@ define amdgpu_kernel void @private_test(i32 %index, ptr addrspace(1) %out) {
 ; GCN-DEFAULT: s_add_u32 s{{[0-9]+}}, s[[PC0_LO]], available_externally@gotpcrel32@lo+4
 ; GCN-DEFAULT: s_addc_u32 s{{[0-9]+}}, s[[PC0_HI]], available_externally@gotpcrel32@hi+12
 
-; GCN-MESA:    s_mov_b32 s1, available_externally@abs32@hi+4
-; GCN-MESA:    s_mov_b32 s0, available_externally@abs32@lo+4
+; GCN-MESA:    s_mov_b32 s1, available_externally@abs32@hi
+; GCN-MESA:    s_mov_b32 s0, available_externally@abs32@lo
 
 ; R600-LABEL: available_externally_test
 
-; GCN-PAL:    s_mov_b32 s3, available_externally@abs32@hi+4
-; GCN-PAL:    s_mov_b32 s2, available_externally@abs32@lo+4
+; GCN-PAL:    s_mov_b32 s3, available_externally@abs32@hi
+; GCN-PAL:    s_mov_b32 s2, available_externally@abs32@lo
 define amdgpu_kernel void @available_externally_test(ptr addrspace(1) %out) {
   %ptr = getelementptr [256 x i32], ptr addrspace(4) @available_externally, i32 0, i32 1
   %val = load i32, ptr addrspace(4) %ptr
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.memcpy.ll b/llvm/test/CodeGen/AMDGPU/llvm.memcpy.ll
index c7cefb37281ed..41e8762311306 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.memcpy.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.memcpy.ll
@@ -315,10 +315,9 @@ define amdgpu_kernel void @test_small_memcpy_i64_global_to_global_align16(ptr ad
 
 ; FUNC-LABEL: {{^}}test_memcpy_const_string_align4:
 ; SI: s_getpc_b64
-; SI: s_add_u32 s{{[0-9]+}}, s{{[0-9]+}}, hello.align4@rel32@lo+20
+; SI: s_add_u32 s{{[0-9]+}}, s{{[0-9]+}}, hello.align4@rel32@lo+4
 ; SI: s_addc_u32
-; SI-DAG: s_load_dwordx4
-; SI-DAG: s_load_dwordx4
+; SI-DAG: s_load_dwordx8
 ; SI-DAG: s_load_dwordx2
 ; SI-DAG: buffer_store_dwordx4
 ; SI-DAG: buffer_store_dwordx4
diff --git a/llvm/test/CodeGen/AMDGPU/rel32.ll b/llvm/test/CodeGen/AMDGPU/rel32.ll
index aa2d269523d80..59d64f3ad2b5e 100644
--- a/llvm/test/CodeGen/AMDGPU/rel32.ll
+++ b/llvm/test/CodeGen/AMDGPU/rel32.ll
@@ -6,9 +6,22 @@
 
 ; CHECK-LABEL: rel32_neg_offset:
 ; CHECK: s_getpc_b64 s[[[LO:[0-9]+]]:[[HI:[0-9]+]]]
-; CHECK-NEXT: s_add_u32 s[[LO]], s[[LO]], g@rel32@lo-4
-; CHECK-NEXT: s_addc_u32 s[[HI]], s[[HI]], g@rel32@hi+4
+; CHECK-NEXT: s_add_u32 s[[LO]], s[[LO]], g@rel32@lo+4
+; CHECK-NEXT: s_addc_u32 s[[HI]], s[[HI]], g@rel32@hi+12
+; CHECK-NEXT: s_add_u32 s[[LO]], s[[LO]], -16
+; CHECK-NEXT: s_addc_u32 s[[HI]], s[[HI]], -1
 define ptr addrspace(4) @rel32_neg_offset() {
-  %r = getelementptr i32, ptr addrspace(4) @g, i64 -2
+  %r = getelementptr i32, ptr addrspace(4) @g, i64 -4
+  ret ptr addrspace(4) %r
+}
+
+; CHECK-LABEL: rel32_large_offset:
+; CHECK: s_getpc_b64 s[[[LO:[0-9]+]]:[[HI:[0-9]+]]]
+; CHECK-NEXT: s_add_u32 s[[LO]], s[[LO]], g@rel32@lo+4
+; CHECK-NEXT: s_addc_u32 s[[HI]], s[[HI]], g@rel32@hi+12
+; CHECK-NEXT: s_add_u32 s[[LO]], s[[LO]], 0x502f9000
+; CHECK-NEXT: s_addc_u32 s[[HI]], s[[HI]], 9
+define ptr addrspace(4) @rel32_large_offset() {
+  %r = getelementptr i32, ptr addrspace(4) @g, i64 10000000000
   ret ptr addrspace(4) %r
 }

@jayfoad (Contributor, Author) commented Jan 30, 2024

Possible future improvements:

  • Change PAL to use RELA relocations.
  • Failing that, implement some DAG combines to fold the offset into a GlobalAddressSDNode when we know that the resulting offset will fit (a rough sketch of such a check follows this list).
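
As a sketch of what the second bullet's safety condition might look like (not code from this PR; the helper name and parameter are made up for illustration), and assuming the loader zero-extends the stored addend, an offset would only be foldable when the whole value fits in an unsigned 32-bit addend:

```cpp
#include <cstdint>

// Hypothetical shape of the "only fold when it fits" check. ResultingOffset
// would be the existing GlobalAddressSDNode offset plus the constant being
// folded into it.
bool offsetFitsInRelAddend(int64_t ResultingOffset) {
  // Assuming PAL zero-extends the stored 32-bit addend, only offsets that
  // are representable as an unsigned 32-bit value resolve correctly.
  return ResultingOffset >= 0 &&
         static_cast<uint64_t>(ResultingOffset) <= UINT32_MAX;
}
```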

@nhaehnle (Collaborator) left a comment

Ack

@@ -7133,6 +7133,17 @@ SDValue SITargetLowering::lowerBUILD_VECTOR(SDValue Op,

bool
SITargetLowering::isOffsetFoldingLegal(const GlobalAddressSDNode *GA) const {
// Subtargets that use ELF REL relocations (instead of RELA) can only store a
Contributor:

not a subtarget?

@jayfoad (Contributor, Author):

Can you suggest a better word? Subtarget->isAmdHsaOS() suggests that this is a property of the subtarget, or something else here is badly named.

Contributor:

The subtarget really shouldn't include this. It's the OS, which is a top-level module/triple property.

@jayfoad merged commit c2c650f into llvm:main on Jan 31, 2024
3 of 4 checks passed
@jayfoad deleted the pal-reloc-addend branch on January 31, 2024 at 10:28
@jayfoad (Contributor, Author) commented Feb 1, 2024

This change affected Mesa too. I have raised a Mesa issue to support RELA relocations so we could revert the Mesa part of this: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10542
