[RISCV] Disable gp relaxation if part of object unreachable #72655

nemanjai · 2023-11-17T14:59:55Z

Linker gp relaxation is greedy. It will eliminate the LUI with R_RISCV_HI20 if the base of the object is reachable from the gp. The relaxation on the R_RISCV_LO12 will be rejected if it is not reachable, but that is too late if the corresponding R_RISCV_HI20 is gone.
This patch disables relaxation if the entire portion of the object that can be relocated together is not reachable.

It is important that this does not necessarily mean the size of the object since the size doesn't matter if its alignment is smaller than its size.

In order to achieve correctness without excessively pessimizing relaxation for large objects, relaxation is rejected if the base of the object + min(s, ma) is not reachable from gp, where:
s - size of the object
ma - maximum alignment of the section that contains
the object.

Linker gp relaxation is greedy. It will eliminate the LUI with R_RISCV_HI20 if the base of the object is reachable from the gp. The relaxation on the R_RISCV_LO12 will be rejected if it is not reachable, but that is too late if the corresponding R_RISCV_HI20 is gone. This patch disables relaxation if the entire portion of the object that can be relocated together is not reachable. It is important that this does not necessarily mean the size of the object since the size doesn't matter if its alignment is smaller than its size. In order to achieve correctness without excessively pessimizing relaxation for large objects, relaxation is rejected if the base of the object + min(s, ma) is not reachable from gp, where: s - size of the object ma - maximum alignment of the section that contains the object.

llvmbot · 2023-11-17T15:00:23Z

@llvm/pr-subscribers-lld

@llvm/pr-subscribers-lld-elf

Author: Nemanja Ivanovic (nemanjai)

Changes

Linker gp relaxation is greedy. It will eliminate the LUI with R_RISCV_HI20 if the base of the object is reachable from the gp. The relaxation on the R_RISCV_LO12 will be rejected if it is not reachable, but that is too late if the corresponding R_RISCV_HI20 is gone.
This patch disables relaxation if the entire portion of the object that can be relocated together is not reachable.

It is important that this does not necessarily mean the size of the object since the size doesn't matter if its alignment is smaller than its size.

In order to achieve correctness without excessively pessimizing relaxation for large objects, relaxation is rejected if the base of the object + min(s, ma) is not reachable from gp, where:
s - size of the object
ma - maximum alignment of the section that contains
the object.

Full diff: https://github.com/llvm/llvm-project/pull/72655.diff

2 Files Affected:

(modified) lld/ELF/Arch/RISCV.cpp (+14)
(added) lld/test/ELF/riscv-relax-gp-partial.s (+58)

diff --git a/lld/ELF/Arch/RISCV.cpp b/lld/ELF/Arch/RISCV.cpp
index 6413dcd7dcd7976..d4c578934a0bb39 100644
--- a/lld/ELF/Arch/RISCV.cpp
+++ b/lld/ELF/Arch/RISCV.cpp
@@ -651,6 +651,20 @@ static void relaxHi20Lo12(const InputSection &sec, size_t i, uint64_t loc,
   if (!isInt<12>(r.sym->getVA(r.addend) - gp->getVA()))
     return;
 
+  // The symbol may be accessed in multiple pieces. We need to make sure that
+  // all of the possible accesses are relaxed or none are. This prevents
+  // relaxing the hi relocation and being unable to relax one of the low
+  // relocations. The compiler will only access multiple pieces of an object
+  // with low relocations on the memory op if the alignment allows it.
+  // Therefore it should suffice to check that the smaller of the alignment
+  // and size can be reached from GP.
+  uint32_t alignAdjust =
+      r.sym->getOutputSection() ? r.sym->getOutputSection()->addralign : 0;
+  alignAdjust = std::min<uint32_t>(alignAdjust, r.sym->getSize());
+
+  if (!isInt<12>(r.sym->getVA() + alignAdjust - gp->getVA()))
+    return;
+
   switch (r.type) {
   case R_RISCV_HI20:
     // Remove lui rd, %hi20(x).
diff --git a/lld/test/ELF/riscv-relax-gp-partial.s b/lld/test/ELF/riscv-relax-gp-partial.s
new file mode 100644
index 000000000000000..79984d9b65b7cfb
--- /dev/null
+++ b/lld/test/ELF/riscv-relax-gp-partial.s
@@ -0,0 +1,58 @@
+# REQUIRES: riscv
+# RUN: rm -rf %t && split-file %s %t && cd %t
+
+# RUN: llvm-mc -filetype=obj -triple=riscv32-unknown-elf -mattr=+relax a.s -o rv32.o
+# RUN: llvm-mc -filetype=obj -triple=riscv64-unknown-elf -mattr=+relax a.s -o rv64.o
+
+# RUN: ld.lld --relax-gp --undefined=__global_pointer$ rv32.o lds -o rv32
+# RUN: ld.lld --relax-gp --undefined=__global_pointer$ rv64.o lds -o rv64
+# RUN: llvm-objdump -td -M no-aliases --no-show-raw-insn rv32 | FileCheck %s
+# RUN: llvm-objdump -td -M no-aliases --no-show-raw-insn rv64 | FileCheck %s
+ 
+# CHECK: 00000080 l       .data  {{0+}}08 Var0
+# CHECK: 00001000 l       .data  {{0+}}80 Var1
+# CHECK: 00000815 g       .sdata {{0+}}00 __global_pointer$
+
+# CHECK: <_start>:
+# CHECK-NOT:  lui
+# CHECK-NEXT: lw      a0, -1941(gp)
+# CHECK-NEXT: lw      a1, -1937(gp)
+# CHECK-NEXT: lui     a1, 1
+# CHECK-NEXT: lw      a0, 0(a1)
+# CHECK-NEXT: lw      a1, 124(a1)
+
+#--- a.s
+.global _start
+_start:
+        lui     a1, %hi(Var0)
+        lw      a0, %lo(Var0)(a1)
+        lw      a1, %lo(Var0+4)(a1)
+        lui     a1, %hi(Var1)
+        lw      a0, %lo(Var1)(a1)      # First part is reachable from gp
+        lw      a1, %lo(Var1+124)(a1)  # The second part is not reachable
+
+.section .rodata
+foo:
+  .space 1
+.section .sdata,"aw"
+  .space 1
+.section .data,"aw"
+  .p2align 3
+Var0:
+  .quad 0
+  .size   Var0, 8
+  .space 3960
+  .p2align 7
+Var1:
+  .quad 0
+  .zero 120
+  .size   Var1, 128
+
+#--- lds
+    SECTIONS {
+        .text : { }
+        .rodata : { }
+        .sdata : { }
+        .sbss : { }
+        .data : { }
+    }

nemanjai · 2023-11-17T15:15:52Z

Fixes #72405

MaskRay · 2023-11-18T06:06:28Z

How does this discussion address my question on #72405 (comment) ? I am not yet convinced that this is a linker problem.

nemanjai · 2023-11-20T14:35:14Z

How does this discussion address my question on #72405 (comment) ? I am not yet convinced that this is a linker problem.

I believe that @topperc has addressed that particular comment (i.e. the compiler checks the alignment of the object it is accessing). The address of an aligned object cannot reside in the range you mentioned (otherwise it would not be aligned).

This patch particularly addresses the issue where the object is aligned as required but GP is less aligned (since no requirement exists that GP be aligned).

MaskRay · 2023-11-21T16:53:35Z

That answer doesn't resolve my question. This test uses a very small alignment and cannot guarantee %hi(Var0) == %hi(Var0+4). As such, it cannot prove that this is an issue that should be done at the linker side rather than at the compiler side (possibly disable this %hi-sharing feature). The description does not say anything about the alignment value 4096 (1<<12).

It seems that we need some discussions on the right semantics possibly in riscv-elf-psabi-doc. A clarification on riscv-asm-manual may be useful as well to help hand-written assembly.

In addition, I asked for source code that leads to this code generation, which is not provided.

nemanjai · 2023-11-21T17:31:16Z

It seems that we need some discussions on the right semantics possibly in riscv-elf-psabi-doc. A clarification on riscv-asm-manual may be useful as well to help hand-written assembly.

I am certainly not against this.

In addition, I asked for source code that leads to this code generation, which is not provided.

double NaturallyAlignedScalar = 5.;
double accessNaturallyAlignedScalar() {
  // This will demonstrate the sharing of the upper portion.
  return NaturallyAlignedScalar;
}

double NaturallyAlignedArray[4] = { 3., 4., 5., 6. };
double accessNaturallyAlignedArray() {
  // Two halves of first element accessed together,
  // all pieces of all other elements accessed separately.
  return NaturallyAlignedArray[0] + NaturallyAlignedArray[3];
}

double __attribute__((aligned(512))) OveralignedArray[64];
double accessOveralignedArray() {
  // All pieces of the array accessed together.
  return OveralignedArray[0] + OveralignedArray[63];
}

// Compile with clang -O2 --target=riscv32

That answer doesn't resolve my question. This test uses a very small alignment and cannot guarantee %hi(Var0) == %hi(Var0+4). As such, it cannot prove that this is an issue that should be done at the linker side rather than at the compiler side ...

I don't follow the logic here. I don't understand your claim here:

IIUC this requires that the compiler can prove %hi(Var0) == %hi(Var0+4),
i.e. Var0%4096 does not reside in the range [0x7fc,0x800).
A necessary but not sufficient condition is that Var0's section's alignment is at least 4096.

Why is it necessary that Var0's section's alignment be at least 4096? An object that is aligned at 8 bytes cannot possibly reside in the address range that you mentioned. The only valid addresses for an 8-byte aligned object near the range you mentioned are 0x7f8 and 0x800 and neither is in the range you mentioned.

(possibly disable this %hi-sharing feature).

This would pessimize every access and we would probably need to implement an optimization in the linker to get rid of the redundant addi's. And I am not sure what problem it would solve.

The description does not say anything about the alignment value 4096 (1<<12).

Please elaborate. I can add anything to the description that you'd like me to add. I just don't understand what you're after.

MaskRay · 2023-11-22T16:38:09Z

lld/ELF/Arch/RISCV.cpp

+  // and size can be reached from GP.
+  uint32_t alignAdjust =
+      r.sym->getOutputSection() ? r.sym->getOutputSection()->addralign : 0;
+  alignAdjust = std::min<uint32_t>(alignAdjust, r.sym->getSize());


alignAdjust = std::min<uint32_t>(alignAdjust, r.sym->getSize()); is not tested

Ah yes, I added a test now.

If addend < st_size, it seems that this can be improved to min(addralign, st_size-addend)

Ah, yes, I think that is indeed the case.

Actually, this is only true if we assume that the addend on the LO12 will be equal or greater than the addend on the HI20.
Ultimately, we need to ensure that the entire window of
min(st_size, st_align) bytes that contains the relocation is reachable from GP.

MaskRay · 2023-11-22T16:39:23Z

lld/ELF/Arch/RISCV.cpp

@@ -651,6 +651,20 @@ static void relaxHi20Lo12(const InputSection &sec, size_t i, uint64_t loc,
  if (!isInt<12>(r.sym->getVA(r.addend) - gp->getVA()))
    return;

+  // The symbol may be accessed in multiple pieces. We need to make sure that


The symbol may be accessed in multiple pieces with different addends.

Fixed. Thanks.

MaskRay · 2023-11-22T16:40:43Z

lld/ELF/Arch/RISCV.cpp

+      r.sym->getOutputSection() ? r.sym->getOutputSection()->addralign : 0;
+  alignAdjust = std::min<uint32_t>(alignAdjust, r.sym->getSize());
+
+  if (!isInt<12>(r.sym->getVA() + alignAdjust - gp->getVA()))


This condition guards against

lui a1, %hi(var) lw a0, %lo(var)(a1) lw a1, %lo(var+4)(a1)

but not

lui a1, %hi(var+4) lw a0, %lo(var+4)(a1) lw a1, %lo(var)(a1)

This is absolutely true. If the addend on the HI20 is higher than the addend on the LO12, there is no guarantee whatsoever that we can't get into the same situation (relaxing and removing the HI instruction even though we need it for the LO instruction). Of course, this isn't new with this patch, the issue already existed.
I think that to be on the safe side, we should reject a relaxation of the HI20 if the addend is non-zero. What do you think?

I actually opted to just check the range of +/- adjustment since any LO12 relocations that depend on this are only allowed to access this range.

MaskRay · 2023-11-22T17:18:39Z

lld/test/ELF/riscv-relax-gp-partial.s

+  .size   Var1, 128
+
+#--- lds
+    SECTIONS {


Place SECTIONS at column 0 and indent the body by 2.

Oh, yup. Sorry.

MaskRay · 2023-11-22T17:20:37Z

Thanks for the C source example. It demonstrates the problem and justifies %hi-sharing codegen by the compiler.
This is exactly the information the description can have to clarify the issue.

I think you can edit the description to include the following

double NaturallyAlignedScalar = 5.;
double accessNaturallyAlignedScalar() { return NaturallyAlignedScalar; }
//      lui     a1, %hi(NaturallyAlignedScalar)
//      lw      a0, %lo(NaturallyAlignedScalar)(a1)
//      lw      a1, %lo(NaturallyAlignedScalar+4)(a1)

double NaturallyAlignedArray[4] = { 3., 4., 5., 6. };
double accessNaturallyAlignedArray() {
  return NaturallyAlignedArray[0] + NaturallyAlignedArray[3];
}
//      lui     a2, %hi(NaturallyAlignedArray)
//      lw      a0, %lo(NaturallyAlignedArray)(a2)
//      lw      a1, %lo(NaturallyAlignedArray+4)(a2)
//      addi    a3, a2, %lo(NaturallyAlignedArray)
//      lw      a2, 24(a3)
//      lw      a3, 28(a3)

For the accessNaturallyAlignedScalar codegen, the compiler must prove that %hi(NaturallyAlignedScalar) = %hi(NaturallyAlignedScalar+4), i.e. NaturallyAlignedScalar%0x800 is out of [0x7fc,0x800).
This requirement is satisfied because NaturallyAlignedScalar is aligned by 8.

When I commented on [0x7fc,0x800) the first time, I did not think deeply.
The alignment makes the condition satisfied. This analysis is important and should be included in the description to justify that the compiler codegen can reuse %hi20.

I have created riscv-non-isa/riscv-elf-psabi-doc#408 to clarify that
multiple R_RISCV_LO12_I/R_RISCV_LO12_S can share one single R_RISCV_HI20.

The patch handles

// when gp < var
lui     a1, %hi(var)
lw      a0, %lo(var)(a1)
lw      a1, %lo(var+4)(a1)

but not

// when var < gp
lui     a1, %hi(var+4)
lw      a0, %lo(var+4)(a1)
lw      a1, %lo(var)(a1)

We need a test for the latter case as well.

- Add a test that rejects the relaxation due to alignment (smaller than object size) - Fix the adjustment to be 1 byte less since that is the minimum requirement for addressability - Add code to reject relaxing the HI20 relocation when the addend is non-zero (i.e. check the range HI20 +/- adjustment) - Restrict the code that rejects relaxation to the relaxation of HI20 since the LO12 can still proceed as nothing depends on it

github-actions · 2023-11-27T15:30:26Z

✅ With the latest revision this PR passed the C/C++ code formatter.

MaskRay · 2023-11-27T15:30:55Z

lld/test/ELF/riscv-relax-gp-partial-align-min.s

@@ -0,0 +1,40 @@
+# REQUIRES: riscv


Do we need lld/test/ELF/riscv-relax-gp-* files? Can they be folded into two files or even one?

I was actually wondering the same. I don't have much experience writing these. I'll give it a shot.

nemanjai

Sorry about the delay in addressing these comments.

nemanjai · 2023-11-27T15:28:07Z

lld/ELF/Arch/RISCV.cpp

+  // and size can be reached from GP.
+  uint32_t alignAdjust =
+      r.sym->getOutputSection() ? r.sym->getOutputSection()->addralign : 0;
+  alignAdjust = std::min<uint32_t>(alignAdjust, r.sym->getSize());


Ah yes, I added a test now.

nemanjai · 2023-11-27T15:28:43Z

lld/ELF/Arch/RISCV.cpp

@@ -651,6 +651,20 @@ static void relaxHi20Lo12(const InputSection &sec, size_t i, uint64_t loc,
  if (!isInt<12>(r.sym->getVA(r.addend) - gp->getVA()))
    return;

+  // The symbol may be accessed in multiple pieces. We need to make sure that


Fixed. Thanks.

nemanjai · 2023-11-27T15:29:32Z

lld/ELF/Arch/RISCV.cpp

+      r.sym->getOutputSection() ? r.sym->getOutputSection()->addralign : 0;
+  alignAdjust = std::min<uint32_t>(alignAdjust, r.sym->getSize());
+
+  if (!isInt<12>(r.sym->getVA() + alignAdjust - gp->getVA()))


I actually opted to just check the range of +/- adjustment since any LO12 relocations that depend on this are only allowed to access this range.

nemanjai · 2023-11-27T15:29:45Z

lld/test/ELF/riscv-relax-gp-partial.s

+  .size   Var1, 128
+
+#--- lds
+    SECTIONS {


Oh, yup. Sorry.

nemanjai · 2023-12-07T11:35:29Z

Ping.

asb · 2023-12-07T14:21:40Z

Resigning as a reviewer as I don't think I can add anything beyond what @MaskRay or @jrtc27 could, who are best placed to review.

MaskRay · 2023-12-15T07:09:52Z

lld/ELF/Arch/RISCV.cpp

+    uint64_t hiAddr = r.sym->getVA(r.addend);
+    // If the addend is zero, the LO12 relocations can only be accessing the
+    // range [base, base+alignAdjust] (where base == r.sym->getVA()).
+    if (r.addend == 0 && !isInt<12>(hiAddr + alignAdjust - gp->getVA()))


R_RISCV_HI20 and R_RISCV_LO12_I/R_RISCV_LO12_S should have consistent decisions on whether to do relaxation. Placing the condition only at R_RISCV_HI20 could lead to inconsistent results. I think the condition should be moved closer to if (!isInt<12>(r.sym->getVA(r.addend) - gp->getVA())) above.

The logic here was that only this direction involves a functional problem. Relaxing the LO12 and not relaxing the HI20 leaves a redundant HI20, but that's about it. Whereas the other way is the crux of the problem. However, if you'd prefer that I implement the LO12 pessimistically as well, I am not against it.

If we retain lui/HI20 but relax addi/LO12_i, the output will be lui a0, ...; addi a0, gp, ... with a redundant lui. I think it's confusing to leave lui there and the addi change has no benefit anyway. Therefore, I prefer that we disable the relaxation completely, which aligns with GNU ld.

MaskRay · 2023-12-15T07:17:39Z

lld/ELF/Arch/RISCV.cpp

+
+    // However, if the addend is non-zero, the LO12 relocations may be accessing
+    // the range [HI-alignAdjust-1, HI+alignAdjust].
+    if (r.addend != 0 && (!isInt<12>(hiAddr - alignAdjust - 1 - gp->getVA()) ||


The addend ==0 and !=0 distinction appears to make the condition less strict and utilize pointer provenance: for char p[1], q[1];, we can't use q-1 to get p even if q is placed after p.

At the object file format level, I wonder whether we should optimize the addend==0 case (GNU ld doesn't).
I understand that optimizing the addend==0 case allows us to relax the first lui in riscv-relax-hi20-lo12.s.

I suppose we could check the negative range even in the addend==0 case with the assumption that the compiler won't allow for out-of-bounds access as you describe here. This is just another one of those assumptions that the linker makes about what the compiler will or will not do.
There are of course other ways we can accomplish this:

Disable relaxation if we see a negative addend on a LO12 relaxation

Keep track of symbols that are accessed with such negative addends and disable relaxation for relocations referencing those symbols

Emit a fatal error if such a negative addend is used

I want to hear from others about the reliability.

Ultimately I hope that we can derive some rules that are simple (ideally no addend==0/addend!=0 distinction), even if we give up some legitimate opportunities.

How about if I just change it to reject relaxation for negative addends? That way, we always look only forward? Would that be too pessimistic?

MaskRay · 2023-12-15T07:29:37Z

lld/test/ELF/riscv-relax-gp-edges.s

+# RUN: llvm-mc -filetype=obj -triple=riscv32-unknown-elf -mattr=+relax c.s -o rv32c.o
+# RUN: llvm-mc -filetype=obj -triple=riscv64-unknown-elf -mattr=+relax c.s -o rv64c.o
+
+# RUN: ld.lld --relax-gp --undefined=__global_pointer$ rv32a.o lds.a -o rv32a


a.lds b.lds c.lds instead of lds.a lds.b lds.c

Switch to using a somewhat loosely defined concept of accessible blocks of memory to determine if all allowable relocations are relaxable.

nemanjai requested review from asb, MaskRay, jrtc27 and topperc November 17, 2023 14:59

llvmbot added lld lld:ELF labels Nov 17, 2023

nemanjai mentioned this pull request Nov 17, 2023

[RISCV] GP relaxation broken if accesses straddle the boundary #72405

Open

MaskRay reviewed Nov 22, 2023

View reviewed changes

MaskRay reviewed Nov 27, 2023

View reviewed changes

nemanjai commented Nov 27, 2023

View reviewed changes

Nemanja Ivanovic added 2 commits November 27, 2023 16:31

Run clang-format.

b8bd95f

Consolidate tests into a single file.

c4667aa

asb removed their request for review December 7, 2023 14:21

MaskRay reviewed Dec 15, 2023

View reviewed changes

Simplify relaxation decision

075b209

Switch to using a somewhat loosely defined concept of accessible blocks of memory to determine if all allowable relocations are relaxable.

[RISCV] Disable gp relaxation if part of object unreachable #72655

Are you sure you want to change the base?

[RISCV] Disable gp relaxation if part of object unreachable #72655

Conversation

nemanjai commented Nov 17, 2023

llvmbot commented Nov 17, 2023 • edited

nemanjai commented Nov 17, 2023

MaskRay commented Nov 18, 2023

nemanjai commented Nov 20, 2023

MaskRay commented Nov 21, 2023 • edited

nemanjai commented Nov 21, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

MaskRay commented Nov 22, 2023

github-actions bot commented Nov 27, 2023 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

nemanjai left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

nemanjai commented Dec 7, 2023

asb commented Dec 7, 2023

MaskRay Dec 15, 2023 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

MaskRay Dec 15, 2023 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

llvmbot commented Nov 17, 2023 •

edited

MaskRay commented Nov 21, 2023 •

edited

github-actions bot commented Nov 27, 2023 •

edited

MaskRay Dec 15, 2023 •

edited

MaskRay Dec 15, 2023 •

edited