Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RISCV] Disable gp relaxation if part of object unreachable #72655

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

nemanjai
Copy link
Member

Linker gp relaxation is greedy. It will eliminate the LUI with R_RISCV_HI20 if the base of the object is reachable from the gp. The relaxation on the R_RISCV_LO12 will be rejected if it is not reachable, but that is too late if the corresponding R_RISCV_HI20 is gone.
This patch disables relaxation if the entire portion of the object that can be relocated together is not reachable.

It is important that this does not necessarily mean the size of the object since the size doesn't matter if its alignment is smaller than its size.

In order to achieve correctness without excessively pessimizing relaxation for large objects, relaxation is rejected if the base of the object + min(s, ma) is not reachable from gp, where:
s - size of the object
ma - maximum alignment of the section that contains
the object.

Linker gp relaxation is greedy. It will eliminate the LUI
with R_RISCV_HI20 if the base of the object is reachable
from the gp. The relaxation on the R_RISCV_LO12 will be
rejected if it is not reachable, but that is too late if
the corresponding R_RISCV_HI20 is gone.
This patch disables relaxation if the entire portion of
the object that can be relocated together is not reachable.

It is important that this does not necessarily mean the
size of the object since the size doesn't matter if its
alignment is smaller than its size.

In order to achieve correctness without excessively
pessimizing relaxation for large objects, relaxation
is rejected if the base of the object + min(s, ma) is
not reachable from gp, where:
s  - size of the object
ma - maximum alignment of the section that contains
     the object.
@llvmbot
Copy link
Collaborator

llvmbot commented Nov 17, 2023

@llvm/pr-subscribers-lld

@llvm/pr-subscribers-lld-elf

Author: Nemanja Ivanovic (nemanjai)

Changes

Linker gp relaxation is greedy. It will eliminate the LUI with R_RISCV_HI20 if the base of the object is reachable from the gp. The relaxation on the R_RISCV_LO12 will be rejected if it is not reachable, but that is too late if the corresponding R_RISCV_HI20 is gone.
This patch disables relaxation if the entire portion of the object that can be relocated together is not reachable.

It is important that this does not necessarily mean the size of the object since the size doesn't matter if its alignment is smaller than its size.

In order to achieve correctness without excessively pessimizing relaxation for large objects, relaxation is rejected if the base of the object + min(s, ma) is not reachable from gp, where:
s - size of the object
ma - maximum alignment of the section that contains
the object.


Full diff: https://github.com/llvm/llvm-project/pull/72655.diff

2 Files Affected:

  • (modified) lld/ELF/Arch/RISCV.cpp (+14)
  • (added) lld/test/ELF/riscv-relax-gp-partial.s (+58)
diff --git a/lld/ELF/Arch/RISCV.cpp b/lld/ELF/Arch/RISCV.cpp
index 6413dcd7dcd7976..d4c578934a0bb39 100644
--- a/lld/ELF/Arch/RISCV.cpp
+++ b/lld/ELF/Arch/RISCV.cpp
@@ -651,6 +651,20 @@ static void relaxHi20Lo12(const InputSection &sec, size_t i, uint64_t loc,
   if (!isInt<12>(r.sym->getVA(r.addend) - gp->getVA()))
     return;
 
+  // The symbol may be accessed in multiple pieces. We need to make sure that
+  // all of the possible accesses are relaxed or none are. This prevents
+  // relaxing the hi relocation and being unable to relax one of the low
+  // relocations. The compiler will only access multiple pieces of an object
+  // with low relocations on the memory op if the alignment allows it.
+  // Therefore it should suffice to check that the smaller of the alignment
+  // and size can be reached from GP.
+  uint32_t alignAdjust =
+      r.sym->getOutputSection() ? r.sym->getOutputSection()->addralign : 0;
+  alignAdjust = std::min<uint32_t>(alignAdjust, r.sym->getSize());
+
+  if (!isInt<12>(r.sym->getVA() + alignAdjust - gp->getVA()))
+    return;
+
   switch (r.type) {
   case R_RISCV_HI20:
     // Remove lui rd, %hi20(x).
diff --git a/lld/test/ELF/riscv-relax-gp-partial.s b/lld/test/ELF/riscv-relax-gp-partial.s
new file mode 100644
index 000000000000000..79984d9b65b7cfb
--- /dev/null
+++ b/lld/test/ELF/riscv-relax-gp-partial.s
@@ -0,0 +1,58 @@
+# REQUIRES: riscv
+# RUN: rm -rf %t && split-file %s %t && cd %t
+
+# RUN: llvm-mc -filetype=obj -triple=riscv32-unknown-elf -mattr=+relax a.s -o rv32.o
+# RUN: llvm-mc -filetype=obj -triple=riscv64-unknown-elf -mattr=+relax a.s -o rv64.o
+
+# RUN: ld.lld --relax-gp --undefined=__global_pointer$ rv32.o lds -o rv32
+# RUN: ld.lld --relax-gp --undefined=__global_pointer$ rv64.o lds -o rv64
+# RUN: llvm-objdump -td -M no-aliases --no-show-raw-insn rv32 | FileCheck %s
+# RUN: llvm-objdump -td -M no-aliases --no-show-raw-insn rv64 | FileCheck %s
+ 
+# CHECK: 00000080 l       .data  {{0+}}08 Var0
+# CHECK: 00001000 l       .data  {{0+}}80 Var1
+# CHECK: 00000815 g       .sdata {{0+}}00 __global_pointer$
+
+# CHECK: <_start>:
+# CHECK-NOT:  lui
+# CHECK-NEXT: lw      a0, -1941(gp)
+# CHECK-NEXT: lw      a1, -1937(gp)
+# CHECK-NEXT: lui     a1, 1
+# CHECK-NEXT: lw      a0, 0(a1)
+# CHECK-NEXT: lw      a1, 124(a1)
+
+#--- a.s
+.global _start
+_start:
+        lui     a1, %hi(Var0)
+        lw      a0, %lo(Var0)(a1)
+        lw      a1, %lo(Var0+4)(a1)
+        lui     a1, %hi(Var1)
+        lw      a0, %lo(Var1)(a1)      # First part is reachable from gp
+        lw      a1, %lo(Var1+124)(a1)  # The second part is not reachable
+
+.section .rodata
+foo:
+  .space 1
+.section .sdata,"aw"
+  .space 1
+.section .data,"aw"
+  .p2align 3
+Var0:
+  .quad 0
+  .size   Var0, 8
+  .space 3960
+  .p2align 7
+Var1:
+  .quad 0
+  .zero 120
+  .size   Var1, 128
+
+#--- lds
+    SECTIONS {
+        .text : { }
+        .rodata : { }
+        .sdata : { }
+        .sbss : { }
+        .data : { }
+    }

@nemanjai
Copy link
Member Author

Fixes #72405

@MaskRay
Copy link
Member

MaskRay commented Nov 18, 2023

How does this discussion address my question on #72405 (comment) ? I am not yet convinced that this is a linker problem.

@nemanjai
Copy link
Member Author

How does this discussion address my question on #72405 (comment) ? I am not yet convinced that this is a linker problem.

I believe that @topperc has addressed that particular comment (i.e. the compiler checks the alignment of the object it is accessing). The address of an aligned object cannot reside in the range you mentioned (otherwise it would not be aligned).

This patch particularly addresses the issue where the object is aligned as required but GP is less aligned (since no requirement exists that GP be aligned).

@MaskRay
Copy link
Member

MaskRay commented Nov 21, 2023

That answer doesn't resolve my question. This test uses a very small alignment and cannot guarantee %hi(Var0) == %hi(Var0+4). As such, it cannot prove that this is an issue that should be done at the linker side rather than at the compiler side (possibly disable this %hi-sharing feature). The description does not say anything about the alignment value 4096 (1<<12).

It seems that we need some discussions on the right semantics possibly in riscv-elf-psabi-doc. A clarification on riscv-asm-manual may be useful as well to help hand-written assembly.


In addition, I asked for source code that leads to this code generation, which is not provided.

@nemanjai
Copy link
Member Author

It seems that we need some discussions on the right semantics possibly in riscv-elf-psabi-doc. A clarification on riscv-asm-manual may be useful as well to help hand-written assembly.

I am certainly not against this.

In addition, I asked for source code that leads to this code generation, which is not provided.

double NaturallyAlignedScalar = 5.;
double accessNaturallyAlignedScalar() {
  // This will demonstrate the sharing of the upper portion.
  return NaturallyAlignedScalar;
}

double NaturallyAlignedArray[4] = { 3., 4., 5., 6. };
double accessNaturallyAlignedArray() {
  // Two halves of first element accessed together,
  // all pieces of all other elements accessed separately.
  return NaturallyAlignedArray[0] + NaturallyAlignedArray[3];
}

double __attribute__((aligned(512))) OveralignedArray[64];
double accessOveralignedArray() {
  // All pieces of the array accessed together.
  return OveralignedArray[0] + OveralignedArray[63];
}

// Compile with clang -O2 --target=riscv32

That answer doesn't resolve my question. This test uses a very small alignment and cannot guarantee %hi(Var0) == %hi(Var0+4). As such, it cannot prove that this is an issue that should be done at the linker side rather than at the compiler side ...

I don't follow the logic here. I don't understand your claim here:

IIUC this requires that the compiler can prove %hi(Var0) == %hi(Var0+4),
i.e. Var0%4096 does not reside in the range [0x7fc,0x800).
A necessary but not sufficient condition is that Var0's section's alignment is at least 4096.

Why is it necessary that Var0's section's alignment be at least 4096? An object that is aligned at 8 bytes cannot possibly reside in the address range that you mentioned. The only valid addresses for an 8-byte aligned object near the range you mentioned are 0x7f8 and 0x800 and neither is in the range you mentioned.

(possibly disable this %hi-sharing feature).

This would pessimize every access and we would probably need to implement an optimization in the linker to get rid of the redundant addi's. And I am not sure what problem it would solve.

The description does not say anything about the alignment value 4096 (1<<12).

Please elaborate. I can add anything to the description that you'd like me to add. I just don't understand what you're after.

// and size can be reached from GP.
uint32_t alignAdjust =
r.sym->getOutputSection() ? r.sym->getOutputSection()->addralign : 0;
alignAdjust = std::min<uint32_t>(alignAdjust, r.sym->getSize());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alignAdjust = std::min<uint32_t>(alignAdjust, r.sym->getSize()); is not tested

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, I added a test now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If addend < st_size, it seems that this can be improved to min(addralign, st_size-addend)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes, I think that is indeed the case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, this is only true if we assume that the addend on the LO12 will be equal or greater than the addend on the HI20.
Ultimately, we need to ensure that the entire window of
min(st_size, st_align) bytes that contains the relocation is reachable from GP.

@@ -651,6 +651,20 @@ static void relaxHi20Lo12(const InputSection &sec, size_t i, uint64_t loc,
if (!isInt<12>(r.sym->getVA(r.addend) - gp->getVA()))
return;

// The symbol may be accessed in multiple pieces. We need to make sure that
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The symbol may be accessed in multiple pieces with different addends.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Thanks.

r.sym->getOutputSection() ? r.sym->getOutputSection()->addralign : 0;
alignAdjust = std::min<uint32_t>(alignAdjust, r.sym->getSize());

if (!isInt<12>(r.sym->getVA() + alignAdjust - gp->getVA()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This condition guards against

lui     a1, %hi(var)
lw      a0, %lo(var)(a1)
lw      a1, %lo(var+4)(a1)

but not

lui     a1, %hi(var+4)
lw      a0, %lo(var+4)(a1)
lw      a1, %lo(var)(a1)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is absolutely true. If the addend on the HI20 is higher than the addend on the LO12, there is no guarantee whatsoever that we can't get into the same situation (relaxing and removing the HI instruction even though we need it for the LO instruction). Of course, this isn't new with this patch, the issue already existed.
I think that to be on the safe side, we should reject a relaxation of the HI20 if the addend is non-zero. What do you think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually opted to just check the range of +/- adjustment since any LO12 relocations that depend on this are only allowed to access this range.

.size Var1, 128

#--- lds
SECTIONS {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Place SECTIONS at column 0 and indent the body by 2.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, yup. Sorry.

@MaskRay
Copy link
Member

MaskRay commented Nov 22, 2023

Thanks for the C source example. It demonstrates the problem and justifies %hi-sharing codegen by the compiler.
This is exactly the information the description can have to clarify the issue.

I think you can edit the description to include the following

double NaturallyAlignedScalar = 5.;
double accessNaturallyAlignedScalar() { return NaturallyAlignedScalar; }
//      lui     a1, %hi(NaturallyAlignedScalar)
//      lw      a0, %lo(NaturallyAlignedScalar)(a1)
//      lw      a1, %lo(NaturallyAlignedScalar+4)(a1)

double NaturallyAlignedArray[4] = { 3., 4., 5., 6. };
double accessNaturallyAlignedArray() {
  return NaturallyAlignedArray[0] + NaturallyAlignedArray[3];
}
//      lui     a2, %hi(NaturallyAlignedArray)
//      lw      a0, %lo(NaturallyAlignedArray)(a2)
//      lw      a1, %lo(NaturallyAlignedArray+4)(a2)
//      addi    a3, a2, %lo(NaturallyAlignedArray)
//      lw      a2, 24(a3)
//      lw      a3, 28(a3)

For the accessNaturallyAlignedScalar codegen, the compiler must prove that %hi(NaturallyAlignedScalar) = %hi(NaturallyAlignedScalar+4), i.e. NaturallyAlignedScalar%0x800 is out of [0x7fc,0x800).
This requirement is satisfied because NaturallyAlignedScalar is aligned by 8.


When I commented on [0x7fc,0x800) the first time, I did not think deeply.
The alignment makes the condition satisfied. This analysis is important and should be included in the description to justify that the compiler codegen can reuse %hi20.

I have created riscv-non-isa/riscv-elf-psabi-doc#408 to clarify that
multiple R_RISCV_LO12_I/R_RISCV_LO12_S can share one single R_RISCV_HI20.


The patch handles

// when gp < var
lui     a1, %hi(var)
lw      a0, %lo(var)(a1)
lw      a1, %lo(var+4)(a1)

but not

// when var < gp
lui     a1, %hi(var+4)
lw      a0, %lo(var+4)(a1)
lw      a1, %lo(var)(a1)

We need a test for the latter case as well.

- Add a test that rejects the relaxation due to alignment (smaller
  than object size)
- Fix the adjustment to be 1 byte less since that is the minimum
  requirement for addressability
- Add code to reject relaxing the HI20 relocation when the addend
  is non-zero (i.e. check the range HI20 +/- adjustment)
- Restrict the code that rejects relaxation to the relaxation of
  HI20 since the LO12 can still proceed as nothing depends on it
Copy link

github-actions bot commented Nov 27, 2023

✅ With the latest revision this PR passed the C/C++ code formatter.

@@ -0,0 +1,40 @@
# REQUIRES: riscv
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need lld/test/ELF/riscv-relax-gp-* files? Can they be folded into two files or even one?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was actually wondering the same. I don't have much experience writing these. I'll give it a shot.

Copy link
Member Author

@nemanjai nemanjai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about the delay in addressing these comments.

// and size can be reached from GP.
uint32_t alignAdjust =
r.sym->getOutputSection() ? r.sym->getOutputSection()->addralign : 0;
alignAdjust = std::min<uint32_t>(alignAdjust, r.sym->getSize());
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, I added a test now.

@@ -651,6 +651,20 @@ static void relaxHi20Lo12(const InputSection &sec, size_t i, uint64_t loc,
if (!isInt<12>(r.sym->getVA(r.addend) - gp->getVA()))
return;

// The symbol may be accessed in multiple pieces. We need to make sure that
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Thanks.

r.sym->getOutputSection() ? r.sym->getOutputSection()->addralign : 0;
alignAdjust = std::min<uint32_t>(alignAdjust, r.sym->getSize());

if (!isInt<12>(r.sym->getVA() + alignAdjust - gp->getVA()))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually opted to just check the range of +/- adjustment since any LO12 relocations that depend on this are only allowed to access this range.

.size Var1, 128

#--- lds
SECTIONS {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, yup. Sorry.

@nemanjai
Copy link
Member Author

nemanjai commented Dec 7, 2023

Ping.

@asb asb removed their request for review December 7, 2023 14:21
@asb
Copy link
Contributor

asb commented Dec 7, 2023

Resigning as a reviewer as I don't think I can add anything beyond what @MaskRay or @jrtc27 could, who are best placed to review.

uint64_t hiAddr = r.sym->getVA(r.addend);
// If the addend is zero, the LO12 relocations can only be accessing the
// range [base, base+alignAdjust] (where base == r.sym->getVA()).
if (r.addend == 0 && !isInt<12>(hiAddr + alignAdjust - gp->getVA()))
Copy link
Member

@MaskRay MaskRay Dec 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

R_RISCV_HI20 and R_RISCV_LO12_I/R_RISCV_LO12_S should have consistent decisions on whether to do relaxation. Placing the condition only at R_RISCV_HI20 could lead to inconsistent results. I think the condition should be moved closer to if (!isInt<12>(r.sym->getVA(r.addend) - gp->getVA())) above.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic here was that only this direction involves a functional problem. Relaxing the LO12 and not relaxing the HI20 leaves a redundant HI20, but that's about it. Whereas the other way is the crux of the problem. However, if you'd prefer that I implement the LO12 pessimistically as well, I am not against it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we retain lui/HI20 but relax addi/LO12_i, the output will be lui a0, ...; addi a0, gp, ... with a redundant lui. I think it's confusing to leave lui there and the addi change has no benefit anyway. Therefore, I prefer that we disable the relaxation completely, which aligns with GNU ld.


// However, if the addend is non-zero, the LO12 relocations may be accessing
// the range [HI-alignAdjust-1, HI+alignAdjust].
if (r.addend != 0 && (!isInt<12>(hiAddr - alignAdjust - 1 - gp->getVA()) ||
Copy link
Member

@MaskRay MaskRay Dec 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The addend ==0 and !=0 distinction appears to make the condition less strict and utilize pointer provenance: for char p[1], q[1];, we can't use q-1 to get p even if q is placed after p.

At the object file format level, I wonder whether we should optimize the addend==0 case (GNU ld doesn't).
I understand that optimizing the addend==0 case allows us to relax the first lui in riscv-relax-hi20-lo12.s.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose we could check the negative range even in the addend==0 case with the assumption that the compiler won't allow for out-of-bounds access as you describe here. This is just another one of those assumptions that the linker makes about what the compiler will or will not do.
There are of course other ways we can accomplish this:

  • Disable relaxation if we see a negative addend on a LO12 relaxation
  • Keep track of symbols that are accessed with such negative addends and disable relaxation for relocations referencing those symbols
  • Emit a fatal error if such a negative addend is used

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to hear from others about the reliability.

Ultimately I hope that we can derive some rules that are simple (ideally no addend==0/addend!=0 distinction), even if we give up some legitimate opportunities.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about if I just change it to reject relaxation for negative addends? That way, we always look only forward? Would that be too pessimistic?

# RUN: llvm-mc -filetype=obj -triple=riscv32-unknown-elf -mattr=+relax c.s -o rv32c.o
# RUN: llvm-mc -filetype=obj -triple=riscv64-unknown-elf -mattr=+relax c.s -o rv64c.o

# RUN: ld.lld --relax-gp --undefined=__global_pointer$ rv32a.o lds.a -o rv32a
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a.lds b.lds c.lds instead of lds.a lds.b lds.c

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

Switch to using a somewhat loosely defined concept of accessible
blocks of memory to determine if all allowable relocations are
relaxable.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants