[RISCV] Disable gp relaxation if part of object unreachable #72655
Linker gp relaxation is greedy. It will eliminate the LUI with R_RISCV_HI20 if the base of the object is reachable from the gp. The relaxation on the R_RISCV_LO12 will be rejected if it is not reachable, but that is too late if the corresponding R_RISCV_HI20 is gone.
This patch disables relaxation if the entire portion of the object that can be relocated together is not reachable.
It is important that this does not necessarily mean the size of the object since the size doesn't matter if its alignment is smaller than its size.
In order to achieve correctness without excessively pessimizing relaxation for large objects, relaxation is rejected if the base of the object + min(s, ma) is not reachable from gp, where:
s - size of the object
ma - maximum alignment of the section that contains the object.
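The rule in the description can be sketched as a small standalone predicate. This is only an illustration under stated assumptions: `fitsInt12`, `reachSpan`, and `mayRelax` are hypothetical names (the patch itself uses `isInt<12>` inside `relaxHi20Lo12`), and addresses are passed in as plain integers rather than taken from lld's symbol and section objects.

```cpp
#include <cstdint>

// Hypothetical stand-in for llvm::isInt<12>: true if v fits in a signed
// 12-bit immediate, i.e. lies in [-2048, 2047].
constexpr bool fitsInt12(int64_t v) { return v >= -2048 && v <= 2047; }

// min(s, ma): s is the object size, ma the maximum alignment of the
// section that contains the object.
constexpr int64_t reachSpan(int64_t s, int64_t ma) { return s < ma ? s : ma; }

// Relax only if both the base of the object and base + min(s, ma) are
// within a signed 12-bit displacement of gp.
constexpr bool mayRelax(int64_t base, int64_t s, int64_t ma, int64_t gp) {
  return fitsInt12(base - gp) && fitsInt12(base + reachSpan(s, ma) - gp);
}
```

With the values used by the test in this patch (gp at 0x815, section alignment 128), `mayRelax(0x80, 8, 128, 0x815)` holds while `mayRelax(0x1000, 128, 128, 0x815)` does not, which matches the expectation that only the accesses to Var0 are relaxed.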
@llvm/pr-subscribers-lld @llvm/pr-subscribers-lld-elf
Author: Nemanja Ivanovic (nemanjai)
Full diff: https://github.com/llvm/llvm-project/pull/72655.diff
2 Files Affected:
diff --git a/lld/ELF/Arch/RISCV.cpp b/lld/ELF/Arch/RISCV.cpp
index 6413dcd7dcd7976..d4c578934a0bb39 100644
--- a/lld/ELF/Arch/RISCV.cpp
+++ b/lld/ELF/Arch/RISCV.cpp
@@ -651,6 +651,20 @@ static void relaxHi20Lo12(const InputSection &sec, size_t i, uint64_t loc,
if (!isInt<12>(r.sym->getVA(r.addend) - gp->getVA()))
return;
+ // The symbol may be accessed in multiple pieces. We need to make sure that
+ // all of the possible accesses are relaxed or none are. This prevents
+ // relaxing the hi relocation and being unable to relax one of the low
+ // relocations. The compiler will only access multiple pieces of an object
+ // with low relocations on the memory op if the alignment allows it.
+ // Therefore it should suffice to check that the smaller of the alignment
+ // and size can be reached from GP.
+ uint32_t alignAdjust =
+ r.sym->getOutputSection() ? r.sym->getOutputSection()->addralign : 0;
+ alignAdjust = std::min<uint32_t>(alignAdjust, r.sym->getSize());
+
+ if (!isInt<12>(r.sym->getVA() + alignAdjust - gp->getVA()))
+ return;
+
switch (r.type) {
case R_RISCV_HI20:
// Remove lui rd, %hi20(x).
diff --git a/lld/test/ELF/riscv-relax-gp-partial.s b/lld/test/ELF/riscv-relax-gp-partial.s
new file mode 100644
index 000000000000000..79984d9b65b7cfb
--- /dev/null
+++ b/lld/test/ELF/riscv-relax-gp-partial.s
@@ -0,0 +1,58 @@
+# REQUIRES: riscv
+# RUN: rm -rf %t && split-file %s %t && cd %t
+
+# RUN: llvm-mc -filetype=obj -triple=riscv32-unknown-elf -mattr=+relax a.s -o rv32.o
+# RUN: llvm-mc -filetype=obj -triple=riscv64-unknown-elf -mattr=+relax a.s -o rv64.o
+
+# RUN: ld.lld --relax-gp --undefined=__global_pointer$ rv32.o lds -o rv32
+# RUN: ld.lld --relax-gp --undefined=__global_pointer$ rv64.o lds -o rv64
+# RUN: llvm-objdump -td -M no-aliases --no-show-raw-insn rv32 | FileCheck %s
+# RUN: llvm-objdump -td -M no-aliases --no-show-raw-insn rv64 | FileCheck %s
+
+# CHECK: 00000080 l .data {{0+}}08 Var0
+# CHECK: 00001000 l .data {{0+}}80 Var1
+# CHECK: 00000815 g .sdata {{0+}}00 __global_pointer$
+
+# CHECK: <_start>:
+# CHECK-NOT: lui
+# CHECK-NEXT: lw a0, -1941(gp)
+# CHECK-NEXT: lw a1, -1937(gp)
+# CHECK-NEXT: lui a1, 1
+# CHECK-NEXT: lw a0, 0(a1)
+# CHECK-NEXT: lw a1, 124(a1)
+
+#--- a.s
+.global _start
+_start:
+ lui a1, %hi(Var0)
+ lw a0, %lo(Var0)(a1)
+ lw a1, %lo(Var0+4)(a1)
+ lui a1, %hi(Var1)
+ lw a0, %lo(Var1)(a1) # First part is reachable from gp
+ lw a1, %lo(Var1+124)(a1) # The second part is not reachable
+
+.section .rodata
+foo:
+ .space 1
+.section .sdata,"aw"
+ .space 1
+.section .data,"aw"
+ .p2align 3
+Var0:
+ .quad 0
+ .size Var0, 8
+ .space 3960
+ .p2align 7
+Var1:
+ .quad 0
+ .zero 120
+ .size Var1, 128
+
+#--- lds
+ SECTIONS {
+ .text : { }
+ .rodata : { }
+ .sdata : { }
+ .sbss : { }
+ .data : { }
+ }
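The displacements in the CHECK lines above can be verified by hand. The sketch below redoes that arithmetic at compile time; the constant and function names are hypothetical, and the addresses are copied from the symbol-table CHECK lines of the test.

```cpp
#include <cstdint>

constexpr int64_t kGp = 0x815;    // __global_pointer$ in the test
constexpr int64_t kVar0 = 0x80;   // Var0, size 8
constexpr int64_t kVar1 = 0x1000; // Var1, size 128

// Displacement that a gp-relative load would have to encode.
constexpr int64_t gpOffset(int64_t addr) { return addr - kGp; }

// Signed 12-bit immediate range of a RISC-V load: [-2048, 2047].
constexpr bool fitsInt12(int64_t v) { return v >= -2048 && v <= 2047; }

static_assert(gpOffset(kVar0) == -1941, "lw a0, -1941(gp)");
static_assert(gpOffset(kVar0 + 4) == -1937, "lw a1, -1937(gp)");
static_assert(fitsInt12(gpOffset(kVar1)), "Var1 base is reachable from gp");
static_assert(!fitsInt12(gpOffset(kVar1 + 124)), "Var1+124 is not reachable");
```

The last two assertions show the failure mode the patch guards against: the base of Var1 is 2027 bytes above gp and would pass the original check, while Var1+124 is 2151 bytes above gp and cannot be encoded, so the `lui` for Var1 has to stay.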
Fixes #72405
How does this discussion address my question on #72405 (comment)? I am not yet convinced that this is a linker problem.
I believe that @topperc has addressed that particular comment (i.e. the compiler checks the alignment of the object it is accessing). The address of an aligned object cannot reside in the range you mentioned (otherwise it would not be aligned). This patch specifically addresses the case where the object is aligned as required but GP is less aligned (since no requirement exists that GP be aligned).
That answer doesn't resolve my question. This test uses a very small alignment and cannot guarantee It seems that we need some discussions on the right semantics possibly in riscv-elf-psabi-doc. A clarification on riscv-asm-manual may be useful as well to help hand-written assembly. In addition, I asked for source code that leads to this code generation, which is not provided. |
I am certainly not against this.
I don't follow the logic here. I don't understand your claim here:
Why is it necessary that
This would pessimize every access and we would probably need to implement an optimization in the linker to get rid of the redundant
Please elaborate. I can add anything to the description that you'd like me to add. I just don't understand what you're after.
lld/ELF/Arch/RISCV.cpp
Outdated
// and size can be reached from GP.
uint32_t alignAdjust =
r.sym->getOutputSection() ? r.sym->getOutputSection()->addralign : 0;
alignAdjust = std::min<uint32_t>(alignAdjust, r.sym->getSize());

alignAdjust = std::min<uint32_t>(alignAdjust, r.sym->getSize()); is not tested

Ah yes, I added a test now.

If addend < st_size, it seems that this can be improved to min(addralign, st_size-addend)

Ah, yes, I think that is indeed the case.

Actually, this is only true if we assume that the addend on the LO12 will be equal to or greater than the addend on the HI20.
Ultimately, we need to ensure that the entire window of min(st_size, st_align) bytes that contains the relocation is reachable from GP.
lld/ELF/Arch/RISCV.cpp
Outdated
@@ -651,6 +651,20 @@ static void relaxHi20Lo12(const InputSection &sec, size_t i, uint64_t loc,
if (!isInt<12>(r.sym->getVA(r.addend) - gp->getVA()))
return;

// The symbol may be accessed in multiple pieces. We need to make sure that

The symbol may be accessed in multiple pieces with different addends.

Fixed. Thanks.
lld/ELF/Arch/RISCV.cpp
Outdated
r.sym->getOutputSection() ? r.sym->getOutputSection()->addralign : 0;
alignAdjust = std::min<uint32_t>(alignAdjust, r.sym->getSize());

if (!isInt<12>(r.sym->getVA() + alignAdjust - gp->getVA()))

This condition guards against
lui a1, %hi(var)
lw a0, %lo(var)(a1)
lw a1, %lo(var+4)(a1)
but not
lui a1, %hi(var+4)
lw a0, %lo(var+4)(a1)
lw a1, %lo(var)(a1)

This is absolutely true. If the addend on the HI20 is higher than the addend on the LO12, there is no guarantee whatsoever that we can't get into the same situation (relaxing and removing the HI instruction even though we need it for the LO instruction). Of course, this isn't new with this patch; the issue already existed.
I think that to be on the safe side, we should reject a relaxation of the HI20 if the addend is non-zero. What do you think?

I actually opted to just check the range of +/- adjustment since any LO12 relocations that depend on this are only allowed to access this range.
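A minimal sketch of the "+/- adjustment" idea from the reply above, assuming the check is applied symmetrically around the address named by the HI20 relocation. The names `fitsInt12` and `windowReachable` are hypothetical, and the exact bounds used in the revised patch may differ by one byte from this illustration.

```cpp
#include <cstdint>

// Signed 12-bit immediate range: [-2048, 2047].
constexpr bool fitsInt12(int64_t v) { return v >= -2048 && v <= 2047; }

// hiAddr is the address the HI20 relocation refers to (symbol + addend);
// adjust is the largest distance a dependent LO12 is assumed to stray from
// it. Relax only if both ends of [hiAddr - adjust, hiAddr + adjust] are
// reachable from gp.
constexpr bool windowReachable(int64_t hiAddr, int64_t adjust, int64_t gp) {
  return fitsInt12(hiAddr - adjust - gp) && fitsInt12(hiAddr + adjust - gp);
}
```

For the `%hi(var+4)` sequence quoted above, take gp = 0x815 and var+4 at gp-2046: the HI20 target itself is reachable, but with adjust = 7 the lower end of the window sits at gp-2053, so `windowReachable` rejects the relaxation and the `lui` is kept for the `%lo(var)` access.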
.size Var1, 128

#--- lds
SECTIONS {

Place SECTIONS at column 0 and indent the body by 2.

Oh, yup. Sorry.
Thanks for the C source example. It demonstrates the problem and justifies %hi-sharing codegen by the compiler. I think you can edit the description to include the following
For the When I commented on [0x7fc,0x800) the first time, I did not think deeply. I have created riscv-non-isa/riscv-elf-psabi-doc#408 to clarify that The patch handles
but not
We need a test for the latter case as well.
- Add a test that rejects the relaxation due to alignment (smaller than object size)
- Fix the adjustment to be 1 byte less since that is the minimum requirement for addressability
- Add code to reject relaxing the HI20 relocation when the addend is non-zero (i.e. check the range HI20 +/- adjustment)
- Restrict the code that rejects relaxation to the relaxation of HI20 since the LO12 can still proceed as nothing depends on it
@@ -0,0 +1,40 @@
# REQUIRES: riscv

Do we need lld/test/ELF/riscv-relax-gp-* files? Can they be folded into two files or even one?

I was actually wondering the same. I don't have much experience writing these. I'll give it a shot.

Sorry about the delay in addressing these comments.
Ping.
lld/ELF/Arch/RISCV.cpp
Outdated
uint64_t hiAddr = r.sym->getVA(r.addend);
// If the addend is zero, the LO12 relocations can only be accessing the
// range [base, base+alignAdjust] (where base == r.sym->getVA()).
if (r.addend == 0 && !isInt<12>(hiAddr + alignAdjust - gp->getVA()))

R_RISCV_HI20 and R_RISCV_LO12_I/R_RISCV_LO12_S should have consistent decisions on whether to do relaxation. Placing the condition only at R_RISCV_HI20 could lead to inconsistent results. I think the condition should be moved closer to if (!isInt<12>(r.sym->getVA(r.addend) - gp->getVA())) above.

The logic here was that only this direction involves a functional problem. Relaxing the LO12 and not relaxing the HI20 leaves a redundant HI20, but that's about it. Whereas the other way is the crux of the problem. However, if you'd prefer that I implement the LO12 pessimistically as well, I am not against it.

If we retain lui/HI20 but relax addi/LO12_I, the output will be lui a0, ...; addi a0, gp, ... with a redundant lui. I think it's confusing to leave lui there and the addi change has no benefit anyway. Therefore, I prefer that we disable the relaxation completely, which aligns with GNU ld.
lld/ELF/Arch/RISCV.cpp
Outdated

// However, if the addend is non-zero, the LO12 relocations may be accessing
// the range [HI-alignAdjust-1, HI+alignAdjust].
if (r.addend != 0 && (!isInt<12>(hiAddr - alignAdjust - 1 - gp->getVA()) ||

The addend==0 and addend!=0 distinction appears to make the condition less strict and utilize pointer provenance: for char p[1], q[1];, we can't use q-1 to get p even if q is placed after p.
At the object file format level, I wonder whether we should optimize the addend==0 case (GNU ld doesn't).
I understand that optimizing the addend==0 case allows us to relax the first lui in riscv-relax-hi20-lo12.s.

I suppose we could check the negative range even in the addend==0 case with the assumption that the compiler won't allow for out-of-bounds access as you describe here. This is just another one of those assumptions that the linker makes about what the compiler will or will not do.
There are of course other ways we can accomplish this:
- Disable relaxation if we see a negative addend on a LO12 relaxation
- Keep track of symbols that are accessed with such negative addends and disable relaxation for relocations referencing those symbols
- Emit a fatal error if such a negative addend is used

I want to hear from others about the reliability.
Ultimately I hope that we can derive some rules that are simple (ideally no addend==0/addend!=0 distinction), even if we give up some legitimate opportunities.

How about if I just change it to reject relaxation for negative addends? That way, we always look only forward. Would that be too pessimistic?
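One possible reading of the "reject negative addends, always look forward" proposal is sketched below. This is an interpretation for discussion, not code from the patch: `fitsInt12` and `mayRelaxForwardOnly` are hypothetical names, and `adjust` stands for whatever forward span the final rule settles on.

```cpp
#include <cstdint>

// Signed 12-bit immediate range: [-2048, 2047].
constexpr bool fitsInt12(int64_t v) { return v >= -2048 && v <= 2047; }

// Single rule with no addend==0 / addend!=0 split: refuse any negative
// addend, then require that the forward window
// [sym + addend, sym + addend + adjust] is reachable from gp.
constexpr bool mayRelaxForwardOnly(int64_t symAddr, int64_t addend,
                                   int64_t adjust, int64_t gp) {
  return addend >= 0 && fitsInt12(symAddr + addend - gp) &&
         fitsInt12(symAddr + addend + adjust - gp);
}
```

The trade-off is the one raised in the thread: the rule is simple to state, but it gives up relaxation for every access with a negative addend, even when the whole object is comfortably within range of gp.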
lld/test/ELF/riscv-relax-gp-edges.s
Outdated
# RUN: llvm-mc -filetype=obj -triple=riscv32-unknown-elf -mattr=+relax c.s -o rv32c.o
# RUN: llvm-mc -filetype=obj -triple=riscv64-unknown-elf -mattr=+relax c.s -o rv64c.o

# RUN: ld.lld --relax-gp --undefined=__global_pointer$ rv32a.o lds.a -o rv32a

a.lds b.lds c.lds instead of lds.a lds.b lds.c

Will do.
Switch to using a somewhat loosely defined concept of accessible blocks of memory to determine if all allowable relocations are relaxable.