-
Notifications
You must be signed in to change notification settings - Fork 11.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[lld][ELF] Fix a corner case of elf::getLoongArchPageDelta #71907
Conversation
@llvm/pr-subscribers-lld-elf @llvm/pr-subscribers-lld Author: Lu Weining (SixWeining) ChangesIf Full diff: https://github.com/llvm/llvm-project/pull/71907.diff 2 Files Affected:
diff --git a/lld/ELF/Arch/LoongArch.cpp b/lld/ELF/Arch/LoongArch.cpp
index 04ddb4682917b4b..01e6b037eb900af 100644
--- a/lld/ELF/Arch/LoongArch.cpp
+++ b/lld/ELF/Arch/LoongArch.cpp
@@ -159,6 +159,10 @@ uint64_t elf::getLoongArchPageDelta(uint64_t dest, uint64_t pc) {
bool negativeA = lo12(dest) > 0x7ff;
bool negativeB = (result & 0x8000'0000) != 0;
+ // A corner case; directly return the expected result.
+ if (result == 0xfffffffffffff000 && negativeA)
+ return result = 0xffffffff00000000;
+
if (negativeA)
result += 0x1000;
if (negativeA && !negativeB)
diff --git a/lld/test/ELF/loongarch-pc-aligned.s b/lld/test/ELF/loongarch-pc-aligned.s
index f6ac56e5261ddb7..ae77376ebcdbf38 100644
--- a/lld/test/ELF/loongarch-pc-aligned.s
+++ b/lld/test/ELF/loongarch-pc-aligned.s
@@ -260,18 +260,17 @@
# EXTREME15-NEXT: lu32i.d $t0, -349526
# EXTREME15-NEXT: lu52i.d $t0, $t0, -1093
-## FIXME: Correct %pc64_lo20 should be 0xfffff (-1) and %pc64_hi12 should be 0xfff (-1), but current values are:
-## page delta = 0x0000000000000000, page offset = 0x888
+## page delta = 0xffffffff00000000, page offset = 0x888
## %pc_lo12 = 0x888 = -1912
## %pc_hi20 = 0x00000 = 0
-## %pc64_lo20 = 0x00000 = 0
-## %pc64_hi12 = 0x00000 = 0
+## %pc64_lo20 = 0xfffff = -1
+## %pc64_hi12 = 0xfff = -1
# RUN: ld.lld %t/extreme.o --section-start=.rodata=0x0000000012344888 --section-start=.text=0x0000000012345678 -o %t/extreme16
# RUN: llvm-objdump -d --no-show-raw-insn %t/extreme16 | FileCheck %s --check-prefix=EXTREME16
# EXTREME16: addi.d $t0, $zero, -1912
# EXTREME16-NEXT: pcalau12i $t1, 0
-# EXTREME16-NEXT: lu32i.d $t0, 0
-# EXTREME16-NEXT: lu52i.d $t0, $t0, 0
+# EXTREME16-NEXT: lu32i.d $t0, -1
+# EXTREME16-NEXT: lu52i.d $t0, $t0, -1
#--- a.s
.rodata
|
lld/ELF/Arch/LoongArch.cpp
Outdated
@@ -159,6 +159,10 @@ uint64_t elf::getLoongArchPageDelta(uint64_t dest, uint64_t pc) { | |||
bool negativeA = lo12(dest) > 0x7ff; | |||
bool negativeB = (result & 0x8000'0000) != 0; | |||
|
|||
// A corner case; directly return the expected result. | |||
if (result == 0xfffffffffffff000 && negativeA) | |||
return result = 0xffffffff00000000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see the misbehavior is caused by an overflow in the result += 0x1000
below; who expected that! Thanks for spotting it.
First of all, stylistically, the result =
part could be dropped and some '
could be added to help counting bits; other than that, I don't know if the other arithmetic could overflow too, in which case we might want to guard them too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current case is raised by some end user. I don't know if there are other cases too. Seems that binutils' processing is in elfnn-loongarch.c (RELOCATE_CALC_PC64_HI32
); I'm not sure whether it is helpful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems that mold uses the same impl as binutils. https://github.com/rui314/mold/blob/v2.3.2/elf/arch-loongarch.cc#L53
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not this bug.
I had report this bug to mengqinggang and Rui rui314/mold#1131 (comment)
That's because we can not get consistent PC from R_LARCH_PCALA_HI20
and R_LARCH_PCALA64_{LO20,HI12}
. With this fix, we will wrong when span (+2 + 4N)G.
Wrong codes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This special case should probably be added inside if (negativeA) { ... }
so that it reads less magic.
If `page(dest) - page(pc)` is 0xfffffffffff000, i.e. page(pc) is next to page(dest), and lo12(dest) > 0x7ff, correct %pc64_lo12 and %pc64_hi12 should be both -1 (which can be checked with binutils) but they are both 0 on lld. This patch fixes this issue.
058298f
to
663522c
Compare
Sorry that I accidentally used force-push to address review comments. :( |
Is this the correct way to get the four parts of PCALA64?
This is a simple validator, let me test it with boundary data: https://gist.github.com/heiher/bd7398397f3cb5598dce35cbc04d0075 |
Seems this is what ld and mold do? Like bfd/elfnn-loongarch.c ( |
As jinyang said, this way relies on the PC of |
Another corner case? DEST: 0 PC: 0x80000ffc
ugly test: https://gist.github.com/heiher/3c7a2a9e7d05354e818b50d69e40b8cb |
This special casing is really ugly and should not be needed, surely, if you just do the arithmetic in the right order and correctly. Can we not just do the same as mold? |
I'll also note that the inconsistent positioning of the |
How about using the approach proposed by @heiher in #71907 (comment) ?
|
ld is correct in this case.
@MQ-mengqing Can you post an example you mentioned in rui314/mold#1131 (comment) ? |
I found that different ld versions have different results, it doesn't seem to be completely correct? 😞 DEST: 0 PC: 0x80000ffcld 2.40 (correct) 80000ffc: 1b000005 pcalau12i $a1, -524288(0x80000)
80001000: 02c00004 addi.d $a0, $zero, 0
80001004: 16000004 lu32i.d $a0, 0
80001008: 03000084 lu52i.d $a0, $a0, 0
8000100c: 00109484 add.d $a0, $a0, $a1 # $a0 = 0 ld 2.41 or git mainline (incorrect) 80000ffc: 1b000005 pcalau12i $a1, -524288(0x80000)
80001000: 02c00004 addi.d $a0, $zero, 0
80001004: 17ffffe4 lu32i.d $a0, -1(0xfffff)
80001008: 033ffc84 lu52i.d $a0, $a0, -1(0xfff)
8000100c: 00109484 add.d $a0, $a0, $a1 # $a0 = 0xffffffff00000000 DEST: 0 PC: 0x80001000ld 2.40 (incorrect) 80001000: 1affffe5 pcalau12i $a1, 524287(0x7ffff)
80001004: 02c00004 addi.d $a0, $zero, 0
80001008: 16000004 lu32i.d $a0, 0
8000100c: 03000084 lu52i.d $a0, $a0, 0
80001010: 00109484 add.d $a0, $a0, $a1 # $a0 = 0x100000000 ld 2.41 or git mainline (correct) 80001000: 1affffe5 pcalau12i $a1, 524287(0x7ffff)
80001004: 02c00004 addi.d $a0, $zero, 0
80001008: 17ffffe4 lu32i.d $a0, -1(0xfffff)
8000100c: 033ffc84 lu52i.d $a0, $a0, -1(0xfff)
80001010: 00109484 add.d $a0, $a0, $a1 # $a0 = 0 |
What if we could get the page offset of current instruction and Similar to type c here. .L1:
pcalau12i $a1, 0 # R_LARCH_PCALA_HI20
addi.d $a0, $zero, A # R_LARCH_PCALA_LO12
lu32i.d $a0, B # R_LARCH_PCALA64_LO20
lu52i.d $a0, $a0, C # R_LARCH_PCALA64_HI12
add.d $a0, $a0, $a1 A/B/Cpage(4k) offset of current instruction and
PC
R_LARCH_PCALA*_*
|
Is it possible to use the For example: Before relocationThe addend should be written by linker but not assembler because assembler doesn't know the page delta of current instruction and
After link
|
This seems fine for static linking, but how to deal with object files (.o)? How to calculate the adjusting bits at this time? Can we store the byte offset in
|
Yes, seems it's impossible.
I'm not sure how these byte offsets could be updated after instrucion scheduling. Anyway, the |
Phew. The doc for these relocs in https://github.com/gimli-rs/object, written by me, is also wrong. |
I made a brute force algorithm: def uintptr_t(x):
return x & ((1 << 64) - 1)
def ptrdiff_t(x):
x = uintptr_t(x)
if x & (1 << 63):
x = (1 << 64) - x
return x
def pcalau12i(pc, imm):
assert(imm in range(-0x80000, 0x80000))
return uintptr_t(pc + (imm << 12)) & ~0xfff
def simm(width, bits):
assert(bits >= 0 and bits < (1 << width))
return bits - ((1 << width) if bits & (1 << (width - 1)) else 0)
def reloc(dest, pc):
lo12 = dest & 0xfff
a1_first_val = uintptr_t(simm(12, lo12))
for hi20 in range(-0x80000, 0x80000):
a0 = pcalau12i(pc, hi20)
# We need to insert something into a1[32..] to make a0 + a1 = dest,
# i. e. a1 = dest - a0.
want_a1 = uintptr_t(dest - a0)
if (want_a1 & 0xffffffff) != (a1_first_val & 0xffffffff):
continue
lo20 = (want_a1 >> 32) & 0xfffff
hi12 = (want_a1 >> 52) & 0xfff
return (lo12, hi20 & 0xfffff, lo20, hi12)
raise Exception("should not be reachable")
def test(dest, pc):
reloc(dest, pc)
lo12, hi20, lo20, hi12 = reloc(dest, pc)
a0 = pcalau12i(pc, simm(20, hi20))
a1 = uintptr_t(simm(12, lo12))
a1 &= ~(0xffffffff << 32)
a1 |= lo20 << 32
a1 |= hi12 << 52
assert(uintptr_t(a1 + a0) == dest)
test(0xfffffffffffff8ee, 0x8ee)
test(0, 0x80000ffc)
test(0, 0x80001000) Now I need to optimize |
I cannot figure out any reasonable solution if we allow the address materialize sequence to cross page boundary. It looks like we'll have to generate something like:
With relaxation disabled, ". - la_sym1" can be evaluated precisely by the assembler so we can store it into the addend; with relaxation enabled, we'll just keep the entire address materializing sequence intact (for taking advantage with relaxation) and we end up:
"8" and "12" can be encoded into addend too. |
I like using a program to enumerate all the interesting bits. We should figure out a concise program to compute the offset and avoid the special cases (like in the current patch):
We should have a description how the new code does its job and a concise proof as the comment. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.
The
Then the remaining task is proving the "raise Exception("should not be reachable")" line is really unreachable... |
Defer the compution of `negativeB` because adding 0x1000 to original `result` may yield a different `negativeB` value. Actually this issue was first reported by @rui314 at https://reviews.llvm.org/D138135#4568594. Note that even with this patch, the handling of R_LARCH_PCALA64_* relocs are NOT totally correct, because current approach assumes those four instructions (pcalau12i/addi.d/lu32i.d/lu52i.d) are in the same 4K-page which is not always true. It is possible to document this assumption as a constraint in psABI in future. But at least this patch is necessary. See llvm#71907 and loongson-community/discussions#17 for details.
Replace this ugly fix with a new PR #73387. Thanks. |
…v2.30 (#73387) psABI v2.30 requires the extreme code model instructions sequence (pcalau12i+addi.d+lu32i.d+lu52i.d) to be adjacent. See #71907 and loongson-community/discussions#17 for details.
…v2.30 (llvm#73387) psABI v2.30 requires the extreme code model instructions sequence (pcalau12i+addi.d+lu32i.d+lu52i.d) to be adjacent. See llvm#71907 and loongson-community/discussions#17 for details.
…v2.30 (#73387) psABI v2.30 requires the extreme code model instructions sequence (pcalau12i+addi.d+lu32i.d+lu52i.d) to be adjacent. See llvm/llvm-project#71907 and loongson-community/discussions#17 for details. (cherry picked from commit 38394a3)
If
page(dest) - page(pc)
is 0xfffffffffff000, i.e. page(pc) is next to page(dest), and lo12(dest) > 0x7ff, correct %pc64_lo12 and %pc64_hi12 should be both -1 (which can be checked with binutils) but they are both 0 on lld. This patch fixes this issue.