[lld][ELF] Fix a corner case of elf::getLoongArchPageDelta #71907

SixWeining · 2023-11-10T08:58:17Z

If page(dest) - page(pc) is 0xfffffffffffff000, i.e. page(dest) is the 4K page immediately before page(pc), and lo12(dest) > 0x7ff, the correct %pc64_lo20 and %pc64_hi12 should both be -1 (which can be checked with binutils), but they are both 0 on lld. This patch fixes this issue.
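To make the symptom concrete, here is a minimal sketch that emulates the five-instruction sequence with the addresses from the EXTREME16 test in this patch. The helper names `sext` and `materialize` are made up for illustration, and the sketch assumes all instructions sit in the same 4K page.

```python
MASK64 = (1 << 64) - 1

def sext(value, bits):
    """Sign-extend the low `bits` bits of `value`."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value

def materialize(pc, hi20, lo12, lo20, hi12):
    """Emulate pcalau12i / addi.d / lu32i.d / lu52i.d / add.d.

    `pc` is any address in the 4K page holding the sequence (an assumption
    of this sketch); the immediates are raw bit fields.
    """
    a1 = ((pc & ~0xfff) + (sext(hi20, 20) << 12)) & MASK64       # pcalau12i
    t0 = sext(lo12, 12) & MASK64                                 # addi.d
    t0 = (t0 & 0xffff_ffff) | ((sext(lo20, 20) << 32) & MASK64)  # lu32i.d
    t0 = (t0 & ((1 << 52) - 1)) | ((hi12 & 0xfff) << 52)         # lu52i.d
    return (t0 + a1) & MASK64                                    # add.d

dest, pc = 0x12344888, 0x12345678   # .rodata / .text from the EXTREME16 test
before = materialize(pc, 0, 0x888, 0x00000, 0x000)  # immediates before the patch
after = materialize(pc, 0, 0x888, 0xfffff, 0xfff)   # immediates after the patch
print(hex(before), hex(after))
```

Under these assumptions `before` comes out as 0x112344888, exactly 4 GiB above the destination, while `after` equals `dest`.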

llvmbot · 2023-11-10T08:58:48Z

@llvm/pr-subscribers-lld-elf

@llvm/pr-subscribers-lld

Author: Lu Weining (SixWeining)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/71907.diff

2 Files Affected:

(modified) lld/ELF/Arch/LoongArch.cpp (+4)
(modified) lld/test/ELF/loongarch-pc-aligned.s (+5-6)

diff --git a/lld/ELF/Arch/LoongArch.cpp b/lld/ELF/Arch/LoongArch.cpp
index 04ddb4682917b4b..01e6b037eb900af 100644
--- a/lld/ELF/Arch/LoongArch.cpp
+++ b/lld/ELF/Arch/LoongArch.cpp
@@ -159,6 +159,10 @@ uint64_t elf::getLoongArchPageDelta(uint64_t dest, uint64_t pc) {
   bool negativeA = lo12(dest) > 0x7ff;
   bool negativeB = (result & 0x8000'0000) != 0;
 
+  // A corner case; directly return the expected result.
+  if (result == 0xfffffffffffff000 && negativeA)
+    return result = 0xffffffff00000000;
+
   if (negativeA)
     result += 0x1000;
   if (negativeA && !negativeB)
diff --git a/lld/test/ELF/loongarch-pc-aligned.s b/lld/test/ELF/loongarch-pc-aligned.s
index f6ac56e5261ddb7..ae77376ebcdbf38 100644
--- a/lld/test/ELF/loongarch-pc-aligned.s
+++ b/lld/test/ELF/loongarch-pc-aligned.s
@@ -260,18 +260,17 @@
 # EXTREME15-NEXT: lu32i.d   $t0, -349526
 # EXTREME15-NEXT: lu52i.d   $t0, $t0, -1093
 
-## FIXME: Correct %pc64_lo20 should be 0xfffff (-1) and %pc64_hi12 should be 0xfff (-1), but current values are:
-## page delta = 0x0000000000000000, page offset = 0x888
+## page delta = 0xffffffff00000000, page offset = 0x888
 ## %pc_lo12   = 0x888 = -1912
 ## %pc_hi20   = 0x00000 = 0
-## %pc64_lo20 = 0x00000 = 0
-## %pc64_hi12 = 0x00000 = 0
+## %pc64_lo20 = 0xfffff = -1
+## %pc64_hi12 = 0xfff = -1
 # RUN: ld.lld %t/extreme.o --section-start=.rodata=0x0000000012344888 --section-start=.text=0x0000000012345678 -o %t/extreme16
 # RUN: llvm-objdump -d --no-show-raw-insn %t/extreme16 | FileCheck %s --check-prefix=EXTREME16
 # EXTREME16:      addi.d $t0, $zero, -1912
 # EXTREME16-NEXT: pcalau12i $t1, 0
-# EXTREME16-NEXT: lu32i.d   $t0, 0
-# EXTREME16-NEXT: lu52i.d   $t0, $t0, 0
+# EXTREME16-NEXT: lu32i.d   $t0, -1
+# EXTREME16-NEXT: lu52i.d   $t0, $t0, -1
 
 #--- a.s
 .rodata

SixWeining · 2023-11-10T08:59:22Z

Add @xen0n @xry111.

xen0n · 2023-11-10T09:30:08Z

lld/ELF/Arch/LoongArch.cpp

@@ -159,6 +159,10 @@ uint64_t elf::getLoongArchPageDelta(uint64_t dest, uint64_t pc) {
  bool negativeA = lo12(dest) > 0x7ff;
  bool negativeB = (result & 0x8000'0000) != 0;

+  // A corner case; directly return the expected result.
+  if (result == 0xfffffffffffff000 && negativeA)
+    return result = 0xffffffff00000000;


I can see the misbehavior is caused by an overflow in the result += 0x1000 below; who expected that! Thanks for spotting it.

First of all, stylistically, the result = part could be dropped and some ' separators could be added to help with counting bits. Other than that, I don't know whether the other arithmetic could overflow too, in which case we might want to guard it as well.

SixWeining · 2023-11-10

The current case was raised by an end user. I don't know whether there are other cases too. It seems that binutils' processing is in elfnn-loongarch.c (RELOCATE_CALC_PC64_HI32); I'm not sure whether it is helpful.

SixWeining · 2023-11-13

It seems that mold uses the same implementation as binutils. https://github.com/rui314/mold/blob/v2.3.2/elf/arch-loongarch.cc#L53

MQ-mengqing · 2023-11-14

Not this bug.
~~I had reported this bug to mengqinggang and Rui rui314/mold#1131 (comment)~~
That's because we cannot get a consistent PC from R_LARCH_PCALA_HI20 and R_LARCH_PCALA64_{LO20,HI12}. ~~With this fix, we will be wrong when spanning (+2 + 4N)G.~~

Wrong codes.

MaskRay · 2023-11-18

This special case should probably be added inside if (negativeA) { ... } so that it reads less magic.

If `page(dest) - page(pc)` is 0xfffffffffff000, i.e. page(pc) is next to page(dest), and lo12(dest) > 0x7ff, correct %pc64_lo12 and %pc64_hi12 should be both -1 (which can be checked with binutils) but they are both 0 on lld. This patch fixes this issue.

SixWeining · 2023-11-10T09:56:48Z

Sorry that I accidentally used force-push to address review comments. :(

heiher · 2023-11-15T15:50:35Z

Is this the correct way to get the four parts of PCALA64?

DELTA = DEST - PC[63:12][000]
DELTA = DELTA + (DELTA[11] << 12)
DELTA = DELTA + (DELTA[31] << 32)
DELTA = DELTA - (DELTA[11] << 32)

R_LARCH_PCALA_HI20   = DELTA[31:12]
R_LARCH_PCALA_LO12   = DELTA[11:00]
R_LARCH_PCALA64_LO20 = DELTA[51:32]
R_LARCH_PCALA64_HI12 = DELTA[63:52]

This is a simple validator, let me test it with boundary data: https://gist.github.com/heiher/bd7398397f3cb5598dce35cbc04d0075
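The four DELTA steps above can be transcribed almost literally. The following is a sketch only: the function name `pcala64_fields` is hypothetical, `pc` is taken to be the address of the pcalau12i, and all arithmetic is wrapped to 64 bits.

```python
MASK64 = (1 << 64) - 1

def pcala64_fields(dest, pc):
    """Apply the four DELTA steps; `pc` is the address of the pcalau12i."""
    delta = (dest - (pc & ~0xfff)) & MASK64                   # DEST - PC[63:12][000]
    delta = (delta + ((delta & 0x800) << 1)) & MASK64         # + (DELTA[11] << 12)
    delta = (delta + ((delta & 0x8000_0000) << 1)) & MASK64   # + (DELTA[31] << 32)
    delta = (delta - ((delta & 0x800) << 21)) & MASK64        # - (DELTA[11] << 32)
    return {
        "R_LARCH_PCALA_HI20": (delta >> 12) & 0xfffff,
        "R_LARCH_PCALA_LO12": delta & 0xfff,
        "R_LARCH_PCALA64_LO20": (delta >> 32) & 0xfffff,
        "R_LARCH_PCALA64_HI12": (delta >> 52) & 0xfff,
    }

# Addresses from the test case touched by this PR.
for name, bits in pcala64_fields(0x12344888, 0x12345678).items():
    print(f"{name:22} = {bits:#x}")
```

For the addresses of this PR's test case the sketch yields 0xfffff and 0xfff for the two 64-bit parts, i.e. the -1 values the description expects.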

SixWeining · 2023-11-16T02:54:42Z

Seems this is what ld and mold do? Like bfd/elfnn-loongarch.c (RELOCATE_CALC_PC64_HI32).

heiher · 2023-11-16T03:35:13Z

As jinyang said, this way relies on the PC of pcalau12i, which is difficult to do when there is instruction scheduling.

heiher · 2023-11-16T03:46:23Z

Another corner case?

DEST: 0 PC: 0x80000ffc

80000ffc: 05 00 00 1b  	pcalau12i	$a1, -524288
80001000: 04 00 c0 02  	addi.d	$a0, $zero, 0
80001004: e4 ff ff 17  	lu32i.d	$a0, -1
80001008: 84 fc 3f 03  	lu52i.d	$a0, $a0, -1
8000100c: 84 94 10 00  	add.d	$a0, $a0, $a1

ugly test: https://gist.github.com/heiher/3c7a2a9e7d05354e818b50d69e40b8cb

jrtc27 · 2023-11-18T07:57:54Z

This special casing is really ugly and should not be needed, surely, if you just do the arithmetic in the right order and correctly. Can we not just do the same as mold?

jrtc27 · 2023-11-18T07:59:42Z

I'll also note that the inconsistent positioning of the 's in the numbers (groups of 4 for halfwords vs 3 and 5 to match the relocations) is pretty confusing to read at a glance, possibly even making it less clear than not having them, since one expects them to be in the same place.

SixWeining · 2023-11-18T16:18:34Z

> This special casing is really ugly and should not be needed, surely, if you just do the arithmetic in the right order and correctly. Can we not just do the same as mold?

How about using the approach proposed by @heiher in #71907 (comment) ?

--- a/lld/ELF/Arch/LoongArch.cpp
+++ b/lld/ELF/Arch/LoongArch.cpp
@@ -156,16 +156,9 @@ uint64_t elf::getLoongArchPageDelta(uint64_t dest, uint64_t pc) {
   //     i = -0x10000'0000, j = 0, k = -0x1000
   //     result = page(dest) - page(pc) + 0x1000
   uint64_t result = getLoongArchPage(dest) - getLoongArchPage(pc);
-  bool negativeA = lo12(dest) > 0x7ff;
-  bool negativeB = (result & 0x8000'0000) != 0;
-
-  if (negativeA)
-    result += 0x1000;
-  if (negativeA && !negativeB)
-    result -= 0x10000'0000;
-  else if (!negativeA && negativeB)
-    result += 0x10000'0000;
-
+  result += ((dest & 0x800) << 1);
+  result += ((result & 0x8000'0000) << 1);
+  result -= ((dest & 0x800) << 21);
   return result;
 }
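As a rough cross-check of the diff above, both versions can be ported to Python and compared. This is a sketch with made-up function names (`delta_branchy`, `delta_branch_free`); it models only the arithmetic shown in the diff, wrapped to 64 bits.

```python
MASK64 = (1 << 64) - 1

def page(addr):
    return addr & ~0xfff

def delta_branchy(dest, pc):
    """Port of the removed lines: negativeB is sampled before the +0x1000."""
    result = (page(dest) - page(pc)) & MASK64
    negative_a = (dest & 0xfff) > 0x7ff
    negative_b = (result & 0x8000_0000) != 0
    if negative_a:
        result += 0x1000
    if negative_a and not negative_b:
        result -= 0x1_0000_0000
    elif not negative_a and negative_b:
        result += 0x1_0000_0000
    return result & MASK64

def delta_branch_free(dest, pc):
    """Port of the three added lines: bit 31 is read after the +0x1000."""
    result = (page(dest) - page(pc)) & MASK64
    result += (dest & 0x800) << 1
    result += (result & 0x8000_0000) << 1
    result -= (dest & 0x800) << 21
    return result & MASK64

pc = 0x1000_0000
offsets = (0x0000_0888, 0x7fff_f000, 0x7fff_f888, 0xffff_f888)
mismatches = [hex(d) for d in offsets
              if delta_branchy(pc + d, pc) != delta_branch_free(pc + d, pc)]
print(mismatches)
```

In this sketch the two versions disagree exactly when lo12(dest) > 0x7ff and the +0x1000 flips bit 31 of the page delta (low 32 bits 0x7ffff000 or 0xfffff000), which is a wider set than the single value matched by the special case in the original patch.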

SixWeining · 2023-11-19T09:02:07Z

> Another corner case?
>
> DEST: 0 PC: 0x80000ffc
>
> 80000ffc: 05 00 00 1b  	pcalau12i	$a1, -524288
> 80001000: 04 00 c0 02  	addi.d	$a0, $zero, 0
> 80001004: e4 ff ff 17  	lu32i.d	$a0, -1
> 80001008: 84 fc 3f 03  	lu52i.d	$a0, $a0, -1
> 8000100c: 84 94 10 00  	add.d	$a0, $a0, $a1
>
> ugly test: https://gist.github.com/heiher/3c7a2a9e7d05354e818b50d69e40b8cb

ld is correct in this case.

    80000ffc:   1b000004        pcalau12i       $a0, -524288(0x80000)
    80001000:   02c00001        addi.d          $ra, $zero, 0
    80001004:   16000001        lu32i.d         $ra, 0
    80001008:   03000021        lu52i.d         $ra, $ra, 0
    8000100c:   00109021        add.d           $ra, $ra, $a0

@MQ-mengqing Can you post an example you mentioned in rui314/mold#1131 (comment) ?

heiher · 2023-11-19T12:26:44Z

I found that different ld versions produce different results, so ld doesn't seem to be completely correct either? 😞

DEST: 0 PC: 0x80000ffc

ld 2.40 (correct)

    80000ffc:	1b000005 	pcalau12i   	$a1, -524288(0x80000)
    80001000:	02c00004 	addi.d      	$a0, $zero, 0
    80001004:	16000004 	lu32i.d     	$a0, 0
    80001008:	03000084 	lu52i.d     	$a0, $a0, 0
    8000100c:	00109484 	add.d       	$a0, $a0, $a1 # $a0 = 0

ld 2.41 or git mainline (incorrect)

    80000ffc:	1b000005 	pcalau12i   	$a1, -524288(0x80000)
    80001000:	02c00004 	addi.d      	$a0, $zero, 0
    80001004:	17ffffe4 	lu32i.d     	$a0, -1(0xfffff)
    80001008:	033ffc84 	lu52i.d     	$a0, $a0, -1(0xfff)
    8000100c:	00109484 	add.d       	$a0, $a0, $a1 # $a0 = 0xffffffff00000000

DEST: 0 PC: 0x80001000

ld 2.40 (incorrect)

    80001000:	1affffe5 	pcalau12i   	$a1, 524287(0x7ffff)
    80001004:	02c00004 	addi.d      	$a0, $zero, 0
    80001008:	16000004 	lu32i.d     	$a0, 0
    8000100c:	03000084 	lu52i.d     	$a0, $a0, 0
    80001010:	00109484 	add.d       	$a0, $a0, $a1 # $a0 = 0x100000000

ld 2.41 or git mainline (correct)

    80001000:	1affffe5 	pcalau12i   	$a1, 524287(0x7ffff)
    80001004:	02c00004 	addi.d      	$a0, $zero, 0
    80001008:	17ffffe4 	lu32i.d     	$a0, -1(0xfffff)
    8000100c:	033ffc84 	lu52i.d     	$a0, $a0, -1(0xfff)
    80001010:	00109484 	add.d       	$a0, $a0, $a1 # $a0 = 0
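One possible reading of the listings above, in line with the earlier remark about not getting a consistent PC, is that the ld 2.41 immediates arise when each relocation is evaluated against its own instruction's address. This is an assumption that has not been checked against the binutils source. The sketch below uses the DELTA recipe from earlier in the thread and hypothetical helpers `link` and `run`; it does not try to model ld 2.40.

```python
MASK64 = (1 << 64) - 1

def sext(value, bits):
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value

def delta(dest, pc):
    """DELTA recipe from earlier in the thread, relative to the page of `pc`."""
    d = (dest - (pc & ~0xfff)) & MASK64
    d += (d & 0x800) << 1
    d += (d & 0x8000_0000) << 1
    d -= (d & 0x800) << 21
    return d & MASK64

def link(dest, start, per_insn_pc):
    """Immediates for pcalau12i at start, lu32i.d at start+8, lu52i.d at start+12.

    per_insn_pc=True models the assumed behaviour: every relocation uses the
    address of the instruction it patches instead of the pcalau12i address.
    """
    ref = (lambda off: start + off) if per_insn_pc else (lambda off: start)
    hi20 = (delta(dest, start) >> 12) & 0xfffff
    lo20 = (delta(dest, ref(8)) >> 32) & 0xfffff
    hi12 = (delta(dest, ref(12)) >> 52) & 0xfff
    return hi20, lo20, hi12

def run(dest, start, hi20, lo20, hi12):
    """Value left in the destination register after the final add.d."""
    a1 = ((start & ~0xfff) + (sext(hi20, 20) << 12)) & MASK64
    t0 = (sext(dest, 12) & 0xffff_ffff) | (lo20 << 32) | (hi12 << 52)
    return (a1 + t0) & MASK64

for start in (0x80000ffc, 0x80001000):
    for per_insn in (False, True):
        imms = link(0, start, per_insn)
        print(hex(start), per_insn, [hex(i) for i in imms], hex(run(0, start, *imms)))
```

With the sequence starting at 0x80000ffc, the lu32i.d and lu52i.d land on the next 4K page, and the per-instruction variant produces -1/-1 and a final value of 0xffffffff00000000, matching the ld 2.41 listing. With the sequence starting at 0x80001000, everything is on one page and both variants agree on the correct result.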

heiher · 2023-11-19T13:25:31Z

What if we could get the page offset between the current instruction and pcalau12i, and record it in the unused immediate bit fields of addi.d/lu32i.d and lu52i.d? I think ~~we can do it correct~~ these 4 instructions could then be scheduled anywhere within a 16M (+/-8M) range, which is enough even for a large basic block.

Similar to type c here.

.L1:
  pcalau12i $a1, 0        # R_LARCH_PCALA_HI20
  addi.d    $a0, $zero, A # R_LARCH_PCALA_LO12
  lu32i.d   $a0, B        # R_LARCH_PCALA64_LO20
  lu52i.d   $a0, $a0, C   # R_LARCH_PCALA64_HI12
  add.d     $a0, $a0, $a1

A/B/C

page(4k) offset of current instruction and pcalau12i.

A = ((. >> 12) - (.L1 >> 12))
B = ((. >> 12) - (.L1 >> 12))
C = ((. >> 12) - (.L1 >> 12))

PC

pcalau12i: PC = PC
addi.d:    PC = PC - (A << 12)
lu32i.d:   PC = PC - (B << 12)
lu52i.d:   PC = PC - (C << 12)

R_LARCH_PCALA*_*

DELTA = DEST - PC[63:12][000]
DELTA = DELTA + (DELTA[11] << 12)
DELTA = DELTA + (DELTA[31] << 32)
DELTA = DELTA - (DELTA[11] << 32)

R_LARCH_PCALA_HI20   = DELTA[31:12]
R_LARCH_PCALA_LO12   = DELTA[11:00]
R_LARCH_PCALA64_LO20 = DELTA[51:32]
R_LARCH_PCALA64_HI12 = DELTA[63:52]

SixWeining · 2023-11-19T16:36:37Z

Is it possible to use the r_addend of Elf64_Rela instead of insn imm bits to store the adjusting bits?

For example:

Before relocation

The addend should be written by the linker, not the assembler, because the assembler doesn't know the page delta between the current instruction and pcalau12i. (Same as the imm bits approach?)

0000000000000000 <_start>:
   0:   1a000004        pcalau12i       $a0, 0
                        0: R_LARCH_PCALA_HI20   foo
   4:   02c00001        addi.d          $ra, $zero, 0
                        4: R_LARCH_PCALA_LO12   foo
   8:   16000001        lu32i.d         $ra, 0
                        8: R_LARCH_PCALA64_LO20 foo+0x10000000
   c:   03000021        lu52i.d         $ra, $ra, 0
                        c: R_LARCH_PCALA64_HI12 foo+0x10000000000000
  10:   00109021        add.d           $ra, $ra, $a0
  14:   4c000021        jirl            $ra, $ra, 0

After link

0000000080000ffc <_start>:
    80000ffc:   1b000004        pcalau12i       $a0, -524288(0x80000)
    80001000:   02c00001        addi.d          $ra, $zero, 0
    80001004:   16000001        lu32i.d         $ra, 0
    80001008:   03000021        lu52i.d         $ra, $ra, 0
    8000100c:   00109021        add.d           $ra, $ra, $a0

heiher · 2023-11-19T17:11:55Z

This seems fine for static linking, but how to deal with object files (.o)? How to calculate the adjusting bits at this time?
(Okay, the above way of recording page offsets doesn't work either.)

Can we store the byte offset in r_addend for object file? It looks like there are enough bits.

SixWeining · 2023-11-20T01:31:22Z

> This seems fine for static linking, but how to deal with object files (.o)? How to calculate the adjusting bits at this time?

Yes, it seems impossible.

> Can we store the byte offset in r_addend for object file? It looks like there are enough bits.

I'm not sure how these byte offsets could be updated after instruction scheduling.

Anyway, the detailed description of R_LARCH_PCALA_HI20, R_LARCH_PCALA64_LO20 and R_LARCH_PCALA64_HI12 in the ABI is incorrect.
https://github.com/loongson/la-abi-specs/blob/v2.20/laelf.adoc?plain=1#L518
https://github.com/loongson/la-abi-specs/blob/v2.20/laelf.adoc?plain=1#L530
https://github.com/loongson/la-abi-specs/blob/v2.20/laelf.adoc?plain=1#L535

xry111 · 2023-11-20T12:22:25Z

Phew. The doc for these relocs in https://github.com/gimli-rs/object, written by me, is also wrong.

xry111 · 2023-11-20T17:23:51Z

I made a brute force algorithm:

def uintptr_t(x):
    return x & ((1 << 64) - 1)

def ptrdiff_t(x):
    # Reinterpret the low 64 bits of x as a signed value.
    x = uintptr_t(x)
    if x & (1 << 63):
        x = x - (1 << 64)
    return x

def pcalau12i(pc, imm):
    assert(imm in range(-0x80000, 0x80000))
    return uintptr_t(pc + (imm << 12)) & ~0xfff

def simm(width, bits):
    assert(bits >= 0 and bits < (1 << width))
    return bits - ((1 << width) if bits & (1 << (width - 1)) else 0)

def reloc(dest, pc):
    lo12 = dest & 0xfff
    a1_first_val = uintptr_t(simm(12, lo12))

    for hi20 in range(-0x80000, 0x80000):
        a0 = pcalau12i(pc, hi20)
        # We need to insert something into a1[32..] to make a0 + a1 = dest,
        # i. e. a1 = dest - a0.
        want_a1 = uintptr_t(dest - a0)
        if (want_a1 & 0xffffffff) != (a1_first_val & 0xffffffff):
            continue

        lo20 = (want_a1 >> 32) & 0xfffff
        hi12 = (want_a1 >> 52) & 0xfff
        return (lo12, hi20 & 0xfffff, lo20, hi12)

    raise Exception("should not be reachable")

def test(dest, pc):
    lo12, hi20, lo20, hi12 = reloc(dest, pc)
    a0 = pcalau12i(pc, simm(20, hi20))
    a1 = uintptr_t(simm(12, lo12))
    a1 &= ~(0xffffffff << 32)
    a1 |= lo20 << 32
    a1 |= hi12 << 52
    assert(uintptr_t(a1 + a0) == dest)

test(0xfffffffffffff8ee, 0x8ee)
test(0, 0x80000ffc)
test(0, 0x80001000)

Now I need to optimize the "for hi20 in range(-0x80000, 0x80000):" line, limiting hi20 to one or two possible values.

xry111 · 2023-11-22T13:02:12Z

I cannot figure out any reasonable solution if we allow the address-materializing sequence to cross a page boundary. It looks like we'll have to generate something like:

.la_sym_1: pcalau12i $a0, %pc_hi20(sym)
nop # "nop" for other instructions scheduled here
nop
nop
addi.d $a1, $zero, %pc_lo12(sym)
nop
nop
lu32i.d $a1, %pc64_lo20(sym + (. - .la_sym_1))
nop
nop
nop
lu52i.d $a1, $a1, %pc64_hi12(sym + (. - .la_sym_1))

With relaxation disabled, ". - .la_sym_1" can be evaluated precisely by the assembler, so we can store it in the addend. With relaxation enabled, we'll just keep the entire address-materializing sequence intact (to take advantage of relaxation), and we end up with:

pcalau12i $a0, %pc_hi20(sym)
addi.d $a1, $zero, %pc_lo12(sym)
lu32i.d $a1, %pc64_lo20(sym + 8)
lu52i.d $a1, $a1, %pc64_hi12(sym + 12)

"8" and "12" can be encoded into addend too.

MaskRay · 2023-11-22T19:21:01Z

I like using a program to enumerate all the interesting bits. We should figure out a concise program to compute the offset and avoid the special cases (like in the current patch):

  if (result == 0xfff'fffff'fffff'000 && negativeA) // special case
    return 0xfff'fffff'00000'000;

We should have a description of how the new code does its job and a concise proof as the comment.

xry111 · 2023-11-23T04:34:59Z

> We should have a description how the new code does its job and a concise proof as the comment.

The range(-0x80000, 0x80000) can be obviously optimized to [page_low, page_high] where

page_low = ((dest >> 12) - (pc >> 12)) & 0xfffff
page_high = (page_low + 1) & 0xfffff

Then the remaining task is proving the "raise Exception("should not be reachable")" line is really unreachable...
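A sketch of the brute-force search narrowed to those two candidates follows. The function name `reloc_two_candidates` is made up here, and `pc` is assumed to be the address of the pcalau12i. A random sweep is included as a sanity check; it is not a proof.

```python
import random

MASK64 = (1 << 64) - 1

def sext(value, bits):
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value

def reloc_two_candidates(dest, pc):
    """Return (lo12, hi20, lo20, hi12) as raw bit fields, trying two hi20 values."""
    lo12 = dest & 0xfff
    a1_low = sext(lo12, 12) & 0xffff_ffff      # low half of a1 after addi.d
    page_low = ((dest >> 12) - (pc >> 12)) & 0xfffff
    page_high = (page_low + 1) & 0xfffff
    for hi20 in (page_low, page_high):
        a0 = ((pc & ~0xfff) + (sext(hi20, 20) << 12)) & MASK64   # pcalau12i
        want_a1 = (dest - a0) & MASK64
        if (want_a1 & 0xffff_ffff) == a1_low:
            return lo12, hi20, (want_a1 >> 32) & 0xfffff, (want_a1 >> 52) & 0xfff
    raise AssertionError("should not be reachable")

rng = random.Random(0)
for _ in range(10_000):
    reloc_two_candidates(rng.getrandbits(64), rng.getrandbits(64))
print([hex(v) for v in reloc_two_candidates(0xfffffffffffff8ee, 0x8ee)])
```

A short argument for why the exception is unreachable: the low 32 bits match exactly when hi20 is congruent, modulo 2^20, to (dest >> 12) - (pc >> 12) plus 1 if lo12 > 0x7ff, and that is always one of `page_low` or `page_high`.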

@rui314

Defer the computation of `negativeB` because adding 0x1000 to the original `result` may yield a different `negativeB` value. Actually this issue was first reported by @rui314 at https://reviews.llvm.org/D138135#4568594. Note that even with this patch, the handling of R_LARCH_PCALA64_* relocs is NOT totally correct, because the current approach assumes those four instructions (pcalau12i/addi.d/lu32i.d/lu52i.d) are in the same 4K-page, which is not always true. It is possible to document this assumption as a constraint in the psABI in the future. But at least this patch is necessary. See llvm#71907 and loongson-community/discussions#17 for details.
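A minimal sketch of what deferring `negativeB` looks like is shown below. It is ported to Python with hypothetical names (`page_delta_deferred`, `rebuild`) and is not the committed code. The round-trip check bakes in the same-4K-page assumption that the message mentions.

```python
import random

MASK64 = (1 << 64) - 1

def sext(value, bits):
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value

def page(addr):
    return addr & ~0xfff

def page_delta_deferred(dest, pc):
    result = (page(dest) - page(pc)) & MASK64
    negative_a = (dest & 0xfff) > 0x7ff
    if negative_a:
        result = (result + 0x1000) & MASK64
    negative_b = (result & 0x8000_0000) != 0   # sampled after the +0x1000
    if negative_a and not negative_b:
        result -= 0x1_0000_0000
    elif not negative_a and negative_b:
        result += 0x1_0000_0000
    return result & MASK64

def rebuild(dest, pc, result):
    """Value the sequence yields when all four instructions sit in pc's page."""
    a1 = (page(pc) + sext(result & 0xffff_ffff, 32)) & MASK64    # pcalau12i
    # addi.d supplies the low 32 bits, lu32i.d/lu52i.d the high 32 bits
    t0 = (result & ~0xffff_ffff & MASK64) | (sext(dest, 12) & 0xffff_ffff)
    return (a1 + t0) & MASK64

rng = random.Random(0)
for _ in range(10_000):
    dest, pc = rng.getrandbits(64), rng.getrandbits(64)
    assert rebuild(dest, pc, page_delta_deferred(dest, pc)) == dest
print(hex(page_delta_deferred(0x12344888, 0x12345678)))
```

For the test addresses of #71907 the sketch returns 0xffffffff00000000 without needing a special case.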

SixWeining · 2023-11-25T12:56:27Z

Replace this ugly fix with a new PR #73387. Thanks.

…v2.30 (#73387) psABI v2.30 requires the extreme code model instructions sequence (pcalau12i+addi.d+lu32i.d+lu52i.d) to be adjacent. See #71907 and loongson-community/discussions#17 for details.

…v2.30 (#73387) psABI v2.30 requires the extreme code model instructions sequence (pcalau12i+addi.d+lu32i.d+lu52i.d) to be adjacent. See llvm/llvm-project#71907 and loongson-community/discussions#17 for details. (cherry picked from commit 38394a3)

llvmbot added lld lld:ELF labels Nov 10, 2023

SixWeining requested review from heiher, wangleiat and MaskRay November 10, 2023 08:59

xen0n reviewed Nov 10, 2023

SixWeining force-pushed the lld-large-fix branch from 058298f to 663522c Compare November 10, 2023 09:54

MaskRay requested changes Nov 22, 2023

xry111 mentioned this pull request Nov 23, 2023

ELF: Handle R_LARCH_PCALA64_* in a correct and reasonable way loongson-community/discussions#17

Closed

SixWeining mentioned this pull request Nov 25, 2023

[lld][LoongArch] Handle extreme code model relocs according to psABI v2.30 #73387

Merged

SixWeining closed this Nov 25, 2023
