Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[lld][ELF] Fix a corner case of elf::getLoongArchPageDelta #71907

Closed
wants to merge 1 commit into from

Conversation

SixWeining
Copy link
Contributor

If page(dest) - page(pc) is 0xfffffffffff000, i.e. page(pc) is next to page(dest), and lo12(dest) > 0x7ff, correct %pc64_lo12 and %pc64_hi12 should be both -1 (which can be checked with binutils) but they are both 0 on lld. This patch fixes this issue.

@llvmbot
Copy link
Collaborator

llvmbot commented Nov 10, 2023

@llvm/pr-subscribers-lld-elf

@llvm/pr-subscribers-lld

Author: Lu Weining (SixWeining)

Changes

If page(dest) - page(pc) is 0xfffffffffff000, i.e. page(pc) is next to page(dest), and lo12(dest) > 0x7ff, correct %pc64_lo12 and %pc64_hi12 should be both -1 (which can be checked with binutils) but they are both 0 on lld. This patch fixes this issue.


Full diff: https://github.com/llvm/llvm-project/pull/71907.diff

2 Files Affected:

  • (modified) lld/ELF/Arch/LoongArch.cpp (+4)
  • (modified) lld/test/ELF/loongarch-pc-aligned.s (+5-6)
diff --git a/lld/ELF/Arch/LoongArch.cpp b/lld/ELF/Arch/LoongArch.cpp
index 04ddb4682917b4b..01e6b037eb900af 100644
--- a/lld/ELF/Arch/LoongArch.cpp
+++ b/lld/ELF/Arch/LoongArch.cpp
@@ -159,6 +159,10 @@ uint64_t elf::getLoongArchPageDelta(uint64_t dest, uint64_t pc) {
   bool negativeA = lo12(dest) > 0x7ff;
   bool negativeB = (result & 0x8000'0000) != 0;
 
+  // A corner case; directly return the expected result.
+  if (result == 0xfffffffffffff000 && negativeA)
+    return result = 0xffffffff00000000;
+
   if (negativeA)
     result += 0x1000;
   if (negativeA && !negativeB)
diff --git a/lld/test/ELF/loongarch-pc-aligned.s b/lld/test/ELF/loongarch-pc-aligned.s
index f6ac56e5261ddb7..ae77376ebcdbf38 100644
--- a/lld/test/ELF/loongarch-pc-aligned.s
+++ b/lld/test/ELF/loongarch-pc-aligned.s
@@ -260,18 +260,17 @@
 # EXTREME15-NEXT: lu32i.d   $t0, -349526
 # EXTREME15-NEXT: lu52i.d   $t0, $t0, -1093
 
-## FIXME: Correct %pc64_lo20 should be 0xfffff (-1) and %pc64_hi12 should be 0xfff (-1), but current values are:
-## page delta = 0x0000000000000000, page offset = 0x888
+## page delta = 0xffffffff00000000, page offset = 0x888
 ## %pc_lo12   = 0x888 = -1912
 ## %pc_hi20   = 0x00000 = 0
-## %pc64_lo20 = 0x00000 = 0
-## %pc64_hi12 = 0x00000 = 0
+## %pc64_lo20 = 0xfffff = -1
+## %pc64_hi12 = 0xfff = -1
 # RUN: ld.lld %t/extreme.o --section-start=.rodata=0x0000000012344888 --section-start=.text=0x0000000012345678 -o %t/extreme16
 # RUN: llvm-objdump -d --no-show-raw-insn %t/extreme16 | FileCheck %s --check-prefix=EXTREME16
 # EXTREME16:      addi.d $t0, $zero, -1912
 # EXTREME16-NEXT: pcalau12i $t1, 0
-# EXTREME16-NEXT: lu32i.d   $t0, 0
-# EXTREME16-NEXT: lu52i.d   $t0, $t0, 0
+# EXTREME16-NEXT: lu32i.d   $t0, -1
+# EXTREME16-NEXT: lu52i.d   $t0, $t0, -1
 
 #--- a.s
 .rodata

@SixWeining
Copy link
Contributor Author

Add @xen0n @xry111.

@@ -159,6 +159,10 @@ uint64_t elf::getLoongArchPageDelta(uint64_t dest, uint64_t pc) {
bool negativeA = lo12(dest) > 0x7ff;
bool negativeB = (result & 0x8000'0000) != 0;

// A corner case; directly return the expected result.
if (result == 0xfffffffffffff000 && negativeA)
return result = 0xffffffff00000000;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see the misbehavior is caused by an overflow in the result += 0x1000 below; who expected that! Thanks for spotting it.

First of all, stylistically, the result = part could be dropped and some ' could be added to help counting bits; other than that, I don't know if the other arithmetic could overflow too, in which case we might want to guard them too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current case is raised by some end user. I don't know if there are other cases too. Seems that binutils' processing is in elfnn-loongarch.c (RELOCATE_CALC_PC64_HI32); I'm not sure whether it is helpful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems that mold uses the same impl as binutils. https://github.com/rui314/mold/blob/v2.3.2/elf/arch-loongarch.cc#L53

Copy link
Contributor

@MQ-mengqing MQ-mengqing Nov 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not this bug.
I had report this bug to mengqinggang and Rui rui314/mold#1131 (comment)
That's because we can not get consistent PC from R_LARCH_PCALA_HI20 and R_LARCH_PCALA64_{LO20,HI12}. With this fix, we will wrong when span (+2 + 4N)G.

Wrong codes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This special case should probably be added inside if (negativeA) { ... } so that it reads less magic.

If `page(dest) - page(pc)` is 0xfffffffffff000, i.e. page(pc) is next
to page(dest), and lo12(dest) > 0x7ff, correct %pc64_lo12 and %pc64_hi12
should be both -1 (which can be checked with binutils) but they are both
0 on lld. This patch fixes this issue.
@SixWeining
Copy link
Contributor Author

Sorry that I accidentally used force-push to address review comments. :(

@heiher
Copy link
Member

heiher commented Nov 15, 2023

Is this the correct way to get the four parts of PCALA64?

DELTA = DEST - PC[63:12][000]
DELTA = DELTA + (DELTA[11] << 12)
DELTA = DELTA + (DELTA[31] << 32)
DELTA = DELTA - (DELTA[11] << 32)

R_LARCH_PCALA_HI20   = DELTA[31:12]
R_LARCH_PCALA_LO20   = DELTA[11:00]
R_LARCH_PCALA64_LO20 = DELTA[51:32]
R_LARCH_PCALA64_HI12 = DELTA[63:52]

This is a simple validator, let me test it with boundary data: https://gist.github.com/heiher/bd7398397f3cb5598dce35cbc04d0075

@SixWeining
Copy link
Contributor Author

Is this the correct way to get the four parts of PCALA64?

DELTA = DEST - PC[63:12][000]
DELTA = DELTA + (DELTA[11] << 12)
DELTA = DELTA + (DELTA[31] << 32)
DELTA = DELTA - (DELTA[11] << 32)

R_LARCH_PCALA_HI20   = DELTA[31:12]
R_LARCH_PCALA_LO20   = DELTA[11:00]
R_LARCH_PCALA64_LO20 = DELTA[51:32]
R_LARCH_PCALA64_HI12 = DELTA[63:52]

This is a simple validator, let me test it with boundary data: https://gist.github.com/heiher/bd7398397f3cb5598dce35cbc04d0075

Seems this is what ld and mold do? Like bfd/elfnn-loongarch.c (RELOCATE_CALC_PC64_HI32).

@heiher
Copy link
Member

heiher commented Nov 16, 2023

Is this the correct way to get the four parts of PCALA64?

DELTA = DEST - PC[63:12][000]
DELTA = DELTA + (DELTA[11] << 12)
DELTA = DELTA + (DELTA[31] << 32)
DELTA = DELTA - (DELTA[11] << 32)

R_LARCH_PCALA_HI20   = DELTA[31:12]
R_LARCH_PCALA_LO20   = DELTA[11:00]
R_LARCH_PCALA64_LO20 = DELTA[51:32]
R_LARCH_PCALA64_HI12 = DELTA[63:52]

This is a simple validator, let me test it with boundary data: https://gist.github.com/heiher/bd7398397f3cb5598dce35cbc04d0075

Seems this is what ld and mold do? Like bfd/elfnn-loongarch.c (RELOCATE_CALC_PC64_HI32).

As jinyang said, this way relies on the PC of pcalau12i, which is difficult to do when there is instruction scheduling.

@heiher
Copy link
Member

heiher commented Nov 16, 2023

Another corner case?

DEST: 0 PC: 0x80000ffc

80000ffc: 05 00 00 1b  	pcalau12i	$a1, -524288
80001000: 04 00 c0 02  	addi.d	$a0, $zero, 0
80001004: e4 ff ff 17  	lu32i.d	$a0, -1
80001008: 84 fc 3f 03  	lu52i.d	$a0, $a0, -1
8000100c: 84 94 10 00  	add.d	$a0, $a0, $a1

ugly test: https://gist.github.com/heiher/3c7a2a9e7d05354e818b50d69e40b8cb

@jrtc27
Copy link
Collaborator

jrtc27 commented Nov 18, 2023

This special casing is really ugly and should not be needed, surely, if you just do the arithmetic in the right order and correctly. Can we not just do the same as mold?

@jrtc27
Copy link
Collaborator

jrtc27 commented Nov 18, 2023

I'll also note that the inconsistent positioning of the 's in the numbers (groups of 4 for halfwords vs 3 and 5 to match the relocations) is pretty confusing to read at a glance, possibly even making it less clear than not having them, since one expects them to be in the same place.

@SixWeining
Copy link
Contributor Author

This special casing is really ugly and should not be needed, surely, if you just do the arithmetic in the right order and correctly. Can we not just do the same as mold?

How about using the approach proposed by @heiher in #71907 (comment) ?

--- a/lld/ELF/Arch/LoongArch.cpp
+++ b/lld/ELF/Arch/LoongArch.cpp
@@ -156,16 +156,9 @@ uint64_t elf::getLoongArchPageDelta(uint64_t dest, uint64_t pc) {
   //     i = -0x10000'0000, j = 0, k = -0x1000
   //     result = page(dest) - page(pc) + 0x1000
   uint64_t result = getLoongArchPage(dest) - getLoongArchPage(pc);
-  bool negativeA = lo12(dest) > 0x7ff;
-  bool negativeB = (result & 0x8000'0000) != 0;
-
-  if (negativeA)
-    result += 0x1000;
-  if (negativeA && !negativeB)
-    result -= 0x10000'0000;
-  else if (!negativeA && negativeB)
-    result += 0x10000'0000;
-
+  result += ((dest & 0x800) << 1);
+  result += ((result & 0x8000'0000) << 1);
+  result -= ((dest & 0x800) << 21);
   return result;
 }

@SixWeining
Copy link
Contributor Author

Another corner case?

DEST: 0 PC: 0x80000ffc

80000ffc: 05 00 00 1b  	pcalau12i	$a1, -524288
80001000: 04 00 c0 02  	addi.d	$a0, $zero, 0
80001004: e4 ff ff 17  	lu32i.d	$a0, -1
80001008: 84 fc 3f 03  	lu52i.d	$a0, $a0, -1
8000100c: 84 94 10 00  	add.d	$a0, $a0, $a1

ugly test: https://gist.github.com/heiher/3c7a2a9e7d05354e818b50d69e40b8cb

ld is correct in this case.

    80000ffc:   1b000004        pcalau12i       $a0, -524288(0x80000)
    80001000:   02c00001        addi.d          $ra, $zero, 0
    80001004:   16000001        lu32i.d         $ra, 0
    80001008:   03000021        lu52i.d         $ra, $ra, 0
    8000100c:   00109021        add.d           $ra, $ra, $a0

@MQ-mengqing Can you post an example you mentioned in rui314/mold#1131 (comment) ?

@heiher
Copy link
Member

heiher commented Nov 19, 2023

Another corner case?
DEST: 0 PC: 0x80000ffc

80000ffc: 05 00 00 1b  	pcalau12i	$a1, -524288
80001000: 04 00 c0 02  	addi.d	$a0, $zero, 0
80001004: e4 ff ff 17  	lu32i.d	$a0, -1
80001008: 84 fc 3f 03  	lu52i.d	$a0, $a0, -1
8000100c: 84 94 10 00  	add.d	$a0, $a0, $a1

ugly test: https://gist.github.com/heiher/3c7a2a9e7d05354e818b50d69e40b8cb

ld is correct in this case.

    80000ffc:   1b000004        pcalau12i       $a0, -524288(0x80000)
    80001000:   02c00001        addi.d          $ra, $zero, 0
    80001004:   16000001        lu32i.d         $ra, 0
    80001008:   03000021        lu52i.d         $ra, $ra, 0
    8000100c:   00109021        add.d           $ra, $ra, $a0

@MQ-mengqing Can you post an example you mentioned in rui314/mold#1131 (comment) ?

I found that different ld versions have different results, it doesn't seem to be completely correct? 😞

DEST: 0 PC: 0x80000ffc

ld 2.40 (correct)

    80000ffc:	1b000005 	pcalau12i   	$a1, -524288(0x80000)
    80001000:	02c00004 	addi.d      	$a0, $zero, 0
    80001004:	16000004 	lu32i.d     	$a0, 0
    80001008:	03000084 	lu52i.d     	$a0, $a0, 0
    8000100c:	00109484 	add.d       	$a0, $a0, $a1 # $a0 = 0

ld 2.41 or git mainline (incorrect)

    80000ffc:	1b000005 	pcalau12i   	$a1, -524288(0x80000)
    80001000:	02c00004 	addi.d      	$a0, $zero, 0
    80001004:	17ffffe4 	lu32i.d     	$a0, -1(0xfffff)
    80001008:	033ffc84 	lu52i.d     	$a0, $a0, -1(0xfff)
    8000100c:	00109484 	add.d       	$a0, $a0, $a1 # $a0 = 0xffffffff00000000

DEST: 0 PC: 0x80001000

ld 2.40 (incorrect)

    80001000:	1affffe5 	pcalau12i   	$a1, 524287(0x7ffff)
    80001004:	02c00004 	addi.d      	$a0, $zero, 0
    80001008:	16000004 	lu32i.d     	$a0, 0
    8000100c:	03000084 	lu52i.d     	$a0, $a0, 0
    80001010:	00109484 	add.d       	$a0, $a0, $a1 # $a0 = 0x100000000

ld 2.41 or git mainline (correct)

    80001000:	1affffe5 	pcalau12i   	$a1, 524287(0x7ffff)
    80001004:	02c00004 	addi.d      	$a0, $zero, 0
    80001008:	17ffffe4 	lu32i.d     	$a0, -1(0xfffff)
    8000100c:	033ffc84 	lu52i.d     	$a0, $a0, -1(0xfff)
    80001010:	00109484 	add.d       	$a0, $a0, $a1 # $a0 = 0

@heiher
Copy link
Member

heiher commented Nov 19, 2023

What if we could get the page offset of current instruction and pcalau12i and record in unused immediate bit fields of addi.d/lu32i.d and lu52i.d. I think we can do it correct and these 4 instructions can be scheduled within the 16M(+/-8M) range. This is enough even for a larger basic block.

Similar to type c here.

.L1:
  pcalau12i $a1, 0        # R_LARCH_PCALA_HI20
  addi.d    $a0, $zero, A # R_LARCH_PCALA_LO12
  lu32i.d   $a0, B        # R_LARCH_PCALA64_LO20
  lu52i.d   $a0, $a0, C   # R_LARCH_PCALA64_HI12
  add.d     $a0, $a0, $a1

A/B/C

page(4k) offset of current instruction and pcalau12i.

A = ((. >> 12) - (.L1 >> 12))
B = ((. >> 12) - (.L1 >> 12))
C = ((. >> 12) - (.L1 >> 12))

PC

pcalau12i: PC = PC
addi.d:    PC = PC - (A << 12)
lu32i.d:   PC = PC - (B << 12)
lu52i.d:   PC = PC - (C << 12)

R_LARCH_PCALA*_*

DELTA = DEST - PC[63:12][000]
DELTA = DELTA + (DELTA[11] << 12)
DELTA = DELTA + (DELTA[31] << 32)
DELTA = DELTA - (DELTA[11] << 32)

R_LARCH_PCALA_HI20   = DELTA[31:12]
R_LARCH_PCALA_LO20   = DELTA[11:00]
R_LARCH_PCALA64_LO20 = DELTA[51:32]
R_LARCH_PCALA64_HI12 = DELTA[63:52]

@SixWeining
Copy link
Contributor Author

.L1:
  pcalau12i $a1, 0        # R_LARCH_PCALA_HI20
  addi.d    $a0, $zero, A # R_LARCH_PCALA_LO12
  lu32i.d   $a0, B        # R_LARCH_PCALA64_LO20
  lu52i.d   $a0, $a0, C   # R_LARCH_PCALA64_HI12
  add.d     $a0, $a0, $a1

Is it possible to use the r_addend of Elf64_Rela instead of insn imm bits to store the adjusting bits?

For example:

Before relocation

The addend should be written by linker but not assembler because assembler doesn't know the page delta of current instruction and pcalau12i. (Same as the imm bits approach?)

0000000000000000 <_start>:
   0:   1a000004        pcalau12i       $a0, 0
                        0: R_LARCH_PCALA_HI20   foo
   4:   02c00001        addi.d          $ra, $zero, 0
                        4: R_LARCH_PCALA_LO12   foo
   8:   16000001        lu32i.d         $ra, 0
                        8: R_LARCH_PCALA64_LO20 foo+0x10000000
   c:   03000021        lu52i.d         $ra, $ra, 0
                        c: R_LARCH_PCALA64_HI12 foo+0x10000000000000
  10:   00109021        add.d           $ra, $ra, $a0
  14:   4c000021        jirl            $ra, $ra, 0

After link

0000000080000ffc <_start>:
    80000ffc:   1b000004        pcalau12i       $a0, -524288(0x80000)
    80001000:   02c00001        addi.d          $ra, $zero, 0
    80001004:   16000001        lu32i.d         $ra, 0
    80001008:   03000021        lu52i.d         $ra, $ra, 0
    8000100c:   00109021        add.d           $ra, $ra, $a0

@heiher
Copy link
Member

heiher commented Nov 19, 2023

.L1:
  pcalau12i $a1, 0        # R_LARCH_PCALA_HI20
  addi.d    $a0, $zero, A # R_LARCH_PCALA_LO12
  lu32i.d   $a0, B        # R_LARCH_PCALA64_LO20
  lu52i.d   $a0, $a0, C   # R_LARCH_PCALA64_HI12
  add.d     $a0, $a0, $a1

Is it possible to use the r_addend of Elf64_Rela instead of insn imm bits to store the adjusting bits?

For example:

Before relocation

The addend should be written by linker but not assembler because assembler doesn't know the page delta of current instruction and pcalau12i. (Same as the imm bits approach?)

0000000000000000 <_start>:
   0:   1a000004        pcalau12i       $a0, 0
                        0: R_LARCH_PCALA_HI20   foo
   4:   02c00001        addi.d          $ra, $zero, 0
                        4: R_LARCH_PCALA_LO12   foo
   8:   16000001        lu32i.d         $ra, 0
                        8: R_LARCH_PCALA64_LO20 foo+0x10000000
   c:   03000021        lu52i.d         $ra, $ra, 0
                        c: R_LARCH_PCALA64_HI12 foo+0x10000000000000
  10:   00109021        add.d           $ra, $ra, $a0
  14:   4c000021        jirl            $ra, $ra, 0

This seems fine for static linking, but how to deal with object files (.o)? How to calculate the adjusting bits at this time?
(Okay, the above way of recording page offsets doesn't work either.)

Can we store the byte offset in r_addend for object file? It looks like there are enough bits.

After link

0000000080000ffc <_start>:
    80000ffc:   1b000004        pcalau12i       $a0, -524288(0x80000)
    80001000:   02c00001        addi.d          $ra, $zero, 0
    80001004:   16000001        lu32i.d         $ra, 0
    80001008:   03000021        lu52i.d         $ra, $ra, 0
    8000100c:   00109021        add.d           $ra, $ra, $a0

@SixWeining
Copy link
Contributor Author

This seems fine for static linking, but how to deal with object files (.o)? How to calculate the adjusting bits at this time?

Yes, seems it's impossible.

Can we store the byte offset in r_addend for object file? It looks like there are enough bits.

I'm not sure how these byte offsets could be updated after instrucion scheduling.

Anyway, the Detail description of R_LARCH_PCALA_HI20, R_LARCH_PCALA64_LO20 and R_LARCH_PCALA64_HI12 in ABI is incorrect.
https://github.com/loongson/la-abi-specs/blob/v2.20/laelf.adoc?plain=1#L518
https://github.com/loongson/la-abi-specs/blob/v2.20/laelf.adoc?plain=1#L530
https://github.com/loongson/la-abi-specs/blob/v2.20/laelf.adoc?plain=1#L535

@xry111
Copy link
Contributor

xry111 commented Nov 20, 2023

Phew. The doc for these relocs in https://github.com/gimli-rs/object, written by me, is also wrong.

@xry111
Copy link
Contributor

xry111 commented Nov 20, 2023

I made a brute force algorithm:

def uintptr_t(x):
    return x & ((1 << 64) - 1)

def ptrdiff_t(x):
    x = uintptr_t(x)
    if x & (1 << 63):
        x = (1 << 64) - x
    return x

def pcalau12i(pc, imm):
    assert(imm in range(-0x80000, 0x80000))
    return uintptr_t(pc + (imm << 12)) & ~0xfff

def simm(width, bits):
    assert(bits >= 0 and bits < (1 << width))
    return bits - ((1 << width) if bits & (1 << (width - 1)) else 0)

def reloc(dest, pc):
    lo12 = dest & 0xfff
    a1_first_val = uintptr_t(simm(12, lo12))

    for hi20 in range(-0x80000, 0x80000):
        a0 = pcalau12i(pc, hi20)
        # We need to insert something into a1[32..] to make a0 + a1 = dest,
        # i. e. a1 = dest - a0.
        want_a1 = uintptr_t(dest - a0)
        if (want_a1 & 0xffffffff) != (a1_first_val & 0xffffffff):
            continue

        lo20 = (want_a1 >> 32) & 0xfffff
        hi12 = (want_a1 >> 52) & 0xfff
        return (lo12, hi20 & 0xfffff, lo20, hi12)

    raise Exception("should not be reachable")

def test(dest, pc):
    reloc(dest, pc)
    lo12, hi20, lo20, hi12 = reloc(dest, pc)
    a0 = pcalau12i(pc, simm(20, hi20))
    a1 = uintptr_t(simm(12, lo12))
    a1 &= ~(0xffffffff << 32)
    a1 |= lo20 << 32
    a1 |= hi12 << 52
    assert(uintptr_t(a1 + a0) == dest)

test(0xfffffffffffff8ee, 0x8ee)
test(0, 0x80000ffc)
test(0, 0x80001000)

Now I need to optimize for hi20 in range(-0x80000, 0x80000): line, limiting hi20 into one or two possible values.

@xry111
Copy link
Contributor

xry111 commented Nov 22, 2023

I cannot figure out any reasonable solution if we allow the address materialize sequence to cross page boundary. It looks like we'll have to generate something like:

.la_sym_1: pcalau12i $a0, %pc_hi20(sym)
nop # "nop" for other instructions scheduled here
nop
nop
addi.d $a1, $zero, %pc_lo12(sym)
nop
nop
lu32i.d $a1, %pc64_lo20(sym + (. - .la_sym_1))
nop
nop
nop
lu52i.d $a1, $a1, %pc64_hi12(sym + (. - la_sym_1))

With relaxation disabled, ". - la_sym1" can be evaluated precisely by the assembler so we can store it into the addend; with relaxation enabled, we'll just keep the entire address materializing sequence intact (for taking advantage with relaxation) and we end up:

pcalau12i $a0, %pc_hi20(sym)
addi.d $a1, $zero, %pc_lo12(sym)
lu32i.d $a1, %pc64_lo20(sym + 8)
lu52i.d $a1, $a1, %pc64_hi12(sym + 12)

"8" and "12" can be encoded into addend too.

@MaskRay
Copy link
Member

MaskRay commented Nov 22, 2023

I made a brute force algorithm:

def uintptr_t(x):
    return x & ((1 << 64) - 1)

def ptrdiff_t(x):
    x = uintptr_t(x)
    if x & (1 << 63):
        x = (1 << 64) - x
    return x

def pcalau12i(pc, imm):
    assert(imm in range(-0x80000, 0x80000))
    return uintptr_t(pc + (imm << 12)) & ~0xfff

def simm(width, bits):
    assert(bits >= 0 and bits < (1 << width))
    return bits - ((1 << width) if bits & (1 << (width - 1)) else 0)

def reloc(dest, pc):
    lo12 = dest & 0xfff
    a1_first_val = uintptr_t(simm(12, lo12))

    for hi20 in range(-0x80000, 0x80000):
        a0 = pcalau12i(pc, hi20)
        # We need to insert something into a1[32..] to make a0 + a1 = dest,
        # i. e. a1 = dest - a0.
        want_a1 = uintptr_t(dest - a0)
        if (want_a1 & 0xffffffff) != (a1_first_val & 0xffffffff):
            continue

        lo20 = (want_a1 >> 32) & 0xfffff
        hi12 = (want_a1 >> 52) & 0xfff
        return (lo12, hi20 & 0xfffff, lo20, hi12)

    raise Exception("should not be reachable")

def test(dest, pc):
    reloc(dest, pc)
    lo12, hi20, lo20, hi12 = reloc(dest, pc)
    a0 = pcalau12i(pc, simm(20, hi20))
    a1 = uintptr_t(simm(12, lo12))
    a1 &= ~(0xffffffff << 32)
    a1 |= lo20 << 32
    a1 |= hi12 << 52
    assert(uintptr_t(a1 + a0) == dest)

test(0xfffffffffffff8ee, 0x8ee)
test(0, 0x80000ffc)
test(0, 0x80001000)

Now I need to optimize for hi20 in range(-0x80000, 0x80000): line, limiting hi20 into one or two possible values.

I like using a program to enumerate all the interesting bits. We should figure out a concise program to compute the offset and avoid the special cases (like in the current patch):

  if (result == 0xfff'fffff'fffff'000 && negativeA) // special case
    return 0xfff'fffff'00000'000;

We should have a description how the new code does its job and a concise proof as the comment.

Copy link
Member

@MaskRay MaskRay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.

@xry111
Copy link
Contributor

xry111 commented Nov 23, 2023

We should have a description how the new code does its job and a concise proof as the comment.

The range(-0x80000, 0x80000) can be obviously optimized to [page_low, page_high] where

page_low = ((dest >> 12) - (pc >> 12)) & 0xfffff
page_high = (page_low + 1) & 0xfffff

Then the remaining task is proving the "raise Exception("should not be reachable")" line is really unreachable...

SixWeining added a commit to SixWeining/llvm-project that referenced this pull request Nov 25, 2023
Defer the compution of `negativeB` because adding 0x1000 to original
`result` may yield a different `negativeB` value. Actually this issue
was first reported by @rui314 at https://reviews.llvm.org/D138135#4568594.

Note that even with this patch, the handling of R_LARCH_PCALA64_*
relocs are NOT totally correct, because current approach assumes those
four instructions (pcalau12i/addi.d/lu32i.d/lu52i.d) are in the same
4K-page which is not always true. It is possible to document this
assumption as a constraint in psABI in future. But at least this patch
is necessary.

See llvm#71907 and
loongson-community/discussions#17 for details.
@SixWeining
Copy link
Contributor Author

Replace this ugly fix with a new PR #73387. Thanks.

@SixWeining SixWeining closed this Nov 25, 2023
SixWeining added a commit that referenced this pull request Jan 10, 2024
…v2.30 (#73387)

psABI v2.30 requires the extreme code model instructions sequence
(pcalau12i+addi.d+lu32i.d+lu52i.d) to be adjacent.

See #71907 and
loongson-community/discussions#17 for details.
justinfargnoli pushed a commit to justinfargnoli/llvm-project that referenced this pull request Jan 28, 2024
…v2.30 (llvm#73387)

psABI v2.30 requires the extreme code model instructions sequence
(pcalau12i+addi.d+lu32i.d+lu52i.d) to be adjacent.

See llvm#71907 and
loongson-community/discussions#17 for details.
wanghao75 pushed a commit to openeuler-mirror/llvm-project that referenced this pull request Oct 9, 2024
…v2.30 (#73387)

psABI v2.30 requires the extreme code model instructions sequence
(pcalau12i+addi.d+lu32i.d+lu52i.d) to be adjacent.

See llvm/llvm-project#71907 and
loongson-community/discussions#17 for details.

(cherry picked from commit 38394a3)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants