Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/link: support calls to SDYNIMPORT symbols on mips64 #46178

Open
4a6f656c opened this issue May 14, 2021 · 7 comments
Open

cmd/link: support calls to SDYNIMPORT symbols on mips64 #46178

4a6f656c opened this issue May 14, 2021 · 7 comments
Labels

Comments

@4a6f656c
Copy link
Contributor

@4a6f656c 4a6f656c commented May 14, 2021

The mips64 port currently fails to handle calls to SDYNIMPORT symbols, even when external linking is in use, since it emits R_MIPS_26 relocations, resulting in "relocation truncated to fit" warnings. An equivalent call from assembly when compiled with LLVM results in the following:

   0:   df990000        ld      t9,0(gp)
   4:   0320f809        jalr    t9
   8:   00000000        nop

Furthermore, two relocations are produced for the external symbol - a R_MIPS_CALL16 relocation for the ld instruction and a R_MIPS_JALR relocation for the jalr instruction.

There are a couple of additional things to note here - firstly, PIC code requires that register 25 (t9) contains the address of the called function, which is why this register is used here. Secondly, when the external linker performs the R_MIPS_CALL16 relocation is is done with an expected gp register value, which is loaded by the dynamic linker. Unfortunately, Go currently uses this register and clobbers the value placed in it by the dynamic linker, breaking the relocation.

This means there are effectively two things to solve:

  1. Currently a CALL translates to a single JAL instruction with an R_CALLMIPS (aka R_MIPS_26) relocation - given that the assembler does not know if this is an internal function or an SDYNIMPORT symbol, the easiest option is to likely produce trampolines for the SDYNIMPORT calls, modelling the above assembly and generating the two matching relocations.

  2. In order for these relocations to work we likely have to avoid clobbering the gp register - one option may be to change Go's REGSB register (as was done for tp on riscv64). The other is to save the value of the gp register at start up and restore the gp value before calling an SDYNIMPORT, switching it back to Go's value after the call returns.

I've got a proof of concept working with external linking, trampoline generation and gp register save/restore on openbsd/mips64 (I'll make this available soon) - in the mean time, I'm interested in feedback and/or other suggestions on these issues/approach, before I proceed further.

@4a6f656c
Copy link
Contributor Author

@4a6f656c 4a6f656c commented May 14, 2021

@gopherbot
Copy link

@gopherbot gopherbot commented May 14, 2021

Change https://golang.org/cl/320249 mentions this issue: cmd/link,runtime: SDYNIMPORT calls on openbsd/mips64

@cherrymui
Copy link
Contributor

@cherrymui cherrymui commented May 14, 2021

I'd be surprised it use gp-relative addressing. I thought it does not use that mode by default? What happens if you pass -G 0 or -mno-gpopt to the C compiler?

An equivalent call from assembly when compiled with LLVM results in the following:
0: df990000 ld t9,0(gp)
4: 0320f809 jalr t9
8: 00000000 nop

I'm not sure I followed this. What equivalent to what? Is this code generated by Go or by the C compiler/linker?

Currently a CALL translates to a single JAL instruction with an R_CALLMIPS (aka R_MIPS_26) relocation - given that the assembler does not know if this is an internal function or an SDYNIMPORT symbol, the easiest option is to likely produce trampolines for the SDYNIMPORT calls, modelling the above assembly and generating the two matching relocations.

I agree this sounds like a good solution. But I don't think the code snippet has to be like that above. It is true that a call has to be made through R25. But the trampoline doesn't need to be exactly like that. It can use other instruction sequence and relocations to load the address to R25, without using the gp register.

Another possibility is to always generate indirect calls through R25 on OpenBSD (possibly let the linker to rewrite to direct calls if the target is not DYNIMPORT). What we need then is to handle relocations for the address of a DYNIMPORT symbol.

@4a6f656c
Copy link
Contributor Author

@4a6f656c 4a6f656c commented May 15, 2021

I'd be surprised it use gp-relative addressing. I thought it does not use that mode by default? What happens if you pass -G 0 or -mno-gpopt to the C compiler?

The use of -mno-gpopt with the below assembly results in:

clang: warning: argument unused during compilation: '-mno-gpopt' [-Wunused-command-line-argument]

I'll try compiling some actual C code to see if this has any impact there.

Also, -G seems to be a linker option that only applies to ECOFF (at least on this linker):

-Gvalue
       --gpsize=value
           Set the maximum size of objects to be optimized using the GP
           register to size.  This is only meaningful for object file formats
           such as MIPS ECOFF which supports putting large and small objects
           into different sections.  This is ignored for other object file
           formats.

An equivalent call from assembly when compiled with LLVM results in the following:
0: df990000 ld t9,0(gp)
4: 0320f809 jalr t9
8: 00000000 nop

I'm not sure I followed this. What equivalent to what? Is this code generated by Go or by the C compiler/linker?

Apologies for not being clearer - this is basically the result of compiling an equivalent assembly call to a global symbol using clang, then disassembling the object file:

$ cat b.s
        .text
        .global bar

foo:
        jal bar
        nop
$ clang -c -o b.o b.s                
$ objdump -d b.o

b.o:     file format elf64-tradbigmips

Disassembly of section .text:

0000000000000000 <foo>:
   0:   df990000        ld      t9,0(gp)
   4:   0320f809        jalr    t9
   8:   00000000        nop
   c:   00000000        nop
$ readelf -a b.o
ELF Header:
  Magic:   7f 45 4c 46 02 02 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF64
...
Relocation section '.rela.text' at offset 0xf0 contains 2 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000000000  00030000000b R_MIPS_CALL16     0000000000000000 bar + 0
                    Type2: R_MIPS_NONE      
                    Type3: R_MIPS_NONE      
000000000004  000300000025 R_MIPS_JALR       0000000000000000 bar + 0
                    Type2: R_MIPS_NONE      
                    Type3: R_MIPS_NONE      
...

Currently a CALL translates to a single JAL instruction with an R_CALLMIPS (aka R_MIPS_26) relocation - given that the assembler does not know if this is an internal function or an SDYNIMPORT symbol, the easiest option is to likely produce trampolines for the SDYNIMPORT calls, modelling the above assembly and generating the two matching relocations.

I agree this sounds like a good solution. But I don't think the code snippet has to be like that above. It is true that a call has to be made through R25. But the trampoline doesn't need to be exactly like that. It can use other instruction sequence and relocations to load the address to R25, without using the gp register.

Good point - I guess the trampoline could use REGTMP in place of the GP register, however we would still need to load the same value that the linkers expected to be there, given that this is encoded into the final relocation, for example:

0000000010082178 <libc_pthread_create-tramp0>:
    10082178:   df998830        ld      t9,-30672(gp)
    1008217c:   03200009        jalr    zero,t9
...

At least the original value would not have to be restored following the call.

Another possibility is to always generate indirect calls through R25 on OpenBSD (possibly let the linker to rewrite to direct calls if the target is not DYNIMPORT). What we need then is to handle relocations for the address of a DYNIMPORT symbol.

I also considered that - given that we have a limited number of SDYNIMPORT calls (in runtime and syscall), it seems somewhat wasteful to generate an ld and jalr pair for every call, only to replace those with a jal and nop pair almost everywhere. On the flip side it would mean that we'd avoid generating trampolines and running the trampoline processing on mips64.

@4a6f656c
Copy link
Contributor Author

@4a6f656c 4a6f656c commented May 15, 2021

Compiling and disassembling some C code gives us a better insight/option:

$ cat c.c
void bar();

void
foo()
{
        bar();
}
$ clang -fno-stack-protector -c -o c.o c.c 
$ objdump -d -r c.o

c.o:     file format elf64-tradbigmips

Disassembly of section .text:

0000000000000000 <foo>:
   0:   67bdffe0        daddiu  sp,sp,-32
   4:   ffbf0018        sd      ra,24(sp)
   8:   ffbe0010        sd      s8,16(sp)
   c:   ffbc0008        sd      gp,8(sp)
  10:   03a0f025        move    s8,sp
  14:   3c010000        lui     at,0x0
                        14: R_MIPS_GPREL16      foo
                        14: R_MIPS_SUB  *ABS*
                        14: R_MIPS_HI16 *ABS*
  18:   0039082d        daddu   at,at,t9
  1c:   64210000        daddiu  at,at,0
                        1c: R_MIPS_GPREL16      foo
                        1c: R_MIPS_SUB  *ABS*
                        1c: R_MIPS_LO16 *ABS*
  20:   dc390000        ld      t9,0(at)
                        20: R_MIPS_CALL16       bar
                        20: R_MIPS_NONE *ABS*
                        20: R_MIPS_NONE *ABS*
  24:   0020e025        move    gp,at
  28:   0320f809        jalr    t9
                        28: R_MIPS_JALR bar
                        28: R_MIPS_NONE *ABS*
                        28: R_MIPS_NONE *ABS*
  2c:   00000000        nop
  30:   03c0e825        move    sp,s8
  34:   dfbc0008        ld      gp,8(sp)
  38:   dfbe0010        ld      s8,16(sp)
  3c:   dfbf0018        ld      ra,24(sp)
  40:   67bd0020        daddiu  sp,sp,32
  44:   03e00008        jr      ra
  48:   00000000        nop

This is generating code and relocations that acquires the general pointer value via relocations, rather than requiring it to be preserved, also meaning that we do not need to clobber the Go SB register. This does however rely on the function address being available (normally in register 25 aka t9) - we can handle this via a six instruction dance in the trampoline. I'm currently still testing, however this means we should be able to handle SDYNIMPORT using a 12 instruction trampoline with four relocations, requiring no other changes outside of the linker.

@4a6f656c
Copy link
Contributor Author

@4a6f656c 4a6f656c commented May 15, 2021

Ugh, so this works, however the PIC library code we're calling into is expecting gp to contain the correct value... so I guess we're back to having to save and restore the Go SB register post-call (or changing the Go SB register to not conflict with the gp register).

@4a6f656c
Copy link
Contributor Author

@4a6f656c 4a6f656c commented May 15, 2021

I meant to mention that -mno-gpopt can only be used with -mno-abicalls, which results in code being produced with a jal and R_MIPS_26 relocation, which is going to lead to a later failure.

I'm also wondering if we should not be saving/restoring (or reinitialising) the RSB value in asmcgocall or similar, which would allow the trampoline to simply clobber the register.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
4 participants