Linking with a memory region at 0x00000000 => relocation truncated to fit #120

Open
Dolu1990 opened this Issue Jan 30, 2018 · 12 comments


Dolu1990 commented Jan 30, 2018

Hi,

I get an error when I try to link some binaries at address 0x00000000. If I use 0x00000004 instead, the build succeeds.

/home/spinalvm/hdl/tmp/freedom-e-sdk/software/dhrystone/../../bsp/env/start.S:35:(.init+0x3c): relocation truncated to fit: R_RISCV_PCREL_LO12_I against `.L0 '
collect2: error: ld returned 1 exit status

Here are the steps to reproduce:

  1. git clone https://github.com/sifive/freedom-e-sdk.git
  2. Change the flash line at https://github.com/sifive/freedom-e-sdk/blob/master/bsp/env/freedom-e300-hifive1/dhrystone.lds#L7 to: flash (rxai!w) : ORIGIN = 0x00000000, LENGTH = 512M
  3. Run make clean all BOARD=freedom-e300-hifive1 LINK_TARGET=dhrystone in https://github.com/sifive/freedom-e-sdk/tree/master/software/dhrystone

I have no idea why this happens. It looks like a bug in the toolchain?


sorear commented Jan 30, 2018

The medany code model implemented in gcc requires that code and data be within ±2GiB of each other. You have moved the code (in flash) too far away from the data (in ram); if you want flash at zero, try moving ram to 0x4000_0000 or so.
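To make the range constraint concrete, here is a minimal sketch in C of how a PC-relative offset is split between an auipc (upper 20 bits) and the 12-bit signed immediate of the following instruction. The names pcrel_parts, pcrel_in_range and pcrel_split are hypothetical, and the range check uses 64-bit arithmetic, so it models the RV64 case where offsets do not wrap around.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type, for illustration only. */
typedef struct {
    int32_t hi20; /* signed 20-bit value that auipc shifts left by 12 */
    int32_t lo12; /* signed 12-bit immediate of the paired instruction */
} pcrel_parts;

/* Return 1 if offset can be formed as (hi20 << 12) + lo12. */
static int pcrel_in_range(int64_t offset)
{
    const int64_t min = -(INT64_C(1) << 31) - (INT64_C(1) << 11);
    const int64_t max = (INT64_C(1) << 31) - (INT64_C(1) << 11) - 1;
    return offset >= min && offset <= max;
}

/* Split an in-range offset into its hi20/lo12 parts. */
static pcrel_parts pcrel_split(int64_t offset)
{
    assert(pcrel_in_range(offset));

    /* Bias by 0x800 so that the signed lo12 part can bring the sum back. */
    int64_t biased = offset + 0x800;

    /* Floor division by 4096, written out to avoid relying on
       implementation-defined right shifts of negative values. */
    int64_t hi = biased / 4096;
    if (biased % 4096 < 0)
        hi -= 1;

    int64_t lo = offset - hi * 4096;

    pcrel_parts parts = { (int32_t)hi, (int32_t)lo };
    return parts;
}
```

For example, pcrel_split(0x1800) gives hi20 = 2 and lo12 = -2048, since 2 * 4096 - 2048 = 0x1800. An offset of exactly 2GiB is rejected by pcrel_in_range.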

Author

Dolu1990 commented Jan 30, 2018

Right,
I just tried moving the other region like this:
ram (wxa!ri) : ORIGIN = 0x40000000, LENGTH = 16K

But it doesn't solve the issue.
I also had the same issue with a simpler project that had only a single memory region, at 0x00000000.

Member

palmer-dabbelt commented Jan 31, 2018

There's actually some undocumented behavior in the RISC-V medany code model where it can link within a 2GiB range and can also link in a 2GiB range around 0. It's necessary to be able to link these low addresses because undefined weak symbols end up with the address 0, so in order to build Linux (which uses an undefined weak symbol) in medany mode we need to be able to link against addresses around 0 from anywhere. You shouldn't rely on this... :)
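As a small illustration of the undefined weak symbol case, here is a sketch in C. The name optional_hook is hypothetical, and the snippet assumes GCC or Clang on an ELF target, where an undefined weak symbol resolves to address 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical optional hook: declared weak and never defined anywhere,
   so the linker resolves it to address 0 instead of reporting an error. */
extern void optional_hook(void) __attribute__((weak));

/* Return 1 if some object file provided the hook, 0 if it resolved to 0. */
static int hook_is_present(void)
{
    return optional_hook != NULL;
}

/* Call the hook only when it was actually linked in. */
static void call_hook_if_present(void)
{
    if (optional_hook)
        optional_hook();
}
```

The address comparison is the part that matters here: the code has to be able to materialize address 0 from wherever it is linked, which is why the linker needs to support references to addresses around 0.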

@Dolu1990 Can you reduce this to a smaller test case? Things are pretty crazy here right now.

Collaborator

jim-wilson commented Jan 31, 2018

I managed to reproduce. It appears that the PCREL HI reloc is being converted into a GPREL reloc, but the PCREL LO reloc is not, and then that gives the relocation error as there is no matching PCREL HI reloc that we can get the address from. These linker issues are complicated to look at though, so I'm still trying to figure out what is really going on.

The link works if you turn off relaxation. You can use the gcc option -Wl,--no-relax. Or if calling the linker directly, just use --no-relax.

Collaborator

jim-wilson commented Jan 31, 2018

The problem is in _bfd_riscv_relax_section in bfd/elfnn-riscv.c. For local symbols it does:
  sym_sec = elf_elfsections (abfd)[isym->st_shndx]->bfd_section;
  if (sec_addr (sym_sec) == 0)
    continue;
So if a section base address is 0, then local relocs will not be relaxed. In a HI/LO pair, the hi reloc will point to a global symbol, and the lo reloc will point at a local symbol that points at the instruction with the hi reloc. Which means that if a section base address is 0, the high relocs will be relaxed, but the lo relocs will not, and the result will be a linker error due to the relaxation mismatch.
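The mismatch can be sketched with a toy model in C. This is not the binutils data model: toy_symbol, toy_would_relax and toy_pair_consistent are hypothetical names, and the only condition modelled is the zero-address check on local symbols described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified view of a relocation's target symbol. */
typedef struct {
    bool     is_local;     /* symbol is local to the object file */
    uint64_t section_addr; /* base address of the symbol's section */
} toy_symbol;

/* Models only the check discussed here: a local symbol whose section
   sits at address 0 is skipped, so its relocation is not relaxed. */
static bool toy_would_relax(const toy_symbol *sym, bool zero_check_enabled)
{
    if (sym->is_local && zero_check_enabled && sym->section_addr == 0)
        return false;
    return true;
}

/* A HI/LO pair only works if both halves get the same decision.
   hi_target is the symbol the HI reloc refers to (typically global);
   lo_target is the local label on the instruction carrying the HI reloc. */
static bool toy_pair_consistent(const toy_symbol *hi_target,
                                const toy_symbol *lo_target,
                                bool zero_check_enabled)
{
    return toy_would_relax(hi_target, zero_check_enabled)
        == toy_would_relax(lo_target, zero_check_enabled);
}
```

In this model, a global data symbol paired with a local label in a section at address 0 is inconsistent while the zero check is enabled. The pair becomes consistent either when the check is disabled or when the section starts at a nonzero address such as 4, which matches the behaviour reported at the top of the thread.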

Unfortunately, I don't have access to any useful history here, and there is no comment explaining why this is done, so it isn't clear what the fix should be. One option is to disable the zero check, and then wait to see what breaks. I'm hoping I can think of a better option.

Meanwhile, you either have to avoid assigning any section to address 0, or you have to turn off linker relaxation.

Member

aswaterman commented Jan 31, 2018

This might have been special casing to deal with SHN_COMMON and/or SHN_ABS. Instead of actually handling them properly, we just didn't relax to them.

Collaborator

jim-wilson commented Feb 7, 2018

I tried instrumenting the code and compiling lots of stuff, and haven't yet found any case where the zero address check might matter. I have seen no SHN_COMMON or SHN_ABS section here, I have seen no debug info or other unallocated section which would have a zero base address, etc. I have an additional theory that maybe the zero section address check was added first, and then the SHN_UNDEF check was added later, making the zero section address check unnecessary. I haven't tested that yet. I also want to try writing some targeted testcases to see if it is possible to get a SHN_COMMON or SHN_ABS or debug info section here. If all of that fails, then I probably have to disable the test and then wait to see what if anything breaks.

Member

aswaterman commented Feb 7, 2018

My wild theory came from looking at other ports' similar code, e.g., https://github.com/riscv/riscv-binutils-gdb/blob/riscv-binutils-2.29/bfd/elf32-nds32.c#L6677

If you can't get SHN_COMMON or SHN_ABS to trigger a problem with targeted test cases, I agree with your conclusion.

Collaborator

jim-wilson commented Feb 15, 2018

I added a patch upstream to ifdef out the problematic code, as I've been unable to find any reason for it, and believe that it is not necessary.


colin4124 commented Aug 29, 2018

@jim-wilson So this issue is fixed upstream? Does the latest riscv-gcc include the fix?

Collaborator

jim-wilson commented Aug 29, 2018

It is a binutils problem, not a gcc problem. The patch that fixes it is here:
https://sourceware.org/ml/binutils/2018-02/msg00249.html

riscv-gnu-toolchain is using the riscv-binutils-gdb binutils-2.30 branch from January, which does not have the fix. The fix is in binutils-2.31, but riscv-gnu-toolchain hasn't been updated to use that yet. Someone needs to volunteer to do the work. Kito from Andes has indicated that he might have time to do that, but it is uncertain when or if he will get around to it.

Member

kito-cheng commented Aug 29, 2018

I've just sent a PR :) but CI will take some time.
riscv/riscv-gnu-toolchain#367

tomtor added a commit to tomtor/icicle that referenced this issue Jan 14, 2019

Move memory to 0x10000000 for ALL configurations.
Prevent linker bug: riscv/riscv-gcc#120

In addition this prevents bumping into the IO region when >64K