Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[avr] Wrong code to indirect call functions located beyond the first 128KiB code segment. #58856

Closed
sprintersb opened this issue Nov 7, 2022 · 12 comments
Assignees

Comments

@sprintersb
Copy link

sprintersb commented Nov 7, 2022

Consider the following C code:

int func (void);

int main (void)
{
    int (*f)(void) = func;
    __asm ("" : "+r" (f));
    return f();
}

and compiled with:

> clang --target=avr foo.c -Os -save-temps -mmcu=atmega2560 -Os -o foo.elf -Wl,--defsym,func=0x22222
> avr-objdump -d foo.elf > foo.lst
> avr-objdump -dr foo.o > foo.lss

So that the code mimics a function func located at 0x22222. There are problems with the generated code:

  • main is using icall to execute f(), but icall will always target the low 128KiB segment. The correct instruction is eicall resp. eijmp for tail-calls.
  • The compiler uses the wrong relocations like pm_lo8(sym) and pm_hi8(sym), where the correct code will use lo8(gs(sym)) and hi8(gs(sym)); so the GNU linker will generate stubs as needed. Correct RELOCs are R_AVR_LO8_LDI_GS resp. R_AVR_HI8_LDI_GS.
  • If main was located beyond the first 128KiB segment, icall/ijmp won't work either, correct are eicall/eijmp, cf. also startup-code.

FYI, the disassembly of main is as of foo.lst is:

0000012c <main>:
 12c:	e1 e1       	ldi	r30, 0x11	; 17
 12e:	f1 e1       	ldi	r31, 0x11	; 17
 130:	09 95       	icall
 132:	08 95       	ret

so the call will target 0x2222, not 0x22222.

@aykevl
Copy link
Contributor

aykevl commented Jan 1, 2023

avr-gcc generates the following code:

00000000 <main>:
   0:   ldi     r30, 0x00       ; 0
                        0: R_AVR_LO8_LDI_GS     func
   2:   ldi     r31, 0x00       ; 0
                        2: R_AVR_HI8_LDI_GS     func
   4:   ijmp

while Clang generates the following code:

00000000 <main>:
   0:   ldi     r30, 0x00       ; 0
                        0: R_AVR_LO8_LDI_PM     func
   2:   ldi     r31, 0x00       ; 0
                        2: R_AVR_HI8_LDI_PM     func
   4:   icall
   6:   ret

The ijmp vs icall is simply a matter of LLVM not yet supporting tail calls. The actual bug here appears to be the relocation type, which is AVR_*_LDI_GS for avr-gcc and AVR_*_LDI_PM for Clang. I'm going to assume this is the actual bug.

@aykevl
Copy link
Contributor

aykevl commented Jan 1, 2023

If main was located beyond the first 128KiB segment, icall/ijmp won't work either, correct are eicall/eijmp, cf. also startup-code.

Ah, yes, that is another (separate) bug. I was testing with the atmega328p, which correctly generates icall in Clang. For the atmega2560 for example, it should generate eicall instead of icall.

@sprintersb
Copy link
Author

The actual bug here appears to be the relocation type, which is AVR_*_LDI_GS for avr-gcc and AVR_*_LDI_PM for Clang. I'm going to assume this is the actual bug.

Yes.

For the atmega2560 for example, it should generate eicall instead of icall.

Yes. FYI, to factor this out in inline assembly, avr-gcc supports %! which prints a single e for devices with 3-byte PC, and an empty string otherwise. So you get the appropriate instruction by menas of %!icall resp. %!ijmp.

@benshi001 benshi001 self-assigned this Jan 19, 2023
@benshi001
Copy link
Member

avr-gcc emits R_AVR_LO8_LDI_GS and R_AVR_HI8_LDI_GS with a eijmp, while clang emits R_AVR_LO8_LDI_PM and R_AVR_HI8_LDI_PM with icall. Though the instruction icall is wrong, I think the relocations are correct.

@sprintersb
Copy link
Author

sprintersb commented Jan 21, 2023

I think the relocations are correct.

avr-gcc shows the relocation must emit gs() so the linker will generate stub as needed:

void fun (int);

void (*pfun)(int) = fun;

void (*get_pfun (void))(int)
{
    return fun;
}
get_pfun:
	ldi r24,lo8(gs(fun))
	ldi r25,hi8(gs(fun))
	ret
	.size	get_pfun, .-get_pfun

.global	pfun
	.data
	.type	pfun, @object
	.size	pfun, 2
pfun:
	.word	gs(fun)
.global __do_copy_data

For devices with 2-byte PC or when the label is in the 128 KiB code block covered by EIND, this will make no difference.

The difference is for devices with 3-byte PC and when the label outside the 128 KiB segment covered by EIND: For such gs(sym), the linker will generate a stub in .trampolines that contains a forwarding 'JMP' to the very target. .trampolines is accessible via eicall / EIND.

The actual bug here appears to be the relocation type, which is AVR_*_LDI_GS for avr-gcc and AVR_*_LDI_PM for Clang. I'm going to assume this is the actual bug.

The bug has 2 parts:

  • Wrong relocation (must be gs())
  • Wrong instruction (must be eicall or eijmp on devices with 3-byte PC).

An example where EIND≠0 is a bootloader located in the upper 128 KiB of a ATmega2560. The vector table will be in the upper 128 kiB, and startup-code uses __vectors location to init EIND:

#ifdef __AVR_3_BYTE_PC__
	ldi	r16, hh8(pm(__vectors))
	out	_SFR_IO_ADDR(EIND), r16
#endif	/* __AVR_3_BYTE_PC__ */

cf. .init2 code in gcrt1.S from AVR-LibC.

The linker has similar magic to infer the value of EIND resp. __vectors, and uses this to infer which gs need actual stub generation (outside EIND range) and where the stubs will be located (inside EIND range).

See also EIND and Devices with More Than 128 KiB of Flash in avr-gcc docs.

@aykevl
Copy link
Contributor

aykevl commented Jan 22, 2023

There is probably a third bug: I don't think ld.lld currently supports trampoline functions.

@benshi001
Copy link
Member

fixed by https://reviews.llvm.org/D142298.

I will take a look at how to emit trampoline functions in lld.

@sprintersb
Copy link
Author

sprintersb commented Jan 22, 2023

So to make this work we need 3 parts:

  1. Use EIJMP/EICALL (done afaiu)
  2. Use R_AVR_LO8_LDI_GS, R_AVR_HI8_LDI_GS when taking the address of a code label (not for data labels).
  3. Emit the linker stubs in section .trampolines.

To extend on the opening comment #1, here is the code as generated by GNU:

> avr-gcc foo.c -o foo.elf -Os -mrelax -mmcu=atmega2560 -Wl,-Map,foo.map -Wl,--defsym,func=0x22222 
> avr-objdump -d foo.elf

foo.lst

00000000 <__vectors>:
   0:	73 c0       	rjmp	.+230    	; 0xe8 startup code
   2:	00 00       	nop
   4:	7b c0       	rjmp	.+246    	; 0xfc <__bad_interrupt>
[snipped rest of vectab]

;; These are the "linker stubs" or "trampolines".
;; Because func is out of reach of EICALL, we use indirect call / jump
;; to here and then forward to the very target, func.
000000e4 <__trampolines_start>:
  e4:	0d 94 11 11 	jmp	0x22222	; 0x22222 <func>

;; Startup-code from AVR-LibC
000000e8 <__ctors_end>:
  e8:	11 24       	eor	r1, r1
  ea:	1f be       	out	0x3f, r1	; SREG
  ec:	cf ef       	ldi	r28, 0xFF	; 255
  ee:	d1 e2       	ldi	r29, 0x21	; 33
  f0:	de bf       	out	0x3e, r29	; SPH
  f2:	cd bf       	out	0x3d, r28	; SPL
  f4:	00 e0       	ldi	r16, 0x00	; hh8(__vectors)
  f6:	0c bf       	out	0x3c, r16	; EIND
  f8:	02 d0       	rcall	.+4      	; 0xfe <main>
  fa:	04 c0       	rjmp	.+8      	; 0x104 <_exit>

000000fc <__bad_interrupt>:
  fc:	81 cf       	rjmp	.-254    	; 0x0 <__vectors>

000000fe <main>:
  fe:	e2 e7       	ldi	r30, 0x72	; lo8 (__trampolines_start / 2)
 100:	f0 e0       	ldi	r31, 0x00	; hi8 (__trampolines_start / 2)
 102:	19 94       	eijmp

[snip]

On the other hand, if the application starts at, say, 0x20000 due to -Ttext=0x20000, then EIND=1 and no trampoline is needed:

000200fa <main>:
   200fa:	e1 e1       	ldi	r30, 0x11	; lo8(0x22222 / 2)
   200fc:	f1 e1       	ldi	r31, 0x11	; hi8(0x22222 / 2)
   200fe:	19 94       	eijmp

But a stub is needed if, say, func is located in lower 128 KiB of code space.

@benshi001
Copy link
Member

benshi001 commented Jan 22, 2023

With my patch and your original test program (as below)

int func (void);
int main (void) {
    int (*f)(void) = func;
    __asm ("" : "+r" (f));
    return f();
}

Build with command

./bin/clang a.c -O3 -Wall --target=avr -mmcu=atmega2560 -Wl,--defsym,func=0x22222 -o a.out

I think all the 3 parts you mentioned become correct.

  1. The icall is replaced by a eicall:
00000104 <main>:
 104:   e2 e7           ldi     r30, 0x72       ; 114
 106:   f0 e0           ldi     r31, 0x00       ; 0
 108:   19 95           eicall
 10a:   08 95           ret
  1. There is a trampoline function locates at 0xe4.
...
  dc:   0c 94 80 00     jmp     0x100   ; 0x100 <__bad_interrupt>
  e0:   0c 94 80 00     jmp     0x100   ; 0x100 <__bad_interrupt>

000000e4 <__trampolines_start>:
  e4:   0d 94 11 11     jmp     0x22222 ; 0x22222 <func>

000000e8 <__ctors_end>:
  e8:   11 24           eor     r1, r1
  ea:   1f be           out     0x3f, r1        ; 63
...
  1. The relocations switch to GS* from PM*. The linked address is indeed the trampoline's address.
        .type   main,@function
main:                                   ; @main
; %bb.0:
        ldi     r30, lo8_gs(func)
        ldi     r31, hi8_gs(func)
        ;APP
        ;NO_APP
        eicall
        ret

So I am sure all your concerns are fixed by my patch.

@sprintersb
Copy link
Author

So I am sure all your concerns are fixed by my patch.

  1. will work for the GNU linker, but above was a mention of a different linker. So I was not sure about that.

@benshi001
Copy link
Member

So I am sure all your concerns are fixed by my patch.

  1. will work for the GNU linker, but above was a mention of a different linker. So I was not sure about that.

We need not consider lld now, currently clang still calls avr-ld as default linker.

@benshi001
Copy link
Member

fixed in 029f669

CarlosAlbertoEnciso pushed a commit to SNSystems/llvm-debuginfo-analyzer that referenced this issue Jan 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants