ELF code emitter for Z80 architecture (naive impl.) #10

pawosm-arm · 2020-04-22T20:51:22Z

Following the ability of the other 8-bit platform supported by LLVM (AVR) to generate ELF object files, I have prepared this crude ELF code emitter for Z80. It has not been extensively tested, mostly due to the lack of a compatible runtime library (e.g. for a CP/M system).

Some caveats:

- The Z80 backend emits a lot of compiler library calls, as specified in
  llvm/lib/Target/Z80/Z80ISelLowering.cpp, which makes it hard to use in practice

jacobly0 · 2020-04-23T00:51:10Z

This backend has always supported avoiding undocumented instructions, but there isn't any frontend support for it yet. The simplest way to pass it to clang is something like -Xclang -target-feature -Xclang -undoc,-idxhalf, but once I push it should be as simple as -march=generic.

Also, I can repro the invalid mnemonics so I'll fix that soon.

pawosm-arm · 2020-04-23T08:48:52Z

> This backend has always supported avoiding undocumented instructions, but there isn't any frontend support for it yet. The simplest way to pass it to clang is something like -Xclang -target-feature -Xclang -undoc,-idxhalf, but once I push it should be as simple as -march=generic.
>
> Also, I can repro the invalid mnemonics so I'll fix that soon.

This -march=generic would help generate better code: for now, idxhalf and lea instructions are implemented by emitting workaround machine code, which isn't the best solution. Yet what I would like to see most is less use of compiler library routines. E.g., in the example main.c code I wrote:

const int n = ((unsigned char)(argv[0][0]));

...where argv is a pointer to a pointer to signed char elements. This explicit signed char to unsigned char cast may look awkward; unfortunately, if I omit it, the Z80 backend generates a call to a compiler function __bshrs (which I don't have implemented) with the cast value passed in the A register and 7 passed in the B register. Converting a signed char to an int happens pretty often in C code, so this needs to be addressed somehow.
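To make the trade-off of that cast concrete, here is a small host-side C sketch. The helper names are made up for illustration and are not part of the PR. It shows that the workaround is not a pure no-op: for bytes with the top bit set, the cast changes the resulting int.

```c
/* Hypothetical helpers, named for illustration only. */

/* Sign-extending conversion: what a plain `int n = argv[0][0];` asks for
   when the element type is signed char. */
static int widen_signed(signed char c) {
    return c;
}

/* Zero-extending conversion: what the explicit (unsigned char) cast asks for. */
static int widen_unsigned(signed char c) {
    return (unsigned char)c;
}
```

For ordinary 7-bit ASCII input both helpers agree, which is presumably why the cast is acceptable in the example; for a byte such as 0xFF they return -1 and 255 respectively.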

jacobly0 · 2020-05-02T10:38:12Z

Alright, I finally got caught up with unfixed bugs and pushed, so the sext opt I wrote a few weeks ago should solve your __bshrs issue. Note that, naturally, it is not reasonable to replace every instance of libcalls, but I can still do it if an optimization warrants it. Also, if you need help implementing a libcall just ask, I can write them from scratch pretty quickly, for example here's an implementation of __bshrs.

__bshrs:
	inc b
	dec b
	ret z
	push bc
.loop:
	sra a
	djnz .loop
	pop bc
	ret
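For checking an implementation like this on a host machine, a C reference model can be handy. The sketch below is an assumption-based model of the listing above (A is the value, B is the shift count, result in A); the function names are invented here.

```c
#include <stdint.h>

/* One Z80 `sra a` step: shift right by one while keeping bit 7 (the sign). */
static uint8_t sra8(uint8_t v) {
    return (uint8_t)((v >> 1) | (v & 0x80u));
}

/* Model of the loop above: A shifted arithmetically right B times.
   B == 0 returns A unchanged, matching the early `ret z`. */
static uint8_t bshrs_model(uint8_t a, uint8_t b) {
    while (b--) {
        a = sra8(a);
    }
    return a;
}
```

With a count of 7, as in the sign-extension case discussed earlier, the model yields 0xFF for any negative input and 0x00 otherwise.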

pawosm-arm · 2020-05-02T13:47:42Z

Hi @jacobly0

> Also, if you need help implementing a libcall just ask,

I'm thinking about some more general approach that makes use of the userspace C libraries offered by LLVM compiler suite. There are two sub-projects like that in the monorepo, compiler-rt and libc, I wonder which one is more suitable. For practical reasons I was rather looking at the libc project.

I had some hopes regarding this libc project: it's new, not overgrown (yet) and intended for multi-targeting, so it may be easier to extend. Namely, two targets could be added: cpm/z80 for CP/M userspace programs and none/z80 (or even one more, none/ez80) for bare-metal projects. It's not that hard on CP/M, where the whole userspace program needs to fit into the 64 kB TPA, eliminating all the problems with FAR pointers (a problem that haunts most 8- and 16-bit platforms) and leaving only the libm part with 'magical coding' requirements. Some of the problems I see are:

- it's laborious
- I haven't figured out yet how to modify the libc project's CMake files so that libc is built with the clang compiler just built during the same make invocation (the Z80 cross-compiling clang itself), instead of the compiler used for building the rest of the LLVM project
- similarly, I haven't figured out yet how to modify the CMake files to ensure static library generation even if the rest of the current LLVM build is set up to build shared libraries

Any hints are welcomed.

jacobly0 · 2020-05-02T14:45:46Z

Well they all contain disjoint routines, so it's not really a choice between one or the other.

The purpose of compiler-rt is to provide compiler-specific builtins that compute operations that may not be available as an instruction on every processor, but are representable in the source language. Given the simplistic nature of the z80 instruction set, it will need quite a lot of these. Even though it has C implementations of various intrinsics, the majority of these are not going to be useful since they assume a minimal basic set of instructions which the z80 simply does not have. In the end there is just no escaping writing basic operations in assembly such as char * char (note that compiler-rt is full of specialized target-specific assembly routines in the first place anyway).

The purpose of libc on the other hand is to provide a standard set of C routines that can be used by programs to interface with the OS. These are written almost entirely in C except that the code that interfaces directly with the OS sometimes requires asm. Here the challenge is going to be rewriting said interface to work with a different OS in the first place.

And then libm is a bunch of computation-heavy floating-point based C routines, where some of the functions can be replaced by instructions on modern cpus. However, most implementations of libm don't provide basic float operations which are instead provided by hardware or compiler-rt depending on the cpu. Since the z80 would be entirely soft-float by necessity, it's certainly possible to use C implementations for everything, but still, they will depend heavily on compiler-rt assembly routines for basic operations.

pawosm-arm · 2020-05-03T20:21:30Z

Rebased to follow changes on your branch.

pawosm-arm · 2020-05-06T17:45:56Z

updated to follow recent API changes.

pawosm-arm · 2020-05-06T23:35:53Z

> Also, if you need help implementing a libcall just ask, I can write them from scratch pretty quickly

Yeah, I'm finding more of those symbols, namely:

         U __frameset
         U __frameset0
         U __indcallhl
         U __sand
         U __setflag
         U __lshl
         U __lshru
         U __lsub
         U __sshl
         U __sshrs
         U __sshru
         U __smulu
         U __sxor
         U __lmulu
         U __ladd
         U __lcmpu
         U __ldivs
         U __sdivs
         U __sdivsu
         U __srems

The floats are currently outside of my scope, despite encountering them too:
__fadd, __fdiv, __fmul, __fsub, __fcmp, __fneg

pawosm-arm · 2020-05-11T22:18:38Z

@jacobly0 from this file https://github.com/c4ooo/TI84-CE-Wrapper-for-Monochrome-TI-BASIC-Programs./blob/master/ti84pce.inc I can figure out where those builtin function names originate from. Can I assume that you have some access to the sources for those functions? I need a bit of your help as I'm stuck with my hobby project without them... now it's __smulu, but who knows what it's going to be tomorrow...

adriweb · 2020-05-11T22:39:03Z

You are probably interested in the various files here: https://github.com/CE-Programming/toolchain/tree/master/src/std

jacobly0 · 2020-05-11T22:40:19Z

The source for all of the zds routines is in the previous zilog toolchain release at src/rtl/common, but they are written for the ez80 so they won't be very useful, certainly not the multiplication routines. You can see copies of the comments that explain what each routine does here. Here's a z80 __smulu taken from elsewhere:

__smulu:
	push	af
	push	bc
	push	de
	ld	e,c
	ld	d,b
	call	.mul
	pop	de
	pop	bc
	pop	af
	ret
.mul:
	xor	a
	cp	h
	jr	z,.swap
	ex	de,hl
.swap:
	ld	c,l
	ld	l,a
	add	a,h
	call	nz,.byte
	ld	a,c
.byte:
	ld	b,8
.next:
	add	hl,hl
	add	a,a
	jr	nc,.skip
	add	hl,de
.skip:
	djnz	.next
	ret
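A host-side C model can help when testing a routine like this. The sketch below assumes, from reading the listing, that the operands arrive in HL and BC and that the low 16 bits of the product are returned in HL; the function name is invented here. It uses the same MSB-first shift-and-add idea as the `.next` loop, but over all 16 multiplier bits in one pass rather than byte by byte.

```c
#include <stdint.h>

/* Shift-and-add multiply, most significant multiplier bit first.
   Returns the product truncated to 16 bits. */
static uint16_t smulu_model(uint16_t hl, uint16_t bc) {
    uint16_t acc = 0;
    for (int bit = 0; bit < 16; ++bit) {
        uint16_t top = (uint16_t)(hl & 0x8000u); /* next multiplier bit */
        acc = (uint16_t)(acc << 1);              /* like `add hl,hl` */
        hl = (uint16_t)(hl << 1);                /* like `add a,a` */
        if (top) {
            acc = (uint16_t)(acc + bc);          /* like `add hl,de` */
        }
    }
    return acc;
}
```

Because the result wraps at 16 bits, the same model serves for signed and unsigned operands as long as only the low half of the product is needed.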

pawosm-arm · 2020-05-11T22:51:17Z

Wow, that's a great response, thanks a million :)

pawosm-arm · 2020-05-14T18:29:36Z

I observed something odd today while trying to use stdint.h's uint32_t type, sizeof(uint32_t) is... 2 (while it's 4 with sdcc as everywhere else...)
fortunately, sizeof(unsigned long) is 4.

jacobly0 · 2020-05-14T18:41:06Z

Then you are using a stdint.h that isn't valid for the z80. The one from glibc definitely won't work. The one packaged with clang would work if you don't have another invalid stdint.h in the include search path. I use a much more simplified version here.

pawosm-arm · 2020-05-14T18:52:30Z

You were right: clang's stdint.h does an include_next of the host's stdint.h unless the -U__STDC_HOSTED__ flag is passed. This solves the problem with the sizes of big ints, thx!
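A cheap way to catch this kind of header mix-up early is a compile-time check in the project's own sources. The following is a minimal sketch using C11 `_Static_assert`; the helper function name is made up for illustration.

```c
#include <limits.h>
#include <stdint.h>

/* Fail the build if the stdint.h found on the include path does not match
   the target's real type widths. */
_Static_assert(sizeof(uint16_t) * CHAR_BIT == 16, "uint16_t must be 16 bits wide");
_Static_assert(sizeof(uint32_t) * CHAR_BIT == 32, "uint32_t must be 32 bits wide");

/* Hypothetical helper for a runtime report of the same fact. */
static unsigned uint32_bits(void) {
    return (unsigned)(sizeof(uint32_t) * CHAR_BIT);
}
```

With a stdint.h that defines uint32_t as a 2-byte type, the second assertion would stop the compilation instead of letting the wrong size leak into the program.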

pawosm-arm · 2020-05-16T10:17:25Z

Hi @jacobly0, can I ask for your implementation of __lcmpu?

jacobly0 · 2020-05-16T11:55:11Z

That's a surprisingly tough one, let me give it a shot...

; speed optimized
__lcmpu:
	or a
	sbc hl,bc
	add hl,bc
	push de
	push bc
	push iy
	pop bc
	ex de,hl
	jr z,.maybeEqual
	sbc hl,bc
	ex de,hl
	pop bc
	jr nz,.notEqual
	jp pe,.overflow
	inc d
.notEqual:
	pop de
	ret
.overflow:
	ld d,$80
	dec d
	pop de
	ret
.maybeEqual:
	sbc hl,bc
	ex de,hl
	pop bc
	pop de
	ret

; size optimized
__lcmpu:
	or a
	sbc hl,bc
	add hl,bc
	push de
	push bc
	push iy
	pop bc
	ex de,hl
	jr z,.maybeEqual
	sbc hl,bc
	push af
	pop hl
	res 6,l
	push hl
	pop af
	db $21
.maybeEqual:
	sbc hl,bc
	ex de,hl
	pop bc
	pop de
	ret

(All assuming you want to avoid index half reg instructions at least)
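For host-side testing, the comparison can be modelled in C. The sketch below is based on a reading of the listings, not on documentation: it assumes the operands are DE:HL and IY:BC, compares the low halves first, then carries the borrow into the high halves. Only the zero and carry outcomes are modelled; the type and function names are invented here.

```c
#include <stdint.h>

typedef struct {
    int zero;  /* set when the operands are equal */
    int carry; /* set when lhs < rhs, i.e. a borrow left the top half */
} cmp_flags_model;

static cmp_flags_model lcmpu_model(uint32_t lhs, uint32_t rhs) {
    uint16_t lhs_lo = (uint16_t)lhs, lhs_hi = (uint16_t)(lhs >> 16);
    uint16_t rhs_lo = (uint16_t)rhs, rhs_hi = (uint16_t)(rhs >> 16);

    /* Low halves: like `or a` / `sbc hl,bc`. */
    int borrow = lhs_lo < rhs_lo;

    /* High halves with the borrow carried in: like the second `sbc hl,bc`. */
    int32_t hi_diff = (int32_t)lhs_hi - (int32_t)rhs_hi - borrow;

    cmp_flags_model f;
    f.carry = hi_diff < 0;
    f.zero = (lhs_lo == rhs_lo) && (hi_diff == 0);
    return f;
}
```

The two-step structure is the reason the assembly needs the `.maybeEqual` path: the final zero flag is only meaningful when the low halves were already equal.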

pawosm-arm · 2020-05-16T14:01:43Z

It works, thanks!

pawosm-arm · 2020-05-27T16:00:26Z

Hi @jacobly0, can I look at the __setflag implementation too?

jacobly0 · 2020-05-27T16:41:36Z

__setflag:
	ret po
	push af
	dec sp
	pop af
	xor $80
	push af
	inc sp
	pop af
	ret

pawosm-arm · 2020-05-27T18:07:58Z

Wow, seems like __lcmpzero (called before __setflag) needs to leave P/V flag set properly. Can I have its code too?

jacobly0 · 2020-05-27T18:33:15Z

; speed optimized
__lcmpzero:
	push bc
	ld c,a
	ld a,l
	or h
	or e
	or d
	jr z,.zero
	ld a,d
	or 1
.zero:
	cp 0
	ld a,c
	pop bc
	ret

; size optimized (and basically what zds does)
__lcmpzero:
	push bc
	ld bc,0
	push bc
	ex (sp),iy
	call __lcmpu
	pop iy
	pop bc
	ret
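As a cross-check for the speed-optimized variant, here is a hedged C model of what it appears to compute, assuming the 32-bit value is held in DE:HL. It covers only the zero and sign outcomes, not the P/V flag mentioned above; the names are invented for illustration.

```c
#include <stdint.h>

typedef struct {
    int zero; /* set when the whole 32-bit value is zero */
    int sign; /* set when the top bit of the value is set */
} zero_test_model;

static zero_test_model lcmpzero_model(uint32_t v) {
    /* ld a,l / or h / or e / or d: nonzero if any byte is nonzero */
    uint8_t a = (uint8_t)(v | (v >> 8) | (v >> 16) | (v >> 24));
    if (a != 0) {
        /* ld a,d / or 1: keep the sign of the top byte, force nonzero */
        a = (uint8_t)((v >> 24) | 1u);
    }
    /* cp 0: flags reflect a itself */
    zero_test_model f;
    f.zero = (a == 0);
    f.sign = (a & 0x80u) != 0;
    return f;
}
```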

pawosm-arm · 2020-05-27T20:02:24Z

Works lovely, thanks!

pawosm-arm · 2020-06-02T16:26:16Z

False alarm, I've found a bug in the rest of the code.

pawosm-arm · 2020-06-03T12:51:55Z

Aargh, it generated a call to __sshl in a simple array access. I also need __sdivs to complete this function. @jacobly0, can I see your impl. of both?
(I meant __sshl, not __smulu, which is already there; my mistake.)

jacobly0 · 2022-01-31T11:05:26Z

Can't merge changes to the asm output without a way to select between asm flavors.

pawosm-arm · 2022-01-31T11:49:07Z

Hey @jacobly0, that was 2 years ago, so I guess this PR needs massive rework, at least to resolve merge conflicts and to make it compatible with the LLVM API changes that occurred during that time.

AFAIR my solution does not output any assembly; it creates ELF binaries directly. Namely: ELF LSB relocatable, *unknown arch 0xdc* version 1 (SYSV), not stripped. To get such binaries, the compiler is started with these flags: --target=z80-none-elf -march=z80 -fintegrated-as -fcommon -fno-builtin -U__STDC_HOSTED__. So I don't know which changes to the asm output you mean, or where selecting between asm flavors comes in...

jacobly0 · 2022-01-31T11:50:50Z

I'm referring to this, file-local labels that are unmarked or that start with dot are not supported by the assembler/linker I'm using.

Interestingly, my assembler/linker does support outputting ELF files now and those files are correctly dumped by llvm object inspection tools compiled from the z80 branch.

pawosm-arm · 2022-01-31T12:00:53Z

Sadly, I don't remember now why I had to make that change, so I'd have to track it back before finding a way to make it correct :(

We experienced some deadlocks when using `scan-build`'s intercept-build tool to log a multi-threaded build, e.g. `make -j16`:

```
(gdb) bt
#0  0x00007f2bb3aff110 in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007f2bb3af70a3 in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007f2bb3d152e4 in ?? ()
#3  0x00007ffcc5f0cc80 in ?? ()
#4  0x00007f2bb3d2bf5b in ?? () from /lib64/ld-linux-x86-64.so.2
#5  0x00007f2bb3b5da27 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#6  0x00007f2bb3b5dbe0 in exit () from /lib/x86_64-linux-gnu/libc.so.6
#7  0x00007f2bb3d144ee in ?? ()
#8  0x746e692f706d742f in ?? ()
#9  0x692d747065637265 in ?? ()
#10 0x2f653631326b3034 in ?? ()
#11 0x646d632e35353532 in ?? ()
#12 0x0000000000000000 in ?? ()
```

I think gcc's exit call caused the injected `libear.so` to be unloaded by `ld`, which in turn called `void on_unload() __attribute__((destructor))`. That tried to acquire a mutex that was already locked: it had been left locked by the `bear_report_call()` call, which probably encountered some error and returned early without unlocking it. All of this is speculation, since from the backtrace I could not verify whether frames 2 and 3 in fact correspond to the `libear.so` module. But I think it's a fairly safe bet.

So, hereby I'm releasing the held mutex on *all paths*, even if some failure happens.

PS: I would use lock_guards, but it's C.

Reviewed-by: NoQ
Differential Revision: https://reviews.llvm.org/D118439

jacobly0 force-pushed the z80 branch from b4df0db to 2cf0ea9 on May 2, 2020 10:25

jacobly0 force-pushed the z80 branch from 2cf0ea9 to 12e4fac on May 3, 2020 11:29

pawosm-arm force-pushed the z80-elf branch from 5304209 to 519814e on May 3, 2020 20:20

pawosm-arm force-pushed the z80-elf branch 2 times, most recently from bff422c to 6c9504d on May 6, 2020 17:45

jacobly0 force-pushed the z80 branch from bf4ac0e to 004a9da on June 6, 2021 15:15

jacobly0 force-pushed the z80 branch 3 times, most recently from 560682a to a139def on August 11, 2021 05:49

jacobly0 force-pushed the z80 branch 3 times, most recently from 0832f55 to 55bc950 on January 30, 2022 23:05

jacobly0 force-pushed the z80 branch from d0e4dc9 to 89273d9 on February 9, 2022 05:39

jacobly0 force-pushed the z80 branch 4 times, most recently from ce1e5e0 to 6d5b492 on March 13, 2022 06:31

jacobly0 force-pushed the z80 branch 3 times, most recently from 67095ed to 03a92e5 on May 28, 2022 08:45

jacobly0 force-pushed the z80 branch 3 times, most recently from a26904a to 142c6ae on June 4, 2022 12:01

jacobly0 force-pushed the z80 branch from a1bd854 to 911ae58 on February 1, 2023 03:18

jacobly0 force-pushed the z80 branch 3 times, most recently from b5b00fc to fcc1b7e on October 29, 2023 11:28

jacobly0 force-pushed the z80 branch from fcc1b7e to efc2b24 on November 6, 2023 18:49

jacobly0 force-pushed the z80 branch from 99ba9dd to 87dbcb9 on March 19, 2024 20:04
