ELF code emitter for Z80 architecture (naive impl.) #10

Open
wants to merge 21 commits into
base: z80

@pawosm-arm pawosm-arm commented Apr 22, 2020

Following the example of AVR, the other 8-bit platform supported by LLVM that can generate ELF object files, I have prepared this crude ELF code emitter for Z80. It has not been tested extensively, mostly because there is no compatible runtime library (e.g. for a CP/M system).

Some caveats:

- The Z80 backend emits a lot of compiler library calls, as specified in
  llvm/lib/Target/Z80/Z80ISelLowering.cpp, which makes the output hard to use in practice

@jacobly0
Owner

This backend has always supported avoiding undocumented instructions, but there isn't any frontend support for it yet. The simplest way to pass it to clang is something like -Xclang -target-feature -Xclang -undoc,-idxhalf, but once I push it should be as simple as -march=generic.

Also, I can repro the invalid mnemonics so I'll fix that soon.

@pawosm-arm
Author

pawosm-arm commented Apr 23, 2020

> This backend has always supported avoiding undocumented instructions, but there isn't any frontend support for it yet. The simplest way to pass it to clang is something like -Xclang -target-feature -Xclang -undoc,-idxhalf, but once I push it should be as simple as -march=generic.
>
> Also, I can repro the invalid mnemonics so I'll fix that soon.

This -march=generic would help generate better code: for now, the idxhalf and lea instructions are implemented by generating workaround machine code, which isn't the best solution. What I would like to see most, though, is less use of compiler library routines. E.g., in the example main.c code I wrote:

const int n = ((unsigned char)(argv[0][0]));

...where argv is a pointer to a pointer to signed char. This explicit cast from signed char to unsigned char may look awkward, but if I omit it, the Z80 backend generates a call to the compiler library function __bshrs (which I don't have implemented), with the cast value passed in the A register and 7 passed in the B register. Converting a signed char to an integer happens pretty often in C code, so this needs to be addressed somehow.
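To make the difference between the two casts concrete, here is a small C sketch. The helper names `widen_signed` and `widen_unsigned` are hypothetical and not part of this PR. It only illustrates why the signed case needs an extra step: the high byte of the widened value is the sign bit replicated, which is what an arithmetic right shift by 7 produces, whereas the unsigned case just gets a zero high byte.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: widen a signed 8-bit value (given as its raw byte)
 * to 16 bits. The high byte is the sign bit replicated eight times, i.e.
 * the result of an arithmetic right shift of the byte by 7 (0x00 or 0xFF). */
uint16_t widen_signed(uint8_t low)
{
    uint8_t high = (low & 0x80u) ? 0xFFu : 0x00u;
    return (uint16_t)((high << 8) | low);
}

/* Hypothetical helper: widen an unsigned 8-bit value to 16 bits.
 * The high byte is always zero, so no shift is involved. */
uint16_t widen_unsigned(uint8_t low)
{
    return (uint16_t)low;
}

/* Usage example: the two conversions differ only when bit 7 is set. */
void widen_examples(void)
{
    assert(widen_signed(0x41) == 0x0041);
    assert(widen_unsigned(0x41) == 0x0041);
    assert(widen_signed(0xC3) == 0xFFC3);
    assert(widen_unsigned(0xC3) == 0x00C3);
}
```

With the explicit `(unsigned char)` cast, only the second form is needed, which would explain why the libcall disappears.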

@jacobly0
Owner

jacobly0 commented May 2, 2020

Alright, I finally got caught up with unfixed bugs and pushed, so the sext opt I wrote a few weeks ago should solve your __bshrs issue. Note that, naturally, it is not reasonable to replace every instance of libcalls, but I can still do it if an optimization warrants it. Also, if you need help implementing a libcall just ask, I can write them from scratch pretty quickly, for example here's an implementation of __bshrs.

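__bshrs:
	inc b
	dec b		; sets Z if b is zero, leaving b unchanged
	ret z		; shift count of zero: nothing to do
	push bc		; djnz clobbers b, so save it
.loop:
	sra a		; arithmetic shift right by one bit
	djnz .loop	; repeat b times
	pop bc
	ret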
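
For checking an implementation against expected values, a C reference model of the routine above can be handy. This is only a sketch: the name `bshrs_model` is made up, and it assumes the behaviour read off the listing (value in A, shift count in B, arithmetic shift right).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference model: arithmetic shift right of the 8-bit value
 * `a` by `count` places, one bit per iteration like the djnz loop. */
uint8_t bshrs_model(uint8_t a, uint8_t count)
{
    while (count != 0) {
        /* Same effect as `sra a`: shift right, bit 7 keeps its value. */
        a = (uint8_t)((a >> 1) | (a & 0x80u));
        count--;
    }
    return a;
}

/* Usage example: shifting by 7 leaves only copies of the sign bit. */
void bshrs_examples(void)
{
    assert(bshrs_model(0x80, 7) == 0xFF);
    assert(bshrs_model(0x7F, 7) == 0x00);
    assert(bshrs_model(0x40, 0) == 0x40);
}
```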

@pawosm-arm
Author

Hi @jacobly0

> Also, if you need help implementing a libcall just ask,

I'm thinking about a more general approach that makes use of the userspace C libraries offered by the LLVM compiler suite. There are two such sub-projects in the monorepo, compiler-rt and libc, and I wonder which one is more suitable. For practical reasons I was rather looking at the libc project.

I had some hopes for the libc project: it's new, not overgrown (yet) and intended for multi-targeting, so it may be easier to extend. Namely, two targets could be added: cpm/z80 for CP/M userspace programs and none/z80 (or even one more, none/ez80) for bare-metal projects. It's not that hard on CP/M, where the whole userspace program needs to fit into the 64 kB TPA, which eliminates all the problems with FAR pointers (a problem that haunts most 8- and 16-bit platforms) and leaves only the libm part with 'magical coding' requirements. Some of the problems I see are:

  • it's laborious
  • I haven't figured out yet how to modify the libc project's CMake files so that libc is built with the clang compiler just built during the same make invocation (the Z80 cross-compiling clang itself), instead of the compiler used for building the rest of the LLVM project
  • similarly, I haven't figured out yet how to modify the CMake files to ensure static library generation even if the rest of the current LLVM build is set up to build shared libraries

Any hints are welcome.

@jacobly0
Owner

jacobly0 commented May 2, 2020

Well they all contain disjoint routines, so it's not really a choice between one or the other.

The purpose of compiler-rt is to provide compiler-specific builtins that compute operations that may not be available as an instruction on every processor, but are representable in the source language. Given the simplistic nature of the z80 instruction set, it will need quite a lot of these. Even though it has C implementations of various intrinsics, the majority of these are not going to be useful since they assume a minimal basic set of instructions which the z80 simply does not have. In the end there is just no escaping writing basic operations in assembly such as char * char (note that compiler-rt is full of specialized target-specific assembly routines in the first place anyway).

The purpose of libc on the other hand is to provide a standard set of C routines that can be used by programs to interface with the OS. These are written almost entirely in C except that the code that interfaces directly with the OS sometimes requires asm. Here the challenge is going to be rewriting said interface to work with a different OS in the first place.

And then libm is a bunch of computation-heavy floating-point based C routines, where some of the functions can be replaced by instructions on modern cpus. However, most implementations of libm don't provide basic float operations which are instead provided by hardware or compiler-rt depending on the cpu. Since the z80 would be entirely soft-float by necessity, it's certainly possible to use C implementations for everything, but still, they will depend heavily on compiler-rt assembly routines for basic operations.

@pawosm-arm
Author

Rebased to follow changes on your branch.

@pawosm-arm pawosm-arm force-pushed the z80-elf branch 2 times, most recently from bff422c to 6c9504d Compare May 6, 2020 17:45
@pawosm-arm
Author

Updated to follow recent API changes.

@pawosm-arm
Author

> Also, if you need help implementing a libcall just ask, I can write them from scratch pretty quickly

Yeah, I'm finding more of those symbols, namely:

         U __frameset
         U __frameset0
         U __indcallhl
         U __sand
         U __setflag
         U __lshl
         U __lshru
         U __lsub
         U __sshl
         U __sshrs
         U __sshru
         U __smulu
         U __sxor
         U __lmulu
         U __ladd
         U __lcmpu
         U __ldivs
         U __sdivs
         U __sdivsu
         U __srems

The floats are currently outside my scope, although I'm encountering them too:
__fadd, __fdiv, __fmul, __fsub, __fcmp, __fneg

@pawosm-arm
Author

@jacobly0, from this file https://github.com/c4ooo/TI84-CE-Wrapper-for-Monochrome-TI-BASIC-Programs./blob/master/ti84pce.inc I can see where those builtin function names come from. Can I assume that you have access to the sources of those functions? I need a bit of your help, as I'm stuck with my hobby project without them. Right now it's __smulu, but who knows what it's going to be tomorrow...

@adriweb

adriweb commented May 11, 2020 via email

@jacobly0
Owner

jacobly0 commented May 11, 2020

The source for all of the zds routines is in the previous Zilog toolchain release at src/rtl/common, but they are written for the ez80, so they won't be very useful, certainly not the multiplication routines. You can see copies of the comments that explain what each routine does here. Here's a z80 __smulu taken from elsewhere:

__smulu:
	push	af
	push	bc
	push	de
	ld	e,c
	ld	d,b
	call	.mul
	pop	de
	pop	bc
	pop	af
	ret
.mul:
	xor	a
	cp	h
	jr	z,.swap
	ex	de,hl
.swap:
	ld	c,l
	ld	l,a
	add	a,h
	call	nz,.byte
	ld	a,c
.byte:
	ld	b,8
.next:
	add	hl,hl
	add	a,a
	jr	nc,.skip
	add	hl,de
.skip:
	djnz	.next
	ret
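
The inner loop of the routine above is a classic shift-and-add multiplication that consumes the multiplier from its most significant bit. A C model of that technique might look like the sketch below. The name `smulu_model` is hypothetical, and the sketch models only the arithmetic (a 16 x 16 bit product truncated to 16 bits), not the register usage or the byte-skipping shortcut of the assembly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: 16-bit by 16-bit multiply, low 16 bits of the result,
 * computed MSB-first by shift-and-add. */
uint16_t smulu_model(uint16_t x, uint16_t y)
{
    uint16_t acc = 0;
    for (int bit = 0; bit < 16; bit++) {
        acc = (uint16_t)(acc << 1);       /* like `add hl,hl` */
        if (x & 0x8000u)                  /* top multiplier bit shifted out */
            acc = (uint16_t)(acc + y);    /* like `add hl,de` */
        x = (uint16_t)(x << 1);
    }
    return acc;
}

/* Usage example: results wrap modulo 65536, as a 16-bit result must. */
void smulu_examples(void)
{
    assert(smulu_model(3, 5) == 15);
    assert(smulu_model(300, 200) == 60000);
    assert(smulu_model(1000, 1000) == 16960);   /* 1000000 mod 65536 */
}
```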

@pawosm-arm
Author

Wow, that's a great response, thanks a million :)

@pawosm-arm
Author

pawosm-arm commented May 14, 2020

I observed something odd today while trying to use stdint.h's uint32_t type: sizeof(uint32_t) is... 2 (while it's 4 with sdcc, as everywhere else...).
Fortunately, sizeof(unsigned long) is 4.

@jacobly0
Owner

jacobly0 commented May 14, 2020

Then you are using a stdint.h that isn't valid for the z80. The one from glibc definitely won't work. The one packaged with clang would work if you don't have another invalid stdint.h in the include search path. I use a much more simplified version here.

@pawosm-arm
Author

You were right: clang's stdint.h was pulling in the host's stdint.h via #include_next unless the -U__STDC_HOSTED__ flag is passed. This solves the problem with the sizes of the big integer types, thx!
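
A cheap way to catch this kind of header mix-up early is a compile-time check on the fixed-width types. The following is a sketch only; the function name `stdint_widths_ok` is invented for illustration, and the check assumes a C11 compiler so that `static_assert` is available from `<assert.h>`.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Fail the build if a fixed-width type has the wrong width, which is what
 * a stdint.h written for a different target would cause. */
static_assert(sizeof(uint8_t)  * CHAR_BIT == 8,  "uint8_t must be 8 bits");
static_assert(sizeof(uint16_t) * CHAR_BIT == 16, "uint16_t must be 16 bits");
static_assert(sizeof(uint32_t) * CHAR_BIT == 32, "uint32_t must be 32 bits");

/* Hypothetical run-time variant of the same check, usable from a test. */
int stdint_widths_ok(void)
{
    return sizeof(uint16_t) * CHAR_BIT == 16 &&
           sizeof(uint32_t) * CHAR_BIT == 32;
}
```

With the wrong stdint.h on the include path, the third assertion would stop the build instead of letting a 2-byte uint32_t slip through.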

@pawosm-arm
Author

Hi @jacobly0, can I ask for your implementation of __lcmpu?

@jacobly0
Owner

That's a surprisingly tough one, let me give it a shot...

; speed optimized
__lcmpu:
	or a
	sbc hl,bc
	add hl,bc
	push de
	push bc
	push iy
	pop bc
	ex de,hl
	jr z,.maybeEqual
	sbc hl,bc
	ex de,hl
	pop bc
	jr nz,.notEqual
	jp pe,.overflow
	inc d
.notEqual:
	pop de
	ret
.overflow:
	ld d,$80
	dec d
	pop de
	ret
.maybeEqual:
	sbc hl,bc
	ex de,hl
	pop bc
	pop de
	ret

; size optimized
__lcmpu:
	or a
	sbc hl,bc
	add hl,bc
	push de
	push bc
	push iy
	pop bc
	ex de,hl
	jr z,.maybeEqual
	sbc hl,bc
	push af
	pop hl
	res 6,l
	push hl
	pop af
	db $21
.maybeEqual:
	sbc hl,bc
	ex de,hl
	pop bc
	pop de
	ret

(All assuming you want to avoid index half reg instructions at least)
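
As a cross-check for either variant, the comparison can be modelled in C as two chained 16-bit subtractions with borrow, low word first. This is a sketch under assumptions: the names `cmp_flags`, `sbc16` and `lcmpu_model` are invented, and only the zero and carry results of an unsigned compare are modelled, not the sign or overflow flags that the assembly also adjusts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag pair: zero = operands equal, carry = lhs below rhs. */
struct cmp_flags { bool zero; bool carry; };

/* 16-bit subtract with borrow in and out, like `sbc hl,bc`. */
static uint16_t sbc16(uint16_t a, uint16_t b, bool *borrow)
{
    uint32_t wide = (uint32_t)a - b - (*borrow ? 1u : 0u);
    *borrow = (wide >> 16) != 0;     /* nonzero upper half means it wrapped */
    return (uint16_t)wide;
}

/* Hypothetical model of an unsigned 32-bit compare built from 16-bit steps. */
struct cmp_flags lcmpu_model(uint32_t lhs, uint32_t rhs)
{
    bool borrow = false;             /* `or a` clears the carry first */
    uint16_t lo = sbc16((uint16_t)lhs, (uint16_t)rhs, &borrow);
    uint16_t hi = sbc16((uint16_t)(lhs >> 16), (uint16_t)(rhs >> 16), &borrow);
    struct cmp_flags f = { .zero = (lo == 0 && hi == 0), .carry = borrow };
    return f;
}

/* Usage example: equal, below and above. */
void lcmpu_examples(void)
{
    assert(lcmpu_model(5, 5).zero && !lcmpu_model(5, 5).carry);
    assert(lcmpu_model(1, 2).carry && !lcmpu_model(1, 2).zero);
    assert(!lcmpu_model(2, 1).carry && !lcmpu_model(2, 1).zero);
}
```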

@pawosm-arm
Author

It works, thanks!

@pawosm-arm
Author

Hi @jacobly0, can I look at the __setflag implementation too?

@jacobly0
Owner

jacobly0 commented May 27, 2020

__setflag:
	ret po
	push af
	dec sp
	pop af
	xor $80
	push af
	inc sp
	pop af
	ret
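
As far as the listing shows, the routine returns immediately when P/V is clear and otherwise flips bit 7 of the flag register, the sign flag. That matches the usual rule that, after a subtraction, a signed "less than" is S XOR V. The C sketch below illustrates the rule for 8-bit operands; `signed_less_than` is a hypothetical name and the code is a model of the flag arithmetic, not of the routine's calling convention.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: signed a < b, derived from the flags of a - b. */
bool signed_less_than(int8_t a, int8_t b)
{
    uint8_t ua = (uint8_t)a, ub = (uint8_t)b;
    uint8_t diff = (uint8_t)(ua - ub);
    bool sign = (diff & 0x80u) != 0;                            /* S flag */
    /* Signed overflow: operand signs differ and the result's sign
     * differs from the sign of a. */
    bool overflow = (((ua ^ ub) & (ua ^ diff)) & 0x80u) != 0;   /* P/V flag */
    return sign != overflow;          /* flip S when overflow occurred */
}

/* Usage example: the overflow cases are the ones a bare sign test gets wrong. */
void signed_less_than_examples(void)
{
    assert(signed_less_than(-128, 127));
    assert(!signed_less_than(127, -128));
    assert(signed_less_than(1, 2));
    assert(!signed_less_than(2, 1));
}
```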

@pawosm-arm
Author

Wow, it seems that __lcmpzero (called before __setflag) needs to leave the P/V flag set properly. Can I have its code too?

@jacobly0
Owner

jacobly0 commented May 27, 2020

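; speed optimized
__lcmpzero:
	push bc
	ld c,a		; save a
	ld a,l
	or h
	or e
	or d		; a = OR of all four bytes, zero only if the value is zero
	jr z,.zero
	ld a,d
	or 1		; nonzero, with the sign bit of the top byte
.zero:
	cp 0		; set the flags as for a comparison with zero
	ld a,c		; restore a (ld leaves the flags alone)
	pop bc
	ret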

; size optimized (and basically what zds does)
__lcmpzero:
	push bc
	ld bc,0
	push bc
	ex (sp),iy
	call __lcmpu
	pop iy
	pop bc
	ret

@pawosm-arm
Author

Works lovely, thanks!

@pawosm-arm
Author

pawosm-arm commented Jun 2, 2020

False alarm, I've found a bug in the rest of the code.

@pawosm-arm
Author

pawosm-arm commented Jun 3, 2020

Argh, it generated a call to __sshl for a simple array access. I also need __sdivs to complete this function; @jacobly0, can I see your implementations of both?
(I meant __sshl, not __smulu, which is already there, my mistake.)
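
Why an array access would need a shift at all: indexing scales the index by the element size, and for 2-byte elements that scaling is a shift left by one. The C sketch below shows the equivalence. The names `element_offset` and `load_indexed` are hypothetical, and reading __sshl as a 16-bit shift-left helper is an assumption based on its name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: byte offset of element `index` in an array of
 * 2-byte elements, i.e. index * 2 computed as a shift. */
size_t element_offset(size_t index)
{
    return index << 1;
}

/* Hypothetical helper: read element `index` through an explicit byte
 * offset, which is what `base[index]` boils down to. */
int16_t load_indexed(const int16_t *base, size_t index)
{
    int16_t value;
    memcpy(&value, (const unsigned char *)base + element_offset(index),
           sizeof value);
    return value;
}

/* Usage example: the explicit form agrees with ordinary indexing. */
void load_indexed_examples(void)
{
    const int16_t samples[4] = { 10, -20, 30, -40 };
    assert(load_indexed(samples, 0) == samples[0]);
    assert(load_indexed(samples, 3) == samples[3]);
    assert(element_offset(3) == 3 * sizeof(int16_t));
}
```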

@jacobly0
Owner

Can't merge changes to the asm output without a way to select between asm flavors.

@pawosm-arm
Author

Hey @jacobly0, this was 2 years ago, so I guess this PR needs massive rework, at least to resolve merge conflicts and to make it compatible with the LLVM API changes that occurred during that time.

AFAIR my solution does not output any assembly; it creates ELF binaries directly. The resulting files are reported as ELF LSB relocatable, *unknown arch 0xdc* version 1 (SYSV), not stripped, and to get such binaries the compiler is started with these flags: --target=z80-none-elf -march=z80 -fintegrated-as -fcommon -fno-builtin -U__STDC_HOSTED__. So I don't know which changes to the asm output you mean that need a way to select between asm flavors...
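
Since generic tools report the machine only as "unknown arch 0xdc", a few lines of C are enough to confirm that an object file carries that machine value. This is a minimal sketch: `is_z80_elf` and `EM_Z80_VALUE` are hypothetical names, the value 0xdc is taken from the output quoted above, and the field offsets are those of the standard ELF header (e_ident in bytes 0 to 15, e_machine in bytes 18 and 19).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EM_Z80_VALUE 0xDCu   /* the "unknown arch 0xdc" from the output above */

/* Hypothetical helper: returns 1 if `buf` starts with a little-endian ELF
 * header whose e_machine field is 0xdc, 0 otherwise. */
int is_z80_elf(const unsigned char *buf, size_t len)
{
    if (len < 20)
        return 0;                    /* e_machine ends at offset 19 */
    if (buf[0] != 0x7F || buf[1] != 'E' || buf[2] != 'L' || buf[3] != 'F')
        return 0;                    /* ELF magic number */
    if (buf[5] != 1)
        return 0;                    /* EI_DATA: 1 means LSB (little-endian) */
    uint16_t machine = (uint16_t)(buf[18] | (buf[19] << 8));
    return machine == EM_Z80_VALUE;
}

/* Usage example with a hand-built 20-byte header prefix. */
void is_z80_elf_examples(void)
{
    const unsigned char header[20] = {
        0x7F, 'E', 'L', 'F', 1, 1, 1, 0,   /* magic, class, data, version */
        0, 0, 0, 0, 0, 0, 0, 0,            /* rest of e_ident */
        1, 0,                              /* e_type: relocatable */
        0xDC, 0                            /* e_machine */
    };
    assert(is_z80_elf(header, sizeof header) == 1);
    assert(is_z80_elf(header, 10) == 0);   /* too short to hold e_machine */
}
```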

@jacobly0
Owner

jacobly0 commented Jan 31, 2022

I'm referring to this: file-local labels that are unmarked or that start with a dot are not supported by the assembler/linker I'm using.

Interestingly, my assembler/linker does support outputting ELF files now and those files are correctly dumped by llvm object inspection tools compiled from the z80 branch.

@pawosm-arm
Author

Sadly, I don't remember now why I had to make that change, so I'd have to track it down before finding a way to make it correct :(

jacobly0 pushed a commit that referenced this pull request Feb 8, 2022
We experienced some deadlocks when we used multiple threads for logging
using `scan-builds` intercept-build tool when we used multiple threads by
e.g. logging `make -j16`

```
(gdb) bt
#0  0x00007f2bb3aff110 in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007f2bb3af70a3 in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007f2bb3d152e4 in ?? ()
#3  0x00007ffcc5f0cc80 in ?? ()
#4  0x00007f2bb3d2bf5b in ?? () from /lib64/ld-linux-x86-64.so.2
#5  0x00007f2bb3b5da27 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#6  0x00007f2bb3b5dbe0 in exit () from /lib/x86_64-linux-gnu/libc.so.6
#7  0x00007f2bb3d144ee in ?? ()
#8  0x746e692f706d742f in ?? ()
#9  0x692d747065637265 in ?? ()
#10 0x2f653631326b3034 in ?? ()
#11 0x646d632e35353532 in ?? ()
#12 0x0000000000000000 in ?? ()
```

I think the gcc's exit call caused the injected `libear.so` to be unloaded
by the `ld`, which in turn called the `void on_unload() __attribute__((destructor))`.
That tried to acquire an already locked mutex which was left locked in the
`bear_report_call()` call, that probably encountered some error and
returned early when it forgot to unlock the mutex.

All of these are speculation since from the backtrace I could not verify
if frames 2 and 3 are in fact corresponding to the `libear.so` module.
But I think it's a fairly safe bet.

So, hereby I'm releasing the held mutex on *all paths*, even if some failure
happens.

PS: I would use lock_guards, but it's C.

Reviewed-by: NoQ

Differential Revision: https://reviews.llvm.org/D118439
nebulatgs pushed a commit to nebulatgs/llvm-project that referenced this pull request Mar 23, 2022
(cherry picked from commit d919d02)
@jacobly0 jacobly0 force-pushed the z80 branch 3 times, most recently from 67095ed to 03a92e5 Compare May 28, 2022 08:45
@jacobly0 jacobly0 force-pushed the z80 branch 3 times, most recently from a26904a to 142c6ae Compare June 4, 2022 12:01
3 participants