Feature request: assembling to object files and linking #48
Linking would be pretty incredible to have. Building up larger projects via includes gets tedious pretty quickly. I'm assuming it wouldn't be possible to generate ELF files and link them using standard tools?
I have been thinking about this feature for several days, and I may have found something elegant. Since there are several object formats (COFF, ELF, ...), I think it would be great to have a dedicated block type for describing the target object format.

For example, in ELF the header's identification bytes must specify the word size (32/64 bits) as well as the ABI. To do this I thought of the following syntax:

```
#obj elf {
    bits = 32       ; maybe guessed from #bits?
    abi = linux     ; literal or integer (linux = 3)
    machine = x86
}
```

This way, the assembler can check whether a CPU definition is valid for assembling into a given object format.
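For reference, here's a minimal Rust sketch of how the proposed `bits`, `abi` and `machine` fields could map onto an ELF header. The names (`Bits`, `Endian`, `elf_ident`, `parse_osabi`, `parse_machine`) are hypothetical, and the `Endian` parameter is an extra assumption that the proposed block doesn't have; the numeric values (`ELFCLASS32` = 1, `ELFOSABI_LINUX` = 3, `EM_386` = 3) are the standard ELF constants. Note that `machine` ends up in the separate `e_machine` header field, not in the identification bytes.

```rust
/// Word size from the proposed `bits = ...` field.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Bits {
    B32,
    B64,
}

/// Byte order (an extra assumption; the proposed block has no such field).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Endian {
    Little,
    Big,
}

/// Build the 16-byte `e_ident` array that opens every ELF header.
pub fn elf_ident(bits: Bits, endian: Endian, osabi: u8) -> [u8; 16] {
    let mut ident = [0u8; 16];
    ident[0..4].copy_from_slice(&[0x7f, b'E', b'L', b'F']); // EI_MAG0..EI_MAG3
    ident[4] = match bits {
        Bits::B32 => 1, // ELFCLASS32
        Bits::B64 => 2, // ELFCLASS64
    };
    ident[5] = match endian {
        Endian::Little => 1, // ELFDATA2LSB
        Endian::Big => 2,    // ELFDATA2MSB
    };
    ident[6] = 1; // EI_VERSION = EV_CURRENT
    ident[7] = osabi; // EI_OSABI
    // EI_ABIVERSION (byte 8) and the padding (bytes 9..16) stay zero.
    ident
}

/// Resolve `abi = ...`: a known name or a raw integer.
pub fn parse_osabi(s: &str) -> Option<u8> {
    match s {
        "sysv" | "none" => Some(0), // ELFOSABI_NONE
        "linux" => Some(3),         // ELFOSABI_LINUX
        _ => s.parse().ok(),
    }
}

/// Resolve `machine = ...` to the header's separate `e_machine` field.
pub fn parse_machine(s: &str) -> Option<u16> {
    match s {
        "x86" => Some(3),     // EM_386
        "x86_64" => Some(62), // EM_X86_64
        _ => s.parse().ok(),
    }
}
```

Accepting either a name or a raw integer matches the "literal or integer" comment in the proposal, so unusual ABIs or machine numbers don't need special support in the assembler.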
The problem with existing formats is that they don't really support custom architectures.
That's why I think it would be interesting to use a new
Ohh, that project is so cool! It would be nice to have a list of all projects using the assembler.

About linking, I think I'd need more input from you guys about the advantages of using one, and the current annoyances of working without one. @aslak3, what's your current project structure and build pipeline, and what would they ideally look like? Would you use a Makefile or something else?

There are also some considerations about the structure of the .asm files. For example, I'd rather have something akin to the Rust compiler, which reads and considers the entire project as a whole, than a C-like compiler, where files must textually include every dependency in a certain order and you end up having to split up declarations and definitions. But then again, that might not be the best architecture for an assembler, and I don't have enough data at hand to make an informed decision...
An assembler pretty much has to follow the traditional object-file-and-linker model. As nice as a whole-project system is, it's not viable here.
For a C toolchain to target customasm, you need to be able to assemble object files that are later linked together into binaries. That's just the way C works, and going against the grain here makes it really hard to port existing software.

An object file needs to be able to hold multiple "segments" (which are pretty much banks with automatic sizing), plus a header that specifies the size of each segment. A segment holds one of:

- code,
- read-only constants (usually a series of strings),
- initialized data (usually global variables),
- uninitialized data (which usually isn't stored in the object file at all; only its address and size are specified in the header).

Another important feature of object files is importing symbols from other object files. For that you'd have a relocation table that specifies where in the segments each imported symbol is referenced, so that the linker can go in and patch those locations with the symbol's actual address. When the address is encoded into an instruction, the table should also record which rule produced the instruction, so the linker knows which bits are address bits that it needs to change. And of course you need to list all the exported symbols with their address/segment info, so that the linker knows which address to insert into the imports of dependent objects.

There are standard object file formats. A 32-bit ELF file, for example, can work with an 8-bit, 16-bit or 24-bit CPU without problems; it just means it uses 32-bit values for addresses internally. It doesn't mean the CPU architecture needs to use all those bits. ELF is well documented and there are libraries for working with it in most languages. I'm not sure it supports custom relocation rules, though, but that's okay: a separate file could be produced with the rules.

With the above, a linker could be made. If the linker also understands the rules, it could even be generic enough to work with any customasm CPU definition.
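A rough Rust sketch of the object-file model described above, plus the three linker passes: lay out the segments, build the symbol table from the exports, and patch the relocation sites. All names here are hypothetical. As a simplifying assumption, each relocation describes its address bits as a plain bit field (`shift`/`width` inside a little-endian word) instead of referencing the rule that produced the instruction; a real customasm linker would derive those bits from the rule.

```rust
use std::collections::HashMap;

/// Segment kinds from the description above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SegmentKind {
    Code,
    ReadOnly,
    Data,
    Bss, // uninitialized: only `size` is recorded, `bytes` stays empty
}

pub struct Segment {
    pub kind: SegmentKind,
    pub bytes: Vec<u8>,
    pub size: u32,
}

/// An exported symbol: segment index plus offset inside that segment.
pub struct Export {
    pub segment: usize,
    pub offset: u32,
}

/// Patch `width` bits, starting at bit `shift`, of the little-endian word of
/// `word_bytes` bytes at `offset` in `segment` with the address of `symbol`.
pub struct Reloc {
    pub segment: usize,
    pub offset: u32,
    pub word_bytes: u32, // 1..=8
    pub shift: u32,
    pub width: u32, // 1..=32, with shift + width <= 8 * word_bytes
    pub symbol: String,
}

pub struct Object {
    pub segments: Vec<Segment>,
    pub exports: HashMap<String, Export>,
    pub relocs: Vec<Reloc>,
}

/// Overwrite only the address bits of one relocation site.
fn patch(bytes: &mut [u8], r: &Reloc, value: u32) -> Result<(), String> {
    let start = r.offset as usize;
    let site = bytes
        .get_mut(start..start + r.word_bytes as usize)
        .ok_or("relocation site out of range")?;
    if r.width < 32 && value >> r.width != 0 {
        return Err(format!("`{}` does not fit in {} bits", r.symbol, r.width));
    }
    let mut word = 0u64;
    for (i, b) in site.iter().enumerate() {
        word |= (*b as u64) << (8 * i);
    }
    let mask = ((1u64 << r.width) - 1) << r.shift;
    word = (word & !mask) | (((value as u64) << r.shift) & mask);
    for (i, b) in site.iter_mut().enumerate() {
        *b = (word >> (8 * i)) as u8;
    }
    Ok(())
}

/// Lay segments out grouped by kind starting at `base`, resolve exports to
/// absolute addresses, then patch every relocation site.
pub fn link(objects: &mut [Object], base: u32) -> Result<HashMap<String, u32>, String> {
    use SegmentKind::*;
    let mut seg_addr = HashMap::new(); // (object index, segment index) -> address
    let mut addr = base;
    for kind in [Code, ReadOnly, Data, Bss] {
        for (oi, obj) in objects.iter().enumerate() {
            for (si, seg) in obj.segments.iter().enumerate().filter(|(_, s)| s.kind == kind) {
                seg_addr.insert((oi, si), addr);
                addr += seg.size;
            }
        }
    }
    let mut symbols = HashMap::new();
    for (oi, obj) in objects.iter().enumerate() {
        for (name, e) in &obj.exports {
            let seg = seg_addr.get(&(oi, e.segment)).ok_or("export in unknown segment")?;
            if symbols.insert(name.clone(), seg + e.offset).is_some() {
                return Err(format!("duplicate symbol `{name}`"));
            }
        }
    }
    for obj in objects.iter_mut() {
        for r in &obj.relocs {
            let value = *symbols
                .get(&r.symbol)
                .ok_or_else(|| format!("undefined symbol `{}`", r.symbol))?;
            let seg = obj.segments.get_mut(r.segment).ok_or("relocation in unknown segment")?;
            patch(&mut seg.bytes, r, value)?;
        }
    }
    Ok(symbols)
}
```

Putting the bit-field description into the relocation itself is what lets a generic linker patch instructions without knowing the CPU. Encodings where the address bits are split across fields or transformed (e.g. PC-relative) would need the rule reference instead.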
That would be pretty awesome, actually.
There's possibly another solution: make customasm assembly files the "object" files, and treat customasm as the linker instead. I think there might be issues with imported/exported symbols and file-local "static" symbols colliding between files if you just concatenate all the assembly files and run customasm on the result. There might be a way to use scoped labels to get around that: each "object" assembly file could have a top-level label that all its other labels are scoped under, with only exported/imported labels left unprefixed. If that doesn't work, it might be fairly easy to make customasm support the right semantics.

It would still be nice to have dynamically sized banks, though, so that customasm can produce "executables" rather than just ROM images. I think the minimum feature for this would be the ability to generate a header "bank" with the sizes of the other banks specified as #d values, plus the ability to make a bank's size "dynamic" and assign a label to that size for use in the header bank. But maybe there are better solutions.

The final piece is probably just wrapping customasm in a script that adapts the command-line parameters a linker would usually take into the ones customasm expects. That would be easy enough. I haven't tried it, but that might allow a C program to be ported to a CPU that customasm can support without messing with the program's build system too much, provided of course that there's a C compiler that can produce the assembly "object" files. LCC, for example, is more than capable of producing customasm assembly files, and there's a book that explains how to retarget it.
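A minimal sketch of the "prefix the file-local labels" idea in Rust, using name mangling (`file1__loop`) instead of scoped labels. It assumes a global label definition is a line that starts with `name:`, leaves any word right after a `.` alone (assumed to be a customasm-style local label), and is purely lexical, so label names that appear inside strings or comments would also be renamed. The function names are hypothetical.

```rust
use std::collections::HashSet;

/// Characters assumed to be valid in a label name.
fn is_ident_char(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '_'
}

/// Global labels defined in `src`, assuming each definition is a line
/// that starts with `name:` (local `.name:` labels are skipped).
fn defined_labels(src: &str) -> HashSet<String> {
    src.lines()
        .filter_map(|line| {
            let (name, _) = line.split_once(':')?;
            let valid = !name.is_empty()
                && !name.starts_with(|c: char| c.is_ascii_digit())
                && name.chars().all(is_ident_char);
            valid.then(|| name.to_string())
        })
        .collect()
}

/// Rename labels defined in `src` but not listed in `exports` to
/// `{prefix}__{name}`, so private labels from different files can't collide
/// once the files are concatenated. Purely lexical: words inside strings and
/// comments are renamed too, and words right after `.` are left alone.
pub fn mangle_private(src: &str, prefix: &str, exports: &HashSet<&str>) -> String {
    let private: HashSet<String> = defined_labels(src)
        .into_iter()
        .filter(|n| !exports.contains(n.as_str()))
        .collect();
    let flush = |word: &mut String, out: &mut String, after_dot: bool| {
        if !after_dot && private.contains(word.as_str()) {
            out.push_str(prefix);
            out.push_str("__");
        }
        out.push_str(word.as_str());
        word.clear();
    };
    let (mut out, mut word) = (String::with_capacity(src.len()), String::new());
    let (mut prev, mut after_dot) = ('\n', false);
    for c in src.chars() {
        if is_ident_char(c) {
            if word.is_empty() {
                after_dot = prev == '.';
            }
            word.push(c);
        } else {
            flush(&mut word, &mut out, after_dot);
            out.push(c);
        }
        prev = c;
    }
    flush(&mut word, &mut out, after_dot);
    out
}
```

Each file would be run through this with its own prefix before concatenation; exported and imported names stay unprefixed, so cross-file references still resolve.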
Resurrecting this old issue. I'm using customasm for a custom CPU I made with extremely little RAM and a primitive MMU. That means I need to deal with overlays, and that means I need a linker to manage the code layout, compute the overlay tables, include those in the output, and potentially work out which symbols should go in which overlay blocks. At this point I have three options:
To get a sense of how difficult each would be, I started on all three as a way to measure the level of effort required. Here's what I found:
I think the best approach is to have an opt-in system of separate ruleset(s) for instructions whose encoding needs to be delayed until link time (I'm going to call these linker-rulesets). When assembling to an executable, all linker-rulesets are ignored and no special behavior happens. When assembling to an object file:
The following details are left to end-users:
My main question is whether @hlorenzi would be interested in the above as a PR, or whether it should stay a private fork. This feature would add a significant amount of code. If there's no interest in it as a PR, I wouldn't implement the parts I'm not using myself; or I may just write my own purpose-built assembler, I don't know for sure yet.
After some more coding, this design is much simpler:
Nice-to-have features that I may add:
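On the overlay side (working out which symbols go in which overlay block), one simple approach is first-fit-decreasing bin packing. Here's a hypothetical Rust sketch; it assumes every routine has a known size and can be placed at any offset, and it ignores calls between routines in different blocks, which a real overlay manager would have to account for.

```rust
use std::collections::HashMap;

/// Where a routine ended up: overlay block index and byte offset inside it.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Placement {
    pub block: usize,
    pub offset: u32,
}

/// Pack `(name, size)` routines into overlay blocks of `block_size` bytes
/// using first-fit decreasing. Returns each routine's placement and the
/// number of bytes used in every block.
pub fn assign_overlays(
    routines: &[(&str, u32)],
    block_size: u32,
) -> Result<(HashMap<String, Placement>, Vec<u32>), String> {
    let mut sorted = routines.to_vec();
    // Largest first; ties broken by name so the layout is deterministic.
    sorted.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(b.0)));
    let mut used: Vec<u32> = Vec::new();
    let mut table = HashMap::new();
    for &(name, size) in &sorted {
        if size > block_size {
            return Err(format!("`{name}` ({size} bytes) exceeds the block size {block_size}"));
        }
        // First block with enough room left, or a fresh block.
        let block = match used.iter().position(|&u| u + size <= block_size) {
            Some(i) => i,
            None => {
                used.push(0);
                used.len() - 1
            }
        };
        table.insert(name.to_string(), Placement { block, offset: used[block] });
        used[block] += size;
    }
    Ok((table, used))
}
```

The returned table is essentially the overlay table: for each symbol it gives the block to map in and the offset to jump to, which a linker could then emit into the output.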
@Phlosioneer I'd like to discuss this further to make sure I understand all the nuances! Would you be available on Discord? You can find an invite to my server in the README.
It would be interesting to support assembling to object files and linking. The SDCC code could be a good starting point.
I used customasm in my little project. I'm able to use 128-bit constants!
https://github.com/physnoct/softmicro