I'm cruising along with the linker. But, I need some support from the compiler on this.
My goal here is to have an assembly-level linker that works with the llvm-dcpu16 output but is also friendly for programming directly in assembly. In particular, it should allow parts of a program to be defined in C and other parts in assembly.
I'd like to have just two sections: ".data" and ".code". I know that ".text" is conventional, but I think it's less confusing for neophytes to call it "code". The two sections can be interleaved freely. But, the linker may relocate and consolidate them; or it might not.
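To make "relocate and consolidate" concrete, here is a rough Python sketch of one way interleaved sections could be gathered into contiguous blocks. It is not nos_link's actual code: the function name `consolidate_sections` is hypothetical, and it assumes that lines before any section directive belong to the first section and that sections are laid out in the order given.

```python
def consolidate_sections(lines, sections=(".code", ".data")):
    """Group interleaved section bodies so each section is contiguous."""
    buckets = {name: [] for name in sections}
    current = sections[0]  # assumption: the first section is the default
    for line in lines:
        stripped = line.strip()
        if stripped in buckets:
            current = stripped  # a section directive switches the bucket
            continue
        if stripped:
            buckets[current].append(stripped)
    out = []
    for name in sections:  # assumption: emit sections in the given order
        if buckets[name]:
            out.append(name)
            out.extend(buckets[name])
    return out


source = [
    ".code",
    ":main",
    "set a, [counter]",
    ".data",
    ":counter",
    ".word 0x0001",
    ".code",
    "add a, 1",
]
print("\n".join(consolidate_sections(source)))
```

The section names are a parameter, so the same sketch works whatever the sections end up being called.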
Please make the compiler stop emitting all of the .gsuper_personality__not_too_much_foam$.2s2 section headers.
I'm depending on all labels starting with a colon. [I am considering allowing them to also end with a colon instead, but for right now that is not the case.]
I'm depending on local labels starting with a dot and being directly attached to the last global label preceding them. I think this is how things are, but I just wanted to let you know that I'm relying on that. Local labels are mangled with the following convention: "last_global_label$$.local_label". I'm also mangling local symbol references the same way, so
set pc, .local_label
set x, 0x42
becomes
set pc, global_label$$.local_label
set x, 0x42
This means that you may not reference a local symbol outside of the global scope it belongs to. Again, I think this is in spec. If you need an escape from that (setjmp/longjmp, maybe?), I'll eventually allow you to fully specify the post-mangling name, since the mangled names are easy enough to guess that it could be useful.
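For anyone who wants to see the mangling rule spelled out, here is a small Python sketch of it. This is an illustration under assumptions, not the linker's real implementation: the names `mangle_locals`, `mangle`, and `LOCAL_REF` are made up, local label definitions are assumed to be written as ":.name", and only the operand part of a line is rewritten (string literals containing dots are not handled).

```python
import re

# Assumption: a local reference is a dot-prefixed name that is not glued to
# a preceding label character (so "foo.bar" and "1.5" are left alone).
LOCAL_REF = re.compile(r"(?<![A-Za-z0-9._$])\.[A-Za-z0-9._]+")


def mangle(scope, local_name):
    """Build "last_global_label$$.local_label" for the current scope."""
    if scope is None:
        raise ValueError("local label used before any global label: " + local_name)
    return scope + "$$" + local_name


def mangle_locals(lines):
    scope = None  # name of the last global label seen
    out = []
    for line in lines:
        text = line.strip()
        if text.startswith(":"):
            name = text[1:]
            if name.startswith("."):
                text = ":" + mangle(scope, name)  # local label definition
            else:
                scope = name  # a global label opens a new scope
        else:
            parts = text.split(None, 1)  # mnemonic, then operands
            if len(parts) == 2:
                parts[1] = LOCAL_REF.sub(
                    lambda m: mangle(scope, m.group(0)), parts[1]
                )
            text = " ".join(parts)
        out.append(text)
    return out


source = [
    ":global_label",
    "set pc, .local_label",
    ":.local_label",
    "set x, 0x42",
]
print("\n".join(mangle_locals(source)))
```

Because the scope is just "the last global label seen", the same local name under two different global labels mangles to two different symbols, which is why a local symbol cannot be reached from outside its scope without writing the mangled name by hand.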
I'm currently accepting labels matching "[a-zA-Z0-9._]+". That's all letters, all digits, the underscore, and the dot, in any combination of any length.
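As a quick check of what that pattern does and does not accept, here is a minimal Python sketch. The helper name `is_valid_label` is hypothetical; the pattern is the one quoted above, applied to the whole name.

```python
import re

# The label pattern quoted above, matched against the whole name.
LABEL = re.compile(r"[a-zA-Z0-9._]+")


def is_valid_label(name):
    return LABEL.fullmatch(name) is not None


for candidate in ["main", ".loop_1", "9lives", "bad-name", "a$$.b", ""]:
    print(repr(candidate), is_valid_label(candidate))
```

Two things fall out of the pattern as written: a label may start with a digit, and "$" is not an accepted character, so a hand-written label can never collide with a mangled "global$$.local" name.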
Right now labels are NOT case sensitive, because I downcase the entire source during normalization. However, I recognize that this violates the C standard flagrantly, so I'll be fixing that shortly.
I know y'all are waiting for Notch to update the spec, hoping that he'll give us 8-bit ops and signed arithmetic. But I seriously doubt it'll stop being a 16-bit computer, so I'm defining at least one type so I can get data entry working.
The native, unsigned 16-bit DCPU machine word will be called ".word" with the alias ".uint16_t".
Until I hear that the DCPU spec has been bumped and we have signed arithmetic and 8-bit ops, I am not going to support the K&R-style names for the types. If there is no distinction between ".char", ".uchar", ".short", and ".int", I can't see the multitude of tags as anything but confusing. I want a relatively new programmer to be able to trace his way through the entire process from C source to x10b binary, and suddenly having the compiler generating 25 different type tags all with identical size and behavior is going to make the tracing that much harder.
And I don't know what the hell I'm doing with strings. I mean, I can do just plain string literals and assemble it into a run of .word.
.string "this is some text"
But that doesn't leave any syntax for defining color controls on those strings.
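Here is a rough Python sketch of the plain-literal case, plus one possible shape for attributes. All of it is assumption: one ASCII character per 16-bit word, no terminator, and a hypothetical `attr` value that is ORed into the bits above the 7-bit character code. The names `string_to_words` and `emit_words` are made up for illustration.

```python
def string_to_words(text, attr=0):
    """Turn a string literal into a list of 16-bit words, one per character."""
    if attr & 0x7F or attr > 0xFFFF or attr < 0:
        raise ValueError("attr must fit in the upper bits of a 16-bit word")
    words = []
    for ch in text:
        code = ord(ch)
        if code > 0x7F:
            raise ValueError("non-ASCII character: %r" % ch)
        words.append(attr | code)  # hypothetical: attributes in the high bits
    return words


def emit_words(words):
    """Render the words as .word directives."""
    return [".word 0x%04x" % w for w in words]


print("\n".join(emit_words(string_to_words("hi"))))
```

This only shows a single attribute for a whole literal; it does not answer the syntax question of how to change colors partway through a string.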
The linker will add an appropriate initialization section, loader/unpacker, and entry point jump. When floppy IO is specified, the linker will also support load-from-disk. When overlays are implemented, the linker will also include appropriate driver code for overlay loading (if and only if it's actually used in the build).
The target entry point right now is ":main".
This means that, hopefully soon, the existing preamble added to all assembly output should be removed. The C compiler/llvm output will no longer be responsible for turning compiled code into a runnable program.
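In its simplest form, the entry point jump could look like the following Python sketch. This is only an illustration of the idea, not what the linker actually emits: the function name `add_entry_jump` is hypothetical, and a real initialization section or loader would add more than one instruction.

```python
def add_entry_jump(lines, entry="main"):
    """Prepend a jump to the entry label, failing if it is not defined."""
    defined = {l.strip()[1:] for l in lines if l.strip().startswith(":")}
    if entry not in defined:
        raise ValueError("entry point :%s is not defined" % entry)
    # Assumption: a bare "set pc, <label>" is enough to start the program.
    return ["set pc, " + entry] + list(lines)


program = [":helper", "set x, 0x42", ":main", "set pc, helper"]
print("\n".join(add_entry_jump(program)))
```

With the jump generated at link time, it no longer matters where ":main" ends up in the output, which is what lets the compiler drop its own preamble.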
I personally would like to see that we stay as close as possible to established conventions, especially with the sections. If there will be an official standard for that, we may follow that one. But I really don't like establishing "our own way", just because we don't like the naming.
Oh, I missed the proposal to rename .text to .code. Is there any technical reason to do this?
As for "Please make the compiler stop emitting all of the .gsuper_personality__not_too_much_foam$.2s2 section headers.", which I also missed, could you please give an example? I'm not sure which ones you mean.
I'll compromise on ".text" instead of ".code". I guess it's more old school that way, anyway. I just didn't want 12 year old assembly hackers to confuse "text" for "string" (like I did when I was young).
So, two section names: ".text" and ".data".
@krasin I actually think you've already fixed many of the weird section headers and macros that were being emitted. But, basically, they're of the sort declaring things to be global, static, aligned, etc. Export macros and sections, I think. Which doesn't matter in a binary format that is inherently static-linked and monolithic and where alignment is unimportant.
I suppose if we more clearly identify what they're doing, I can just silently skip the ones nos_link doesn't need but that an incremental linker might. But right now they look like just gibberish and I don't know how to predict them. I'll get you some concrete examples when I get home.
Skipping linkage types is tracked by #49 and it's important to turn it on again.
The motivation is pretty clear: if two .c files have two symbols with the same name, a linker really wants to know what to do:
Ah, that's what they are! That is important!
If you specify them, their behavior, and their text patterns, I will support them.
Could you please make me an issue over at https://github.com/netzapper/nos_link with that documentation?
Since LLVM/Clang targets GNU AS and we stick with most of the convention (except the byte/word thing), the following page should help you:
Well, we also violate GAS syntax with labels. I don't like that, but it seems that the community uses this, and it would be hard to get rid of.
Yeah. But the document should still give a good overview of what AS is capable of and which ops could become useful.
@Arbaal Thanks! Those were exactly what I was seeing.
What subset of those are actually expected to be emitted?
'cause if the answer is all of them, then y'all need to wait for binutils. Many of those pseudo ops require a sophisticated macro system that I have no desire to write, and they severely break the traceability from C to binary.
For instance, I have no desire to support .struct and keep track of types and field accesses.
As of right now, the only things I plan to support are
Are there others that are immediately vital?
Let's first see that the basic stuff works. At every point, it should then be clear what the next step should be.
The question was answered, I will close this issue for now.