This is a fool's errand to bootstrap a complete 6502 assembler. I am doing this for fun, with no expectation that it will be useful for anything. The journey is the goal: to begin with (nearly) nothing, and to slowly craft a functioning assembler. I am indebted to Edmund Grimly Evans for his detailed bootstrapping project description, which is the original inspiration for this project.
The original version of this assembler was a short length of hand-assembled hex code. This resulted in a tiny program which, when fed a string of its own hex source code, could emit its own object code.
From this starting point, each tiny feature was implemented using the features already available: first came support for white space and line breaks, then comments, then string literals, and most recently address labels.
I use Ian Piumarta's excellent lib6502 to emulate a 6502 computer.
run6502
must be installed on the path.
The design assumes that the entire 16-bit address space represents available RAM, with the exception of the space above FFDD, which is reserved for lib6502's getchar() and putchar() ROM subroutines.
The assembler code is not relocatable, and assumes that it will be loaded at 0x1000 and execution will begin there.
Assemble source code to object code:
$ cat foo.asm | cmd/run asm > foo.sym # Pass 0
$ cat foo.sym foo.asm | cmd/run asm > foo.img # Pass 1
Execute:
$ cmd/run foo.img
Disassemble object code to stdout:
$ cmd/disasm foo.img
Build and run the example program:
$ make examples/hello.img
$ cmd/run hello.img
Hello, World!
To call this an assembler is an overstatement. It accepts ASCII hex codes from
stdin
, and emits bytes on stdout
. It offers a few minimal "features" that
make it barely a language:
-
The location counter can be modified using the
*
pseudo-op, however all source code must be contiguous and begin at 0x1000. There is no linker. -
A label can be assigned from the current 16-bit location counter using the
:
pseudo-op. -
The 16-bit value of a label can be read with the
&
pseudo-op. The>
and<
operators emit the high or low byte only. -
An 8-bit relative offset to a label can be inserted with the
~
pseudo-op. -
Comments must begin with
;
and extend to the end of a line. -
A double-quoted string will be inserted into the object code as ASCII bytes.
The assembler recognizes only a subset of the 6502 instruction set mnemonics:
-
Opcodes that take no operands, such as
JSR
andRTS
are allowed. -
Some immediate mode opcodes, such as
LDA #
andCMP #
are recognized. Be sure to include the space and the#
character.
All other addressing modes are unrecognized, and must be inserted using a hex code. The assembler will assume that these hex codes represent valid opcodes and will emit them into the object code stream.
The assembler does nearly zero error checking. It is the programmer's responsibility to ensure the each instruction receives the correct number of operand bytes.
The assembler operates in two passes to provide support for forward label references.
During pass 0, the input assembly is parsed, the location counter is maintained, and labels are assigned their values, but no machine code is output. Instead, when the end of the input is reached, the symbol table is dumped to stdout. This symbol table takes the form of source code, which is valid assembler input for the next pass.
During pass 1, the symbol table is prepended to the same input assembly, and both are fed to the assembler. This enables the assembly code to refer to all symbols, which now have all of their location values defined. Note that the symbol table also begins with the 'P01' psuedo-op, which signals the assembler to operate in pass 1 behavior.
asm.asm
is the assembler source code, written in our minimal assembly
language, and asm
is the reference master assembler object code. Both are
held in source control.
When compiling the assembler, the makefile will iterate two generations.
The make
process will first use asm
to compile generation 0 object code as
file g0
. If this succeeds, it then uses g0
to compile second-generation
object code g1
. If g0
and g1
are bitwise identical, the test passes and
we assume that the new assembler functions properly. make promote
can then be
used to permanently select g1
as the new reference master assembler.