An emulator for computer with a simple architecture and instruction set.
##Overview Machine emulates a computer with a simple memory architecture and small instruction set. It is intended to be used in an educational or research setting, allowing binaries to be run universally and without concern for physical architecture. It is written primarily for my own use in testing new compiler or assembler features, but I would be thrilled if others got use out of it as well. If you have suggested changes or bugs, please let me know!
There are two overall pieces to Machine. The first is a fully-functioning architecture and instruction set that programs wishing to run directly in Machine should use. The second is a set of architecture and instruction set extensions that add support for a protected mode/user mode distinction, which allows kernels to be written for Machine. These extensions are fully backwards-compatible with the normal subset, and any programs written without knowledge of these extensions will run without issue. Note that there's only one version of Machine - the version that includes the extensions - but programs may treat Machine as though it logically only implemented the normal subset.
##Architecture Machine is a 32-bit architecture. It comprises three simple components - memory, registers, and a program counter.
###Registers There are 16 registers, each holding a single 32-bit word. The nth register is denoted r[n].
###Memory Machine has a 32-bit memory address space. It is word-addressed, with each address referencing a single 32-bit word. The nth word in memory is denoted m[n]. The number of words actually available to be written to or read from is determined by the program which is run.
###Program Counter Machine keeps a counter which holds the address of the word in memory which is to be executed next. The counter is incremented before each execution.
##Binaries Binaries are files containing the words of a program, in big-endian byte order. The format for binaries is quite simple. The first word specifies how many words will be available in memory. Its minimum value is one less than the size of the binary.
To begin, memory of the proper size is allocated and initialized to zero. All of the registers are also intialized to zero. Then, the words of the binary are loaded into memory starting with word 1 (this excludes the first word of the binary, the one which encodes memory length). Word 1 of the binary is loaded into word 0 of memory, and so on. The program counter is then set to point at word 0 of memory.
Execution then begins. At each execution except for the first one, the counter is incremented before the command is executed (this allows for jumps to work properly, otherwise jumping to word 10 would end up executing word 11, for example). The word at the address given by the counter is then executed as a machine instruction.
##Instructions In general, the format for instruction words is as follows: The most significant six bits encode the operator. There are 25 operators.
Most instructions require up to three registers for execution (for example, r[A] := r[B] + r[C]). The least significant four bits encode C, the next four bits encode B, and the last four bits encode A:
A C vvvv vvvv VUTSRQPONMLKJIHGFEDCBA9876543210 ^^^^^^ ^^^^ opcode B
So, for example, if the opcode corresponded to the addition instruction, and A = 0, B = 1, C = 2, this would result in r := r + r.
There are two versions of the multiplication, division, greater than, less than, and comparison operators - unsigned and signed. The unsigned versions perform the operations assuming that the values in all registers are 320bit unsigned integers. The signed versions perform the operations assuming that the values in all registers are 32-bit two's complement signed integers. There is no signed add or signed subtract instruction because unsigned addition and unsigned subtraction also work on signed values.
|0||Move||r[A] := r[B]|
|1||Equality||If r[B] == r[C], r[A] := 1, otherwise r[A] := 0|
|2||Greater Than||If r[B] > r[C], r[A] := 1, otherwise r[A] := 0|
|3||Greater Than (signed)||If r[B] > r[C], r[A] := 1, otherwise r[A] := 0|
|4||Less Than||If r[B] < r[C], r[A] := 1, otherwise r[A] := 0|
|5||Less Than (signed)||If r[B] < r[C], r[A] := 1, otherwise r[A] := 0|
|6||Conditional Jump||If r[A] != 0, set the program counter to r[B]|
|7||Load||r[A] := m[r[B]]|
|8||Store||m[r[A]] := r[B]|
|9||Add||r[A] := r[B] + r[C]|
|10||Subtract||r[A] := r[B] - r[C]|
|11||Multiply||r[A] := r[B] * r[C]|
|12||Multiply (signed)||r[A] := r[B] * r[C]|
|13||Divide||r[A] := r[B] / r[C]|
|14||Divide (signed)||r[A] := r[B] / r[C] (rounds towards 0)|
|15||Bitwise And||r[A] := r[B] & r[C]|
|16||Bitwise Or||r[A] := r[B] | r[C]|
|17||Bitwise Exclusive Or||r[A] := r[B] ^ r[C]|
|18||Bitwise Complement||r[A] := ~r[B]|
|19||Left Shift||r[A] := r[B] << r[C]|
|20||Right Shift||r[A] := r[B] >> r[C]|
|21||Compare And Swap||If m[r[A]] = r[B], m[r[A]] := r[C] and r[B] := 1; otherwise, r[B] := 0|
|22||Atomic Add||m[r[A]] := m[r[A]] + r[B]|
|23||Halt||Halt the machine|
|24||Output||The value in r[A] is displayed on the I/O device (as ASCII). Only values in [0,255] allowed.|
|25||Input||Machine waits for input on the I/O device. Input is stored in r[A], which will be a value in [0,255]. If EOF was signaled, r[A] will be all 1's.|
|26||Load Value||The value specified will be loaded into r[A] (see Load Value semantics below).|
###Load Value The Load Value instruction loads a literal value which is encoded in the instruction word itself. The layout of a load value word differs from a normal instruction word since it must encode a literal value as well as the register addresses. When a Load Value instruction is executed, the value is loaded into r[A]. The opcode is coded in the most significant six bits. The register address, A, is coded in the next four bits. The value itself is coded in the remaining 22 bits.
A vvvv VUTSRQPONMLKJIHGFEDCBA9876543210 ^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^ opcode value
##Failure States During normal operation, the machine moves from one state to the next with the execution of each instruction (a machine state in this case refers to the state of all of the registers and all of the words in memory). However, certain operations will result in the machine existing in an invalid state, or Failure State. If the machine moves into a failure state, execution will be halted. The following will result in failure states:
- If the program counter refers to a word which is not part of allocated memory, the machine will fail.
- If the program counter refers to a word which does not code for a valid instruction, the machine will fail.
- If an instruction divides by zero, the machine will fail.
- If an instructions loads from or stores to a word which is not part of allocated memory, the machine will fail.
- If an instruction outputs a value not in the range [0,255], the machine will fail.
- If there are more instructions in a binary (exluding the first word) than there are words allocated in memory, the machine will fail.
Protected Mode Extensions
In addition to the features detailed above, referred to from here on out as the normal subset, there is a set of extensions known as the protected mode extensions which make it possible for Machine to support a fully-functioning kernel with the ability to guarantee memory safety, process termination, and multithreading.
##Architecture The protected mode extensions provide modes, lookaside registers, a callback register, a fault register, a program counter lookaside register, two virtual memory registers, and a program counter timer.
###Modes Machine can be in two modes - protected mode and user mode. At the beginning of execution, Machine is in protected mode.
###Lookaside Registers There are 16 extra registers called lookaside registers, denoted r'. When any illegal operation (defined below) is performed while Machine is in user mode, the contents of the registers are copied into the lookaside registers (that is, for x in [0, 15], r'[x] := r[x]).
###Fault Register There is a single register called the fault register, denoted f. The contents of f are denoted f. When any illegal operation is performed in user mode, the fault code of the operation (defined below) is placed into f. The order of this relative to the setting of the lookaside registers is undefined.
###Program Counter Lookaside Register There is a single register called the program counter lookaside register, denoted pc'. The contents of pc' are denoted pc'. When any illegal operation is performed in user mode, the value of the program counter prior to the execution of the illegal instruction is placed into pc'. The order of this relative to the setting of the lookaside and fault registers is undefined.
###Program Counter Timer There is a single register called the program counter timer, denoted t. The contents of t are denoted t. Prior to the execution of any instruction in user mode, if t is not the maximum possible value (that is, t + 1 != 0), t is decremented (t := t - 1). If, prior to this, t = 0, then instead of t being decremented, it is considered an illegal operation, and a fault is triggered. Additionally, if execution of the instruction causes any other fault, t is not decremented.
###Callback Register There is a single register called the callback register, denoted c. The contents of c are denoted c. When any illegal operation is performed in user mode, after the lookaside registers, fault register, and program counter lookaside register have been set, Machine is placed into protected mode, and execution continues from c.
###Virtual Memory Registers There is a pair of registers called virtual memory registers, denoted v (v and v). When a memory access operation is performed in user mode, the physical address, p, that is accessed is defined in relation to the logical address, l, that is coded by the instruction word, by the relation p = v + l. If p < v or p > v, this is an illegal operation and will trigger a fault.
##Instructions The protected mode extensions include 11 additional instructions known as protected instructions. Executing any of these instructions in user mode is an illegal operation, and will trigger a fault.
|27||Enter User Mode||Machine is placed into user mode, the program counter is set to r[A], all registers are set such that r[X] := r'[X], and execution is continued|
|28||Lookaside Load||r[A] := r'[B]|
|29||Lookaside Store||r'[A] := r[B]|
|30||Set Callback||c := r[A]|
|31||Fault Load||r[A] := f|
|32||PC Lookaside Load||r[A] := pc'|
|33||Set Virtual Memory Low||v := r[A]|
|34||Set Virtual Memory High||v := r[A]|
|35||PC Timer Load||r[A] := t|
|36||PC Timer Store||t := r[A]|
|37||Trigger||do nothing (still triggers fault in user mode)|
Additionally, the Halt (opcode 23), Output (opcode 24), and Input (opcode 25) instructions are considered illegal operations if executed in user mode, and will trigger a fault.
##Failure States The protected mode extensions include two extra failure states.
- If the Enter User Mode instruction is executed when v + r[A] does not address a word in the range [v, v], Machine will fail.
- If the Set Virtual Memory Low or Set Virtual Memory High instructions are executed when r[A] does not address a word in allocated memory, Machine will fail.
##Fault Codes Whenever a fault is triggered, there is an associated fault code which is placed into the fault register, f, which describes the fault. This is a list of faults and associated fault codes. Note that every action which would cause Machine to enter a failure state in protected mode triggers a fault if performed in user mode. Thus, it is not possible for code executing in user mode to cause Machine to enter a failure state.
|0||Executing a protected instruction other than Trigger|
|2||Executing an instruction which would decrement the program counter timer below 0|
|3||Executing an instruction whose address is not in [v, v]|
|4||Accessing a memory word whose address is not in [v, v]|
|5||Executing an instruction which does not code for a valid operation|
|6||Division by 0|
##Building To build Machine, simply do: