Spy is an implementation of the language presented in Computability, Complexity and Languages by Martin D. Davis and Elaine J. Weyuker. I started this project in order to learn about compilers, parsers and languages, as well as virtual machines and optimizations. As such, this project is not intended for production and is more like a research project.
You can donate bitcoins to the following address: 17wcQ6JNXrKjYVEvstgAkq1yV5eY28E7Pp
S bytecode has been designed with portability and ease of implementation even in low-level languages. It is a big-endian binary file that stores both the program and optionally a snapshot.
Fig. 1: A minimal bytecode layout ____ __ __ ____ ____ _ __ __ _ __ __ _ __ __ _ __ __ |____|__|__| |____|____| |_|__|__|...|_|__|__| |_|__|__|...|_|__|__| MAGI MA MI DATA EXEC VAR 1 JMP INST 1 INST N \__________/ \_________/ \___________________/ \___________________/ HEADER INFO DATA Section EXEC Section \_______________________/ \__________________________________________/ Metadata Instructions
S bytecode is big-endian and has the following layout:
- A Header non-padded struct with:
- A 4-byte unsigned integer containing the Magic
- A 2-byte unsigned short major version
- A 2-byte unsigned short (minor version)
- An Info non-padded struct with:
- Number of instructions in DATA section (for serialized data) as a 4-byte unsigned integer.
- Number of instructions in EXEC section (the actual program) as a 4-byte unsigned integer.
An _OPTIONAL_ DATA section of non-padded structs with: - VAR instructions - ONE terminal JMP instruction.
An _OPTIONAL_ EXEC section of non-padded structs with: - An unsigned char opcode - A 2-byte unsigned short representing the variable:
- (1 << 15) NOT set for X
- 0 for y
- (1 << 15) set for Z
- 1 << 15 RESERVED (PC would be nice, maybe stack)
- A 2-byte unsigned short with payload data: - For JNZ: an absolute index for the next instruction
An optional EXTRA section. No special requirements are made regarding this section, except that it must be big-endian and safefly ignored by virtual machines.
"Magic" (MAGI) is an arbitrary integer that identifies S bytecode. It also prevents unsafe handling of strings in C code, as the first bytes will serve as NUL terminators.
Major (MA) and Minor (MI) version numbers identify the bytecode version. As a norm, changes that break existing instruction semantics increment the major version number, while new instructions that can be safely ignored (such as debugging instructions, VM signaling, NOP) or EXTRA sections added at the end of the file.
The INFO section contains the length of the DATA and EXEC sections. This allows linear parsers to allocate memory and set logic boundaries between DATA, EXEC and EXTRA.
The DATA and EXEC sections are technically a single section containing bytecode opcodes and parameters. However, both sections have different purposes: The DATA section contains the state of a running program, and it comes before the EXEC section so a linear VM can pre-load (and jump to the appropiate instruction) before running the program. This section can be of zero size. The recommended layout of this section is to place only VAR instructions and a final JMP instruction, although it is possible to use any instruction here (so using any instruction other than VAR and JMP in this section are undefined behaviour). The EXEC section contains the program itself. Any instruction may be used here, although it is strongly discouraged to use VAR and JMP instructions (as those are not part of the S Language itself and alter the behaviour of the program, and are left as undefined behaviour). Ancillary Section
Finally, the EXTRA section is not mandatory and may contain any data. The purpose of this section is to add information about the program such as original source code, author and even another program. This section runs for the rest of the length of the file. The only two restrictions are that it must be big-endian and must be safely ignored.
Instructions are comprised by an opcode and two parameters (the first usually the variable, the second a positive value).
Variables are unsigned short integers:
- ~(1 << 15) & (var & 0x7fff) for X
- 0 for y
- (1 << 15) | (var & 0x7fff) for Z
Instruction | Word | Opcode | Parameter | Value | Virtual |
---|---|---|---|---|---|
No operation | NOP | 0 | Yes | ||
Increment variable | INC | 1 | Variable | Unsigned Short (0 to 65535) | No |
Decrement variable | DEC | 2 | Variable | Unsigned Short (0 to 65535) | No |
Jump if variable is not zero | JNZ | 3 | Variable | Instruction offset as an absolute value from EXEC. | No |
Tag | TAG | 4 | 1 for A, 2 for B, ..., 5 for E | Unsigned Short (0 to 65535) 0 means halt. Unsigned Short (0 to 65535) indicating the tag index. |
Yes |
Set Variable | VAR | 5 | Variable | Unsigned Short (0 to 65535) as value. | No |
Unconditional Jump | JMP | 6 | Program Counter as Unsigned Short. (0 to 65535) | No | |
Copy variable | CPY | 7 | Copy a variable value into another variable. | No |
Both jumps use absolute addressing, starting at the beginning of the EXEC section.
- Python >=2.6
- nose (for testing)
New BSD