PDD 32: M0 Design Spec

Abstract

This document specifies M0, the set of microcode-like opcodes (ops) on which a future implementation of Parrot internals will be based. This document also specifies how these ops will interact with other subsystems and their requisite runtime environment.

Status

This document is a draft under active development. Nothing is set in stone yet. If you would like to suggest improvements, please talk with cotto, bacek, atrodo or dukeleto.

M0 Milestones

This is a list of things that we expect the M0 VM to be able to do as this spec progresses. They're pretty basic, but that's the idea behind M0.

Completed milestones:

print 42 (hard-coded instructions are fine at this point)
assemble an M0 assembly listing into M0 bytecode
print "hello, world\n"
add two integer registers, store the result
calculate the hash of a hard-coded string (any non-trivial algorithm)

Incomplete milestones:

call a function with an argument, return a value
calculate the sum of a 10-element array of ints
disassemble binary M0 bytecode into an M0 assembly listing
implement and verify crc32c

Definitions

M0

M0 refers to the specification for a minimal virtual machine capable of executing the ops described in this document. The M0 VM should have the same expressive power as C and will live at the same level that C currently does. This means that M0 explicitly does not need to know about higher-level concepts such as garbage collection and Parrot's calling conventions.
M0 ops

This is the lowest-level set of ops supported by an M0-compliant VM. These ops are a subset of the M0 specification.
chunk

A chunk is a self-contained unit of code. Chunks correspond roughly to functions and have a constants table, a metadata segment and a bytecode segment. The constants table is used to store any persistent data. The metadata segment stores information about the code, for example annotations or variable names. The bytecode segment contains the executable ops.
MOP

Meta object protocol.
Lorito

Lorito refers to M0 and the levels above it, such as M1 which is written in M0 ops, and possibily an M2 level which is written in M1 ops. More specifically it refers to a new implementation of Parrot internals. A new MOP is planned for Lorito and it will most probably live at the M1 op level.
Call Frame

This refers to an M0 call frame structure, which is the central datastructure at the M0 level. A call frame contains all the state of an executing chunk, including PC, active exception handlers, INSP registers, parent call frame and so forth. When M0 code implements continuation-passing style, the call frame is what acts as the continuation.

Design Goals

There are three goals that M0 needs to meet:

Have executable code with equivalent power to C.
Have a form of code that can be easily analyzed.
Have a single form of code used to implement the majority of Parrot, rather than having a mix of C and PIR.

We are also designing M0 to be as simple as possible. M0 should contain as few ops as are needed to retain the ability to generate efficient bytecode. The complexity of Parrot will be built on top of M0, in PIR (a.k.a. M1) and above. This simplicity exists to facilitate multiple implementations, e.g. a quick and dirty prototype implementation in JavaScript, a space-optimized implementation targeted for mobile devices, an adaptively-optimizing jit targeted for server use, etc. M0's simplicity should be extreme; the moment M0 gets ops for the sake of developer efficiency or ease of use, it has failed its design goals.

There will be no stacks in M0, not even for argument passing. All internal control flow apart from basic gotos with code segments must be in continuation-passing style. This will give M0-based code several benefits. It will ensure immunity from stack-smashing attacks and related programming errors. It will also eliminate the nested (inferior) runloop problem and opens up interesting possibilities for an Erlang-like concurrency implementation.

CPS will also allow Parrot to use many pre-existing JIT algorithms, such as those described in the paper "Trace Based Compilation in Interpreter-less Execution Environments" [TBC].

Scope of This Document

This document only covers M0 and the minimum VM necessary to support it. Anything beyond that, e.g. the meta-object model or in-memory layout of call frames, is beyond its scope.

Call Frames

This is the central data structure at the M0 level. The call frame graph contains most of the state that is needed to execute M0 bytecode. A pointer to the current call frame will be passed to each op as the implicit first argument, though this pointer (or equivalent) will not be stored in bytecode. It layout is described in detail in the Register Types and Call Frame Structure section.

Call frames were previously referred to as contexts. The name "call frame" was chosen to better reflect common usage and hopefully reduce confusion.

Registers

An M0 VM has 256 registers, each of which is large enough to hold an INSP value.

List of Ops

M0 ops are very low-level and provide the simplest possible way of interacting with the VM. Apart from print/gripe/say (which are intended only for the early stages of an M0 VM's development), M0 ops should painfully minimal. It's expected that common operations will be painful. That's what compilers are for.

The names and number of the ops below may change at any time, possibly without any rational reason, until the spec is more stable. As a way to verify that the op set is complete, we will look at how a representative selection of PIR's existing core ops and the dynops used by HLLs could be reimplemented in terms of M0 ops.

In descriptions, $1 refers to the value of the first argument to an op, $2 to the second and $3 to the third. Any op can operate on any register; types are only by convention. As far as an M0 VM is concerned, the 256 ops are nothing more than an array of 256 int32-sized chunks of data.

Note that this documentation uses a C-like syntax to differentiate between a register's number and the value pointed to by the regster. $1 refers to a register's value *$1 refers to what a register points at.

Ops tend to be consistent in the way they use registers. When applicable, *$1 is used as the destination, *$2 is used as the first argument and *$3 is used as the second argument or modifier.

Control Flow (4 ops)

noop - do nothing

Explicitly do nothing. $1, $2 and $3 are ignored.
goto - go to a fixed offset in the current bytecode segment

Unconditionally transfer control flow to a fixed offset within the current bytecode segment. The target address is calculated as 256 * $1 + $2. $3 is ignored.
goto_if - conditionally go to a fixed offset in the current bytecode segment

Transfer control flow to a fixed offset within the current bytecode segment if *$3 is non-zero when treated as an integer. As with goto, the target address is calculated as 256 * $1 + $2.
goto_chunk - go to an offset in another chunk

Unconditionally transfer control flow to an offset within another chunk's bytecode segment, updating the current call frame's CHUNK, CONSTS, MDS and BCS members to point at the new chunk's name and segments, respectively. *$1 is the index of the target chunk and *$2 is the PC within that chunk. Chunk indicies can be retrieved using chunk name constants. see also "Chunk Name Constants and Bytecode Loading"

Math/numeric ops (12 ops)

add_i - integer addition
add_n - add two numeric registers

Treat *$2 and *$3 as integer or floating-point values, add them and store the result in *$1.
sub_i - subtract two integer registers
sub_n - subtract two numeric registers

Treat *$2 and *$3 as integer or floating-point values, subtract *$3 from *$2 and store the result in *$1.
mult_i - multiply two integer registers
mult_n - multiply two numeric registers

Treat *$2 and *$3 as integer or floating-point values, multiply *$1 by *$3 and store the result in *$1.
div_i - divide two integer registers
div_n - divide two numeric registers

Treat *$2 and *$3 as integer or floating-point values, divide *$1 by *$2 and store the result in *$1.
mod_i - remainder of of $3/$2 (integer)
mod_n - remainder of of $3/$2 (numeric)

Treat *$2 and *$3 as integer or floating-point values, divide *$1 by *$3 and store the remainder in *$1. Division by zero will make the the M0 interpreter sad.
convert_n_i - convert to numeric from integer

Assume that *$2 is an integer value, convert it to a floating-point value and store the result in *$1. $3 is ignored.
convert_i_n - convert to integer from numeric

Assume that *$2 is a floating-point value and that *$1 is an integer. Discard the fractional part of *$2 and store the result in *$1. $3 is ignored. If the integer part of *2 is larger than the interpreter's I registers are capable of storing, the largest supported positive or negative integer is stored in *1, corresponding to the sign of *$2.

Bitwise ops (6 ops)

ashr - right bitshift with sign extension

Shift the value in *$2 right by the number of bits in *$3, with sign extension, and store the result in *$1.
lshr - right bitshift without sign extension

Shift the value in *$2 right by the number of bits in *$2, without sign extension, and store the result in *$1.
shl - left bitshift

Shift the value in *$2 left by the number of bits in *$3 and store the result in *$1.
and - bitwise AND

Store *$2 & *$3 into *$1.
or - bitwise OR

Store *$2 | *$3 into *$1.
xor - bitwise exclusive OR

Store *$2 ^ *$3 into *$1.

Memory/GC ops (4 ops)

gc_alloc - allocate memory from the GC

Allocate *$2 bytes, store the pointer *$1. *$3 contains flags indicating any special properties required of the returned chunk of memory. A pointer returned by gc_alloc does not need to be freed. {{{ TODO: define what flags are valid for *$3 and what criteria exist for adding new ones }}}
sys_alloc - allocate memory using malloc or its equivalent

Allocate *$2 bytes, store the pointer *$1. $3 is ignored. The pointer returned by sys_alloc must be released with sys_free to avoid memory leaks. This is a direct interface to the underlying implementation's malloc() implementation (or equivalent) and does not go through ffi.
sys_free - free memory using free() or its equivalent

Free the region of memory which *$1 points at. $2 and $3 are ignored. This is a direct interface to the underlying implementation's free() implementation (or equivalent) and does not go through ffi.
copy_mem - copy data to/from memory

Write *$3 bytes from the memory location starting at *$2 to *$1. This instruction may access memory managed by the GC. Segfaults may result. Be careful.

Register ops (6)

set - set a register to a value

Set *$1 to whatever is in *$2. $3 is ignored.
set_imm - set a register to a hard-coded value

Set *$1 to $2 * 256 + $3.
deref - dereference a register

Treat *$2 as an array and *$3 as an index into that array. Set *$1 to whatever's at *$2[ *$3 ].
set_ref - change the value of a referenced register

Treat *$1 as an array and *$2 as an offset into that array. Set *$1[ *$2 ] to *$3.
set_byte - manipulate individual bytes in an array

NOTE: This op is not finalized. It may be omitted or changed significantly before being considered final.

Treat *$1 as an array and *$2 a byte offset into that array. Set byte number *$2 of *$1 to the least significant byte of *$3.
get_byte - read individual bytes in an array

NOTE: This op is not finalized. It may be omitted or changed significantly before being considered final.

Treat *$2 as an array and *$3 a byte offset into that array. Set *$1 to byte *3 of *$2. After this op, *$1 can never be more than 255.
set_word - manipulate individual word in an array

NOTE: This op is not finalized. It may be omitted or changed significantly before being considered final.

Treat *$1 as an array of words and *$2 a word offset into that array. Set word number *$2 of *$1 to the word in *$3.
get_word - read individual word in an array

NOTE: This op is not finalized. It may be omitted or changed significantly before being considered final.

Treat *$2 as an array of words and *$3 a word offset into that array. Set *$1 to word number *3 of *$2.

FFI ops (4 ops)

csym - look up a function by name

Look up the function pointer for using the name in *$2 in the C function namespace and put the pointer into *$1. $3 is ignored. *$2 is assumed to point at a C string.

Note that this function wraps the csym C function. Use it to access Parrot_dlopen and Parrot_dlsym to deal wih dynamic libraries.
ccall_arg - set an argument for a ccall

Take two arguments: an argument type in *$1 and an argument source in $2. *$1 is an immediate constant indicating the type of the argument. The possible values of *$1 will be similar to those used by libffi (https://github.com/atgreen/libffi/blob/master/doc/libffi.info ). $2 indicates which M0 register should be used to populate the value of the current argument. $3 is ignored.
ccall_ret - get a return value from a ccall

ccall_ret is similar to ccall_arg, except that it copies the return value of the previously-called function into *$2. $3 is ignored.
ccall - invoke a function

ccall uses the function pointer in *$1 to call a C function, assuming that its arguments have been set up correctly. $2 and $3 are ignored.

Temporary ops (3 ops)

print_s - prints a string

Print the string at *$2 to the filehandle *$1.
print_i - prints an int

Print the integer at *$2 to the filehandle *$1.
print_n - prints a number

Print the floating-point number at *$2 to the filehandle *$1.
exit - exit with the given status (go through ffi)

Call C's exit function with *$1. $2 and $3 are ignored.

These are too high-level and can be written in terms of simpler ops:

new - create a new PMC * allocate memory * set default values =item * store - store a register in a PMC, or vice versa
get_cf - Get the current Lorito call frame
invoke_cf - Invoke the current Lorito call frame * set return PC in call frame * switch code segment, if necessary * get (new) current PC from call frame
new_cf - Create a new Lorito call frame * allocate memory * set default values
load_bytecode - load bytecode into the current Lorito call frame * allocate memory * make a couple of system calls (hand-waving here)
mark_unused - tell the GC that this previously allocated memory is now unused

Textual Representation

M0's textual format will mirror its binary representation. It will consist of a series of named chunks with the following format. Any line beginning with an octothorpe (#) is a comment and will be ignored.

The first line that is not a comment or empty defines the version of M0 that is being used. This is so that if ops are added in the future, we can differentiate between different versions of M0 source code and bytecode.

The current version is 0.

Chunk Format

A chunk consists of a chunk identifier, a constants chunk, a metadata chunk and a bytecode chunk. A chunk represents self-contained computational units, approximately analogous to a function.

Chunk Identifier

A chunk identifier consists of a single line beginning with '.chunk', followed by a chunk name. The name consists of a double quote-delimited utf-8 string which must be unique. The empty string is allowed as a chunk name.

.chunk "chunk_name"

This name is the primary way that code will refer to a chunk. It is roughly analogous to a function's name in an HLL.

Constants Segment

The constants segment contains a numbered list of chunks of data. Data can be either an integer, a floating point number, a quote-delimited utf-8 string, arbitrary data in hex notation or the name of a chunk. For simplicity's sake, strings will only support escaping double-quotes and newlines. Any other data should be stored as a hex string. This data is accessible through the load_const op. Any constant data used by the metadata segment and the bytecode segment will be stored here. Constants denoted by ampersand followed by the name of a chunk can be used at runtime to find the index of a chunk. see also "Chunk Name Constants and Bytecode Loading"

.constants
0 1234
1 1.12345e-12
2 "asdfasdfs"
3 "hello, \"world\""
4 0x00ffbeef
5 "line"
6 23
7 &chunk_name

Metadata Segment

The metadata segment consists of triplets of integers mapping a name and a bytecode offset to a value. The first number is an offset into the bytecode segment. This is the instruction at which the metadata first takes effect. The second number is the offset into the constants table that contains the name of the metadata entry. The third is the offset into the constants table that contains the value.

.metadata
#at pc 1234, "line" is 23
1243 5 6

Bytecode Segment

The bytecode segment consists of a list of mnemonics for instructions and their arguments. All instructions take three int arguments between 0 and 255, even if they aren't all used.

.bytecode
  set   1, 3, 9
  add_i 3, 2, 3
  cmp_i 2, 3, 3
  goto  0, 0, 0

Register Names

Instruction arguments may also use the register names specified in the Types section in place of numeric arguments. This is nothing more than syntactic sugar. The two ops below will assemble to identical bytecode. When disassembling M0 bytecode, it is up to the disassembler to determine which form will be output.

div_i I0, I3, I4
div_i 12, 15, 16

As a way to filter out visual noise from M0 bytecode assembly listings, the value x may be used as an instruction argument to indicate that the argument's value is ignored. x is an alias for 0.

goto 0, 12, x

Labels

In M0's textual bytecode representation, labels may be used to represent bytecode offsets within a chunk. There is at most a one-to-one mapping between instructions and labels; an instruction may be prefixed by at most one label and the use of a label to identify more than one instruction is invalid. A label consists of an identifer ([a-zA-Z][a-zA-Z9-0-9_]*) and may not have the same name as a register (e.g. I0, INTERP, BCS, etc).

A label may be used as the first argument to goto or goto_if as an alternative to specifying the bytecode offset explicitly. Use of labels with any other ops is invalid.

.bytecode
set_imm I0, 0, 0
set_imm I1, 0, 1
loop: add_i I0, I0, I1
goto  loop, x

Labels exist only in M0's textual representation. During assembly they are translated into an immediate bytecode offset. For example, the above snippet would translate to:

.bytecode
# offset 0, 0
set_imm I0,  0, 0
# offset 1, 0
set_imm I1,  0, 1
# offset 2, 0
add_i   I0, I0, 1

# label "loop" translated to offset 0, 2
goto     0,  2, x

It is explicitly valid to use a label before it has been defined. This means that the assembler must be smart enough to accept undefined labels during parsing and fill in their values in bytecode once a chunk is fully parsed.

Full Example

The following should be a working "hello world" program in M0.

.version 0
.chunk "hello"
.constants
0 1
1 "hello, world"
2 0x0A00
3 0
.metadata
.bytecode
#I0 is 1
set_imm I0, 0, 0
deref   I0, CONSTS, I0

#I1 is "hello"
set_imm I1, 0, 1
deref   I1, CONSTS, I1

#I2 is "\n"
set_imm I2, 0, 2
deref   I2, CONSTS, I2

#I3 is 0
set_imm I3, 0, 3
deref   I3, CONSTS, I3

#print "hello" to stdout
# x is arbitrary
print_s I0, I1, x
#print "\n" to stdout
print_s I0, I2, x
#exit with status 0
exit I3, x, x

Binary Representation

M0's binary representation will be stored in .m0b files (M0 binary) and will be composed of a fixed header, a single directory segment and a number of chunks. Chunks correspond roughly to functions and have the following segments:

a constants segment containing the data that the segment needs
a metadata segment that carries any extra data like HLL line numbers, function names, annotations and custom data
a code segment containing the ops

{{{ NOTE: We should design the binary format of M0 in a way that allows it to be mmapped by an interpreter. Which considerations does this imply? }}}

All values stored in M0's binary representation will be stored using little-endian byte order. INSP values are either 4 or 8 bytes. N registers are always 8 bytes long, I regsters may be 4 or 8 bytes, depending on platform support. S and P registers have the same size as the underlying platform's pointers. Converting between M0 bytecode files of differing register sizes will be supported implicitly through the disassembly and reassembly of .m0b files. See also Runtime Layout of Primitives.

The size of a register type refers to both the number of bytes needed to store it in a .m0b file and the runtime size. This means that it is possible for textual M0 to contain a constant integer or numeric value which requires greater precision than the target implementation is capable of supporting. When this happens, the assembler and interpreter should fail with an appropriate error message.

The header has the following structure:

"\376M0B\r\n\032\n" : 8-byte magic number (copied/modified from pbc)
0x00       : 1 byte version number (currently 0)
0x?        : 1 byte number of bytes in I registers (4 or 8)
0x?        : 1 byte number of bytes in P and S registers (same as sizeof(void*)) (4 or 8)
xxxxx      : padding to 8-byte boundary

The directory segment will be a list of offsets to the starts of chunks within the file. It will have the following structure:

int32 : M0_DIR_SEG
int32 : number of chunks in this segment
int32 : number of bytes in this segment (including this header)
[
  int32 : offset of a constants segment
  int32 : size of chunk name in bytes
  [
    char : chunk name
  ]
]

The constants segment will contain any data needed to execute the bytecode. Data in constants table must be explictly loaded into registers as needed.

int32 : M0_CONST_SEG
int32 : number of constants in this segment
int32 : number of bytes in this segment (including this header)
[
  int32 : number of bytes in constant data
  ...    : constant data
]

The metadata segment contains any data about the bytecode that might come in handy. The actual data will live in the constants table; this segment just provides a way to map values to names and bytecode offsets.

int32 : M0_META_SEG
int32 : number of entries in this segment
int32 : number of bytes in this segment (including this header)
[
  int32 : bytecode offset relative to the start of the current bc seg where this data becomes effective
  int32 : offset into constants table for the name of this piece of metadata
  int32 : offset into constants table for the value of this piece of metadata
]

The bytecode segment contains a series of executable ops. A pointer (or its equivalent for a non-C language) to the current call frame will be passed as the first argument to every op, but this pointer will not be stored in bytecode.

int32 : M0_BC_SEG
int32 : number of ops in this segment
int32 : number of bytes in this segment (including this header)
[
  char : opcode
  char : arg1
  char : arg2
  char : arg3
]

Segment Numbers

M0_DIR_SEG       0x01
M0_CONST_SEG     0x02
M0_META_SEG      0x03
M0_BC_SEG        0x04

Opcode Numbers

In the interest of compatibility between implementations, the following is a canonical list of the ops an M0 implementation must implement and the numbers they must correspond to. Other opcode numbers are implementation-specific and are not require to have any defined semantics.

0x00 noop
0x01 goto
0x02 goto_if
0x03 goto_chunk
0x04 add_i
0x05 add_n
0x06 sub_i
0x07 sub_n
0x08 mult_i
0x09 mult_n
0x0A div_i
0x0B div_n
0x0C mod_i
0x0D mod_n
0x0E convert_n_i
0x0F convert_i_n
0x10 ashr
0x11 lshr
0x12 shl
0x13 and
0x14 or
0x15 xor
0x16 gc_alloc
0x17 sys_alloc
0x18 sys_free
0x19 copy_mem 
0x1A set
0x1B set_imm
0x1C deref
0x1D set_ref
0x1E set_byte
0x1F get_byte
0x20 csym
0x21 ccall_arg
0x22 ccall_ret
0x23 ccall
0x24 print_s
0x25 print_i
0x25 print_n
0x26 exit

Chunk Name Constants and Bytecode Loading

Note: Currently there is no facility for loading M0 libraries apart from loading and running a single .m0b file. This will change, at which point it will be necessary to have an efficient way to refer to a specific chunk without performing a runtime lookup of that chunk's name.

Placing &chunk_name in the constants segment provides an efficient way to transfer control flow to a specific chunk while avoiding a runtime translation of the chunk's name to its index. Unfortunately, this can't be a strictly compile-time construct because it is not guaranteed that m0b files will always be loaded in the same order. For example, if a hypothetical libfoo.m0b is loaded first and then loads a hypothetical libbar.m0b, libfoo's first chunk will be at index 0. Any M0 code in libbar which assumes that libbar's chunks start at 0 will break.

To accomodate this, chunk names in the constants segment must not be translated into integer indicies until a .m0b file is loaded into a running interpreter. In serialized bytecode, chunk name constants are stored as strings with the special encoding of 0. When an m0b library is loaded, the interpreter must store the index of the named chunk in the constants segment slot for that constant.

Runtime Layout of Primitives

The runtime layout of INSP registers is as follows.

I registers

I registers are 4 or 8 bytes wide, depending on platform and interpreter support. By default, instructions which work with integer values will assume that I registers are unsigned.

N registers

N registers are 8 bytes wide. By default, instructions which work with N registers will assume IEEE 754 64-bit semantics.

S and P registers

S and P registers are meant to contain pointers to objects. They have the same width as a pointer of the underlying platform, if applicable. If the underlying platform doesn't have pointers (e.g. JavaScript), S and P registers should be able to store the platform's equivalent of a pointer, providing appropriate semantics.

Primitive String Layout

In addition to object strings, M0 supports primitive strings. These are the low-level strings used when interacting directly with external code. The layout of a primitive string in M0 is as follows:

int32  - number of bytes in the string body int32  - encoding char.. - string
body

Primitive String Encoding

The encoding field of a primitive string can have the following values: * primitive string layout:

0 - special 1 - utf-8

In this context, "special" means that the value is used as chunk name constant. When a .m0b file is loaded, these constants are translated into chunk indicies. A primitive string with encoding 0 should never appear at runtime, only in bytecode. This string body is assumed to be utf-8 encoded.

The default encoding is utf-8. Other values have not yet been defined.

Execution

Execution of an binary M0 file beings with the first op in the bytecode segment of the first chunk. Each op consists of a 1-byte op number and 3 byte-sized arguments. The op number is used as an offset into a table of function pointers (or something equivalent). The current call frame is passed as the first argument to all op functions. The remaining three bytes of the op are passed to the op function directly. It is up to the op function how to interpret these values.

M0 Runloop

The runloop is the part of the interpreter that executes ops directly. The runloop does need to accomplish the following:

fetch the instruction at the current PC
execute the instruction
increment the pc, if needed

Execution of an M0 bytecode file starts with the first op of the first bytecode segment in a file and ends when the interpreter exits.

M0 Chunk Calling Conventions

In M0, chunks are roughly analogous to functions. This section describes the recommended implemenation and motivation for M0 calling conventions between chunks.

Basic Building Blocks

Control flow in M0 is based around goto and goto_if for intra-chunk control flow and goto_chunk for moving between chunks. While this is all that's strictly necessary for Turing completeness, a bit more complexity is needed to effectively implement constructs such as continuations, exceptions (and exception handlers), coroutines and so forth. The recommended structure for non-trivial control flow is continuation-passing style, with M0's call frame structure serving as the continuation. Members elements of a call frame may be manipulated using the set, set_imm, set_ref and deref ops.

This section will specify the following: * how to reliably transfer control flow between chunks * how to reliably map between sub names and chunks * how values from one sub are accessed by another (arguments/returns) * whether the caller or callee is responsible for call frame setup/teardow * how to properly initialize a call frame

Exceptions and Exception Handling

{{{ NOTE: This section is in flux. It's all open for discussion and should by no means be considered final. }}}

The exception system is based around call frames, just as regular control flow is. Exception handlers should be static constructs so that they only have a runtime cost when exceptions are thrown. Exception handlers are attached to a call frame's EH register.

Exception handlers must be chainable. When a handler cannot handle an exception, it must invoke its parent handler (preferably tailcalling). If it has no parent handler, it must look through previous call frames until it finds one. There must be a default top-level exception handler that handles all exceptions. This handler should print some debugging information to stderr and exit with a non-zero exit code. It is the M0 code's responsibility to create and install the top-level default exception handler.

An exception is a chunk that knows how to do a few things. It needs to look through the call chain for the first exception handler and invoke it, passing any relevant data to the handler. An exception may be resumed by invoking its parent call frame

Throwing an exception means creating a context, looking up a handler and invoking the handler with the exception as an argument. If an exception handler refuses to handle an exception, the handler is responsible for finding the next handler. The exception's responsibilities end when the first handler is invoked.

Objects

In Lorito, we'll still have INSP registers like PIR, but the S registers will be PMCs that happen to be strings. Additionally, PMCs and objects are being unified. M0's idea of an object/PMC is a blob of memory which obeys a primitive vtable interface, which is really just looking up an integer index in memory.

Memory Model

Memory can be allocated with sys_alloc and gc_alloc and freed with sys_free, if needed. copy_mem is used to copy values into and out of memory.

{{{ Will some memory be protected in M0? }}}

Register Spilling

In the event that the register allocator runs out of registers and needs to spill, an overflow call frame can be created and stored in the SPILLCF register. The overflow call frame will contain a copy of all INSP registers and the SPILLCF register itself. To despill, the register allocator must copy all values fromt the overflow call frame between SPILLCF and the last INSP register (P60) into the current call frame.

This method ensures that all objects referenced by the call frame will remain reachable by the GC and that an arbitrary amount of register spilling is possible, if needed. Register spilling is a last resort, and it is not expected to be a common occurance.

Structs

Structs (as implemented in C) will be an abstraction on top of M0. M0 will not be concerned with them directly.

Register Types and Call Frame Structure

There are four register types, Integer, Numeric, PMC and String. The string type will merely be syntactic sugar for accessing a PMC. From an M0 perpsective, S and P registers will be treated as opaque pointers.

Each M0 register will be 8 bytes (or have equivalent precision). A compliant call frame support up to 256 registers. M0 code can specify a call frame size anywhere between 12 and 255 registers when the call frame is allocated. It is up to the code which generates the M0 ops to ensure that the allocated call frame is large enough to support all the ops that the code will need to use.

M0 registers will not have innate types; they'll be nothing more than a collection of bits which are assumed to have a certain type by M0's ops. The types that ops attribute to register values will be based on the position of the register in an M0 call frame's register set. The proposed structure uses the first 8 registers for information specific to the call frame and has 60 of each primitive type:

number  type
0       CF - pointer to the current call frame
1       PCF - parent call frame
2       PC - current instruction within the current bytecode segment
3       RETPC - return PC
4       EH  - current exception handler
5       CHUNK - the index of the currently-executing chunk
6       CONSTS - pointer to constants segment
7       MDS - pointer to metadata segment
8       BCS - pointer to bytecode segment
9       INTERP - global interpreter data (see C<Interpreter Data>)
10      SPC4RENT - This register for rent.  No pets or smokers, please.
11      SPILLCF - overflow call frame (see C<Register Spilling>)
12-72   I0 - I60
73-133  N0 - N60
134-194 S0 - S60
195-255 P0 - P60

This means that an op which is passed the wrong value could pretty easily cause chaos by clobbering the PC or changing the pointer to the current call frame.

The first 12 registers must only be modified by the register and gc/memory ops. Using other ops to modify these registers is invalid. Implementations are free to assume that it does not happen and may fail when it does.

Interpreter Data

While most of the interesting parts of runtime data will live in a call frame, it still make sense to store some data in a global interpreter data structure. This will include the following items, stored in the same way as call frame data.

number  name         description
0       OP_FUNCS     array of functions that implement ops
1       CHUNKS       array of all loaded chunks
2       CHUNK_INFO   array of chunk metadata (currently just names)
3       CALL_FRAMES  array of all initial call frames
4       CONFIG       static global config data
5       ARGC         number of elements in ARGV
6       ARGV         array of command-line arguments passed to the m0 program

CONFIG should have at least the following values:

number  name            description
0       CFG_M0V         m0b version implemented by the running interpreter
1       CFG_REGSZ       size of a register in bytes
2       CFG_CFSZ        size of a call frame in bytes
3       CFG_IREGSZ      number of significant bytes in an I register
4       CFG_NREGSZ      number of significant bytes in an N register
5       CFG_OPCODESZ    number of bytes in an opcode
6       CFG_PTRSZ       number of bytes in a pointer
7       CFG_ENDIANNESS  host endianness (1 - big, 0 - little)

MOP interaction

A MOP will be implemented at the M1 level, so M0 must merely make it this possible. The M0 does not and should not understand any structures at higher levels. This will loosely couple M0 and M1 and allow them to evolve independently.

FFI

FFI considerations

M0's FFI should be amenable to interpretation both by interpreters implemented in C and high-level languages such as Python or Perl, to direct compilation to machine code, to jitting and to compilation to static C. Where possible, M0's design should not impose any constraints that preclude efficient implementation of any of type of interpreter.

M0 must be implemented such that only programs which make direct use of ffi require it to be presence. An M0 program which does not make use of FFI should run without modification on an implementation that does not support ffi.

Implementation

M0's FFI will implement similar functionality to dlfunc and dlvar in the form of a minimal set of atoms which are sufficient to emulate that functionality.

M0 will know enough about FFI to build a static call frame and call a C function pointer according to C calling conventions. M0 will have four ops to support this: csym, ccall_arg, ccall_ret and ccall. These are documented below.

csym sym, "function_name"

csym looks up the function pointer for function_name in the C function namespace and puts the pointer into the register sym.
ccall_arg arg_type, arg_src

ccall_arg takes two arguments: an argument type and an argument source. arg_type is a constant indicating the type of the argument. The possible values of arg_type will be similar to those used by libffi (https://github.com/atgreen/libffi/blob/master/doc/libffi.info ). arg_src indicates which M0 register should be used to populate the value of the current argument.
ccall_ret arg_type, arg_dest

ccall_ret is similar to ccall_arg, except that it copies the return value of the previously-called function into the register indicated by arg_dest.
ccall fp

ccall uses an existing function pointer to call a C function. It assumes that arguments have been set up correctly. Hilarity and segfaults are likely to result from breaking this assumption.

Examples

The following example code calls a function called hello_func, which takes no arguments and has no return value, in the library libhello.so. The use of strings as direct arguments is a simplification for explanatory purposes. In actual M0 bytecode, the string would be an index into the constants table.

dlopen_fp = csym "Parrot_dlopen"
dlsym_fp  = csym "Parrot_dlsym"
ccall_arg FFI_POINTER, "libhello.so" #hand-waving
ccall     dlopen_fp
ccall_ret FFI_POINTER, libhello

ccall_arg FFI_POINTER, "hello_func" #more hand-waving
ccall     dlsym_fp
ccall_ret FFI_POINTER, hello_func

ccall hello_func

This example shows how to call a function "multiply_int32" which take two 32-bit int arguments and returns an value of the same type and lives in "libmath.so". It assumes that dlsym_fp and dlopen_fp have been initialized as in the first example.

#dlsym_fp is a pointer to Parrot_dlsym
#dlopen_fp is a pointeer to Parrot_dlopen

ccall_arg FFI_POINTER, "libmath.so"
ccall     dlopen_fp
ccall_ret FFI_POINTER, libmath

ccall_arg FFI_POINTER, libmath
ccall_arg FFI_POINTER, "multiply_int32"
ccall     dlsym_fp
ccall_ret FFI_POINTER, multiply_fp

set       arg0, 12
set       arg1, 10

ccall_arg FFI_UINT32, arg0
ccall_arg FFI_UINT32, arg1
ccall     multiply_fp
ccall_ret FFI_UINT32, ret

say ret  # will print "120"

This example shows how to call an internal function (i.e. one that does not live in an external shared library) with the same signature as above.

set       arg0, 12
set       arg1, 10
csym      func, "Parrot_multiply_ints"

ccall_arg FFI_UINT32, arg0
ccall_arg FFI_UINT32, arg1
ccall     func
ccall_ret FFI_UINT32, ret

say ret  # will print "120"

Constants

These are the constants which an M0-compliant FFI implementation should deal with. They are based on the types used by libffi.

FFI_VOID
FFI_UINT
FFI_SINT
FFI_UINT16
FFI_SINT16
FFI_UINT32
FFI_SINT32
FFI_UINT64
FFI_SINT64
FFI_FLOAT
FFI_DOUBLE
FFI_UCHAR
FFI_SCHAR
FFI_USHORT
FFI_SSHORT
FFI_UINT
FFI_SINT
FFI_ULONG
FFI_SLONG
FFI_LONGDOUBLE
FFI_POINTER

Concurrency

This section will describe primitives that will be needed to allow data sharing and synchronization between call frames.

The Actor model of concurrency currently looks promising http://en.wikipedia.org/wiki/Actor_model . One implementation of the Actor model on the JVM is Akka : http://akka.io/

Bytecode Checksum

NOTE: IN FLUX

To ensure that bytecode is not mangled in some way and to prevent running mangled bytecode, the M0 bytecode format stores a SHA256 checksum internally. Since a file cannot store its own checksum, the M0 bytecode format stores a SHA256 (32 byte) checksum directly after the magic number, which represents the checksum of all bytes in m0b file after the checksum.

The bytecode checksum is used by the M0 interpreter, just after reading the magic byte. It verifies that the SHA256 sum in the header matches the calculated SHA256 of the rest of the file, and if it doesn't, throws an exception.

TODO: How do we handle lazy bytecode loading?

Lorito

Description

This section covers the parts of Lorito that are not directly relevant to someone implementing an M0-compatible VM. These topics are important to a complete implementation of Lorito but do not need to be directly addressed by M0 VM implementations.

Shouldn't this be a different spec document? Perhaps PDD33 for M1 ?

Op Composition

Op composition is considered magic-like and will not be part of M0.

Implementation

The newest prototype of this spec is being worked on in the 'm0-prototype' branch:

https://github.com/parrot/parrot/tree/m0-prototype

References

[TBC] - http://www.ics.uci.edu/~franz/Site/pubs-pdf/ICS-TR-10-01.pdf

Files

pdd32_m0.pod

Latest commit

History

pdd32_m0.pod

File metadata and controls

PDD 32: M0 Design Spec

Abstract

Status

M0 Milestones

Definitions

Design Goals

Scope of This Document

Call Frames

Registers

List of Ops

Control Flow (4 ops)

Math/numeric ops (12 ops)

Bitwise ops (6 ops)

Memory/GC ops (4 ops)

Register ops (6)

FFI ops (4 ops)

Temporary ops (3 ops)

Textual Representation

Chunk Format

Chunk Identifier

Constants Segment

Metadata Segment

Bytecode Segment

Register Names

Labels

Full Example

Binary Representation

Segment Numbers

Opcode Numbers

Chunk Name Constants and Bytecode Loading

Runtime Layout of Primitives

I registers

N registers

S and P registers

Primitive String Layout

Primitive String Encoding

Execution

M0 Runloop

M0 Chunk Calling Conventions

Basic Building Blocks

Exceptions and Exception Handling

Objects

Memory Model

Register Spilling

Structs

Register Types and Call Frame Structure

Interpreter Data

MOP interaction

FFI

FFI considerations

Implementation

Examples

Constants

Concurrency

Bytecode Checksum

Lorito

Description

Op Composition

Implementation

References