Syntax and semantics of Malfunction
Note: This "specification" is preliminary at best, and subject to change when bugs are found, features are added, or I'm bored.
Malfunction does only very basic checking of syntax when loading a program. For any program which passes the syntax checker, Malfunction should do one of the following:
- Produce exactly the same result in the intepreter (
malfunction eval) and compiler (
- Report it as having undefined behaviour in the interpreter (
- Fail to terminate in both the interpreter and the compiler
The behaviour of the interpreter and compiler should agree with the text below, although for a more precise specification you should look at the definition of the interpreter, which is straightforward (no bytecode or other efficiency tricks, just syntax-directed execution).
It's not necessarily a bug in Malfunction if a program compiles with
malfunction compile and then crashes at runtime. This specification
intentionally leaves much behaviour undefined, so you should count
yourself lucky if a program like
(field 0 0) is well-behaved enough
to merely crash.
However, the interpreter should detect all undefined behaviour (Malfunction's semantics are kept quite simple to ensure that this is easy). If a program runs to completion in the intepreter producing a result, it is a bug in Malfunction if the compiled version of the same program either crashes or produces a different value, and should be reported.
This doesn't apply to programs which link with OCaml code: if you use
(global ...) to call an OCaml function, it's up to you to ensure
that you pass inputs that won't make it crash, and the interpreter
will (currently) be of no help checking this.
This file contains various examples of expected output of sample programs. These are run as part of the testsuite, as are the other programs in the test directory.
Comments begin with
; and continue to the end of the line.
A Malfunction input file consists of a single
s-expression. S-expressions (or "sexps") consist of
(, a sequence of
whitespace-separated elements, and
), where elements are:
- Atoms: sequences of ASCII letters, digits, or symbols (the exact set of allowed symbols isn't quite nailed down yet)
\$followed by an atom
- Strings: double-quoted, with embedded
- s-expressions: nested arbitrarily
The top-level sexp must begin with the atom
by a list of bindings (described under
let, below), followed by an
sexp beginning with the atom
For inputs compiled with
malfunction compile (that is, whole
export sexp must be empty. For instance, a program
which evaluates the single sexp
E ignoring its result has this form:
(module (_ E) (export))
Inputs compiled with
malfunction cmx (that is, modules that are
later linked with other modules) may specify any number of values in
export sexp, which must be in the same order as they are listed
in the corresponding
Numbers and arithmetic
There are several numeric types, and associated constant syntax:
- int, e.g.
- int32, e.g.
- int64, e.g.
- bigint, e.g.
- float, e.g.
int64 use 32-bit and 64-bit two's complement arithmetic,
with wrap on overflow.
int uses either 31- or 63- bit two's
complement arithmetic (depending on system word size, and also
wrapping on overflow), and is generally fastest.
float uses IEEE754 double-precision (64-bit)
Various numeric operations are defined:
- Arithmetic operations:
- Bitwise operations:
- Numeric comparisons:
All of these operations take one or two
ints and return an
(+ 10 (* 20 3)) => 70
(<< 1 5) => 32
These operations come in
varieties, which may be used by suffixing
to the operation name. The bitwise operations are not available for
floats. The suffixed operations all take and return values of the
specified numeric type, except:
- the shift operators (
a>>) whose second argument (shift count) is always
- the comparison operators, whose result is always
int(in fact, always
(*.ibig 948324329804.ibig 8493208402394.ibig) => 8054316166085991599150776.ibig
(>>.i32 32.i32 5) => 1.i32
(+.f64 0.1 0.2) => 0.30000000000000004
As well as ordinary finite values, floats support infinite and NaN
values, available as the literals
nan. Note that
nan is unordered with respect to other floats, so
comparisons with it always return false.
Integer types are not automatically coerced, and behaviour is
undefined if the wrong types are passed to an operation. Explicit
conversions are done with
convert.FROM.TO. When converting between
integer types, conversions from smaller to larger types sign-extend
and conversions from larger to smaller truncate. Conversions from an
integer type to float round to the nearest float, which may not be
exactly equal to the specified integer.
Conversions from float to integer type truncate the fractional part. Currently, their behaviour is undefined if the input is outside the representable range of the target type, although this might change in the future.
(convert.i32.i64 42.i32) => 42.i64
(convert.f64.int 3.9) => 3
Functions are defined using the following syntax, and close over all bindings in scope:
(lambda ($arg1 $arg2 $arg3) BODY)
Functions are applied using the following syntax:
(apply FUNC ARG ARG ARG)
Multiple-argument functions are implicitly curried, and may be partially applied (resulting in a closure) or applied to too many arguments (resulting in an application of the returned value). For instance,
(apply (apply (lambda ($a $b) (+ $a $b)) 20) 22) => 42
(apply (lambda ($a) (lambda ($b) (+ $a $b))) 20 22) => 42
However, performance will be higher if functions are applied to exactly the right number of arguments.
Evaluation is eager: functions and arguments are evaluated before their bodies. The function is evaluated before the arguments, and arguments are evaluated left to right.
let introduces a sequence of bindings:
(let BINDING BINDING BINDING ... BODY)
Each binding is of one of the forms:
($var EXP): binds
$varto the result of evaluating
$varscopes over subsequent bindings and the body.
(_ EXP): evaluates
EXPand ignores the result
(rec ($VAR1 EXP1) ($VAR2 EXP2) ...): binds each
$VARmutually recursively. Each
EXPmust be of the form
(lazy ...). Bindings scope over themselves, each other, subsequent bindings, and the body.
For example, here is a definition of the "even" and "odd" predicates
ints, and an application of them to check whether 42 is even (see
(let (rec ($even (lambda ($n) (if (<= $n 1) (== $n 0) (apply $odd (- $n 1))))) ($odd (lambda ($n) (if (<= $n 1) (== $n 1) (apply $even (- $n 1)))))) ($res (apply $even 42)) $res) => 1
(seq EXP EXP...) is equivalent to
(let (_ EXP) (_ EXP)... EXP), and can be used to write sequences of imperative
actions whose results are ignored.
Blocks and fields
Blocks (tuples) are constructed using
(block (tag N) EXP EXP EXP ...), where
N is a constant integer called the "tag" (an integer in
the range 0-199 which may be used with
switch below) and each
is a field of the resulting block.
Fields are projected from a block using
(field N EXP), where
an integer between 0 and one less than the length of the block.
(let ($a (block (tag 0) 1 2 (block (tag 1) 0) 3)) ($b (block (tag 0) (field 2 $a) (field 0 $a))) $b) => (block (tag 0) (block (tag 1) 0) 1)
Fields of blocks can only be accessed at constant, compile-time-known offsets. For random access into a structure, see "Vectors" below.
The general conditional expression is
switch, of the form:
(switch EXP (SEL SEL... EXP) (SEL SEL... EXP) ...)
EXP is evaluated, and matched against each case in the
order they appear. A case (of the form
(SEL SEL... EXP)) matches if
any of its selectors (
SEL) match. The result of the switch is the
result of evaluating the
EXP of the first matching case. Selectors may be:
42: integers, matching themselves. Only
bigintmay be matched. Use comparison operators, which always return ints, to switch on other integer types.
(10 20): integer ranges, matching (in this example) any
10 <= n <= 20
_: default case, matching any integer
(tag 10): matches blocks with tag 10
(tag _): matches any block.
_ matches only integers. To have a case handle all
integers and all blocks, write
(_ (tag _) EXP).
Selectors must be literal constants. To compare against runtime values, use comparison operators.
(let ($sw (lambda ($n) (switch $n (5 (10 20) 100) ((15 50) 200) (_ 300) ((tag 10) 400)))) ($a (apply $sw 5)) ($b (apply $sw 10)) ($c (apply $sw 50)) ($d (apply $sw 60)) ($e (apply $sw (block (tag 10)))) (block (tag 0) $a $b $c $d $e)) => (block (tag 0) 100 100 200 300 400)
Behaviour is undefined if no cases match. If you are compiling a conditional statement to Malfunction and cannot prove that all cases are handled, you must add an explicit default case.
(if A B C) is equivalent to:
(switch A (0 C) (_ (tag _) B))
C is executed if
A evaluates to zero, and
Complex mixtures of conditions in
switch expressions perform well -
the OCaml compiler generates good code for pattern-matching.
Vectors are mutable, fixed-length sequences of slots. The slots are numbered from 0 to one less than the length of the vector, and support random access.
(makevec LEN VAL): creates a vector of length
LEN(must evaluate to a nonnegative integer) initially containing the result of evaluating
VALin all slots.
(load VEC IDX): evaluates
VEC(which must evaluate to a vector) and
IDX(which must evaluate to an integer), and returns the value of the
IDX'th slot (which must be in bounds).
(store VEC IDX VAL): evaluates
IDX(an in-bounds index) and
VAL, and stores
(length VEC): evaluates
VEC(a vector) and returns its length.
As well as standard vectors, byte vectors are also available. They
have the same semantics as vectors, except that the operations are
makevec.byte, etc.) and they may only be used to
store integers in the range 0-255.
Behaviour is undefined if byte vector operations are used on standard vectors, or vice versa.
String literals (
"hello") return byte vectors.
A lazily-evaluated expression is one that's not computed until it's
needed. The expression
(lazy E) constructs a lazy value that wraps
E but does not evaluate it immediately. When the lazy
value is examined using
E will be computed.
E is evaluated at most once for a given
(lazy E). If the resulting
forced twice, then the value computed by
E is cached.
E produces side effects, these occur at the time that
E is first forced. For instance:
(let ($box (makevec 1 42)) ($thunk (lazy (let ($val (load $box 0)) (_ (store $box 0 (+ $val 1))) $val))) (block (tag 0) (load $box 0) (force $thunk) (load $box 0) (force $thunk))) => (block (tag 0) 42 42 43 42)
Accessing OCaml values
OCaml values can be accessed from Malfunction programs by specifying a
module path using
global. For instance, the OCaml function
Stdlib.print_string is referred to as
(global $Stdlib $print_string).
As well as calling OCaml functions, certain OCaml values can be
directly manipulated. The integer types
correspond to the OCaml types
tuples are blocks with tag 0, and OCaml algebraic data types are
represented as a combination of ints and blocks as described in
19.3.4 of the OCaml manual, "Concrete data types".