This document describes NAM, aka CgOp, the Niecza Abstract Machine. NAM is the language used to connect the portable parts of Niecza to the unportable. It is the last Niecza IR which is shared between all cross-compiler backends. It is used primarily to refer to three things: a computing model suitable for running Niecza output, a representation of abstract operations in the model, and a file format for storing modules in the model.
A program for execution by NAM consists of one or more units, one of which is singled out as the main unit by a compiler option. Each unit consists of some global data, a list of dependency units, and a set of meta-objects.
The dependency lists organize the units into a directed acyclic graph. A unit can only see objects from another unit if a dependency is declared. This facilitates recompilation checking.
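As a rough illustration of the dependency graph, the sketch below orders a set of units so that every unit comes after the units it depends on. The unit names and the dict layout are hypothetical; NAM itself does not prescribe this representation.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical units: each name maps to the units it declares as dependencies.
deps = {
    "MAIN": ["CORE", "Test"],
    "Test": ["CORE"],
    "CORE": [],
}

def load_order(deps):
    """Return unit names with every dependency listed before its dependents.

    Raises CycleError if the declared dependencies do not form a DAG.
    """
    return list(TopologicalSorter(deps).static_order())

order = load_order(deps)
```

A cyclic dependency list makes `load_order` raise `CycleError`, which mirrors the requirement that units form a directed acyclic graph.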
Meta-objects have per-unit unique identifiers, and can be identified globally by a token known as an xref, which contains the originating unit's identity, the per-unit identifier, and a name to facilitate debugging. Meta-objects come in two basic types: sub bodies and packageoids. Packageoids are further subdivided into packages, modules, classes, grammars, roles, and parametric roles.
Sub bodies contain a variety of metadata, including the runtime class, flags for various special types of sub, the signature, the set of lexical variable definitions, and a tree of operations. This tree is structured much like a Lisp program and obeys similar evaluation rules.
NAM code must be statically typable but this may not always be enforced. Different data objects have logical types, which can map many-to-one onto lower-level types, especially in type-poor environments such as Parrot and JavaScript.
Packageoids contain information about the construction of the object, such as methods, attributes, superclasses, the C3 MRO, and the name.
Each metaobject is logically divided into a persistent portion and a temporary portion. The persistent portion is required by the compiler to parse and generate code for depending modules; the temporary portion is not. This allows less data to be loaded.
A native integer, suitable for loop variables and similar purposes.
A native float, suitable for the Perl 6 Num class.
A native bool, as returned by comparison operators.
A reference to a native immutable string.
A reference to a native mutable string.
A Perl 6 variable, with identity, potentially mutable and tied.
A reference to a Perl 6 object; not a variable and cannot be assigned to.
A hash table mapping strings to Perl 6 variables.
An array of Perl 6 variables fixed in length at creation.
An array of Perl 6 variables supporting O(1) deque operations.
The nexus of HOW, WHAT, WHO, and REPR. Details subject to flux.
A reference to a native text input object.
A reference to a native text output object.
A node in the LTM Automaton Descriptor metaobject tree.
A reference to a compiled character class.
A reference to a low-level cursor. Currently a subtype of obj.
A reference to a call frame. Currently a subtype of obj.
These do not appear in nam files as they are expanded in src/CgOp.pm6.
Evaluates arguments in sequence and returns CORE::Nil. Useful for embedding a sequence of void nam ops in a Perl 6 statement list.
These should not be used by the frontend. They are used for construction of some internal code fragments, usually in response to _hack settings.
Sets the $line_number for the $operation. In the C# backend, the line number is only recorded at high-level call sites within the span.
Within $body, any lexical access to a $lexname is remapped into a letvar access to the corresponding $letname. This is used for inlined functions. If $transparent is false, the corresponding scope should be seen by OUTER:: and the like (not yet implemented).
Generates the code for $body bracketed by the labels $n1 and $n2. If an exception transfers control to $n2, the exception payload will be returned. Triples of $code, $label, and $goto define exception handling within the block. $sync forces exception handling to be synchronous with respect to the boundaries, allowing an ON_DIE handler to function properly.
For each triple, while execution is within the block, an exception of class $class [1] will cause control to be transferred to $goto. $name is used for targeted control exceptions, possibly paired with the identity of the target frame. A name of the empty string is treated as no name; such handlers can only be reached anonymously.
[1] The following class values are currently defined:
1 ON_NEXT &next
2 ON_LAST &last
3 ON_REDO &redo
4 ON_RETURN &return
5 ON_DIE General exception, payload usually Str
6 ON_SUCCEED &succeed, when{} matched
7 ON_PROCEED &proceed
8 ON_GOTO &goto
9 ON_NEXTDISPATCH &nextsame, &nextwith; payload is a Capture
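The handler triples and class values above can be modelled roughly as follows. This is only a sketch: the tuple order (class, name, goto label) and the `find_handler` helper are hypothetical, and it assumes that an anonymous exception matches any handler of its class, which the text does not spell out.

```python
from enum import IntEnum

class ExceptionClass(IntEnum):
    """Exception class values as listed in the table above."""
    ON_NEXT = 1
    ON_LAST = 2
    ON_REDO = 3
    ON_RETURN = 4
    ON_DIE = 5
    ON_SUCCEED = 6
    ON_PROCEED = 7
    ON_GOTO = 8
    ON_NEXTDISPATCH = 9

def find_handler(handlers, exc_class, target_name=""):
    """Return the goto label of the first matching handler, or None.

    handlers is a list of (class, name, goto) tuples. A handler whose name
    is the empty string is only matched by an anonymous exception.
    """
    for h_class, h_name, h_goto in handlers:
        if h_class != exc_class:
            continue
        if target_name == "" or h_name == target_name:
            return h_goto
    return None

handlers = [
    (ExceptionClass.ON_NEXT, "", "next_label"),
    (ExceptionClass.ON_LAST, "OUTER", "last_label"),
]
```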
Adds $left and $right together.
The basic polymorphic assignment operator. Depending on the list status of the left variable, either generates a basic store, or a LISTSTORE method call.
Extracts the value stored in a variable.
Creates a new scalar variable of type Any containing Any.
Used for binding parameters in beta-eliminated subs. It used to be the heart of the binder, but this is no longer so.
Wraps $obj in a read-only variable with the list nature.
Creates a new variable of type Any containing $obj.
Wraps $obj in a read-only variable.
Were used for autovivification; currently unused.
Creates an autovivifiable variable which will call $sub when written to or bound rw, and otherwise functions as ordinary rw.
Returns the object backing $var (will be a fake proxy if not tied).
True if $var has the list nature.
Creates a new variable of type $type which delegates access to $fetch and $store. If $bind is defined, it will be called on the first rw binding, per the autovivification protocol.
Implements X or Xop; if $usefun is true, the first item in $fvarlist is taken as a function reference.
Implements Z or Zop; if $usefun is true, the first item in $fvarlist is taken as a function reference.
An fvarlist is a fixed-size object like a C# or Java Variable[] array. A vvarlist is an O(1) deque like a C++ std::deque<var>. Most operations on these types are fairly straightforward. vvarlist also does duty as the most fundamental type of iterator; several operations are designed to do essential iterator tasks. vvarlist_ operations are not cognizant of iterator structure and should not be used on iterators without careful consideration of the effect.
Extracts a single element. BUG: Currently evaluates its arguments in the wrong order.
Returns the number of elements in the argument as an int.
Construct a new fvarlist of compile-time length, like a C# array literal.
Creates a new iterator which iterates over the same values, but all copied into fresh read-write variables. Mostly eager.
Creates a new iterator which mostly-eagerly presents the same values with sublists flattened.
Attempts to extract a value from an iterator without flattening sublists. Returns bool; if true, the value may be returned by vvarlist_shift.
See iter_hasarg.
Adds all elements (non-destructively) from a source list onto the end of a destination list in order.
Creates a new non-aliasing list with all elements aliases of the elements of an old list.
Returns the number of items in a list.
Creates a new list with exactly one initial element. Useful for bootstrapping iterations.
Removes and returns the last element of a nonempty list.
Adds a new element to the end of a list.
Removes and returns the first element of a nonempty list.
Sorts a list (not in place). $cb_obj must be an invocable object which functions as a two-argument sort routine, returning Num.
Adds a new element to the beginning of a list.
Adds the contents of a fixed list to the beginning of a variable list in order.
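For readers more familiar with Python, the list operations described above behave much like operations on `collections.deque`, with a tuple standing in for the fixed-length list. The mapping below is an analogy only, not how any backend implements these types, and the function name is hypothetical.

```python
from collections import deque

def demo_deque_ops():
    """Walk through deque equivalents of the variable-list operations."""
    vv = deque(["a"])      # new list with exactly one initial element
    vv.append("b")         # add a new element to the end
    vv.appendleft("z")     # add a new element to the beginning
    fixed = ("x", "y")     # stand-in for a fixed-length list
    # Add the fixed list to the beginning, keeping its order.
    # extendleft reverses its input, so reverse it first.
    vv.extendleft(reversed(fixed))
    snapshot = list(vv)
    first = vv.popleft()   # remove and return the first element
    last = vv.pop()        # remove and return the last element
    return snapshot, first, last, len(vv)

snapshot, first, last, remaining = demo_deque_ops()
```

All of these run in O(1) per element on a deque, which is the property the text asks of the variable-length list type.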
Go to label named $name (must be a literal string) if $if is true.
The most general interface to the exception generator.
Passes control to a coroutine identified by the continuation frame $cont. When said coroutine uses take, the new continuation frame is stored in the $*nextframe lexical and the value passed is returned.
Throws a basic non-resumable exception. $exception may be a raw string and it will DWIM.
Calls method $name on the first argument. The interpretation of the rest of the arguments is controlled by $sig; for each argument, there is a token in $sig consisting of a length and a sequence of characters. A zero-length sequence represents an ordinary positional, a string like ":name" represents a named parameter, and "flatcap" represents a | parameter. Note that in the last case, the argument should have type obj.
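The token scheme for $sig can be sketched as below. The text does not say how the length is encoded, so this sketch assumes, purely for illustration, that each token is a single character whose code point is the length, followed by that many characters. The function names are hypothetical.

```python
def encode_sig(tokens):
    """Pack tokens as <length char><characters> (assumed encoding)."""
    return "".join(chr(len(t)) + t for t in tokens)

def decode_sig(sig):
    """Split a packed signature string back into its tokens."""
    tokens, i = [], 0
    while i < len(sig):
        n = ord(sig[i])                  # assumed: length stored as a code point
        tokens.append(sig[i + 1:i + 1 + n])
        i += 1 + n
    return tokens

def classify(token):
    """Map a token to the argument kind described in the text."""
    if token == "":
        return ("positional", None)
    if token == "flatcap":
        return ("flatcap", None)         # the argument should have type obj
    if token.startswith(":"):
        return ("named", token[1:])
    raise ValueError(f"unrecognized sig token: {token!r}")

kinds = [classify(t) for t in decode_sig(encode_sig(["", ":name", "flatcap"]))]
```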
See cgoto.
Evaluates arguments in sequence and returns the result of the last one.
Low-level return from a body; does NOT use the control exception mechanism. Probably best regarded as a backend-internal operator.
Creates a new coroutine to invoke $sub
without arguments, and returns the initial continuation frame.
Identical to methodcall, except that the method name is considered forced to INVOKE.
Passes $thing to the coroutine which caused the current coroutine to be invoked. When this coroutine is restarted, take returns the value unchanged.
The basic branching operator.
The basic repetition operator. If $once is passed, the loop is treated as repeat..while. If $until is passed, the condition is inverted.
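The interaction of $until and $once can be modelled in a few lines. This is a sketch of the described semantics using Python callables for the condition and body; the function name and argument order are hypothetical and do not reflect the actual operand layout.

```python
def whileloop(until, once, cond, body):
    """Run body repeatedly according to the two loop modifiers.

    once:  run the body one time before the first condition check
           (repeat..while behaviour).
    until: invert the condition, so the loop runs while cond() is false.
    """
    if once:
        body()
    while bool(cond()) != until:
        body()

def count_iterations(until, once, cond_on_count):
    """Helper for the examples: count how often the body runs."""
    state = {"n": 0}
    def body():
        state["n"] += 1
    whileloop(until, once, lambda: cond_on_count(state["n"]), body)
    return state["n"]

plain = count_iterations(False, False, lambda n: n < 3)       # while n < 3
inverted = count_iterations(True, False, lambda n: n >= 3)    # until n >= 3
repeat = count_iterations(False, True, lambda n: False)       # repeat..while False
```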
The const op causes its argument to be evaluated once and stored in the constant table; it should only be used in cases where the identity will have the same effect, and backends are not required to honor it.
Produces a null value of type $type, suitable for initializing a variable or other similar usage.
Evaluates and ignores $argument.
Creates a boxing object of a given type. $proto may be a CORE:: name.
Returns the raw stab for a class, by CORE:: name or xref node fields.
Implements Mu.new.
Fetches a named slot from an object. $type must be used consistently.
Returns the ClassHOW or similar instance for an object.
Instantiates a parameterized role (first argument) with arguments (rest).
Checks REPR-level definedness, not .defined.
Fastest way to create an object. Does not set up variables for attributes.
Implements the but operator for type objects.
Mutates a boxed value in place. Use carefully!
Binds a slot, possibly to a native value.
Obtains a reference to the Sub implementing a private method.
NAM unit files are encoded in JSON, using only numbers, strings, and sequences; mappings and boolean values are excluded. It is helpful to consider a number of "node types" for describing the format of the sequences. Most node types reflect a sequence with a fixed number of children with fixed interpretations. No names are used; all access is by index.
A file contains two top-level JSON values. The first one is of the "File root" type; the second is an array of the temporary parts of meta-objects. Meta-objects with no temporary part will be null, or possibly omitted if at the end. Currently only subs use the temporary segment.
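A minimal sketch of reading such a file, assuming the two values are simply written one after the other with optional whitespace between them (the exact separator is not specified here). The function names are hypothetical.

```python
import json

def read_unit(text):
    """Return (file_root, temporary_parts) from the text of a unit file."""
    decoder = json.JSONDecoder()
    text = text.lstrip()                      # raw_decode rejects leading whitespace
    root, end = decoder.raw_decode(text)      # first value: the file root
    temps, _ = decoder.raw_decode(text[end:].lstrip())  # second value: temporary parts
    return root, temps

def temp_part(temps, index):
    """Temporary part for a meta-object, or None if null or omitted at the end."""
    return temps[index] if index < len(temps) else None

sample = '["placeholder root"]\n[null, ["placeholder temp"]]'
root, temps = read_unit(sample)
```

`json.loads` cannot be used directly because it rejects trailing data after the first value; `JSONDecoder.raw_decode` returns the end offset so the second value can be parsed from there.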
Name Type Description
mainline_ref Xref Xref to mainline subroutine
name string Unit's unique name
log ... Mostly unused vestige of last stash system
setting string Name of setting unit or null
bottom_ref Xref Xref to sub containing {YOU_ARE_HERE}, or null
filename string Filename of source code or null
modtime number Seconds since 1970-01-01
xref Xref[] Resolves refs from other units
tdeps TDep[] Holds dependency data for recompilation
stash_root StNode Trie holding classes and global variables
xref entries cannot be reordered as they are referenced by index. Filename and modification time are used for checking recompilation necessity; tdeps ("transitive dependency") are used to check for recursive recompilation with minimal file reading. Filename is also used to provide $?FILE. Each xref entry is either null, a Subroutine, or a Packageoid.
Name Type Description
unit string Names unit of origin
index number Indexes into unit's xref array
name string Descriptive name for debugging
Cross-reference (xref) nodes allow object references to cross unit boundaries without complicating serialization.
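Since an Xref is a three-element sequence accessed by index, resolving one amounts to two lookups. In this sketch the `units` dict (unit name to that unit's xref array) and the sample data are hypothetical stand-ins for however a backend keeps loaded units.

```python
from typing import NamedTuple

class Xref(NamedTuple):
    unit: str    # names unit of origin
    index: int   # indexes into that unit's xref array
    name: str    # descriptive name, for debugging only

def resolve(xref, units):
    """Look up the meta-object an Xref points at; the name field is not consulted."""
    return units[xref.unit][xref.index]

# Hypothetical loaded units, each reduced to its xref array.
units = {"CORE": [None, ["sub", "infix:<+>"], ["class", "Any"]]}
ref = Xref(*["CORE", 2, "Any"])   # built from the JSON sequence form
target = resolve(ref, units)
```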
Name Type Description
unitname string Names unit that is depended on
filename string Absolute filename of source code
modtime number Modification time in POSIX seconds
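One way the TDep entries could drive a staleness check is sketched below. It assumes a unit is stale when any recorded source file is missing or has a newer modification time than the one stored; the actual comparison used by the compiler is not described here, and `needs_recompile` is a hypothetical name.

```python
import os

def needs_recompile(tdeps, mtime_of=os.path.getmtime):
    """Return True if any dependency's source looks newer than recorded.

    tdeps is a list of [unitname, filename, modtime] sequences.
    mtime_of can be swapped out for testing.
    """
    for unitname, filename, modtime in tdeps:
        try:
            current = mtime_of(filename)
        except OSError:
            return True        # source file missing or unreadable
        if current > modtime:
            return True        # source changed after the unit was built
    return False

# Fake clock standing in for the filesystem in this example.
fake_mtimes = {"/src/CORE.setting": 100.0, "/src/Test.pm6": 250.0}
def fake_mtime(path):
    if path not in fake_mtimes:
        raise FileNotFoundError(path)
    return fake_mtimes[path]

fresh = needs_recompile([["CORE", "/src/CORE.setting", 100]], fake_mtime)
stale = needs_recompile([["Test", "/src/Test.pm6", 200]], fake_mtime)
```

Only the stored list is consulted, so no other unit files need to be opened, which matches the stated goal of minimal file reading.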
This is a sequence of tuples; each such tuple has one of the forms [ name, "var", Xref, ChildNode ] or [ name, "graft", path ].
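A traversal of this structure might look like the following. The sketch assumes that ChildNode is itself a sequence of the same tuples (or null), which is implied by the trie description but not stated outright; the sample data is invented.

```python
def walk_stash(node, prefix=()):
    """Yield (path, kind, payload) for every entry in a stash trie."""
    for entry in node or []:
        name, kind = entry[0], entry[1]
        path = prefix + (name,)
        if kind == "var":
            xref, child = entry[2], entry[3]
            yield path, "var", xref
            yield from walk_stash(child, path)   # assumed: child is another node
        elif kind == "graft":
            yield path, "graft", entry[2]
        else:
            raise ValueError(f"unknown stash entry kind: {kind!r}")

# Invented example: GLOBAL contains STD, and Alias is grafted elsewhere.
stash_root = [
    ["GLOBAL", "var", ["MAIN", 1, "GLOBAL"], [
        ["STD", "var", ["MAIN", 2, "STD"], None],
    ]],
    ["Alias", "graft", ["GLOBAL", "STD"]],
]
entries = list(walk_stash(stash_root))
```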
Name Type Description
name string Method name without ! decorator
kind string [1]
var string Variable for implementing sub in param role
body Xref Reference to implementing sub
[1] Allowable kinds are "normal", "private", and "sub".
Name Type Description
name string Attribute name without sigil or twigil
public number Nonzero if attribute should be easy to inspect
ivar string Sub name of BUILD phaser for param roles
ibody Xref Reference to BUILD phaser
Name Type Description
typecode string Always "sub"
name string Sub's name for backtraces
outer_xref Xref OUTER:: sub, may be in a setting unit
flags number [1]
children num[] Supports tree traversals
class string &?BLOCK.WHAT; "Sub" or "Regex"
ltm LtmNode Only for regexes; stores declarative prefix
exports str[][] List of global names
signature Param[] May be null in exotic cases
lexicals Lex[] Come in multiple forms[6]
Temporary portion:
Name Type Description
xref Xref For documentation only
param_role_hack ... [2]
augment_hack ... [3]
hint_hack ... [4]
is_phaser number [5]
body_of Xref Only valid in immediate block of class {} et al
in_class Xref Innermost enclosing body_of
cur_pkg str[] OUR:: as a list of names
lexicals Lex[] Come in multiple forms[6]
nam ... See description of opcodes earlier
[1] The following flags are used:
1 RUN_ONCE Sub does not need pad cloning
2 SPAD_EXISTS Sub needs a static pad
4 GATHER_HACK Assume a "take EMPTY" at end
8 STRONG_USED Not dead code even if unreferenced
16 RETURNABLE Add a return exception handler
32 AUGMENTING Is an augment{} block
[2] Xref to role object if this is a role{} block with parameters
[3] Sequence; first item is a ref to the target packageoid, subsequent items are Method descriptors.
[4] Sequence of [Xref, string] identifying a specific "hint" lexical in a specific sub. This lexical is bound to the return value of the current sub's code; will always be seen with a PREINIT phaser.
[5] If non-null, registers the current sub for a phaser queue.
0 INIT Before global mainline
1 END Not implemented
2 PREINIT Before all mainlines
[6] Either the temporary copy will be null, or the primary copy will have no items, depending on whether this sub needs to have its lexicals inspected by the compiler.
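The sub flags in footnote [1] form a bit field, so a number read from the file can be unpacked as shown in this sketch. The class and function names are hypothetical; the values are the ones listed above.

```python
from enum import IntFlag

class SubFlags(IntFlag):
    RUN_ONCE = 1       # sub does not need pad cloning
    SPAD_EXISTS = 2    # sub needs a static pad
    GATHER_HACK = 4    # assume a "take EMPTY" at end
    STRONG_USED = 8    # not dead code even if unreferenced
    RETURNABLE = 16    # add a return exception handler
    AUGMENTING = 32    # is an augment{} block

def flag_names(value):
    """List the names of all flags set in value, in definition order."""
    return [flag.name for flag in SubFlags if value & flag]

names = flag_names(1 | 16)
```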
These come in several flavors, but all share the same first two fields, which are used to find the correct lexical and identify its format.
Name Type Description
name string "$?FOO" or similar
typecode string Always 'hint'
This type is used for lexically scoped constants. They cannot be rebound by the scopedlex or corelex operations, but are automatically bound by the handling of hint_hack subs.
Name Type Description
name string "OUTER" or similar
typecode string Always 'label'
This type marks labels. Labels are cloned like subs on entry, and refer to objects which encapsulate a name and a frame reference.
Name Type Description
name string "&infix:<+>" or similar
typecode string Always 'dispatch'
This type is used for dispatch subs. Dispatch subs are created on clone and encapsulate some number of multi candidates, specifically all lexically-visible unshadowed subs with names like the dispatch followed by ":(" and any extra stuff.
Name Type Description
name string "$foo"
typecode string Always 'simple'
flags number 4=NOINIT, 2=LIST, 1=HASH
These are used for run of the mill my-variables. NOINIT is required for variables that are initialized by signature binding.
Name Type Description
name string "$foo"
typecode string Always 'alias'
to string "anon_21934"
These are used for state variables, which need storage in an outer sub, but should only be accessible under the declared name in an inner one.
Name Type Description
name string "Regex"
typecode string Always 'stash'
path... string "GLOBAL"
path... string "STD"
path... string "Regex"
These are used to lexically name packageoids. All packageoids have a stash name; my-scoped packageoids get gensym names. The list of names is stored inline.
Name Type Description
name string "$ALL"
typecode string Always 'common'
path... string "GLOBAL"
path... string "STD"
path... string "$ALL"
These are used for our-scoped variables. As an optimization, direct references like $STD::ALL generate a gensym-named common lexical.
Name Type Description
name string "&say"
typecode string Always 'sub'
[Xref stored inline here]
These are used for subs, and must be in correspondence with the "zyg" list.
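Because every lexical shares the same first two fields, a decoder can dispatch on the typecode and then interpret the remaining fields. The sketch below follows the tables above; the dict keys are hypothetical, and it assumes the inline Xref of a 'sub' lexical occupies three fields (unit, index, name).

```python
def decode_lex(entry):
    """Turn one lexical sequence into a dict keyed by field name."""
    name, typecode, *rest = entry
    lex = {"name": name, "typecode": typecode}
    if typecode in ("hint", "label", "dispatch"):
        pass                                   # no further fields
    elif typecode == "simple":
        flags = rest[0]                        # 4=NOINIT, 2=LIST, 1=HASH
        lex.update(noinit=bool(flags & 4), list=bool(flags & 2), hash=bool(flags & 1))
    elif typecode == "alias":
        lex["to"] = rest[0]
    elif typecode in ("stash", "common"):
        lex["path"] = list(rest)               # names are stored inline
    elif typecode == "sub":
        lex["xref"] = tuple(rest[:3])          # assumed: unit, index, name
    else:
        raise ValueError(f"unknown lexical typecode: {typecode!r}")
    return lex

simple = decode_lex(["@foo", "simple", 6])
common = decode_lex(["$ALL", "common", "GLOBAL", "STD", "$ALL"])
```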
Name Type Description
name string For binding error messages
flags number [1]
slot string Name of lexical to accept value
names str[] All legal named-parameter names
default Xref Sub to call if HAS_DEFAULT; must be child of this
[1] Flag values are as follows.
1 SLURPY *@foo or *%foo (check HASH)
2 SLURPYCAP |$foo
4 RWTRANS \$foo
8 FULL_PARCEL \|$foo
16 OPTIONAL $foo?
32 POSITIONAL $foo, not :$foo
64 READONLY $foo, not $foo is rw
128 LIST @foo
256 HASH %foo
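A parameter node can be unpacked in the same index-based way, with its flags split into names. This sketch covers only the flag values listed above and reports any other bits separately rather than guessing their meaning; the names `ParamFlags` and `decode_param` are hypothetical.

```python
from enum import IntFlag

class ParamFlags(IntFlag):
    SLURPY = 1
    SLURPYCAP = 2
    RWTRANS = 4
    FULL_PARCEL = 8
    OPTIONAL = 16
    POSITIONAL = 32
    READONLY = 64
    LIST = 128
    HASH = 256

KNOWN_MASK = sum(flag.value for flag in ParamFlags)

def decode_param(node):
    """Unpack [name, flags, slot, names, default] into a dict."""
    name, flags, slot, names, default = node
    return {
        "name": name,
        "flags": [flag.name for flag in ParamFlags if flags & flag],
        "unknown_bits": flags & ~KNOWN_MASK,   # bits not in the table above
        "slot": slot,
        "names": names,
        "default": default,
    }

param = decode_param(["$foo", 32 | 64, "$foo", ["foo"], None])
```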
Name Type Description
typecode string A definition keyword or "parametricrole"
name string The object's debug name
exports str[][] List of global names to which object is bound
(The following are only found in class, grammar, role, parametricrole)
attributes attr[] Attributes local to the class
methods methd[] Methods local to the class
superclasses Xref[] Direct superclasses of the class
(The following is only found in class, grammar)
linear_mro Xref[] All superclasses in C3 order
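For reference, C3 order can be computed from the direct superclass lists with the standard merge procedure. The sketch below uses plain strings instead of Xrefs and includes the class itself at the front of the result; whether linear_mro stores the class itself is not stated here.

```python
def c3_merge(seqs):
    """Merge linearizations so each head precedes anything that follows it."""
    seqs = [list(s) for s in seqs if s]
    result = []
    while seqs:
        for s in seqs:
            head = s[0]
            # A head is acceptable if it appears in no other sequence's tail.
            if not any(head in t[1:] for t in seqs):
                break
        else:
            raise TypeError("inconsistent hierarchy: no C3 order exists")
        result.append(head)
        seqs = [s[1:] if s[0] == head else s for s in seqs]
        seqs = [s for s in seqs if s]
    return result

def c3_linearize(cls, supers):
    """Return cls followed by all its superclasses in C3 order.

    supers maps each class name to its list of direct superclasses.
    """
    parents = supers.get(cls, [])
    parent_mros = [c3_linearize(p, supers) for p in parents]
    return [cls] + c3_merge(parent_mros + [list(parents)])

# Diamond: D inherits from B and C, which both inherit from A.
supers = {"D": ["B", "C"], "B": ["A"], "C": ["A"], "A": []}
mro = c3_linearize("D", supers)
```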