nam.pod


Synopsis

This document describes NAM, aka CgOp, the Niecza Abstract Machine. NAM is the language used to connect the portable parts of Niecza to the unportable. It is the last Niecza IR which is shared between all cross-compiler backends. It is used primarily to refer to three things: a computing model suitable for running Niecza output, a representation of abstract operations in the model, and a file format for storing modules in the model.

General model

A program for execution by NAM consists of one or more units, one of which is singled out as the main unit by a compiler option. Each unit consists of some global data, a list of dependency units, and a set of meta-objects.

The dependency lists organize the units into a directed acyclic graph. A unit can only see objects from another unit if a dependency is declared. This facilitates recompilation checking.
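Because the dependency lists form a DAG, a loader can compute an order in which every unit comes after the units it depends on. The following is a minimal sketch of that idea using Python's standard `graphlib`; the unit names are hypothetical and this is not Niecza's actual loader.

```python
from graphlib import TopologicalSorter

def load_order(deps):
    """Return unit names ordered so each unit follows its dependencies.

    `deps` maps a unit name to the names of the units it depends on.
    graphlib raises CycleError if the lists do not form a DAG.
    """
    return list(TopologicalSorter(deps).static_order())

# Hypothetical unit names, for illustration only.
units = {
    "MAIN": ["Test", "CORE"],
    "Test": ["CORE"],
    "CORE": [],
}
order = load_order(units)
```

A unit that appears later in `order` may refer to objects of any earlier unit it declares as a dependency, never the reverse.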

Meta-objects have per-unit unique identifiers, and can be identified globally by a token known as an xref, which contains the originating unit's identity, the per-unit identifier, and a name to facilitate debugging. Meta-objects come in two basic types: sub bodies and packages. Packages in this broad sense (called packageoids below) are further subdivided into packages, modules, classes, grammars, roles, and parametric roles.

Sub bodies contain a variety of metadata, including the runtime class, flags for various special types of sub, the signature, the set of lexical variable definitions, and a tree of operations. This tree is structured much like a Lisp program and obeys similar evaluation rules.

NAM code must be statically typable, but this may not always be enforced. Different data objects have logical types, which can map many-to-one onto lower-level types, especially in type-poor environments such as Parrot and JavaScript.

Packageoids contain information about the construction of the object, such as methods, attributes, superclasses, the C3 MRO, and the name.

Each meta-object is logically divided into a persistent portion and a temporary portion. The persistent portion is required by the compiler to parse and generate code for depending modules; the temporary portion is not. This allows less data to be loaded.

Runtime data objects, by static type

int

A native integer, suitable for loop variables and similar purposes.

num

A native float, suitable for the Perl 6 Num class.

bool

A native bool, as returned by comparison operators.

str

A reference to a native immutable string.

strbuf

A reference to a native mutable string.

var

A Perl 6 variable, with identity, potentially mutable and tied.

obj

A reference to a Perl 6 object; not a variable and cannot be assigned to.

varhash

A hash table mapping strings to Perl 6 variables.

fvarlist

An array of Perl 6 variables fixed in length at creation.

vvarlist

An array of Perl 6 variables supporting O(1) deque operations.

stab

The nexus of HOW, WHAT, WHO, and REPR. Details subject to flux.

treader

A reference to a native text input object.

twriter

A reference to a native text output object.

lad

A node in the LTM Automaton Descriptor metaobject tree.

cc

A reference to a compiled character class.

cursor

A reference to a low-level cursor. Currently a subtype of obj.

frame

A reference to a call frame. Currently a subtype of obj.

Operations

Macros

These do not appear in nam files as they are expanded in src/CgOp.pm6.

cc_expr

construct_lad

getattr

let

newblankhash

newblanklist

noop

rnull(*@arguments)

Evaluates arguments in sequence and returns CORE::Nil. Useful for embedding a sequence of void nam ops in a Perl 6 statement list.

string_var

varattr

Internal to the backend

These should not be used by the frontend. They are used for construction of some internal code fragments, usually in response to _hack settings.

_addmethod

_hintset

_invalidate

_makesub

_newlabel

_parametricrole

Annotations

ann($unused,$line_number,$operation)

Sets the $line_number for the $operation. In the C# backend, the line number is only recorded at high-level call sites within the span.

letscope($transparent,{$lexname,$letname}...,$body)

Within $body, any lexical access to a $lexname is remapped into a letvar access to the corresponding $letname. This is used for inlined functions. If $transparent is false, the corresponding scope should be seen by OUTER:: and the like (not yet implemented).

xspan($n1,$n2,$sync,$body,{$class,$name,$goto},..)

Generates the code for $body bracketed by the labels $n1 and $n2. If an exception transfers control to $n2, the exception payload will be returned. Triples of $class, $name, and $goto define exception handling within the block. $sync forces exception handling to be synchronous with respect to the boundaries, allowing an ON_DIE handler to function properly.

For each triple, while execution is within the block, an exception of class $class [1] will cause control to be transferred to $goto. $name is used for targeted control exceptions, possibly paired with the identity of the target frame. A name of the empty string is treated as no name; such handlers can only be reached anonymously.

[1] The following class values are currently defined:

1   ON_NEXT             &next
2   ON_LAST             &last
3   ON_REDO             &redo
4   ON_RETURN           &return
5   ON_DIE              General exception, payload usually Str
6   ON_SUCCEED          &succeed, when{} matched
7   ON_PROCEED          &proceed
8   ON_GOTO             &goto
9   ON_NEXTDISPATCH     &nextsame, &nextwith; payload is a Capture
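
The class values above and the handler triples of xspan can be modelled as follows. This is an illustrative sketch only: `HandlerClass` and `find_handler` are hypothetical names, and the matching rule (an anonymous exception matches any handler of the right class, a named exception matches only a handler with the same non-empty name, first match wins) is an assumption drawn from the description above, not the backend's actual lookup code.

```python
from enum import IntEnum

class HandlerClass(IntEnum):
    ON_NEXT = 1
    ON_LAST = 2
    ON_REDO = 3
    ON_RETURN = 4
    ON_DIE = 5
    ON_SUCCEED = 6
    ON_PROCEED = 7
    ON_GOTO = 8
    ON_NEXTDISPATCH = 9

def find_handler(handlers, klass, name=""):
    """Return the $goto label of the first matching triple, or None.

    Assumed rule: an anonymous exception (name "") matches any handler
    of the right class; a named exception matches only a handler that
    carries the same non-empty name.
    """
    for h_class, h_name, h_goto in handlers:
        if h_class != klass:
            continue
        if name == "" or name == h_name:
            return h_goto
    return None

# Hypothetical ($class, $name, $goto) triples.
handlers = [
    (HandlerClass.ON_NEXT, "", "inner_next"),
    (HandlerClass.ON_NEXT, "OUTER", "outer_next"),
    (HandlerClass.ON_LAST, "OUTER", "outer_last"),
]
```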

Operations on numbers

arith

divide

minus

mod

mul

negate

numand

numcompl

numeq

numge

numgt

numle

numlshift

numlt

numne

numor

numrshift

numxor

plus(var $left,var $right) is pure

Adds $left and $right together.

postinc

double

int

num_to_string

Operations on variables

assign($lhs,$rhs)

The basic polymorphic assignment operator. Depending on the list status of the left variable, either generates a basic store, or a LISTSTORE method call.

fetch($var)

Extracts the value stored in a variable.

newblankrwscalar()

Creates a new scalar variable of type Any containing Any.

newboundvar($ro,$list,$var)

Used for binding parameters in beta-eliminated subs. It used to be the heart of the binder, but this is no longer so.

newrwlistvar($obj)

Wraps $obj in a read-only variable with the list nature.

newrwscalar($obj)

Creates a new variable of type Any containing $obj.

newscalar($obj)

Wraps $obj in a read-only variable.

newvarrayvar

newvhashvar

newvnewarrayvar

newvnewhashvar

Were used for autovivification; currently unused.

newvsubvar($type,$sub,$obj)

Creates an autovivifiable variable which will call $sub when written to or bound rw, and otherwise functions as ordinary rw.

var_get_var($var)

Returns the object backing $var (will be a fake proxy if not tied).

var_islist($var)

True if $var has the list nature.

var_new_tied($type,$bind,$fetch,$store)

Creates a new variable of type $type which delegates access to $fetch and $store. If $bind is defined, it will be called on the first rw binding, per the autovivification protocol.

Operations on strings

chars

chr

ord

streq

strge

strgt

strle

strlt

strne

substr3

char

str

strbuf_append

strbuf_new

strbuf_seal

str_chr

strcmp

str_length

str_substring

str_tolower

str_tonum

str_toupper

Regex engine operations

make

cursor_ast

cursor_backing

cursor_butpos

cursor_dows

cursor_fresh

cursor_from

cursor_item

cursor_O

cursor_pos

cursor_reduced

cursor_start

cursor_synthetic

cursor_unmatch

cursor_unpackcaps

fcclist_new

get_lexer

ladconstruct

mrl_count

mrl_index

newcc

popcut

pushcut

run_protoregex

rxbacktrack

rxbprim

rxcall

rxclosequant

rxcommitgroup

rxend

rxfinalend

rxframe

rxgetpos

rxgetquant

rxincquant

rxinit

rxopenquant

rxpushb

rxpushcapture

rxsetcapsfrom

rxsetclass

rxsetpos

rxsetquant

rxstripcaps

Operations on Perl 6 lists

array_constructor

cross($usefun,$fvarlist)

Implements X or Xop; if $usefun is true, the first item in $fvarlist is taken as a function reference.

grep

map

zip($usefun,$fvarlist)

Implements Z or Zop; if $usefun is true, the first item in $fvarlist is taken as a function reference.

get_first($list)

iter_to_list($iter)

promote_to_list($var)

Operations on low-level lists

A fvarlist is a fixed sized object like a C# or Java Variable[] array. A vvarlist is an O(1) deque like a C++ std::deque<var>. Most operations on these types are fairly straightforward. vvarlist also does duty as the most fundamental type of iterator; several operations are designed to do essential iterator tasks. vvarlist_ operations are not cognizant of iterator structure and should not be used on iterators without careful consideration of the effect.

fvarlist_item($index,$fvl)

Extracts a single element. BUG: Currently evaluates its arguments in the wrong order.

fvarlist_length($fvl)

Return the number of elements in the argument as an int.

fvarlist_new(*@elements)

Construct a new fvarlist of compile-time length, like a C# array literal.

iter_copy_elems($iter)

Creates a new iterator which iterates over the same values, but all copied into fresh read-write variables. Mostly eager.

iter_flatten($iter)

Creates a new iterator which mostly-eagerly presents the same values with sublists flattened.

iter_hasarg($iter)

Attempts to extract a value from an iterator without flattening sublists. Returns bool; if true, the value may be returned by vvarlist_shift.

iter_hasflat($iter)

See iter_hasarg.

vvarlist_append($onto,$new)

Adds all elements (non-destructively) from a source list onto the end of a destination list in order.

vvarlist_clone($old)

Creates a new non-aliasing list with all elements aliases of the elements of an old list.

vvarlist_count($list)

Returns the number of items in a list.

vvarlist_from_fvarlist($fv)

vvarlist_item($index,$list)

vvarlist_new_empty()

vvarlist_new_singleton($var)

Creates a new list with exactly one initial element. Useful for bootstrapping iterations.

vvarlist_pop($list)

Removes and returns the last element of a nonempty list.

vvarlist_push($list,$item)

Adds a new element to the end of a list.

vvarlist_shift($list)

Removes and returns the first element of a nonempty list.

vvarlist_sort($cb_obj,$list)

Sorts a list (not in place). $cb_obj must be an invocable object which functions as a two-argument sort routine, returning Num.

vvarlist_to_fvarlist($list)

vvarlist_unshift($list,$item)

Adds a new element to the beginning of a list.

vvarlist_unshiftn($list,$fvl)

Adds the contents of a fixed list to the beginning of a variable list in order.
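
As a rough analogy for the deque behaviour described in this section, the sketch below mimics a few vvarlist operations with Python's `collections.deque`, using a tuple in place of an fvarlist. The mapping is illustrative only; real elements are Perl 6 variables, and the iterator-aware operations are not modelled.

```python
from collections import deque

def vvarlist_unshiftn(lst, fvl):
    """Prepend the items of a fixed list, keeping their order."""
    # deque.extendleft reverses its input, so reverse first.
    lst.extendleft(reversed(fvl))

lst = deque(["c"])                  # like vvarlist_new_singleton
lst.append("d")                     # like vvarlist_push
lst.appendleft("b")                 # like vvarlist_unshift
vvarlist_unshiftn(lst, ("x", "y"))  # fixed list modelled as a tuple
snapshot = list(lst)

first = lst.popleft()               # like vvarlist_shift
last = lst.pop()                    # like vvarlist_pop
```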

Operations involving the operating system

gettimeofday

exit

getargv

note

path_any_exists

path_change_ext

path_combine

path_dir_exists

path_file_exists

path_modified

path_realpath

print

say

slurp

spew

treader_getc

treader_getline

treader_open

treader_slurp

treader_stdin

Operations invoking the context system

at_key

at_pos

asbool

defined

delete_key

exists_key

hash

item

list

num

asstr

obj_asbool

obj_asdef

obj_asnum

obj_asstr

obj_at_key

obj_at_pos

obj_delete_key

obj_exists_key

obj_getbool

obj_getdef

obj_getnum

obj_getstr

Operations on hashes

hash_keys

hash_kv

hash_pairs

hash_values

varhash_clear

varhash_contains_key

varhash_delete_key

varhash_dup

varhash_getindex

varhash_new

varhash_setindex

Operations on activation frames

callframe

frame_caller

frame_file

frame_hint

frame_line

Sequence control operations

callnext($capture)

cgoto($name,$if)

Go to label named $name (must be a literal string) if $if is true.

control($type,$target,$unused,$name,$payload)

The most general interface to the exception generator.

cotake($cont)

Passes control to a coroutine identified by the continuation frame $cont. When said coroutine uses take, the new continuation frame is stored in the $*nextframe lexical and the value passed is returned.

die($exception)

Throws a basic non-resumable exception. $exception may be a raw string and it will DWIM.

do_require($module)

goto($label)

label($name)

label_table

methodcall($name,$sig,*@args)

Calls method $name on the first argument. The interpretation of the rest of the arguments is controlled by $sig; for each argument, there is a token in $sig consisting of a length and a sequence of characters. A zero-length sequence represents an ordinary positional, a string like ":name" represents a named parameter, and "flatcap" represents a | parameter. Note that in the last case, the argument should have type obj.

ncgoto($to,$if)

See cgoto.

prog(*@arguments)

Evaluates arguments in sequence and returns the result of the last one.

return($value)

Low-level return from a body; does NOT use the control exception mechanism. Probably best regarded as a backend-internal operator.

startgather($sub)

Creates a new coroutine to invoke $sub without arguments, and returns the initial continuation frame.

subcall($sig,*@args)

Identical to methodcall, except that the method name is considered forced to INVOKE.

take($thing)

Passes $thing to the coroutine which caused the current coroutine to be invoked. When this coroutine is restarted, take returns the value unchanged.

ternary($cond,$true,$false)

The basic branching operator.

whileloop($until,$once,$cond,$body)

The basic repetition operator. If $once is passed, the loop is treated as repeat..while. If $until is passed, the condition is inverted.
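
The interaction of $until and $once can be sketched as below. This models $cond and $body as Python callables, which is an assumption made for illustration; the operator's return value is not modelled. The `trace` helper is hypothetical and exists only to make the behaviour observable.

```python
def whileloop(until, once, cond, body):
    """Run body while cond holds; `until` inverts the test,
    `once` runs the body before the first test (repeat..while)."""
    def keep_going():
        return bool(cond()) != bool(until)

    if once:
        body()
    while keep_going():
        body()

def trace(until, once, start, limit):
    """Return the counter values seen by the body."""
    seen = []
    state = {"n": start}

    def cond():
        return state["n"] < limit

    def body():
        seen.append(state["n"])
        state["n"] += 1

    whileloop(until, once, cond, body)
    return seen
```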

Operations supporting modules

from_json

from_jsync

to_json

to_jsync

Operations on booleans

not

bool

compare

Data control operations

cast

const($value)

The const op causes its argument to be evaluated once and stored in the constant table; it should only be used in cases where the identity will have the same effect, and backends are not required to honor it.

context_get

corelex

letn

letvar

null($type)

Produces a null value of type $type, suitable for initializing a variable or other similar usage.

scopedlex

set_status

sink($argument)

Evaluates and ignores $argument.

status_get

Object model operations

box($proto,$value)

Creates a boxing object of a given type. $proto may be a CORE:: name.

class_ref("mo",$corename) | class_ref("mo",$unit,$xix,$name)

Returns the raw stab for a class, by CORE:: name or xref node fields.

default_new(obj $proto, varhash $args)

Implements Mu.new.

getslot($name,$type,$object)

Fetches a named slot from an object. $type must be used consistently.

how($obj)

Returns the ClassHOW or similar instance for an object.

instrole(fvarlist $parcel)

Instantiates a parameterized role (first argument) with arguments (rest).

llhow_name(stab $stb)

obj_does(obj $obj, stab $role)

obj_isa(obj $obj, stab $super)

obj_is_defined(obj $obj)

Checks REPR-level definedness, not .defined.

obj_llhow(obj $obj)

obj_newblank(stab $stab)

Fastest way to create an object. Does not set up variables for attributes.

obj_typename(obj $obj)

obj_what(obj $obj)

role_apply(stab $base, stab $role)

Implements the but operator for type objects.

setbox(obj $obj, ::T $value)

Mutates a boxed value in place. Use carefully!

setslot($name, obj $obj, ::T $value)

Binds a slot, possibly to a native value.

stab_privatemethod(stab $stab, str $name)

Obtains a reference to the Sub implementing a private method.

stab_what(stab $stab)

unbox($typename, obj $obj)

CLR interface operations

rawcall

rawscall

File format

NAM unit files are encoded in JSON, using only numbers, strings, and sequences; mappings and boolean values are excluded. It is helpful to consider a number of "node types" for describing the format of the sequences. Most node types reflect a sequence with a fixed number of children with fixed interpretations. No names are used; all access is by index.

A file contains two JSON values. The first is of the "File root" type; the second is an array of the temporary parts of meta-objects. A meta-object with no temporary part is represented by null, or possibly omitted if at the end. Currently only subs use the temporary segment.
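
Reading two top-level JSON values from one file takes slightly more than a single `json.loads` call. The sketch below uses `json.JSONDecoder.raw_decode` from the Python standard library; it assumes the two values are separated only by optional whitespace, which the text above does not specify, and the sample content is made up.

```python
import json

def read_two_values(text):
    """Decode two consecutive JSON values from one string."""
    decoder = json.JSONDecoder()
    pos = 0
    values = []
    for _ in range(2):
        # raw_decode does not skip leading whitespace itself.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        value, pos = decoder.raw_decode(text, pos)
        values.append(value)
    return values[0], values[1]

# Made-up content: a stand-in root followed by the temporary parts.
sample = '["unit", 1]\n[null, ["tmp"]]'
root, temporaries = read_two_values(sample)
```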

File root

Name            Type    Description
mainline_ref    Xref    Xref to mainline subroutine
name            string  Unit's unique name
log             ...     Mostly unused vestige of last stash system
setting         string  Name of setting unit or null
bottom_ref      Xref    Xref to sub containing {YOU_ARE_HERE}, or null
filename        string  Filename of source code or null
modtime         number  Seconds since 1970-01-01
xref            Xref[]  Resolves refs from other units
tdeps           TDep[]  Holds dependency data for recompilation
stash_root      StNode  Trie holding classes and global variables

xref entries cannot be reordered as they are referenced by index. Filename and modification time are used for checking recompilation necessity; tdeps ("transitive dependency") are used to check for recursive recompilation with minimal file reading. Filename is also used to provide $?FILE. Each xref entry is either null, a Subroutine, or a Packageoid.
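
Since all access is by index, a reader may find it convenient to attach names to the root sequence after parsing. The sketch below assumes that the row order of the table above is the index order in the file, which the table suggests but does not state outright; the sample values are invented.

```python
from collections import namedtuple

FileRoot = namedtuple("FileRoot", [
    "mainline_ref", "name", "log", "setting", "bottom_ref",
    "filename", "modtime", "xref", "tdeps", "stash_root",
])

def parse_file_root(seq):
    """Give names to the positional fields of a File root sequence."""
    if len(seq) != len(FileRoot._fields):
        raise ValueError("unexpected File root length: %d" % len(seq))
    return FileRoot(*seq)

# Invented sample data, for illustration only.
sample_root = [
    ["MAIN", 0, "mainline"],                  # mainline_ref
    "MAIN",                                   # name
    [],                                       # log
    "CORE",                                   # setting
    None,                                     # bottom_ref
    "main.pl",                                # filename
    1300000000,                               # modtime
    [None],                                   # xref
    [["CORE", "CORE.setting", 1299999999]],   # tdeps
    [],                                       # stash_root
]
file_root = parse_file_root(sample_root)
```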

Cross-reference

Name            Type    Description
unit            string  Names unit of origin
index           number  Indexes into unit's xref array
name            string  Descriptive name for debugging

Cross-reference (xref) nodes allow object references to cross unit boundaries without complicating serialization.
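
Resolving a cross-reference then amounts to two lookups: first the unit, then the index into that unit's xref array. A minimal sketch, assuming the loaded units are kept in a dictionary keyed by unit name (a hypothetical arrangement) and using invented entries:

```python
from collections import namedtuple

Xref = namedtuple("Xref", ["unit", "index", "name"])

def resolve(units, ref):
    """Look up the meta-object a cross-reference points at.

    `units` maps a unit name to that unit's xref array.
    The debug name is not needed for the lookup.
    """
    ref = Xref(*ref)
    return units[ref.unit][ref.index]

# Invented xref arrays; real entries are Subroutine or Packageoid nodes.
loaded = {
    "CORE": [None, ["sub", "say"]],
    "MAIN": [["sub", "mainline"]],
}
target = resolve(loaded, ["CORE", 1, "say"])
```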

Transitive dependency node

Name            Type    Description
unitname        string  Names unit that is depended on
filename        string  Absolute filename of source code
modtime         number  Modification time in POSIX seconds
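
A recompilation check over these nodes might look like the sketch below. The exact rule Niecza applies is not given here, so the comparison (stale when the source is missing or newer than the recorded time) is an assumption. The `getmtime` parameter and the fake clock are hypothetical additions that let the check run without touching the file system.

```python
import os

def needs_recompile(tdeps, getmtime=os.path.getmtime):
    """Return True if any dependency's source looks newer than recorded."""
    for unitname, filename, modtime in tdeps:
        try:
            current = getmtime(filename)
        except OSError:
            return True   # source missing: treated as stale (assumption)
        if current > modtime:
            return True   # source changed since the unit was compiled
    return False

# Fake modification times standing in for the file system.
fake_times = {"/src/CORE.setting": 100, "/src/Test.pm6": 250}

def fake_getmtime(path):
    try:
        return fake_times[path]
    except KeyError:
        raise FileNotFoundError(path)
```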

Stash node

This is a sequence of tuples; each such tuple has one of the forms [ name, "var", Xref, ChildNode ] or [ name, "graft", path ].
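
The trie can be flattened with a short recursive walk. The sketch below assumes that ChildNode is itself a stash node (or null), that the Xref slot may be null for a name that only holds children, and that a graft path can be passed through as an opaque value; none of these details are stated above, and the sample data is invented.

```python
def walk_stash(node, prefix=()):
    """Yield (path, kind, payload) for every entry in a stash trie."""
    for entry in node or []:
        name, kind = entry[0], entry[1]
        path = prefix + (name,)
        if kind == "var":
            xref, child = entry[2], entry[3]
            yield path, "var", xref
            yield from walk_stash(child, path)
        elif kind == "graft":
            yield path, "graft", entry[2]
        else:
            raise ValueError("unknown stash entry kind: %r" % (kind,))

# Invented sample trie.
sample_stash = [
    ["GLOBAL", "var", None, [
        ["Foo", "var", ["MAIN", 3, "Foo"], None],
        ["$x", "var", ["MAIN", 4, "$x"], []],
    ]],
    ["PROCESS", "graft", ["GLOBAL", "PROCESS"]],
]
entries = list(walk_stash(sample_stash))
```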

Method node

Name            Type    Description
name            string  Method name without ! decorator
kind            string  [1]
var             string  Variable for implementing sub in param role
body            Xref    Reference to implementing sub

[1] Allowable kinds are "normal", "private", and "sub".

Attribute node

Name            Type    Description
name            string  Attribute name without sigil or twigil
public          number  Nonzero if attribute should be easy to inspect
ivar            string  Sub name of BUILD phaser for param roles
ibody           Xref    Reference to BUILD phaser

Subroutine

Name            Type    Description
typecode        string  Always "sub"
name            string  Sub's name for backtraces
outer_xref      Xref    OUTER:: sub, may be in a setting unit
flags           number  [1]
children        num[]   Supports tree traversals
class           string  &?BLOCK.WHAT; "Sub" or "Regex"
ltm             LtmNode Only for regexes; stores declarative prefix
exports         str[][] List of global names
signature       Param[] May be null in exotic cases
lexicals        Lex[]   Come in multiple forms[6]

Temporary portion:

Name            Type    Description
xref            Xref    For documentation only
param_role_hack ...     [2]
augment_hack    ...     [3]
hint_hack       ...     [4]
is_phaser       number  [5]
body_of         Xref    Only valid in immediate block of class {} et al
in_class        Xref    Innermost enclosing body_of
cur_pkg         str[]   OUR:: as a list of names
lexicals        Lex[]   Come in multiple forms[6]
nam             ...     See description of opcodes earlier

[1] The following flags are used:

1   RUN_ONCE        Sub does not need pad cloning
2   SPAD_EXISTS     Sub needs a static pad
4   GATHER_HACK     Assume a "take EMPTY" at end
8   STRONG_USED     Not dead code even if unreferenced
16  RETURNABLE      Add a return exception handler
32  AUGMENTING      Is an augment{} block

[2] Xref to role object if this is a role{} block with parameters

[3] Sequence; first item is a ref to the target packageoid, subsequent items are Method descriptors.

[4] Sequence of [Xref, string] identifying a specific "hint" lexical in a specific sub. This lexical is bound to the return value of the current sub's code; will always be seen with a PREINIT phaser.

[5] If non-null, registers the current sub for a phaser queue.

0   INIT    Before global mainline
1   END     Not implemented
2   PREINIT Before all mainlines

[6] Either the temporary copy will be null, or the primary copy will have no items, depending on whether this sub needs to have its lexicals inspected by the compiler.
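
The flags field [1] is a bit mask and is_phaser [5] is a small enumeration, so both decode naturally with Python's `enum` module. The class and function names below are hypothetical; the numeric values are taken from the tables above.

```python
from enum import IntEnum, IntFlag

class SubFlags(IntFlag):
    RUN_ONCE = 1
    SPAD_EXISTS = 2
    GATHER_HACK = 4
    STRONG_USED = 8
    RETURNABLE = 16
    AUGMENTING = 32

class Phaser(IntEnum):
    INIT = 0
    END = 1
    PREINIT = 2

def flag_names(value):
    """List the names of the flags set in a numeric flags field."""
    return [flag.name for flag in SubFlags if value & flag]

def phaser_name(is_phaser):
    """Name the phaser queue, or None when the field is null."""
    return None if is_phaser is None else Phaser(is_phaser).name
```

For example, a flags value of 17 decodes to RUN_ONCE and RETURNABLE.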

Lexical definition

These come in several flavors, but all share the same first two fields, which are used to find the correct lexical and identify its format.

Name            Type    Description
name            string  "$?FOO" or similar
typecode        string  Always 'hint'

This type is used for lexically scoped constants. They cannot be rebound by the scopedlex or corelex operations, but are automatically bound by the handling of hint_hack subs.

Name            Type    Description
name            string  "OUTER" or similar
typecode        string  Always 'label'

This type marks labels. Labels are cloned like subs on entry, and refer to objects which encapsulate a name and a frame reference.

Name            Type    Description
name            string  "&infix:<+>" or similar
typecode        string  Always 'dispatch'

This type is used for dispatch subs. Dispatch subs are created on clone and encapsulate some number of multi candidates, specifically all lexically-visible unshadowed subs with names like the dispatch followed by ":(" and any extra stuff.

Name            Type    Description
name            string  "$foo"
typecode        string  Always 'simple'
flags           number  4=NOINIT, 2=LIST, 1=HASH

These are used for run-of-the-mill my-variables. NOINIT is required for variables that are initialized by signature binding.

Name            Type    Description
name            string  "$foo"
typecode        string  Always 'alias'
to              string  "anon_21934"

These are used for state variables, which need storage in an outer sub, but should only be accessible under the declared name in an inner one.

Name            Type    Description
name            string  "Regex"
typecode        string  Always 'stash'
path...         string  "GLOBAL"
path...         string  "STD"
path...         string  "Regex"

These are used to lexically name packageoids. All packageoids have a stash name; my-scoped packageoids get gensym names. The list of names is stored inline.

Name            Type    Description
name            string  "$ALL"
typecode        string  Always 'common'
path...         string  "GLOBAL"
path...         string  "STD"
path...         string  "$ALL"

These are used for our-scoped variables. As an optimization, direct references like $STD::ALL generate a gensym-named common lexical.

Name            Type    Description
name            string  "&say"
typecode        string  Always 'sub'
[Xref stored inline here]

These are used for subs, and must be in correspondence with the "zyg" list.
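
Because the first two fields identify the format, a reader can dispatch on the typecode to interpret the rest of each entry. The sketch below follows the field layouts listed in this section; treating the inline Xref of a 'sub' entry as the three remaining fields (unit, index, name) is an assumption, and `parse_lexical` is a hypothetical helper.

```python
def parse_lexical(entry):
    """Interpret one lexical definition according to its typecode."""
    name, typecode, *rest = entry
    lex = {"name": name, "typecode": typecode}
    if typecode in ("hint", "label", "dispatch"):
        pass  # no further fields
    elif typecode == "simple":
        flags = rest[0]
        lex["noinit"] = bool(flags & 4)
        lex["list"] = bool(flags & 2)
        lex["hash"] = bool(flags & 1)
    elif typecode == "alias":
        lex["to"] = rest[0]
    elif typecode in ("stash", "common"):
        lex["path"] = list(rest)  # names stored inline
    elif typecode == "sub":
        lex["xref"] = list(rest)  # assumed: unit, index, name inline
    else:
        raise ValueError("unknown lexical typecode: %r" % (typecode,))
    return lex
```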

Signature parameter

Name            Type    Description
name            string  For binding error messages
flags           number  [1]
slot            string  Name of lexical to accept value
names           str[]   All legal named-parameter names
default         Xref    Sub to call if HAS_DEFAULT; must be child of this

[1] Flag values are as follows.

1   SLURPY      *@foo or *%foo (check HASH)
2   SLURPYCAP   |$foo
4   RWTRANS     \$foo
8   FULL_PARCEL \|$foo
16  OPTIONAL    $foo?
32  POSITIONAL  $foo, not :$foo
64  READONLY    $foo, not $foo is rw
128 LIST        @foo
256 HASH        %foo
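
The note "check HASH" on SLURPY means the two slurpy forms are told apart by a second bit. A small sketch of that test, with hypothetical names and the numeric values from the table above:

```python
from enum import IntFlag

class ParamFlags(IntFlag):
    SLURPY = 1
    SLURPYCAP = 2
    RWTRANS = 4
    FULL_PARCEL = 8
    OPTIONAL = 16
    POSITIONAL = 32
    READONLY = 64
    LIST = 128
    HASH = 256

def slurpy_kind(flags):
    """Classify a slurpy parameter as "hash" (*%foo) or "list" (*@foo).

    Returns None when the SLURPY bit is not set.
    """
    flags = ParamFlags(flags)
    if not flags & ParamFlags.SLURPY:
        return None
    return "hash" if flags & ParamFlags.HASH else "list"
```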

Packageoid

Name            Type    Description
typecode        string  A definition keyword or "parametricrole"
name            string  The object's debug name
exports         str[][] List of global names to which object is bound
(The following are only found in class, grammar, role, parametricrole)
attributes      attr[]  Attributes local to the class
methods         methd[] Methods local to the class
superclasses    Xref[]  Direct superclasses of the class
(The following is only found in class, grammar)
linear_mro      Xref[]  All superclasses in C3 order