PACT is built around tree transformations. The High Level Language (HLL) parses source text however it wants and builds
a PAST tree for PACT to process. PACT converts that tree into several intermediate forms before turning it into bytecode or PIR, or executing it.
These forms need some organization. This document refers to PAST and POST, but those names are not guaranteed to remain the names of those layers. For those unfamiliar with PCT, PAST is the Parrot Abstract Syntax Tree, which is intended to be generated by an HLL, and POST is the Parrot Opcode Syntax Tree, which is intended to be a "close to the metal" representation. Notably, POST isn't a syntax tree at all, so the name isn't very good.
There are four layers of PACT nodes.
- Base - Contains functionality common to all other layers
- PAST - High level syntax trees
- POST - Low level opcode trees
- Bytecode - Low level opcode blocks
Most HLLs will generate PAST trees and let PACT handle the rest. More complex languages may insert additional phases for optimizations or extensions. Some "HLLs" may target POST instead, acting more like a system-level language.
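One way to picture "adding phases" is a pipeline of functions, each taking one representation and returning the next, into which an HLL splices its own phase. This is a minimal sketch under that assumption; the phase names (`past_to_post`, `post_to_bytecode`, `hll_optimize_past`) and the helpers are hypothetical placeholders, not PACT's actual API.

```python
from typing import Callable, List

Phase = Callable[[object], object]

def run_pipeline(tree, phases: List[Phase]):
    """Feed the output of each phase into the next one, in order."""
    for phase in phases:
        tree = phase(tree)
    return tree

def insert_before(phases: List[Phase], anchor: Phase, new_phase: Phase) -> List[Phase]:
    """Return a new phase list with new_phase placed just before anchor."""
    i = phases.index(anchor)
    return phases[:i] + [new_phase] + phases[i:]

# Hypothetical placeholder phases: each just tags its input so the order is visible.
def past_to_post(tree):
    return ("POST", tree)

def post_to_bytecode(tree):
    return ("bytecode", tree)

DEFAULT_PHASES = [past_to_post, post_to_bytecode]

# A hypothetical HLL-specific optimization that runs on PAST.
def hll_optimize_past(tree):
    return ("optimized", tree)
```

Because `insert_before` returns a new list, an HLL can customize its own pipeline without altering the default one shared by other languages.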
The bytecode layer supports both control-flow graphs and the more linear representation needed for output generation. Notably, it is not a tree structure but a rigid hierarchy.
Any concept used at multiple layers of PACT should have a single common representation. That may mean subclasses at different layers, but each idea should be implemented once. In addition to those described below, candidates for this section are:
- Constants (Int, Num, String)
- Symbol tables
- Scoping (with Block, Sub, etc. subclasses later)
- Basic Blocks (a sequence of things to execute)
Base Node Type
All nodes in a PACT tree are expected to inherit from a single base class. (Possible exception: allowing Integer/String/Float to stand for the appropriate constant. Perhaps have a compiler stage that takes any non-PACT::Node PMC and wraps it in the appropriate constant class.)
This gives all PACT objects consistent handling of some common concerns, including:
- type information (VINSP: void, integer, number, string, or PMC; class if P)
  - class information is optional
  - at the very low level, all ops will be V
  - how should ops with multiple return types be handled?
- source location (file/pos)
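The base class and the "wrap raw constants" stage described above could look roughly like this. It is a sketch under assumptions: the field names (`type_code`, `class_name`, `file`, `pos`), the `Const` class, and `wrap_tree` are hypothetical, and Python `int`/`float`/`str` stand in for Parrot's Integer/Float/String PMCs.

```python
from dataclasses import dataclass, field
from typing import Optional

# Register type codes from the list above: void, integer, number, string, PMC.
TYPE_CODES = {"V", "I", "N", "S", "P"}

@dataclass
class Node:
    """Hypothetical common base class; field names are illustrative only."""
    type_code: str = "V"              # one of V, I, N, S, P
    class_name: Optional[str] = None  # optional class, only meaningful for P
    file: Optional[str] = None        # source location: file name
    pos: Optional[int] = None         # source location: offset in that file
    children: list = field(default_factory=list)

    def __post_init__(self):
        if self.type_code not in TYPE_CODES:
            raise ValueError(f"unknown type code {self.type_code!r}")
        if self.class_name is not None and self.type_code != "P":
            raise ValueError("class information only applies to P values")

@dataclass
class Const(Node):
    value: object = None

def wrap_constant(value):
    """Wrap a raw int, float or str in a Const with the matching type code."""
    if isinstance(value, Node):
        return value
    if isinstance(value, bool):  # bool is an int subclass; reject it in this sketch
        raise TypeError("no constant class for bool in this sketch")
    for py_type, code in ((int, "I"), (float, "N"), (str, "S")):
        if isinstance(value, py_type):
            return Const(type_code=code, value=value)
    raise TypeError(f"cannot wrap {type(value).__name__}")

def wrap_tree(node):
    """Compiler stage: recursively replace non-Node children with Const nodes."""
    node.children = [wrap_tree(wrap_constant(child)) for child in node.children]
    return node
```

Running `wrap_tree` once, early, lets every later stage assume it only ever sees `Node` instances.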
The PAST level contains high-level concepts like "for loops", "exception handlers", and "lexical variables". It is intended to be as easy as possible for HLLs to generate. The conversion from PAST to POST should contain most of the "magic" and features.
PAST::Op is the heavy lifter of PAST. "Op" means "node that does something with its children".
DESIGN DECISION: Do we want to continue to distinguish ops by a string type? A string is fairly easy to dispatch on, so this isn't too bad. It also has the advantage of being extremely easy to extend, assuming we design the compiler correctly: an extension only has to provide a mapping from type string to function instead of adding new classes.
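The string-type option amounts to a dictionary from type string to handler function. The sketch below interprets ops rather than compiling them, purely for brevity; the `Op` class, `HANDLERS` table, `handles` decorator, and the op names `add`, `mul`, and `neg` are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Op:
    """Hypothetical stand-in for PAST::Op: the string optype selects the behaviour."""
    optype: str
    children: list = field(default_factory=list)

# The whole extension mechanism: a dict from type string to function.
HANDLERS = {}

def handles(optype):
    """Decorator registering a handler for nodes whose optype is `optype`."""
    def register(fn):
        HANDLERS[optype] = fn
        return fn
    return register

def evaluate(node):
    """Dispatch on node.optype; non-Op leaves are treated as constants."""
    if not isinstance(node, Op):
        return node
    handler = HANDLERS.get(node.optype)
    if handler is None:
        raise NotImplementedError(f"no handler for op type {node.optype!r}")
    return handler(*[evaluate(child) for child in node.children])

@handles("add")
def _add(a, b):
    return a + b

@handles("mul")
def _mul(a, b):
    return a * b

# An extension adds a new op type without defining a new node class.
@handles("neg")
def _neg(a):
    return -a
```

This sketch evaluates children eagerly before dispatch. A real compiler would likely pass the node itself to the handler so that ops such as conditionals can decide how (and whether) to process each child.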
A sequential series of PAST::Ops, whose return value (if non-void) is the return value of its last child.
Possibly include the result() function from PCT to select which child's return value is used.
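The return-value rule for a statement sequence, including an optional result() selector, can be sketched as follows. The `Seq` class, its `result` index field, and the use of zero-argument callables as stand-ins for PAST::Ops are assumptions made for illustration; `None` plays the role of a void result.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Seq:
    """Hypothetical statement-sequence node: runs every child in order."""
    children: list = field(default_factory=list)
    result: Optional[int] = None   # index of the child whose value is returned; None means last

def run(node, trace):
    """Run a Seq (or a leaf thunk), recording each leaf's value in `trace`."""
    if callable(node):             # leaves are zero-argument callables standing in for ops
        value = node()
        trace.append(value)
        return value
    values = [run(child, trace) for child in node.children]
    if not values:
        return None                # an empty sequence is void
    index = -1 if node.result is None else node.result
    return values[index]
```

Note that selecting an earlier child with `result` changes only which value is returned; every child still runs, in order.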
A PAST block represents a lexical scope. Generally speaking, a PAST::Block eventually becomes a Parrot sub.
Unnamed blocks are always valid targets for inlining, while named ones never are. (This should be trivial to check: if a block is unnamed and declares no lexicals, inlining is trivial apart from handling shadowing properly.)
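A minimal sketch of the inlining rule, assuming that shadowing is handled by renaming any lexical that collides with a name already visible in the enclosing scope. The `Block` fields, the representation of a body as a flat list of variable names, and the `name_N` renaming scheme are all hypothetical simplifications.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Block:
    """Hypothetical PAST::Block stand-in."""
    name: Optional[str] = None
    lexicals: list = field(default_factory=list)  # names declared in this block
    body: list = field(default_factory=list)      # simplification: variable names used, in order

def can_inline(block):
    # Unnamed blocks may always be inlined; named blocks never are.
    return block.name is None

def is_trivial_inline(block):
    # Unnamed and declaring no lexicals: nothing to rename.
    return can_inline(block) and not block.lexicals

def inline(block, outer_names):
    """Splice an unnamed block's body into its parent, renaming lexicals that
    would shadow a name already visible in the enclosing scope."""
    if not can_inline(block):
        raise ValueError("named blocks are never inlined")
    taken = set(outer_names)
    renames = {}
    for lex in block.lexicals:
        new, suffix = lex, 0
        while new in taken:          # pick the first unused name_N
            suffix += 1
            new = f"{lex}_{suffix}"
        taken.add(new)
        renames[lex] = new
    body = [renames.get(name, name) for name in block.body]
    return body, list(renames.values())
```

Names used in the body but not declared in the block (such as an outer `y`) refer to the enclosing scope and are left alone; only the block's own lexicals are renamed, and the new names are returned so the caller can declare them.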
The POST level is close to a 1:1 mapping from node to Parrot opcode. It is intended to be easy to generate code from while still being relatively simple to create "by hand". The tree structure is kept so that those generating POST don't have to worry about things like temporary variables in the simple cases.
The mapping is not quite 1:1 because common idioms are abstracted away into single nodes. Although a method call may be written as temp = find_method obj, "method" followed by obj.temp(args), this can be abstracted into a single "call method" node. Notably, this is used to abstract away the details of register allocation and calling conventions.
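Lowering such a "call method" node means allocating a temporary and emitting the two underlying operations. In this sketch, `CallMethod`, `TempAllocator`, and `expand` are hypothetical names; the output is PIR-style text, and a simple counter stands in for real register allocation.

```python
import itertools
from dataclasses import dataclass

@dataclass
class CallMethod:
    """Hypothetical POST "call method" node."""
    invocant: str      # register holding the object, e.g. "$P1"
    method: str        # method name
    args: tuple = ()   # argument registers

class TempAllocator:
    """Hands out fresh PMC temporaries; a stand-in for real register allocation."""
    def __init__(self, start=100):
        self._counter = itertools.count(start)

    def fresh(self):
        return f"$P{next(self._counter)}"

def expand(node, temps):
    """Lower one CallMethod node into the two PIR-style operations it abbreviates."""
    temp = temps.fresh()
    args = ", ".join(node.args)
    return [
        f'{temp} = find_method {node.invocant}, "{node.method}"',
        f"{node.invocant}.{temp}({args})",
    ]
```

Because the temporary is chosen during expansion, whoever builds the POST tree never names it, which is the point of abstracting the idiom into one node.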
Op is the basic object of POST. Each Op represents a single Parrot opcode (or a small set of simple ones), and its children are its arguments.
A sequence of several Op nodes. If non-void, the sequence returns the value of its last child. If any other return value is needed, temporary variables should be used.
POST trees may contain labels, but the usual form will reference an Ops node or a Sub directly.
The bytecode level is an exact 1:1 mapping of nodes to Parrot opcodes. The focus is entirely on simplicity of code generation. There is no tree structure at this point; the node structure is based around basic blocks and control-flow graphs.
- Op - Parrot opcode, no return value (use registers)
- Register - INSP register
- Block - Sequence of Ops, no return value
- Sub - Parrot sub, contains a block
- Conditional - choice between two blocks based on a register
Ops such as goto may refer to blocks. It is at this level that information such as labels and registers is added. No complex structures such as loops or lexical variables exist; they should all be desugared into register-level variables, lookup opcodes, conditionals, and blocks.
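As an illustration of that desugaring, a while loop can be lowered into a header block, a body block, and a Conditional joined by CFG edges. The class layout below follows the node list above, but the field names (`ops`, `next`, `test`, `if_true`, `if_false`) and the `desugar_while` helper are assumptions, not a settled design.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union

@dataclass(frozen=True)
class Register:
    kind: str      # "I", "N", "S" or "P"
    number: int

@dataclass
class Op:
    name: str
    args: list = field(default_factory=list)   # registers or constants only

@dataclass(eq=False)   # identity equality: CFGs contain cycles
class Block:
    ops: List[Op] = field(default_factory=list)
    next: Optional[Union["Block", "Conditional"]] = None   # CFG edge out of this block

@dataclass(eq=False)
class Conditional:
    test: Register     # register holding the branch condition
    if_true: Block
    if_false: Block

def desugar_while(cond_ops, test, body_ops, after):
    """Lower `while (test) { body }` into blocks joined by CFG edges.

    cond_ops compute the loop condition into register `test`; `after` is the
    block that runs once the loop exits."""
    header = Block(ops=list(cond_ops))
    body = Block(ops=list(body_ops), next=header)   # back edge to the header
    header.next = Conditional(test, body, after)
    return header
```

For example, a loop counting I0 up to 10 uses `islt I1, I0, 10` in the header and `inc I0` in the body; no loop node survives, only blocks, a register test, and edges.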
This layer will also support a structure directly related to bytecode with basic blocks replaced with labels and gotos.
Bytecode structures should never form deep trees. Subs contain blocks, blocks contain ops, and ops contain constants or registers. The only exception is the link from the end of a block to the next block or conditional in CFG form.
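Converting the CFG form into the linear form means giving each block a label and turning every outgoing edge into a goto. This sketch assumes blocks hold already-rendered opcode strings and emits PIR-like `if ... goto` lines; the `Block`/`Conditional` fields and the `linearize` function are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union

@dataclass(eq=False)
class Block:
    ops: List[str] = field(default_factory=list)   # already-rendered opcodes, for brevity
    next: Optional[Union["Block", "Conditional"]] = None

@dataclass(eq=False)
class Conditional:
    test: str          # register name, e.g. "I1"
    if_true: Block
    if_false: Block

def linearize(entry):
    """Flatten a CFG into a label/goto listing, emitting each block exactly once."""
    labels = {}        # id(block) -> label
    order = []
    stack = [entry]
    while stack:       # depth-first walk assigns labels in a stable order
        block = stack.pop()
        if id(block) in labels:
            continue
        labels[id(block)] = f"L{len(labels)}"
        order.append(block)
        succ = block.next
        if isinstance(succ, Conditional):
            stack.extend([succ.if_false, succ.if_true])   # visit the true branch first
        elif succ is not None:
            stack.append(succ)
    out = []
    for block in order:
        out.append(labels[id(block)] + ":")
        out.extend("    " + op for op in block.ops)
        succ = block.next
        if isinstance(succ, Conditional):
            out.append(f"    if {succ.test} goto {labels[id(succ.if_true)]}")
            out.append(f"    goto {labels[id(succ.if_false)]}")
        elif succ is not None:
            out.append(f"    goto {labels[id(succ)]}")
    return out
```

This version always emits an explicit goto for every edge; a real emitter would probably drop a goto whose target is the block placed immediately after it.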