PACT is based around the concept of tree transformations. The High Level Language (HLL) handles parsing source text however it wants to and builds as AST tree for PACT to handle. That tree is converted into several intermediate forms before being turned into bytecode, PIR, or executed.
Some level of organization is going to be needed for these. This document makes reference to PAST and POST as a starting point. For those unfamiliar with PCT, PAST is Parrot Abstract Syntax Tree which is intended to be generated by a HLL and POST is Parrot Opcode Syntax Tree which is intended to be a "close to metal" representation. Notably, POST isn't a syntax tree at all so that name isn't very good.
There are four layers of PACT nodes.
- Base - Contains common functionally to all other layers
- AST - Syntax tree
- CFG - Control flow graphs
- Bytecode - Direct representation of bytecode
Most HLLs will generate AST trees and let PACT handle the rest. More complex languages may add additional phases to add optimizations or extensions. Some "HLLs" may target CFGs instead to act as more of a system-level language.
The AST layer will support various layers of abstraction from "loop" to "opcode". This may be supported by a single compiler stage, or separated into sub-packages for various layers of abstraction. It needs to be highly flexable about its input, allowing embedding custom node types and perhaps raw low-level information.
The CFG layer is an in-between layer, containing concepts from both the AST and bytecode layers but distinct from both. It has a more strict structure than the AST: a compilation unit contains subs that point to a start block. Blocks contain opcodes and point to other blocks. Unlike the bytecode layer, it may still contain abstract concepts like variables (register allocation occurs within this layer). It should reuse classes from the other layers where appropriate.
The bytecode layer supports the linear representation of code needed for output generation. It contains a very rigid hierarchy that matches the PBC format. It does use some more abstract concepts like labels. It may also be able to automatically generate constant tables and other meta-data where appropriate.
Any concept used at multiple layers of PACT should have a single common representation. This may be subclasses at different layers, but we should implement each idea once. In addition to those described below, candidates for this section are:
- Constants (Int, Num, String)
- Symbol tables
- Scoping (with Block, Sub, etc subclasses later)
- Basic Blocks (a sequence of things to execute)
Base Node Type
All nodes in a PACT tree are expected to inherit from a single base class. (Possible exception: allowing Integer/String/Float to stand for the appropriate constant. Perhaps have a compiler stage that takes any non PACT::Node PMC and wraps it in the appropriate constant class.)
This allows us to have a consistent handling of some things in all PACT objects. Things this handles include:
- type information (VINSP, class if P)
- class information optional
- At very low level, all ops will be V
- How to handle ops that have multiple return types?
- source location (file/pos)
This level contains high level concepts like "for loops", "exception handlers", and "lexical variables". It is intended to be as easy as possible for HLLs to generate. The conversion from PAST to POST should contain the most amount of "magic" and features.
The heavy lifter of PAST. Op means "node that does something with its children".
DESIGN DECISION: Do we want to continue to distinguish ops by a string type? This actually is fairly easy to dispatch on, so isn't too bad. It does have the advantage of being extremely easy to extend, assuming we design the compiler correctly. Only have to provide a type string to function mapping instead of adding new classes.
A sequential series of PAST::Ops, whose return value (if non-void) is the return value of its last child.
Possibly include result() function from PCT to select which child it uses the return value of.
An AST block represents a lexical scope. Generally speaking, a Block eventually becomes a Parrot sub. Unnamed blocks are generally inlined. Named blocks only have copies inlined.
This level is structured along the lines of the flow of the program. Each sub is represented by a graph of simple blocks that describe the execution flow of the program.
- Op - Parrot opcode, no return value (use variables)
- Variable - A named register location (will be assigned a number by compiler)
- Block - A sequence of Ops, then a link to the next Block or Condition
- Condition - A branching point in the control flow.
- Sub - Contains name, parameters, and a pointer to the starting block
Research Question: How to easily represent exception handlers in CFGs. Possibly an exception handler pointer to another block? Could also attach it to the Sub (for one that covers the entire sub).
Blocks and Conditions
Blocks in CFGs contain a sequence of instructions that are always executed in order. Jumps and branches are represented by links to other Blocks and Conditions at the end of a Block. A Condition contains a test, generally a comparison, and two blocks: one for true and another for false.
This level is exactly a 1:1 mapping of nodes to Parrot opcodes. The focus is completely on simplicity of code generation. Bytecode structures should never contain deep trees. Subs contain blocks, block contain ops, ops contain constants or registers.
- Op - Parrot opcode, no return value (use registers)
- Register - INSP register
- Block - Sequence of Ops, no return value
- Sub - Parrot sub, contains a block
Ops such as goto refer to labels rather than raw instruction counts. No complex structures such as loops or lexical variables exist. They should all be de-sugared to registers, lookup opcodes, conditionals, and labels.
The PACT::Bytecode to PBC compiler will handle simple tasks like collecting up constants for the constants table and basic register allocation. However these will likely be very simple implementations, with preference for more complex algorithms to be used at the CFG level.