Death to POST, long live CFGs

Most of the operations currently done on POST can be sanely handled
either by a higher-level AST or on the CFGs.  I've also become more
convinced of the need for CFGs for reasonable optimization and useful
algorithms (like advanced register allocation).

So this commit removes the idea of an opcode tree in favor of control
flow graphs.  The AST layer may gain specialty stages or nodes to deal
with more low-level operations, but I no longer see a need for a
completely separate layer to handle them.
1 parent 450b7b3 commit ee354042c2fef594335e7b109d2f542bf7da7a7e @Benabik Benabik committed Apr 18, 2012
Showing with 75 additions and 67 deletions.
  1. +13 −6 TODO.mkd
  2. +62 −61 docs/nodes.mkd
@@ -29,11 +29,18 @@ useful. The implementation plan looks a little like this:
* Make that able to produce bytecode and PIR
* Register allocation
* Stage structure
-* Build assembly language on top of that
+* Build simple assembly language on top of that
* assembler and disassembler
+ * Primarily for machine consumption and production
+ * PASM replacement
* Add control-flow graph primitives (basic blocks linked by conditionals)
-* Build tree-like POST
-* Build PAST
+* Build complex assembly language on top of that
+ * PIR replacement
+ * More human useful features
+* Build AST that compiles to CFG
+* Build simple language on top of that
+ * Example / Squaak replacement
+ * Sanity check
Share as much across layers as possible. (We want only one
implementation/type for things like registers, variables, symbol
@@ -42,8 +49,8 @@ tables, location information, etc.)
Maintain as much type information as possible. If we start off with an
integer constant, remember that all the way down to the bytecode.
-After that, start building top-down. Add features to PAST and see if any
-additional POST/CFG/bytecode features are needed to support it.
+After that, start building top-down. Add features to the AST and see if
+any additional CFG/bytecode features are needed to support it.
## Languages
@@ -57,7 +64,7 @@ once a PACT assembly format stabilizes, it would be possible (and likely
encouraged) to create a version of the assembler in C to function as a
replacement for PIR/IMCC as the de facto standard for bootstrapping.
-An extended assembly language on top of the opcode tree format would be
+An extended assembly language on top of the control flow graphs would be
easier for human production and let them use the same standard shortcuts
the compiler tends to. Alternatively, this could be a viable target for a
system-level language like Winxed.
@@ -6,12 +6,11 @@ as AST tree for PACT to handle. That tree is converted into several
intermediate forms before being turned into bytecode, PIR, or executed.
Some level of organization is going to be needed for these. This document
-makes reference to PAST and POST, but that is not guaranteed to continue to
-be the names for those layers. For those unfamiliar with PCT, PAST is
-Parrot Abstract Syntax Tree which is intended to be generated by a HLL and
-POST is Parrot Opcode Syntax Tree which is intended to be a "close to
-metal" representation. Notably, POST isn't a syntax tree at all so that
-name isn't very good.
+makes reference to PAST and POST as a starting point. For those unfamiliar
+with PCT, PAST is Parrot Abstract Syntax Tree which is intended to be
+generated by a HLL and POST is Parrot Opcode Syntax Tree which is intended
+to be a "close to metal" representation. Notably, POST isn't a syntax tree
+at all so that name isn't very good.
@@ -20,18 +19,34 @@ name isn't very good.
There are four layers of PACT nodes.
* *Base* - Contains functionality common to all other layers
-* *PAST* - High level syntax trees
-* *POST* - Low level opcode trees
-* *Bytecode* - Low level opcode blocks
+* *AST* - Syntax tree
+* *CFG* - Control flow graphs
+* *Bytecode* - Direct representation of bytecode
-Most HLLs will generate PAST trees and let PACT handle the rest. More
+Most HLLs will generate AST trees and let PACT handle the rest. More
complex languages may add additional phases to add optimizations or
-extensions. Some "HLLs" may target POST instead to act as more of a
+extensions. Some "HLLs" may target CFGs instead to act as more of a
system-level language.
-The bytecode layer supports control-flow graphs and the more linear
-representation needed for output generation. It very notably is not a tree
-structure, but is a rigid hierarchy.
+The AST layer will support various layers of abstraction from "loop" to
+"opcode". This may be supported by a single compiler stage, or separated
+into sub-packages for various layers of abstraction. It needs to be highly
+flexible about its input, allowing embedding of custom node types and perhaps
+raw low-level information.
+The CFG layer is an in-between layer, containing concepts from both the AST
+and bytecode layers but distinct from both. It has a more strict structure
+than the AST: a compilation unit contains subs that point to a start block.
+Blocks contain opcodes and point to other blocks. Unlike the bytecode
+layer, it may still contain abstract concepts like variables (register
+allocation occurs within this layer). It should reuse classes from the
+other layers where appropriate.
+The bytecode layer supports the linear representation of code needed for
+output generation. It contains a very rigid hierarchy that matches the PBC
+format. It does use some more abstract concepts like labels. It may also
+be able to automatically generate constant tables and other meta-data where
## Common Nodes
@@ -47,6 +62,7 @@ for this section are:
* Basic Blocks (a sequence of things to execute)
* Variables
* Registers/Temporaries
+* Labels
### Base Node Type
@@ -68,7 +84,7 @@ objects. Things this handles include:
-## PAST
+## AST
This level contains high level concepts like "for loops", "exception
handlers", and "lexical variables". It is intended to be as easy as
@@ -99,69 +115,54 @@ the return value of.
### Block
-A PAST block represents a lexical scope. Generally speaking, a PAST::Block
-eventually becomes a Parrot sub.
+An AST block represents a lexical scope. Generally speaking, a Block
+eventually becomes a Parrot sub. Unnamed blocks are generally inlined.
+Named blocks only have copies inlined.
-Unnamed blocks are always valid targets for inlining, while named ones
-never are. (This should be trivial to check for: if the block is unnamed
-and doesn't declare lexicals, then inlining is trivial beyond handling
-shadowing properly.)
+## CFG
-## POST
+This level is structured along the lines of the flow of the program. Each
+sub is represented by a graph of simple blocks that describe the execution
+flow of the program.
-This level is close a 1:1 mapping from node to Parrot opcode. It is
-intended to be easy to generate code from while still being relatively
-simple to create "by hand". The tree structure is maintained to keep those
-generating it from having to worry about things like temporary variables in
-the simple cases.
+* Op - Parrot opcode, no return value (use variables)
+* Variable - A named register location (will be assigned a number by
+ compiler)
+* Block - A sequence of Ops, then a link to the next Block or Condition
+* Condition - A branching point in the control flow.
+* Sub - Contains name, parameters, and a pointer to the starting block
-The mapping is not quite 1:1 in that common idioms are abstracted away into
-single nodes. Although a method call may be `temp = find_method obj,
-"method"; obj.temp(args)`, this can be abstracted into a single "call
-method" node. Notably this is used to abstract away details of register
-allocation and calling conventions.
+*Research Question:* How to easily represent exception handlers in CFGs.
+Possibly an exception handler pointer to another block? Could also attach
+it to the Sub (for one that covers the entire sub).
-### Op
-The basic object of POST. Each Op represents a single Parrot opcode (or a
-small set of simple ones) and its children are its arguments.
-### Ops
-A sequence of several Op nodes. Non-void Ops return the value of its last
-child. If any other return value is needed, temporary variables should be
-### Label
-POST trees may contain labels, but the usual form will be directly
-referencing a Ops or Sub.
+### Blocks and Conditions
+Blocks in CFGs contain a sequence of instructions that are always executed
+in order. Jumps and branches are represented by links to other Blocks and
+Conditions at the end of a Block. A Condition contains a test, generally a
+comparison, and two blocks: one for true and another for false.
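
As an editorial illustration (not part of the commit), the CFG node kinds listed above could be modeled roughly as follows. This is a minimal Python sketch under stated assumptions: the class and field names (`successor`, `if_true`, `if_false`), the helper `reachable_blocks`, and the opcode strings are hypothetical placeholders, not PACT's actual API.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Variable:
    """A named register location; the number is assigned later by the compiler."""
    name: str
    register: Optional[int] = None


@dataclass
class Op:
    """An opcode with no return value; results are passed through variables."""
    opcode: str
    args: List[Union[Variable, int, str]] = field(default_factory=list)


@dataclass
class Block:
    """A sequence of Ops, then a link to the next Block or Condition."""
    label: str
    ops: List[Op] = field(default_factory=list)
    successor: Optional[Union["Block", "Condition"]] = None


@dataclass
class Condition:
    """A branching point: a test plus one block for true and one for false."""
    test: Op
    if_true: Block
    if_false: Block


@dataclass
class Sub:
    """Contains a name, parameters, and a pointer to the starting block."""
    name: str
    params: List[Variable]
    start: Block


def reachable_blocks(sub: Sub) -> List[str]:
    """Return the labels of all blocks reachable from the start block (depth-first)."""
    seen: List[str] = []
    stack = [sub.start]
    while stack:
        node = stack.pop()
        if node is None:
            continue
        if isinstance(node, Condition):
            # Push the false branch first so the true branch is visited first.
            stack.append(node.if_false)
            stack.append(node.if_true)
            continue
        if node.label in seen:
            continue  # already visited; this also keeps loops from recursing forever
        seen.append(node.label)
        stack.append(node.successor)
    return seen


# Usage: an if/else diamond. The opcode strings are placeholders only.
n = Variable("n")
then_block = Block("then", [Op("say", ["positive"])])
else_block = Block("else", [Op("say", ["non-positive"])])
exit_block = Block("exit", [Op("returncc")])
then_block.successor = exit_block
else_block.successor = exit_block
entry = Block("entry", [Op("set", [n, 1])],
              successor=Condition(Op("gt", [n, 0]), then_block, else_block))
main = Sub("main", [], entry)
print(reachable_blocks(main))
```

Keeping the branch out of the op list and in the block's trailing link means every block stays straight-line code, which is the property that analyses such as register allocation tend to rely on.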
## Bytecode
This level is _exactly_ a 1:1 mapping of nodes to Parrot opcodes. The
-focus is completely on simplicity of code generation. There is no tree
-structure at this point. Node structure is based around basic blocks and
-control flow graphs.
+focus is completely on simplicity of code generation. Bytecode structures
+should _never_ contain deep trees. Subs contain blocks, blocks contain ops,
+ops contain constants or registers.
* Op - Parrot opcode, no return value (use registers)
* Register - INSP register
* Block - Sequence of Ops, no return value
* Sub - Parrot sub, contains a block
-* Conditional - choice between two blocks based on a register
-Ops such as goto may refer to blocks. It is at this level that information
-such as labels and registers are added. No complex structures such as loops or
-lexical variables exist. The should all be desugared to register-level
-variabes, lookup opcodes, conditionals, and blocks.
-This layer will also support a structure directly related to bytecode with
-basic blocks replaced with labels and gotos.
+Ops such as goto refer to labels rather than raw instruction counts. No
+complex structures such as loops or lexical variables exist. They should
+all be de-sugared to registers, lookup opcodes, conditionals, and labels.
-Bytecode structures should _never_ create deep trees. Subs contain blocks,
-block contain ops, ops contain constants or registers. The only exception is
-the link from the end of a block to the next block or conditional in CFG form.
+The PACT::Bytecode to PBC compiler will handle simple tasks like collecting
+up constants for the constants table and basic register allocation.
+However these will likely be very simple implementations, with preference
+for more complex algorithms to be used at the CFG level.
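
As an editorial illustration (not part of the commit), the two "simple tasks" mentioned for the bytecode layer, resolving labels and collecting a constants table, could look roughly like this. This is a minimal Python sketch under stated assumptions: `Label`, `Const`, `Reg`, and `assemble` are hypothetical names, the output tuples are not the real PBC layout, and labels resolve to absolute op indices purely for simplicity.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Label:
    """Marks a position when used as an item, or names a target when used as an arg."""
    name: str


@dataclass(frozen=True)
class Const:
    value: Union[int, str]


@dataclass(frozen=True)
class Reg:
    name: str  # e.g. "I0"


def assemble(items):
    """Turn a label-based op list into (ops, constants).

    Each item is either a Label or a tuple of (opcode, *args).
    """
    # Pass 1: record which op index each label refers to.
    offsets = {}
    count = 0
    for item in items:
        if isinstance(item, Label):
            offsets[item.name] = count
        else:
            count += 1

    # Pass 2: rewrite label args to offsets and constants to table indices.
    constants = []
    ops = []
    for item in items:
        if isinstance(item, Label):
            continue
        opcode, *args = item
        out = []
        for arg in args:
            if isinstance(arg, Label):
                out.append(("offset", offsets[arg.name]))
            elif isinstance(arg, Const):
                if arg.value not in constants:
                    constants.append(arg.value)  # de-duplicate constants
                out.append(("const", constants.index(arg.value)))
            else:
                out.append(("reg", arg.name))
        ops.append((opcode, *out))
    return ops, constants


# Usage: a small countdown loop; the opcode strings are placeholders only.
program = [
    ("set", Reg("I0"), Const(3)),
    Label("loop"),
    ("say", Const("tick")),
    ("dec", Reg("I0")),
    ("if", Reg("I0"), Label("loop")),
    ("say", Const("done")),
]
ops, constants = assemble(program)
```

The two-pass shape is what lets ops refer to labels that appear later in the list, so producers of this layer never have to count instructions themselves.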
