PACT Nodes
==========

PACT is based around the concept of tree transformations. The High Level
Language (HLL) handles parsing source text however it wants to and builds
an AST for PACT to handle. That tree is converted into several
intermediate forms before being turned into bytecode, PIR, or executed.

Some level of organization is going to be needed for these. This document
makes reference to PAST and POST as a starting point. For those unfamiliar
with PCT, PAST is the Parrot Abstract Syntax Tree, which is intended to be
generated by an HLL, and POST is the Parrot Opcode Syntax Tree, which is
intended to be a "close to metal" representation. Notably, POST isn't a
syntax tree at all, so that name isn't very good.



Layers
------

There are four layers of PACT nodes.

* *Base* - Contains functionality common to all other layers
* *AST* - Syntax tree
* *CFG* - Control flow graphs
* *Bytecode* - Direct representation of bytecode

Most HLLs will generate ASTs and let PACT handle the rest. More
complex languages may add additional phases to add optimizations or
extensions. Some "HLLs" may target CFGs instead to act as more of a
system-level language.

The AST layer will support various levels of abstraction from "loop" to
"opcode". This may be supported by a single compiler stage, or separated
into sub-packages for various levels of abstraction. It needs to be highly
flexible about its input, allowing embedding of custom node types and
perhaps raw low-level information.

The CFG layer is an in-between layer, containing concepts from both the AST
and bytecode layers but distinct from both. It has a stricter structure
than the AST: a compilation unit contains subs that point to a start block.
Blocks contain opcodes and point to other blocks. Unlike the bytecode
layer, it may still contain abstract concepts like variables (register
allocation occurs within this layer). It should reuse classes from the
other layers where appropriate.

The bytecode layer supports the linear representation of code needed for
output generation. It contains a very rigid hierarchy that matches the PBC
format. It does use some more abstract concepts like labels. It may also
be able to automatically generate constant tables and other metadata where
appropriate.

Common Nodes
------------

Any concept used at multiple layers of PACT should have a single common
representation. This may be subclassed at different layers, but we should
implement each idea once. In addition to those described below, candidates
for this section are:

* Constants (Int, Num, String)
* Symbol tables
* Scoping (with Block, Sub, etc. subclasses later)
* Coercions
* Basic Blocks (a sequence of things to execute)
* Variables
* Registers/Temporaries
* Labels

### Base Node Type

All nodes in a PACT tree are expected to inherit from a single base class.
(Possible exception: allowing Integer/String/Float to stand for the
appropriate constant. Perhaps have a compiler stage that takes any
non-PACT::Node PMC and wraps it in the appropriate constant class.)

This allows us to have consistent handling of some things in all PACT
objects. Things this handles include:

* type information (VINSP, class if P)
  * class information optional
  * At very low level, all ops will be V
  * How to handle ops that have multiple return types?
* children
* source location (file/pos)
* name

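As a rough illustration of the base class idea above, the following Python sketch keeps the listed attributes on one node type and wraps plain values in a constant node. The names `Node`, `Constant`, and `wrap` are hypothetical, as is the mapping from host types to register types; Python is used only for brevity and is not the language PACT is written in.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical stand-in for a PACT::Node base class."""
    name: Optional[str] = None
    type: str = "V"                            # one of V, I, N, S, P
    cls: Optional[str] = None                  # optional class, only meaningful for P
    source: Optional[Tuple[str, int]] = None   # (file, pos)
    children: List["Node"] = field(default_factory=list)


@dataclass
class Constant(Node):
    """A literal value; `value` is an assumed attribute name."""
    value: Any = None


# Assumed mapping from host values to register types.
_TYPE_OF = {int: "I", float: "N", str: "S"}


def wrap(value: Any) -> Node:
    """Return nodes unchanged and wrap anything else in a Constant."""
    if isinstance(value, Node):
        return value
    reg_type = _TYPE_OF.get(type(value))
    if reg_type is None:
        raise TypeError(f"no constant class for {type(value).__name__}")
    return Constant(type=reg_type, value=value)
```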


AST
---

This level contains high-level concepts like "for loops", "exception
handlers", and "lexical variables". It is intended to be as easy as
possible for HLLs to generate. The conversion from PAST to POST should
contain the most "magic" and features.

* Namespaces?
* Classes?

### Op

The heavy lifter of PAST. Op means "node that does something with its
children".

*DESIGN DECISION*: Do we want to continue to distinguish ops by a string
type? This is actually fairly easy to dispatch on, so it isn't too bad. It
does have the advantage of being extremely easy to extend, assuming we
design the compiler correctly: we only have to provide a mapping from type
string to function instead of adding new classes.

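To make the trade-off concrete, here is a minimal Python sketch of dispatching on a type string. The registry `HANDLERS`, the `handler` decorator, and the `add`/`neg` op types are all hypothetical, and the handlers simply evaluate integers rather than emit code, so this shows only the dispatch mechanism and not a real compiler stage.

```python
from typing import Callable, Dict, List

# Hypothetical registry mapping an op's type string to its handler.
HANDLERS: Dict[str, Callable[[List[int]], int]] = {}


def handler(op_type: str):
    """Register a function as the handler for one op type string."""
    def register(fn):
        HANDLERS[op_type] = fn
        return fn
    return register


@handler("add")
def _add(args: List[int]) -> int:
    return sum(args)


@handler("neg")
def _neg(args: List[int]) -> int:
    return -args[0]


def evaluate(op_type: str, args: List[int]) -> int:
    """Dispatch on the type string; unknown types fail loudly."""
    try:
        fn = HANDLERS[op_type]
    except KeyError:
        raise ValueError(f"unknown op type: {op_type!r}") from None
    return fn(args)
```

Under this scheme an extension adds one entry to the mapping, for example `HANDLERS["mul"] = lambda a: a[0] * a[1]`, without defining a new node class.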
### Ops

A sequential series of PAST::Ops, whose return value (if non-void) is the
return value of its last child.

Possibly include the result() function from PCT to select which child's
return value it uses.

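The intended semantics can be sketched in a few lines of Python. This is an assumption-laden illustration: `run_ops` is a hypothetical name, children are modelled as zero-argument callables, and the `result` parameter stands in for the possible child-selection feature mentioned above rather than reproducing PCT's actual interface.

```python
from typing import Callable, List, Optional


def run_ops(children: List[Callable[[], object]],
            result: Optional[int] = None) -> object:
    """Run every child in order and return one child's value.

    `result` is the index of the child whose value is returned; by
    default it is the last child. An empty sequence returns None.
    """
    values = [child() for child in children]
    if not values:
        return None
    return values[-1 if result is None else result]
```

All children still run for their side effects; only the choice of returned value changes when `result` is given.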
### Block

An AST block represents a lexical scope. Generally speaking, a Block
eventually becomes a Parrot sub. Unnamed blocks are generally inlined.
Named blocks only have copies inlined.



CFG
---

This level is structured along the lines of the flow of the program. Each
sub is represented by a graph of simple blocks that describe the execution
flow of the program.

* Op - Parrot opcode, no return value (use variables)
* Variable - A named register location (will be assigned a number by the
  compiler)
* Block - A sequence of Ops, then a link to the next Block or Condition
* Condition - A branching point in the control flow
* Sub - Contains name, parameters, and a pointer to the starting block

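The note that a Variable "will be assigned a number by the compiler" could be realized in many ways. The sketch below shows the most naive one, assuming each variable carries one of the I, N, S, or P register types: every variable gets its own register, numbered per type in first-seen order, with no reuse. The function name `number_variables` and the `(name, type)` input shape are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def number_variables(variables: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map each (name, type) variable to a register such as 'I0' or 'P1'.

    Naive strategy: one register per variable, numbered separately for
    each type in first-seen order. Registers are never reused.
    """
    next_free: Dict[str, int] = {}
    assigned: Dict[str, str] = {}
    for name, reg_type in variables:
        if name in assigned:
            continue  # already numbered on an earlier mention
        if reg_type not in ("I", "N", "S", "P"):
            raise ValueError(f"bad register type {reg_type!r} for {name}")
        index = next_free.get(reg_type, 0)
        assigned[name] = f"{reg_type}{index}"
        next_free[reg_type] = index + 1
    return assigned
```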
*Research Question:* How can exception handlers be represented easily in
CFGs? Possibly an exception handler pointer to another block? It could also
be attached to the Sub (for a handler that covers the entire sub).

### Blocks and Conditions

Blocks in CFGs contain a sequence of instructions that are always executed
in order. Jumps and branches are represented by links to other Blocks and
Conditions at the end of a Block. A Condition contains a test, generally a
comparison, and two blocks: one for true and another for false.

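One possible shape for these two node types is sketched below in Python. The class and field names (`next`, `if_true`, `if_false`), the op names, and the `trace` helper are all hypothetical; ops are plain strings and the test is a Python callable purely to keep the example short.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Union


@dataclass
class Block:
    """Straight-line ops followed by at most one outgoing link."""
    ops: List[str] = field(default_factory=list)
    next: Optional[Union["Block", "Condition"]] = None


@dataclass
class Condition:
    """A test with one successor block for true and one for false."""
    test: Callable[[Dict[str, int]], bool]
    if_true: Block
    if_false: Block


def trace(start: Block, env: Dict[str, int]) -> List[str]:
    """Follow links from `start` and return the ops in visiting order.

    Assumes the path taken is finite (no loop is exercised here).
    """
    seen: List[str] = []
    node: Optional[Union[Block, Condition]] = start
    while node is not None:
        if isinstance(node, Condition):
            node = node.if_true if node.test(env) else node.if_false
        else:
            seen.extend(node.ops)
            node = node.next
    return seen


# Made-up ops for: if x < 10 then say_small else say_big; then done.
done = Block(ops=["done"])
small = Block(ops=["say_small"], next=done)
big = Block(ops=["say_big"], next=done)
entry = Block(ops=["load_x"],
              next=Condition(lambda e: e["x"] < 10, small, big))
```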

Bytecode
--------

This level is _exactly_ a 1:1 mapping of nodes to Parrot opcodes. The
focus is completely on simplicity of code generation. Bytecode structures
should _never_ contain deep trees. Subs contain blocks, blocks contain ops,
and ops contain constants or registers.

* Op - Parrot opcode, no return value (use registers)
* Register - INSP register
* Block - Sequence of Ops, no return value
* Sub - Parrot sub, contains a block

Ops such as goto refer to labels rather than raw instruction counts. No
complex structures such as loops or lexical variables exist. They should
all be de-sugared to registers, lookup opcodes, conditionals, and labels.

The PACT::Bytecode to PBC compiler will handle simple tasks like collecting
up constants for the constants table and basic register allocation.
However, these will likely be very simple implementations, with preference
for more complex algorithms to be used at the CFG level.

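Two of the simple tasks mentioned in this section, turning label references into instruction positions and collecting constants into a table, might look roughly like the Python sketch below. Everything about the encoding is an assumption made for illustration: labels are strings ending in `:`, label references are operands starting with `@`, ops are tuples, and positions are indexes into the op list rather than real PBC offsets.

```python
from typing import Dict, List, Tuple, Union

Item = Union[str, Tuple]   # "name:" is a label; a tuple is (opcode, *operands)


def resolve_labels(items: List[Item]) -> List[Tuple]:
    """Replace '@label' operands with instruction indexes (two passes)."""
    offsets: Dict[str, int] = {}
    ops: List[Tuple] = []
    for item in items:                     # pass 1: record label positions
        if isinstance(item, str):
            offsets[item.rstrip(":")] = len(ops)
        else:
            ops.append(item)
    resolved = []
    for opcode, *operands in ops:          # pass 2: patch label references
        resolved.append((opcode, *[
            offsets[o[1:]] if isinstance(o, str) and o.startswith("@") else o
            for o in operands
        ]))
    return resolved


def collect_constants(values: List[object]):
    """Build a de-duplicated constants table plus a lookup by (type, value)."""
    table: List[object] = []
    index: Dict[Tuple[type, object], int] = {}
    for value in values:
        key = (type(value), value)         # keeps 3 and 3.0 distinct
        if key not in index:
            index[key] = len(table)
            table.append(value)
    return table, index
```

A reference to an undefined label raises `KeyError` in this sketch; a real compiler stage would presumably report it with source location information.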