-
Notifications
You must be signed in to change notification settings - Fork 0
Intermediate Representation Lowering
While an AST provides a great view of the structure and semantics of our program, it's not very conducive to generating good machine code. To remedy this an intermediate representation is used to provide a middle ground abstraction where further analysis and optimization can be performed before generating assembly. There are many different types of IR, but the one used in the compiler is a hybrid of a three address code representation and a control flow graph.
For reference here is an example output (using the -emit-ir flag) when compiling a factorial program
fn %factorial(i32 $n) {
@0: (precedes @1 @2)
ptr $0 = allocate 32
store i32 $n at ptr $0
i32 $1 = load ptr $0
i32 $2 = cmp eq i32 $1, i32 0
br to @2 if i32 $2 = 0
@1: (precedes @2)
ret i32 1
@2: (precedes)
i32 $3 = load ptr $0
i32 $5 = load ptr $0
i32 $6 = sub i32 $5, i32 1
i32 $4 = call %factorial, i32 $6
i32 $7 = mul i32 $3, i32 $4
ret i32 $7
}
At this level functions and function calls are preserved. Each Function contains a list of basic blocks and a set of parameters which it receives and return type. The call instruction can be thought of the same way a function call in C is, parameters are passed to the called function and control is transferred to the first basic block. The return statement similarly passes the return value from the callee back to the caller, resuming from the call location.
A basic block describes a section of straight line code. Each basic block has a label which can be the target of break and jump instructions. Control can only enter a basic block from the top and only leave through usage of a break br, jump jmp, or return ret instruction. Every block has a list of blocks which it's preceded and succeeded by which, while not used currently, is useful for optimization.
There are currently 11 types of instructions defined in ./src/ir/instruction.hpp. They are meant to loosely mimic the structure of a RISC ISA while still allowing flexibility in their usage. Instructions operate on a type of virtual register with most instructions including a destination register to contain the output of their operation. Instructions may also have an "immediate" value as an operand which represents a constant or literal.
When lowering C expressions into the IR, many temporaries may be created to hold intermediate values of an expression. An expression such as d = (a + b) * c + 4 would be lowered to this series of instructions.
ptr $d = allocate 32
i32 $0 = load ptr $a
i32 $1 = load ptr $b
i32 $2 = add i32 $0, i32 $1
i32 $3 = load ptr $c
i32 $4 = mul i32 $2, i32 $3
i32 $5 = add i32 $4, i32 4
store i32 $5 at ptr $d
Control structures on the other hand are lowered by creating new basic blocks and jumps. For example an if statement is lowered like so:
int a = 3;
if (a == 3) {
puts('A');
} else {
puts('B');
}
@0: (precedes @1 @2)
ptr $a = allocate 32
store i32 3 at ptr $a
i32 $0 = load ptr $a
i32 $1 = cmp eq i32 $0, i32 3
br to @2 if i32 $1 = 0
@1: (precedes @3)
i32 $2 = call %puts, i8 65
jmp to @3
@2: (precedes @3)
i32 $3 = call %puts, i8 66
@3: (precedes)