# Overview 

IR (intermediate representation): the compiler's internal representation of the program. It contains all the facts the compiler derives about the program. A compiler may use a single IR or a sequence of IRs.

An IR can be:
- tree
- graph
- linear
- SSA

The IR is often augmented with symbol tables, which can be considered part of the IR.

Three categories:
- graphical IR: encodes the program as a graph
- linear IR: similar to pseudo-code for an abstract machine
- hybrid IR: combines both, e.g., linear instructions within blocks and a graph for control flow

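Abstraction level: a near-source IR may use a single operation for a construct that needs several assembly instructions (e.g., an array access). A low-level IR is close to assembly, so one source construct becomes several IR operations, each roughly matching one machine instruction.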

# Graphical IRs

## Trees

A parse tree is one form of graphical IR. It represents the source code in full and is much larger than the original text: it has one node for each grammar symbol in the derivation.  

The abstract syntax tree (AST) is another near-source-level form. It keeps the essential structure of the parse tree but drops the extraneous nodes.  

A directed acyclic graph (DAG) is an AST in which identical subtrees are shared instead of duplicated, so it is more compact than the AST.  

Trees can also be lower level and closer to the machine, with nodes representing constants, registers, memory addresses, and dereferences.  

Trees are good at representing source code but less useful for representing other properties, because of their rigid structure.  
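
A minimal sketch of how a compiler might build a DAG instead of a tree. It assumes a hypothetical `Node` class and `DagBuilder` helper. Before creating a node, the builder looks up `(operator, left child, right child)` in a hash table and reuses a matching node if there is one (hash-consing). For `a * 2 + a * 2 * b`, the subtree for `a * 2` is then built only once.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)  # identity semantics: two nodes are "the same" only if shared
class Node:
    op: str                          # operator, or the leaf's name/constant
    left: Optional["Node"] = None
    right: Optional["Node"] = None


class DagBuilder:
    """Hypothetical builder: structurally identical subtrees become one shared node."""

    def __init__(self) -> None:
        self.table: dict = {}        # (op, id(left), id(right)) -> existing node

    def make(self, op: str, left: Optional[Node] = None,
             right: Optional[Node] = None) -> Node:
        # Children are already shared, so their identities are a sufficient key.
        key = (op, id(left), id(right))
        if key not in self.table:
            self.table[key] = Node(op, left, right)
        return self.table[key]


# a * 2 + a * 2 * b, where * binds tighter than + and is left-associative
d = DagBuilder()
t1 = d.make("*", d.make("a"), d.make("2"))
t2 = d.make("*", d.make("*", d.make("a"), d.make("2")), d.make("b"))
root = d.make("+", t1, t2)
```

The tree for this expression has 9 nodes. The DAG has 6 (`a`, `2`, `b`, two `*`, one `+`), because `t2.left` is the same object as `t1`.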

## Graphs

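Control-flow graph (CFG): its nodes are basic blocks. A basic block is a maximal-length sequence of branch-free code.  
It models the flow of control between basic blocks: $G = (N, E)$, where each $n \in N$ is a basic block and each edge $e = (n_i, n_j) \in E$ is a possible transfer of control from block $n_i$ to block $n_j$.  
We assume every CFG has a unique entry node $n_0$ and a unique exit node $n_f$. If a procedure has multiple entries or exits, the compiler can add an artificial entry or exit node and connect it to them with extra edges.  
A CFG is usually used together with another IR that represents the operations inside each block.  
Nodes can also be smaller than basic blocks, e.g., one node per statement. The graph then has more nodes and edges and takes longer to traverse.  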
CFGs needed for many tasks: IR optimizations, instruction scheduling, register allocation.  

Data-dependence graph: models the flow of values from their point of creation to their points of use. A node is an operation (e.g., a statement), and an edge is the flow of a single value from its definition to a use.  
Edges represent constraints on the order in which operations can execute.  
It is usually used together with another IR, notably for instruction scheduling.  
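
A minimal sketch of building the definition-to-use edges for one branch-free block. It assumes a hypothetical encoding of each operation as `(destination, operator, sources)`. Only flow (true) dependences are built. A real scheduler would also need anti- and output-dependence edges and edges for memory operations, which are not shown.

```python
def build_ddg(ops):
    """ops: list of (dest, operator, sources). Returns the set of (def_index, use_index) edges."""
    last_def = {}   # name -> index of the operation that most recently defined it
    edges = set()
    for i, (dest, _op, sources) in enumerate(ops):
        for src in sources:
            # Names defined before the block (parameters, constants) produce no edge.
            if src in last_def:
                edges.add((last_def[src], i))
        last_def[dest] = i
    return edges


ops = [
    ("t1", "load", ["a"]),          # 0
    ("t2", "mult", ["t1", "2"]),    # 1
    ("t3", "load", ["b"]),          # 2
    ("t4", "add", ["t2", "t3"]),    # 3
]
edges = build_ddg(ops)
```

The resulting edges are `(0, 1)`, `(1, 3)` and `(2, 3)`. Operations 0 and 2 have no edge between them, so a scheduler may run them in either order.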

Call graph: represents transfers of control between procedures. Each node is a procedure, and each edge is a call from one procedure to another.  
Construction is difficult because of separate compilation, function pointers, and dynamic dispatch.

# Linear IR

Resembles assembly code for an abstract machine.  
Control flow is usually expressed with conditional branches and jumps.  
Basic blocks can be identified in a linear IR: they end at branches and jumps.  

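Kinds of linear IR:
- 1-address code: models a stack machine; compact code
- 2-address code: two operands, one of which is overwritten by the result (destructive operations)
- 3-address code: three operands, two inputs and one output. Popular because it resembles RISC instruction sets

Stack-machine code and 3-address code are the most popular.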

## Stack-Machine

Operands live on a stack: instructions pop their operands from the stack and push their result onto it.  
Stack-machine code is easy to generate and to execute.  
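
A minimal sketch of an interpreter for a tiny stack-machine IR. The instruction names (`push`, `add`, `subtract`, `multiply`) and the convention that the top of the stack is the right operand are assumptions for this illustration. The program below computes `a - 2 * b`.

```python
import operator

BINARY_OPS = {"add": operator.add, "subtract": operator.sub, "multiply": operator.mul}


def run_stack_code(code, env):
    """Interpret a list of stack instructions; variable values come from env."""
    stack = []
    for instr in code:
        op, *args = instr.split()
        if op == "push":
            operand = args[0]
            # Integer literals are pushed as-is; anything else is a variable name.
            value = int(operand) if operand.lstrip("-").isdigit() else env[operand]
            stack.append(value)
        elif op in BINARY_OPS:
            right = stack.pop()   # the top of the stack is the right operand
            left = stack.pop()
            stack.append(BINARY_OPS[op](left, right))
        else:
            raise ValueError(f"unknown instruction: {instr}")
    return stack.pop()            # the result is left on top of the stack


program = ["push a", "push 2", "push b", "multiply", "subtract"]
result = run_stack_code(program, {"a": 10, "b": 3})
```

No instruction names its operands, which is what keeps stack code compact. With `a = 10` and `b = 3` the result is `4`.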

## 3-address code

Most operations have two inputs and one output. The code is usually still fairly compact. The compiler is free to reuse names and values, and there are no destructive operations, because the result goes to its own name.  
3-address operations are often low level.

### Representing Linear Codes

3-address code is often implemented as a set of quadruples (operator, two operands, result).  
The quadruples of a block can be stored in a linked list or in a short array for each basic block.
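
A minimal sketch of generating quadruples from an expression tree and storing one basic block as a short array (a Python list). The `Quad` class, the `gen_quads` helper, the nested-tuple tree encoding and the `t1, t2, ...` temporary names are hypothetical choices for this illustration.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional


@dataclass
class Quad:
    op: str
    arg1: str
    arg2: Optional[str]   # None for single-operand operations
    result: str


def gen_quads(expr, quads, temps):
    """Emit quads for expr (a name, or (op, left, right)); return the name holding its value."""
    if isinstance(expr, str):
        return expr                      # leaves are variable names or constants
    op, left, right = expr
    lhs = gen_quads(left, quads, temps)
    rhs = gen_quads(right, quads, temps)
    result = f"t{next(temps)}"           # a fresh temporary for each operation
    quads.append(Quad(op, lhs, rhs, result))
    return result


block = []  # one basic block stored as a short array of quadruples
value = gen_quads(("-", "a", ("*", "2", "b")), block, count(1))
```

For `a - 2 * b` this produces two quadruples, `(*, 2, b, t1)` and `(-, a, t1, t2)`, and the value ends up in `t2`.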

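### Building a Control-Flow Graph from Linear Code

Build one CFG for every procedure.  
First find where each basic block begins and ends: a block begins at a labeled operation or at the operation after a branch, and ends at a branch or just before the next block's first operation. Then add edges for branch targets and for fall-through between blocks.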
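
A minimal sketch of this two-step construction on a non-empty linear code. It assumes a hypothetical `Instr` record that states its label, its branch targets, and whether it ends a block. To stay simple, the sketch treats every labeled operation as a block start, even if nothing branches to it. A branch with no targets (e.g. a return) gets no successor edges here. A full CFG would connect it to the exit node $n_f$.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instr:
    text: str                    # the operation itself, kept opaque here
    label: Optional[str] = None  # label defined at this operation, if any
    targets: tuple = ()          # labels this branch/jump may transfer control to
    is_branch: bool = False      # True for jumps, conditional branches, returns


def build_cfg(code):
    # Step 1: find leaders, the first operation of each basic block.
    leaders = {0}
    for i, ins in enumerate(code):
        if ins.label is not None:
            leaders.add(i)               # a labeled operation may be a branch target
        if ins.is_branch and i + 1 < len(code):
            leaders.add(i + 1)           # the operation after a branch starts a block
    starts = sorted(leaders)
    # Step 2: each block runs from its leader up to the next leader.
    blocks = [list(range(s, e)) for s, e in zip(starts, starts[1:] + [len(code)])]
    block_of_label = {code[b[0]].label: n for n, b in enumerate(blocks) if code[b[0]].label}
    # Step 3: add edges based on each block's last operation.
    edges = set()
    for n, b in enumerate(blocks):
        last = code[b[-1]]
        if last.is_branch:
            edges.update((n, block_of_label[t]) for t in last.targets)
        elif n + 1 < len(blocks):
            edges.add((n, n + 1))        # fall through into the next block
    return blocks, edges


loop = [
    Instr("i = 0"),
    Instr("t = i < n", label="L1"),
    Instr("cbr t -> L2, L3", targets=("L2", "L3"), is_branch=True),
    Instr("i = i + 1", label="L2"),
    Instr("jump -> L1", targets=("L1",), is_branch=True),
    Instr("return", label="L3", is_branch=True),
]
blocks, edges = build_cfg(loop)
```

For this loop the sketch finds four blocks, `[0]`, `[1, 2]`, `[3, 4]` and `[5]`. It adds a back edge from block 2 to the loop header, block 1.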

# Mapping values to names

The compiler must assign names to the values computed during execution, and it needs names for many, if not all, intermediate results. The choice of names largely determines what later optimizations can do.

## Naming Temporary Values

## Static Single-Assignment Form

SSA can encode information about both control flow and the flow of data values. Each name is defined exactly once, so each use of a name in a later operation identifies where its value originated.  
To write a program with control flow in SSA form, $\phi$-functions are needed at the points where control-flow paths meet.  
SSA constraints:
1. Each definition has a different name
2. Each use refers to a single definition

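To transform an IR into SSA form, the compiler inserts $\phi$-functions and renames values.  
There may be one or more $\phi$-functions at the top of a basic block, with no other instruction before them. The $\phi$-functions of a block execute concurrently, so the compiler can reorder them.  
SSA is widely used for optimizations.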
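
A minimal sketch of the renaming step for branch-free code only. No $\phi$-functions are needed here, because there are no join points. The full construction also places $\phi$-functions where paths meet and renames across blocks, and it is not shown. The `(dest, op, sources)` encoding and the `name_version` naming scheme are assumptions.

```python
def rename_straight_line(ops):
    """Rename a branch-free block of (dest, op, sources) so each name is defined once."""
    version = {}   # base name -> latest version number
    current = {}   # base name -> SSA name of its most recent definition
    renamed = []
    for dest, op, sources in ops:
        # Rename uses first, so "x = x * 2" reads the previous version of x.
        new_sources = [current.get(s, s) for s in sources]  # undefined names keep their name
        version[dest] = version.get(dest, -1) + 1
        current[dest] = f"{dest}_{version[dest]}"
        renamed.append((current[dest], op, new_sources))
    return renamed


block = [
    ("x", "add", ["a", "b"]),
    ("x", "mult", ["x", "2"]),
    ("y", "add", ["x", "a"]),
]
ssa = rename_straight_line(block)
```

After renaming, `x` is split into `x_0` and `x_1`, and the use of `x` in the last operation refers unambiguously to `x_1`.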

## Memory Models

The compiler must also decide where values will reside: register or memory.

Two models:
- register-to-register: keep values in registers, ignoring the target machine's limit on the number of registers.
- memory-to-memory: keep all values in memory, and load each value into a register just before it is used.

reg2reg: the IR may use more registers than the machine has. The register allocator maps some of them to physical registers and spills the others to memory, which requires extra loads and stores.  
mem2mem: the IR uses fewer registers than available. The register allocator looks for values in memory that can be kept in registers, removing loads and stores.  

RISC compilers usually use reg2reg, because it is closer to the ISA, which has few memory-to-memory operations. Moreover, in a reg2reg IR, the fact that a value sits in a register shows that it can safely stay in a register. With mem2mem, the compiler must compute this through analysis.

# Symbol Tables

The compiler needs to record many kinds of information (e.g., variable names, types, functions, struct fields). It may record this information directly in the IR.  
Another approach is to use a central repository, the symbol table, which is considered part of the IR.  
Efficient access is critical.

## Hash Tables

Hash tables give expected constant-time access: the information for name $n$ is stored in entry $h(n)$.  
Hash tables can also represent sparse graphs: the entry for key $xy$ holds information about edge $(x,y)$.

## Building a Symbol Table

Interface:
- Lookup(name) returns the stored record
- Insert(name, record) stores a new record

While processing the syntax, the compiler can build attributes for each variable and function and store them in the table. It can also detect multiple declarations, uses of undeclared variables, and misuses.
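
A minimal sketch of the two-operation interface on top of a hash table (a Python `dict`). The `DeclarationError` exception and the record format are hypothetical. The table rejects a second declaration of the same name, and `lookup` returns `None` for an undeclared name so that the caller can report it.

```python
class DeclarationError(Exception):
    """Raised when a name is declared twice (hypothetical error type)."""


class SymbolTable:
    def __init__(self):
        self._entries = {}   # hash table: name -> record

    def insert(self, name, record):
        if name in self._entries:
            raise DeclarationError(f"multiple declaration of {name!r}")
        self._entries[name] = record

    def lookup(self, name):
        # None signals an undeclared name; the caller decides how to report it.
        return self._entries.get(name)


table = SymbolTable()
table.insert("x", {"kind": "variable", "type": "int"})
table.insert("f", {"kind": "function", "returns": "float"})
```

After these inserts, `table.lookup("x")["type"]` is `"int"` and `table.lookup("y")` is `None`.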

## Handling Nested Scopes

When referencing a name, we need to find the one for the right scope and lifetime. This is done with a scoped symbol table.  
Name resolution: map each variable reference to its declaration.  

Create a new symbol table when entering a new scope. Insert operates on the current scope's table. Lookup searches the current scope first, then each enclosing scope in turn, out to the outermost scope.  
This requires two more operations: InitializeScope and FinalizeScope.
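
A minimal sketch of a scoped symbol table as a stack of hash tables. The method names mirror the operations above, but their exact behavior here is an assumption. `initialize_scope` pushes a fresh table. `finalize_scope` pops it. `lookup` walks from the innermost scope outward, so an inner declaration shadows an outer one.

```python
class ScopedSymbolTable:
    def __init__(self):
        self._scopes = [{}]              # _scopes[0] is the outermost (global) scope

    def initialize_scope(self):
        self._scopes.append({})          # entering a block: push a fresh table

    def finalize_scope(self):
        if len(self._scopes) == 1:
            raise RuntimeError("cannot finalize the outermost scope")
        self._scopes.pop()               # leaving the block discards its names

    def insert(self, name, record):
        self._scopes[-1][name] = record  # always insert into the current scope

    def lookup(self, name):
        # Search from the innermost scope outward; the first match wins.
        for scope in reversed(self._scopes):
            if name in scope:
                return scope[name]
        return None


st = ScopedSymbolTable()
st.insert("x", "global int")
st.initialize_scope()
st.insert("x", "local float")
inner = st.lookup("x")       # the inner declaration shadows the global one
st.finalize_scope()
outer = st.lookup("x")       # back to the global declaration
```

Here `inner` is `"local float"` and `outer` is `"global int"`.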

## The Many Uses for Symbol Tables

Symbol tables have other uses besides the central symbol table.  

### Structure table

Stores the fields, methods, and types for a struct or class definition.  
Implementations:
- one table per struct definition: the cleanest approach, but uses more space.  
- selector table: one table for the field names of all classes. Entries must include the class id to avoid name clashes.
- unified table: store field names in the main symbol table.

The first one is the simplest.
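
A minimal sketch contrasting the first two layouts, with hypothetical struct names `Point` and `Pair` and field types stored as plain strings. The selector table adds the struct name to each key, so both structs can have a field `x` without a clash.

```python
# Approach 1: one table per struct definition.
struct_tables = {
    "Point": {"x": "int", "y": "int"},
    "Pair": {"x": "float", "first": "int"},
}


def field_type_separate(struct, field):
    return struct_tables[struct].get(field)


# Approach 2: a single selector table keyed by (struct name, field name).
selector_table = {
    (struct, field): ty
    for struct, fields in struct_tables.items()
    for field, ty in fields.items()
}


def field_type_selector(struct, field):
    return selector_table.get((struct, field))
```

Both lookups give the same answers, for example `"float"` for field `x` of `Pair`. The per-struct layout keeps each definition's fields together, while the selector table needs only one hash table.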

### Linked table

For name resolution in OOP, a class may inherit fields from its superclass.  
Name resolution in a method: first look in the scoped symbol tables for the method body, then look for fields in the class hierarchy, following the links from the class's own table to each superclass's table.
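
A minimal sketch of this lookup order, assuming a hypothetical `ClassTable` in which each class's field table holds a link to its superclass's table. The method body's scopes are a list of dicts, innermost last. A local declaration therefore shadows a field, and a field declared in a subclass shadows one inherited from a superclass.

```python
class ClassTable:
    def __init__(self, name, fields, superclass=None):
        self.name = name
        self.fields = fields            # field name -> record, for this class only
        self.superclass = superclass    # link to the parent's table; None at the root


def resolve(name, method_scopes, cls):
    # 1. Scoped tables of the method body, innermost first.
    for scope in reversed(method_scopes):
        if name in scope:
            return scope[name]
    # 2. Fields of the class, then of each superclass, following the links.
    while cls is not None:
        if name in cls.fields:
            return cls.fields[name]
        cls = cls.superclass
    return None                         # undeclared name


shape = ClassTable("Shape", {"color": "field Shape.color"})
circle = ClassTable("Circle", {"radius": "field Circle.radius"}, superclass=shape)
scopes = [{"scale": "parameter"}, {"tmp": "local"}]
```

With these tables, `resolve("color", scopes, circle)` walks past `Circle` and finds the field inherited from `Shape`.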