## **Intermediate Code Generation**

Ronghui Gu Spring 2019

Columbia University

<sup>\*</sup> Course website: https://www.cs.columbia.edu/~rgu/courses/4115/spring2019

<sup>\*\*</sup> These slides are borrowed from Prof. Edwards.

#### **Intermediate Code Generation**

#### **Intermediate Representation (IR):**

- An abstract machine language
- Not specific to any particular machine
- Independent of source language

#### IR code generation is not necessary:

- The semantic analysis phase can generate assembly code directly.
- Skipping the IR, however, hinders portability and modularity.

# **Intermediate Representation**

Suppose we wish to build compilers for n source languages and m target machines.

**Case 1: no IR.** Need  $n \times m$  compilers.



**Case 2: IR present.** Need just $n$ front ends and $m$ back ends, i.e. $n + m$ components.
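As a quick worked instance of the two counts (the values $n = 4$ and $m = 3$ are chosen only for illustration):

```latex
\[
\underbrace{n \times m}_{\text{no IR}} = 4 \times 3 = 12
\qquad\text{versus}\qquad
\underbrace{n + m}_{\text{with IR}} = 4 + 3 = 7 .
\]
```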



## **IR properties**

- Must be convenient for semantic analysis phase to produce.
- Must be convenient to translate into real assembly code for all desired target machines.

**Intermediate Representations/Formats**

## Stack-Based IR: Java Bytecode

```
int gcd(int a, int b) {
  while (a != b) {
    if (a > b)
        a -= b;
    else
        b -= a;
  }
  return a;
}
```

```
Method int gcd(int, int)
   0 goto 19
   3 iload_1      // Push a
   4 iload_2      // Push b
   5 if_icmple 15 // if a <= b goto 15
   8 iload_1      // Push a
   9 iload_2      // Push b
  10 isub         // a - b
  11 istore_1     // Store new a
  12 goto 19
  15 iload_2      // Push b
  16 iload_1      // Push a
  17 isub         // b - a
  18 istore_2     // Store new b
  19 iload_1      // Push a
  20 iload_2      // Push b
  21 if_icmpne 3  // if a != b goto 3
  24 iload_1      // Push a
  25 ireturn      // Return a
```















#### **Stack-Based IRs**

#### **Advantages:**

- Trivial translation of expressions
- Trivial interpreters
- No problems with exhausting registers
- Often compact

#### **Disadvantages:**

- Semantic gap between stack operations and modern register machines
- Hard to see what communicates with what
- Difficult representation for optimization
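To make the "trivial interpreters" point concrete, here is a minimal sketch of a stack-machine interpreter in Python. The opcode names (`load`, `store`, `sub`, `goto`, `if_le`, `if_ne`, `ret`) and the `run_stack_code` function are hypothetical simplifications, not real JVM instructions; jump targets are list indices rather than bytecode offsets. The program mirrors the structure of the gcd bytecode shown earlier.

```python
def run_stack_code(code, env):
    """Run a list of (opcode, operand) pairs on an operand stack."""
    stack = []
    pc = 0
    while True:
        op, arg = code[pc]
        pc += 1
        if op == "load":            # push the value of a variable
            stack.append(env[arg])
        elif op == "store":         # pop the top of stack into a variable
            env[arg] = stack.pop()
        elif op == "sub":           # pop right, pop left, push left - right
            right = stack.pop()
            left = stack.pop()
            stack.append(left - right)
        elif op == "goto":
            pc = arg
        elif op in ("if_le", "if_ne"):
            right = stack.pop()
            left = stack.pop()
            taken = left <= right if op == "if_le" else left != right
            if taken:
                pc = arg
        elif op == "ret":
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode: {op}")


# gcd, laid out like the bytecode above (index in the trailing comment)
GCD = [
    ("goto", 13),    # 0
    ("load", "a"),   # 1
    ("load", "b"),   # 2
    ("if_le", 9),    # 3  if a <= b goto 9
    ("load", "a"),   # 4
    ("load", "b"),   # 5
    ("sub", None),   # 6  a - b
    ("store", "a"),  # 7
    ("goto", 13),    # 8
    ("load", "b"),   # 9
    ("load", "a"),   # 10
    ("sub", None),   # 11 b - a
    ("store", "b"),  # 12
    ("load", "a"),   # 13
    ("load", "b"),   # 14
    ("if_ne", 1),    # 15 if a != b goto 1
    ("load", "a"),   # 16
    ("ret", None),   # 17
]

print(run_stack_code(GCD, {"a": 12, "b": 18}))  # prints 6
```

The whole interpreter is one loop over a list and one Python list used as the stack, which is the sense in which such interpreters are trivial.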

## **Register-Based IR: Mach SUIF**

```
int gcd(int a, int b)
{
  while (a != b) {
    if (a > b)
        a -= b;
    else
        b -= a;
  }
  return a;
}
```

```
gcd:
gcd._gcdTmp0:
 sne $vr1.s32 <- gcd.a,gcd.b
 seq $vr0.s32 <- $vr1.s32,0
 btrue $vr0.s32,gcd._gcdTmp1 // if !(a != b) goto Tmp1
 sl $vr3.s32 <- gcd.b,gcd.a
 seq $vr2.s32 <- $vr3.s32,0
 btrue $vr2.s32,gcd._gcdTmp4 // if !(a > b) goto Tmp4
 mrk 2, 4 // Line number 4
 sub $vr4.s32 <- gcd.a,gcd.b
 mov gcd._gcdTmp2 <- $vr4.s32
 mov gcd.a <- gcd._gcdTmp2 // a = a - b
 jmp gcd._gcdTmp5
gcd._gcdTmp4:
 mrk 2, 6
 sub $vr5.s32 <- gcd.b,gcd.a
 mov gcd._gcdTmp3 <- $vr5.s32
 mov gcd.b <- gcd._gcdTmp3 // b = b - a
gcd._gcdTmp5:
 jmp gcd._gcdTmp0
gcd._gcdTmp1:
 mrk 2, 8
 ret gcd.a // Return a
```

# **Register-Based IRs**

### Most common type of IR

#### **Advantages:**

- Better representation for register machines
- Dataflow is usually clear

#### **Disadvantages:**

- Slightly harder to synthesize from code
- Less compact
- More complicated to interpret

# Three-Address Code & Static Single Assignment

Most register-based IRs use **three-address code**: Arithmetic instructions have (up to) three operands: two sources and one destination.

SSA Form: each variable in an IR is assigned exactly once

#### C code:

```
int gcd(int a, int b)
{
  while (a != b)
    if (a < b)
      b -= a;
    else
      a -= b;
  return a;
}
```

#### Three-address code:

```
WHILE: t = sne a, b
       bz DONE, t
       t = slt a, b
       bz ELSE, t
       b = sub b, a
       jmp LOOP
ELSE:  a = sub a, b
LOOP:  jmp WHILE
DONE:  ret a
```

#### SSA:

```
WHILE: t1 = sne a1, b1
       bz DONE, t1
       t2 = slt a1, b1
       bz ELSE, t2
       b1 = sub b1, a1
       jmp LOOP
ELSE:  a1 = sub a1, b1
LOOP:  jmp WHILE
DONE:  ret a1
```
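The "assigned exactly once" property can be checked mechanically by counting how often each name appears on the left of `=`. The sketch below is a hypothetical helper (`assignment_counts` is not from the slides) that looks only at assignments written in a listing. It ignores the incoming values of function parameters and does not attempt to construct SSA form.

```python
from collections import Counter


def assignment_counts(lines):
    """Count textual assignments per destination name in a TAC listing."""
    counts = Counter()
    for line in lines:
        body = line.split(":", 1)[-1]          # drop an optional "LABEL:" prefix
        if "=" in body:
            dest = body.split("=", 1)[0].strip()
            counts[dest] += 1
    return counts


THREE_ADDRESS = [
    "WHILE: t = sne a, b", "bz DONE, t", "t = slt a, b", "bz ELSE, t",
    "b = sub b, a", "jmp LOOP", "ELSE: a = sub a, b", "LOOP: jmp WHILE",
    "DONE: ret a",
]
RENAMED = [
    "WHILE: t1 = sne a1, b1", "bz DONE, t1", "t2 = slt a1, b1", "bz ELSE, t2",
    "b1 = sub b1, a1", "jmp LOOP", "ELSE: a1 = sub a1, b1", "LOOP: jmp WHILE",
    "DONE: ret a1",
]

print(assignment_counts(THREE_ADDRESS)["t"])         # prints 2
print(max(assignment_counts(RENAMED).values()))      # prints 1
```

In the plain three-address listing the temporary `t` is written twice; after renaming to `t1` and `t2`, no name is written more than once in the listing.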

# Three-Address Code

#### **Address**

#### What is an "Address" in Three-Address Code?

- Name: (from the source program) e.g., x, y, z
- Constant: (with explicit primitive type) e.g., 1, 2, 'a'
- Compiler-generated temporary: ("register") e.g., t1, t2, t3

#### **Instructions of Three-Address Code**

- x = op y, z: where op is a binary operation
- x = op y: where op is a unary operation
- x = y: copy operation
- jmp L: unconditional jump to label L
- bz L, x: jump to L if x is zero
- bnz L, x: jump to L if x is not zero
- param x, call L, y, return z: function calls
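One way to pin down what these instructions mean is a small interpreter. The following Python sketch assumes a tuple encoding that is purely illustrative (`("binop", dst, op, y, z)`, `("copy", dst, y)`, `("jmp", L)`, `("bz", L, x)`, `("bnz", L, x)`, `("ret", x)`); unary operations and `param`/`call` are left out to keep it short.

```python
import operator

# Binary operations used in these slides; comparison results are 0 or 1.
BINOPS = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
    "sne": lambda y, z: int(y != z),
    "slt": lambda y, z: int(y < z),
}


def run_tac(program, env):
    """Execute a list of (label_or_None, instruction) pairs."""
    labels = {label: i for i, (label, _) in enumerate(program) if label}

    def value(addr):
        # An address is either an integer constant or a variable name.
        return addr if isinstance(addr, int) else env[addr]

    pc = 0
    while pc < len(program):
        _, instr = program[pc]
        op, args = instr[0], instr[1:]
        pc += 1
        if op == "binop":
            dst, name, y, z = args
            env[dst] = BINOPS[name](value(y), value(z))
        elif op == "copy":
            dst, y = args
            env[dst] = value(y)
        elif op == "jmp":
            pc = labels[args[0]]
        elif op == "bz":
            if value(args[1]) == 0:
                pc = labels[args[0]]
        elif op == "bnz":
            if value(args[1]) != 0:
                pc = labels[args[0]]
        elif op == "ret":
            return value(args[0])
        else:
            raise ValueError(f"unknown instruction: {op}")
    return None


GCD_TAC = [
    ("WHILE", ("binop", "t", "sne", "a", "b")),
    (None,    ("bz", "DONE", "t")),
    (None,    ("binop", "t", "slt", "a", "b")),
    (None,    ("bz", "ELSE", "t")),
    (None,    ("binop", "b", "sub", "b", "a")),
    (None,    ("jmp", "LOOP")),
    ("ELSE",  ("binop", "a", "sub", "a", "b")),
    ("LOOP",  ("jmp", "WHILE")),
    ("DONE",  ("ret", "a")),
]

print(run_tac(GCD_TAC, {"a": 12, "b": 18}))  # prints 6
```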

# **Three-Address Code (TAC) Generation**

Goal: take statements (AST) and produce a sequence of TAC.

```
Example:
    a := b + c * d;

TAC:
    t1 = mul c, d
    t2 = add b, t1
    a = t2
```

Translate expressions and statements

**Translating Expressions** 

# **Example**

For b + c * d, the instructions are emitted in this order:

t1 = mul c, d

t2 = add b, t1

## Algorithm: Syntax-Directed Translation (SDT)

For each expression **E**, we'll synthesize two attributes:

- E.addr: the name of the variable (often a temporary variable)
- E.code: the IR instructions generated from E

SDT: each semantic rule corresponds to actions that compute these two attributes, using the following auxiliary functions:

- Call NewTemp to create a new temporary variable
- Call Gen to build the text of one IR instruction from its arguments

```
CFG rule: E_0 → id
Actions:
    E_0.addr := id
    E_0.code := ""    (empty string)
```

We do not consider scopes here.

```
Example: E_0 = ID("a")
    E_0.addr := "a"
    E_0.code := ""    (empty string)
```

```
CFG rule: E_0 → E_1 + E_2
Actions:
    E_0.addr := NewTemp()
    E_0.code := E_1.code || E_2.code ||
                Gen(E_0.addr, "=", "add", E_1.addr, ",", E_2.addr)
Example: a + b
    E_0 = PLUS(E_1, E_2)    E_1 = ID("a")    E_2 = ID("b")
    E_1.addr := "a"    E_1.code := ""
    E_2.addr := "b"    E_2.code := ""
    E_0.addr := "t1"
    E_0.code := "t1 = add a, b"
```

```
Example: b + c * d
    E_0 = PLUS(E_1, E_2)    E_1 = ID("b")    E_2 = MUL(ID("c"), ID("d"))

Apply the rule for E_0:
    E_0.code := E_1.code || E_2.code ||
                Gen(E_0.addr, "=", "add", E_1.addr, ",", E_2.addr)

Translate E_1 (E_1.addr = "b", E_1.code = ""):
    E_0.code := "" || E_2.code ||
                Gen(E_0.addr, "=", "add", E_1.addr, ",", E_2.addr)

Translate E_2 (E_2.addr = "t1", E_2.code = "t1 = mul c, d"):
    E_0.code := "" || "t1 = mul c, d" ||
                Gen(E_0.addr, "=", "add", E_1.addr, ",", E_2.addr)

Create the temporary for E_0 (E_0.addr = NewTemp() = "t2"):
    E_0.code := "" || "t1 = mul c, d" ||
                Gen("t2", "=", "add", E_1.addr, ",", E_2.addr)

Substitute the operand addresses:
    E_0.code := "" || "t1 = mul c, d" ||
                Gen("t2", "=", "add", "b", ",", "t1")

Result:
    E_0.code := "" || "t1 = mul c, d" || "t2 = add b, t1"
```
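The expression rules can be sketched as a recursive Python function that returns the pair (addr, code). The tuple-based AST (`("id", name)`, `("add", e1, e2)`, `("mul", e1, e2)`), the `TempFactory` class, and the way `gen` joins its arguments are assumptions made for this illustration, not the course's implementation. The temporary for a node is requested after its children are translated, which reproduces the numbering used in the example (t1 for the product, t2 for the sum).

```python
import itertools


class TempFactory:
    """Hands out fresh temporary names t1, t2, ... (plays the role of NewTemp)."""

    def __init__(self):
        self._counter = itertools.count(1)

    def new_temp(self):
        return f"t{next(self._counter)}"


def gen(*parts):
    """Plays the role of Gen: join the pieces into one instruction string."""
    return " ".join(parts).replace(" ,", ",")


def translate_expr(node, temps):
    """Return (addr, code); code is a list of three-address instructions."""
    kind = node[0]
    if kind == "id":
        # E -> id: the address is the name itself, and no code is emitted.
        return node[1], []
    if kind in ("add", "sub", "mul"):
        addr1, code1 = translate_expr(node[1], temps)
        addr2, code2 = translate_expr(node[2], temps)
        addr0 = temps.new_temp()
        # E0.code := E1.code || E2.code || Gen(E0.addr, "=", op, E1.addr, ",", E2.addr)
        instr = gen(addr0, "=", kind, addr1, ",", addr2)
        return addr0, code1 + code2 + [instr]
    raise ValueError(f"unknown expression node: {kind}")


ast = ("add", ("id", "b"), ("mul", ("id", "c"), ("id", "d")))
addr, code = translate_expr(ast, TempFactory())
print(addr)              # prints t2
print("\n".join(code))   # prints "t1 = mul c, d" then "t2 = add b, t1"
```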

**Translating Statements** 

## **Assignment**

```
CFG rule: S → id := E

Actions:
    S.code := E.code || Gen(id, "=", E.addr)

Example: a := b + c
    S = ASG(ID("a"), E)    E = PLUS(ID("b"), ID("c"))
    E.code := "t1 = add b, c"    E.addr := "t1"
    S.code := "t1 = add b, c" || "a = t1"
```

### **IF Statement**

```
AST: IF(E, S)
Generated IR:
         E.code
         bz Label_End, E.addr
         S.code
    Label_End:
Example: if (a > b) { a -= b }
         t1 = slt b, a    // b < a, i.e. a > b
         bz Label_End, t1
         a = sub a, b
    Label_End:
```

#### **IF-ELSE Statement**

```
AST: IFELSE(E, S_1, S_2)
Generated IR:
         E.code
         bz Label_Else, E.addr
         S_1.code
         jmp Label_End
    Label_Else:
         S_2.code
    Label_End:
```

## Loop

```
AST: WHILE(E, S)
Generated IR:
    Label_While:
        E.code
        bz Label_End, E.addr
        S.code
        jmp Label_While
    Label_End:
```
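The statement templates (assignment, IF, IF-ELSE, WHILE) can be sketched the same way. In this Python sketch the AST encoding, the `Names` helper, and the numbered label names such as `Label_End1` are all hypothetical; numbering the labels is one simple way to keep nested statements from sharing a label.

```python
import itertools


class Names:
    """Fresh temporaries and label numbers (illustrative helper)."""

    def __init__(self):
        self._temps = itertools.count(1)
        self._labels = itertools.count(1)

    def new_temp(self):
        return f"t{next(self._temps)}"

    def new_label_id(self):
        return next(self._labels)


def translate_expr(node, names):
    """Return (addr, code) for ("id", x) or (op, left, right)."""
    if node[0] == "id":
        return node[1], []
    op, left, right = node
    addr1, code1 = translate_expr(left, names)
    addr2, code2 = translate_expr(right, names)
    temp = names.new_temp()
    return temp, code1 + code2 + [f"{temp} = {op} {addr1}, {addr2}"]


def translate_stmt(node, names):
    """Return the list of IR lines for one statement."""
    kind = node[0]
    if kind == "asg":                      # ("asg", target, expr)
        _, target, expr = node
        addr, code = translate_expr(expr, names)
        return code + [f"{target} = {addr}"]
    n = names.new_label_id()
    if kind == "if":                       # ("if", cond, body)
        _, cond, body = node
        addr, code = translate_expr(cond, names)
        return (code + [f"bz Label_End{n}, {addr}"]
                + translate_stmt(body, names) + [f"Label_End{n}:"])
    if kind == "ifelse":                   # ("ifelse", cond, then, else)
        _, cond, then_body, else_body = node
        addr, code = translate_expr(cond, names)
        return (code + [f"bz Label_Else{n}, {addr}"]
                + translate_stmt(then_body, names)
                + [f"jmp Label_End{n}", f"Label_Else{n}:"]
                + translate_stmt(else_body, names)
                + [f"Label_End{n}:"])
    if kind == "while":                    # ("while", cond, body)
        _, cond, body = node
        addr, code = translate_expr(cond, names)
        return ([f"Label_While{n}:"] + code + [f"bz Label_End{n}, {addr}"]
                + translate_stmt(body, names)
                + [f"jmp Label_While{n}", f"Label_End{n}:"])
    raise ValueError(f"unknown statement: {kind}")


# while (a != b) if (a < b) b = b - a; else a = a - b;
gcd_loop = ("while", ("sne", ("id", "a"), ("id", "b")),
            ("ifelse", ("slt", ("id", "a"), ("id", "b")),
             ("asg", "b", ("sub", ("id", "b"), ("id", "a"))),
             ("asg", "a", ("sub", ("id", "a"), ("id", "b")))))
print("\n".join(translate_stmt(gcd_loop, Names())))
```

Unlike the hand-written gcd listing, this translation routes every assignment through a fresh temporary, because it applies the assignment rule literally.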

#### **Basic Blocks**

A **Basic Block** is a sequence of IR instructions with two properties:

- The first instruction is the only entry point (no other branches in; can only start at the beginning)
- Only the last instruction may affect control (no other branches out)
- $\therefore$  If any instruction in a basic block runs, they all do

Typically "arithmetic and memory instructions, then branch"

```
ENTER: t2 = add t1, 1
t3 = slt t2, 10
bz NEXT, t3
```

# **Basic Blocks and Control-Flow Graphs**

```
WHILE: t1 = sne a1, b1     // leader
       bz DONE, t1
       t2 = slt a1, b1     // leader
       bz ELSE, t2
       b1 = sub b1, a1     // leader
       jmp LOOP
ELSE:  a1 = sub a1, b1     // leader
LOOP:  jmp WHILE           // leader
DONE:  ret a1              // leader
```

- Leaders: the first instruction, branch targets, and instructions that follow a conditional branch
- Basic blocks: start at a leader; end before the next leader
- Basic blocks are the nodes of the Control-Flow Graph
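The leader rule translates directly into code. The Python sketch below works on (label, text) pairs and uses hypothetical helper names (`find_leaders`, `basic_blocks`, `cfg_edges`). Following common textbook practice, it treats the instruction after any branch, conditional or not, as a leader; for this listing that gives the same blocks.

```python
BRANCHES = {"jmp", "bz", "bnz"}


def parse(text):
    """Return (opcode, branch_target_or_None) for one instruction."""
    op, _, rest = text.partition(" ")
    target = rest.split(",")[0].strip() if op in BRANCHES else None
    return op, target


def find_leaders(program):
    targets = {parse(text)[1] for _, text in program} - {None}
    leaders = {0}                                   # the first instruction
    for i, (label, text) in enumerate(program):
        if label in targets:                        # a branch target
            leaders.add(i)
        if parse(text)[0] in BRANCHES and i + 1 < len(program):
            leaders.add(i + 1)                      # instruction after a branch
    return sorted(leaders)


def basic_blocks(program):
    """Each block is a list of instruction indices: leader up to next leader."""
    bounds = find_leaders(program) + [len(program)]
    return [list(range(start, end)) for start, end in zip(bounds, bounds[1:])]


def cfg_edges(program, blocks):
    """Edges between block numbers: branch targets plus fall-through."""
    block_of_label = {program[b[0]][0]: k
                      for k, b in enumerate(blocks) if program[b[0]][0]}
    edges = set()
    for k, block in enumerate(blocks):
        op, target = parse(program[block[-1]][1])
        if op in BRANCHES:
            edges.add((k, block_of_label[target]))
        if op not in ("jmp", "ret") and k + 1 < len(blocks):
            edges.add((k, k + 1))                   # fall through to next block
    return edges


PROGRAM = [
    ("WHILE", "t1 = sne a1, b1"),
    (None,    "bz DONE, t1"),
    (None,    "t2 = slt a1, b1"),
    (None,    "bz ELSE, t2"),
    (None,    "b1 = sub b1, a1"),
    (None,    "jmp LOOP"),
    ("ELSE",  "a1 = sub a1, b1"),
    ("LOOP",  "jmp WHILE"),
    ("DONE",  "ret a1"),
]

blocks = basic_blocks(PROGRAM)
print(blocks)   # prints [[0, 1], [2, 3], [4, 5], [6], [7], [8]]
print(sorted(cfg_edges(PROGRAM, blocks)))
```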

#### The LLVM IR

Three-address code instructions; Static single-assignment; Explicit control-flow graph; Local names start with %; Types throughout; User-defined functions

```
int add(int x, int y)
{
   return x + y;
}
```

```
define i32 @add(i32 %x, i32 %y) {
entry:
    %x1 = alloca i32
    store i32 %x, i32* %x1
    %y2 = alloca i32
    store i32 %y, i32* %y2
    %x3 = load i32* %x1
    %y4 = load i32* %y2
    %tmp = add i32 %x3, %y4
    ret i32 %tmp
}
```

#### The LLVM IR

i32: 32-bit integer type (LLVM integer types carry no sign; signedness comes from the operation, e.g. icmp sgt)

alloca: Allocate space on the stack; return a pointer

store: Write a value to an address

load: Read a value from an address

add: Add two values to produce a third

ret: Return a value to the caller

#### **Basic Blocks**

```
int cond(bool b) {
    int x;
    if (b) x = 42;
    else x = 17;
    return x;
}
```

[Figure: CFG for 'cond' function: entry branches to then or else, and both branch to merge.]

```
define i32 @cond(i1 %b) {
entry:
  %b1 = alloca i1
  store i1 %b, i1* %b1
  %x = alloca i32
  %b2 = load i1* %b1
  br i1 %b2, label %then, label %else

merge:                                ; preds = %else, %then
  %x3 = load i32* %x
  ret i32 %x3

then:                                 ; preds = %entry
  store i32 42, i32* %x
  br label %merge

else:                                 ; preds = %entry
  store i32 17, i32* %x
  br label %merge
}
```

```
int gcd(int a, int b) {
    while (a != b)
        if (a > b) a = a - b;
        else b = b - a;
    return a;
}
```

```
define i32 @gcd(i32 %a, i32 %b) {
entry:
  %a1 = alloca i32
  store i32 %a, i32* %a1
  %b2 = alloca i32
  store i32 %b, i32* %b2
  br label %while

while:                                ; preds = %merge, %entry
  %a11 = load i32* %a1
  %b12 = load i32* %b2
  %tmp13 = icmp ne i32 %a11, %b12
  br i1 %tmp13, label %while_body, label %merge14

while_body:                           ; preds = %while
  %a3 = load i32* %a1
  %b4 = load i32* %b2
  %tmp = icmp sgt i32 %a3, %b4
  br i1 %tmp, label %then, label %else

merge:                                ; preds = %else, %then
  br label %while

then:                                 ; preds = %while_body
  %a5 = load i32* %a1
  %b6 = load i32* %b2
  %tmp7 = sub i32 %a5, %b6
  store i32 %tmp7, i32* %a1
  br label %merge

else:                                 ; preds = %while_body
  %b8 = load i32* %b2
  %a9 = load i32* %a1
  %tmp10 = sub i32 %b8, %a9
  store i32 %tmp10, i32* %b2
  br label %merge

merge14:                              ; preds = %while
  %a15 = load i32* %a1
  ret i32 %a15
}
```

