# CS202: Compiler Construction

## In-class Exercises, Week of 04/10/2023

----

# Part 1: Functions and Lfun

## Question 1

Write an `Lfun` program with a function called `add1` that adds 1 to its argument, and a call to the function.

```py
def add1(n: int) -> int:
    return n + 1

add1(5)
```

```
6
```

## Question 2

Write a recursive program to calculate the factorial of 5.

```py
def fact(n: int) -> int:
    if n == 0:
        return 1
    else:
        return n * fact(n - 1)

fact(5)
```

```
120
```

## Question 3

Summarize the changes to the compiler to support functions.

Theme: treat each function as its own mini-program.

1. Before explicate-control: add cases for function definitions, treating each function definition as a statement.
2. Explicate-control: construct a CFG for each function definition; the output program is a list of function definitions (each one a mini-program).
3. After explicate-control: run the passes we wrote for A6 on each mini-program separately.
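To make item 3 concrete, here is a minimal sketch of running the later passes once per function. The `FunctionDef` record and the `add_conclusion` pass are hypothetical stand-ins for illustration, not the course's actual classes or passes.

```py
from dataclasses import dataclass, replace
from typing import Callable, Dict, List

@dataclass
class FunctionDef:
    """Hypothetical stand-in for one function (a "mini-program") after explicate-control."""
    name: str
    params: List[str]
    blocks: Dict[str, List[str]]

# A pass maps one function definition to a new one.
Pass = Callable[[FunctionDef], FunctionDef]

def run_passes(defs: List[FunctionDef], passes: List[Pass]) -> List[FunctionDef]:
    """Apply every post-explicate-control pass to each function independently."""
    result = []
    for fdef in defs:
        for p in passes:
            fdef = p(fdef)
        result.append(fdef)
    return result

# Hypothetical example pass: give each function its own conclusion block.
def add_conclusion(fdef: FunctionDef) -> FunctionDef:
    label = fdef.name + 'conclusion'
    return replace(fdef, blocks={**fdef.blocks, label: ['retq']})

program = [FunctionDef('add1', ['n'], {'add1start': []}),
           FunctionDef('main', [], {'mainstart': []})]
compiled = run_passes(program, [add_conclusion])
```

Because each pass sees one function at a time, passes written for a single program carry over with little change.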

----

# Part 2: Typechecking for Functions

## Question 4

What are the types of the functions `add1` and `*`?

- `add1`: `int -> int`, written in Python as `Callable[[int], int]`
- `*`: `int * int -> int`, written in Python as `Callable[[int, int], int]`
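Python's standard `typing` module can express both types, and `typing.get_args` (available since Python 3.8) splits a `Callable` type into its parameter list and return type. This only illustrates the notation; the compiler's typechecker may use its own representation of function types.

```py
from typing import Callable, get_args

Add1Type = Callable[[int], int]       # add1: int -> int
MulType = Callable[[int, int], int]   # *: int * int -> int

# get_args splits a Callable type into (parameter types, return type).
params, ret = get_args(MulType)
print(params, ret)
```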

## Question 5

Why do we need to specify the types of a function's arguments?

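- So we can typecheck function calls: each argument must match the corresponding parameter type.
- Overloading: the argument types determine which operation a name refers to.
- Modularity: we need to know the types of the inputs in order to typecheck a function definition in isolation.
- Recursive functions: we need the function's output type in order to typecheck its body (otherwise typechecking the recursive call would loop forever).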

## Question 6

Write a function `length` such that the following code prints 3:

```
v = (1, (2, (3, 0)))
print(length(v))
```

```py
def length(v: List[int]) -> int:
    ???
```
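For comparison, in ordinary Python, which checks types dynamically, `length` is easy to write without annotations. The sketch below assumes, as in the example value, that `0` marks the end of the list. The hard part for Lfun is the annotation on `v`: the outer call receives a `tuple[int, tuple[int, tuple[int, int]]]`, but the recursive call receives a `tuple[int, tuple[int, int]]`, so no single non-recursive tuple type fits every call.

```py
def length(v) -> int:
    # 0 is the end-of-list marker; otherwise v is (head, rest)
    if v == 0:
        return 0
    else:
        return 1 + length(v[1])

v = (1, (2, (3, 0)))
print(length(v))
```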

## Question 7

How do we typecheck a function call of the form `f(a1, ..., ak)`?

### New case in `tc_exp`: `case Call(f, args)`

1. Assume that we have already typechecked the definition of `f`.
2. Typecheck `f`; it should have the type `Callable[[t1, ..., tk], t]`.
3. Typecheck `a1`, ..., `ak` and check (with assertions) that there are exactly k of them and that their types match the parameter types `t1`, ..., `tk` from the function definition, in order.
4. Return the type `t`.

## Question 8

How do we typecheck a function definition?

```py
def fact(n: int) -> int:
    if n == 0:
        return 1
    else:
        return n * fact(n - 1)

```

New case in `tc_stmt`: `case FunctionDef(name, args_and_types, body_stmts, return_type)`

1. Update the type environment to include the function's name and its type:
   ```py
   arg_types = [t for _, t in args_and_types]
   env[name] = Callable(arg_types, return_type)
   ```
2. Update the type environment to have types for the function's arguments.
3. Typecheck the body statements with `tc_stmts(body_stmts, env)`.

## Question 9

How do we typecheck a `Lfun` program?

Three new cases:

1. FunctionDef(name, params, body_stmts, return_type)
2. Return(e)
3. Call(f, args)

- For `FunctionDef(name, params, body_stmts, return_type)`:
    1. Set `env[name] = Callable(param_types, return_type)`. We add the function to the environment *before* typechecking the body, so that the function can call itself.
    2. Copy `env` into `new_env` and modify only `new_env`; when we are done, we can throw `new_env` away.
    3. Add bindings to `new_env` for the function's parameters.
    4. Set `new_env['retval'] = return_type`, so that `Return` statements in the body can check against it.
    5. Call `tc_stmts(body_stmts, new_env)`.

- For `Return(e)`:
    1. Assert that `tc_exp(e, env) == env['retval']`.

- For `Call(func, args)`:
    - Treat it like a `Prim`, except that you also need to call `tc_exp(func, env)`.
    - Expect the resulting type to be a `Callable`.
    - Check that each argument has the type the function expects.
    - The type of the call is the `Callable`'s return type.


----
# Part 3: Changes to RCO and Expose-Alloc

## Question 10

Describe the changes to RCO.

New cases:

1. FunctionDef in rco_stmt
    - just call rco_stmts on the body
    - add name to the list of functions
2. Return in rco_stmt
    - call rco_exp on the return expression
    - should make the return expression atomic
3. Call in rco_exp
    - Like Prim
    - also call rco_exp on the function
4. Var in rco_exp, when the variable is a function reference
    - if the var is a function name, generate a tmp for it and return the tmp

## Question 11

Describe the changes to expose-alloc.

1. Add a case to ea_stmt for `FunctionDef` that calls `ea_stmts` on the body of the function

----

# Part 4: Functions in x86 Assembly

## Question 12

Write an x86 assembly program corresponding to the `add1` program.

```py
from cs202_support.eval_x86 import X86Emulator

# Use callq as a jmp that remembers where it came from;
# use retq to return to where you came from.
# Calling convention:
#   put the arguments in the registers %rdi, %rsi, %rdx, %rcx, %r8, %r9
#   put the return value in %rax

asm = """
add1:
  movq %rdi, %r8
  addq $1, %r8
  movq %r8, %rax
  retq
main:
  movq $5, %rdi
  callq add1
  movq %rax, %rdi
  callq print_int
"""

emu = X86Emulator(logging=True)
emu.eval_program(asm)
emu.print_state()
```

```
CALL TO print_int: 6
  Location                        Value
0  reg rbp                         1000
1  reg rsp                         1000
2  reg rdi                            6
3   reg r8                            6
4  reg rax                            6
5     add1  FunPointer(fun_name='add1')
6     main  FunPointer(fun_name='main')
FINAL STATE:
  Location                        Value
0  reg rbp                         1000
1  reg rsp                         1000
2  reg rdi                            6
3   reg r8                            6
4  reg rax                            6
5     add1  FunPointer(fun_name='add1')
6     main  FunPointer(fun_name='main')
OUTPUT: [6]
```


## Question 13

Describe the *calling convention* we will use for functions in Lfun.

Calling convention:

- Put the arguments in the registers `%rdi`, `%rsi`, `%rdx`, `%rcx`, `%r8`, `%r9` (in that order).
- Put the return value in `%rax`.
- The book says that for functions with more than 6 parameters, the remaining arguments go on the stack.
- Our compiler will only support functions with at most 6 parameters.
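A minimal sketch of how select-instructions might emit code for each side of this convention. The helper names `call_sequence` and `param_moves` are hypothetical, and the arguments are assumed to be already atomic and written as x86 operands. For the `main` function of Question 12, `call_sequence('add1', ['$5'], '%rdi')` reproduces its first three instructions.

```py
from typing import List

# Registers for the first six integer arguments, in order.
ARG_REGISTERS = ['%rdi', '%rsi', '%rdx', '%rcx', '%r8', '%r9']

def call_sequence(label: str, args: List[str], dest: str) -> List[str]:
    """Caller side: place args in registers, call, then read the result from %rax."""
    if len(args) > len(ARG_REGISTERS):
        raise ValueError('this sketch supports at most 6 arguments')
    instrs = [f'movq {a}, {r}' for a, r in zip(args, ARG_REGISTERS)]
    instrs.append(f'callq {label}')
    instrs.append(f'movq %rax, {dest}')
    return instrs

def param_moves(params: List[str]) -> List[str]:
    """Callee side: copy incoming argument registers into the parameter variables."""
    return [f'movq {r}, {p}' for p, r in zip(params, ARG_REGISTERS)]
```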

## Question 14

Describe the management of the *stack* and *root stack* performed on function entry and exit.

On function entry:
- Allocate a new stack frame with slots for the function's stack-allocated variables.
- Allocate a root stack frame with slots for the function's root-stack-allocated variables.

On function exit:
- Reclaim the stack space we allocated.
- Reclaim the root stack space we allocated.

We do this in exactly the same way as for the main program, except that we don't initialize the heap in a function.
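A minimal sketch of generating this entry and exit code as text. It assumes that `%r15` holds the root stack pointer (an assumption; adjust to your compiler), and the function names `prelude` and `conclusion` and the register list are hypothetical. It follows the order used in the Question 15 and 16 programs: push `%rbp`, push the callee-saved registers, then adjust `%rsp`. It does not round `stack_size` to keep `%rsp` 16-byte aligned at call sites, which real generated code must do.

```py
from typing import List

# Callee-saved registers saved in the Question 16 program (hypothetical choice).
CALLEE_SAVED = ['%rbx', '%r12', '%r13', '%r14']

def prelude(name: str, stack_size: int, root_size: int, saved: List[str]) -> List[str]:
    """Entry code: new stack frame, saved registers, stack and root-stack slots."""
    instrs = [f'{name}:', '  pushq %rbp', '  movq %rsp, %rbp']
    instrs += [f'  pushq {r}' for r in saved]
    instrs.append(f'  subq ${stack_size}, %rsp')
    # Assumption: %r15 is the root stack pointer. Zero each new slot so the
    # garbage collector never sees a stale pointer. No heap initialization here.
    for offset in range(0, root_size, 8):
        instrs.append(f'  movq $0, {offset}(%r15)')
    instrs.append(f'  addq ${root_size}, %r15')
    instrs.append(f'  jmp {name}start')
    return instrs

def conclusion(name: str, stack_size: int, root_size: int, saved: List[str]) -> List[str]:
    """Exit code: undo the prelude in reverse order, then return to the caller."""
    instrs = [f'{name}conclusion:',
              f'  subq ${root_size}, %r15',
              f'  addq ${stack_size}, %rsp']
    instrs += [f'  popq {r}' for r in reversed(saved)]
    instrs += ['  popq %rbp', '  retq']
    return instrs

print('\n'.join(prelude('add1', 0, 0, CALLEE_SAVED) + conclusion('add1', 0, 0, CALLEE_SAVED)))
```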

## Question 15

Modify the program from earlier to correctly manage the stack and root stack. Allocate the variable `n` on the stack.

```py
asm = """
add1start:
  movq %rdi, %r8
  addq $1, %r8
  movq %r8, %rax
  jmp add1conlusion
add1:
  pushq %rbp
  movq %rsp, %rbp
  subq $0, %rsp
  jmp add1start
add1conlusion:
  addq $0, %rsp
  popq %rbp
  retq
main:
  movq $5, %rdi
  callq add1
  movq %rax, %rdi
  callq print_int
"""

emu = X86Emulator(logging=True)
emu.eval_program(asm)
emu.print_state()
```

```
CALL TO print_int: 6
        Location                                 Value
0        mem 992                                  1000
1        reg rbp                                  1000
2        reg rsp                                  1000
3        reg rdi                                     6
4         reg r8                                     6
5        reg rax                                     6
6      add1start      FunPointer(fun_name='add1start')
7           add1           FunPointer(fun_name='add1')
8  add1conlusion  FunPointer(fun_name='add1conlusion')
9           main           FunPointer(fun_name='main')
FINAL STATE:
        Location                                 Value
0        mem 992                                  1000
1        reg rbp                                  1000
2        reg rsp                                  1000
3        reg rdi                                     6
4         reg r8                                     6
5        reg rax               
```


## Question 16

Modify the program again, to save and restore the *callee-saved registers*.

```py
# There are two kinds of registers: callee-saved and caller-saved.
# A function may overwrite the caller-saved registers freely.
# A function must preserve the values of the callee-saved registers;
# here we save rbx, r12, r13, r14 (the x86-64 convention also makes
# rbp and r15 callee-saved; rbp is already saved by the prelude).

asm = """
add1start:
  movq %rdi, %r8
  addq $1, %r8
  movq %r8, %rax
  jmp add1conlusion
add1:
  pushq %rbp
  movq %rsp, %rbp
  pushq %rbx
  pushq %r12
  pushq %r13
  pushq %r14
  subq $0, %rsp
  jmp add1start
add1conlusion:
  addq $0, %rsp
  popq %r14
  popq %r13
  popq %r12
  popq %rbx
  popq %rbp
  retq
main:
  movq $5, %rdi
  callq add1
  movq %rax, %rdi
  callq print_int
"""

emu = X86Emulator(logging=True)
emu.eval_program(asm)
emu.print_state()
```

```
CALL TO print_int: 6
         Location                                 Value
0         mem 960                                  None
1         mem 968                                  None
2         mem 976                                  None
3         mem 984                                  None
4         mem 992                                  1000
5         reg rbp                                  1000
6         reg rsp                                  1000
7         reg rdi                                     6
8         reg rbx                                  None
9         reg r12                                  None
10        reg r13                                  None
11        reg r14                                  None
12         reg r8                                     6
13        reg rax                                     6
14      add1start      FunPointer(fun_name='add1start')
15           add1           FunPointer(fun_name='add1')
16  add1conlusion  FunPoint
```


----
# Part 5: Explicate-Control

## Question 17

Describe the changes to explicate-control.

- Old version: convert statements to a control-flow graph.
- New version: convert statements to a list of function definitions, each with its own control-flow graph.

Explicate-control works as before, but has three new cases:
1. Add `Return` to `ec_stmt`.
2. Add `Call` to `ec_exp`: like `Prim`, but also call `ec_atm` on the function.
3. Add `FunctionDef` to `ec_stmt`: call `ec_function` on the function definition.

## Question 18

Describe the `ec_function` function.

Change the pass globally:

1. Add a global variable to the pass called `current_function` that tracks the function being compiled. It starts out as `"main"`.
2. Add a global variable to the pass called `functions` that holds a list of function definitions; `ec_function` adds to this list.
3. Modify `create_block` to add the current function's name as a prefix to the labels it creates.

`ec_function`:
1. Save `basic_blocks` and `current_function` so we can restore them later.
2. Set `basic_blocks` to an empty dictionary and `current_function` to the name of the function we are compiling.
3. Call `ec_stmts` on the body statements, with the continuation `Return(0)`.
4. Set `basic_blocks[name + 'start']` to the result of step 3.
5. Construct a `cfun.FunctionDef` with the name, parameter names, and `basic_blocks`.
6. Append the function definition to `functions`.
7. Restore the saved `basic_blocks` and `current_function`.
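A minimal sketch of `ec_function` and the pass-wide state, under simplified assumptions: `ec_stmts` here only handles straight-line statements already in C form, and the classes (`CFunctionDef`, `Return`, `Goto`, `Constant`) are hypothetical stand-ins rather than the course's `cfun` module. Restoring the saved state at the end is what lets compilation continue in `main` afterwards.

```py
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical stand-ins for the compiler's AST and Cfun classes.
@dataclass
class Constant:
    value: object

@dataclass
class Return:
    value: object

@dataclass
class Goto:
    label: str

@dataclass
class CFunctionDef:
    name: str
    params: List[str]
    blocks: Dict[str, list]

# Pass-wide state (the global changes listed above).
basic_blocks: Dict[str, list] = {}
current_function = 'main'
functions: List[CFunctionDef] = []
_label_count = 0

def create_block(stmts: list) -> Goto:
    """Add a block whose label is prefixed with the current function's name."""
    global _label_count
    label = f'{current_function}label_{_label_count}'
    _label_count += 1
    basic_blocks[label] = stmts
    return Goto(label)

def ec_stmts(stmts: list, cont: list) -> list:
    # Hypothetical simplification: assumes straight-line statements already
    # in C form, so they are simply placed before the continuation.
    return list(stmts) + cont

def ec_function(name: str, params: List[str], body: list) -> None:
    global basic_blocks, current_function
    # 1. save the enclosing function's state
    saved_blocks, saved_function = basic_blocks, current_function
    # 2. start a fresh CFG for this function
    basic_blocks, current_function = {}, name
    # 3. compile the body; falling off the end returns 0
    start = ec_stmts(body, [Return(Constant(0))])
    # 4. the body's entry becomes the function's start block
    basic_blocks[name + 'start'] = start
    # 5-6. package the CFG as a function definition and record it
    functions.append(CFunctionDef(name, params, basic_blocks))
    # 7. restore the enclosing function's state
    basic_blocks, current_function = saved_blocks, saved_function
```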

