## **Contents**

| 1. | Inti                 | roduction                                               | 2 |  |  |  |  |  |  |
|----|----------------------|---------------------------------------------------------|---|--|--|--|--|--|--|
|    | 1.1.                 | Stages of Compilation                                   | 1 |  |  |  |  |  |  |
|    | 1.2.                 | The rush Programming Language                           | 3 |  |  |  |  |  |  |
|    |                      | 1.2.1. Features                                         | 3 |  |  |  |  |  |  |
| 2. | Analyzing the Source |                                                         |   |  |  |  |  |  |  |
|    | 2.1.                 | Lexical and Syntactical Analysis                        | 6 |  |  |  |  |  |  |
|    |                      | 2.1.1. Formal Syntactical Definition by a Grammer       | 6 |  |  |  |  |  |  |
|    |                      | 2.1.2. Grouping Characters Into Tokens                  |   |  |  |  |  |  |  |
|    |                      | 2.1.3. Constructing a Tree                              |   |  |  |  |  |  |  |
|    |                      | Operator Precedence                                     |   |  |  |  |  |  |  |
|    |                      | Pratt Parsing                                           |   |  |  |  |  |  |  |
|    |                      | Parser Generators                                       | 3 |  |  |  |  |  |  |
|    | 2.2.                 | Semantic Analysis                                       | 3 |  |  |  |  |  |  |
|    |                      | 2.2.1. Defining the Semantics of a Programming Language | 3 |  |  |  |  |  |  |
|    |                      | 2.2.2. The Semantic Analyzer                            | 3 |  |  |  |  |  |  |
|    |                      | Implementation                                          | 4 |  |  |  |  |  |  |
|    |                      | Early Optimizations                                     | 0 |  |  |  |  |  |  |
| 2  | Inte                 | erpreting the Program 22                                | 2 |  |  |  |  |  |  |
| J. |                      | Tree-Walking Interpreters                               |   |  |  |  |  |  |  |
|    | 9.1.                 | 3.1.1. Implementation                                   |   |  |  |  |  |  |  |
|    |                      | 3.1.2. How the Interpreter Executes a Program           |   |  |  |  |  |  |  |
|    |                      | 3.1.3. Supporting Pointers                              |   |  |  |  |  |  |  |
|    | 3 2                  | Using a Virtual Machine                                 |   |  |  |  |  |  |  |
|    | 0.2.                 | 3.2.1. Defining a Virtual Machine                       |   |  |  |  |  |  |  |
|    |                      | 3.2.2. Register-Based and Stack-Based Machines          |   |  |  |  |  |  |  |
|    |                      | 3.2.3. The rush Virtual Machine                         |   |  |  |  |  |  |  |
|    |                      | 3.2.4. How the Virtual Machine Executes a rush Program  |   |  |  |  |  |  |  |
|    |                      | 3.2.5. Fetch-Decode-Execute Cycle of the VM             |   |  |  |  |  |  |  |
|    |                      | 3.2.6. Comparing the VM to the Tree-Walking Interpreter |   |  |  |  |  |  |  |
| ,  | Con                  | npiling to High-Level Targets 3                         |   |  |  |  |  |  |  |
| ⊶. |                      | How a Compiler Translates the AST                       |   |  |  |  |  |  |  |
|    |                      | The Compiler Targeting the rush VM                      |   |  |  |  |  |  |  |
|    |                      | Compilation to WebAssembly                              |   |  |  |  |  |  |  |
|    | 1.0.                 | 4.3.1. WebAssembly Modules                              |   |  |  |  |  |  |  |
|    |                      | 4.3.2. The WebAssembly System Interface                 |   |  |  |  |  |  |  |
|    |                      | 4.3.3. Implementation                                   |   |  |  |  |  |  |  |
|    |                      | Function Calls                                          |   |  |  |  |  |  |  |
|    |                      | Logical Operators                                       |   |  |  |  |  |  |  |
|    |                      | 4.3.4. Considering an Example rush Program              |   |  |  |  |  |  |  |
|    | 41                   | Using LLVM for Code Generation                          |   |  |  |  |  |  |  |
|    | 1.1.                 | 4.4.1. The Role of LLVM in a Compiler                   |   |  |  |  |  |  |  |
|    |                      | 1.1.1. 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1                  | ٠ |  |  |  |  |  |  |

|     |       | 4.4.2.       | The LLVM Intermediate Representation   | 47 |
|-----|-------|--------------|----------------------------------------|----|
|     |       |              | Structure of a Compiled rush Program   | 48 |
|     |       | 4.4.3.       | The rush Compiler Using LLVM           | 50 |
|     |       | 4.4.4.       | Final Code Generation: The Linker      | 55 |
|     |       | 4.4.5.       | Conclusions                            | 56 |
|     | 4.5.  | Trans        | spilers                                | 57 |
| 5.  | Con   | npilin       | g to Low-Level Targets                 | 58 |
| •   |       | -            | Level Programming Concepts             | 58 |
|     |       |              | Sections of an ELF File                | 58 |
|     |       |              | Assemblers and Assembly Language       | 58 |
|     |       |              | Registers                              | 59 |
|     |       |              | Using Memory: The Stack                | 61 |
|     |       | 0.1.1.       | Alignment                              | 61 |
|     |       | 5.1.5.       | Calling Conventions                    | 62 |
|     |       |              | Referencing Variables Using Pointers   | 63 |
|     | 5.2.  |              | -V: Compiling to a RISC Architecture   | 65 |
|     | J     |              | Register Layout                        | 65 |
|     |       |              | Memory Access Through the Stack        | 66 |
|     |       |              | Calling Convention                     | 66 |
|     |       |              | The Core Library                       | 67 |
|     |       |              | RISC-V Assembly                        | 68 |
|     |       |              | Supporting Pointers                    | 70 |
|     |       |              | Implementation                         | 70 |
|     |       | 0.2          | Struct Fields                          | 70 |
|     |       |              | Data Flow and Register Allocation      | 72 |
|     |       |              | Functions                              | 76 |
|     |       |              | Let Statements                         | 77 |
|     |       |              | Function Calls and Returns             | 78 |
|     |       |              | Loops                                  | 81 |
|     | 5.3   | x86 (        | 64: Compiling to a CISC Architecture   | 83 |
|     | 0.0.  |              | x64 Assembly                           | 83 |
|     |       |              | Registers                              | 84 |
|     |       |              | Stack Layout and Calling Convention    | 86 |
|     |       |              | Implementation                         | 86 |
|     |       | 0.0.1.       | Struct Fields                          | 87 |
|     |       |              | Memory Management                      | 88 |
|     |       |              | Register Allocation                    | 88 |
|     |       |              | Functions                              | 89 |
|     |       |              | Function Calls                         | 90 |
|     |       |              | Control Flow                           | 90 |
|     |       |              | Integer Division and Float Comparisons | 92 |
|     |       |              | Pointers                               | 94 |
|     | 5.4.  | Concl        | lusion: RISC vs. CISC Architectures    | 95 |
| 6.  |       |              | ughts and Conclusions                  | 96 |
|     |       | Figur        | _                                      | 98 |
|     |       | •            |                                        |    |
| Ιis | st of | <b>Table</b> | os                                     | 99 |

| List of Listings | 100 |
|------------------|-----|
| Bibliography     | 103 |

# **List of Figures**

|                                              | Steps of compilation. (altered)                                                                                                                                                                                                     | 2                                      |
|----------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------|
| 2.2.<br>2.3.                                 | Token precedences for the input '(1+2*3)/4**5'                                                                                                                                                                                      | 8<br>10<br>12<br>21                    |
| 3.2.<br>3.3.                                 | Linear memory of the rush VM                                                                                                                                                                                                        | 23<br>28<br>30<br>33                   |
| 4.2.                                         | Steps of compilation when using LLVM                                                                                                                                                                                                | 35<br>47<br>55                         |
| 5.2.<br>5.3.<br>5.4.<br>5.5.<br>5.6.<br>5.7. | Relationship between registers, memory, and the CPU.  Examples of memory alignment.  Stack layout of the RISC-V architecture.  Simplified integer register pool of the RISC-V rush compiler.  Stack layout of the x64 architecture. | 59<br>60<br>61<br>66<br>73<br>86<br>90 |
| 1.2.                                         | Lines of code of the project's components in commit 'f8b9b9a'                                                                                                                                                                       | 4<br>5<br>5                            |
|                                              | Advancing window of a lexer                                                                                                                                                                                                         | 7 9                                    |
|                                              |                                                                                                                                                                                                                                     | 65<br>85                               |
|                                              |                                                                                                                                                                                                                                     |                                        |

# **List of Listings**

| 1.1.  | Generating Fibonacci numbers using rush                           | 3  |
|-------|-------------------------------------------------------------------|----|
| 2.1.  | Grammar for basic arithmetic in EBNF notation                     | 6  |
| 2.2.  | The rush 'Lexer' struct definition                                | 7  |
| 2.3.  | Simplified 'Token' struct definition                              | 8  |
| 2.4.  | Example language a traditional LL(1) parser cannot parse          | 9  |
| 2.5.  | Pratt-parser: Implementation for token precedences                | 11 |
| 2.6.  | Pratt-parser: Implementation for expressions                      | 11 |
| 2.7.  | Pratt-parser: Implementation for grouped expressions              | 12 |
| 2.8.  | Pratt-parser: Implementation for infix-expressions                | 12 |
| 2.9.  | A rush program which adds two integers                            | 14 |
| 2.10. | Fields of the 'Analyzer' struct                                   | 15 |
| 2.11. | Output when compiling an invalid rush program                     | 15 |
| 2.12. | Analyzer: Validation of the 'main' function's signature           | 16 |
| 2.13. | Analyzer: The 'let_stmt' method                                   | 17 |
| 2.14. | Analyzer: Analysis of expressions during semantic analysis        | 18 |
| 2.15. | Analyzer: Obtaining the type of expressions                       | 18 |
| 2.16. | Analyzer: Validation of argument type compatibility               | 19 |
| 2.17. | Analyzer: Determining whether an expression is constant           | 20 |
| 2.18. | Redundant 'while' loop inside a rush program                      | 20 |
| 2.19. | Analyzer: Loop optimization                                       | 21 |
| 3.1.  | Tree-walking interpreter: Type definitions                        | 22 |
| 3.2.  | Tree-walking interpreter: 'Value' and 'InterruptKind' definitions | 23 |
| 3.3.  | Tree-walking interpreter: Beginning of execution                  | 24 |
| 3.4.  | Tree-walking interpreter: Calling of functions                    | 24 |
| 3.5.  | Example rush program                                              | 25 |
| 3.6.  | Struct definition of the VM                                       | 27 |
| 3.7.  | Minimal pointer example in rush                                   | 28 |
| 3.8.  | VM instructions for the minimal pointer example                   | 29 |
| 3.9.  | A recursive rush program                                          | 29 |
| 3.10. | Struct definition of a 'CallFrame'                                | 29 |
| 3.11. | VM instructions matching the AST in Figure 3.4                    | 30 |
| 3.12. | The 'run' method of the rush VM                                   | 31 |
| 3.13. | Parts of the 'run_instruction' method of the rush VM              | 32 |

## 1. Introduction

See Chapter 2, Section 2.1, Section 2.1.1, Figure 2.1, Table 1.3, and Listing 1.1.

## 1.1. Stages of Compilation

a

Figure 1.1: Steps of compilation.

a.

Figure 1.2: Steps of compilation. (altered)

## 1.2. The rush Programming Language

fn fib(n: int) -> int {}

Listing 1.1: Generating Fibonacci numbers using rush.

#### 1.2.1. Features

Table 1.1: Lines of code of the project's components in commit 'f8b9b9a'.

Table 1.2: Most important features of the rush programming language.

Table 1.3: Data types in the rush programming language.

## 2. Analyzing the Source

## 2.1. Lexical and Syntactical Analysis

#### 2.1.1. Formal Syntactical Definition by a Grammer

fn main() {}

Listing 2.1: Grammar for basic arithmetic in EBNF notation.

#### 2.1.2. Grouping Characters Into Tokens

Listing 2.2: The rush 'Lexer' struct definition.

Table 2.1: Advancing window of a lexer.

Listing 2.3: Simplified 'Token' struct definition.

#### 2.1.3. Constructing a Tree

a

Figure 2.1: Abstract syntax tree for '1+2\*\*3'.

Table 2.2: Mapping from EBNF grammar to Rust type definitions.

#### **Operator Precedence**

fn main() {}

Listing 2.4: Example language a traditional LL(1) parser cannot parse.

## **Pratt Parsing**

а

Figure 2.2: Abstract syntax tree for '1+2\*\*3' using Pratt parsing.

Listing 2.5: Pratt-parser: Implementation for token precedences.

fn main() {}

Listing 2.6: Pratt-parser: Implementation for expressions.

Listing 2.7: Pratt-parser: Implementation for grouped expressions.

fn main() {}

Listing 2.8: Pratt-parser: Implementation for infix-expressions.

a

Figure 2.3: Token precedences for the input '(1+2\*3)/4\*\*5'.

#### **Parser Generators**

- 2.2. Semantic Analysis
- 2.2.1. Defining the Semantics of a Programming Language
- 2.2.2. The Semantic Analyzer

Listing 2.9: A rush program which adds two integers.

## Implementation

Listing 2.10: Fields of the 'Analyzer' struct.

fn main() {}

Listing 2.11: Output when compiling an invalid rush program.

Listing 2.12: Analyzer: Validation of the ' $\mbox{\tt main}$ ' function's signature.

Listing 2.13: Analyzer: The 'let\_stmt' method.

Listing 2.14: Analyzer: Analysis of expressions during semantic analysis.

fn main() {}

Listing 2.15: Analyzer: Obtaining the type of expressions.

Listing 2.16: Analyzer: Validation of argument type compatibility.

Listing 2.17: Analyzer: Determining whether an expression is constant.

#### **Early Optimizations**

fn main() {}

Listing 2.18: Redundant 'while' loop inside a rush program.

Listing 2.19: Analyzer: Loop optimization.

a

Figure 2.4: How semantic analysis affects the abstract syntax tree.

## 3. Interpreting the Program

## 3.1. Tree-Walking Interpreters

fn main() {}

Listing 3.1: Tree-walking interpreter: Type definitions.

#### 3.1.1. Implementation

fn main() {}

Listing 3.2: Tree-walking interpreter: 'Value' and 'InterruptKind' definitions.

#### 3.1.2. How the Interpreter Executes a Program

а

Figure 3.1: Call stack at the point of processing the 'return' statement.

Listing 3.3: Tree-walking interpreter: Beginning of execution.

fn main() {}

Listing 3.4: Tree-walking interpreter: Calling of functions.

Listing 3.5: Example rush program.

### 3.1.3. Supporting Pointers

## 3.2. Using a Virtual Machine

## 3.2.1. Defining a Virtual Machine

Listing 3.6: Struct definition of the VM.

### 3.2.2. Register-Based and Stack-Based Machines

#### 3.2.3. The rush Virtual Machine

Listing 3.7: Minimal pointer example in rush.

a

Figure 3.2: Linear memory of the rush VM.  $\,$ 

Listing 3.8: VM instructions for the minimal pointer example.

#### 3.2.4. How the Virtual Machine Executes a rush Program

fn main() {}

Listing 3.9: A recursive rush program.

fn main() {}

Listing 3.10: Struct definition of a 'CallFrame'.

Figure 3.3: Example call stack of the rush VM.  $\,$ 

Listing 3.11: VM instructions matching the AST in Figure 3.4.  $\,$ 

## 3.2.5. Fetch-Decode-Execute Cycle of the VM

fn main() {}

Listing 3.12: The 'run' method of the rush VM.

#### fn main() {}

Listing 3.13: Parts of the 'run\_instruction' method of the rush  $\mathrm{VM}.$ 

# 3.2.6. Comparing the VM to the Tree-Walking Interpreter

а

Figure 3.4: AST and VM instructions of the recursive rush program in Listing 3.9.

# 4. Compiling to High-Level Targets

# 4.1. How a Compiler Translates the AST

a

Figure 4.1: Abstract syntax tree for '1 + 2 < 4'.

# 4.3. Compilation to WebAssembly

# 4.3.1. WebAssembly Modules

| 4.3.2. The WebAssemb | ly System Interface |
|----------------------|---------------------|
|----------------------|---------------------|

# 4.3.3. Implementation

#### **Function Calls**

# **Logical Operators**

# 4.4. Using LLVM for Code Generation

#### 4.4.1. The Role of LLVM in a Compiler

a

Figure 4.2: Steps of compilation when using LLVM.

#### **4.4.2. The LLVM Intermediate Representation**

# Structure of a Compiled rush Program

# 4.4.3. The rush Compiler Using LLVM

#### 4.4.4. Final Code Generation: The Linker

a

Figure 4.3: The linking process.  $\,$ 

#### 4.4.5. Conclusions

# 4.5. Transpilers

# **5. Compiling to Low-Level Targets**

- **5.1. Low-Level Programming Concepts**
- 5.1.1. Sections of an ELF File
- 5.1.2. Assemblers and Assembly Language

Figure 5.1: Level of abstraction provided by assembly.

# 5.1.3. Registers

Figure 5.2: Relationship between registers, memory, and the CPU.  $\,$ 

# 5.1.4. Using Memory: The Stack

# Alignment

 $\mathbf{a}$ 

Figure 5.3: Examples of memory alignment.

# 5.1.5. Calling Conventions

| 5.1.6 | i. Referei | ncing \ | Variables | Using | <b>Pointers</b> |
|-------|------------|---------|-----------|-------|-----------------|
|       |            |         |           |       |                 |

# **5.2. RISC-V: Compiling to a RISC Architecture**

Table 5.1: Registers of the RISC-V architecture.

#### 5.2.1. Register Layout

# 5.2.2. Memory Access Through the Stack

a

Figure 5.4: Stack layout of the RISC-V architecture.

# 5.2.3. Calling Convention

# 5.2.4. The Core Library

### 5.2.5. RISC-V Assembly

## 5.2.6. Supporting Pointers

#### 5.2.7. Implementation

**Struct Fields** 

### Data Flow and Register Allocation

Figure 5.5: Simplified integer register pool of the RISC-V rush compiler.

#### **Functions**

#### **Let Statements**

#### **Function Calls and Returns**

#### Loops

## 5.3. x86\_64: Compiling to a CISC Architecture

### 5.3.1. x64 Assembly

#### 5.3.2. Registers

Table 5.2: General purpose registers of the x64 architecture.

#### 5.3.3. Stack Layout and Calling Convention

a

Figure 5.6: Stack layout of the x64 architecture.

#### 5.3.4. Implementation

#### **Struct Fields**

### **Memory Management**

#### **Register Allocation**

#### **Functions**

#### **Function Calls**

#### **Control Flow**

a

Figure 5.7: Structure of if-expressions in assembly.

### **Integer Division and Float Comparisons**

#### **Pointers**

### **5.4. Conclusion: RISC vs. CISC Architectures**

## **6. Final Thoughts and Conclusions**

## **List of Figures**

## **List of Tables**

## **List of Listings**

# **Bibliography**