nikolasil/compiler-llvm-generator

# MiniJava to LLVM IR Code Generator

The final phase of the MiniJava compiler suite, developed for the Compilers II course at the National and Kapodistrian University of Athens (UoA). This project implements a backend that transforms semantically verified MiniJava code into LLVM Intermediate Representation (IR) that the LLVM toolchain can optimize further.


## Memory Layout & Dynamic Dispatch

A primary challenge of this project was mapping Java's Object-Oriented features (inheritance and polymorphism) to LLVM’s linear memory model.

### 1. Virtual Tables (VTable)

To support method overriding and dynamic dispatch, I implemented a custom VTable structure. Each class has a unique VTable containing:

  • Method Offsets: Precise memory offsets for function pointers.
  • Inheritance Mapping: Tracks which class a specific method implementation "belongs to" to correctly resolve super-calls.
  • Variable Offsets: Calculated byte-offsets for class attributes, accounting for the overhead of the VTable pointer (first 8 bytes of the object).
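
The offset bookkeeping described above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the `ClassLayout` class and its methods are hypothetical names, and the sizes (4 bytes for `int`, 1 byte for `boolean`, 8 bytes for arrays, object references, and VTable slots) are assumptions. A subclass starts from a copy of its parent's layout, so an overriding method keeps the parent's slot while its owner changes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; names and type sizes are assumptions made for illustration.
class ClassLayout {
    static final int VTABLE_POINTER_SIZE = 8; // the first 8 bytes of every object

    final String name;
    final Map<String, Integer> fieldOffsets = new LinkedHashMap<>(); // field -> byte offset in the object
    final Map<String, Integer> methodSlots = new LinkedHashMap<>();  // method -> slot index in the VTable
    final Map<String, String> methodOwner = new LinkedHashMap<>();   // method -> class that provides the body
    private int nextFieldOffset;

    ClassLayout(String name, ClassLayout parent) {
        this.name = name;
        if (parent == null) {
            nextFieldOffset = VTABLE_POINTER_SIZE;
        } else {
            // Inherit the parent's layout so shared members keep their positions.
            fieldOffsets.putAll(parent.fieldOffsets);
            methodSlots.putAll(parent.methodSlots);
            methodOwner.putAll(parent.methodOwner);
            nextFieldOffset = parent.nextFieldOffset;
        }
    }

    static int sizeOf(String type) {
        switch (type) {
            case "int": return 4;
            case "boolean": return 1;
            default: return 8; // arrays and object references are treated as pointers
        }
    }

    ClassLayout field(String type, String field) {
        fieldOffsets.put(field, nextFieldOffset);
        nextFieldOffset += sizeOf(type);
        return this;
    }

    ClassLayout method(String method) {
        // An overriding method reuses the inherited slot; a new method takes the next free one.
        methodSlots.putIfAbsent(method, methodSlots.size());
        methodOwner.put(method, name);
        return this;
    }

    int objectSize() { return nextFieldOffset; }

    int methodByteOffset(String method) { return methodSlots.get(method) * 8; }
}
```

For example, `new ClassLayout("A", null).field("int", "x")` places `x` at byte offset 8, directly after the VTable pointer.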

### 2. Symbol Table Integration

The generator utilizes a multi-layered Symbol Table to maintain scope context (Class -> Method -> Variable) and propagate type information throughout the IR generation phase.
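
As a rough sketch of what a Class -> Method -> Variable lookup might look like, the example below resolves a name by checking method locals first, then the fields of the enclosing class, then inherited fields. The `ScopedSymbols` class and its method names are hypothetical and chosen only for illustration; the repository's Symbol Table may be organized differently.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical three-layer symbol table; not the repository's actual data structure.
class ScopedSymbols {
    static class ClassInfo {
        final String parent; // null when the class has no superclass
        final Map<String, String> fields = new HashMap<>();                    // field -> type
        final Map<String, Map<String, String>> methodLocals = new HashMap<>(); // method -> (local -> type)
        ClassInfo(String parent) { this.parent = parent; }
    }

    final Map<String, ClassInfo> classes = new HashMap<>();

    void declareClass(String name, String parent) {
        classes.put(name, new ClassInfo(parent));
    }

    void declareField(String cls, String name, String type) {
        classes.get(cls).fields.put(name, type);
    }

    void declareLocal(String cls, String method, String name, String type) {
        classes.get(cls).methodLocals
               .computeIfAbsent(method, m -> new HashMap<>())
               .put(name, type);
    }

    /** Resolves innermost scope first; returns null when the name is undeclared. */
    String lookup(String cls, String method, String name) {
        ClassInfo info = classes.get(cls);
        if (info == null) return null;
        Map<String, String> locals = info.methodLocals.get(method);
        if (locals != null && locals.containsKey(name)) return locals.get(name);
        // Walk up the inheritance chain looking for a field with this name.
        for (ClassInfo c = info; c != null; c = (c.parent == null) ? null : classes.get(c.parent)) {
            if (c.fields.containsKey(name)) return c.fields.get(name);
        }
        return null;
    }
}
```

With this shape, a local variable shadows a field of the same name, which matches Java's scoping rules.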


## Code Generation Logic

The generator uses the Visitor Pattern to emit textual LLVM IR (.ll files). Key features include:

### Control Flow & Labels

Separate counters generate unique branch labels, so nested loops and conditionals each get distinct jump targets in the IR:

  • Conditionals: if-then, if-else, and if-end labels.
  • Loops: jump targets for while statements.
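
A minimal sketch of counter-based label generation, assuming one counter per construct. The `LabelGenerator` class and the exact label spellings (`if_then_0`, `loop_cond_0`, and so on) are hypothetical; only the idea of suffixing each label family with a shared index comes from the description above.

```java
// Hypothetical label factory; the label spellings are assumptions for illustration.
class LabelGenerator {
    private int ifCounter = 0;
    private int loopCounter = 0;

    /** Returns the {then, else, end} labels for one conditional, all sharing one index. */
    String[] nextIf() {
        int n = ifCounter++;
        return new String[] { "if_then_" + n, "if_else_" + n, "if_end_" + n };
    }

    /** Returns the {condition, body, end} labels for one while loop. */
    String[] nextLoop() {
        int n = loopCounter++;
        return new String[] { "loop_cond_" + n, "loop_body_" + n, "loop_end_" + n };
    }
}
```

Because each call consumes a fresh index, an `if` nested inside another `if` receives different labels from its parent even though both are emitted by the same visitor method.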

### Safety Checks

To maintain Java's safety guarantees, the generator automatically emits IR for:

  • Out-of-Bounds (OOB): Array access validation logic.
  • Negative Size (NSZ): Validation during array allocation.

### Register Management

Because LLVM IR is in Static Single Assignment (SSA) form, each register is assigned exactly once. The generator keeps a virtual register counter so that every intermediate result gets a fresh register with the %_ prefix.
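
A small sketch of how a counter can hand out fresh registers while an expression is lowered. The `SsaEmitter` class and its `binary` helper are hypothetical; the example assumes 32-bit integer operands and registers numbered `%_0`, `%_1`, and so on.

```java
// Hypothetical emitter showing one fresh register per intermediate result.
class SsaEmitter {
    private int counter = 0;
    private final StringBuilder out = new StringBuilder();

    /** Returns a register name that has not been used before. */
    String fresh() { return "%_" + counter++; }

    /** Emits "result = op i32 lhs, rhs" and returns the register holding the result. */
    String binary(String op, String lhs, String rhs) {
        String result = fresh();
        out.append('\t').append(result).append(" = ").append(op)
           .append(" i32 ").append(lhs).append(", ").append(rhs).append('\n');
        return result;
    }

    String ir() { return out.toString(); }
}
```

Lowering `a + b * c` then becomes `e.binary("add", "%a", e.binary("mul", "%b", "%c"))`: the multiplication is emitted first, and its result register feeds the addition.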


## Implementation Highlights

  • Heap Allocation: Emits call i8* @malloc(i32 ...) instructions, computing each object's total size from its attribute types plus the 8-byte VTable pointer.
  • Method Invocation: Implements the "Lookup-Load-Call" sequence:
    1. Load the object pointer.
    2. Access the VTable.
    3. Load the function pointer at the calculated offset.
    4. Execute the call instruction.
  • Context Propagation: Uses a path-based scope string (e.g., Main->foo) and state variables like LastClassAllocated to resolve the types of nested MessageSend calls.
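
The "Lookup-Load-Call" sequence above could be emitted roughly as shown below. This is a sketch under stated assumptions: the object pointer is already available in a register as an `i8*` (step 1), the VTable is an array of `i8*` entries indexed by slot (byte offset divided by 8), and every method takes the receiver as an implicit first `i8*` argument. The `CallEmitter` class and `emitVirtualCall` method are hypothetical names.

```java
import java.util.List;

// Hypothetical emitter for a dynamically dispatched call; layout details are assumptions.
class CallEmitter {
    private int register = 0;
    private final StringBuilder out = new StringBuilder();

    private String fresh() { return "%_" + register++; }
    private void emit(String line) { out.append('\t').append(line).append('\n'); }

    /** Emits the dispatch sequence and returns the register holding the call result. */
    String emitVirtualCall(String objReg, int slot, String retType,
                           List<String> argTypes, List<String> argRegs) {
        // Step 2: the VTable pointer sits in the first 8 bytes of the object.
        String vtablePtr = fresh();
        emit(vtablePtr + " = bitcast i8* " + objReg + " to i8***");
        String vtable = fresh();
        emit(vtable + " = load i8**, i8*** " + vtablePtr);

        // Step 3: load the function pointer stored in the requested slot.
        String slotPtr = fresh();
        emit(slotPtr + " = getelementptr i8*, i8** " + vtable + ", i32 " + slot);
        String rawFunction = fresh();
        emit(rawFunction + " = load i8*, i8** " + slotPtr);

        // Build the signature and argument list; the receiver is the implicit first argument.
        StringBuilder signature = new StringBuilder("i8*");
        StringBuilder arguments = new StringBuilder("i8* " + objReg);
        for (int i = 0; i < argTypes.size(); i++) {
            signature.append(", ").append(argTypes.get(i));
            arguments.append(", ").append(argTypes.get(i)).append(' ').append(argRegs.get(i));
        }

        // Step 4: cast to the method's real type and call it.
        String function = fresh();
        emit(function + " = bitcast i8* " + rawFunction + " to " + retType + " (" + signature + ")*");
        String result = fresh();
        emit(result + " = call " + retType + " " + function + "(" + arguments + ")");
        return result;
    }

    String ir() { return out.toString(); }
}
```

Calling `emitVirtualCall("%obj", 2, "i32", List.of("i32"), List.of("%x"))` emits six instructions and returns the register that holds the method's return value, which a visitor can pass on to the enclosing expression.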

## Getting Started

### Compilation & Usage

The project includes a Makefile that builds the generator (a Java program) and a helper script that runs the generated LLVM IR using clang.

  1. Compile the Generator:

     make

  2. Generate LLVM IR:

     java Main <file.java>

  3. Execute via Clang:

     clang -o output <generated_file.ll>
     ./output

## Technical Key Points

  • SSA Form: Managing immutable registers in LLVM IR.
  • Pointer Arithmetic: Manual calculation of byte-offsets for class fields and methods.
  • Backend Optimization: Generating clean, valid IR that can be further optimized by the LLVM toolchain.
