The final phase of the MiniJava compiler suite, developed for the Compilers II course at the National and Kapodistrian University of Athens (UoA). This project implements a backend that transforms semantically verified MiniJava code into optimized LLVM Intermediate Representation (IR).
A primary challenge of this project was mapping Java's Object-Oriented features (inheritance and polymorphism) to LLVM’s linear memory model.
To support method overriding and dynamic dispatch, I implemented a custom VTable structure. Each class has a unique VTable containing:
- Method Offsets: Precise memory offsets for function pointers.
- Inheritance Mapping: Tracks which class a specific method implementation "belongs to" to correctly resolve super-calls.
- Variable Offsets: Calculated byte-offsets for class attributes, accounting for the overhead of the VTable pointer (first 8 bytes of the object).
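The offset bookkeeping described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `ClassLayout` class, its method names, and the type sizes (4 bytes for `int`, 1 for `boolean`, 8 for any pointer) are assumptions made for the example. Field shadowing is ignored for brevity.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: names and size rules are assumptions, not the project's real classes.
class ClassLayout {
    static final int POINTER_SIZE = 8; // VTable pointer occupies the first 8 bytes

    final String name;
    // method name -> class providing the implementation; insertion order = VTable slot
    final Map<String, String> methodOwner = new LinkedHashMap<>();
    // field name -> byte offset inside the object
    final Map<String, Integer> fieldOffset = new LinkedHashMap<>();
    int size; // total object size in bytes

    ClassLayout(String name, ClassLayout parent) {
        this.name = name;
        if (parent == null) {
            size = POINTER_SIZE;
        } else {
            // A subclass starts from a copy of its parent's layout.
            methodOwner.putAll(parent.methodOwner);
            fieldOffset.putAll(parent.fieldOffset);
            size = parent.size;
        }
    }

    // Assumed sizes: int = 4, boolean = 1, everything else is a pointer.
    static int sizeOf(String type) {
        switch (type) {
            case "int": return 4;
            case "boolean": return 1;
            default: return POINTER_SIZE;
        }
    }

    void addField(String field, String type) {
        fieldOffset.put(field, size);
        size += sizeOf(type);
    }

    // Re-inserting an existing key keeps its slot, so an override reuses the parent's slot.
    void addMethod(String method) {
        methodOwner.put(method, name);
    }

    int methodOffset(String method) {
        int slot = new ArrayList<>(methodOwner.keySet()).indexOf(method);
        if (slot < 0) throw new IllegalArgumentException("unknown method: " + method);
        return slot * POINTER_SIZE;
    }

    public static void main(String[] args) {
        ClassLayout a = new ClassLayout("A", null);
        a.addField("x", "int");
        a.addMethod("foo");
        a.addMethod("bar");
        ClassLayout b = new ClassLayout("B", a);
        b.addField("arr", "int[]");
        b.addMethod("bar"); // overrides A.bar
        System.out.println(b.fieldOffset + " " + b.methodOwner);
    }
}
```

Keeping the implementing class next to each slot is what lets an inherited method stay attributed to the class that defines it.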
The generator utilizes a multi-layered Symbol Table to maintain scope context (Class -> Method -> Variable) and propagate type information throughout the IR generation phase.
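One way such a scoped lookup could work is sketched below. The `SymbolTable` class and its methods are hypothetical. The example assumes scopes are keyed by strings of the form `Class` and `Class->method`, and that a name is resolved first in the method scope, then in the class, then up the superclass chain.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a Class -> Method -> Variable lookup; not the project's real table.
class SymbolTable {
    // scope key ("Class" or "Class->method") -> variable name -> type
    private final Map<String, Map<String, String>> scopes = new HashMap<>();
    // child class -> parent class
    private final Map<String, String> superOf = new HashMap<>();

    void declare(String scope, String var, String type) {
        scopes.computeIfAbsent(scope, k -> new HashMap<>()).put(var, type);
    }

    void extend(String child, String parent) {
        superOf.put(child, parent);
    }

    // Returns the declared type, or null if the name is not visible from this method.
    String lookup(String cls, String method, String var) {
        Map<String, String> locals = scopes.get(cls + "->" + method);
        if (locals != null && locals.containsKey(var)) {
            return locals.get(var);
        }
        // Fall back to fields of the class, then of each ancestor.
        for (String c = cls; c != null; c = superOf.get(c)) {
            Map<String, String> fields = scopes.get(c);
            if (fields != null && fields.containsKey(var)) {
                return fields.get(var);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        SymbolTable st = new SymbolTable();
        st.declare("A", "x", "int");
        st.extend("B", "A");
        st.declare("B->foo", "x", "boolean");
        System.out.println(st.lookup("B", "foo", "x")); // local shadows the inherited field
        System.out.println(st.lookup("B", "bar", "x")); // falls back to A's field
    }
}
```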
The generator uses the Visitor Pattern to emit textual LLVM IR (`.ll` files). Key features include:
Unique counters are maintained for generating branching labels (`if-then`, `if-else`, `if-end`, and `loop` targets for `while` statements), ensuring that nested loops and conditionals have distinct jump targets in the IR.
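A counter-based label generator might look like the sketch below. The `LabelGenerator` class and the exact label spellings are assumptions for illustration. The point is only that each statement draws a fresh index, so nested constructs never share a jump target.

```java
// Hypothetical sketch: label names are illustrative, not the project's exact spellings.
class LabelGenerator {
    private int ifCounter = 0;
    private int loopCounter = 0;

    // Returns {then, else, end} labels that share one fresh index.
    String[] nextIf() {
        int n = ifCounter++;
        return new String[] { "if_then_" + n, "if_else_" + n, "if_end_" + n };
    }

    // Returns {condition, body, end} labels for a while statement.
    String[] nextLoop() {
        int n = loopCounter++;
        return new String[] { "loop_cond_" + n, "loop_body_" + n, "loop_end_" + n };
    }

    public static void main(String[] args) {
        LabelGenerator labels = new LabelGenerator();
        String[] outer = labels.nextIf();
        String[] inner = labels.nextIf(); // a nested if gets a different index
        System.out.println(outer[0] + " " + inner[0]);
    }
}
```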
To maintain Java's safety guarantees, the generator automatically emits IR for:
- Out-of-Bounds (OOB): Array access validation logic.
- Negative Size (NSZ): Validation during array allocation.
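The two checks could be emitted along these lines. This is a sketch under stated assumptions: the `BoundsCheckEmitter` class, the label names, and the runtime helpers `@throw_oob` and `@throw_nsz` are hypothetical, and the array length is assumed to be already available in a register.

```java
// Hypothetical sketch: @throw_oob and @throw_nsz are assumed runtime helpers.
class BoundsCheckEmitter {
    private int reg = 0;
    private int label = 0;
    private final StringBuilder out = new StringBuilder();

    private void line(String s) {
        out.append(s).append('\n');
    }

    // Guards an array access. The unsigned comparison also rejects negative
    // indices, because they are treated as very large unsigned values.
    void emitIndexCheck(String index, String length) {
        int n = label++;
        String ok = "%_" + reg++;
        line("\t" + ok + " = icmp ult i32 " + index + ", " + length);
        line("\tbr i1 " + ok + ", label %oob_ok_" + n + ", label %oob_err_" + n);
        line("oob_err_" + n + ":");
        line("\tcall void @throw_oob()");
        line("\tbr label %oob_ok_" + n);
        line("oob_ok_" + n + ":");
    }

    // Guards an array allocation against a negative size.
    void emitSizeCheck(String size) {
        int n = label++;
        String neg = "%_" + reg++;
        line("\t" + neg + " = icmp slt i32 " + size + ", 0");
        line("\tbr i1 " + neg + ", label %nsz_err_" + n + ", label %nsz_ok_" + n);
        line("nsz_err_" + n + ":");
        line("\tcall void @throw_nsz()");
        line("\tbr label %nsz_ok_" + n);
        line("nsz_ok_" + n + ":");
    }

    String code() {
        return out.toString();
    }

    public static void main(String[] args) {
        BoundsCheckEmitter e = new BoundsCheckEmitter();
        e.emitSizeCheck("%n");
        e.emitIndexCheck("%idx", "%len");
        System.out.print(e.code());
    }
}
```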
Since LLVM IR uses Static Single Assignment (SSA) form, the generator manages a virtual register counter to ensure every intermediate result is assigned to a unique `%_`-prefixed register.
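As a rough illustration of the register counter, the sketch below lowers `x + y * z` so that each intermediate value gets its own register. The `TempEmitter` class and its method names are hypothetical, and only `i32` binary operations are handled.

```java
// Hypothetical sketch of a virtual register counter; handles only i32 binary operations.
class TempEmitter {
    private int next = 0;
    private final StringBuilder out = new StringBuilder();

    // Every call hands out a register name that has never been used before.
    String newTemp() {
        return "%_" + next++;
    }

    // Emits "dst = op i32 lhs, rhs" and returns the fresh destination register.
    String emitBinary(String op, String lhs, String rhs) {
        String dst = newTemp();
        out.append('\t').append(dst).append(" = ").append(op)
           .append(" i32 ").append(lhs).append(", ").append(rhs).append('\n');
        return dst;
    }

    String code() {
        return out.toString();
    }

    public static void main(String[] args) {
        TempEmitter e = new TempEmitter();
        String product = e.emitBinary("mul", "%y", "%z"); // y * z
        e.emitBinary("add", "%x", product);               // x + (y * z)
        System.out.print(e.code());
    }
}
```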
- Heap Allocation: Generates `call i8* @malloc(i32 ...)` calls, calculating the total size of objects based on their attribute types.
- Method Invocation: Implements the "Lookup-Load-Call" sequence:
  - Load the object pointer.
  - Access the VTable.
  - Load the function pointer at the calculated offset.
  - Execute the `call` instruction.
- Context Propagation: Uses a path-based scope string (e.g., `Main->foo`) and state variables like `LastClassAllocated` to resolve the types of nested `MessageSend` calls.
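The "Lookup-Load-Call" sequence could be emitted roughly as shown below. This is a sketch, not the project's implementation: the `CallEmitter` class is hypothetical, the IR uses the typed-pointer syntax (`i8*`) that appears in the `malloc` call above, and the method is addressed by slot index, meaning its byte offset divided by the 8-byte pointer size.

```java
import java.util.List;

// Hypothetical sketch: emits typed-pointer LLVM IR for one virtual call.
class CallEmitter {
    private int reg = 0;
    private final StringBuilder out = new StringBuilder();

    private String tmp() {
        return "%_" + reg++;
    }

    private void emit(String s) {
        out.append('\t').append(s).append('\n');
    }

    // objReg holds the receiver as i8*; slot is the method's index in the VTable.
    String emitVirtualCall(String objReg, int slot, String retType,
                           List<String> argTypes, List<String> argRegs) {
        // 1. Reinterpret the object pointer: its first 8 bytes hold the VTable pointer.
        String vtPtr = tmp();
        emit(vtPtr + " = bitcast i8* " + objReg + " to i8***");
        // 2. Access the VTable.
        String vt = tmp();
        emit(vt + " = load i8**, i8*** " + vtPtr);
        // 3. Load the function pointer stored in the method's slot.
        String slotPtr = tmp();
        emit(slotPtr + " = getelementptr i8*, i8** " + vt + ", i32 " + slot);
        String raw = tmp();
        emit(raw + " = load i8*, i8** " + slotPtr);
        // Build the signature and argument list; the receiver is the implicit first argument.
        StringBuilder sig = new StringBuilder("i8*");
        StringBuilder args = new StringBuilder("i8* " + objReg);
        for (int i = 0; i < argTypes.size(); i++) {
            sig.append(", ").append(argTypes.get(i));
            args.append(", ").append(argTypes.get(i)).append(' ').append(argRegs.get(i));
        }
        String fn = tmp();
        emit(fn + " = bitcast i8* " + raw + " to " + retType + " (" + sig + ")*");
        // 4. Execute the call.
        String result = tmp();
        emit(result + " = call " + retType + " " + fn + "(" + args + ")");
        return result;
    }

    String code() {
        return out.toString();
    }

    public static void main(String[] args) {
        CallEmitter e = new CallEmitter();
        e.emitVirtualCall("%this", 2, "i32", List.of("i32"), List.of("%x"));
        System.out.print(e.code());
    }
}
```

Recent LLVM releases use opaque pointers (`ptr`), so a generator targeting them would drop the bitcasts; the typed form is kept here only to match the syntax used in this document.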
The project includes a Makefile to build the Java compiler and a helper script to run the generated LLVM code using clang.
- Compile the Generator: `make`
- Generate LLVM IR: `java Main <file.java>`
- Execute via Clang: `clang -o output <generated_file.ll>`, then `./output`
- SSA Form: Managing immutable registers in LLVM IR.
- Pointer Arithmetic: Manual calculation of byte-offsets for class fields and methods.
- Backend Optimization: Generating clean, valid IR that can be further optimized by the LLVM toolchain.