Skip to content
Branch: master
Find file History
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Type Name Latest commit message Commit time
Failed to load latest commit information.

Tiger Programming Language

This is a compiler for the Tiger programming language, created from the projects in Modern Compiler Implementation in ML, by Appel.


It's not a full compiler yet, but you can run tests & print trees in the SML REPL:

- CM.make "";
- Main.compile "test.tig";
- Main.main "test.tig";
- Main.printAbsyn "test.tig";
- Main.printIR "test.tig";
- Main.printAssem "test.tig";
- Main.test ();
- Main.runTest 42;
- Main.printTestAbsyn 42;
- Main.printTestIR 42;
- Main.printTestAssem 42;


The lexer is specified in tiger.lex and generated by ml-lex.


Strings are enclosed in double-quotes("My String"). Multi-line strings are allowed using the a backslash immediately before a newline and another before any text:

"My multi-line \

Any white-space between the backslashes is removed.

When a backlash is encountered, the STRING state transitions to the STRING_ESCAPE state, where any escape sequences are handled. The STRING_ESCAPE state may transition to the STRING_CONTROL state for handling control characters, or the STRING_LONG_ESCAPE for handling multi-line strings. After an escape sequence, the lexer transitions back to the STRING state.


Multi-line, nested comments are allowed. This is implemented by counting the comment depth. Opening a comment increases the depth by one, closing a comment decreases the depth by one. If the lexer reaches the end of the file and the comment depth is not zero, then there is an unclosed comment in the file. If this occurs, an error is reported.

Error Handling

Error handling is minimal. The line and column of the error are shown along with an error message. Errors do not halt execution of the parser. Non-whitespace characters between the backslashes of multi-line strings are ignored & reported. Unclosed strings are reported as well.

End of File

When the end of the file is reached, the lexer ensures all comments have been closed before returning the EOF token.


The grammer for the parser is specified in tiger.grm and generated by ml-yacc. The grammar follows the general outline of the Tiger reference.

A special exception is made for lvalues, as parsing array lvalues caused a shift/reduce conflict with the parsing of array instantiation. Instead of directly parsing all lvalues in an expression, we parse an ID as an expression and lvalues that are not IDs as another expression. Then we reference the full class of lvalues from the parse.

Type Checker

Error messages are pretty generic at the moment, they could be improved with more specifics, e.g., function expects 3 arguments but you passed 5.

Checking functions should be refactored to only take a type, not an expression as well. There's lots of boilerplate with the current functions.

transOp could be improved to remove duplication, as well as the interface in the Translate module(for ints at least).

Have not yet verified that we check for every improper use of nil.


Translation to x86 architecture is being developed in the PentiumFrame module.

Translation expressions are categorized by Ex, Nx, & Cx constructors, used for expressions, statements, & conditional jumps. For expressions are re-written as While expressions.

Static Links are passed as the first argument to functions. This is hidden within the Translate module, Translate.formals does not include the static link.

Built-in functions are defined in the C runtime and as a child of the Top frame level. Array allocation & initialization is also handled by the runtime, in addition to record allocation, but not record initialization. The string comparison functions are also implemented in C.


The book provides most of the re-ordering & tracing code, but the do_stm, do_exp, & basicBlocks functions were written ourselves.

The book code may eventually be rewritten.

Code Generation

Machine code is currently only generated for the Pentium x86 32-bit architecture, via the PentiumGen structure. UVA's x86 Assembly Guide was used as a register/instruction reference.

Liveness Analysis

Liveness analysis is done in two parts, first a flow graph is created, then an interference graph is created.

The flow graph is created in two steps. Starting from the end of the instruction list, all nodes are created for each Assem.instr along with edges for jumps to labels that have already been processed & edges for instructions falling through. If a jump label has not yet been processed, the label and source node are added to a queue that is used to generate the skipped edges after all instructions have had nodes created.

The interference graph is also created in two steps. First we create in and out liveness sets for each node in the flow graph, mapping nodes to the temporaries live at that node. The in set is only used to help us generate the out set. After the out sets are generated, we use them to generate the interference graph. For each defined temporary in a node, we add interference edges to each temporary live out of that node.

You can’t perform that action at this time.