Tiger Programming Language
This is a compiler for the Tiger programming language, created from the projects in Modern Compiler Implementation in ML, by Appel.
It's not a full compiler yet, but you can run tests & print trees in the SML REPL:
- CM.make "sources.cm"; - Main.compile "test.tig"; - Main.main "test.tig"; - Main.printAbsyn "test.tig"; - Main.printIR "test.tig"; - Main.printAssem "test.tig"; - Main.test (); - Main.runTest 42; - Main.printTestAbsyn 42; - Main.printTestIR 42; - Main.printTestAssem 42;
The lexer is specified in
tiger.lex and generated by
Strings are enclosed in double-quotes(
"My String"). Multi-line strings are
allowed using the a backslash immediately before a newline and another before
"My multi-line \ \string."
Any white-space between the backslashes is removed.
When a backlash is encountered, the STRING state transitions to the STRING_ESCAPE state, where any escape sequences are handled. The STRING_ESCAPE state may transition to the STRING_CONTROL state for handling control characters, or the STRING_LONG_ESCAPE for handling multi-line strings. After an escape sequence, the lexer transitions back to the STRING state.
Multi-line, nested comments are allowed. This is implemented by counting the comment depth. Opening a comment increases the depth by one, closing a comment decreases the depth by one. If the lexer reaches the end of the file and the comment depth is not zero, then there is an unclosed comment in the file. If this occurs, an error is reported.
Error handling is minimal. The line and column of the error are shown along with an error message. Errors do not halt execution of the parser. Non-whitespace characters between the backslashes of multi-line strings are ignored & reported. Unclosed strings are reported as well.
End of File
When the end of the file is reached, the lexer ensures all comments have been closed before returning the EOF token.
The grammer for the parser is specified in
tiger.grm and generated by
ml-yacc. The grammar follows the general outline of the Tiger reference.
A special exception is made for lvalues, as parsing array lvalues caused a shift/reduce conflict with the parsing of array instantiation. Instead of directly parsing all lvalues in an expression, we parse an ID as an expression and lvalues that are not IDs as another expression. Then we reference the full class of lvalues from the parse.
Error messages are pretty generic at the moment, they could be improved with
more specifics, e.g.,
function expects 3 arguments but you passed 5.
Checking functions should be refactored to only take a type, not an expression as well. There's lots of boilerplate with the current functions.
transOp could be improved to remove duplication, as well as the interface in
ints at least).
Have not yet verified that we check for every improper use of
Translation to x86 architecture is being developed in the
Translation expressions are categorized by
used for expressions, statements, & conditional jumps.
For expressions are
Static Links are passed as the first argument to functions. This is hidden
Translate.formals does not include the static
Built-in functions are defined in the C runtime and as a child of the
frame level. Array allocation & initialization is also handled by the runtime,
in addition to record allocation, but not record initialization. The string
comparison functions are also implemented in C.
The book provides most of the re-ordering & tracing code, but the
basicBlocks functions were written ourselves.
The book code may eventually be rewritten.
Machine code is currently only generated for the Pentium x86 32-bit
architecture, via the
PentiumGen structure. UVA's x86 Assembly
Guide was used as a register/instruction reference.
Liveness analysis is done in two parts, first a flow graph is created, then an interference graph is created.
The flow graph is created in two steps. Starting from the end of the
instruction list, all nodes are created for each
Assem.instr along with
edges for jumps to labels that have already been processed & edges for
instructions falling through. If a jump label has not yet been processed, the
label and source node are added to a queue that is used to generate the
skipped edges after all instructions have had nodes created.
The interference graph is also created in two steps. First we create
out liveness sets for each node in the flow graph, mapping nodes to the
temporaries live at that node. The
in set is only used to help us generate
out set. After the
out sets are generated, we use them to generate the
interference graph. For each defined temporary in a node, we add interference
edges to each temporary live out of that node.