Skip to content
Branch: master
Find file History
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
..
Failed to load latest commit information.
README.md
absyn.sml
assem.sml
canon.sml
codegen.sig
env.sml
errormsg.sml
findescape.sml
flow.sml
frame.sig
graph.sig
graph.sml
liveness.sml
main.sml
makegraph.sml
parse.sml
pentiumframe.sml
pentiumgen.sml
prabsyn.sml
printtree.sml
semant.sml
sources.cm
symbol.sml
table.sig
table.sml
temp.sig
temp.sml
tiger.grm
tiger.lex
translate.sml
tree.sml
types.sml

README.md

Tiger Programming Language

This is a compiler for the Tiger programming language, created from the projects in Modern Compiler Implementation in ML, by Appel.

Usage

It's not a full compiler yet, but you can run tests & print trees in the SML REPL:

- CM.make "sources.cm";
- Main.compile "test.tig";
- Main.main "test.tig";
- Main.printAbsyn "test.tig";
- Main.printIR "test.tig";
- Main.printAssem "test.tig";
- Main.test ();
- Main.runTest 42;
- Main.printTestAbsyn 42;
- Main.printTestIR 42;
- Main.printTestAssem 42;

Lexer

The lexer is specified in tiger.lex and generated by ml-lex.

Strings

Strings are enclosed in double-quotes("My String"). Multi-line strings are allowed using the a backslash immediately before a newline and another before any text:

"My multi-line \
    \string."

Any white-space between the backslashes is removed.

When a backlash is encountered, the STRING state transitions to the STRING_ESCAPE state, where any escape sequences are handled. The STRING_ESCAPE state may transition to the STRING_CONTROL state for handling control characters, or the STRING_LONG_ESCAPE for handling multi-line strings. After an escape sequence, the lexer transitions back to the STRING state.

Comments

Multi-line, nested comments are allowed. This is implemented by counting the comment depth. Opening a comment increases the depth by one, closing a comment decreases the depth by one. If the lexer reaches the end of the file and the comment depth is not zero, then there is an unclosed comment in the file. If this occurs, an error is reported.

Error Handling

Error handling is minimal. The line and column of the error are shown along with an error message. Errors do not halt execution of the parser. Non-whitespace characters between the backslashes of multi-line strings are ignored & reported. Unclosed strings are reported as well.

End of File

When the end of the file is reached, the lexer ensures all comments have been closed before returning the EOF token.

Parser

The grammer for the parser is specified in tiger.grm and generated by ml-yacc. The grammar follows the general outline of the Tiger reference.

A special exception is made for lvalues, as parsing array lvalues caused a shift/reduce conflict with the parsing of array instantiation. Instead of directly parsing all lvalues in an expression, we parse an ID as an expression and lvalues that are not IDs as another expression. Then we reference the full class of lvalues from the parse.

Type Checker

Error messages are pretty generic at the moment, they could be improved with more specifics, e.g., function expects 3 arguments but you passed 5.

Checking functions should be refactored to only take a type, not an expression as well. There's lots of boilerplate with the current functions.

transOp could be improved to remove duplication, as well as the interface in the Translate module(for ints at least).

Have not yet verified that we check for every improper use of nil.

Translation

Translation to x86 architecture is being developed in the PentiumFrame module.

Translation expressions are categorized by Ex, Nx, & Cx constructors, used for expressions, statements, & conditional jumps. For expressions are re-written as While expressions.

Static Links are passed as the first argument to functions. This is hidden within the Translate module, Translate.formals does not include the static link.

Built-in functions are defined in the C runtime and as a child of the Top frame level. Array allocation & initialization is also handled by the runtime, in addition to record allocation, but not record initialization. The string comparison functions are also implemented in C.

Canonicalization

The book provides most of the re-ordering & tracing code, but the do_stm, do_exp, & basicBlocks functions were written ourselves.

The book code may eventually be rewritten.

Code Generation

Machine code is currently only generated for the Pentium x86 32-bit architecture, via the PentiumGen structure. UVA's x86 Assembly Guide was used as a register/instruction reference.

Liveness Analysis

Liveness analysis is done in two parts, first a flow graph is created, then an interference graph is created.

The flow graph is created in two steps. Starting from the end of the instruction list, all nodes are created for each Assem.instr along with edges for jumps to labels that have already been processed & edges for instructions falling through. If a jump label has not yet been processed, the label and source node are added to a queue that is used to generate the skipped edges after all instructions have had nodes created.

The interference graph is also created in two steps. First we create in and out liveness sets for each node in the flow graph, mapping nodes to the temporaries live at that node. The in set is only used to help us generate the out set. After the out sets are generated, we use them to generate the interference graph. For each defined temporary in a node, we add interference edges to each temporary live out of that node.

You can’t perform that action at this time.