Commit
added draft of the report and spelling dictionary
Showing 2 changed files with 130 additions and 0 deletions.
@@ -0,0 +1,14 @@
DD2448
Helin
lexing
lexer
JavaCC
JVM
bytecode
LALR
Naur
BNF
EBNF
struct
enum
JJTree
@@ -0,0 +1,116 @@
\documentclass[11pt,oneside,a4paper]{article}
\usepackage{fullpage}
\usepackage{hyperref}

\begin{document}
\title{Project report in DD2448, Compiler Construction}
\author{Erik Helin, \href{mailto:ehelin@kth.se}{ehelin@kth.se}}
\date{\today}
\maketitle

\tableofcontents

\section{Introduction}
This document describes the choice of tools, the design decisions and the
overall structure of the code for a MiniJava compiler.
\section{Implementation}
The compiler was first written in the C programming language, with the help
of the tools Flex and Bison for lexing and parsing. However, due to lack of
time, only the lexer and parser were completed.

I decided to rewrite the compiler from scratch, this time using the Java
programming language, with JavaCC for parsing and Jasmin as the assembler
for the JVM bytecode.

The following sections discuss the different parts of the compiler.
Section \ref{sec:lexing_and_parsing} also contains a comparison of the
different tools used for implementing the parser.

For a general discussion of the two languages used for the implementation,
see Section \ref{sec:discussion}.
\subsection{Lexing and parsing}
\label{sec:lexing_and_parsing}
The lexer and parser were first implemented using the tools Flex and Bison:
Flex for the lexer and Bison for the parser. Flex generates a lexer from
regular expressions, and the tokens it produces are consumed by the parser
generated by Bison. Bison generates an LALR parser from a grammar in
Backus-Naur Form (BNF).

The biggest challenge when implementing the LALR parser was to understand
the shift/reduce conflict warnings reported by Bison, as these require you to
understand the automaton produced by Bison.

The abstract syntax tree was represented by using a struct for each kind of
node. Each node struct had an enum as its first member that represented
the type of the node. To simplify the traversal of the syntax tree,
function pointers were used for callbacks. A function was then provided that
checked the type of the node and called the corresponding callback with the
node cast to the correct type as parameter.

The second parser was implemented using JavaCC to create an LL parser. The
lexer could also be generated with the help of JavaCC by using regular
expressions similar to those of Flex. JavaCC uses Extended Backus-Naur Form
(EBNF) to describe the grammar. The main advantage of using EBNF over BNF was
that a lot of rules could be simplified. For example, the following specifies
zero or more variable declarations in BNF:
\begin{verbatim}
<variable_declarations> ::= "" | <variable_declaration> <variable_declarations>
\end{verbatim}
In EBNF, this can be expressed as
\verb|variable_declarations = variable_declaration*|, which made the parser
much more succinct.

The main challenge when writing the LL parser using JavaCC was to left-factor
the grammar. However, JavaCC produces a top-down parser and allows you to
pass arguments to rules. This made left-factoring the grammar a lot easier,
since it becomes possible to pass an already parsed expression as an
argument to a rule ``below'' it.
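
As a minimal sketch of this idea, the following hand-written recursive-descent
parser in Java passes the already parsed left-hand expression as an argument
to the rule that parses what follows it. It is not the JavaCC grammar used in
the compiler: the class \texttt{PostfixParserSketch}, the pre-tokenized input
and the string-based tree are assumptions made for illustration only. For the
tokens \verb|a [ i ] . length| the sketch returns
\verb|Length(Index(Id(a), Id(i)))|.
\begin{verbatim}
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; not the JavaCC grammar from the compiler.
final class PostfixParserSketch {
    private final List<String> tokens;
    private int pos = 0;

    PostfixParserSketch(List<String> tokens) {
        this.tokens = tokens;
    }

    // Convenience entry point: parse a pre-tokenized expression.
    static String parse(String... tokens) {
        return new PostfixParserSketch(Arrays.asList(tokens)).expression();
    }

    // expression ::= primary suffix(primary)
    String expression() {
        String left = primary();
        return suffix(left);
    }

    // primary ::= IDENTIFIER
    private String primary() {
        String token = next();
        if (!token.matches("[A-Za-z_][A-Za-z0-9_]*")) {
            throw new IllegalStateException("expected identifier, got " + token);
        }
        return "Id(" + token + ")";
    }

    // suffix(left) ::= "." "length" suffix(Length(left))
    //                | "[" expression "]" suffix(Index(left, index))
    //                | (empty)
    // The parsed left-hand side is passed down as an argument.
    private String suffix(String left) {
        if (peek().equals(".")) {
            next();
            expect("length");
            return suffix("Length(" + left + ")");
        }
        if (peek().equals("[")) {
            next();
            String index = expression();
            expect("]");
            return suffix("Index(" + left + ", " + index + ")");
        }
        return left;
    }

    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : "<EOF>";
    }

    private String next() {
        if (pos >= tokens.size()) {
            throw new IllegalStateException("unexpected end of input");
        }
        return tokens.get(pos++);
    }

    private void expect(String expected) {
        String actual = next();
        if (!actual.equals(expected)) {
            throw new IllegalStateException(
                "expected " + expected + ", got " + actual);
        }
    }
}
\end{verbatim}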

The abstract syntax tree was represented by using a class for each kind of
node. Interfaces were used to represent a generic statement, expression or
type. JavaCC provides the JJTree tool for creating an abstract syntax tree,
but this tool proved to be too inflexible for my needs. For traversing the
abstract syntax tree, the visitor pattern was used. One problem with the
visitor pattern was implementing several kinds of visitors (that is, a visitor
returning a different type than any existing one). For this to work, a new
accept method, returning the new type, had to be implemented in each node
class of the abstract syntax tree.
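
The problem can be illustrated with a small sketch. The node classes
\texttt{IntLiteral} and \texttt{Plus} and the two visitor interfaces below are
hypothetical and much smaller than the real syntax tree, but they show how
each new return type forces one more \texttt{accept} method into every node
class.
\begin{verbatim}
// Hypothetical node and visitor names, for illustration only.
interface StringVisitor {
    String visit(IntLiteral node);
    String visit(Plus node);
}

interface IntVisitor {
    int visit(IntLiteral node);
    int visit(Plus node);
}

interface Expression {
    String accept(StringVisitor visitor);
    // A visitor with a new return type needs its own accept method.
    int accept(IntVisitor visitor);
}

final class IntLiteral implements Expression {
    final int value;

    IntLiteral(int value) {
        this.value = value;
    }

    public String accept(StringVisitor visitor) { return visitor.visit(this); }
    public int accept(IntVisitor visitor) { return visitor.visit(this); }
}

final class Plus implements Expression {
    final Expression left;
    final Expression right;

    Plus(Expression left, Expression right) {
        this.left = left;
        this.right = right;
    }

    public String accept(StringVisitor visitor) { return visitor.visit(this); }
    public int accept(IntVisitor visitor) { return visitor.visit(this); }
}

// Visitor returning a String: pretty-prints the expression.
final class Printer implements StringVisitor {
    public String visit(IntLiteral node) {
        return Integer.toString(node.value);
    }

    public String visit(Plus node) {
        return "(" + node.left.accept(this) + " + "
                   + node.right.accept(this) + ")";
    }
}

// Visitor returning an int: evaluates the expression.
final class Evaluator implements IntVisitor {
    public int visit(IntLiteral node) {
        return node.value;
    }

    public int visit(Plus node) {
        return node.left.accept(this) + node.right.accept(this);
    }
}
\end{verbatim}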

\subsection{Type checking}
The type checking part of the compiler was implemented in two stages. In the
first stage, the symbol table is built. In the second stage, the symbol table
is used to type check the MiniJava program. The symbol table consists of three
different kinds of tables:
\begin{description}
\item{\emph{Program table}} Relates names to their corresponding class table.
\item{\emph{Class table}} Relates names to fields or their corresponding method
table.
\item{\emph{Method table}} Relates names to parameters or local variables. Also
contains the return type of the method.
\end{description}
This data structure turned out to work well: when traversing the abstract
syntax tree, the current class table and the program table can be kept as
instance variables, and the algorithm for looking up the type of a variable
becomes:
\begin{enumerate}
\item Check for the name in the current method table.
\item Check for the name in the current class table.
\item Check for the name in the program table.
\end{enumerate}
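
The three tables and the lookup order can be sketched in Java as follows. The
class and field names are hypothetical, types are represented as plain strings
for brevity, and treating a name found in the program table as a class type is
an assumption rather than something taken from the compiler.
\begin{verbatim}
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the three kinds of tables.
final class MethodTable {
    final String returnType;
    // Parameters and local variables: name -> type.
    final Map<String, String> variables = new HashMap<>();

    MethodTable(String returnType) {
        this.returnType = returnType;
    }
}

final class ClassTable {
    // Fields: name -> type.
    final Map<String, String> fields = new HashMap<>();
    // Methods: name -> method table.
    final Map<String, MethodTable> methods = new HashMap<>();
}

final class ProgramTable {
    // Classes: name -> class table.
    final Map<String, ClassTable> classes = new HashMap<>();
}

final class SymbolLookupSketch {
    // Returns the type of the name, or null if it is not declared.
    static String lookupType(String name, MethodTable method,
                             ClassTable cls, ProgramTable program) {
        // 1. Parameters and locals of the current method.
        String type = method.variables.get(name);
        if (type != null) {
            return type;
        }
        // 2. Fields of the current class.
        type = cls.fields.get(name);
        if (type != null) {
            return type;
        }
        // 3. Program table. Assumption: such a name denotes a class,
        //    so the class name itself is used as the type.
        return program.classes.containsKey(name) ? name : null;
    }
}
\end{verbatim}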

When the symbol table is being built, the MiniJava program is also partly type
checked. Specifically, if a definition already exists for a class, field or
method (including locals and parameters), the symbol table builder will return
an error. The symbol table builder is implemented with the help of a visitor.
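
A minimal sketch of such a duplicate check is shown below. The class
\texttt{DuplicateCheckSketch} and its \texttt{declare} method are hypothetical,
and collecting the errors in a list is an assumption made for illustration;
the actual builder may signal errors differently.
\begin{verbatim}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; error handling in the real builder may differ.
final class DuplicateCheckSketch {
    final List<String> errors = new ArrayList<>();

    // Adds the name to the table unless it is already defined.
    // Returns false and records an error for a duplicate definition.
    boolean declare(Map<String, String> table, String kind,
                    String name, String type) {
        String previous = table.putIfAbsent(name, type);
        if (previous != null) {
            errors.add("duplicate " + kind + ": " + name);
            return false;
        }
        return true;
    }
}
\end{verbatim}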

In the second stage, the symbol table checker is given the newly constructed
symbol table. The symbol table checker then checks the type of each
expression according to the Java specification (adjusted for MiniJava). This is
also implemented with the help of a visitor.
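
As a rough illustration of a visitor that checks expression types, consider
the sketch below. The node classes prefixed with \texttt{Tc}, the
\texttt{ExpressionChecker} class and the string representation of types are
all hypothetical. It covers only integer literals, \texttt{true}, \texttt{+}
and \texttt{<}, and it leaves out the symbol table entirely.
\begin{verbatim}
import java.util.ArrayList;
import java.util.List;

// Hypothetical, heavily reduced expression nodes.
interface TcExpr {
    String accept(TcVisitor visitor);
}

interface TcVisitor {
    String visit(TcInt e);
    String visit(TcTrue e);
    String visit(TcPlus e);
    String visit(TcLess e);
}

final class TcInt implements TcExpr {
    public String accept(TcVisitor visitor) { return visitor.visit(this); }
}

final class TcTrue implements TcExpr {
    public String accept(TcVisitor visitor) { return visitor.visit(this); }
}

final class TcPlus implements TcExpr {
    final TcExpr left;
    final TcExpr right;

    TcPlus(TcExpr left, TcExpr right) {
        this.left = left;
        this.right = right;
    }

    public String accept(TcVisitor visitor) { return visitor.visit(this); }
}

final class TcLess implements TcExpr {
    final TcExpr left;
    final TcExpr right;

    TcLess(TcExpr left, TcExpr right) {
        this.left = left;
        this.right = right;
    }

    public String accept(TcVisitor visitor) { return visitor.visit(this); }
}

// Each visit method returns the type of the expression it visits.
final class ExpressionChecker implements TcVisitor {
    final List<String> errors = new ArrayList<>();

    public String visit(TcInt e) { return "int"; }

    public String visit(TcTrue e) { return "boolean"; }

    public String visit(TcPlus e) {
        requireInt(e.left, "+");
        requireInt(e.right, "+");
        return "int";
    }

    public String visit(TcLess e) {
        requireInt(e.left, "<");
        requireInt(e.right, "<");
        return "boolean";
    }

    // Type checks the operand and records an error if it is not an int.
    private void requireInt(TcExpr operand, String operator) {
        String actual = operand.accept(this);
        if (!actual.equals("int")) {
            errors.add("operand of " + operator
                       + " must be int, found " + actual);
        }
    }
}
\end{verbatim}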

\subsection{JVM bytecode generation}
\section{Architecture of the code}
\section{Comparison of C and Java}
\label{sec:discussion}
\appendix
\section{Feedback}
\end{document}