added draft of the report and spelling dictionary
helino committed May 13, 2011
1 parent e18d4ce commit baf2afe
Showing 2 changed files with 130 additions and 0 deletions.
14 changes: 14 additions & 0 deletions doc/.spell.utf-8.add
@@ -0,0 +1,14 @@
DD2448
Helin
lexing
lexer
JavaCC
JVM
bytecode
LALR
Naur
BNF
EBNF
struct
enum
JJTree
116 changes: 116 additions & 0 deletions doc/report.tex
@@ -0,0 +1,116 @@
\documentclass[11pt,oneside,a4paper]{article}
\usepackage{fullpage}
\usepackage{hyperref}

\begin{document}
\title{Project report in DD2448, Compiler Construction}
\author{Erik Helin, \href{mailto:ehelin@kth.se}{ehelin@kth.se}}
\date{\today}
\maketitle

\tableofcontents

\section{Introduction}
This document describes the choice of tools, the design decisions, and the
overall structure of the code for a MiniJava compiler.
\section{Implementation}
The compiler was first written in the C programming language, with the help
of the tools Flex and Bison for lexing and parsing. However, due to lack of
time, only the lexer and parser were completed.

I decided to rewrite the compiler from scratch, this time using the Java
programming language, with JavaCC for parsing and Jasmin as the assembler
for the JVM bytecode.

The following sections will discuss the different parts of the compiler.
Section \ref{sec:lexing_and_parsing} also contains a comparison of the
different tools used for implementing the parser.

For a general discussion about the two different languages used for the
implementation, see section \ref{sec:discussion}.
\subsection{Lexing and parsing}
\label{sec:lexing_and_parsing}
The lexer and parser were first implemented using the tools Flex and Bison:
Flex for the lexer and Bison for the parser. Flex creates a lexer from regular
expressions, and the tokens it produces can be used together with the parser
generated by Bison. Bison generates an LALR parser from a grammar in
Backus-Naur Form (BNF).

The biggest challenge when implementing the LALR parser was to understand
the shift/reduce warnings generated by Bison, as these require you to
understand the automaton that Bison produces.

The abstract syntax tree was represented using a struct for each kind of
node. Each node struct had an enum as its first member that represented
the type of the node. To simplify traversal of the syntax tree,
function pointers were used for callbacks. A function was then provided that
checked the type of the node and called the corresponding callback, with the
node cast to the correct type as the argument.

The second parser was implemented using JavaCC to create an LL-parser. The
lexer could also be generated with JavaCC, using regular expressions similar
to Flex. JavaCC uses Extended Backus-Naur Form (EBNF) to describe the
grammar. The main advantage of EBNF over BNF is that a lot of rules can be
simplified. For example, the following specifies zero or more variable
declarations in BNF:
\begin{verbatim}
<variable_declarations> ::= "" | <variable_declaration> <variable_declarations>
\end{verbatim}
In EBNF, this can be expressed as
\verb|variable_declarations = variable_declaration*|, which made the parser
much more succinct.
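
As an illustration, such a rule might be written in JavaCC roughly as follows
(the production names here are hypothetical, not necessarily those of the
actual grammar):
\begin{verbatim}
void VariableDeclarations() :
{}
{
  /* zero or more declarations, using the EBNF repetition operator */
  ( VariableDeclaration() )*
}
\end{verbatim}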

The main challenge when writing the LL-parser with JavaCC was left-factoring
the grammar. However, JavaCC produces a top-down parser and allows you to
pass arguments to rules. This made left-factoring the grammar a lot easier,
since it becomes possible to pass an already parsed expression as an
argument to a rule ``below'' it.
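
For example, a left-recursive rule such as array indexing can be handled by
parsing the first operand and handing it to a helper production. A minimal
sketch, with hypothetical production and class names:
\begin{verbatim}
Exp Expression() :
{ Exp e; }
{
  e = PrimaryExpression() ( e = Suffix(e) )*
  { return e; }
}

/* the already parsed left operand is passed as an argument */
Exp Suffix(Exp left) :
{ Exp index; }
{
  "[" index = Expression() "]"
  { return new ArrayLookup(left, index); }
}
\end{verbatim}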

The abstract syntax tree was represented using a class for each node.
Interfaces were used to represent a generic statement, expression or type.
JavaCC provides the JJTree tool for creating an abstract syntax tree, but this
tool proved to be too inflexible for my needs. For traversing the abstract
syntax tree, the visitor pattern was used. One problem with the visitor pattern
was implementing several kinds of visitors (that is, a visitor returning a
different type than any existing one). For this to work, a new accept method
had to be implemented in each node of the abstract syntax tree, returning the
new type.
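
A minimal sketch of the problem (the node and visitor names are hypothetical,
and \verb|Type| stands for the type interface mentioned above): each visitor
interface with a new return type forces an additional accept method on every
node class.
\begin{verbatim}
interface Type {}  // MiniJava types, represented by an interface as above

interface Visitor     { void visit(Plus node); }
interface TypeVisitor { Type visit(Plus node); }  // different return type

interface Exp {
    void accept(Visitor v);
    Type accept(TypeVisitor v);  // added when TypeVisitor was introduced
}

class Plus implements Exp {
    Exp left, right;
    // one accept method per kind of visitor
    public void accept(Visitor v)     { v.visit(this); }
    public Type accept(TypeVisitor v) { return v.visit(this); }
}
\end{verbatim}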

\subsection{Type checking}
The type checking part of the compiler was implemented in two stages. In the
first stage, the symbol table is built. In the second stage, the symbol table
is used to type check the MiniJava program. The symbol table consists of three
different kinds of tables:
\begin{description}
\item[\emph{Program table}] Relates class names to their corresponding class
tables
\item[\emph{Class table}] Relates names to fields or their corresponding method
tables
\item[\emph{Method table}] Relates names to parameters or local variables. Also
contains the return type of the method.
\end{description}
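For concreteness, the structure could be sketched as follows (the class and
field names are hypothetical, not necessarily those used in the code):
\begin{verbatim}
import java.util.HashMap;
import java.util.Map;

class ProgramTable {
    // class name -> class table
    Map<String, ClassTable> classes = new HashMap<String, ClassTable>();
}

class ClassTable {
    // field name -> type, method name -> method table
    Map<String, Type> fields = new HashMap<String, Type>();
    Map<String, MethodTable> methods = new HashMap<String, MethodTable>();
}

class MethodTable {
    Type returnType;
    Map<String, Type> parameters = new HashMap<String, Type>();
    Map<String, Type> locals = new HashMap<String, Type>();
}
\end{verbatim}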
This data structure turned out to work well, since when traversing the
abstract syntax tree, the current class and program table can be kept as
instance variables and the algorithm for looking up the type of a variable
becomes (a code sketch follows the list):
\begin{enumerate}
\item Check for the name in the current method table
\item Check for the name in the current class table
\item Check for the name in the program table
\end{enumerate}
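A sketch of this lookup, assuming the hypothetical tables above:
\begin{verbatim}
// currentMethod and currentClass are the instance variables
// mentioned above; lookupType is a hypothetical helper.
Type lookupVariable(String name) {
    Type t = currentMethod.locals.get(name);     // step 1: locals...
    if (t == null)
        t = currentMethod.parameters.get(name);  // ...and parameters
    if (t == null)
        t = currentClass.fields.get(name);       // step 2: fields
    if (t == null)
        t = programTable.lookupType(name);       // step 3: program table
    return t;                                    // null means undeclared
}
\end{verbatim}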

When the symbol table is being built, the MiniJava program is also partly type
checked. Specifically, if there already exists a definition for a class, field,
or method (including locals and parameters), the symbol table builder will
return an error. The symbol table builder is implemented with the help of a
visitor.
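
A sketch of the duplicate check in the builder visitor (the names are
hypothetical, and the error is sketched here as an exception):
\begin{verbatim}
// Hypothetical builder visitor method; TypeError is assumed to be
// a compiler-defined exception.
public void visit(ClassDeclaration c) {
    if (programTable.classes.containsKey(c.getName()))
        throw new TypeError("class " + c.getName() + " is already defined");
    programTable.classes.put(c.getName(), new ClassTable());
}
\end{verbatim}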

In the second phase, the symbol table checker is given the newly constructed
symbol table. The symbol table checker then checks the type of each
expression according to the Java specification (adjusted for MiniJava). This is
also implemented with the help of a visitor.
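
As an illustration of this phase, type checking an addition could look roughly
as follows (the node and type names are hypothetical):
\begin{verbatim}
// Hypothetical checker visitor method for an addition node.
public Type visit(Plus p) {
    Type left  = p.getLeft().accept(this);
    Type right = p.getRight().accept(this);
    if (!(left instanceof IntType) || !(right instanceof IntType))
        throw new TypeError("operands of + must have type int");
    return new IntType();
}
\end{verbatim}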

\subsection{JVM bytecode generation}
\section{Architecture of the code}
\section{Comparison of C and Java}
\label{sec:discussion}
\appendix
\section{Feedback}
\end{document}
