added draft of the report and spelling dictionary

edvbld · May 13, 2011 · baf2afe
1 parent e18d4ce
commit baf2afe
Showing 2 changed files with 130 additions and 0 deletions.
diff --git a/doc/.spell.utf-8.add b/doc/.spell.utf-8.add
@@ -0,0 +1,14 @@
+DD2448
+Helin
+lexing
+lexer
+JavaCC
+JVM
+bytecode
+LALR
+Naur
+BNF
+EBNF
+struct
+enum
+JJTree
diff --git a/doc/report.tex b/doc/report.tex
@@ -0,0 +1,116 @@
+\documentclass[11pt,oneside,a4paper]{article}
+\usepackage{fullpage}
+\usepackage{hyperref}
+
+\begin{document}
+\title{Project report in DD2448, Compiler Construction}
+\author{Erik Helin, \href{mailto:ehelin@kth.se}{ehelin@kth.se}}
+\date{\today}
+\maketitle
+
+\tableofcontents
+
+\section{Introduction}
+This document describes the choice of tools, design decisions and the overall 
+structure of the code for a MiniJava compiler.
+\section{Implementation}
+The compiler was first written in the C programming language, with the help
+of the tools Flex and Bison for lexing and parsing. However, due to lack of
+time, only the lexer and parser were completed.
+
+I decided to rewrite the compiler from scratch, this time in the Java
+programming language, using JavaCC for parsing and Jasmin as the assembler
+for the JVM bytecode.
+
+The following sections will discuss the different parts of the compiler.
+Section \ref{sec:lexing_and_parsing} also contains a comparison of the
+different tools used for implementing the parser.
+
+For a general discussion about the two languages used for the
+implementation, see Section \ref{sec:discussion}.
+\subsection{Lexing and parsing}
+\label{sec:lexing_and_parsing}
+The lexer and parser were first implemented using the tools Flex and Bison, Flex
+for the lexer and Bison for the parser. Flex generates a lexer from regular
+expressions, and the tokens it produces are consumed by the parser generated by
+Bison. Bison generates an LALR parser from a grammar in Backus-Naur Form (BNF).
+
+The biggest challenge when implementing the LALR parser was to understand
+the shift/reduce warnings generated by Bison, as these require an
+understanding of the automaton produced by Bison.
+
+The abstract syntax tree was represented by using a struct for each kind of
+node. Each node struct had an enum as its first member that identified
+the kind of node. To simplify the traversal of the syntax tree,
+function pointers were used for callbacks. A function was then provided that
+checked the kind of the node and called the corresponding callback with the node
+cast to the correct type as parameter.
+
+The second parser was implemented using JavaCC to create an LL parser. The lexer
+could also be generated with the help of JavaCC by using regular expressions
+similar to those of Flex. JavaCC uses Extended Backus-Naur Form (EBNF) to
+describe the grammar. The main advantage of using EBNF over BNF was that a lot
+of rules could be simplified. For example, the following specifies zero or more
+variable declarations in BNF:
+\begin{verbatim}
+<variable_declarations> ::= "" | <variable_declaration> <variable_declarations>
+\end{verbatim}
+In EBNF, this can be expressed as
+\verb|variable_declarations = variable_declaration*|, which made the grammar
+much more succinct.
+
+The main challenge when writing the LL parser using JavaCC was to left-factor
+the grammar. However, JavaCC produces a top-down parser and allows you to
+pass arguments to rules. This made left-factoring the grammar a lot easier,
+since it becomes possible to pass an already parsed expression as an
+argument to a rule ``below'' it.
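+
+The idea of passing an already parsed expression to a rule further down can be
+sketched with a small hand-written recursive-descent parser in Java. This is
+only an illustration under assumptions: the token list, the
+\verb|SuffixParser| class and its string-based tree are hypothetical, and the
+code is neither JavaCC output nor taken from the compiler itself.

```java
import java.util.List;

/** Hypothetical sketch: parses postfix expressions such as a[i].length or a.f(). */
final class SuffixParser {
    private final List<String> tokens;
    private int pos = 0;

    SuffixParser(List<String> tokens) {
        this.tokens = tokens;
    }

    /** expression = primary suffix(primary) */
    String expression() {
        String primary = next();  // an identifier or literal
        return suffix(primary);   // hand the parsed prefix to the rule "below"
    }

    /** suffix(left) = ( "[" expression "]" | "." "length" | "." id "(" ")" )* */
    private String suffix(String left) {
        while (pos < tokens.size()) {
            String token = tokens.get(pos);
            if (token.equals("[")) {
                pos++;
                String index = expression();
                expect("]");
                left = "index(" + left + ", " + index + ")";
            } else if (token.equals(".")) {
                pos++;
                String name = next();
                if (name.equals("length")) {
                    left = "length(" + left + ")";
                } else {
                    expect("(");
                    expect(")");
                    left = "call(" + left + ", " + name + ")";
                }
            } else {
                break;  // not a suffix: leave the token for the caller
            }
        }
        return left;
    }

    private String next() {
        if (pos >= tokens.size()) {
            throw new IllegalStateException("unexpected end of input");
        }
        return tokens.get(pos++);
    }

    private void expect(String expected) {
        String actual = next();
        if (!actual.equals(expected)) {
            throw new IllegalStateException(
                "expected " + expected + " but found " + actual);
        }
    }
}
```

+Because the common prefix is parsed once and then passed on as \verb|left|,
+the alternatives for indexing, \verb|length| and method calls no longer need
+to start with the same nonterminal.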
+
+The abstract syntax tree was represented by using a class for each kind of node.
+Interfaces were used to represent a generic statement, expression or type.
+JavaCC provides the JJTree tool for creating an abstract syntax tree, but this
+tool proved to be too inflexible for my needs. For traversing the abstract
+syntax tree, the visitor pattern was used. One problem with the visitor pattern
+was implementing several kinds of visitors (that is, a visitor returning a
+different type than any existing one). For this to work, a new accept method
+had to be implemented in each node in the abstract syntax tree, returning the
+new type.
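+
+The need for one accept method per return type can be sketched as follows.
+The node and visitor names (\verb|IntLiteral|, \verb|Plus|,
+\verb|PrintVisitor|, \verb|EvalVisitor|) are hypothetical and chosen only for
+illustration; they are not taken from the source of the compiler.

```java
/** Visitor that returns nothing, for example for printing. */
interface VoidVisitor {
    void visit(IntLiteral node);
    void visit(Plus node);
}

/** Visitor that returns an int: a new return type, so a new accept method. */
interface IntVisitor {
    int visit(IntLiteral node);
    int visit(Plus node);
}

interface Expression {
    void accept(VoidVisitor visitor);
    int accept(IntVisitor visitor);  // exists only because IntVisitor returns int
}

final class IntLiteral implements Expression {
    final int value;

    IntLiteral(int value) {
        this.value = value;
    }

    @Override public void accept(VoidVisitor visitor) { visitor.visit(this); }
    @Override public int accept(IntVisitor visitor) { return visitor.visit(this); }
}

final class Plus implements Expression {
    final Expression left;
    final Expression right;

    Plus(Expression left, Expression right) {
        this.left = left;
        this.right = right;
    }

    @Override public void accept(VoidVisitor visitor) { visitor.visit(this); }
    @Override public int accept(IntVisitor visitor) { return visitor.visit(this); }
}

/** Prints the tree in fully parenthesised form. */
final class PrintVisitor implements VoidVisitor {
    final StringBuilder out = new StringBuilder();

    @Override public void visit(IntLiteral node) {
        out.append(node.value);
    }

    @Override public void visit(Plus node) {
        out.append('(');
        node.left.accept(this);
        out.append(" + ");
        node.right.accept(this);
        out.append(')');
    }
}

/** Evaluates the tree to an int. */
final class EvalVisitor implements IntVisitor {
    @Override public int visit(IntLiteral node) {
        return node.value;
    }

    @Override public int visit(Plus node) {
        return node.left.accept(this) + node.right.accept(this);
    }
}
```

+A common alternative, not the approach described above, is a single generic
+accept method that is parameterised over the return type of the visitor.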
+
+\subsection{Type checking}
+The type checking part of the compiler was implemented in two stages. In the
+first stage, the symbol table is built. In the second stage, the symbol table
+is used to type check the MiniJava program. The symbol table consists of three
+different kinds of tables:
+\begin{description}
+\item[\emph{Program table}] Relates names to their corresponding class table.
+\item[\emph{Class table}] Relates names to fields or their corresponding method
+table.
+\item[\emph{Method table}] Relates names to parameters or local variables. Also
+contains the return type of the method.
+\end{description}
+This data structure turned out to work well: when traversing the
+abstract syntax tree, the current class and program table can be kept as
+instance variables, and the algorithm for looking up the type of a variable
+becomes:
+\begin{enumerate}
+\item Check for the name in the current method table
+\item Check for the name in the current class table
+\item Check for the name in the program table
+\end{enumerate}
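+
+The three lookup steps above can be sketched in Java as follows. All class and
+field names are hypothetical, types are represented as plain strings for
+brevity, and treating a name found in the program table as the class type
+itself is an assumption made for this sketch.

```java
import java.util.HashMap;
import java.util.Map;

/** Parameters and locals of one method, plus its return type. */
final class MethodTable {
    final String returnType;
    final Map<String, String> variables = new HashMap<>();  // name -> type

    MethodTable(String returnType) {
        this.returnType = returnType;
    }
}

/** Fields and methods of one class. */
final class ClassTable {
    final Map<String, String> fields = new HashMap<>();       // name -> type
    final Map<String, MethodTable> methods = new HashMap<>();  // name -> table
}

/** All classes of the program. */
final class ProgramTable {
    final Map<String, ClassTable> classes = new HashMap<>();  // name -> table
}

/** Holds the tables that are current while traversing the syntax tree. */
final class LookupScope {
    private final ProgramTable program;
    private final ClassTable currentClass;
    private final MethodTable currentMethod;

    LookupScope(ProgramTable program, ClassTable currentClass,
                MethodTable currentMethod) {
        this.program = program;
        this.currentClass = currentClass;
        this.currentMethod = currentMethod;
    }

    /** Returns the type of the name, or null if it is not declared anywhere. */
    String lookup(String name) {
        // 1. Parameters and local variables of the current method.
        String type = currentMethod.variables.get(name);
        if (type != null) {
            return type;
        }
        // 2. Fields of the current class.
        type = currentClass.fields.get(name);
        if (type != null) {
            return type;
        }
        // 3. Classes of the program (assumption: the name is then the type).
        return program.classes.containsKey(name) ? name : null;
    }
}
```

+With this ordering, a local variable or parameter shadows a field of the same
+name, since the method table is consulted first.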
+
+When the symbol table is being built, the MiniJava program is also partly type
+checked. Specifically, if there already exists a definition for a class, field
+or method (including locals and parameters), the symbol table builder will
+report an error. The symbol table builder is implemented with the help of a
+visitor.
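+
+The duplicate check can be sketched in isolation as below. The
+\verb|DeclarationTable| class, its method names and the error message format
+are hypothetical; the actual builder is a visitor, which this sketch leaves
+out.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical table that refuses to redefine a name and records an error. */
final class DeclarationTable {
    private final String kind;  // for example "class", "field" or "local"
    private final Map<String, String> entries = new HashMap<>();  // name -> type
    private final List<String> errors;

    DeclarationTable(String kind, List<String> errors) {
        this.kind = kind;
        this.errors = errors;
    }

    /** Adds the name; returns false and records an error if it already exists. */
    boolean declare(String name, String type) {
        if (entries.containsKey(name)) {
            errors.add("duplicate " + kind + " '" + name + "'");
            return false;
        }
        entries.put(name, type);
        return true;
    }

    /** Returns the declared type, or null if the name is unknown. */
    String typeOf(String name) {
        return entries.get(name);
    }
}
```

+The first definition is kept when a duplicate is found, so later lookups still
+see a consistent table.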
+
+In the second stage, the symbol table checker is given the newly constructed
+symbol table. The symbol table checker will then check the type of each
+expression according to the Java specification (adjusted for MiniJava). This is
+also implemented with the help of a visitor.
+
+\subsection{JVM bytecode generation}
+\section{Architecture of the code}
+\section{Comparison of C and Java}
+\label{sec:discussion}
+\appendix
+\section{Feedback}
+\end{document}