diff --git a/doc/.spell.utf-8.add b/doc/.spell.utf-8.add
new file mode 100644
index 0000000..06e6f66
--- /dev/null
+++ b/doc/.spell.utf-8.add
@@ -0,0 +1,14 @@
+DD2448
+Helin
+lexing
+lexer
+JavaCC
+JVM
+bytecode
+LALR
+Naur
+BNF
+EBNF
+struct
+enum
+JJTree
diff --git a/doc/report.tex b/doc/report.tex
new file mode 100644
index 0000000..ebb505b
--- /dev/null
+++ b/doc/report.tex
@@ -0,0 +1,116 @@
+\documentclass[11pt,oneside,a4paper]{article}
+\usepackage{fullpage}
+\usepackage{hyperref}
+
+\begin{document}
+\title{Project report in DD2448, Compiler Construction}
+\author{Erik Helin, \href{mailto:ehelin@kth.se}{ehelin@kth.se}}
+\date{\today}
+\maketitle
+
+\tableofcontents
+
+\section{Introduction}
+This document describes the choice of tools, the design decisions and the
+overall structure of the code for a MiniJava compiler.
+\section{Implementation}
+The compiler was first written in the C programming language, with the help
+of the tools Flex and Bison for lexing and parsing. However, due to lack of
+time, only the lexer and parser were completed.
+
+I decided to rewrite the compiler from scratch, this time using the Java
+programming language, with the tool JavaCC for parsing and Jasmin as the
+assembler for the JVM bytecode.
+
+The following sections discuss the different parts of the compiler.
+Section \ref{sec:lexing_and_parsing} also contains a comparison of the
+different tools used for implementing the parser.
+
+For a general discussion of the two languages used for the implementation,
+see section \ref{sec:discussion}.
+\subsection{Lexing and parsing}
+\label{sec:lexing_and_parsing}
+The lexer and parser were first implemented using the tools Flex and Bison,
+Flex for the lexer and Bison for the parser. Flex creates a lexer from
+regular expressions, and the tokens it produces can be consumed by the
+parser generated by Bison. Bison generates an LALR parser from a grammar in
+Backus-Naur Form (BNF).
+
+The biggest challenge when implementing the LALR parser was understanding
+the shift/reduce warnings generated by Bison, as these require you to
+understand the automaton produced by Bison.
+
+The abstract syntax tree was represented by a struct for each kind of node.
+Each node struct has an enum as its first member that represents the type of
+the node. To simplify the traversal of the syntax tree, function pointers
+were used as callbacks. A function was then provided that checked the type
+of the node and called the corresponding callback with the node cast to the
+correct type as its parameter.
+
+The second parser was implemented using JavaCC to create an LL parser. The
+lexer could also be generated with the help of JavaCC by using regular
+expressions, similar to Flex. JavaCC uses Extended Backus-Naur Form (EBNF)
+to describe the grammar. The main advantage of using EBNF over BNF was that
+a lot of rules could be simplified. For example, the following specifies
+zero or more variable declarations in BNF:
+\begin{verbatim}
+<variable_declarations> ::= "" | <variable_declaration> <variable_declarations>
+\end{verbatim}
+In EBNF, this can be expressed as
+\verb|variable_declarations = variable_declaration*|, which made the parser
+much more succinct.
+
+The main challenge when writing the LL parser using JavaCC was left-factoring
+the grammar. However, JavaCC produces a top-down parser and allows you to
+pass arguments to rules. This made left-factoring the grammar a lot easier,
+since it becomes possible to pass an already parsed expression as an
+argument to a rule ``below'' it.
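+
+The following is a minimal sketch of this technique; the names \verb|Exp|,
+\verb|Plus|, \verb|PrimaryExpression| and \verb|ExpressionTail| are made up
+for the example and are not taken from the actual grammar:
+\begin{verbatim}
+// Sketch only: the node classes Exp and Plus, and the production
+// PrimaryExpression(), are assumed to exist.
+Exp Expression() :
+{ Exp lhs; }
+{
+  lhs = PrimaryExpression() lhs = ExpressionTail(lhs)
+  { return lhs; }
+}
+
+// The already parsed left operand is passed down as an argument, so
+// the rule avoids left recursion but still builds a left-associative
+// tree for input such as "a + b + c".
+Exp ExpressionTail(Exp lhs) :
+{ Exp rhs; }
+{
+  [ "+" rhs = PrimaryExpression() lhs = ExpressionTail(new Plus(lhs, rhs)) ]
+  { return lhs; }
+}
+\end{verbatim}
+If no operator follows, the optional part is skipped and the expression
+parsed so far is simply returned.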
+
+The abstract syntax tree was represented by a class for each kind of node.
+Interfaces were used to represent a generic statement, expression or type.
+JavaCC provides the JJTree tool for creating an abstract syntax tree, but
+this tool proved to be too inflexible for my needs. For traversing the
+abstract syntax tree, the visitor pattern was used. One problem with the
+visitor pattern was implementing several kinds of visitors (that is, a
+visitor returning a different type than any existing one). For this to work,
+a new accept method had to be implemented in each node of the abstract
+syntax tree, returning the new type.
+
+\subsection{Type checking}
+The type checking part of the compiler was implemented in two stages. In the
+first stage, the symbol table is built. In the second stage, the symbol
+table is used to type check the MiniJava program. The symbol table consists
+of three different kinds of tables:
+\begin{description}
+\item[\emph{Program table}] Relates class names to their corresponding class
+tables
+\item[\emph{Class table}] Relates names to fields or their corresponding
+method tables
+\item[\emph{Method table}] Relates names to parameters or local variables.
+Also contains the return type of the method.
+\end{description}
+This data structure turned out to work well: when traversing the abstract
+syntax tree, the current method, class and program tables can be kept as
+instance variables, and the algorithm for looking up the type of a variable
+becomes:
+\begin{enumerate}
+\item Check for the name in the current method table
+\item Check for the name in the current class table
+\item Check for the name in the program table
+\end{enumerate}
+
+While the symbol table is being built, the MiniJava program is also partly
+type checked. Specifically, if there already exists a definition for a
+class, field or method (including locals and parameters), the symbol table
+builder reports an error. The symbol table builder is implemented with the
+help of a visitor.
+
+In the second stage, the symbol table checker is given the newly constructed
+symbol table and checks the type of each expression according to the Java
+specification (adjusted for MiniJava). This is also implemented with the
+help of a visitor.
+
+\subsection{JVM bytecode generation}
+\section{Architecture of the code}
+\section{Comparison of C and Java}
+\label{sec:discussion}
+\appendix
+\section{Feedback}
+\end{document}