A simple(ish) lexer, parser, and interpreter for Scheme style prefix notation maths expressions.
(e.g. (+ 1 2 (- 3 4) )).
The parser is written using a simple top-down recursive descent algorithm with backtracking, based on a context-free grammar described below. I have also implemented a visualisation tool to show the step by step process of building the parse tree as a gif (including backtracking).
This was written mainly to help understand the parsing and grammar sections of CS2001 - hence, I have not implemented optimisations like look ahead, as these are not covered in this module.
- Add negative number and integer division support.
- Implement more complex grammars.
- Write a bottom-up table based parser (once it is covered more in lectures).
An example visualisation of the parsing of tokens into an AST:
I defined the grammar as follows:
(Capitalised names are terminal tokens, generated by my lexer).
expression -> OPEN_BRACKET function args CLOSED_BRACKET
function -> ADD | SUBTRACT | MULTIPLY
args -> expression args' | INTEGER args'
args' -> INTEGER args' | expression args' | NIL
- For running the interpreter, no dependencies are needed.
- For generating visualisations, imagemagick and graphviz are required (available on all good package managers).
To generate a visualization (on Mac and Linux), the run.sh script can be used.
This takes in an expression to visualise as the first argument.
For example:
./run.sh '(+ 1 2 3 (* 4 5))'This will generate graph.gif in the current directory.
The interpreter can be built and ran using Maven.
Note that this project uses Java preview features - the --enable-preview flag is needed when running any code from this project.
./mvnw clean package
java --enable-preview -cp target/parser-demo-1.0-SNAPSHOT.jar com.dewally.niklas.parserdemo.prefixCalc.InterpreterSome books I found useful when trying to understand this:
-
Johnsonbaugh, Richard - Discrete Mathematics
-
Aho, Kam, Sethi, Ullman - Compilers: Principles, Techniques, and Tools. (chapters 2 and 4)
