Backtracking parse engine

sanity edited this page Mar 25, 2013 · 2 revisions

This page assumes you've already read and digested the Parsing overview. represents the beating heart of LastCalc.

It is responsible for finding a way to take a list of tokens generated by the Tokenizer, and converting it into an "answer" (normally a single token), by applying various Parsers to it.

The good news is that it's actually surprisingly simple, at the time of writing the class is less than 100 lines of not very dense code.

The action happens in the parse() method. It first defines an ordered set (TreeSet) of ParseSteps, each of which represents the application of a particular parser to a particular token list.

The ordering is determined by ParseStep's getScore() method where a lower score is better. Loosely speaking it goes through the list of tokens that is the result of the parse step, adding 1 to the score if the token is a string, 0.8 if it is a Number, and 0.5 if it is some other type of class.

The rationale is that shorter lists of tokens are better, Strings are bad because they suggest that something hasn't been parsed, numbers are better, but other Java classes are the best since these are often the result of successful parse steps (eg. an Amount object). also has an isMinimal method which indicates that no further parsing is possible.

BacktrackingParseEngine's algorithm is simple but powerful, basically repeatedly attempt to apply parsers to the list of candidate parsesteps, starting with whichever has the lowest (best) score, until either it runs out of time, or a parsestep is determined to be minimal (through it's isMinimal() method).

This effectively implements what is known in AI-circles as "best-first search with backtracking".