Parsing overview

Ian Clarke edited this page Jan 11, 2016 · 10 revisions

The goal of LastCalc's parsing system is to transform a list of tokens into a more desirable list, where more desirable typically means fewer tokens, ideally a single token that isn't a string. That single token is the "answer". A token can be any Java object, such as a String, a JScience Rational, a Java List, or even a JSoup Document. The beauty of this is that LastCalc can easily build on the rich ecosystem of Java libraries.

These lists of tokens are stored in a TokenList (see TokenList.java). A TokenList is similar to a Java List, except that it cannot be modified (i.e. it is immutable), and it has various useful methods to create new TokenLists based on existing ones (e.g. by appending or replacing tokens).
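To make the idea of an immutable token list concrete, here is a minimal sketch. It is not the real TokenList.java: the class name ImmutableTokens and the methods append and replace are assumptions made up for illustration, and the real class may expose a different API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for LastCalc's TokenList; names and methods are assumptions.
final class ImmutableTokens {
    private final List<Object> tokens;

    ImmutableTokens(List<Object> tokens) {
        // Defensive copy, so later changes to the caller's list cannot leak in.
        this.tokens = Collections.unmodifiableList(new ArrayList<>(tokens));
    }

    int size() { return tokens.size(); }

    Object get(int index) { return tokens.get(index); }

    // Returns a new list with the extra token at the end; this list is unchanged.
    ImmutableTokens append(Object token) {
        List<Object> copy = new ArrayList<>(tokens);
        copy.add(token);
        return new ImmutableTokens(copy);
    }

    // Returns a new list in which tokens [from, to) are replaced by one token.
    ImmutableTokens replace(int from, int to, Object replacement) {
        List<Object> copy = new ArrayList<>(tokens.subList(0, from));
        copy.add(replacement);
        copy.addAll(tokens.subList(to, tokens.size()));
        return new ImmutableTokens(copy);
    }

    @Override
    public String toString() { return tokens.toString(); }
}
```

Because every operation returns a new object, a parser can hand back a rewritten list while the caller still holds the original, which is convenient if the rewrite later has to be abandoned.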

For example, we might want to transform "3+5", which is three tokens (the number 3, the string "+", and the number 5), into a single token: the number 8.

We do this by trying to apply "parsers" to the list of tokens. For example, there is a class called MathBiOp.java that is responsible for applying simple mathematical operators like '+'. When you call its parse() method, it will try to find something like "3+5" and will replace it with "8", resulting in a shorter token list. If it can't find anything it can parse, it will fail. See Parsers for more information on how you can create your own parser.
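As a rough illustration of what such a parser does, the sketch below finds the first number, "+", number pattern and replaces it with the sum. It is not the real MathBiOp.java: the class name AddParserSketch, the static parse signature, the use of Long for numbers, and Optional.empty() as the failure signal are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical parser for "+"; not the real MathBiOp.java.
final class AddParserSketch {
    // Looks for the pattern [Long, "+", Long] and replaces the first match
    // with the sum. Returns Optional.empty() to signal that parsing failed.
    static Optional<List<Object>> parse(List<Object> tokens) {
        for (int i = 0; i + 2 < tokens.size(); i++) {
            Object left = tokens.get(i);
            Object op = tokens.get(i + 1);
            Object right = tokens.get(i + 2);
            if (left instanceof Long && "+".equals(op) && right instanceof Long) {
                long sum = (Long) left + (Long) right;
                // Build a new, shorter list rather than modifying the input.
                List<Object> result = new ArrayList<>(tokens.subList(0, i));
                result.add(sum);
                result.addAll(tokens.subList(i + 3, tokens.size()));
                return Optional.of(result);
            }
        }
        return Optional.empty();
    }
}
```

Given the three tokens 3, "+", 5 this returns a list holding one token, the number 8. Given a list with no such pattern, it returns an empty Optional and leaves the input untouched.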

The trick is that we need to be smart about how we find a parser that we can successfully use to parse the tokens in any given list, because trying all of them would be very inefficient. See Parser Pickers for more information on how we do this.

The other thing to consider is that sometimes we can successfully apply a parser, but later it turns out that it was a mistake. Consider this:

15 pounds in euros

Initially UnitParser.java, which is responsible for identifying "units" (like "kilograms" or "miles"), might transform "pounds" into a Unit object that represents the weight measurement of "pounds".

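But then when AmountConverterParser.java tries to convert this weight to a currency, it fails, because a weight cannot be converted to a currency. The correct reading of "pounds" here was the currency, not the weight.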

We solve this through a process known as backtracking. See Backtracking Parse Engine for a detailed explanation.
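The "15 pounds in euros" case can be sketched as a small depth-first search. This is only a toy model of backtracking, not LastCalc's actual engine: the Unit enum, the Conversion class, the Rule interface, and the UNITS and CONVERT rules are all hypothetical names invented here, and the real UnitParser.java and AmountConverterParser.java work differently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

final class BacktrackSketch {
    // Toy units; the real Unit objects in LastCalc are richer than this.
    enum Unit {
        POUND_WEIGHT(false), POUND_STERLING(true), EURO(true);
        final boolean currency;
        Unit(boolean currency) { this.currency = currency; }
    }

    // Toy result token: a currency conversion that has been recognised.
    static final class Conversion {
        final long amount; final Unit from; final Unit to;
        Conversion(long amount, Unit from, Unit to) {
            this.amount = amount; this.from = from; this.to = to;
        }
    }

    // A rule returns every rewrite it could make; an empty list means failure.
    interface Rule { List<List<Object>> apply(List<Object> tokens); }

    static List<Object> withReplaced(List<Object> tokens, int from, int to, Object value) {
        List<Object> copy = new ArrayList<>(tokens.subList(0, from));
        copy.add(value);
        copy.addAll(tokens.subList(to, tokens.size()));
        return copy;
    }

    // Turns the first unit word into a Unit, offering both readings of "pounds".
    static final Rule UNITS = tokens -> {
        List<List<Object>> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            if ("pounds".equals(tokens.get(i))) {
                out.add(withReplaced(tokens, i, i + 1, Unit.POUND_WEIGHT));
                out.add(withReplaced(tokens, i, i + 1, Unit.POUND_STERLING));
                return out;
            }
            if ("euros".equals(tokens.get(i))) {
                out.add(withReplaced(tokens, i, i + 1, Unit.EURO));
                return out;
            }
        }
        return out;
    };

    // Matches [Long, Unit, "in", Unit], but only when both units are currencies.
    static final Rule CONVERT = tokens -> {
        if (tokens.size() == 4 && tokens.get(0) instanceof Long
                && tokens.get(1) instanceof Unit && "in".equals(tokens.get(2))
                && tokens.get(3) instanceof Unit) {
            Unit from = (Unit) tokens.get(1);
            Unit to = (Unit) tokens.get(3);
            if (from.currency && to.currency) {
                Object result = new Conversion((Long) tokens.get(0), from, to);
                return Collections.singletonList(Collections.singletonList(result));
            }
        }
        return Collections.emptyList();
    };

    // An answer is a single token that is not a String.
    static boolean isAnswer(List<Object> tokens) {
        return tokens.size() == 1 && !(tokens.get(0) instanceof String);
    }

    // Depth-first search: try each rewrite, and abandon it if it leads nowhere.
    static Optional<List<Object>> solve(List<Object> tokens, List<Rule> rules) {
        if (isAnswer(tokens)) return Optional.of(tokens);
        for (Rule rule : rules) {
            for (List<Object> next : rule.apply(tokens)) {
                Optional<List<Object>> result = solve(next, rules);
                if (result.isPresent()) return result;
                // Dead end: fall through and try the next alternative.
            }
        }
        return Optional.empty();
    }
}
```

Calling solve with the tokens 15, "pounds", "in", "euros" first tries the weight reading, finds that CONVERT rejects it, backs up, and then succeeds with the currency reading. Immutable token lists make this cheap, since abandoning a branch just means dropping the rewritten list.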

Future

Currently, the parsing mechanism also handles actual math, so "1+1" eventually parses to "2".

A better approach would be for "1+1" to parse to Java objects that represent the expression "1+1", and for the addition to be executed only afterwards, as a separate step.

One reason this would be better is that it would then be easier to handle equations (an expression, an equality/inequality, and another expression), which would bring LastCalc closer to the functionality of something like Wolfram Alpha.
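One way this separation of parsing from evaluation might look is sketched below. None of these types exist in LastCalc; Expr, Num, and Add are hypothetical names, and long arithmetic stands in for whatever number type would really be used.

```java
// Hypothetical expression tree; none of these types exist in LastCalc.
final class ExprSketch {
    // An expression that can be inspected first and evaluated later.
    interface Expr {
        long evaluate();
    }

    // A literal number.
    static final class Num implements Expr {
        final long value;
        Num(long value) { this.value = value; }
        @Override public long evaluate() { return value; }
        @Override public String toString() { return Long.toString(value); }
    }

    // The sum of two sub-expressions; nothing is computed until evaluate().
    static final class Add implements Expr {
        final Expr left;
        final Expr right;
        Add(Expr left, Expr right) { this.left = left; this.right = right; }
        @Override public long evaluate() { return left.evaluate() + right.evaluate(); }
        @Override public String toString() { return "(" + left + " + " + right + ")"; }
    }
}
```

Here "1+1" would parse to new ExprSketch.Add(new ExprSketch.Num(1), new ExprSketch.Num(1)), which still prints as (1 + 1) and only becomes 2 when evaluate() is called. An equation could then be modelled as two such expressions joined by a relation, without forcing either side to be evaluated first.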
