Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Infinite recursion when parsing expression grammar #23

Closed
flewellyn opened this issue Dec 17, 2017 · 5 comments
Closed

Infinite recursion when parsing expression grammar #23

flewellyn opened this issue Dec 17, 2017 · 5 comments

Comments

@flewellyn
Copy link

I am trying to write a parser program that will parse a very minimal expression language; for now, all I want are variable names, string or numeric values, basic comparators (=, !=, <, >, <=, >=), boolean operators (AND, OR, and NOT), symbols for true, false, and null, and parentheses. I want to break these expressions up into parts that I can then assign to a syntax tree.

I've been able to make a parser which can do the simple comparison case, of "variable comparator value" form: x = 1, y = 2, etc. I can make it strip off the parens, though I don't yet know how to implement precedence. And I do have basic handling of an isolated boolean operator: I can parse a single AND expression, or OR expression, of the form "foo = 1 AND bar != 2" or the like.

But when I try to use the generalized form, the system overflows the maximum recursion stack. I'm not sure what I'm doing wrong here?

I've attached my code file. I cribbed some of it from the JSON example, such as the lexeme trick for dealing with whitespace.

parsertest.py.txt

@bugaevc
Copy link
Member

bugaevc commented Dec 17, 2017

You're getting infinite recursion because you've said that (modulo compare and optional lparen) an expression starts with an expression. Of course, that causes parsy to recurse.

There are ways to write code like this differently so that it's not recursive, check out simple_eval in the examples folder.

We're also thinking (see #5) about enhancing parsy to better handle cases like this. In a future version, it should be a) very easy to make code like yours work (automagic recursion elimination) and b) even easier to achieve what you're trying to achieve in more comprehensive way and less lines of code (chaining combinators).

@spookylukey
Copy link
Member

If you are trying to google for other solutions, this is a well-known problem with this type of parsing approach, called left recursive grammar.

@flewellyn
Copy link
Author

I see. So, would I be better off waiting on the left-recursion elimination, trying to rewrite to eliminate it, or go with an LR solution?

I'm having a bit of trouble following the parsing logic in simple_eval.py. I see that the expr top-level parser is the additive parser, which first tries to yield a multiplicative, and then multiplicative tries to yield a simple, which in turn tries to yield an expr or a number. This seems oddly tangled to me; where does the parsing "bottom out", at the number case?

And then both additive and multiplicative contain infinite loops which then break out when they can find no more operations. I'm guessing doing it this way is to handle operator precedence, which makes sense. But the tangled interdependence is hard to follow sans comments.

My apologies, this isn't a general help forum. But thank you for offering advice.

@bugaevc
Copy link
Member

bugaevc commented Dec 18, 2017

I see. So, would I be better off waiting on the left-recursion elimination, trying to rewrite to eliminate it, or go with an LR solution?

Try to eliminate it, it's not at all that hard.

I'm having a bit of trouble following the parsing logic in simple_eval.py. I see that the expr top-level parser is the additive parser, which first tries to yield a multiplicative, and then multiplicative tries to yield a simple, which in turn tries to yield an expr or a number. This seems oddly tangled to me; where does the parsing "bottom out", at the number case?

It's very important that simple starts with either number (which ends the recursion) or a non-optional lparen (and then a recursive expr again). But in any case, we parse at least one more input character.

I'm guessing doing it this way is to handle operator precedence, which makes sense.

Yes, this is to handle operator precedence and associativity (1 - 2 - 3 ≠ 1 - (2 - 3)). If you don't care about those, it gets much simpler, ex. your boolean expressions could be parsed like this:

op = simple '&&' expr | simple '||' expr | '!' expr
simple = 'true' | 'false' | '(' op ')'
expr = simple | op

(This is not parsy code, but should hopefully make sense). Again, this demonstrates the same idea: no left recursion, delegate to simple which always parses at least one character from the input.

But the tangled interdependence is hard to follow sans comments. My apologies, this isn't a general help forum. But thank you for offering advice.

You see, there's a lot of theory behind this (called the formal language theory) — it describes regular expressions, context-free grammars, LL and LR parsers, and so on. That way of manual left recursion elimination and that way to parse expressions are very standard and familiar for people who know the theoretical side of things. simple_eval example is not meant to be an introduction into formal language theory for those who don't, it's more of an example how to use parsy specifically, i.e. how those theoretical concepts (if you know them) map to parsy's operators and combinators.

If you're interested in writing complex parsers or even compilers, I recommend you to read up on these things, e.g. this book will teach you all of that and more.

@flewellyn
Copy link
Author

It's been ages since I read the Dragon Book. I might need a refresher.

Thanks for your help.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants