LARL parser issue with multiple rules that produce nothing #250

petee-d · 2018-10-15T17:56:29Z

I believe the following example should parse:

from lark import Lark
p = Lark("""
    a: b c d
    b: "B"
    c: | "C"
    d: | "D"
""", start='a', parser='lalr')
print(p.parse('B'))

I would expect it to produce Tree(a, [Tree(b, [Token(B, 'B')]), Tree(c, []), Tree(d, [])]), but instead it fails with

UnexpectedToken: Unexpected token Token(B, 'B') at line 1, column 1.
Expected: C, D

I think the generated parse table is missing reduce b for $END in state 6 - it does so for lookahead C or D, but not for $END. Curiously, it works OK if instead of epsilon producing rules for c and d I use a: b c? d?, but this is much less convenient for my usage as I will have to inspect what types are the optional arguments to the transformer method a to find out what they mean instead of just having them in the correspondingly named argument.

The text was updated successfully, but these errors were encountered:

petee-d · 2018-10-15T17:58:12Z

By the way, the following works, so I suspect it is an issue only with $END.

from lark import Lark
p = Lark("""
    a: b c d e
    !b: "B"
    !c: | "C"
    !d: | "D"
    !e: "E" 
""", start='a', parser='lalr')
print(p.parse('BE'))

petee-d · 2018-10-15T18:11:45Z

Also, I think I see a small bug in https://github.com/lark-parser/lark/blob/master/lark/parsers/lalr_parser.py#L49 - this will raise UnexpectedError with the last encountered token even if get_action was called from https://github.com/lark-parser/lark/blob/master/lark/parsers/lalr_parser.py#L83 and thus the unexpected lookahead was actually the end of input. It should report something like UnexpectedEndOfInput (or UnexpectedToken with some fake end of input token to avoid breaking existing error handling code), not UnexpectedToken with the innocent last token. You can see an example of such misleading error message in my initial bug report.

erezsh · 2018-10-15T18:53:29Z

Wow, you found a deep hidden bug in my implementation! Well done!

It was an off-by-one error, which is reportedly one of the hardest things in programming :)

I pushed a fix to master, check to make sure it solves your issue.

petee-d · 2018-10-15T19:38:09Z

Wow, @erezsh, such time of response is simply amazing! Doing a fix of such a deeply hidden bug from issue to master within an hour is quite something. :) I'm very happy to say my real grammar where I discovered this (100+ lines) now works perfectly. I really love the project, the best parsing library I have seen so far, and I've previously been switching them like socks in my project.

I've heard there are exactly 2 hard problems in computer science: cache invalidation, naming things, and off-by-one errors.

BTW, would you like me to create another issue for the error reporting problem I mentioned in my last comment? It might take some creativity to fix it in a backwards compatible way, probably not something to bundle with another fix.

petee-d · 2018-10-15T19:49:28Z

Oh, actually the $END error reporting problem is already fixed in master. Here I thought one hour was fast, but this problem was fixed even before I reported it. :D

erezsh · 2018-10-15T19:53:33Z

Thanks petee, that's really nice to hear :)
I hope you keep using Lark and help me make it even better.

Luckily cache invalidation isn't a major concern in Python, so I only have 1 hard problems left.

would you like me to create another issue for the error reporting problem

I believe it's already fixed in master, but not released to pip yet. If you find that it still persists in the latest master, then please do open another issue.

erezsh · 2018-10-15T19:54:12Z

Ha, and you noticed it before I managed to correct you. Soon we'll start travelling backwards in time ;)

petee-d · 2018-10-15T19:57:56Z

I would love to help. I'm quite fond of parsing ever since school, I was actually determined to try to debug the problem and make a PR tomorrow... but you beat me to it. :) If I notice any other problem or have a suggestion for improvement, I'll make sure to do the PR before creating an issue. ;)

erezsh · 2018-10-15T19:59:39Z

lol that's probably wise. If you feel like taking a crack at some of the open issues that would be great, although admittedly most of them are boring usability issues, rather than cool algorithmic bugs. (Except for the ones about Earley, but they are already on the way to be fixed)

erezsh added a commit that referenced this issue Oct 15, 2018

Fixed a deep bug in grammar analysis involving empty rules (Issue #250)

6c8ba76

erezsh closed this as completed Oct 15, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

LARL parser issue with multiple rules that produce nothing #250

LARL parser issue with multiple rules that produce nothing #250

petee-d commented Oct 15, 2018

petee-d commented Oct 15, 2018 •

edited

petee-d commented Oct 15, 2018

erezsh commented Oct 15, 2018

petee-d commented Oct 15, 2018

petee-d commented Oct 15, 2018

erezsh commented Oct 15, 2018

erezsh commented Oct 15, 2018

petee-d commented Oct 15, 2018

erezsh commented Oct 15, 2018

LARL parser issue with multiple rules that produce nothing #250

LARL parser issue with multiple rules that produce nothing #250

Comments

petee-d commented Oct 15, 2018

petee-d commented Oct 15, 2018 • edited

petee-d commented Oct 15, 2018

erezsh commented Oct 15, 2018

petee-d commented Oct 15, 2018

petee-d commented Oct 15, 2018

erezsh commented Oct 15, 2018

erezsh commented Oct 15, 2018

petee-d commented Oct 15, 2018

erezsh commented Oct 15, 2018

petee-d commented Oct 15, 2018 •

edited