Grammars with EPSILON issues #6

gitmh · 2016-04-30T22:20:48Z

I just found an example where parsing with the new epsilon version breaks:

S -> T
T -> a T E | z
E -> EPSILON

the word "aaaaz" cannot be parsed successfully but if I use only "T -> a T | z" it works fine. This was the smallest grammar I could think of showing the problem.
Rules with single epsilon work great otherwise like this on with input "a" is fine:
S -> a A
A -> EPSILON

also switching "T" and "a" works with "zaaaa":
S -> T
T -> T a E | z
E -> EPSILON

The nonterminal E is at the same place with the same rule but this time it works. Here is another broken one which might be the same problem:

N0 -> N1 b | b
N1 -> b X | a X | a N1 X
X -> b X | EPSILON

the word "aaab" does not work but should. I know the grammars do not look useful at all :-)

I tried to figure out why this happens but I wasn't able to find it.

lagodiuk · 2016-05-15T14:14:17Z

@gitmh thank you for your observation!
I am extremely sorry for the delayed response (it is just because I was a bit overloaded during the recent weeks).

Also, thanks a lot for the minimalistic examples of grammar, which were rather helpful for the troubleshooting.

The bug was inside method "completer": in my previous implementation, I have forgotten to write the return statement, with the information about whether parsing chart was changed or not (here is the corresponding line of code: https://github.com/lagodiuk/earley-parser-js/blob/master/earley-oop.js#L267), and in such case, by default JavaScript interpreter considers that the return value is "false".

Here is a link to the commit with fix: fb31a9e (in this case I am confident about the positive effect of the change, so I have committed the fix directly to the master branch).

Below presented the links to the demonstrations of parsing the grammars, which you have mentioned:

S -> T
T -> a T E | z
E -> _EPSILON_

https://jsfiddle.net/2mb3w9c1/36/

S -> N0
N0 -> N1 b | b
N1 -> b X | a X | a N1 X
X -> b X | _EPSILON_

https://jsfiddle.net/2mb3w9c1/37/

Also, inside my fix I have adapted a bit the logic of internal method "combinations", which is helpful for the proper displaying of the parsing trees, which contains epsilon transitions among non-epsilon symbols in the right-hand-side (however in this case, epsilon nodes are not displayed in the tree).
For example, the following grammar:

S -> T
T -> a T _EPSILON_ _EPSILON_ | z _EPSILON_

https://jsfiddle.net/2mb3w9c1/38/

lagodiuk · 2016-05-15T14:14:54Z

Fixed.
Commit: fb31a9e

lagodiuk · 2016-05-15T14:48:38Z

Also, I understand that somewhen it would be really useful to have some integration tests for the implementation of the parser.

But, as far as parser could handle potentially all possible context free grammars (CFG), and can parse all possible valid statements of the corresponding CFG, it was a bit unclear how to find the proper and exhaustive approach for testing the implementation of the parser.

And recently I've came up with an idea regarding the exhaustive testing of the parser: #8

lagodiuk added a commit that referenced this issue May 15, 2016

Fix of the issue: #6

fb31a9e

lagodiuk closed this as completed May 15, 2016

lagodiuk added the bug label May 27, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Grammars with EPSILON issues #6

Grammars with EPSILON issues #6

gitmh commented Apr 30, 2016

lagodiuk commented May 15, 2016

lagodiuk commented May 15, 2016

lagodiuk commented May 15, 2016 •

edited

Grammars with EPSILON issues #6

Grammars with EPSILON issues #6

Comments

gitmh commented Apr 30, 2016

lagodiuk commented May 15, 2016

lagodiuk commented May 15, 2016

lagodiuk commented May 15, 2016 • edited

lagodiuk commented May 15, 2016 •

edited