Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Grammars with EPSILON issues #6

Closed
gitmh opened this issue Apr 30, 2016 · 3 comments
Closed

Grammars with EPSILON issues #6

gitmh opened this issue Apr 30, 2016 · 3 comments
Labels

Comments

@gitmh
Copy link

gitmh commented Apr 30, 2016

I just found an example where parsing with the new epsilon version breaks:

S -> T
T -> a T E | z
E -> EPSILON

the word "aaaaz" cannot be parsed successfully but if I use only "T -> a T | z" it works fine. This was the smallest grammar I could think of showing the problem.
Rules with single epsilon work great otherwise like this on with input "a" is fine:
S -> a A
A -> EPSILON

also switching "T" and "a" works with "zaaaa":
S -> T
T -> T a E | z
E -> EPSILON

The nonterminal E is at the same place with the same rule but this time it works. Here is another broken one which might be the same problem:

N0 -> N1 b | b
N1 -> b X | a X | a N1 X
X -> b X | EPSILON

the word "aaab" does not work but should. I know the grammars do not look useful at all :-)

I tried to figure out why this happens but I wasn't able to find it.

lagodiuk added a commit that referenced this issue May 15, 2016
@lagodiuk
Copy link
Owner

@gitmh thank you for your observation!
I am extremely sorry for the delayed response (it is just because I was a bit overloaded during the recent weeks).

Also, thanks a lot for the minimalistic examples of grammar, which were rather helpful for the troubleshooting.

The bug was inside method "completer": in my previous implementation, I have forgotten to write the return statement, with the information about whether parsing chart was changed or not (here is the corresponding line of code: https://github.com/lagodiuk/earley-parser-js/blob/master/earley-oop.js#L267), and in such case, by default JavaScript interpreter considers that the return value is "false".

Here is a link to the commit with fix: fb31a9e (in this case I am confident about the positive effect of the change, so I have committed the fix directly to the master branch).

Below presented the links to the demonstrations of parsing the grammars, which you have mentioned:

S -> T
T -> a T E | z
E -> _EPSILON_

https://jsfiddle.net/2mb3w9c1/36/

S -> N0
N0 -> N1 b | b
N1 -> b X | a X | a N1 X
X -> b X | _EPSILON_

https://jsfiddle.net/2mb3w9c1/37/

Also, inside my fix I have adapted a bit the logic of internal method "combinations", which is helpful for the proper displaying of the parsing trees, which contains epsilon transitions among non-epsilon symbols in the right-hand-side (however in this case, epsilon nodes are not displayed in the tree).
For example, the following grammar:

S -> T
T -> a T _EPSILON_ _EPSILON_ | z _EPSILON_

https://jsfiddle.net/2mb3w9c1/38/

@lagodiuk
Copy link
Owner

Fixed.
Commit: fb31a9e

@lagodiuk
Copy link
Owner

lagodiuk commented May 15, 2016

Also, I understand that somewhen it would be really useful to have some integration tests for the implementation of the parser.

But, as far as parser could handle potentially all possible context free grammars (CFG), and can parse all possible valid statements of the corresponding CFG, it was a bit unclear how to find the proper and exhaustive approach for testing the implementation of the parser.

And recently I've came up with an idea regarding the exhaustive testing of the parser: #8

@lagodiuk lagodiuk added the bug label May 27, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants