Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strange error in EBNF grammar #70

Closed
ArsenShnurkov opened this issue Mar 15, 2017 · 13 comments
Closed

Strange error in EBNF grammar #70

ArsenShnurkov opened this issue Mar 15, 2017 · 13 comments
Labels

Comments

@ArsenShnurkov
Copy link
Contributor

i am trying to create a test grammar for pliant with EBNF syntax.
Here is my grammar:
https://github.com/ArsenShnurkov/Pliant-test-for-include/blob/43923b148b0dfccc3490d2e8e03b03dc65bd0884/main/Grammar/syntax2.ebnf

file = ws directives ws ;
ws = [ ows ] ; /* white space */
ows = "_" ; /* obligatory white space */
directives = directive { ows directive } ;
directive = "0" | "1" ;

Here is text to parse:
https://github.com/ArsenShnurkov/Pliant-test-for-include/blob/43923b148b0dfccc3490d2e8e03b03dc65bd0884/main/Program.cs#L16

string input = "_0_1_0_0_1_1_";

Why it gives me

Recognized: False, Accepted: False
Error at position 4
(file, 0, 3)
@patrickhuber
Copy link
Owner

I tried to run your grammar definition through the unit test class I have for the EbnfParser class and it is trying to take ows = "_"; as a qualified identifier and it is failing. Trying to dig in to see why.

@ArsenShnurkov
Copy link
Contributor Author

Please try this first:
Grammar:
file = "1" { "2" } "1" ;
File:
string input = "12221";

@patrickhuber
Copy link
Owner

Here is the parse forest for:

file = "1" { "2" } "1";
(Ebnf.Definition, 0, 8) -> (Ebnf.Block, 0, 8)
(Ebnf.Block, 0, 8) -> (Ebnf.Rule, 0, 8)
(Ebnf.Rule, 0, 8) ->  (Ebnf.QualifiedIdentifier, 0, 1)  ('=', 6, 1)  (Ebnf.Expression, 2, 7) (';', 23, 1)
(Ebnf.QualifiedIdentifier, 0, 1) -> ('file', 1, 4)
(Ebnf.Expression, 2, 7) -> (Ebnf.Term, 2, 7)
(Ebnf.Term, 2, 7) -> (Ebnf.Factor, 2, 3) (Ebnf.Term, 3, 7)
(Ebnf.Factor, 2, 3) -> (Ebnf.Literal, 2, 3)
(Ebnf.Literal, 2, 3) -> ('"1"', 5, 3)
(Ebnf.Term, 3, 7) -> (Ebnf.Factor, 3, 6) (Ebnf.Term, 6, 7)
(Ebnf.Factor, 3, 6) -> (Ebnf.Repetition, 3, 6)
(Ebnf.Repetition, 3, 6) ->  ('{', 12, 1)  (Ebnf.Expression, 4, 5) ('}', 18, 1)
(Ebnf.Expression, 4, 5) -> (Ebnf.Term, 4, 5)
(Ebnf.Term, 4, 5) -> (Ebnf.Factor, 4, 5)
(Ebnf.Factor, 4, 5) -> (Ebnf.Literal, 4, 5)
(Ebnf.Literal, 4, 5) -> ('"2"', 11, 3)
(Ebnf.Term, 6, 7) -> (Ebnf.Factor, 6, 7)
(Ebnf.Factor, 6, 7) -> (Ebnf.Literal, 6, 7)
(Ebnf.Literal, 6, 7) -> ('"1"', 17, 3)

Running that through the EBNF Parser produced an EBNF object structure without error.

I reduced your original grammar to the smallest subset that I could that still did not parse.

Here is the grammar:

ws = [ ows ] ; /* white space */
ows = "_" ; /* obligatory white space */

Here is the parse forest

 (Ebnf.Definition, 0, 10) -> (Ebnf.Block, 0, 6) (Ebnf.Definition, 6, 10)
 (Ebnf.Block, 0, 6) -> (Ebnf.Rule, 0, 6)
 (Ebnf.Rule, 0, 6) ->  (Ebnf.QualifiedIdentifier, 0, 1)  ('=', 4, 1)  (Ebnf.Expression, 2, 5) (';', 14, 1)
 (Ebnf.QualifiedIdentifier, 0, 1) -> ('ws', 1, 2)
 (Ebnf.Expression, 2, 5) -> (Ebnf.Term, 2, 5)
 (Ebnf.Term, 2, 5) -> (Ebnf.Factor, 2, 5)
 (Ebnf.Factor, 2, 5) -> (Ebnf.Optional, 2, 5)
 (Ebnf.Optional, 2, 5) ->  ('[', 6, 1)  (Ebnf.Expression, 3, 4) (']', 12, 1)
 (Ebnf.Expression, 3, 4) -> (Ebnf.Term, 3, 4)
 (Ebnf.Term, 3, 4) -> (Ebnf.Factor, 3, 4)
 (Ebnf.Factor, 3, 4) -> (Ebnf.QualifiedIdentifier, 3, 4)
 (Ebnf.QualifiedIdentifier, 3, 4) -> ('ows', 3, 3)
 (Ebnf.Definition, 6, 10) -> (Ebnf.Block, 6, 10)
 (Ebnf.Block, 6, 10) -> (Ebnf.Rule, 6, 10)
 (Ebnf.Rule, 6, 10) ->  (Ebnf.QualifiedIdentifier, 6, 7)  ('=', 39, 1)  (Ebnf.Expression, 8, 9) (';', 45, 1)
 (Ebnf.QualifiedIdentifier, 6, 7) -> ('"_"', 3, 3)
 (Ebnf.Expression, 8, 9) -> (Ebnf.Term, 8, 9)
 (Ebnf.Term, 8, 9) -> (Ebnf.Factor, 8, 9)
 (Ebnf.Factor, 8, 9) -> (Ebnf.Literal, 8, 9)
 (Ebnf.Literal, 8, 9) -> ('"_"', 3, 3)

So the following line looks incorrect:

 (Ebnf.QualifiedIdentifier, 6, 7) -> ('"_"', 3, 3)

Two things strike me as odd,

  1. the "_" is being picked up as a qualified identifier.
  2. the token for that identifier has a 3,3 for the origin and location. This usually means it is of zero length, and unless there is a nullable rule in the grammar should not happen.

I need to investigate more, but marking this as a bug.

@ArsenShnurkov
Copy link
Contributor Author

ArsenShnurkov commented Mar 23, 2017

Running that through the EBNF Parser produced an EBNF object structure without error.

ok. Why whis EBNF object doesn't match test input when converted into grammar and passed into parser?

@patrickhuber
Copy link
Owner

There may be a secondary bug in the EbnfGrammarGenerator, I'm only looking at the parsing of the EBNF grammars you supplied. Once I find out what is going on I'll move on to debugging the generated grammars.

@ArsenShnurkov
Copy link
Contributor Author

ArsenShnurkov commented Mar 29, 2017

(Ebnf.QualifiedIdentifier, 6, 7) -> ('"_"', 3, 3)

  1. What numbers in brackets mean?
  2. Is it possible to add pairs <Line,Position> into dump ?
  3. what is the plan of fixing it?

"build the parse forest as you go, augmenting each Earley item with a pointer to node labelled with a triple (s, i, j)

  • s is a symbol or an LR(0) item (production rule with dot)
  • i and j give the section of the input string derived by this node"

but indexes in the dump don't correspond to positions in input grammar.
i.e. I don't understood how SPPF is constructed.

@patrickhuber
Copy link
Owner

Origin and Location are positions in the parse chart, not the input text. If you count tokens instead of counting text position you can see the relationship.

It is possible to get actual character locations in the parse, but requires call back into the IParseRunner to get the current values. To save on space, the library uses lexer rules for reading tokens and moves the parse forward with the tokens and not character by character. Getting line number would require counting carriage return and line feeds (depending on OS format). The data exists, just not how they are internally stored in the tree structure.

As for fixing, I accept pull requests and debug the error when I can. This is a spare time thing for me so I can't guarantee turn around time.

@ArsenShnurkov
Copy link
Contributor Author

ArsenShnurkov commented Mar 30, 2017

If you count tokens instead of counting text position you can see the relationship.

I counted tokens. First position is 0 before the first token, last position is 10 after the last token.
(';', 14, 1) (']', 12, 1) ('=', 39, 1) (';', 45, 1)
are all > 10.
What is the "Origin" ?

Under the word "plan" i mean the sequence of actions/operations, not the "time estimate".

@patrickhuber
Copy link
Owner

patrickhuber commented Mar 30, 2017

Sorry, my mistake. Non terminals in the parse forest are token positions in the parse chart. Tokens are character offset and capture length.

As to steps to fix, the bug is somewhere in the ParseEngine or ParseRunner classes. I'm trying to determine when the ('"_"' , 3, 3) node gets created, but it is not as clear as I expected. Seeing that this is a Token node, the 3, 3 makes less sense now. Once I can successfully parse your grammar, I'll focus on the grammar generator to figure out why it isn't parsing your text. Finally, renaming and refactoring to get spec compliant Ebnf, Abnf and Bnf.

@patrickhuber
Copy link
Owner

patrickhuber commented Mar 31, 2017

So I think the first part of the bug, where the incorrect token is being selected and the numbers for the token make no sense, is part of the Lexeme recycling logic. Here is the commit with some partial fixes 8bb24be .

Here is the test grammar

ws = [ ows ] ; /* white space */
ows = \"_\" ; /* obligatory white space */

Here is the updated parse forest with the corrected positions.

 (Ebnf.Definition, 0, 10) -> (Ebnf.Block, 0, 6) (Ebnf.Definition, 6, 10)
 (Ebnf.Block, 0, 6) -> (Ebnf.Rule, 0, 6)
 (Ebnf.Rule, 0, 6) ->  (Ebnf.QualifiedIdentifier, 0, 1)  ('=', 4, 1)  (Ebnf.Expression, 2, 5) (';', 14, 1)
 (Ebnf.QualifiedIdentifier, 0, 1) -> ('ws', 1, 2)
 (Ebnf.Expression, 2, 5) -> (Ebnf.Term, 2, 5)
 (Ebnf.Term, 2, 5) -> (Ebnf.Factor, 2, 5)
 (Ebnf.Factor, 2, 5) -> (Ebnf.Optional, 2, 5)
 (Ebnf.Optional, 2, 5) ->  ('[', 6, 1)  (Ebnf.Expression, 3, 4) (']', 12, 1)
 (Ebnf.Expression, 3, 4) -> (Ebnf.Term, 3, 4)
 (Ebnf.Term, 3, 4) -> (Ebnf.Factor, 3, 4)
 (Ebnf.Factor, 3, 4) -> (Ebnf.QualifiedIdentifier, 3, 4)
 (Ebnf.QualifiedIdentifier, 3, 4) -> ('ows', 8, 3)
 (Ebnf.Definition, 6, 10) -> (Ebnf.Block, 6, 10)
 (Ebnf.Block, 6, 10) -> (Ebnf.Rule, 6, 10)
 (Ebnf.Rule, 6, 10) ->  (Ebnf.QualifiedIdentifier, 6, 7)  ('=', 39, 1)  (Ebnf.Expression, 8, 9) (';', 45, 1)
 (Ebnf.QualifiedIdentifier, 6, 7) -> ('"_"', 35, 3)
 (Ebnf.Expression, 8, 9) -> (Ebnf.Term, 8, 9)
 (Ebnf.Term, 8, 9) -> (Ebnf.Factor, 8, 9)
 (Ebnf.Factor, 8, 9) -> (Ebnf.Literal, 8, 9)
 (Ebnf.Literal, 8, 9) -> ('"_"', 41, 3)

You'll notice the QualifiedIdentifier is still listed as '"_"'. This is actually supposed to be ows. It appears that the lexeme for ows is being freed back into the token factory for some reason. Because the '"_"' is the next DfaLexeme to be used, the freed lexeme is grabbed up and used in the parse. That is why it shows up twice in the parse forest and has such a weird position.

It also appears that the position doesn't start at 0 in the first token, so I'll need to make sure I grab it before it is incremented.

@patrickhuber
Copy link
Owner

Commit 8bb24be fixes the bug in the ParseRunner that was causing lexemes to be improperly managed during cleanup.

Next step, tackling the grammar generator.

@patrickhuber
Copy link
Owner

Looks like the Parse Engine has a bug in it. Here is the debug output of the parse of your input for the grammar you supplied with the setting turned on to optimize right recursion:

0                                                 file ->●ws directives ws		(0)                     Start
0                                                 ws ->●[ows]		(0)                                  Predict
0                                                 file -> ws●directives ws		(0)                     Predict
0                                                 [ows] ->●ows		(0)                                 Predict
0                                                 [ows] ->●		(0)                                    Predict
0                                                 ws -> [ows]●		(0)                                 Predict
0                                                 [ows] : Pliant.Charts.TransitionState             Transition
0                                                 directives ->●directive {ows directive}		(0)      Predict
0                                                 ows ->●_		(0)                                     Predict
0                                                 directive ->●0		(0)                               Predict
0                                                 directive ->●1		(0)                               Predict
1                                                 ows -> _●		(0)                                    Scan _
0                                                 ows : Pliant.Charts.TransitionState               Transition
1                                                 ws -> [ows]●		(0)                                 Complete
1                                                 file -> ws●directives ws		(0)                     Complete
1                                                 directives ->●directive {ows directive}		(1)      Predict
1                                                 directive ->●0		(1)                               Predict
1                                                 directive ->●1		(1)                               Predict
2                                                 directive -> 0●		(1)                              Scan 0
1                                                 directives : Pliant.Charts.TransitionState        Transition
1                                                 directive : Pliant.Charts.TransitionState         Transition
2                                                 file -> ws directives●ws		(0)                     Complete
2                                                 ws ->●[ows]		(2)                                  Predict
2                                                 file -> ws directives ws●		(0)                    Predict
2                                                 [ows] ->●ows		(2)                                 Predict
2                                                 [ows] ->●		(2)                                    Predict
2                                                 ws -> [ows]●		(2)                                 Predict
2                                                 ws : Pliant.Charts.TransitionState                Transition
2                                                 [ows] : Pliant.Charts.TransitionState             Transition
2                                                 ows ->●_		(2)                                     Predict
3                                                 ows -> _●		(2)                                    Scan _
2                                                 ows : Pliant.Charts.TransitionState               Transition
3                                                 file -> ws directives ws●		(0)                    Complete

This fails at earley set id 3, which is the fourth earley set based on zero index.

Here is the way to turn off right recursion optimization:

var grammar = /* your grammar here*/
var parseEngine = new ParseEngine(grammar, new ParseEngineOptions(optimizeRightRecursion: false));

Here is the debug output for parsing your input with optimizing right recursion turned off:

0                                                 file ->●ws directives ws		(0)                     Start
0                                                 ws ->●[ows]		(0)                                  Predict
0                                                 file -> ws●directives ws		(0)                     Predict
0                                                 [ows] ->●ows		(0)                                 Predict
0                                                 [ows] ->●		(0)                                    Predict
0                                                 ws -> [ows]●		(0)                                 Predict
0                                                 directives ->●directive {ows directive}		(0)      Predict
0                                                 ows ->●_		(0)                                     Predict
0                                                 directive ->●0		(0)                               Predict
0                                                 directive ->●1		(0)                               Predict
1                                                 ows -> _●		(0)                                    Scan _
1                                                 [ows] -> ows●		(0)                                Complete
1                                                 ws -> [ows]●		(0)                                 Complete
1                                                 file -> ws●directives ws		(0)                     Complete
1                                                 directives ->●directive {ows directive}		(1)      Predict
1                                                 directive ->●0		(1)                               Predict
1                                                 directive ->●1		(1)                               Predict
2                                                 directive -> 0●		(1)                              Scan 0
2                                                 directives -> directive●{ows directive}		(1)      Complete
2                                                 {ows directive} ->●ows directive {ows directive}		(2)Predict
2                                                 {ows directive} ->●		(2)                          Predict
2                                                 directives -> directive {ows directive}●		(1)     Predict
2                                                 file -> ws directives●ws		(0)                     Complete
2                                                 ows ->●_		(2)                                     Predict
2                                                 ws ->●[ows]		(2)                                  Predict
2                                                 file -> ws directives ws●		(0)                    Predict
2                                                 [ows] ->●ows		(2)                                 Predict
2                                                 [ows] ->●		(2)                                    Predict
2                                                 ws -> [ows]●		(2)                                 Predict
3                                                 ows -> _●		(2)                                    Scan _
3                                                 {ows directive} -> ows●directive {ows directive}		(2)Complete
3                                                 [ows] -> ows●		(2)                                Complete
3                                                 ws -> [ows]●		(2)                                 Complete
3                                                 file -> ws directives ws●		(0)                    Complete
3                                                 directive ->●0		(3)                               Predict
3                                                 directive ->●1		(3)                               Predict
4                                                 directive -> 1●		(3)                              Scan 1
4                                                 {ows directive} -> ows directive●{ows directive}		(2)Complete
4                                                 {ows directive} ->●ows directive {ows directive}		(4)Predict
4                                                 {ows directive} ->●		(4)                          Predict
4                                                 {ows directive} -> ows directive {ows directive}●		(2)Predict
4                                                 directives -> directive {ows directive}●		(1)     Complete
4                                                 file -> ws directives●ws		(0)                     Complete
4                                                 ows ->●_		(4)                                     Predict
4                                                 ws ->●[ows]		(4)                                  Predict
4                                                 file -> ws directives ws●		(0)                    Predict
4                                                 [ows] ->●ows		(4)                                 Predict
4                                                 [ows] ->●		(4)                                    Predict
4                                                 ws -> [ows]●		(4)                                 Predict
5                                                 ows -> _●		(4)                                    Scan _
5                                                 {ows directive} -> ows●directive {ows directive}		(4)Complete
5                                                 [ows] -> ows●		(4)                                Complete
5                                                 ws -> [ows]●		(4)                                 Complete
5                                                 file -> ws directives ws●		(0)                    Complete
5                                                 directive ->●0		(5)                               Predict
5                                                 directive ->●1		(5)                               Predict
6                                                 directive -> 0●		(5)                              Scan 0
6                                                 {ows directive} -> ows directive●{ows directive}		(4)Complete
6                                                 {ows directive} ->●ows directive {ows directive}		(6)Predict
6                                                 {ows directive} ->●		(6)                          Predict
6                                                 {ows directive} -> ows directive {ows directive}●		(4)Predict
6                                                 {ows directive} -> ows directive {ows directive}●		(2)Complete
6                                                 directives -> directive {ows directive}●		(1)     Complete
6                                                 file -> ws directives●ws		(0)                     Complete
6                                                 ows ->●_		(6)                                     Predict
6                                                 ws ->●[ows]		(6)                                  Predict
6                                                 file -> ws directives ws●		(0)                    Predict
6                                                 [ows] ->●ows		(6)                                 Predict
6                                                 [ows] ->●		(6)                                    Predict
6                                                 ws -> [ows]●		(6)                                 Predict
7                                                 ows -> _●		(6)                                    Scan _
7                                                 {ows directive} -> ows●directive {ows directive}		(6)Complete
7                                                 [ows] -> ows●		(6)                                Complete
7                                                 ws -> [ows]●		(6)                                 Complete
7                                                 file -> ws directives ws●		(0)                    Complete
7                                                 directive ->●0		(7)                               Predict
7                                                 directive ->●1		(7)                               Predict
8                                                 directive -> 0●		(7)                              Scan 0
8                                                 {ows directive} -> ows directive●{ows directive}		(6)Complete
8                                                 {ows directive} ->●ows directive {ows directive}		(8)Predict
8                                                 {ows directive} ->●		(8)                          Predict
8                                                 {ows directive} -> ows directive {ows directive}●		(6)Predict
8                                                 {ows directive} -> ows directive {ows directive}●		(4)Complete
8                                                 {ows directive} -> ows directive {ows directive}●		(2)Complete
8                                                 directives -> directive {ows directive}●		(1)     Complete
8                                                 file -> ws directives●ws		(0)                     Complete
8                                                 ows ->●_		(8)                                     Predict
8                                                 ws ->●[ows]		(8)                                  Predict
8                                                 file -> ws directives ws●		(0)                    Predict
8                                                 [ows] ->●ows		(8)                                 Predict
8                                                 [ows] ->●		(8)                                    Predict
8                                                 ws -> [ows]●		(8)                                 Predict
9                                                 ows -> _●		(8)                                    Scan _
9                                                 {ows directive} -> ows●directive {ows directive}		(8)Complete
9                                                 [ows] -> ows●		(8)                                Complete
9                                                 ws -> [ows]●		(8)                                 Complete
9                                                 file -> ws directives ws●		(0)                    Complete
9                                                 directive ->●0		(9)                               Predict
9                                                 directive ->●1		(9)                               Predict
10                                                directive -> 1●		(9)                              Scan 1
10                                                {ows directive} -> ows directive●{ows directive}		(8)Complete
10                                                {ows directive} ->●ows directive {ows directive}		(10)Predict
10                                                {ows directive} ->●		(10)                         Predict
10                                                {ows directive} -> ows directive {ows directive}●		(8)Predict
10                                                {ows directive} -> ows directive {ows directive}●		(6)Complete
10                                                {ows directive} -> ows directive {ows directive}●		(4)Complete
10                                                {ows directive} -> ows directive {ows directive}●		(2)Complete
10                                                directives -> directive {ows directive}●		(1)     Complete
10                                                file -> ws directives●ws		(0)                     Complete
10                                                ows ->●_		(10)                                    Predict
10                                                ws ->●[ows]		(10)                                 Predict
10                                                file -> ws directives ws●		(0)                    Predict
10                                                [ows] ->●ows		(10)                                Predict
10                                                [ows] ->●		(10)                                   Predict
10                                                ws -> [ows]●		(10)                                Predict
11                                                ows -> _●		(10)                                   Scan _
11                                                {ows directive} -> ows●directive {ows directive}		(10)Complete
11                                                [ows] -> ows●		(10)                               Complete
11                                                ws -> [ows]●		(10)                                Complete
11                                                file -> ws directives ws●		(0)                    Complete
11                                                directive ->●0		(11)                              Predict
11                                                directive ->●1		(11)                              Predict
12                                                directive -> 1●		(11)                             Scan 1
12                                                {ows directive} -> ows directive●{ows directive}		(10)Complete
12                                                {ows directive} ->●ows directive {ows directive}		(12)Predict
12                                                {ows directive} ->●		(12)                         Predict
12                                                {ows directive} -> ows directive {ows directive}●		(10)Predict
12                                                {ows directive} -> ows directive {ows directive}●		(8)Complete
12                                                {ows directive} -> ows directive {ows directive}●		(6)Complete
12                                                {ows directive} -> ows directive {ows directive}●		(4)Complete
12                                                {ows directive} -> ows directive {ows directive}●		(2)Complete
12                                                directives -> directive {ows directive}●		(1)     Complete
12                                                file -> ws directives●ws		(0)                     Complete
12                                                ows ->●_		(12)                                    Predict
12                                                ws ->●[ows]		(12)                                 Predict
12                                                file -> ws directives ws●		(0)                    Predict
12                                                [ows] ->●ows		(12)                                Predict
12                                                [ows] ->●		(12)                                   Predict
12                                                ws -> [ows]●		(12)                                Predict
13                                                ows -> _●		(12)                                   Scan _
13                                                {ows directive} -> ows●directive {ows directive}		(12)Complete
13                                                [ows] -> ows●		(12)                               Complete
13                                                ws -> [ows]●		(12)                                Complete
13                                                file -> ws directives ws●		(0)                    Complete
13                                                directive ->●0		(13)                              Predict
13                                                directive ->●1		(13)                              Predict

You can see it goes all the way to the end and succeeds.

The bug is most likely in this private method in the ParseEngine.cs

private bool IsNextStateQuasiComplete(IState state)

For now, you can use the optimization disabling work around and I'll work on getting a fix in place based on these test results.

@patrickhuber
Copy link
Owner

The latest deployment version 5.0.0 should fix this error. I've tested it with your sample grammar and it appears to work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants