Strange error in EBNF grammar #70
I ran your grammar definition through the unit test class I have for the EbnfParser class. It is trying to parse ows = "_"; as a qualified identifier and failing. I'm digging in to see why.
Please try this first:
Here is the parse forest for:
Running that through the EBNF parser produced an EBNF object structure without error. I reduced your original grammar to the smallest subset I could find that still fails to parse. Here is the grammar:
Here is the parse forest:
So the following line looks incorrect:
Two things strike me as odd:
I need to investigate more, but I'm marking this as a bug.
OK. Why doesn't this EBNF object match the test input when it is converted into a grammar and passed to the parser?
There may be a secondary bug in the EbnfGrammarGenerator. For now I'm only looking at the parsing of the EBNF grammars you supplied. Once I find out what is going on, I'll move on to debugging the generated grammars.
(Ebnf.QualifiedIdentifier, 6, 7) -> ('"_"', 3, 3)
"build the parse forest as you go, augmenting each Earley item with a pointer to node labelled with a triple (s, i, j)"
But the indexes in the dump don't correspond to positions in the input grammar.
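For context on the quoted passage: in that style of parse forest, each node is identified by a triple (s, i, j), meaning a symbol s plus its start and end positions i and j. Every distinct triple maps to exactly one shared node, so ambiguous derivations attach to the same node instead of duplicating it. Below is a minimal Python sketch of that interning idea. The Forest and ForestNode classes are hypothetical illustrations, not Pliant's API.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity equality: nodes are equal only if they are the same object
class ForestNode:
    """Hypothetical forest node labelled with the triple (symbol, i, j)."""
    symbol: str
    origin: int    # i: chart position where the symbol starts
    location: int  # j: chart position where the symbol ends
    # Each alternative derivation is a tuple of child nodes
    alternatives: list = field(default_factory=list)


class Forest:
    """Interns nodes so each (symbol, i, j) triple maps to exactly one node."""

    def __init__(self):
        self._nodes = {}

    def node(self, symbol, origin, location):
        key = (symbol, origin, location)
        if key not in self._nodes:
            self._nodes[key] = ForestNode(symbol, origin, location)
        return self._nodes[key]

    def add_derivation(self, parent, *children):
        # An ambiguous parse adds another alternative to the same shared node
        if children not in parent.alternatives:
            parent.alternatives.append(children)
```

Calling forest.node("Ebnf.QualifiedIdentifier", 6, 7) twice returns the same object. Here i and j are positions in whatever coordinate system the parser uses, which is not necessarily character offsets.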
Origin and Location are positions in the parse chart, not in the input text. If you count tokens instead of text positions, you can see the relationship. It is possible to get actual character locations during the parse, but that requires a call back into the IParseRunner to get the current values. To save space, the library uses lexer rules to read tokens and moves the parse forward token by token rather than character by character. Getting line numbers would require counting carriage returns and line feeds (depending on the OS line-ending format). The data exists; it just isn't how positions are stored internally in the tree structure. As for fixing it, I accept pull requests and debug errors when I can. This is a spare-time project for me, so I can't guarantee a turnaround time.
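As a rough illustration of the line-number point, here is a minimal Python sketch (not part of Pliant) that turns a character offset into a 1-based line and column. It treats \r\n, \n and a lone \r each as a single line break.

```python
def line_and_column(text: str, offset: int) -> tuple[int, int]:
    """Return a 1-based (line, column) for a character offset into text.

    "\r\n" (Windows), "\n" (Unix) and a lone "\r" (classic Mac) each
    count as one line break.
    """
    if not 0 <= offset <= len(text):
        raise ValueError("offset out of range")
    line, column = 1, 1
    i = 0
    while i < offset:
        ch = text[i]
        if ch == "\r":
            # Treat "\r\n" as a single break: skip the "\n" if it is
            # also before the target offset.
            if i + 1 < offset and text[i + 1] == "\n":
                i += 1
            line, column = line + 1, 1
        elif ch == "\n":
            line, column = line + 1, 1
        else:
            column += 1
        i += 1
    return line, column
```

This is linear in the offset. A parser that needs many lookups would more likely record line starts once and binary-search them.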
I counted tokens. The first position is 0 (before the first token), and the last position is 10 (after the last token). By "plan" I mean the sequence of actions/steps, not a time estimate.
Sorry, my mistake. Nonterminals in the parse forest use token positions in the parse chart, while tokens use a character offset and capture length. As for the steps to fix it, the bug is somewhere in the ParseEngine or ParseRunner classes. I'm trying to determine when the ('"_"', 3, 3) node gets created, but it is not as clear as I expected. Given that this is a token node, the 3, 3 makes less sense now. Once I can successfully parse your grammar, I'll focus on the grammar generator to figure out why it isn't parsing your text. Finally, I'll do some renaming and refactoring to get spec-compliant Ebnf, Abnf and Bnf.
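The two coordinate systems work like this. A nonterminal node's (origin, location) pair counts chart positions between tokens, while each token carries a character offset and length. Below is a minimal Python sketch that maps a chart span back to characters. It assumes a hypothetical Token record, not Pliant's actual token type.

```python
from typing import NamedTuple


class Token(NamedTuple):
    """Hypothetical token record: matched text, character offset, capture length."""
    text: str
    offset: int
    length: int


def char_span(tokens: list[Token], origin: int, location: int) -> tuple[int, int]:
    """Map a nonterminal's chart span (origin, location) to a character span.

    Chart position k sits between tokens: 0 is before the first token and
    len(tokens) is after the last one, so the node covers tokens[origin:location].
    """
    if not 0 <= origin <= location <= len(tokens):
        raise ValueError("chart span out of range")
    if origin == location:
        # Empty span: anchor it at the start of the next token, or at end of input
        if origin < len(tokens):
            start = tokens[origin].offset
        elif tokens:
            start = tokens[-1].offset + tokens[-1].length
        else:
            start = 0
        return start, start
    first, last = tokens[origin], tokens[location - 1]
    return first.offset, last.offset + last.length
```

With 10 tokens, valid chart positions run from 0 to 10, which matches the token count described above.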
So I think the first part of the bug, where the incorrect token is selected and its numbers make no sense, is in the lexeme recycling logic. Here is the commit with some partial fixes: 8bb24be. Here is the test grammar:
Here is the updated parse forest with the corrected positions.
You'll notice the QualifiedIdentifier is still listed as '"_"'. It is actually supposed to be ows. It appears that the lexeme for ows is being freed back into the token factory for some reason. Because '"_"' is the next DfaLexeme to be used, it picks up the freed lexeme and uses it in the parse. That is why it shows up twice in the parse forest with such a weird position. It also appears that the position doesn't start at 0 for the first token, so I'll need to make sure I read it before it is incremented.
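The failure mode described here is a general hazard of object pooling: a lexeme goes back to the pool while the parse still references it, then gets handed out again for the next token. Below is a minimal Python sketch that shows how the same object ends up in two places. LexemePool, Lexeme and tokenize are hypothetical names, not Pliant's token factory.

```python
class Lexeme:
    """Hypothetical mutable lexeme; pooled instances are reset and reused."""

    def __init__(self):
        self.text = ""
        self.position = 0

    def reset(self, text, position):
        self.text, self.position = text, position
        return self


class LexemePool:
    def __init__(self):
        self._free = []

    def rent(self, text, position):
        # Reuse a freed instance when one is available
        lexeme = self._free.pop() if self._free else Lexeme()
        return lexeme.reset(text, position)

    def release(self, lexeme):
        self._free.append(lexeme)


def tokenize(pool, words, buggy):
    """Rent one lexeme per word; in buggy mode, free each one while it is still in use."""
    accepted = []
    position = 0
    for word in words:
        lexeme = pool.rent(word, position)
        accepted.append(lexeme)
        if buggy:
            # Bug: the parse still references this lexeme, but it goes back to the pool
            pool.release(lexeme)
        position += len(word)
    return accepted
```

With buggy=True, tokenizing ["ows", '"_"'] returns the same object twice, and both entries read '"_"' at the second token's position. The fix is to release a lexeme only after nothing in the parse refers to it.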
Commit 8bb24be fixes the bug in the ParseRunner that caused lexemes to be mismanaged during cleanup. Next step: tackling the grammar generator.
It looks like the ParseEngine has a bug. Here is the debug output from parsing your input with the grammar you supplied, with the right-recursion optimization setting turned on:
This fails at Earley set id 3, which is the fourth Earley set since ids are zero-based. Here is how to turn off the right-recursion optimization:
Here is the debug output from parsing your input with the right-recursion optimization turned off:
You can see it goes all the way to the end and succeeds. The bug is most likely in this private method in ParseEngine.cs:
For now, you can use the workaround of disabling the optimization, and I'll work on a fix based on these test results.
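The debug output above is organized by Earley sets, one per chart position. For readers unfamiliar with them, here is a minimal Earley recognizer in Python with no right-recursion optimization of any kind, which is the mode that succeeded here. The grammar encoding and the function name are illustrative assumptions; this is not Pliant's ParseEngine. Nullable rules are deliberately left out to keep it short.

```python
from typing import NamedTuple


class Item(NamedTuple):
    lhs: str     # rule's left-hand side
    rhs: tuple   # rule's right-hand side symbols
    dot: int     # how many rhs symbols have been matched
    origin: int  # Earley set where this item started


def earley_recognize(grammar, start, tokens):
    """Return (accepted, set_sizes) for a plain Earley recognizer.

    grammar maps each nonterminal to a list of alternatives (tuples of symbols);
    any symbol not in grammar is a terminal matched by equality.
    Assumes no nullable (empty) rules.
    """
    sets = [[] for _ in range(len(tokens) + 1)]

    def add(k, item):
        if item not in sets[k]:
            sets[k].append(item)

    for rhs in grammar[start]:
        add(0, Item(start, tuple(rhs), 0, 0))

    for k in range(len(tokens) + 1):
        i = 0
        while i < len(sets[k]):  # sets[k] can grow while it is being processed
            item = sets[k][i]
            i += 1
            if item.dot < len(item.rhs):
                symbol = item.rhs[item.dot]
                if symbol in grammar:  # predict
                    for rhs in grammar[symbol]:
                        add(k, Item(symbol, tuple(rhs), 0, k))
                elif k < len(tokens) and tokens[k] == symbol:  # scan
                    add(k + 1, item._replace(dot=item.dot + 1))
            else:  # complete: advance every parent waiting on item.lhs
                for parent in sets[item.origin]:
                    if parent.dot < len(parent.rhs) and parent.rhs[parent.dot] == item.lhs:
                        add(k, parent._replace(dot=parent.dot + 1))

    accepted = any(
        it.lhs == start and it.dot == len(it.rhs) and it.origin == 0
        for it in sets[-1]
    )
    return accepted, [len(s) for s in sets]
```

On a right-recursive grammar such as {"List": [("x", "List"), ("x",)]}, each completion triggers a chain of further completions back to set 0. That per-set cost grows with input length, and right-recursion optimizations exist to avoid it.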
The latest deployed version, 5.0.0, should fix this error. I've tested it with your sample grammar, and it appears to work.
I am trying to create a test grammar for Pliant using EBNF syntax.
Here is my grammar:
https://github.com/ArsenShnurkov/Pliant-test-for-include/blob/43923b148b0dfccc3490d2e8e03b03dc65bd0884/main/Grammar/syntax2.ebnf
Here is text to parse:
https://github.com/ArsenShnurkov/Pliant-test-for-include/blob/43923b148b0dfccc3490d2e8e03b03dc65bd0884/main/Program.cs#L16
Why does it give me