-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Empty String Rules (Still) Fail to Match #112
Comments
ah thanks for raising this issue! I really appreciate the details for reproducing! I'll take a look 😄 |
just to share some progress, I started looking into this today, and was able to reduce the bug test case to: #[test]
fn parse_nested_empty() {
let grammar: Grammar = "
<start> ::= <a> <empty>
<a> ::= 'a' <empty>
<empty> ::= ''"
.parse()
.unwrap();
let input = "a";
let parses = parse(&grammar, input);
assert_eq!(parses.count(), 1);
} The bug seems to be caused by incorrectly de-duplicating Earley states. Because "empty" rules do not advance the input cursor, different predictions get treated as "duplicates". I have a fix working locally, but it basically "solves" the problem by duplicating all parse attempts 😅 Next chance I get, I will brainstorm for a more realistic solution |
Any progress on a fix? I may be hitting the same issue – here's another bare-bones example you can check against: #[test]
fn test() {
let grammar: Grammar = "
<balanced> ::= <left> <balanced> <right>
| ''
<left> ::= <opt-ws> '(' <opt-ws>
<right> ::= <opt-ws> ')' <opt-ws>
<opt-ws> ::= '' | <ws>
<ws> ::= ' ' | ' ' <ws>
"
.parse()
.unwrap();
let input = "()";
assert!(
grammar.parse_input(input).next().is_some(),
"can't parse: {input}"
);
} Greatly appreciate everybody's work so far! Would love to use However, this issue is a blocker for my use case 🙁 |
@CrockAgile in #92 you reference this tutorial, which has a section on empty rules: https://loup-vaillant.fr/tutorials/earley-parsing/empty-rules – maybe there's something helpful there, where it discusses Aycock and Horspool's (2002) paper? If you get any insights from fixing this issue, it might be worth including the findings in your previous write-up (thanks again for that): https://gist.github.com/CrockAgile/d8cb79607b6c6c49c06a97bd652dc219 Either way, we are in good company – as mentioned by MARPA's author:
|
Thank you for another example! I do have a working fix, based on the exact solution you linked. So glad we are on the same page. But while implementing it, a surprising amount of refactoring was needed (allowing for Earley matches on "nullable productions", not just complete "States", if you are curious 😅 ) That refactor introduced a significant performance regression 😱 But I fixed that last night! So TLDR: I should have it up for review tomorrow 🤞 |
this fix should now be available under but this was my first time publishing this crate, so let me know if something is broken 😅 |
Looks good here - thank you for addressing this! |
Describe the bug
Empty string rules still fail to match in some situations. This is the same as #108 although after it's fix it doesn't happen in all situations but still happens.
To Reproduce
Load grammar from
tests/fixtures/bnf.bnf
Try to parse the first line of the text:
"<syntax> ::= <rule> | <rule> <syntax>\n"
This match will fail.
Try to parse the same line with whitespace before the newline
"<syntax> ::= <rule> | <rule> <syntax> \n"
This will parse properly
The first example should pass given
<line-end>
may begin with<opt-whitespace>
which has an empty string ruleUnfortunately I haven't been able to find a smaller example. Weirdly it does parse
"<syntax> ::= <rule>"
correctly but not the more complicated example above.If there support for special characters like
\n
or a built in definition for<EOL>
it could be good to have a test that a grammer made fromtests/fixtures/bnf.bnf
can parse the file itself.Desktop (please complete the following information):
The text was updated successfully, but these errors were encountered: