Empty String Rules (Still) Fail to Match #112

nskobelevs · 2022-11-14T23:28:25Z

Describe the bug
Empty string rules still fail to match in some situations. This is the same as #108 although after it's fix it doesn't happen in all situations but still happens.

To Reproduce

Load grammar from tests/fixtures/bnf.bnf
Try to parse the first line of the text:
"<syntax> ::= <rule> | <rule> <syntax>\n"
This match will fail.
Try to parse the same line with whitespace before the newline
"<syntax> ::= <rule> | <rule> <syntax> \n"
This will parse properly

The first example should pass given <line-end> may begin with <opt-whitespace> which has an empty string rule

<line-end>       ::= <opt-whitespace> <EOL>
<opt-whitespace> ::= "" | " " <opt-whitespace>
<EOL>            ::= "
"

Unfortunately I haven't been able to find a smaller example. Weirdly it does parse "<syntax> ::= <rule>" correctly but not the more complicated example above.

If there support for special characters like \n or a built in definition for <EOL> it could be good to have a test that a grammer made from tests/fixtures/bnf.bnf can parse the file itself.

Desktop (please complete the following information):

MacOS Ventura 13.0.1

The text was updated successfully, but these errors were encountered:

shnewto · 2022-11-14T23:44:17Z

ah thanks for raising this issue! I really appreciate the details for reproducing!

I'll take a look 😄

CrockAgile · 2022-11-17T20:48:53Z

just to share some progress, I started looking into this today, and was able to reduce the bug test case to:

#[test]
fn parse_nested_empty() {
    let grammar: Grammar = "
    <start> ::= <a> <empty>
    <a> ::= 'a' <empty>
    <empty> ::= ''"
        .parse()
        .unwrap();

    let input = "a";

    let parses = parse(&grammar, input);
    assert_eq!(parses.count(), 1);
}

The bug seems to be caused by incorrectly de-duplicating Earley states. Because "empty" rules do not advance the input cursor, different predictions get treated as "duplicates".

I have a fix working locally, but it basically "solves" the problem by duplicating all parse attempts 😅

Next chance I get, I will brainstorm for a more realistic solution

amascolo · 2022-11-28T18:18:59Z

Any progress on a fix?

I may be hitting the same issue – here's another bare-bones example you can check against:

#[test]
fn test() {
    let grammar: Grammar = "
    <balanced> ::= <left> <balanced> <right>
                 | ''
    
    <left> ::= <opt-ws> '(' <opt-ws>
    <right> ::= <opt-ws> ')' <opt-ws>
    
    <opt-ws> ::= '' | <ws>
    <ws> ::= ' ' | ' ' <ws>
    "
    .parse()
    .unwrap();

    let input = "()";

    assert!(
        grammar.parse_input(input).next().is_some(),
        "can't parse: {input}"
    );
}

Greatly appreciate everybody's work so far!

Would love to use bnf as I don't see an established Rust library that supports Earley parsing.

However, this issue is a blocker for my use case 🙁

amascolo · 2022-11-28T18:31:09Z

@CrockAgile in #92 you reference this tutorial, which has a section on empty rules: https://loup-vaillant.fr/tutorials/earley-parsing/empty-rules – maybe there's something helpful there, where it discusses Aycock and Horspool's (2002) paper?

If you get any insights from fixing this issue, it might be worth including the findings in your previous write-up (thanks again for that): https://gist.github.com/CrockAgile/d8cb79607b6c6c49c06a97bd652dc219

Either way, we are in good company – as mentioned by MARPA's author:

Earley's original 1970 algorithm actually had a bug in its handling of nullables. There was an easy fix, but it made the algorithm slower. Since efficiency was already seen as the reason to prefer other parsers, this was a big deal.

CrockAgile · 2022-11-29T00:56:38Z

Thank you for another example! I do have a working fix, based on the exact solution you linked. So glad we are on the same page.

But while implementing it, a surprising amount of refactoring was needed (allowing for Earley matches on "nullable productions", not just complete "States", if you are curious 😅 )

That refactor introduced a significant performance regression 😱

But I fixed that last night! So

TLDR: I should have it up for review tomorrow 🤞

CrockAgile · 2022-11-30T21:57:12Z

this fix should now be available under v0.4.3

but this was my first time publishing this crate, so let me know if something is broken 😅

nskobelevs · 2022-11-30T23:05:01Z

Looks good here - thank you for addressing this!

CrockAgile mentioned this issue Nov 29, 2022

Detect and auto complete "nullable" productions #113

Merged

CrockAgile closed this as completed in #113 Nov 30, 2022

This was referenced Dec 7, 2022

Nullable productions (Still, Still) Fail to Match #117

Closed

Right-recursive productions Fail to Match #118

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Empty String Rules (Still) Fail to Match #112

Empty String Rules (Still) Fail to Match #112

nskobelevs commented Nov 14, 2022

shnewto commented Nov 14, 2022

CrockAgile commented Nov 17, 2022

amascolo commented Nov 28, 2022

amascolo commented Nov 28, 2022

CrockAgile commented Nov 29, 2022

CrockAgile commented Nov 30, 2022

nskobelevs commented Nov 30, 2022

Empty String Rules (Still) Fail to Match #112

Empty String Rules (Still) Fail to Match #112

Comments

nskobelevs commented Nov 14, 2022

shnewto commented Nov 14, 2022

CrockAgile commented Nov 17, 2022

amascolo commented Nov 28, 2022

amascolo commented Nov 28, 2022

CrockAgile commented Nov 29, 2022

CrockAgile commented Nov 30, 2022

nskobelevs commented Nov 30, 2022