Traversal Trees #120
I again plan to leave this PR open to give some time for review. Also I still have to read it all over myself with fresh eyes, and add a lot of comments 😅
src/grammar.rs
bnf.parse::<Grammar>().unwrap().parse_input(input).count(),
5
);
}
@amascolo are these parse trees what you expect?
<and> ::= <and> " AND " <terminal>
├── <and> ::= <and> " AND " <terminal>
│   ├── <and> ::= <terminal>
│   │   └── <terminal> ::= "AND"
│   │       └── "AND"
│   ├── " AND "
│   └── <terminal> ::= "AND"
│       └── "AND"
├── " AND "
└── <terminal> ::= "AND"
    └── "AND"

<and> ::= <and> " AND " <terminal>
├── <and> ::= <and> " " <terminal>
│   ├── <and> ::= <and> " " <terminal>
│   │   ├── <and> ::= <terminal>
│   │   │   └── <terminal> ::= "AND"
│   │   │       └── "AND"
│   │   ├── " "
│   │   └── <terminal> ::= "AND"
│   │       └── "AND"
│   ├── " "
│   └── <terminal> ::= "AND"
│       └── "AND"
├── " AND "
└── <terminal> ::= "AND"
    └── "AND"

<and> ::= <and> " " <terminal>
├── <and> ::= <and> " AND " <terminal>
│   ├── <and> ::= <and> " " <terminal>
│   │   ├── <and> ::= <terminal>
│   │   │   └── <terminal> ::= "AND"
│   │   │       └── "AND"
│   │   ├── " "
│   │   └── <terminal> ::= "AND"
│   │       └── "AND"
│   ├── " AND "
│   └── <terminal> ::= "AND"
│       └── "AND"
├── " "
└── <terminal> ::= "AND"
    └── "AND"

<and> ::= <and> " " <terminal>
├── <and> ::= <and> " " <terminal>
│   ├── <and> ::= <and> " AND " <terminal>
│   │   ├── <and> ::= <terminal>
│   │   │   └── <terminal> ::= "AND"
│   │   │       └── "AND"
│   │   ├── " AND "
│   │   └── <terminal> ::= "AND"
│   │       └── "AND"
│   ├── " "
│   └── <terminal> ::= "AND"
│       └── "AND"
├── " "
└── <terminal> ::= "AND"
    └── "AND"

<and> ::= <and> " " <terminal>
├── <and> ::= <and> " " <terminal>
│   ├── <and> ::= <and> " " <terminal>
│   │   ├── <and> ::= <and> " " <terminal>
│   │   │   ├── <and> ::= <terminal>
│   │   │   │   └── <terminal> ::= "AND"
│   │   │   │       └── "AND"
│   │   │   ├── " "
│   │   │   └── <terminal> ::= "AND"
│   │   │       └── "AND"
│   │   ├── " "
│   │   └── <terminal> ::= "AND"
│   │       └── "AND"
│   ├── " "
│   └── <terminal> ::= "AND"
│       └── "AND"
├── " "
└── <terminal> ::= "AND"
    └── "AND"
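The count of 5 in the test can be checked independently of the Earley parser. The grammar itself is not shown in this thread, so this sketch assumes it from the trees above: `<and> ::= <and> " AND " <terminal> | <and> " " <terminal> | <terminal>` and `<terminal> ::= "AND"`, with input `"AND AND AND AND AND"`. Because `<and>` is left-recursive and every derivation starts at offset 0, a table indexed by prefix length is enough to count distinct trees. `count_and_parses` is a hypothetical helper, not part of BNF.

```rust
/// Counts distinct parse trees of `<and>` spanning the whole input, for the
/// grammar assumed from the trees above (not copied from the actual test):
///   <and>      ::= <and> " AND " <terminal> | <and> " " <terminal> | <terminal>
///   <terminal> ::= "AND"
fn count_and_parses(input: &str) -> u64 {
    let bytes = input.as_bytes();
    let n = bytes.len();
    // count[i] = number of <and> parse trees covering bytes[0..i].
    let mut count = vec![0u64; n + 1];
    for i in 1..=n {
        let prefix = &bytes[..i];
        // <and> ::= <terminal>
        if prefix == b"AND" {
            count[i] += 1;
        }
        // <and> ::= <and> " " <terminal>  (suffix " AND" is 4 bytes)
        if prefix.ends_with(b" AND") {
            count[i] += count[i - 4];
        }
        // <and> ::= <and> " AND " <terminal>  (suffix " AND AND" is 8 bytes)
        if prefix.ends_with(b" AND AND") {
            count[i] += count[i - 8];
        }
    }
    count[n]
}

fn main() {
    let input = "AND AND AND AND AND";
    // prints "5 parse trees"
    println!("{} parse trees", count_and_parses(input));
}
```

Each middle `AND` is either a `<terminal>` or swallowed by an `" AND "` separator, and two separators can never be adjacent, so the counts grow like Fibonacci numbers: 1, 1, 2, 3, 5 for one through five `AND`s.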
Yep, these look like the parse trees I was expecting!
Thanks even for writing them out in the same order as they appeared in #117 (comment)
looks great!
Snapshot testing has exposed some nondeterminism in the Earley parsing. The parse trees are valid, but ambiguous grammars may return their parse trees in inconsistent orders between test executions. I will investigate this (likely a
@CrockAgile thanks again, amazing to see these fixes merged – any chance of releasing them as
Closes #117
Closes #118 (hopefully! 🤞)
Closes #119
Closes #115
Try Again!
This PR is the result of iterating on #119. #119 attempted to resolve the same issue, but @amascolo generously raised examples that still failed.
This PR (maybe!) resolves these additional cases, which are now also included as tests.
I have included the same "Root Cause" section as in #119 so that this PR can be read on its own.
Root Cause
Consider the grammar:
When parsing input "a", there are two routes: "shortfail" and "longsuccess". As the names suggest, "shortfail" requires fewer traversals, but always fails. "longsuccess" requires more traversal steps, and should eventually succeed.
The issue arises because both paths predict the non-terminal "char". Practical Earley parsing requires de-duplicating predictions; otherwise recursive grammars fall into infinite loops. The existing Earley implementation in BNF does roughly the following:
(work roughly alternates between the short and long routes because new traversals are appended to the end of the work queue)
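The original grammar and step list did not survive in this copy, so the following is only a sketch of the failure mode under stated assumptions. It uses a hypothetical stand-in grammar (`<start> ::= <shortfail> | <longsuccess>`, `<shortfail> ::= <char> "never"`, `<longsuccess> ::= <long1>`, `<long1> ::= <long2>`, `<long2> ::= <char>`, `<char> ::= "a"`) and a generic queue-driven Earley recognizer, not BNF's implementation. Predictions are de-duplicated per (non-terminal, position) and new work is appended to a FIFO queue. `recognize`, `Item`, and `demo_grammar` are illustrative names. The `use_prior_completions` flag models the "prior completed" idea from this PR: when a prediction is skipped as a duplicate, the traversal also checks for completions of that non-terminal that already finished.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

// Symbols of a tiny hypothetical grammar representation (not BNF's types).
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Sym {
    Nt(&'static str),
    Term(&'static str),
}

struct Prod {
    lhs: &'static str,
    rhs: Vec<Sym>,
}

// Earley item: production, how many symbols matched, start offset, current offset.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct Item {
    prod: usize,
    dot: usize,
    origin: usize,
    pos: usize,
}

fn push(queue: &mut VecDeque<Item>, seen: &mut HashSet<Item>, item: Item) {
    if seen.insert(item) {
        queue.push_back(item); // FIFO: new work goes to the end of the queue
    }
}

fn recognize(prods: &[Prod], start: &'static str, input: &str, use_prior_completions: bool) -> bool {
    let mut queue = VecDeque::new();
    let mut seen = HashSet::new();
    let mut predicted = HashSet::new(); // (non-terminal, offset) already predicted
    let mut waiting: Vec<(Item, &'static str)> = Vec::new(); // items blocked on a non-terminal
    let mut completed: HashMap<(&'static str, usize), Vec<usize>> = HashMap::new(); // (lhs, origin) -> ends

    predicted.insert((start, 0));
    for (i, p) in prods.iter().enumerate() {
        if p.lhs == start {
            push(&mut queue, &mut seen, Item { prod: i, dot: 0, origin: 0, pos: 0 });
        }
    }

    while let Some(item) = queue.pop_front() {
        let prod = &prods[item.prod];
        match prod.rhs.get(item.dot) {
            // Complete: advance every item already waiting on this non-terminal at `origin`.
            None => {
                completed.entry((prod.lhs, item.origin)).or_default().push(item.pos);
                for &(w, nt) in &waiting {
                    if nt == prod.lhs && w.pos == item.origin {
                        push(&mut queue, &mut seen, Item { dot: w.dot + 1, pos: item.pos, ..w });
                    }
                }
            }
            // Scan: match a terminal against the input.
            Some(&Sym::Term(t)) => {
                if input[item.pos..].starts_with(t) {
                    push(&mut queue, &mut seen, Item { dot: item.dot + 1, pos: item.pos + t.len(), ..item });
                }
            }
            // Predict, de-duplicated per (non-terminal, offset).
            Some(&Sym::Nt(nt)) => {
                waiting.push((item, nt));
                if predicted.insert((nt, item.pos)) {
                    for (i, p) in prods.iter().enumerate() {
                        if p.lhs == nt {
                            push(&mut queue, &mut seen, Item { prod: i, dot: 0, origin: item.pos, pos: item.pos });
                        }
                    }
                } else if use_prior_completions {
                    // Duplicate prediction: reuse completions that finished before this item arrived.
                    if let Some(ends) = completed.get(&(nt, item.pos)) {
                        for &end in ends {
                            push(&mut queue, &mut seen, Item { dot: item.dot + 1, pos: end, ..item });
                        }
                    }
                }
            }
        }
    }
    completed.get(&(start, 0)).map_or(false, |ends| ends.contains(&input.len()))
}

// Hypothetical stand-in for the "shortfail"/"longsuccess" grammar.
fn demo_grammar() -> Vec<Prod> {
    use Sym::{Nt, Term};
    vec![
        Prod { lhs: "start", rhs: vec![Nt("shortfail")] },
        Prod { lhs: "start", rhs: vec![Nt("longsuccess")] },
        Prod { lhs: "shortfail", rhs: vec![Nt("char"), Term("never")] },
        Prod { lhs: "longsuccess", rhs: vec![Nt("long1")] },
        Prod { lhs: "long1", rhs: vec![Nt("long2")] },
        Prod { lhs: "long2", rhs: vec![Nt("char")] },
        Prod { lhs: "char", rhs: vec![Term("a")] },
    ]
}

fn main() {
    let g = demo_grammar();
    println!("dedup only: {}", recognize(&g, "start", "a", false));
    println!("with prior completions: {}", recognize(&g, "start", "a", true));
}
```

Stepping through the queue for input `"a"`: `<char>` completes before the long route reaches its own `<char>` prediction, so with the flag off that prediction is dropped as a duplicate and nothing ever advances `<long2>`, and the input is rejected. With the flag on, the recorded completion is reused and the input is accepted. The short route is unaffected either way, since it predicts `<char>` first.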
All the <longN> productions are necessary because otherwise the <longsuccess> route is able to predict <char> before <char> is completed.

Existing Issues
Where this PR differs from #119 is its approach to "duplicate detection". Previously, duplicate Earley states/traversals were identified by which Production and how many Terms had been matched. This turns out to be insufficient, because partially "matched" Productions (e.g. <shortfail> ::= <char> • 'never') could have matched non-terminals via different paths.

New Duplicate Detection
Traversals now match/complete terms by building trees 🌳
A new prediction is the root trunk of a tree, and each matched/completed term adds a new branch. Assuming there are two different traversals which can complete base, a traversal tree segment may look like:

Performance
On my machine, these changes seemingly had no performance cost. I believe the new logic for "prior completed" traversals does add some cost, but that cost is offset by the improvement from using traversal trees instead of reference-counted term-matching vectors.
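To illustrate both the duplicate-detection change and the performance note above, here is a minimal traversal-tree sketch. It is a simplified model, not BNF's actual types: `TraversalTree`, `TermMatch`, and the production indices are hypothetical. A prediction creates a root, each matched term adds a child keyed by (parent, matched term), and a completed non-terminal is recorded by the id of the traversal that completed it. Two different completions of the same non-terminal therefore land on different nodes even though the old (production, terms matched) key is identical, and extending a traversal stores one term plus a parent link instead of copying a vector of matches.

```rust
use std::collections::HashMap;

type NodeId = usize;

// What a branch records: a scanned terminal, or the id of the traversal
// that completed a non-terminal (so different completions stay distinct).
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
enum TermMatch {
    Terminal(&'static str),
    Completed(NodeId),
}

struct Node {
    production: usize,          // hypothetical production index
    parent: Option<NodeId>,     // None for a root (a fresh prediction)
    matched: Option<TermMatch>, // the branch that led to this node
    depth: usize,               // how many terms have been matched
}

#[derive(Default)]
struct TraversalTree {
    nodes: Vec<Node>,
    roots: HashMap<(usize, usize), NodeId>,         // (production, start) -> trunk
    children: HashMap<(NodeId, TermMatch), NodeId>, // (parent, term) -> branch
}

impl TraversalTree {
    /// A new prediction is the trunk of a tree; re-predicting reuses it.
    fn predict(&mut self, production: usize, start: usize) -> NodeId {
        if let Some(&id) = self.roots.get(&(production, start)) {
            return id;
        }
        let id = self.nodes.len();
        self.nodes.push(Node { production, parent: None, matched: None, depth: 0 });
        self.roots.insert((production, start), id);
        id
    }

    /// Matching a term adds a branch. Returns the node and whether it is new;
    /// an existing (parent, term) pair means a duplicate traversal.
    fn advance(&mut self, parent: NodeId, term: TermMatch) -> (NodeId, bool) {
        if let Some(&id) = self.children.get(&(parent, term)) {
            return (id, false);
        }
        let id = self.nodes.len();
        let (production, depth) = (self.nodes[parent].production, self.nodes[parent].depth);
        self.nodes.push(Node { production, parent: Some(parent), matched: Some(term), depth: depth + 1 });
        self.children.insert((parent, term), id);
        (id, true)
    }

    /// Walks back to the trunk; shared prefixes are stored only once.
    fn matched_terms(&self, mut id: NodeId) -> Vec<TermMatch> {
        let mut terms = Vec::new();
        while let (Some(term), Some(parent)) = (self.nodes[id].matched, self.nodes[id].parent) {
            terms.push(term);
            id = parent;
        }
        terms.reverse();
        terms
    }
}

fn main() {
    let mut tree = TraversalTree::default();
    // Hypothetical productions: 0 is <outer> ::= <base> "x"; 1 and 2 both derive <base> from "b".
    let outer = tree.predict(0, 0);
    let base_1 = tree.predict(1, 0);
    let (base_1_done, _) = tree.advance(base_1, TermMatch::Terminal("b"));
    let base_2 = tree.predict(2, 0);
    let (base_2_done, _) = tree.advance(base_2, TermMatch::Terminal("b"));
    // Two traversals complete <base>, so <outer> gets two distinct branches...
    let (via_1, _) = tree.advance(outer, TermMatch::Completed(base_1_done));
    let (via_2, _) = tree.advance(outer, TermMatch::Completed(base_2_done));
    // ...even though the old (production, terms matched) key is the same for both.
    let old_key = |id: NodeId| (tree.nodes[id].production, tree.nodes[id].depth);
    println!("distinct: {}, same old key: {}", via_1 != via_2, old_key(via_1) == old_key(via_2));
    println!("{:?}", tree.matched_terms(via_2));
}
```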
Extra
Basically every time I have worked on an Earley bug, I have ended up adding the same manual logging to help with debugging. I decided to commit that logging this time!
There is a new tracing::event! which is a no-op by default, but adds logging events when the "tracing" feature is enabled. For Earley, traversal state events are logged when created and during predict/scan/complete. It helps quite a bit with debugging!
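A minimal sketch of a feature-gated logging macro in that spirit, assuming a Cargo feature named `tracing`. The macro name `trace_event!` and the `scan` function are hypothetical; the PR's actual macro is `tracing::event!`, whose definition is not shown here. Without the feature, the macro expands to nothing, so default builds carry no logging cost.

```rust
// Hypothetical stand-in for the PR's `tracing::event!`; assumes a Cargo feature named "tracing".
#[cfg(feature = "tracing")]
macro_rules! trace_event {
    // Feature on: forward the format arguments to stderr.
    ($($arg:tt)*) => { eprintln!($($arg)*) };
}

#[cfg(not(feature = "tracing"))]
macro_rules! trace_event {
    // Feature off: expand to nothing, a no-op in default builds.
    ($($arg:tt)*) => {};
}

// Hypothetical scan step that logs a traversal-state event before advancing.
fn scan(pos: usize, terminal: &str) -> usize {
    trace_event!("scan {:?} at {}", terminal, pos);
    pos + terminal.len()
}

fn main() {
    let end = scan(0, "a");
    println!("scanned to {}", end);
}
```

Defining two mutually exclusive `macro_rules!` under `cfg` keeps call sites identical in both builds, so logging can stay committed without affecting normal parsing.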