Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mitigate errors in reporting grammars that can cause the parser to run indefinetely #848

Merged
merged 5 commits into from
Apr 28, 2023

Conversation

Tartasprint
Copy link
Contributor

@Tartasprint Tartasprint commented Apr 26, 2023

Fixes #830 #571

The bugs up there are now fixed. But new were found.

The first change I made was fixing a typo at this line:

if ident == "soi" || ident == "eoi" { => if ident == "SOI" || ident == "EOI" {

Since the keywords are the uppercase ones, and that I don't think keywords are case insensitive, this was only a typo. But I'm not sure about that.

Since pest_meta::validator::is_non_failing and pest_meta::validator::is_non_progressing are very similar I fixed/analyzed both of them.
All the edge cases (I hope) in those two functions have been either treated or commented. Here is a summary of the cases that were only commented (marked either by BUG if there is an example of a bug or by WARNING otherwise) and should be addressed:

  • 4 bugs due to POP_ALL, PEEK_ALL, PEEK_SLICE being both non failing and non consuming when the stack is empty (or too small for PEEK_SLICE)
  • 1 bug in is_non_failing due to the negative lookahead: in case the inner expression always fail, the negative lookahead never fails.
  • 1 bug in is_non_progressing: the rule r = @{PUSH("") ~ POP} is both non-progressing and non-failing but is not detected, and causes the parser to run indefinitely on the rule fe = @{(PUSH("")~POP)*} or the second choice of unreachable = @{(PUSH("")~POP)|"other"}. This bug is not commented in the code.
  • 1 warning in is_non_progressing: the lookahead expressions don't consume the input, but they might consume the stack if instructions such as POP or DROP are present inside. This might be seen as progression: the stack length decreases strictly, converging towards the end of parsing. The problem is that the parser does not follow this behavior.
  • 2 warnings relative to recursion. Those are harder to prove correct or incorrect and need more checking.
  • 1 warning about handling a rule that is undefined in is_non_failing. The rules have been checked to be defined earlier (with validate_undefined), but since this souldn't happen we might want to panic or to return an error.
  • 1 warning about the current use of is_non_progressing and is_non_failing in the other validation steps, since all the assumptions the two functions make are not fullfiled

I commented and justified most of the choices because those two functions can be quite tricky (to me at least) and to allow verification of those choices by others.

There might be considerable performance enhancements by running validators (or at least these two validation steps) on all the rules, while updating meta data (e.g. flags for is_non_failing and is_non_progressing). This way we avoid checking multiple times the same expressions. I can propose/give more detail on those changes later.

It seems strange to propose changes knowing that there are bugs, but since many other are being fixed, and that those bugs were already there (unnoticed), I hope they will be accepted.

@Tartasprint Tartasprint requested a review from a team as a code owner April 26, 2023 23:13
@Tartasprint Tartasprint requested review from tomtau and removed request for a team April 26, 2023 23:13
Copy link
Contributor

@tomtau tomtau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks and congrats on your first PR here! 🎆 Can you add new tests for the bugs that were fixed in this PR (if they aren't covered by existing tests)?

@tomtau tomtau requested review from dragostis and CAD97 April 26, 2023 23:58
@Tartasprint
Copy link
Contributor Author

The is_non_failing and is_non_progressing functions have now most of the cases tested.
I did not write tests for the bugs/warnings listed up there since I don't have a way for fixing them right now.

@Tartasprint Tartasprint requested a review from tomtau April 27, 2023 20:21
@Tartasprint
Copy link
Contributor Author

I found a new bug, I added it in the list up there

1 bug in is_non_progressing: the rule r = @{PUSH("") ~ POP} is both non-progressing and non-failing but is not detected, and causes the parser to run indefinitely on the rule fe = @{(PUSH("")~POP)*} or the second choice of unreachable = @{(PUSH("")~POP)|"other"}. This bug is not commented in the code.

Copy link
Contributor

@tomtau tomtau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking good -- just a few small fmt/linter fixes: https://github.com/pest-parser/pest/actions/runs/4824023698/jobs/8596311317?pr=848#step:4:10
can you run cargo fmt --all and cargo clippy --all --fix (that should automatically fix it, otherwise you can see the report and apply the suggestions)?

for the msrv error, it seems unrelated and I opened this PR: #850

@tomtau tomtau merged commit 7f8411a into pest-parser:master Apr 28, 2023
9 checks passed
@Tartasprint Tartasprint deleted the fix-non-progressing branch April 28, 2023 13:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

validator not reporting an error for indefinite parsing
2 participants