-
Notifications
You must be signed in to change notification settings - Fork 420
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bullet-proof report left recursion #307
Conversation
…h situations as: - r = 'a'? r - r = 'a'* r - r = !'a' r (lookahead not advance parse position) - r = &'a' r (lookahead not advance parse position) - r = !{} r (lookahead not advance parse position) - r = &{} r (lookahead not advance parse position) - r = '' r (empty literal not advance parse position) - r = [] r (empty character class not advance parse position) Fixes pegjs#190 and pegjs#26.
…t exist. As pass can be combined with other passes by the end user, we shan't assume that AST will be correct.
@Mingun Thanks for your pull request. I looked at your code but ultimately I decided to implement the fix for #190 little bit differently myself. Few notes:
If you avoid issues like this it’s much more likely I’ll review your PRs faster and actually accept them. |
I am inclined to consider both tasks as a bug, and same. I definitely don't remember, but in my opinion,
How you found left recursion if you don't check, consume the parser an input or not? Therefore, I don't consider that it was two tasks. Your implementation seems to me unreasonably non-optimal, though.
I tried to adhere to your style. Though as in a code there are no comments, it is difficult to be guided by something here. In tests I didn't notice any divergences in style, you can specify more precisely, where that I could consider it further?
I did different сommits for fix and for tests consciously. It seems to me that it is so easier to review them. As you didn't mention anywhere that you don't like such approach, I did as it seemed to me more conveniently.
You aren't right. You can create such parser. Besides, API allows to create the AST parser manually therefore you shouldn't rely on AST from the PEG parser. By the way, same concerns an order of run of passes. Now many passes rely that will be executed in a certain order. However API allows you to change it. I consider it a problem. Example of grammar, that freeze parser and create left recursion, that will:
In your implementation you won't notice it because it is hided by other bug in your implementation.
|
Having a little more thought, I understood that -- it isn't pleasant to me in your implementation. The matter is that you used the wrong predicate for determination of, whether there can be the left recursion or not. The problem is aggravated with that in implementation of this predicate have bugs, as a result it doesn't even work as it is possible to assume from its name (because there is no the comment describing how it shall work. In this case names of function insufficiently, please, comment on a code! From you won't decrease). Your assumption is that if expression matches an empty input, the left recursion is possible. It is wrong. This sufficient condition but not necessary. The necessary condition that expression doesn't consume any input. It not the same that The bug is in implementation of
|
So far, left recursion detector assumed that left recursion occurs only when the recursive rule is at the very left-hand side of rule's expression: start = start This didn't catch cases like this: start = "a"? start In general, if a rule reference can be reached without consuming any input, it can lead to left recursion. This commit fixes the detector to consider that. Fixes #190.
One was a bug because existing functionality was misbehaving (not doing what it was supposed to do, that is prevent all left recursion). The other was a feature because PEG.js wasn’t originally designed to prevent infinite loops via
May be, GitHub doesn’t tell.
I optimize for primarily for readability and simplicity.
You loop through all the expressions several times, which is not necessary for testing some “facets”, e.g. indirect recursion. But this is a minor point and there were few examples you could have followed. I was probably too strict on you on this point, sorry.
I never mentioned this anywhere because I consider atomic commits and clean history an elementary software engineering practice. Although I guess I should spell these things out somewhere because the amount of people who don’t follow things I consider elementary is way too big.
You can, but the point is that the
No. If you use the API to supply the compiler with an AST that doesn’t correspond to any valid PEG.js grammar, and it explodes, it’s your problem. The compiler is (and always will be) designed to accept only certain inputs.
I don’t. Granular passes allow separation of concerns, which is very important. There will likely be more passes in the future, not less. Plugin writers must just live with that and avoid messing up with pass ordering. As for the implementation bugs you found, I’m still investigating. |
These are both correctly detected.
Right, thanks for spotting that. Filed #359, will fix soon.
I think the whole confusion here originated in that I used the word “matches” in the sense “consumes”, which was misleading. I’ll rename the
Not sure if that’s what you had in mind but your examples made me file #360. |
This makes it more clear that the function isn't about the input the expression *matched* but about the input it *consumed* when it matched. Based on a comment by @Mingun: #307 (comment)
How it works: main idea in what for
sequence
it is insufficiently simple to check the first element of sequence because exist cases described in #190, that also have left recursion. Grammarsit is possible to present in the equivalent form
As you see, that is classical left recursion. In general, any production, which doesn't consume an input, can lead to the left recursion.
Therefore, if we walk through AST of the rule and we will consume the smallest theoretically possible input, we will be able to define, whether it can lead to the left recursion. We need check all
sequence
elements while one not consume some input. After that any left recursion is inpossible. So, main advantages in checking not only first element ofsequence
node, but all, while some input won't be consumed.