Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bullet-proof report left recursion #307

Closed

Conversation

Mingun
Copy link
Contributor

@Mingun Mingun commented Dec 12, 2014

How it works: main idea in what for sequence it is insufficiently simple to check the first element of sequence because exist cases described in #190, that also have left recursion. Grammars

S = a? S
S = a* S

it is possible to present in the equivalent form

S = S / a S
S = S / a + S

As you see, that is classical left recursion. In general, any production, which doesn't consume an input, can lead to the left recursion.

Therefore, if we walk through AST of the rule and we will consume the smallest theoretically possible input, we will be able to define, whether it can lead to the left recursion. We need check all sequence elements while one not consume some input. After that any left recursion is inpossible. So, main advantages in checking not only first element of sequence node, but all, while some input won't be consumed.

…h situations as:

- r = 'a'? r
- r = 'a'* r
- r = !'a' r (lookahead not advance parse position)
- r = &'a' r (lookahead not advance parse position)
- r = !{} r (lookahead not advance parse position)
- r = &{} r (lookahead not advance parse position)
- r = '' r (empty literal not advance parse position)
- r = [] r (empty character class not advance parse position)

Fixes pegjs#190 and pegjs#26.
…t exist.

As pass can be combined with other passes by the end user, we shan't assume that AST will be correct.
@dmajda
Copy link
Contributor

dmajda commented Apr 1, 2015

@Mingun Thanks for your pull request. I looked at your code but ultimately I decided to implement the fix for #190 little bit differently myself.

Few notes:

  • Implement one task at a time. Issues Online parser infinite loops #190 are Infinite loop #26 related, but they have a distinct status (bug vs. feature) and thus a different priority. This suggests they should be fixed separately. Changing the error message is also a separate thing.
  • Make implemented algorithms descriptive and readable. For example, your code uses one visitor for two tasks (reporting left recursion and detecting whether an expression may not consume any input), conflating them unnecessarily.
  • Follow existing code style. For example, you used different commenting density & style, used too few empty lines (e.g. no empty lines between functions or after var statements), and style of your tests also differs from current code.
  • Use one commit per logical change. The Online parser infinite loops #190 fix should have been just one commit fixing both code and tests. Also, no “fixup” commits please.
  • Avoid incorrectness. Your code assumes an empty character class ([]) doesn’t consume any input, which is not true — it just never matches.

If you avoid issues like this it’s much more likely I’ll review your PRs faster and actually accept them.

@dmajda dmajda closed this Apr 1, 2015
@Mingun
Copy link
Contributor Author

Mingun commented Apr 1, 2015

Implement one task at a time. Issues #190 are #26 related, but they have a distinct status (bug vs. feature) and thus a different priority. This suggests they should be fixed separately. Changing the error message is also a separate thing.

I am inclined to consider both tasks as a bug, and same. I definitely don't remember, but in my opinion, Feature label at #26 appeared later, than this PR was created.

Make implemented algorithms descriptive and readable. For example, your code uses one visitor for two tasks (reporting left recursion and detecting whether an expression may not consume any input), conflating them unnecessarily.

How you found left recursion if you don't check, consume the parser an input or not? Therefore, I don't consider that it was two tasks. Your implementation seems to me unreasonably non-optimal, though.

Follow existing code style. For example, you used different commenting density & style, used too few empty lines (e.g. no empty lines between functions or after var statements), and style of your tests also differs from current code.

I tried to adhere to your style. Though as in a code there are no comments, it is difficult to be guided by something here. In tests I didn't notice any divergences in style, you can specify more precisely, where that I could consider it further?

Use one commit per logical change. The #190 fix should have been just one commit fixing both code and tests. Also, no “fixup” commits please.

I did different сommits for fix and for tests consciously. It seems to me that it is so easier to review them. As you didn't mention anywhere that you don't like such approach, I did as it seemed to me more conveniently.

Avoid incorrectness. Your code assumes an empty character class ([]) doesn’t consume any input, which is not true — it just never matches.

You aren't right. You can create such parser. Besides, API allows to create the AST parser manually therefore you shouldn't rely on AST from the PEG parser. By the way, same concerns an order of run of passes. Now many passes rely that will be executed in a certain order. However API allows you to change it. I consider it a problem.

Example of grammar, that freeze parser and create left recursion, that will:

freeze = (![])*;
leftRecursion = !(![] leftRecursion);

In your implementation you won't notice it because it is hided by other bug in your implementation.
You don't come into expressions of the optional, one_or_more, zero_or_more, simple_and and simple_not nodes. On this example your pass won't work:

// Must be detected, as recursive. You implementation fail. My pass
start1 = ('' (start1/start1))?;
// Must be detected, as non-recursive. You implementation pass. My pass
start2 = ('a' (start2/start2))?;

@Mingun
Copy link
Contributor Author

Mingun commented Apr 4, 2015

Having a little more thought, I understood that -- it isn't pleasant to me in your implementation. The matter is that you used the wrong predicate for determination of, whether there can be the left recursion or not. The problem is aggravated with that in implementation of this predicate have bugs, as a result it doesn't even work as it is possible to assume from its name (because there is no the comment describing how it shall work. In this case names of function insufficiently, please, comment on a code! From you won't decrease).

Your assumption is that if expression matches an empty input, the left recursion is possible. It is wrong. This sufficient condition but not necessary. The necessary condition that expression doesn't consume any input. It not the same that !matchesEmpty(). As you noted, expression [] doesn't match empty input, however it doesn't consume an input and therefore, leads to the left recursion in what you can will be convinced (see my example above).

The bug is in implementation of matchesEmpty that the following grammars will be erraticly considered, as matches to an empty input though it not so (you can it will be convinced). I assume that the matchesEmpty function shall do that tells it name: for any expression to determine, it can be matched with an empty input or not:

start = !''// not matches anything, but matchesEmpty() == true
start = ![]// not matches anything, but matchesEmpty() == true
start = !. // not matches anything, but matchesEmpty() == true
start = &. // never matches empty, but matchesEmpty() == true

Mingun referenced this pull request Apr 4, 2015
So far, left recursion detector assumed that left recursion occurs only
when the recursive rule is at the very left-hand side of rule's
expression:

  start = start

This didn't catch cases like this:

  start = "a"? start

In general, if a rule reference can be reached without consuming any
input, it can lead to left recursion. This commit fixes the detector to
consider that.

Fixes #190.
@dmajda
Copy link
Contributor

dmajda commented Jul 17, 2015

Implement one task at a time. Issues #190 are #26 related, but they have a distinct status (bug vs. feature) and thus a different priority. This suggests they should be fixed separately. Changing the error message is also a separate thing.

I am inclined to consider both tasks as a bug, and same.

One was a bug because existing functionality was misbehaving (not doing what it was supposed to do, that is prevent all left recursion). The other was a feature because PEG.js wasn’t originally designed to prevent infinite loops via + and *. But I understand my view can be contested e.g. on the basis that both left recursion and infinite loop detection are just different factes of the same thing (preventing parsers to hang).

I definitely don't remember, but in my opinion, Feature label at #26 appeared later, than this PR was created.

May be, GitHub doesn’t tell.

Your implementation seems to me unreasonably non-optimal, though.

I optimize for primarily for readability and simplicity.

Follow existing code style. For example, you used different commenting density & style, used too few empty lines (e.g. no empty lines between functions or after var statements), and style of your tests also differs from current code.

I tried to adhere to your style. Though as in a code there are no comments, it is difficult to be guided by something here. In tests I didn't notice any divergences in style, you can specify more precisely, where that I could consider it further?

You loop through all the expressions several times, which is not necessary for testing some “facets”, e.g. indirect recursion. But this is a minor point and there were few examples you could have followed. I was probably too strict on you on this point, sorry.

Use one commit per logical change. The #190 fix should have been just one commit fixing both code and tests. Also, no “fixup” commits please.

I did different сommits for fix and for tests consciously. It seems to me that it is so easier to review them. As you didn't mention anywhere that you don't like such approach, I did as it seemed to me more conveniently.

I never mentioned this anywhere because I consider atomic commits and clean history an elementary software engineering practice. Although I guess I should spell these things out somewhere because the amount of people who don’t follow things I consider elementary is way too big.

Avoid incorrectness. Your code assumes an empty character class ([]) doesn’t consume any input, which is not true — it just never matches.

You aren't right. You can create such parser.

You can, but the point is that the [] expression will always fail. It matches exactly one character which has to belong into an empty set — an impossible condition.

Besides, API allows to create the AST parser manually therefore you shouldn't rely on AST from the PEG parser.

No. If you use the API to supply the compiler with an AST that doesn’t correspond to any valid PEG.js grammar, and it explodes, it’s your problem. The compiler is (and always will be) designed to accept only certain inputs.

By the way, same concerns an order of run of passes. Now many passes rely that will be executed in a certain order. However API allows you to change it. I consider it a problem.

I don’t. Granular passes allow separation of concerns, which is very important. There will likely be more passes in the future, not less. Plugin writers must just live with that and avoid messing up with pass ordering.


As for the implementation bugs you found, I’m still investigating.

@dmajda
Copy link
Contributor

dmajda commented Jul 24, 2015

Example of grammar, that freeze parser and create left recursion, that will:

freeze = (![])*;
leftRecursion = !(![] leftRecursion);

These are both correctly detected.

In your implementation you won't notice it because it is hided by other bug in your implementation.
You don't come into expressions of the optional, one_or_more, zero_or_more, simple_and and simple_not nodes. On this example your pass won't work:

// Must be detected, as recursive. You implementation fail. My pass
start1 = ('' (start1/start1))?;
// Must be detected, as non-recursive. You implementation pass. My pass
start2 = ('a' (start2/start2))?;

Right, thanks for spotting that. Filed #359, will fix soon.

Your assumption is that if expression matches an empty input, the left recursion is possible. It is wrong. This sufficient condition but not necessary. The necessary condition that expression doesn't consume any input. It not the same that !matchesEmpty(). As you noted, expression [] doesn't match empty input, however it doesn't consume an input and therefore, leads to the left recursion in what you can will be convinced (see my example above).

I think the whole confusion here originated in that I used the word “matches” in the sense “consumes”, which was misleading. I’ll rename the matchesEmpty function.

The bug is in implementation of matchesEmpty that the following grammars will be erraticly considered, as matches to an empty input though it not so (you can it will be convinced). I assume that the matchesEmpty function shall do that tells it name: for any expression to determine, it can be matched with an empty input or not:

start = !''// not matches anything, but matchesEmpty() == true
start = ![]// not matches anything, but matchesEmpty() == true
start = !. // not matches anything, but matchesEmpty() == true
start = &. // never matches empty, but matchesEmpty() == true

Not sure if that’s what you had in mind but your examples made me file #360.

dmajda added a commit that referenced this pull request Jul 31, 2015
This makes it more clear that the function isn't about the input the
expression *matched* but about the input it *consumed* when it matched.

Based on a comment by @Mingun:

  #307 (comment)
@dmajda
Copy link
Contributor

dmajda commented Jul 31, 2015

dmajda added a commit that referenced this pull request Sep 11, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants