Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Left recursion fix #75

Merged
merged 31 commits into from
Apr 21, 2019
Merged

Left recursion fix #75

merged 31 commits into from
Apr 21, 2019

Conversation

Victorious3
Copy link
Collaborator

@Victorious3 Victorious3 commented Aug 14, 2018

Fixes #57, fixes #69 see discussion on there for context.
Now fixes #27 as well

This resolves left recursion when the grammar is created instead of figuring it out on the fly. For that it runs some code in leftrec.py to detect left recursive cycles and marks one rule as left recursive to break the cycle. For the code generation this information is saved with a new annotation @leftrec that can be used together with @tatsumasu.
Furthermore, the annotation @nomemo is used to block memoization on rules that are part of a left recursive cycle.

TODO:

  • This doesn't account for memoization yet, mainly because I don't have a failing test for that.
  • Interlocking cycles? See https://github.com/PhilippeSigaud/Pegged/wiki/Left-Recursion - Not subject of this PR
  • I don't like how it deals with RuleRef at all, ideally those should be resolved before the analysis is run, but that would cause a cascade of changes.
  • Could write some sort of walker class for pretties
  • Figure out how SkipTo (->) and Cut (~) interacts with left recursion
  • Now that Parser drops part of input #27 is fixed I should try if the more simple left recursion detection was actually sufficient, would make everything simpler - Not in the way it was before

This still doesn't cover all cases, but we are getting closer. The PR adds new failing test cases. (skipped for now)

It doesn't seem to be used anywhere,
and it's broken (walk_node vs walk_Node).
PreOrderWalker does the same thing
Didn't test it properly but it seems to work.
I'm not exactly happy with some of the code (mainly how it deals with RuleRef).
The real solution would be to resolve RuleRef beforehand  and replace them with the correct Rule instance, but that would mess with code generation
@apalala
Copy link
Collaborator

apalala commented Aug 14, 2018

Wow! I'll take a look at this ASAP!

@Victorious3
Copy link
Collaborator Author

Travis is still complaining but I don't know how to get it to shut up about those warnings, so I'll leave it like this for now

@apalala
Copy link
Collaborator

apalala commented Aug 14, 2018

Hi @Victorious3 This branch has the mods required to let type checking and unit tests pass:

https://github.com/neogeny/TatSu/tree/Victorious3-left_recursion_fix

The two skipped left-recursion tests are for valid grammars, and they point to defects in either the implementation of left-recursion, or the algorithm.

There are other left recursive grammars in the TatSu issues that also fail, and should probably be part of the unit tests.

I'll take another look at all this in the next few days.

The last implementation was advancing the input way too much, this resets it after every rule invokation
I can get the tests to pass by stripping all whitespace, something's off with that.
One of the grammars was actually broken (not PEG)
@@ -310,8 +307,8 @@ def test_left_recursion_bug(self, trace=False):
start = expression $ ;

expression =
| paren_expression
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't consider minus_expression a valid alternative after it successfully parsed a paren_expression,
therefore (3 - 2) - 1 was correctly failing.

@Victorious3
Copy link
Collaborator Author

Victorious3 commented Aug 18, 2018

Those tests literally took forever without memoization, I better get to fixing that ^^

This still discards many useful memos but for now this is the best I can do
…owever, the examples are non trivial and hard to understand so I don't think these cases are common occurance.
@Victorious3
Copy link
Collaborator Author

@apalala This is pretty much done apart from cosmetic changes, and hopefully good enough to let me continue my project.

@Victorious3
Copy link
Collaborator Author

I couldn't come up with a test case for -> that is also left recursive at the same time, it sorta goes against the point of it. However, since -> is supposed to advance the input in all cases, and this is disabled inside left recursive propagation, there might be some weird edge cases. ~ seems to do alright and is covered by an already existing test case.

While doing this I also discovered that @@nameguard :: False isn't respected, but I'll make a separate issue about that.

leftrec.py could use some work to make it more pretty but not doing it in this PR, I'll leave the TODOs open for that.

So yea, that's it from my side unless you have anything I should change to get it merged.

I don't think @tatsumasu can accept any other parameters. It's still a bit ugly but better than before. @leftrec now needs to be applied before @tatsumasu
Turns memoization back on for rules that aren't part of a left recursive cycle.
This should improve performance significantly, previously all memoization was turned off when  any left recursion took place.
@Victorious3
Copy link
Collaborator Author

The bootstrap test doesn't include anything related to left recursion, it would be a good idea to make sure that the tests are run twice, once directly and once using the generated parser

@Victorious3
Copy link
Collaborator Author

It's still pretty conservative at doing memos during left recursion but it should be solid at least. I ran my project using it and went down from 37s on parsing 1000 lines of 1 + 1 to about 20s, no failures. The expression grammar works entirely on using left recursive rules with multiple precedence levels.

@Victorious3
Copy link
Collaborator Author

I didn't have any problems with this over the course of several months, from my perspective its ready to merge?

@apalala
Copy link
Collaborator

apalala commented Apr 2, 2019

I'm sorry I haven't had the time to review this in detail, @Victorious3 . You're in charge now. If you think it's ready, then it is. I'd just check that all the non-leftrec unit tests pass. I see some merge conflicts, but I'm sure you're aware of them. Do the docs need any updates?

@Victorious3
Copy link
Collaborator Author

Victorious3 commented Apr 3, 2019

There's one issue with how the unittests run currently. There are significant differences between how the parser behaves when running from the generated code and when running in immediate. Tatsu's self test doesn't cover all the features that have been added, so for completeness all the other unit tests should run twice.

The docs don't need changes, it's running "as advertised" now ^^

@apalala apalala merged commit 617a979 into neogeny:master Apr 21, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Indirect Left Recursion calc example: parse_factored() fails against "1*2+3-4" Parser drops part of input
2 participants