Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the backtrack-aware restoring of the state #52

Merged
merged 11 commits into from
Nov 7, 2017
Merged

Fix the backtrack-aware restoring of the state #52

merged 11 commits into from
Nov 7, 2017

Conversation

mna
Copy link
Owner

@mna mna commented Oct 30, 2017

Closes #51 .

Caveats:

  • I'm pretty sure this feature does not work with memoization, though I haven't tested it. In any case, it could only work if the state code blocks were all deterministic (given that to make it work, the state should be memoized too).

  • not quite sure about the throw/recover expressions, if any clone/restore of the state should be done in this case. I don't think so, but I'm not familiar with this recent pigeon feature.

Also note that I removed the emptyState optimization, as I believe it could lead to issues (it was returned in cloneState if the state is empty, and then set as current.state in restoreState so that subsequent state code blocks could modify it so that it isn't empty anymore).

Copy link
Collaborator

@breml breml left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mna Thanks for the PR. Find my (length) comments below.

I looked into this PR and I have the following comments:

I played around with your PR a little bit and I made the following observations:

  1. With your changes in the sequence expression in place, the clone/restore operations in parseChoiceExpr are no longer needed, because the restore already happened in the sequence expression, if a restore is needed (all tests happily pass).
    My assumption is, that the real problem with the backtracking in the current code is not in the sequence expressions but in the suffixed operations (?, *), which behave similar to "choice expressions" in that the included expressions could fail but the parser still advances (because ? and * allow zero matches as a successful result). The sequence expression just happens to be the right place to perform this restore operation.
  2. If I remove the clone/restore operations in parseAndExpr and parseNotExpr, the tests are still passing. Therefore either we have no test cases yet, which cover the problems, which could arise, if there is no restore performed after the parseAndExpr or those restores are not necessary.
    So I would like to have a test case, which demonstrates (and also covers) a case, where the usage of & or ! results in a problem where the restore of the state is actually needed.

Last but not least I am not yet convinced, that the p.emptyState optimization is problematic. As you write, the p.emptyState is used to optimize the "expensive" clone operation as long as p.cur.state has a length of 0. As long as the global state feature is not used at all, this optimization works perfectly.
I assume, that the problem you are seeing goes along the following lines:

  • the global state is set to the p.emptyState
  • the global state (and therefore the p.emptyState, as both are now the same) is populated with values
  • the parser performs a backtracking back to the point, where no elements are stored in the global state (and therefore in the p.emptyState).
  • in clone the p.emptyState is returned, because p.cur.state has again a length of 0, which in fact returns p.cur.state again, because still p.cur.state and p.emptyState are the same.

This means, as long as p.cur.state has a length of 0, an empty map is returned, regardless of p.cur.state and p.emptyState are pointing to the same or to different maps.

So for now I think we should keep this optimization in place as it is helpful for all the parsers where the global state is not used at all.

p.pushV()
p.maxFailInvertExpected = !p.maxFailInvertExpected
_, ok := p.parseExpr(not.expr)
p.maxFailInvertExpected = !p.maxFailInvertExpected
p.popV()
p.restore(pt)
p.restoreState(state)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefere, if this line would be inserted before p.restore(pt). This way, the order of the instructions would be just inverted to the instructions before p.parseExpr(not.expr).

@breml
Copy link
Collaborator

breml commented Oct 30, 2017

Regarding the memoization I agree with you, the global state feature does not really work with memoization and is not tested for this as well. Because there is no guarantee for the state change expressions to be deterministic in regards of the respective position of the parser within the parsed data, I did not invest more time into this.

Maybe it would be a good idea to have some kind of comment/documentation about this exception/"edge" case.

@mna
Copy link
Owner Author

mna commented Oct 30, 2017

Hello Lucas,

Thanks for your comments. Some thoughts:

  1. You're right, I forgot that choice expressions are sequence expressions (scratch that, it's actually the other way around, not sure if the restore in choice expr can be safely removed, I'll try to find a case where it can fail and add that as a test), so that would be the right place to perform the restore. However, when optimizations are performed, are choice expressions still guaranteed to be "wrapped" in a sequence expression?

  2. Yeah those restore are definitely needed, because predicates always backtrack (consider !EOL with an input that has an EOL, the rule matches but the EOL is not consumed, the state has to be restored). I'll add test cases for those.

  3. Regarding emptyState, I believe there is a way to cause issues with it, but yeah I haven't written a test case or anything, I'll drop it from this PR as it is kind of a separate thing. It would be something along those lines:

  • an expression causes the parser to clone with no state (p.cur.state is empty) so it returns emptyState, and the on restore it sets p.cur.state to the same as emptyState (an alias)
  • state code block adds something (state["thing"] = 1), emptyState no longer empty (because p.cur.state is an alias)
  • an expression causes the parser to clone state["thing"] == 1, then backtracks and restores the clone (p.cur.state is now not the same as emptyState - not an alias - but both contain state["thing" == 1)
  • a subsequent state code block removes the thing (delete(state, "thing")), p.cur.state is now empty but not emptyState
  • a subsequent expression causes the parser to clone the state, which is empty, so it returns emptyState, but emptyState is not empty, problems...

A better optimization IMO would be to drop the calls to clone/restore state if there is no state code block, but that requires a more dynamic code generation and hence is a bit more complex.

I probably won't be able to work on the PR until friday, but until then let me know if you have other comments on that.

Thanks,
Martin

@breml
Copy link
Collaborator

breml commented Nov 2, 2017

@mna Thanks for your explanations. I look forward to your additional test cases. Also I prefer to have your changes (together with the test cases) to the emptyState optimization in an additional PR. I will give this PR an other detailed look after you have updated it.

In cloneState, the parser returns the emptyState map if the state is empty.
However, on restore it is set as the p.cur.state, so subsequent state code
blocks can add items to the emptyState, making it non-empty. To prevent
any issues because of this, I removed this small optimization.
@mna
Copy link
Owner Author

mna commented Nov 3, 2017

@breml I updated the PR which is rebased on your latest master changes. I added tests for the And and Not predicates that fail if the restoreState calls are removed.

I tried to find an input and a rule that could make the ChoiceExpr restoreState calls required, but couldn't find one. I did leave the restore calls in there, though. Personally, I'd leave them there as correctness is more important than speed, and I'm not sure enough that this cannot be required to remove it (especially with the possible optimizations). But let me know if you're confident enough to remove this and I'll update the PR.

Also, there's still the question of the Throw/Recover expressions, does that play a part in restoring the state? As mentioned above, I'm not familiar enough with those to judge.

Thanks,
Martin

@mna mna mentioned this pull request Nov 3, 2017
@breml
Copy link
Collaborator

breml commented Nov 3, 2017

@mna Thanks for the updated PR. I will have a look at this in the next days.

@breml
Copy link
Collaborator

breml commented Nov 5, 2017

@mna I had a look at this PR and I checked your test cases. I removed the state := p.cloneState() and p.restoreState(state) from the function parseNotExpr and the test cases in staterestore and staterestoreopt did not fail (the test for the parseAndExpr worked as expected).

Additionally the grammar and the test cases in statestore and statestoreopt are equivalent. Therefore I suggest to organize the files the same way as in the json example. Do you mind the change this?

For now I am fine with your decision to keep the clone/restore in the choice expression.

My understanding of the structure of the parser is, that now all the relevant clone/restored handling is done in the sequence expressions and therefore they do not escape to the choice expression nor do they escape to the recover expressions. Beside of that, it is my opinion, that it is the responsibility of the author/grammar to set the state back to a valid value in the case of a recovery. Maybe this should be mentioned in the documentation as well.

@mna
Copy link
Owner Author

mna commented Nov 5, 2017

I removed the state := p.cloneState() and p.restoreState(state) from the function parseNotExpr and the test cases in staterestore and staterestoreopt did not fail (the test for the parseAndExpr worked as expected).

Oh you're right, this restore is required only if the Choice restore is removed (if you remove both, the test fails, and if you put back either, the test passes). But as mentioned in the PR, I'd keep the choice restore to make sure there's not a case we're not thinking about.

Additionally the grammar and the test cases in statestore and statestoreopt are equivalent. Therefore I suggest to organize the files the same way as in the json example. Do you mind the change this?

Sure, I'll update this.

@mna
Copy link
Owner Author

mna commented Nov 5, 2017

@breml updated to reuse the same grammar for staterestore standard and optimized tests.

@breml
Copy link
Collaborator

breml commented Nov 6, 2017

@mna Thanks for the update. Please squash this PR and then we are ready to merge it.

@mna mna merged commit 1a765be into master Nov 7, 2017
@mna mna deleted the wip-fix-state branch November 7, 2017 23:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants