Fix the backtrack-aware restoring of the state #52

mna · 2017-10-30T15:22:07Z

Closes #51 .

Caveats:

I'm pretty sure this feature does not work with memoization, though I haven't tested it. In any case, it could only work if the state code blocks were all deterministic (given that to make it work, the state should be memoized too).
not quite sure about the throw/recover expressions, if any clone/restore of the state should be done in this case. I don't think so, but I'm not familiar with this recent pigeon feature.

Also note that I removed the emptyState optimization, as I believe it could lead to issues (it was returned in cloneState if the state is empty, and then set as current.state in restoreState so that subsequent state code blocks could modify it so that it isn't empty anymore).

breml

@mna Thanks for the PR. Find my (length) comments below.

I looked into this PR and I have the following comments:

I played around with your PR a little bit and I made the following observations:

With your changes in the sequence expression in place, the clone/restore operations in parseChoiceExpr are no longer needed, because the restore already happened in the sequence expression, if a restore is needed (all tests happily pass).
My assumption is, that the real problem with the backtracking in the current code is not in the sequence expressions but in the suffixed operations (?, *), which behave similar to "choice expressions" in that the included expressions could fail but the parser still advances (because ? and * allow zero matches as a successful result). The sequence expression just happens to be the right place to perform this restore operation.
If I remove the clone/restore operations in parseAndExpr and parseNotExpr, the tests are still passing. Therefore either we have no test cases yet, which cover the problems, which could arise, if there is no restore performed after the parseAndExpr or those restores are not necessary.
So I would like to have a test case, which demonstrates (and also covers) a case, where the usage of & or ! results in a problem where the restore of the state is actually needed.

Last but not least I am not yet convinced, that the p.emptyState optimization is problematic. As you write, the p.emptyState is used to optimize the "expensive" clone operation as long as p.cur.state has a length of 0. As long as the global state feature is not used at all, this optimization works perfectly.
I assume, that the problem you are seeing goes along the following lines:

the global state is set to the p.emptyState
the global state (and therefore the p.emptyState, as both are now the same) is populated with values
the parser performs a backtracking back to the point, where no elements are stored in the global state (and therefore in the p.emptyState).
in clone the p.emptyState is returned, because p.cur.state has again a length of 0, which in fact returns p.cur.state again, because still p.cur.state and p.emptyState are the same.

This means, as long as p.cur.state has a length of 0, an empty map is returned, regardless of p.cur.state and p.emptyState are pointing to the same or to different maps.

So for now I think we should keep this optimization in place as it is helpful for all the parsers where the global state is not used at all.

breml · 2017-10-30T20:39:19Z

builder/static_code.go

 	p.pushV()
 	p.maxFailInvertExpected = !p.maxFailInvertExpected
 	_, ok := p.parseExpr(not.expr)
 	p.maxFailInvertExpected = !p.maxFailInvertExpected
 	p.popV()
 	p.restore(pt)
+	p.restoreState(state)


I would prefere, if this line would be inserted before p.restore(pt). This way, the order of the instructions would be just inverted to the instructions before p.parseExpr(not.expr).

breml · 2017-10-30T21:36:34Z

Regarding the memoization I agree with you, the global state feature does not really work with memoization and is not tested for this as well. Because there is no guarantee for the state change expressions to be deterministic in regards of the respective position of the parser within the parsed data, I did not invest more time into this.

Maybe it would be a good idea to have some kind of comment/documentation about this exception/"edge" case.

mna · 2017-10-30T21:42:46Z

Hello Lucas,

Thanks for your comments. Some thoughts:

You're right, ~~I forgot that choice expressions are sequence expressions~~ (scratch that, it's actually the other way around, not sure if the restore in choice expr can be safely removed, I'll try to find a case where it can fail and add that as a test), so that would be the right place to perform the restore. However, when optimizations are performed, are choice expressions still guaranteed to be "wrapped" in a sequence expression?
Yeah those restore are definitely needed, because predicates always backtrack (consider !EOL with an input that has an EOL, the rule matches but the EOL is not consumed, the state has to be restored). I'll add test cases for those.
Regarding emptyState, I believe there is a way to cause issues with it, but yeah I haven't written a test case or anything, I'll drop it from this PR as it is kind of a separate thing. It would be something along those lines:

an expression causes the parser to clone with no state (p.cur.state is empty) so it returns emptyState, and the on restore it sets p.cur.state to the same as emptyState (an alias)
state code block adds something (state["thing"] = 1), emptyState no longer empty (because p.cur.state is an alias)
an expression causes the parser to clone state["thing"] == 1, then backtracks and restores the clone (p.cur.state is now not the same as emptyState - not an alias - but both contain state["thing" == 1)
a subsequent state code block removes the thing (delete(state, "thing")), p.cur.state is now empty but not emptyState
a subsequent expression causes the parser to clone the state, which is empty, so it returns emptyState, but emptyState is not empty, problems...

A better optimization IMO would be to drop the calls to clone/restore state if there is no state code block, but that requires a more dynamic code generation and hence is a bit more complex.

I probably won't be able to work on the PR until friday, but until then let me know if you have other comments on that.

Thanks,
Martin

breml · 2017-11-02T21:40:07Z

@mna Thanks for your explanations. I look forward to your additional test cases. Also I prefer to have your changes (together with the test cases) to the emptyState optimization in an additional PR. I will give this PR an other detailed look after you have updated it.

In cloneState, the parser returns the emptyState map if the state is empty. However, on restore it is set as the p.cur.state, so subsequent state code blocks can add items to the emptyState, making it non-empty. To prevent any issues because of this, I removed this small optimization.

This reverts commit 6aa28ce.

…d a case

mna · 2017-11-03T14:37:22Z

@breml I updated the PR which is rebased on your latest master changes. I added tests for the And and Not predicates that fail if the restoreState calls are removed.

I tried to find an input and a rule that could make the ChoiceExpr restoreState calls required, but couldn't find one. I did leave the restore calls in there, though. Personally, I'd leave them there as correctness is more important than speed, and I'm not sure enough that this cannot be required to remove it (especially with the possible optimizations). But let me know if you're confident enough to remove this and I'll update the PR.

Also, there's still the question of the Throw/Recover expressions, does that play a part in restoring the state? As mentioned above, I'm not familiar enough with those to judge.

Thanks,
Martin

breml · 2017-11-03T21:59:45Z

@mna Thanks for the updated PR. I will have a look at this in the next days.

breml · 2017-11-05T21:36:42Z

@mna I had a look at this PR and I checked your test cases. I removed the state := p.cloneState() and p.restoreState(state) from the function parseNotExpr and the test cases in staterestore and staterestoreopt did not fail (the test for the parseAndExpr worked as expected).

Additionally the grammar and the test cases in statestore and statestoreopt are equivalent. Therefore I suggest to organize the files the same way as in the json example. Do you mind the change this?

For now I am fine with your decision to keep the clone/restore in the choice expression.

My understanding of the structure of the parser is, that now all the relevant clone/restored handling is done in the sequence expressions and therefore they do not escape to the choice expression nor do they escape to the recover expressions. Beside of that, it is my opinion, that it is the responsibility of the author/grammar to set the state back to a valid value in the case of a recovery. Maybe this should be mentioned in the documentation as well.

mna · 2017-11-05T22:16:41Z

I removed the state := p.cloneState() and p.restoreState(state) from the function parseNotExpr and the test cases in staterestore and staterestoreopt did not fail (the test for the parseAndExpr worked as expected).

Oh you're right, this restore is required only if the Choice restore is removed (if you remove both, the test fails, and if you put back either, the test passes). But as mentioned in the PR, I'd keep the choice restore to make sure there's not a case we're not thinking about.

Additionally the grammar and the test cases in statestore and statestoreopt are equivalent. Therefore I suggest to organize the files the same way as in the json example. Do you mind the change this?

Sure, I'll update this.

mna · 2017-11-05T22:33:32Z

@breml updated to reuse the same grammar for staterestore standard and optimized tests.

breml · 2017-11-06T12:53:44Z

@mna Thanks for the update. Please squash this PR and then we are ready to merge it.

breml requested changes Oct 30, 2017

View reviewed changes

breml mentioned this pull request Nov 2, 2017

Document incompatibility of global state with memoization #55

Open

breml mentioned this pull request Nov 2, 2017

optimize global state if not used #57

Merged

mna added 7 commits November 3, 2017 08:49

add failing test for the state store restore bug

39b4175

clone and restore state store in all required expressions

097b3f6

remove unnecessary restore of state in one-or-more expression

2bfabd8

Revert "remove the emptyState optimization"

7f1cb34

This reverts commit 6aa28ce.

reorder restoreState before restore

daef467

test/staterestore: rename package to staterestore instead of main

3092275

mna force-pushed the wip-fix-state branch from 2099da7 to 3092275 Compare November 3, 2017 12:54

mna added 2 commits November 3, 2017 09:45

add test cases that exercise the restoreState on And and Not predicates

2173130

try to add a failing restore state test for ChoiceExpr, could not fin…

959c854

…d a case

mna mentioned this pull request Nov 3, 2017

Remove emptyState optimization #58

Merged

breml mentioned this pull request Nov 5, 2017

Document global state handling in combination with recover/throw #59

Closed

use a common grammar for staterestore tests

fa6086c

Merge branch 'master' into wip-fix-state

0f34f3a

breml approved these changes Nov 6, 2017

View reviewed changes

mna merged commit 1a765be into master Nov 7, 2017

mna deleted the wip-fix-state branch November 7, 2017 23:40

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix the backtrack-aware restoring of the state #52

Fix the backtrack-aware restoring of the state #52

mna commented Oct 30, 2017

breml left a comment

breml Oct 30, 2017

breml commented Oct 30, 2017

mna commented Oct 30, 2017 •

edited

Loading

breml commented Nov 2, 2017

mna commented Nov 3, 2017

breml commented Nov 3, 2017

breml commented Nov 5, 2017

mna commented Nov 5, 2017

mna commented Nov 5, 2017

breml commented Nov 6, 2017

Fix the backtrack-aware restoring of the state #52

Fix the backtrack-aware restoring of the state #52

Conversation

mna commented Oct 30, 2017

breml left a comment

Choose a reason for hiding this comment

breml Oct 30, 2017

Choose a reason for hiding this comment

breml commented Oct 30, 2017

mna commented Oct 30, 2017 • edited Loading

breml commented Nov 2, 2017

mna commented Nov 3, 2017

breml commented Nov 3, 2017

breml commented Nov 5, 2017

mna commented Nov 5, 2017

mna commented Nov 5, 2017

breml commented Nov 6, 2017

mna commented Oct 30, 2017 •

edited

Loading