-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix the backtrack-aware restoring of the state #52
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mna Thanks for the PR. Find my (length) comments below.
I looked into this PR and I have the following comments:
I played around with your PR a little bit and I made the following observations:
- With your changes in the sequence expression in place, the clone/restore operations in
parseChoiceExpr
are no longer needed, because the restore already happened in the sequence expression, if a restore is needed (all tests happily pass).
My assumption is, that the real problem with the backtracking in the current code is not in the sequence expressions but in the suffixed operations (?
,*
), which behave similar to "choice expressions" in that the included expressions could fail but the parser still advances (because?
and*
allow zero matches as a successful result). The sequence expression just happens to be the right place to perform this restore operation. - If I remove the clone/restore operations in
parseAndExpr
andparseNotExpr
, the tests are still passing. Therefore either we have no test cases yet, which cover the problems, which could arise, if there is no restore performed after theparseAndExpr
or those restores are not necessary.
So I would like to have a test case, which demonstrates (and also covers) a case, where the usage of&
or!
results in a problem where the restore of the state is actually needed.
Last but not least I am not yet convinced, that the p.emptyState
optimization is problematic. As you write, the p.emptyState
is used to optimize the "expensive" clone operation as long as p.cur.state
has a length of 0
. As long as the global state feature is not used at all, this optimization works perfectly.
I assume, that the problem you are seeing goes along the following lines:
- the global state is set to the
p.emptyState
- the global state (and therefore the
p.emptyState
, as both are now the same) is populated with values - the parser performs a backtracking back to the point, where no elements are stored in the global state (and therefore in the
p.emptyState
). - in clone the
p.emptyState
is returned, becausep.cur.state
has again a length of0
, which in fact returnsp.cur.state
again, because stillp.cur.state
andp.emptyState
are the same.
This means, as long as p.cur.state
has a length of 0
, an empty map is returned, regardless of p.cur.state
and p.emptyState
are pointing to the same or to different maps.
So for now I think we should keep this optimization in place as it is helpful for all the parsers where the global state is not used at all.
builder/static_code.go
Outdated
p.pushV() | ||
p.maxFailInvertExpected = !p.maxFailInvertExpected | ||
_, ok := p.parseExpr(not.expr) | ||
p.maxFailInvertExpected = !p.maxFailInvertExpected | ||
p.popV() | ||
p.restore(pt) | ||
p.restoreState(state) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefere, if this line would be inserted before p.restore(pt)
. This way, the order of the instructions would be just inverted to the instructions before p.parseExpr(not.expr)
.
Regarding the memoization I agree with you, the global state feature does not really work with memoization and is not tested for this as well. Because there is no guarantee for the state change expressions to be deterministic in regards of the respective position of the parser within the parsed data, I did not invest more time into this. Maybe it would be a good idea to have some kind of comment/documentation about this exception/"edge" case. |
Hello Lucas, Thanks for your comments. Some thoughts:
A better optimization IMO would be to drop the calls to clone/restore state if there is no state code block, but that requires a more dynamic code generation and hence is a bit more complex. I probably won't be able to work on the PR until friday, but until then let me know if you have other comments on that. Thanks, |
@mna Thanks for your explanations. I look forward to your additional test cases. Also I prefer to have your changes (together with the test cases) to the emptyState optimization in an additional PR. I will give this PR an other detailed look after you have updated it. |
In cloneState, the parser returns the emptyState map if the state is empty. However, on restore it is set as the p.cur.state, so subsequent state code blocks can add items to the emptyState, making it non-empty. To prevent any issues because of this, I removed this small optimization.
This reverts commit 6aa28ce.
@breml I updated the PR which is rebased on your latest master changes. I added tests for the And and Not predicates that fail if the restoreState calls are removed. I tried to find an input and a rule that could make the ChoiceExpr restoreState calls required, but couldn't find one. I did leave the restore calls in there, though. Personally, I'd leave them there as correctness is more important than speed, and I'm not sure enough that this cannot be required to remove it (especially with the possible optimizations). But let me know if you're confident enough to remove this and I'll update the PR. Also, there's still the question of the Throw/Recover expressions, does that play a part in restoring the state? As mentioned above, I'm not familiar enough with those to judge. Thanks, |
@mna Thanks for the updated PR. I will have a look at this in the next days. |
@mna I had a look at this PR and I checked your test cases. I removed the Additionally the grammar and the test cases in For now I am fine with your decision to keep the clone/restore in the choice expression. My understanding of the structure of the parser is, that now all the relevant clone/restored handling is done in the sequence expressions and therefore they do not escape to the choice expression nor do they escape to the recover expressions. Beside of that, it is my opinion, that it is the responsibility of the author/grammar to set the state back to a valid value in the case of a recovery. Maybe this should be mentioned in the documentation as well. |
Oh you're right, this restore is required only if the Choice restore is removed (if you remove both, the test fails, and if you put back either, the test passes). But as mentioned in the PR, I'd keep the choice restore to make sure there's not a case we're not thinking about.
Sure, I'll update this. |
@breml updated to reuse the same grammar for staterestore standard and optimized tests. |
@mna Thanks for the update. Please squash this PR and then we are ready to merge it. |
Closes #51 .
Caveats:
I'm pretty sure this feature does not work with memoization, though I haven't tested it. In any case, it could only work if the state code blocks were all deterministic (given that to make it work, the state should be memoized too).
not quite sure about the throw/recover expressions, if any clone/restore of the state should be done in this case. I don't think so, but I'm not familiar with this recent pigeon feature.
Also note that I removed the
emptyState
optimization, as I believe it could lead to issues (it was returned incloneState
if the state is empty, and then set ascurrent.state
inrestoreState
so that subsequent state code blocks could modify it so that it isn't empty anymore).