Add option to print better parse errors#3700
Conversation
bc52ee6 to
2c1546b
Compare
|
@Baltoli this is the discussion that started this work: https://runtimeverification.slack.com/archives/C7E701MFG/p1694474725891869 |
|
@Baltoli I don't have a minimal example, but I can provide real examples from KMIR parsing failures, k v6.0.133 error messagebranch
|
|
I have improved the output from this diagnostic; now instead of using only the immediate predecessor parse state, we use the most recent spanning intermediate state. As an example, consider the following production being added to the syntax Exp ::= "AlignOf" "(" Exp ")"In the initial version Daniel tries above, the following parse error would be produced: $ kast <(echo -n 'AlignOf(3+)') --no-exc-wrap
[Error] Inner Parser: Parse error: unexpected token ')' following token '+'. Additional parsing diagnostic information:
No top-level production could apply to this input.
Source(/dev/fd/11)
Location(1,11,1,12)
1 |
. ^because no production includes the terminal sequence $ kast <(echo -n 'AlignOf(3+)') --no-exc-wrap
[Error] Inner Parser: Parse error: unexpected token ')' following token '+'. Additional parsing diagnostic information:
Attempting to apply production:
syntax Exp ::= "AlignOf" "(" Exp ")"
produced partial term:
`AlignOf(_)_ARITHMETIC-SYNTAX_Exp_Exp`(#token("<error>","<invalid>"))
Source(/dev/fd/11)
Location(1,11,1,12)
1 |
. ^In the interests of getting this change merged, and keeping the diff small, this is probably as involved of a heuristic as I'm prepared to implement for the time being. If in the future we identify a case where this heuristic is deficient, we can open a new PR to extend the existing diagnostic implementation. |
radumereuta
left a comment
There was a problem hiding this comment.
Java part lgtm.
I would like it if Daniel also approves this since he's the one who reported the issue in the first place.
Quick question. What happens if the term is very large? Is it going to pretty print everything?
|
Daniel has tested the latest version of the output and confirmed to me that it's useful via DM; if the term is large then the AST printed will indeed also be large. |
This PR fixes #3672 by providing a best-effort rendering of the partial state of the parser when a parse error happens. To do this, we grab the Earley state corresponding to the span
[0, token_index_before_error]and inspect the partial parse trees available, as well as the production being attempted at this point.Consider as a simple example the following standard K definition for arithmetic expressions:
module ARITHMETIC-SYNTAX imports UNSIGNED-INT-SYNTAX imports ID-SYNTAX syntax Exp ::= Int | "-" Exp [group(neg), strict] | Exp "+" Exp [left, group(add), strict] | Exp "-" Exp [left, group(sub), strict] | Exp "*" Exp [left, group(mul), strict] | Exp "/" Exp [left, group(div), strict] | "(" Exp ")" [bracket] syntax priorities neg > mul div > add sub endmodule module ARITHMETIC imports ARITHMETIC-SYNTAX endmoduleOn trying to
kastan invalid term (for example,1 * 2 + a * 4-alexes because we've importedID-SYNTAX, but terms of sortIdcan't appear in a valid derivation ofExp), we see the following message from the new component:we make a best-effort attempt to construct what the top-level
_+_term might have looked like, placing an<error>token where we expected to see a valid term of sortExp.The changes made in this PR are as follows:
EarleyParserto conditionally perform the rendering as above.--debug-parsethrough to the parser when appropriate.I have additionally verified with @dkcumming that the implementation as reviewed produces useful information that would have been useful when addressing the original problem; these cases come from the MIR semantics and are therefore too large to include here as tests.