Plans for error recovery? #173

Open
jaredly opened this issue Aug 1, 2021 · 6 comments

Comments


jaredly commented Aug 1, 2021

It looks like there's been some work done in extending PEG generators to support error recovery (producing a syntax tree even in the face of some syntax errors) -- see this blog post and this paper.
Have y'all thought about supporting something like that in peggy?

Member

Mingun commented Aug 2, 2021

Do you mean adding fail-safe parsing to the Peggy grammar parser itself, or adding the ability to generate fail-safe parsers? If the latter, I don't think we need any special syntax for that, because the existing syntax is already quite straightforward. The example from the blog post, written in Peggy, would look like:

{
  let errors = [];
  function report(message, loc = location()) {
    errors.push({
      location: loc.start.offset + ".." + loc.end.offset,
      message,
    });
  }
}
source_file = ast:failsafe_expr EOF .* {
  return { ast, errors };
};

failsafe_expr = expr / '' {};
expr = paren / ident / error;

ident = $([a-zA-Z_] [a-zA-Z0-9_']*);
paren
  = '('
    e:(expr / '' { report("expected expression after `(`"); })
    (')'    / '' { report("missing `)`"); })
  { return [e]; }
  ;
error = err:$(!')' .)+ { report("unexpected `" + err + "`"); };

EOF = !. / '' { report("expected EOF"); };

Here undefined corresponds to the Error variant, an array to the Parens variant, and a string to the Ident variant.

The error spans for (foo)) and () are not the same as in the blog post, but I think the ones in the Peggy variant are more logical.
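To make the behaviour of the grammar above easier to follow without running Peggy, here is a minimal hand-written recursive-descent sketch of the same rules in plain JavaScript. It is not what Peggy generates; the `parse` function and the `{ ok, value }` result shape are assumptions made for illustration only. The point it shows is that `report` records an error and parsing carries on, so a result is always returned.

```javascript
// Hand-written sketch mirroring the grammar above (NOT Peggy output).
// `parse` and the { ok, value } result shape are illustrative assumptions.
function parse(input) {
  const errors = [];
  let pos = 0;

  // Non-fatal error: record it and keep parsing.
  // Defaults give an empty span at the current position.
  function report(message, start = pos, end = pos) {
    errors.push({ location: start + ".." + end, message });
  }

  // ident = $([a-zA-Z_] [a-zA-Z0-9_']*)
  function ident() {
    const m = /^[a-zA-Z_][a-zA-Z0-9_']*/.exec(input.slice(pos));
    if (!m) return { ok: false };
    pos += m[0].length;
    return { ok: true, value: m[0] };
  }

  // paren = '(' (expr / report) (')' / report)
  function paren() {
    if (input[pos] !== "(") return { ok: false };
    pos += 1;
    const inner = expr();
    if (!inner.ok) report("expected expression after `(`");
    if (input[pos] === ")") pos += 1;
    else report("missing `)`");
    return { ok: true, value: [inner.value] };
  }

  // error = $(!')' .)+  : swallow everything up to the next ')'
  function error() {
    const start = pos;
    while (pos < input.length && input[pos] !== ")") pos += 1;
    if (pos === start) return { ok: false };
    report("unexpected `" + input.slice(start, pos) + "`", start, pos);
    return { ok: true, value: undefined };
  }

  // expr = paren / ident / error (ordered choice)
  function expr() {
    for (const rule of [paren, ident, error]) {
      const result = rule();
      if (result.ok) return result;
    }
    return { ok: false };
  }

  // source_file = failsafe_expr EOF .*
  const ast = expr().value; // undefined when nothing matched
  if (pos < input.length) report("expected EOF");
  return { ast, errors };
}

console.log(parse("(foo)"));  // ast: ["foo"], no errors
console.log(parse("(foo"));   // ast: ["foo"], one error: missing `)` at 4..4
console.log(parse("(foo))")); // ast: ["foo"], one error: expected EOF at 5..5
```

Each alternative only advances `pos` when it succeeds, which stands in for PEG backtracking in this small grammar.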


asklar commented Aug 14, 2021

@Mingun thanks for the tip, but I've found that applying this approach to a more complex grammar becomes very cumbersome and hard to keep track of. I wonder if there is any interest in providing a more "batteries included" solution for those of us parser hobbyists who need to write parsers that work well with IntelliSense, for example.

Member

Mingun commented Aug 14, 2021

I understand what you want, but what do you suggest improving in the Peggy grammar? You need to specify synchronization points in your grammar manually anyway, whether with dedicated syntax or with the existing one. Because the existing features do not create much overhead, I do not see what benefit we would get from additional syntax. Of course, if you can propose a syntax for that and show the benefits of an automatic realization of these concepts, I won't be against it. But right now I do not see those benefits.

Member

Mingun commented Nov 15, 2021

The same paper is available on arxiv.org (unlike dl.acm.org, it is free there). The authors also note that manually written recovery rules can give more precise error recovery.

However, I think you can add a syntax that simplifies the creation of nodes for automatic recovery, for example:

expression % "<error message>"

would be translated to:

expression / '' { report("<error message>"); }

where report is a new API function for registering parser errors, similar to expected()/error(), except that it does not stop parsing.

Then the original grammar

expr = paren / ident;
ident = $[a-z]i+;
paren = '(' expr ')';

could, with minimal changes, be converted to a grammar with some error recovery:

expr = paren / ident;
ident = $[a-z]i+;
paren
  = '('
    expr % "expected expression after `(`"
    ')'  % "missing `)`"
  ;

The modified grammar is able to produce a result for incomplete inputs, such as (foo (instead of (foo)), but inputs with excess symbols (such as (foo))) require additional rules for error recovery.
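The proposed `%` operator is not implemented in Peggy, so the following is only a sketch of its intended semantics, written as a small parser-combinator model in plain JavaScript. All names here (`report`, `orReport`, `literal`, `run`, the `state` object) are hypothetical and exist only for this illustration. `orReport(parser, message)` plays the role of `parser % "message"`, that is, `parser / '' { report("message"); }`.

```javascript
// Hypothetical model of the proposed `%` operator; none of this is Peggy API.
// A parser is a function (state) => { ok, value } that must not move
// state.pos when it fails. `state` is { input, pos, errors }.

// Non-fatal error registration: records the problem, parsing continues.
function report(state, message) {
  state.errors.push({ offset: state.pos, message });
}

// parser % "message"  ==>  parser / '' { report("message"); }
function orReport(parser, message) {
  return (state) => {
    const result = parser(state);
    if (result.ok) return result;
    report(state, message);
    return { ok: true, value: undefined }; // empty match, nothing consumed
  };
}

function literal(text) {
  return (state) => {
    if (!state.input.startsWith(text, state.pos)) return { ok: false };
    state.pos += text.length;
    return { ok: true, value: text };
  };
}

// ident = $[a-z]i+
function ident(state) {
  const m = /^[a-z]+/i.exec(state.input.slice(state.pos));
  if (!m) return { ok: false };
  state.pos += m[0].length;
  return { ok: true, value: m[0] };
}

// paren = '(' expr % "..." ')' % "..."
// Simplification: returns [inner] instead of the full match sequence.
function paren(state) {
  if (!literal("(")(state).ok) return { ok: false };
  const inner = orReport(expr, "expected expression after `(`")(state);
  orReport(literal(")"), "missing `)`")(state);
  return { ok: true, value: [inner.value] };
}

// expr = paren / ident
function expr(state) {
  const p = paren(state);
  return p.ok ? p : ident(state);
}

function run(input) {
  const state = { input, pos: 0, errors: [] };
  const result = expr(state);
  return { ast: result.value, errors: state.errors, rest: input.slice(state.pos) };
}

console.log(run("(foo"));   // recovered: one "missing `)`" error at offset 4
console.log(run("(foo))")); // no errors, but rest is ")": excess input is not handled
```

The second call illustrates why excess symbols need extra rules: nothing in this grammar consumes or reports the trailing `)`.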

As a further development, a special operator could be introduced to infer the error message from a labeled expression:

expr "Expression" = paren / ident;
ident = $[a-z]i+;
paren
  = '('
    expr %! // produces "expected `Expression`" error - name of the referenced rule
    ')'  %! // produces "expected `)`" or maybe specially for literals "missed `)`"
  ;

The rules for automatically generated messages could also be configurable.
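As a rough sketch of what configurable message inference for `%!` might look like, the function below derives a message from a simplified description of the expression. The node shape (`kind`, `name`, `displayName`, `text`) and the `literalPrefix` option are assumptions invented for this example; they are not Peggy's AST or options.

```javascript
// Hypothetical message inference for the proposed `%!` operator.
// The node shape and the `literalPrefix` option are illustrative assumptions.
function inferMessage(node, options = {}) {
  const literalPrefix = options.literalPrefix ?? "expected";
  if (node.kind === "rule") {
    // Prefer the rule's human-readable label, e.g. expr "Expression".
    return "expected `" + (node.displayName ?? node.name) + "`";
  }
  if (node.kind === "literal") {
    return literalPrefix + " `" + node.text + "`";
  }
  throw new Error("cannot infer a message for node kind: " + node.kind);
}

console.log(inferMessage({ kind: "rule", name: "expr", displayName: "Expression" }));
// → expected `Expression`
console.log(inferMessage({ kind: "literal", text: ")" }, { literalPrefix: "missed" }));
// → missed `)`
```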

Feel free to play with the implementation, but IMO the syntax is not very clear, and it probably won't be as flexible as the custom solution described above.

Contributor

hildjj commented Nov 15, 2021

How well would this work with just a report function, so we don't have to add more syntax?

Member

Mingun commented Nov 15, 2021

It already works very well without any additions (and IMO a report function is also not needed if we don't want to implement the syntax additions).

4 participants