Fix incorrect error reporting in recoverable mode #1654

let-def · 2017-11-20T14:15:43Z

Error were not reported properly in recoverable mode (... because I forgot one step of menhir checkpoint interpretation when recovering, my bad 👎).

The first patch helped me make recovery code a bit more local but is not necessary, only the second one matters.

With this patch, errors in editors should be as good as error in batch mode (otherwise it deserves an issue).

chenglou · 2017-11-20T18:45:28Z

Oh boy, this is a huge fix, thanks!

jordwalke · 2017-11-22T07:30:44Z

@let-def This is definitely a huge improvement.

Consider the following file:

let x = 2340

let y = 234;

Before this PR, the -r --print ast version of the AST was:

[
  structure_item ([1,0+0].../bin/Hello.re[1,0+0])
    Pstr_extension "merlin.syntax-error"
    [
      structure_item (_none_[1,0+-1]..[1,0+-1]) ghost
        Pstr_eval
        expression (_none_[1,0+-1]..[1,0+-1]) ghost
          Pexp_constant Const_string("invalidCharacter.orComment.orString",None)
    ]
]

It didn't even include the second let binding in the AST.

After this PR, we get this:

[
  structure_item (bin/Hello.re[2,1+0]..[2,1+12])
    Pstr_extension "merlin.syntax-error"
    [
      structure_item (_none_[1,0+-1]..[1,0+-1]) ghost
        Pstr_eval
        expression (_none_[1,0+-1]..[1,0+-1]) ghost
          Pexp_constant Const_string("Statement has to end with a semicolon",None)
    ]
  structure_item (bin/Hello.re[4,15+0]..[4,15+12])
    Pstr_value Nonrec
    [
      <def>
        pattern (bin/Hello.re[4,15+4]..[4,15+5])
          Ppat_var "y" (bin/Hello.re[4,15+4]..[4,15+5])
        expression (bin/Hello.re[4,15+8]..[4,15+11])
          Pexp_constant Const_int 234
    ]
]

It correctly has the second line.

This means the editor shows errors nicely like:

The error makes sense too "statements must end with semicolon".

But what's interesting is if I build this PR, and run refmt non-recovered on the raw file I get a very nice error location, which is different:

File "bin/Hello.re", line 4, characters 0-3:
Error: 883: syntax error, consider adding a `;' before```

That points to the second let token in the file which is what you'd ideally see in Merlin.

I'm not sure which one is better, but do you know why they are different, and if they should be unified?

jordwalke · 2017-11-22T09:11:31Z

I found one pretty big difference between the recovered reported locations and the non-recovered. Here's one example:

let x = 2340;

let y = 234;

let res = MyLib.Util.

The exact error is at the very end of the final line and the nonrecovered parser prints the location correctly.

However, the recovered version only produces this AST:

[
  structure_item ([1,0+0]..bin/Hello.re[1,0+0])
    Pstr_extension "merlin.syntax-error"
    [
      structure_item (_none_[1,0+-1]..[1,0+-1]) ghost
        Pstr_eval
        expression (_none_[1,0+-1]..[1,0+-1]) ghost
          Pexp_constant Const_string("invalidCharacter.orComment.orString",None)
    ]
]

You can see it ate the entire file - and merely puts one error on the first line, the first character.

Is there some way we can just "catch" any parsing error that happens, and then as a final step, insert a fake extension point with that specific location of the parsing error? We could retain some small state to determine if we've ever errored, and then before returning the final AST, in recover mode, inject a fake extension point?

let-def · 2017-11-22T10:07:04Z

First message:

With the recovered AST, the location is computed inside the grammar rule. The non-recovered path tend to use the location of the token in the lexing buffer more often.

There is a bit of non necessary code duplication there, we could get a uniform behavior with a bit of refactoring.
There are a few different things at play, and each affects the behavior a bit: the parsing code (which handles error differently when recovery is enabled), the code that handles "entering the recovery mode" and the code that produces error messages.
While we are tweaking the different bits, it is a bit useful to have duplication, we should just tweak toward what we think is the best and clean up after.

Second message:

This case is worrying.
In Merlin, the parser knows nothing about recovery: it tries to do its normal job, and we have a "meta" driver that looks at the state of the parser and patch it to recover.
The benefit is that recovery considerations doesn't affect the grammar, it is much easier to get a uniform behavior and evolve the grammar "robustly" (you can focus on the good cases) and we have very fine control on which information will be kept in the AST -- in particular we know for sure that anything that has been parsed won't be dropped.
The cons is that this meta-driver is harder to grasp, and the runtime complexity is a bit harder to reason about (a normal parser has a linear runtime cost, but here we are interleaving recovery attempts and actual parsing -- it is essentially a Dijkstra short-path in a graph which has the depth of the stack has upper bound). So the runtime is quasi-linear, or O(n log n) where n is the depth of the stack, not the size of the input.

I would like to experiment using the same approach for Reason.

A benefit of this strategy is that it gives near perfect context information for completion, where the first parsing error is likely under or after the text cursor... And the errors are transmitted as you suggest.

let-def added 2 commits November 20, 2017 14:35

starts refactoring recovery implementation

51928ab

error recovery: forgot one checkpoint normalization

51b523f

facebook-github-bot added the CLA Signed label Nov 20, 2017

chenglou merged commit a5cbc71 into reasonml:master Nov 20, 2017

chenglou added a commit that referenced this pull request Nov 20, 2017

[Docs] #1654

d032b83

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix incorrect error reporting in recoverable mode #1654

Fix incorrect error reporting in recoverable mode #1654

let-def commented Nov 20, 2017

chenglou commented Nov 20, 2017

jordwalke commented Nov 22, 2017

jordwalke commented Nov 22, 2017

let-def commented Nov 22, 2017 •

edited

Loading

Fix incorrect error reporting in recoverable mode #1654

Fix incorrect error reporting in recoverable mode #1654

Conversation

let-def commented Nov 20, 2017

chenglou commented Nov 20, 2017

jordwalke commented Nov 22, 2017

jordwalke commented Nov 22, 2017

let-def commented Nov 22, 2017 • edited Loading

let-def commented Nov 22, 2017 •

edited

Loading