Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix incorrect error reporting in recoverable mode #1654

Merged
merged 2 commits into from
Nov 20, 2017
Merged

Fix incorrect error reporting in recoverable mode #1654

merged 2 commits into from
Nov 20, 2017

Conversation

let-def
Copy link
Contributor

@let-def let-def commented Nov 20, 2017

Error were not reported properly in recoverable mode (... because I forgot one step of menhir checkpoint interpretation when recovering, my bad 👎).

The first patch helped me make recovery code a bit more local but is not necessary, only the second one matters.

With this patch, errors in editors should be as good as error in batch mode (otherwise it deserves an issue).

@chenglou
Copy link
Member

Oh boy, this is a huge fix, thanks!

@chenglou chenglou merged commit a5cbc71 into reasonml:master Nov 20, 2017
chenglou added a commit that referenced this pull request Nov 20, 2017
@jordwalke
Copy link
Member

@let-def This is definitely a huge improvement.

Consider the following file:

let x = 2340

let y = 234;

Before this PR, the -r --print ast version of the AST was:

[
  structure_item ([1,0+0].../bin/Hello.re[1,0+0])
    Pstr_extension "merlin.syntax-error"
    [
      structure_item (_none_[1,0+-1]..[1,0+-1]) ghost
        Pstr_eval
        expression (_none_[1,0+-1]..[1,0+-1]) ghost
          Pexp_constant Const_string("invalidCharacter.orComment.orString",None)
    ]
]

It didn't even include the second let binding in the AST.

After this PR, we get this:

[
  structure_item (bin/Hello.re[2,1+0]..[2,1+12])
    Pstr_extension "merlin.syntax-error"
    [
      structure_item (_none_[1,0+-1]..[1,0+-1]) ghost
        Pstr_eval
        expression (_none_[1,0+-1]..[1,0+-1]) ghost
          Pexp_constant Const_string("Statement has to end with a semicolon",None)
    ]
  structure_item (bin/Hello.re[4,15+0]..[4,15+12])
    Pstr_value Nonrec
    [
      <def>
        pattern (bin/Hello.re[4,15+4]..[4,15+5])
          Ppat_var "y" (bin/Hello.re[4,15+4]..[4,15+5])
        expression (bin/Hello.re[4,15+8]..[4,15+11])
          Pexp_constant Const_int 234
    ]
]

It correctly has the second line.

This means the editor shows errors nicely like:

screen shot 2017-11-21 at 11 29 09 pm

The error makes sense too "statements must end with semicolon".

But what's interesting is if I build this PR, and run refmt non-recovered on the raw file I get a very nice error location, which is different:

File "bin/Hello.re", line 4, characters 0-3:
Error: 883: syntax error, consider adding a `;' before```

That points to the second let token in the file which is what you'd ideally see in Merlin.

I'm not sure which one is better, but do you know why they are different, and if they should be unified?

@jordwalke
Copy link
Member

I found one pretty big difference between the recovered reported locations and the non-recovered. Here's one example:

let x = 2340;

let y = 234;

let res = MyLib.Util.

The exact error is at the very end of the final line and the nonrecovered parser prints the location correctly.

However, the recovered version only produces this AST:

[
  structure_item ([1,0+0]..bin/Hello.re[1,0+0])
    Pstr_extension "merlin.syntax-error"
    [
      structure_item (_none_[1,0+-1]..[1,0+-1]) ghost
        Pstr_eval
        expression (_none_[1,0+-1]..[1,0+-1]) ghost
          Pexp_constant Const_string("invalidCharacter.orComment.orString",None)
    ]
]

You can see it ate the entire file - and merely puts one error on the first line, the first character.

Is there some way we can just "catch" any parsing error that happens, and then as a final step, insert a fake extension point with that specific location of the parsing error? We could retain some small state to determine if we've ever errored, and then before returning the final AST, in recover mode, inject a fake extension point?

@let-def
Copy link
Contributor Author

let-def commented Nov 22, 2017

First message:

With the recovered AST, the location is computed inside the grammar rule. The non-recovered path tend to use the location of the token in the lexing buffer more often.

There is a bit of non necessary code duplication there, we could get a uniform behavior with a bit of refactoring.
There are a few different things at play, and each affects the behavior a bit: the parsing code (which handles error differently when recovery is enabled), the code that handles "entering the recovery mode" and the code that produces error messages.
While we are tweaking the different bits, it is a bit useful to have duplication, we should just tweak toward what we think is the best and clean up after.

Second message:

This case is worrying.
In Merlin, the parser knows nothing about recovery: it tries to do its normal job, and we have a "meta" driver that looks at the state of the parser and patch it to recover.
The benefit is that recovery considerations doesn't affect the grammar, it is much easier to get a uniform behavior and evolve the grammar "robustly" (you can focus on the good cases) and we have very fine control on which information will be kept in the AST -- in particular we know for sure that anything that has been parsed won't be dropped.
The cons is that this meta-driver is harder to grasp, and the runtime complexity is a bit harder to reason about (a normal parser has a linear runtime cost, but here we are interleaving recovery attempts and actual parsing -- it is essentially a Dijkstra short-path in a graph which has the depth of the stack has upper bound). So the runtime is quasi-linear, or O(n log n) where n is the depth of the stack, not the size of the input.

I would like to experiment using the same approach for Reason.

A benefit of this strategy is that it gives near perfect context information for completion, where the first parsing error is likely under or after the text cursor... And the errors are transmitted as you suggest.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants