Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve error reporting #459

Merged
merged 8 commits into from
Jul 15, 2020
Merged

Conversation

airportyh
Copy link
Contributor

@airportyh airportyh commented Aug 1, 2019

This PR continues the work for #451:

  • Display 5 lines of input on error to give users more context
  • Display line numbers for the above 5 lines
  • Display user-friendly error report even on custom lexer errors
  • Better error message when the parser is expecting no more input

I am planning on sending a PR to moo.js also (no-context/moo#129), to attach the errored token to its error object, so that the error report can display it. The code in this PR uses that token if it exists.

Code context and line numbers:
Screen Shot 2019-08-01 at 3 28 05 PM

Custom lexer error:
Screen Shot 2019-08-01 at 3 29 32 PM

Parser is not expecting more input:
Screen Shot 2019-08-01 at 3 33 45 PM

formatError. Display better error message
when parser isn't expected further input.
offending line. Output line numbers. Better
error output when no more input is expected.
Display friendly error report even on lexer
errors.
@shmish111
Copy link

This looks very nice, any progress getting it merged?

@airportyh
Copy link
Contributor Author

@shmish111 I haven't heard any news, I guess there isn't a lot of momentum there to get this merged.

@kach
Copy link
Owner

kach commented Mar 30, 2020

Is the moo PR merged?

Also does this mean you're storing lines somewhere? Is there a memory concern for if someone's parsing something non-line-oriented, where you might be storing the entire input stream?

Can we keep ^ instead of the unicode arrow?

Can the commented code lying around (e.g. line 300) be removed?

Merge conflicts?

@TheGrandmother
Copy link

I would love to see this merged.

@ashlincascade
Copy link

Would also love to see this merged. This would unlock a key use case for us-- giving users specific feedback on errors parsing their input in our equation DSL.

@kach
Copy link
Owner

kach commented Jun 19, 2020

@airportyh Do you think this can be de-conflicted without too much trouble?

@airportyh
Copy link
Contributor Author

@kach There's a bunch of context that I need to re-enter my head. I'll revisit this next week.

@airportyh
Copy link
Contributor Author

airportyh commented Jun 29, 2020

@kach I merged the conflicts from master. The moo PR (no-context/moo#129) is still pending though, we won't get the display of the 5 lines code without it if using moo. If using Nearley by itself, you still get the display. Both the StreamLexer in Nearley and moo already buffer all of the input, so I don't think the code introduces a new big problem. In moo, the authors requested that I do not use String.split on the entire buffer because they worry the buffer might be large, and I changed the code there. I think it's really a moot point because if you are already storing the entire buffer, you code will not work for very large inputs anyway. Nearley's model is to feed a smaller chunk at a time, which means the buffer will contain only a chunk at a time, and as long as that's not large, it's not a problem.

@TheGrandmother
Copy link

I'm so happy to see that work is being done on this!
Thank you @airportyh !

@kach
Copy link
Owner

kach commented Jun 30, 2020

Thank you for the rapid turnaround! Can we replace the unicode up-arrow with a ^? Also, why is there a diff for package-lock.json?

@airportyh
Copy link
Contributor Author

@kach removed the special character. As for the change in the package-lock.json, I think it has to do with me having a different version of npm when doing npm install vs whenever last it was done. Sometimes they make minor changes to that tool and change the output of the file. I did upgrade moo.js to 0.5 so that one of the tests could use their keywords feature (this is not super important and if you want I could revert it).

@kach
Copy link
Owner

kach commented Jul 1, 2020

No worries — thank you for your quick updates! I'm reviewing this right now, I swear we're going to get this PR merged before its birthday… :)

@kach
Copy link
Owner

kach commented Jul 1, 2020

test/parser.test.js:85, what's going on there? It leads to stuff printed to console during tests…

@kach
Copy link
Owner

kach commented Jul 1, 2020

Another thing that confuses me (and it's possible my morning brain is just not getting it) is that if I type in a -> [newline] into nearleyc and then hit ^D, the error says "unexpected ws token \n", but in fact nearley supports a -> [newline] "cow" or something just fine. Shouldn't the error be "unexpected end of input" in this case, as in (e.g.) node?

$ echo 'function a() {' | node
[stdin]:2



SyntaxError: Unexpected end of input
    at new Script (vm.js:99:7)
    at createScript (vm.js:249:10)
    at Object.runInThisContext (vm.js:297:10)
    at Object.<anonymous> ([stdin]-wrapper:10:26)
    at Module._compile (internal/modules/cjs/loader.js:1185:30)
    at evalScript (internal/process/execution.js:94:25)
    at internal/main/eval_stdin.js:29:5
    at Socket.<anonymous> (internal/process/execution.js:207:5)
    at Socket.emit (events.js:327:22)
    at endReadableNT (_stream_readable.js:1218:12)

@airportyh
Copy link
Contributor Author

airportyh commented Jul 2, 2020

@kach Oops, that try/catch/log was to debug the exception. I've removed it. The "unexpected end of input" message from your code snippet seems to come from node.

I am not totally sure I follow what you are saying in the second comment. I tried to reproduce what you see, but no luck. But I'll try to answer. I think it is regarding giving a better error when there is no more expected input. The current code in there does not generate an error when end of file is reached. The way it works is if a token is read, and the parse table has no more expectant states (states where the rule is expecting a terminal next), then it throws an error. The problem is that we don't treat end-of-file as a token, which means when it reaches the end of the input, there is nothing to trigger the error.

For example, if you make a file with an incomplete parse:

a ->

and run it through nearleyc, you'd get:

/Users/airportyh/Home/Playground/chameleon/node_modules/nearley/lib/compile.js:25
        for (var i = 0; i < structure.length; i++) {
                                      ^

TypeError: Cannot read property 'length' of undefined

Because the parser is in the middle of an incomplete parse, and therefore does not have any results yet. It might be worth considering emitting an end-of-file token at the end (this is traditionally done as I remember from college), or equivalently, have a method on the parser to tell it that we have received all of the input: parser.end(). That would be a larger design change.

@kach
Copy link
Owner

kach commented Jul 12, 2020

Line 436 still has a unicode arrow! (Also the associated test.)

Also commented code on 299.

Again thanks for your patience, I've been really busy these past few days and haven't found time to look at PRs outside of weekends.

@airportyh
Copy link
Contributor Author

@kach no worries, those are fixed.

@kach kach merged commit b256df4 into kach:master Jul 15, 2020
@kach
Copy link
Owner

kach commented Jul 15, 2020

Merged! Thanks so much for the code and your patience. It's pushed to npm (2.19.5), let's keep an eye on the issues in the next few days in case anyone reports anything. But it should be fine. :-)

@airportyh
Copy link
Contributor Author

Thank you @kach!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants