Improve error reporting #459

airportyh · 2019-08-01T19:35:25Z

This PR continues the work for #451:

Display 5 lines of input on error to give users more context
Display line numbers for the above 5 lines
Display user-friendly error report even on custom lexer errors
Better error message when the parser is expecting no more input

I am planning on sending a PR to moo.js also (no-context/moo#129), to attach the errored token to its error object, so that the error report can display it. The code in this PR uses that token if it exists.

Code context and line numbers:

Custom lexer error:

Parser is not expecting more input:

formatError. Display better error message when parser isn't expected further input.

offending line. Output line numbers. Better error output when no more input is expected. Display friendly error report even on lexer errors.

shmish111 · 2020-03-26T18:10:59Z

This looks very nice, any progress getting it merged?

airportyh · 2020-03-30T02:01:43Z

@shmish111 I haven't heard any news, I guess there isn't a lot of momentum there to get this merged.

kach · 2020-03-30T17:44:27Z

Is the moo PR merged?

Also does this mean you're storing lines somewhere? Is there a memory concern for if someone's parsing something non-line-oriented, where you might be storing the entire input stream?

Can we keep ^ instead of the unicode arrow?

Can the commented code lying around (e.g. line 300) be removed?

Merge conflicts?

TheGrandmother · 2020-04-11T14:17:56Z

I would love to see this merged.

ashlincascade · 2020-06-19T23:03:25Z

Would also love to see this merged. This would unlock a key use case for us-- giving users specific feedback on errors parsing their input in our equation DSL.

kach · 2020-06-19T23:55:55Z

@airportyh Do you think this can be de-conflicted without too much trouble?

airportyh · 2020-06-21T03:57:38Z

@kach There's a bunch of context that I need to re-enter my head. I'll revisit this next week.

airportyh · 2020-06-29T15:45:15Z

@kach I merged the conflicts from master. The moo PR (no-context/moo#129) is still pending though, we won't get the display of the 5 lines code without it if using moo. If using Nearley by itself, you still get the display. Both the StreamLexer in Nearley and moo already buffer all of the input, so I don't think the code introduces a new big problem. In moo, the authors requested that I do not use String.split on the entire buffer because they worry the buffer might be large, and I changed the code there. I think it's really a moot point because if you are already storing the entire buffer, you code will not work for very large inputs anyway. Nearley's model is to feed a smaller chunk at a time, which means the buffer will contain only a chunk at a time, and as long as that's not large, it's not a problem.

TheGrandmother · 2020-06-30T11:37:26Z

I'm so happy to see that work is being done on this!
Thank you @airportyh !

kach · 2020-06-30T16:36:36Z

Thank you for the rapid turnaround! Can we replace the unicode up-arrow with a ^? Also, why is there a diff for package-lock.json?

airportyh · 2020-07-01T15:20:20Z

@kach removed the special character. As for the change in the package-lock.json, I think it has to do with me having a different version of npm when doing npm install vs whenever last it was done. Sometimes they make minor changes to that tool and change the output of the file. I did upgrade moo.js to 0.5 so that one of the tests could use their keywords feature (this is not super important and if you want I could revert it).

kach · 2020-07-01T15:22:09Z

No worries — thank you for your quick updates! I'm reviewing this right now, I swear we're going to get this PR merged before its birthday… :)

kach · 2020-07-01T15:24:37Z

test/parser.test.js:85, what's going on there? It leads to stuff printed to console during tests…

kach · 2020-07-01T15:35:10Z

Another thing that confuses me (and it's possible my morning brain is just not getting it) is that if I type in a -> [newline] into nearleyc and then hit ^D, the error says "unexpected ws token \n", but in fact nearley supports a -> [newline] "cow" or something just fine. Shouldn't the error be "unexpected end of input" in this case, as in (e.g.) node?

$ echo 'function a() {' | node
[stdin]:2



SyntaxError: Unexpected end of input
    at new Script (vm.js:99:7)
    at createScript (vm.js:249:10)
    at Object.runInThisContext (vm.js:297:10)
    at Object.<anonymous> ([stdin]-wrapper:10:26)
    at Module._compile (internal/modules/cjs/loader.js:1185:30)
    at evalScript (internal/process/execution.js:94:25)
    at internal/main/eval_stdin.js:29:5
    at Socket.<anonymous> (internal/process/execution.js:207:5)
    at Socket.emit (events.js:327:22)
    at endReadableNT (_stream_readable.js:1218:12)

airportyh · 2020-07-02T15:26:40Z

@kach Oops, that try/catch/log was to debug the exception. I've removed it. The "unexpected end of input" message from your code snippet seems to come from node.

I am not totally sure I follow what you are saying in the second comment. I tried to reproduce what you see, but no luck. But I'll try to answer. I think it is regarding giving a better error when there is no more expected input. The current code in there does not generate an error when end of file is reached. The way it works is if a token is read, and the parse table has no more expectant states (states where the rule is expecting a terminal next), then it throws an error. The problem is that we don't treat end-of-file as a token, which means when it reaches the end of the input, there is nothing to trigger the error.

For example, if you make a file with an incomplete parse:

a ->

and run it through nearleyc, you'd get:

/Users/airportyh/Home/Playground/chameleon/node_modules/nearley/lib/compile.js:25
        for (var i = 0; i < structure.length; i++) {
                                      ^

TypeError: Cannot read property 'length' of undefined

Because the parser is in the middle of an incomplete parse, and therefore does not have any results yet. It might be worth considering emitting an end-of-file token at the end (this is traditionally done as I remember from college), or equivalently, have a method on the parser to tell it that we have received all of the input: parser.end(). That would be a larger design change.

kach · 2020-07-12T19:04:26Z

Line 436 still has a unicode arrow! (Also the associated test.)

Also commented code on 299.

Again thanks for your patience, I've been really busy these past few days and haven't found time to look at PRs outside of weekends.

airportyh · 2020-07-13T15:18:02Z

@kach no worries, those are fixed.

kach · 2020-07-15T04:32:46Z

Merged! Thanks so much for the code and your patience. It's pushed to npm (2.19.5), let's keep an eye on the issues in the next few days in case anyone reports anything. But it should be fine. :-)

airportyh · 2020-07-15T17:04:08Z

Thank you @kach!

airportyh added 4 commits August 1, 2019 12:46

Improved code context display for lexer

4abe0b2

formatError. Display better error message when parser isn't expected further input.

Display 4 more lines of input preceeding the

ee27dc8

offending line. Output line numbers. Better error output when no more input is expected. Display friendly error report even on lexer errors.

Added explanation for lexer error's token prop.

353712c

Fixed line number spacing.

9ffd5a9

airportyh mentioned this pull request Aug 1, 2019

Improved error reporting no-context/moo#129

Merged

merged with master

a1a32a3

removed use of non-ascii character

392c869

removed exception logging in test

f50a143

removed unicode character and uneeded comment

b256df4

kach merged commit b256df4 into kach:master Jul 15, 2020

joshsunshine mentioned this pull request Jun 9, 2022

Improve parsing error messages penrose/penrose#1052

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve error reporting #459

Improve error reporting #459

airportyh commented Aug 1, 2019 •

edited

Loading

shmish111 commented Mar 26, 2020

airportyh commented Mar 30, 2020

kach commented Mar 30, 2020

TheGrandmother commented Apr 11, 2020

ashlincascade commented Jun 19, 2020

kach commented Jun 19, 2020 •

edited

Loading

airportyh commented Jun 21, 2020

airportyh commented Jun 29, 2020 •

edited

Loading

TheGrandmother commented Jun 30, 2020

kach commented Jun 30, 2020

airportyh commented Jul 1, 2020

kach commented Jul 1, 2020

kach commented Jul 1, 2020

kach commented Jul 1, 2020

airportyh commented Jul 2, 2020 •

edited

Loading

kach commented Jul 12, 2020

airportyh commented Jul 13, 2020

kach commented Jul 15, 2020

airportyh commented Jul 15, 2020

Improve error reporting #459

Improve error reporting #459

Conversation

airportyh commented Aug 1, 2019 • edited Loading

shmish111 commented Mar 26, 2020

airportyh commented Mar 30, 2020

kach commented Mar 30, 2020

TheGrandmother commented Apr 11, 2020

ashlincascade commented Jun 19, 2020

kach commented Jun 19, 2020 • edited Loading

airportyh commented Jun 21, 2020

airportyh commented Jun 29, 2020 • edited Loading

TheGrandmother commented Jun 30, 2020

kach commented Jun 30, 2020

airportyh commented Jul 1, 2020

kach commented Jul 1, 2020

kach commented Jul 1, 2020

kach commented Jul 1, 2020

airportyh commented Jul 2, 2020 • edited Loading

kach commented Jul 12, 2020

airportyh commented Jul 13, 2020

kach commented Jul 15, 2020

airportyh commented Jul 15, 2020

airportyh commented Aug 1, 2019 •

edited

Loading

kach commented Jun 19, 2020 •

edited

Loading

airportyh commented Jun 29, 2020 •

edited

Loading

airportyh commented Jul 2, 2020 •

edited

Loading