Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved error reporting #129

Merged
merged 3 commits into from
May 22, 2022
Merged

Conversation

airportyh
Copy link
Contributor

@airportyh airportyh commented Aug 1, 2019

This PR aims to improve the error message displayed to the user when a syntax error is encountered by the lexer. In particular:

  • 5 input lines are displayed on error to give users context
  • Line numbers are displayed
  • Attaches the errored token to the thrown error object, eventhough it's not a "good" token - this allows the handler (in particular in Nearley), to have access to the errored input and display as it pleases

This PR is related to kach/nearley#459.

Screenshot:

Screen Shot 2019-08-01 at 3 54 43 PM

for context and line numbers.
Copy link
Collaborator

@tjvr tjvr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I appreciate the desire to improve error reporting in parsers, but I'm not convinced that moo should return extra lines of the input.

moo.js Outdated
var eol = token.lineBreaks ? token.text.indexOf('\n') : token.text.length
var firstLine = this.buffer.substring(start, token.offset + eol)

var lines = this.buffer.split("\n")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We avoid using split here because we've already tokenized the input. Error handling doesn't necessarily need to be as fast as the happy path, but it could potentially be an issue if the input buffer is large.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We avoid using split here because we've already tokenized the input. Error handling doesn't necessarily need to be as fast as the happy path, but it could potentially be an issue if the input buffer is large.

We could walk the string backwards instead if that is a concern.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Walking backwards (and forwards—see my comment) is a better option. Tokenizers often deal with very large inputs.

@tjvr
Copy link
Collaborator

tjvr commented Aug 1, 2019

Attaches the errored token to the thrown error object, eventhough it's not a "good" token

I'm not convinced about this, either; if you want the error token, can't you define one explicitly?

@airportyh
Copy link
Contributor Author

airportyh commented Aug 2, 2019

I appreciate the desire to improve error reporting in parsers, but I'm not convinced that moo should return extra lines of the input.

Could you elaborate? Do you have concerns about effects that the change could cause? Or don't think the issue is important? I'd be happy to work on the code more to make it satisfactory to you, but if you are not interested at all to merge this then I will leave it alone.

@airportyh
Copy link
Contributor Author

airportyh commented Aug 2, 2019

Attaches the errored token to the thrown error object, eventhough it's not a "good" token

I'm not convinced about this, either; if you want the error token, can't you define one explicitly?

I wanted to display the character that the lexer errored on within the parser error message. This was not possible without reaching in to the non-public api of the lexer (as specified on the Nearley website) to access its current index and buffer. I felt that passing this token back is slightly better than doing that. Open to other suggestions though.

@nathan
Copy link
Collaborator

nathan commented Aug 5, 2019

@airportyh

Screen Shot 2019-08-01 at 3 54 43 PM

It looks like your terminal emulator is treating the unicode arrow as either fullwidth or proportional width, as many emulators do, so it only sort of lines up with the character it's trying to point to. It's better just to use a caret/circumflex (^) because that's pretty much guaranteed to line up, even if it doesn't look quite as arrow-y.

I find context lines in error messages quite useful, and line numbers marginally so. However, you've implemented context wrong: you should give n lines on both sides of the problematic one, not just n lines before. It's much more useful to see this:

5  function checkPerson(name) {
6    const person = findPerson(name)
7    const result process(person)
                  ^
8    if (result.extremities > 10) return false
9    try {

than this:

3    return lizard
4  }
5  function checkPerson(name) {
6    const person = findPerson(name)
7    const result process(person)
                  ^

(See, e.g., the --unified or --context options of diff or django's debug tracebacks.) The blank line required for the pointer also draws your eye straight to the (in)correct line instead of forcing you to scan to the bottom to find the one that caused the error.

Another formatting nit: there should be at least two columns between line numbers and code: two spaces, a full stop and a space, or something similar.

@tjvr I don't have a problem with making this the default behavior, but if you'd like to be consistent with previous versions and/or keep error messages less bulky, perhaps we could make it an option to formatError.

moo.js Outdated
return pad(token.line - lines.length + i + 1, lastLineDigits) + " " + line;
})
.join("\n")
message += "\n" + pad("", lastLineDigits + token.col) + "⬆︎\n"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error messages should not end with blank lines unless you have a Really Good Reason™. I'd get rid of the terminal \n.

moo.js Outdated

function pad(n, length) {
var s = String(n)
return Array(length - s.length + 1).join(" ") + s
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though you never violate this invariant, it makes me uncomfortable (i.e., will trip up future maintainers) that this function will throw if you give it something longer than length, since functions like pad usually don't. Either inline the code (since you made the function local and you only call it once) or make it more robust and put it with the other utility functions at the top.

moo.js Outdated
var eol = token.lineBreaks ? token.text.indexOf('\n') : token.text.length
var firstLine = this.buffer.substring(start, token.offset + eol)

var lines = this.buffer.split("\n")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Walking backwards (and forwards—see my comment) is a better option. Tokenizers often deal with very large inputs.

@tjvr
Copy link
Collaborator

tjvr commented Aug 6, 2019

I agree with all of Nathan's comments. Since he's happy to make this the default, so am I. I think I'd rather have bulkier error messages for everyone than introduce extra arguments to formatError. I think one or two lines before and after the error would be sufficient. :)

As above, I'd rather not attach the token object to the error message, but it sounds like we have to.

@airportyh
Copy link
Contributor Author

airportyh commented Aug 8, 2019

Nice! I appreciate the feedback. I plan to address the mentioned issues soon. Here I'll summarize what I am planning to do:

  • avoid buffer.split and walk the string buffer backwards instead
  • revert arrow character back to ^
  • display ~2 lines of input before and after the problematic line
  • make pad robust and handle cases where input string is longer than length (throw or crop? I can consult other well-known implementations)
  • reconsider including the "bad" token in the error object. If we don't do this, it's not the end of the world, since the code context output will already make it obvious where the problem is. This feature is a little weird because the attached token is not a valid thing.

If either of you would like to change this list please give me a heads up. I plan to start work some time tomorrow.

@nathan
Copy link
Collaborator

nathan commented Aug 9, 2019

throw or crop? I can consult other well-known implementations

Neither. Return the original string, like String.prototype.padStart.

@airportyh
Copy link
Contributor Author

throw or crop? I can consult other well-known implementations

Neither. Return the original string, like String.prototype.padStart.

Okay.

Revert arrow character to ^.
Error handling in pad function
Revert attaching of invalid token to error.
@airportyh
Copy link
Contributor Author

New code is in! Please take a look.

@airportyh
Copy link
Contributor Author

@tjvr @nathan any question on this? Let me know if there's anything I can do to help.

Copy link
Collaborator

@tjvr tjvr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't had time to go through it all I'm afraid, but here are a couple of comments!

moo.js Outdated
@@ -559,6 +593,8 @@
}
}

Lexer.prototype.lastNLines = lastNLines
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want to export this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the only way I could write tests for it. Do you know of another way? Or we could ditch the tests, which I am okay with too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed this and removed tests.

moo.js Outdated
@@ -53,6 +53,39 @@
}
}

function pad(n, length) {
var s = String(n)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we calling String() here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I supposed that superfluous if we simply assume the input is a string. I can get rid of it if you prefer.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes please 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@nathan nathan self-requested a review August 21, 2019 13:11
@TheGrandmother
Copy link

I would love to see this merged....

      if (err.stack.match(/.*moo\.js.*/)) {
        const m = err.message.match(/.*line (\d+) col (\d+)/)
        const newErr = new CompilerError(err.message)
        err.position = {line: parseInt(m[0]), col: parseInt(m[1])}
        throw newErr
      }

This is the sketchy workaround I had to use to differentiate lexer errors and get the position out. :P

Copy link
Collaborator

@tjvr tjvr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this!

@tjvr tjvr merged commit 813e006 into no-context:master May 22, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants