Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error formatting #240

Merged
merged 15 commits into from
Jun 23, 2018
Merged

Error formatting #240

merged 15 commits into from
Jun 23, 2018

Conversation

halfzebra
Copy link
Contributor

@halfzebra halfzebra commented Apr 6, 2018

This PR addresses #239

I'm not using any external dependencies to add colors, because it all boils down to having supports-color included into the project and I don't feel like forcing this in any way.

I'm willinbg to work more on this to get it to the point where the users can feel a positive change in ttheir experience with the error repoting.

Please leave your feedback! 👐

Here's an example of how the error messages might look with this PR incliuded:

Node

screen shot 2018-04-06 at 11 06 33

Browser(Chrome)

screen shot 2018-04-06 at 11 06 39

  • Pad the line numbers with spaces as necessary (and add a test for this)
  • Implement binary formatError as described above

Buffer error formatting

  • "line numbers" should be in hexadecimal
  • each "line" should be eight bytes
  • each group of four bytes should have two spaces after it so it's easier to
    read, making it sort of feel like paragraphs

@coveralls
Copy link

coveralls commented Apr 6, 2018

Coverage Status

Coverage remained the same at 100.0% when pulling 76690c7 on halfzebra:error-formatting into 1a4f6a9 on jneen:master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.1%) to 99.854% when pulling 343bbf0 on halfzebra:error-formatting into 74f10d6 on jneen:master.

@wavebeem
Copy link
Collaborator

wavebeem commented Apr 6, 2018

Looks pretty good so far.

When you're done with this I would like to get code coverage back up to 100% before merging.

I think we need some discussion about what the formatting should look like for Buffer inputs (@theqabalist?)

Please use the map function defined in parsimmon.js since .map method doesn't exist in IE7 (which we still support for now).

Also not really sure about the newLine variable vs using "\n" or "\n\n" in a couple spots.

@theqabalist
Copy link
Collaborator

I'm not entirely sure the best way to format buffers, because they are not line oriented like a lot of practical string parsing is. So showing the whole buffer is probably really noisy. Maybe something like byte offsets plus bytes in a similar manner to what is being proposed from surrounding context like:

     | 1050 1051 1052 1053 1054 1055
0xFF | 0xFF    X 0xFF 0xFF 0xFF 0xFF
               ^
Expected byte 0x00.

@wavebeem
Copy link
Collaborator

wavebeem commented Apr 6, 2018

are the 1050, 1051, etc the byte offsets? what is the 0xFF on the left in the gutter area?

@theqabalist
Copy link
Collaborator

theqabalist commented Apr 6, 2018

Yeah, those are byte offsets. I may have misinterpreted the gutter in the original formatting thing, but I assumed it was the actual value from the position, since the position had an x in it. In looking at it again, I suppose it could be the line number reference.

@wavebeem
Copy link
Collaborator

wavebeem commented Apr 6, 2018

Maybe output similar to hexdump would make sense?

0000000 65 78 70 6f 72 74 20 63 6c 61 73 73 20 4c 6f 63
0000010 61 74 69 6f 6e 20 7b 0a 20 20 63 6f 6e 73 74 72
0000020 75 63 74 6f 72 28 0a 20 20 20 20 72 65 61 64 6f
0000030 6e 6c 79 20 6f 66 66 73 65 74 3a 20 6e 75 6d 62
0000040 65 72 2c 0a 20 20 20 20 72 65 61 64 6f 6e 6c 79
0000050 20 6c 69 6e 65 3a 20 6e 75 6d 62 65 72 2c 0a 20
0000060 20 20 20 72 65 61 64 6f 6e 6c 79 20 63 6f 6c 75
0000070 6d 6e 3a 20 6e 75 6d 62 65 72 0a 20 20 29 20 7b
0000080 7d 0a 0a 20 20 61 64 64 43 68 75 6e 6b 28 74 65
0000090 78 74 3a 20 73 74 72 69 6e 67 29 20 7b 0a 20 20

and then we can put ^^ pointing to the offending byte? (and still keep the | around the gutter)

@theqabalist
Copy link
Collaborator

I like that too, although I worry about length a bit. Like some of the messages I have parsed are like 1300 bytes, and displaying the entirety of it seems overwhelming.

@wavebeem
Copy link
Collaborator

wavebeem commented Apr 7, 2018

Oh, I definitely meant not the entirety of it, haha. I like how hexdump does "16 bytes = 1 line" as the equivalence. Maybe even 8 would be good? But just treating n number of bytes as a "line" and giving "byte offsets" instead of line numbers seems legit (also lowercase hex formatting without 0x in front is good for readability)

@wavebeem
Copy link
Collaborator

wavebeem commented Apr 7, 2018

So the idea is we would display n number of lines before an error (and maybe m number of lines after?) and a binary "line" is 16 bytes (or 8, not sure)

@wavebeem
Copy link
Collaborator

wavebeem commented Apr 7, 2018

Thanks! I see two things left:

  • Pad the line numbers with spaces as necessary (and add a test for this)
  • Implement binary formatError as described above

padding example

-- Parsing failed ------------------------

   8 | }
   9 | const n = Math.floor(10 * Math.random());
> 10 | for (var x = 0; x < n; x++ {
     |                            ^

Expected one of the following:

foo, bar, baz, quux

@halfzebra
Copy link
Contributor Author

Thanks for the feedback, I'll look into it asap. 👍

@halfzebra
Copy link
Contributor Author

Just an update, I'm still working on this 🙂

@wavebeem
Copy link
Collaborator

wavebeem commented Apr 16, 2018 via email

@wavebeem
Copy link
Collaborator

@halfzebra are you still working on this? no problem if you still don't have time; i might start looking into it too.

@halfzebra
Copy link
Contributor Author

halfzebra commented May 7, 2018

@wavebeem Hi Brian, I just have pushed the latest implementation for string parser to support the line number padding.

Looking into the binary formatError now. Please let me know what you think!

src/parsimmon.js Outdated
var showToLineIndex =
lineWithErrorIndex + 3 > inputLinesLength
? inputLinesLength
: lineWithErrorIndex + 3;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are -2 and +3 the number of lines of context to show before and after the error message? I think putting them in variables would help make it more clear what's going on here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, will do!

var answer = Parsimmon.formatError(input, parser.parse(input));

assert.deepEqual(answer, expectation);
});
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind adding test cases for 3 and 4 digit line numbers? This is looking good but it would be nice to have.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback, I'll include that kind of test asap.

@wavebeem
Copy link
Collaborator

Coming along nicely 😄

@wavebeem
Copy link
Collaborator

what are the odds i'm going through github and see you just pushed 33 seconds ago?! i hadn't even gotten the email yet 😄

@halfzebra
Copy link
Contributor Author

Finally made the first draft of the buffer parsing error formatter, please let me know if there's anything wrong with the implementation.

I will put some more time into this and hopefully finish it soon 🙂

Copy link
Collaborator

@wavebeem wavebeem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a couple questions and one change, please :)

[0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0],
[0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0]
)
);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ooh, this is a cool trick to get the array formatted how you like :]

" 8 | 0 0 0 0 0 0 0 0 0 0\n" +
" 9 | 0 0 0 0 0 0 0 0 0 0\n" +
" 10 | 0 0 0 0 0 0 0 0 0 0\n" +
" 11 | 0 0 0 0 0 0 0 0 0 0\n" +
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think each hex byte should be written as two digits, like 00 instead of 0 so that it is easier to read

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I will address this asap! 👍

src/parsimmon.js Outdated
}
return {
from: byteRange.from / bytesPerLine,
to: byteRange.to / bytesPerLine
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you need to do like a Math.floor(from/ perLine) here so its stays an integer? seems like this could end up with weird values

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could, thanks for pointing that out! 👍

@wavebeem
Copy link
Collaborator

wavebeem commented Jun 19, 2018

I've been looking at how some hex editors work for displaying information in a
more human friendly way and there's a couple things I'd like to see still:

  • "line numbers" should be in hexadecimal
  • each "line" should be eight bytes
  • each group of four bytes should have two spaces after it so it's easier to
    read, making it sort of feel like paragraphs

before

   20 | 00 00 00 00 00 00 00 00 00 00
   30 | 00 00 00 00 00 00 00 00 00 00
   40 | 00 00 00 00 00 00 00 00 00 00
   50 | 00 00 00 00 00 00 00 00 00 00
   60 | 00 00 00 00 00 00 00 00 00 00
>  70 | 00 00 ff 00 00 00 00 00 00 00
      |       ^^
   80 | 00 00 00 00 00 00 00 00 00 00
   90 | 00 00 00 00 00 00 00 00 00 00
  100 | 00 00 00 00 00 00 00 00 00 00
  110 | 00 00 00 00 00 00 00 00 00 00

after

   20 | 00 00 00 00  00 00 00 00
   30 | 00 00 00 00  00 00 00 00
   40 | 00 00 00 00  00 00 00 00
   50 | 00 00 00 00  00 00 00 00
   60 | 00 00 00 00  00 00 00 00
>  70 | 00 00 ff 00  00 00 00 00
      |       ^^
   80 | 00 00 00 00  00 00 00 00
   90 | 00 00 00 00  00 00 00 00
   a0 | 00 00 00 00  00 00 00 00
   b0 | 00 00 00 00  00 00 00 00

This way it's easy to visually jump to the specific bytes without having to
count, and also 4-byte groupings are super common anyway (32-bit integers). And
then everything is hex-based and consistent, like in the hex editors I'm looking
at.

Also if you get tired of this I'm always happy to finish it up too, I know I can
be a little picky when it comes to new features.

image

@anko
Copy link
Contributor

anko commented Jun 19, 2018

My favourite hobby is ruining parties by bringing up the full extent and horror of Unicode, so here goes! 🙌

Any column-sensitive formatting could be thrown off by characters that are specified as rendered at a different width, since this applies even in monospace:

I'm unsure whether this list is exhaustive, and I don't know how to find out.

To make matters worse, some of these are rendered differently in different terminals. For example, 'a​b' (with a zero-width space character between the a and b) is rendered like this in Firefox's console—

FF console representation

—but like this in alacritty (a terminal)—

alacritty representation

I've encountered terminals that render fullwidth characters as single-width, and some double-width. If some terminal uses a web browser as a renderer, who knows what it'll do. 🤷‍♀️ 🤷‍♂️


So if we want the ^ on the next line to point at the right thing in the output, and we can't rely on the characters being printed with any particular width, the only option I can see is to replace any non-unit-width characters in the column-sensitive part of the output with a '�' (U+FFFD; the Unicode replacement character), and add a warning to the end of the error text that explains this, and at least provides a mapping for the poor user in charge of figuring it out. Something like this:

Some Unicode characters were omitted due to display limitations, and appear as '�'.
These are, in order of appearance:
  - U+200B 'ZERO WIDTH SPACE': ​
  - U+FF21 'FULLWIDTH LATIN CAPITAL LETTER A': A

We'd need a fixed list of all the non-unit-width characters, Unicode codepoints, and names, which we can get from The Unicode Consortium's official listings. And then we hope they never feel like adding any more later.

@wavebeem
Copy link
Collaborator

heh, well at least this won't apply to to the buffer output format :)

do you know what other compilers and parsers do with this kind of input? i think it would at least be fairly edge case most of the time so not the end of the world if we don't address it yet

@halfzebra
Copy link
Contributor Author

@wavebeem thanks for the feedback, the byte buffer parsing error requirements are quite straightforward. I think I can get this done over the next weekend. I've never done any work with byte buffers, so it's a trial and error. I'll let you know!

@anko I'm afraid you haven't ruined the party 🙂It's surely an interesting case, but I feel like in this PR I have to focus on fixing the formatting first.

@halfzebra
Copy link
Contributor Author

Ready for a review! 🙂

Copy link
Collaborator

@wavebeem wavebeem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just had a question on the one line where you didn't do rounding right before the one where you did.

This looks great, and thank you so much for all your work on the PR! I'm gonna take a little bit of time just to play with it before I merge it, but I should get time to look at it this weekend!

src/parsimmon.js Outdated
return "one of " + expected.join(", ");

return {
from: byteRange.from / bytesPerLine,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason this doesn't need to be rounded also?

Copy link
Contributor Author

@halfzebra halfzebra Jun 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's because byteRange.from should always be dividable by bytesPerLine, since it's derived from:

// Removes the reminder from `i`
var byteLineWithErrorIndex = i - (i % bytesPerLine); 

var byteRange = rangeFromIndexAndOffsets(
  byteLineWithErrorIndex, // <- Used to calculate `byteRange.from`
  bytesBefore,
  bytesAfter + bytesPerLine,
  input.length
);

function rangeFromIndexAndOffsets(i, before, after, length) {
  return {
      // `before` and `i` are dividable by `bytesPerLine`
      from: i - before > 0 ? i - before : 0,
      to: i + after > length ? length : i + after
    };
}

Maybe this code should be improved, because this particular place is a bit indirect.

Please let me know if you still think there's a problem with this.

@wavebeem
Copy link
Collaborator

I noticed a bug when the range you're looking at for a binary parser includes byte offsets of multiple digit counts:

> P.seq(
  P.any.times(0xfff),
  P.Binary.byte(0x00)
).tryParse(fs.readFileSync("src/parsimmon.js"))

Error:
-- PARSING FAILED --------------------------------------------------

  fd0 | 69 6f 6e 28  70 61 72 73
  fd8 | 65 64 29 20  7b 0a 20 20
  fe0 | 20 20 76 61  72 20 6e 61
  fe8 | 6d 65 64 50  61 72 73 65
  ff0 | 64 20 3d 20  6d 61 70 28
> ff8 | 66 75 6e 63  74 69 6f 6e
      |                       ^^
  1000 | 28 6e 61 6d  65 2c 20 69
  1008 | 29 20 7b 0a  20 20 20 20
  1010 | 20 20 72 65  74 75 72 6e
  1018 | 20 5b 6e 61  6d 65 2c 20

Expected:

0x00

Notice how the ff8 and 1000 are not aligned correctly, and the line made of | doesn't line up either. It seems fine as long as we don't cross from a 3-4 digit boundry, 2-3 digit boundary, etc.

The bug doesn't happen with string parsers either.

Would you mind adding a test to cover this use case and fixing the issue?

Also, this feature is really cool! It's hard to imagine Parsimmon without this now. 😄

@wavebeem
Copy link
Collaborator

Playing with this a bit I think the problem is you forgot to multiply by 8 in this section. Changing it fixed the display issue for me:

  if (isBuffer(input)) {
    lastLineNumberLabelLength = (lineRange.to > 0
      ? 8 * lineRange.to - 1
      : 8 * lineRange.to
    ).toString(16).length;
    if (lastLineNumberLabelLength < 2) {
      lastLineNumberLabelLength = 2;
    }
  }

It's a little tricky there because the "line numbers" actually need to be multiplied by 8 since there's 8 bytes per line and it's actually a byte offset

@halfzebra
Copy link
Contributor Author

@wavebeem thanks for the feedback 👍

I have added the test and fixed the problem!

@wavebeem
Copy link
Collaborator

Fantastic, thanks! I think this is all good to go.

I'll merge this, update the changelog, release to npm, and tweet about it soon.

Would you like me to @ mention you in the tweet?

@halfzebra
Copy link
Contributor Author

Thands, that would be great!

@wavebeem wavebeem merged commit 7345dac into jneen:master Jun 23, 2018
@halfzebra halfzebra deleted the error-formatting branch June 23, 2018 22:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants