Error formatting #240

halfzebra · 2018-04-06T17:13:49Z

This PR addresses #239

I'm not using any external dependencies to add colors, because it all boils down to having supports-color included into the project and I don't feel like forcing this in any way.

I'm willing to work more on this to get it to the point where users can feel a positive change in their experience with the error reporting.

Please leave your feedback! 👐

Here's an example of how the error messages might look with this PR included:

Node

Browser(Chrome)

Pad the line numbers with spaces as necessary (and add a test for this)
Implement binary formatError as described above

Buffer error formatting

"line numbers" should be in hexadecimal
each "line" should be eight bytes
each group of four bytes should have two spaces after it so it's easier to
read, making it sort of feel like paragraphs

coveralls · 2018-04-06T17:20:31Z

Coverage remained the same at 100.0% when pulling 76690c7 on halfzebra:error-formatting into 1a4f6a9 on jneen:master.

coveralls · 2018-04-06T17:20:31Z

Coverage decreased (-0.1%) to 99.854% when pulling 343bbf0 on halfzebra:error-formatting into 74f10d6 on jneen:master.

wavebeem · 2018-04-06T17:33:25Z

Looks pretty good so far.

When you're done with this I would like to get code coverage back up to 100% before merging.

I think we need some discussion about what the formatting should look like for Buffer inputs (@theqabalist?)

Please use the map function defined in parsimmon.js since .map method doesn't exist in IE7 (which we still support for now).

Also not really sure about the newLine variable vs using "\n" or "\n\n" in a couple spots.
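For readers unfamiliar with the IE7 constraint mentioned above, here is a minimal sketch of a mapping helper that avoids `Array.prototype.map`. The name `mapArray` and its argument order are hypothetical; this is not necessarily how the helper in parsimmon.js is written.

```javascript
// Hypothetical helper: maps over an array without Array.prototype.map,
// which is missing in old browsers such as IE7.
function mapArray(fn, arr) {
  var result = [];
  for (var i = 0; i < arr.length; i++) {
    result.push(fn(arr[i], i, arr));
  }
  return result;
}

var mapped = mapArray(function(n) {
  return "line " + n;
}, [8, 9, 10]);
```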

theqabalist · 2018-04-06T18:48:13Z

I'm not entirely sure the best way to format buffers, because they are not line oriented like a lot of practical string parsing is. So showing the whole buffer is probably really noisy. Maybe something like byte offsets plus bytes in a similar manner to what is being proposed from surrounding context like:

     | 1050 1051 1052 1053 1054 1055
0xFF | 0xFF    X 0xFF 0xFF 0xFF 0xFF
               ^
Expected byte 0x00.

wavebeem · 2018-04-06T20:37:20Z

are the 1050, 1051, etc the byte offsets? what is the 0xFF on the left in the gutter area?

theqabalist · 2018-04-06T23:17:51Z

Yeah, those are byte offsets. I may have misinterpreted the gutter in the original formatting thing, but I assumed it was the actual value from the position, since the position had an x in it. In looking at it again, I suppose it could be the line number reference.

wavebeem · 2018-04-06T23:23:51Z

Maybe output similar to hexdump would make sense?

0000000 65 78 70 6f 72 74 20 63 6c 61 73 73 20 4c 6f 63
0000010 61 74 69 6f 6e 20 7b 0a 20 20 63 6f 6e 73 74 72
0000020 75 63 74 6f 72 28 0a 20 20 20 20 72 65 61 64 6f
0000030 6e 6c 79 20 6f 66 66 73 65 74 3a 20 6e 75 6d 62
0000040 65 72 2c 0a 20 20 20 20 72 65 61 64 6f 6e 6c 79
0000050 20 6c 69 6e 65 3a 20 6e 75 6d 62 65 72 2c 0a 20
0000060 20 20 20 72 65 61 64 6f 6e 6c 79 20 63 6f 6c 75
0000070 6d 6e 3a 20 6e 75 6d 62 65 72 0a 20 20 29 20 7b
0000080 7d 0a 0a 20 20 61 64 64 43 68 75 6e 6b 28 74 65
0000090 78 74 3a 20 73 74 72 69 6e 67 29 20 7b 0a 20 20

and then we can put ^^ pointing to the offending byte? (and still keep the | around the gutter)

theqabalist · 2018-04-07T04:03:20Z

I like that too, although I worry about length a bit. Like some of the messages I have parsed are like 1300 bytes, and displaying the entirety of it seems overwhelming.

wavebeem · 2018-04-07T04:22:51Z

Oh, I definitely meant not the entirety of it, haha. I like how hexdump does "16 bytes = 1 line" as the equivalence. Maybe even 8 would be good? But just treating n number of bytes as a "line" and giving "byte offsets" instead of line numbers seems legit (also lowercase hex formatting without 0x in front is good for readability)

wavebeem · 2018-04-07T04:23:54Z

So the idea is we would display n number of lines before an error (and maybe m number of lines after?) and a binary "line" is 16 bytes (or 8, not sure)
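The "n bytes = 1 line" idea can be sketched as below. The function name `chunkBytes` is an assumption for illustration, and the choice of 8 bytes per line in the usage is only one of the two sizes discussed here.

```javascript
// Splits an array-like of bytes into "lines" of bytesPerLine bytes each,
// tagging every line with the byte offset of its first byte.
function chunkBytes(bytes, bytesPerLine) {
  var lines = [];
  for (var offset = 0; offset < bytes.length; offset += bytesPerLine) {
    lines.push({
      offset: offset,
      bytes: Array.prototype.slice.call(bytes, offset, offset + bytesPerLine)
    });
  }
  return lines;
}

// Ten bytes at 8 bytes per line: one full line and one partial line.
var byteLines = chunkBytes([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 8);
```

Each entry carries its own byte offset, so the gutter can show offsets instead of line numbers.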

wavebeem · 2018-04-07T16:04:40Z

Thanks! I see two things left:

Pad the line numbers with spaces as necessary (and add a test for this)
Implement binary formatError as described above

padding example

-- Parsing failed ------------------------

   8 | }
   9 | const n = Math.floor(10 * Math.random());
> 10 | for (var x = 0; x < n; x++ {
     |                            ^

Expected one of the following:

foo, bar, baz, quux
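A minimal sketch of the padding shown in the example above, assuming 1-based line numbers and a "> " marker on the failing line. The helper names `leftPad` and `gutterLabels` are hypothetical and not taken from the PR.

```javascript
// Left-pads a value with spaces up to the given width.
function leftPad(value, width) {
  var out = String(value);
  while (out.length < width) {
    out = " " + out;
  }
  return out;
}

// Builds gutter labels for lines first..last, marking errorLine with ">".
// All labels are padded to the width of the largest line number.
function gutterLabels(first, last, errorLine) {
  var width = String(last).length;
  var labels = [];
  for (var n = first; n <= last; n++) {
    var prefix = n === errorLine ? "> " : "  ";
    labels.push(prefix + leftPad(n, width) + " | ");
  }
  return labels;
}

var gutter = gutterLabels(8, 10, 10);
```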

halfzebra · 2018-04-07T18:28:23Z

Thanks for the feedback, I'll look into it asap. 👍

halfzebra · 2018-04-16T09:00:08Z

Just an update, I'm still working on this 🙂

wavebeem · 2018-04-16T11:22:59Z

No rush 🙂

wavebeem · 2018-04-28T17:10:47Z

@halfzebra are you still working on this? no problem if you still don't have time; i might start looking into it too.

halfzebra · 2018-05-07T07:24:43Z

@wavebeem Hi Brian, I have just pushed the latest implementation of the string parser error formatting, which now supports line number padding.

Looking into the binary formatError now. Please let me know what you think!

wavebeem · 2018-05-07T23:05:42Z

src/parsimmon.js

+  var showToLineIndex =
+    lineWithErrorIndex + 3 > inputLinesLength
+      ? inputLinesLength
+      : lineWithErrorIndex + 3;


Are -2 and +3 the number of lines of context to show before and after the error message? I think putting them in variables would help make it more clear what's going on here.

Good point, will do!
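One way the suggestion could look, as a sketch only. The variable names are assumptions, and it also assumes the upper bound is exclusive (as with `Array.prototype.slice`), which would make +3 mean "two lines after the error".

```javascript
// Assumed names; the values mirror the -2 and +3 offsets in the diff.
var linesBeforeError = 2;
var linesAfterError = 2;

// Returns the half-open range [from, to) of line indexes to print.
function visibleLineRange(lineWithErrorIndex, inputLinesLength) {
  var from = lineWithErrorIndex - linesBeforeError;
  var to = lineWithErrorIndex + linesAfterError + 1; // +1: end is exclusive
  return {
    from: from < 0 ? 0 : from,
    to: to > inputLinesLength ? inputLinesLength : to
  };
}

var middle = visibleLineRange(9, 100);
var nearStart = visibleLineRange(0, 3);
```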

wavebeem · 2018-05-07T23:07:06Z

test/core/formatError.test.js

+    var answer = Parsimmon.formatError(input, parser.parse(input));
+
+    assert.deepEqual(answer, expectation);
+  });


Would you mind adding test cases for 3 and 4 digit line numbers? This is looking good but it would be nice to have.

Thanks for the feedback, I'll include that kind of test asap.

wavebeem · 2018-05-10T18:48:48Z

Coming along nicely 😄

wavebeem · 2018-06-17T18:59:01Z

what are the odds i'm going through github and see you just pushed 33 seconds ago?! i hadn't even gotten the email yet 😄

halfzebra · 2018-06-17T19:04:30Z

Finally made the first draft of the buffer parsing error formatter, please let me know if there's anything wrong with the implementation.

I will put some more time into this and hopefully finish it soon 🙂

wavebeem

a couple questions and one change, please :)

wavebeem · 2018-06-17T19:07:33Z

test/core/formatError.test.js

+        [0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0],
+        [0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0]
+      )
+    );


ooh, this is a cool trick to get the array formatted how you like :]

wavebeem · 2018-06-17T19:09:12Z

test/core/formatError.test.js

+      "   8 | 0 0  0 0 0 0 0 0 0 0\n" +
+      "   9 | 0 0  0 0 0 0 0 0 0 0\n" +
+      "  10 | 0 0  0 0 0 0 0 0 0 0\n" +
+      "  11 | 0 0  0 0 0 0 0 0 0 0\n" +


i think each hex byte should be written as two digits, like 00 instead of 0 so that it is easier to read

Good point, I will address this asap! 👍

wavebeem · 2018-06-17T19:10:18Z

src/parsimmon.js

+  }
+  return {
+    from: byteRange.from / bytesPerLine,
+    to: byteRange.to / bytesPerLine


do you need to do like a Math.floor(from / perLine) here so it stays an integer? seems like this could end up with weird values

It could, thanks for pointing that out! 👍
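A sketch of the rounding being discussed. Using `Math.ceil` for the upper bound is an assumption made here so that a partially covered last line is still included; the thread only mentions `Math.floor`, and the name `toLineRange` is hypothetical.

```javascript
// Converts a half-open byte range into a range of whole byte "lines".
// Math.floor/Math.ceil keep the results integral even when the byte
// offsets are not multiples of bytesPerLine.
function toLineRange(byteRange, bytesPerLine) {
  return {
    from: Math.floor(byteRange.from / bytesPerLine),
    to: Math.ceil(byteRange.to / bytesPerLine)
  };
}

// Without rounding this would yield { from: 1.5, to: 4.2 }.
var lineRange = toLineRange({ from: 15, to: 42 }, 10);
```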

wavebeem · 2018-06-19T16:22:59Z

I've been looking at how some hex editors work for displaying information in a
more human friendly way and there's a couple things I'd like to see still:

"line numbers" should be in hexadecimal
each "line" should be eight bytes
each group of four bytes should have two spaces after it so it's easier to
read, making it sort of feel like paragraphs

before

   20 | 00 00 00 00 00 00 00 00 00 00
   30 | 00 00 00 00 00 00 00 00 00 00
   40 | 00 00 00 00 00 00 00 00 00 00
   50 | 00 00 00 00 00 00 00 00 00 00
   60 | 00 00 00 00 00 00 00 00 00 00
>  70 | 00 00 ff 00 00 00 00 00 00 00
      |       ^^
   80 | 00 00 00 00 00 00 00 00 00 00
   90 | 00 00 00 00 00 00 00 00 00 00
  100 | 00 00 00 00 00 00 00 00 00 00
  110 | 00 00 00 00 00 00 00 00 00 00

after

   20 | 00 00 00 00  00 00 00 00
   30 | 00 00 00 00  00 00 00 00
   40 | 00 00 00 00  00 00 00 00
   50 | 00 00 00 00  00 00 00 00
   60 | 00 00 00 00  00 00 00 00
>  70 | 00 00 ff 00  00 00 00 00
      |       ^^
   80 | 00 00 00 00  00 00 00 00
   90 | 00 00 00 00  00 00 00 00
   a0 | 00 00 00 00  00 00 00 00
   b0 | 00 00 00 00  00 00 00 00

This way it's easy to visually jump to the specific bytes without having to
count, and also 4-byte groupings are super common anyway (32-bit integers). And
then everything is hex-based and consistent, like in the hex editors I'm looking
at.

Also if you get tired of this I'm always happy to finish it up too, I know I can
be a little picky when it comes to new features.
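The layout requested in this comment can be sketched as follows: two-digit lowercase hex per byte, single spaces between bytes, and a double space between groups of four. The function names are hypothetical, and this is not the code that was merged.

```javascript
var GROUP_SIZE = 4;

// Two-digit lowercase hex for one byte, e.g. 0 -> "00", 255 -> "ff".
function byteToHex(b) {
  var s = b.toString(16);
  return s.length < 2 ? "0" + s : s;
}

// Formats one "line" of bytes, with two spaces between groups of four.
function formatByteLine(bytes) {
  var out = "";
  for (var i = 0; i < bytes.length; i++) {
    if (i > 0) {
      out += i % GROUP_SIZE === 0 ? "  " : " ";
    }
    out += byteToHex(bytes[i]);
  }
  return out;
}

// Column (0-based, after the gutter) where "^^" starts for byte index i:
// three columns per byte plus one extra per group boundary already passed.
function caretColumn(i) {
  return i * 3 + Math.floor(i / GROUP_SIZE);
}

var hexLine = formatByteLine([0, 0, 255, 0, 0, 0, 0, 0]);
var marker = new Array(caretColumn(2) + 1).join(" ") + "^^";
```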

anko · 2018-06-19T21:20:31Z

My favourite hobby is ruining parties by bringing up the full extent and horror of Unicode, so here goes! 🙌

Any column-sensitive formatting could be thrown off by characters that are specified as rendered at a different width, since this applies even in monospace:

the zero-width space, and
fullwidth forms in Asian scripts.

I'm unsure whether this list is exhaustive, and I don't know how to find out.

To make matters worse, some of these are rendered differently in different terminals. For example, 'ab' (with a zero-width space character between the a and b) is rendered like this in Firefox's console—

—but like this in alacritty (a terminal)—

I've encountered terminals that render fullwidth characters as single-width, and some double-width. If some terminal uses a web browser as a renderer, who knows what it'll do. 🤷‍♀️ 🤷‍♂️

So if we want the ^ on the next line to point at the right thing in the output, and we can't rely on the characters being printed with any particular width, the only option I can see is to replace any non-unit-width characters in the column-sensitive part of the output with a '�' (U+FFFD; the Unicode replacement character), and add a warning to the end of the error text that explains this, and at least provides a mapping for the poor user in charge of figuring it out. Something like this:

Some Unicode characters were omitted due to display limitations, and appear as '�'.
These are, in order of appearance:
  - U+200B 'ZERO WIDTH SPACE': 
  - U+FF21 'FULLWIDTH LATIN CAPITAL LETTER A': Ａ

We'd need a fixed list of all the non-unit-width characters, Unicode codepoints, and names, which we can get from The Unicode Consortium's official listings. And then we hope they never feel like adding any more later.
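A rough sketch of the replacement idea, assuming a deliberately tiny and non-exhaustive set of code points (U+200B and the fullwidth range U+FF01 to U+FF60). A real list would have to come from the Unicode data files; the function names are hypothetical.

```javascript
// Assumption: a small, non-exhaustive set of code points that commonly
// render at a width other than one column.
function isNonUnitWidth(codePoint) {
  return (
    codePoint === 0x200b || // ZERO WIDTH SPACE
    (codePoint >= 0xff01 && codePoint <= 0xff60) // fullwidth forms
  );
}

// Replaces such characters with U+FFFD and records what was replaced,
// in order of appearance. Works on UTF-16 code units, which is enough
// for the ranges above; characters outside the BMP pass through as-is.
function sanitizeLine(line) {
  var text = "";
  var replaced = [];
  for (var i = 0; i < line.length; i++) {
    var code = line.charCodeAt(i);
    if (isNonUnitWidth(code)) {
      text += "\ufffd";
      replaced.push("U+" + code.toString(16).toUpperCase());
    } else {
      text += line.charAt(i);
    }
  }
  return { text: text, replaced: replaced };
}

var sanitized = sanitizeLine("a\u200bb\uff21");
```

The `replaced` list is what a warning footer like the one above would be built from.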

wavebeem · 2018-06-19T21:23:59Z

heh, well at least this won't apply to the buffer output format :)

do you know what other compilers and parsers do with this kind of input? i think it would at least be fairly edge case most of the time so not the end of the world if we don't address it yet

halfzebra · 2018-06-20T10:28:01Z

@wavebeem thanks for the feedback, the byte buffer parsing error requirements are quite straightforward. I think I can get this done over the next weekend. I've never done any work with byte buffers, so it's a trial and error. I'll let you know!

@anko I'm afraid you haven't ruined the party 🙂 It's surely an interesting case, but I feel like in this PR I have to focus on fixing the formatting first.

halfzebra · 2018-06-20T20:33:28Z

Ready for a review! 🙂

wavebeem

Just had a question on the one line where you didn't do rounding right before the one where you did.

This looks great, and thank you so much for all your work on the PR! I'm gonna take a little bit of time just to play with it before I merge it, but I should get time to look at it this weekend!

wavebeem · 2018-06-21T04:56:59Z

src/parsimmon.js

-  return "one of " + expected.join(", ");
+
+  return {
+    from: byteRange.from / bytesPerLine,


Is there a reason this doesn't need to be rounded also?

It's because byteRange.from should always be divisible by bytesPerLine, since it's derived from:

// Removes the remainder from `i`
var byteLineWithErrorIndex = i - (i % bytesPerLine);
var byteRange = rangeFromIndexAndOffsets(
  byteLineWithErrorIndex, // <- Used to calculate `byteRange.from`
  bytesBefore,
  bytesAfter + bytesPerLine,
  input.length
);

function rangeFromIndexAndOffsets(i, before, after, length) {
  return {
    // `before` and `i` are divisible by `bytesPerLine`
    from: i - before > 0 ? i - before : 0,
    to: i + after > length ? length : i + after
  };
}

Maybe this code should be improved, because this particular place is a bit indirect.

Please let me know if you still think there's a problem with this.

wavebeem · 2018-06-23T15:25:25Z

I noticed a bug when the range you're looking at for a binary parser includes byte offsets of multiple digit counts:

> P.seq(
  P.any.times(0xfff),
  P.Binary.byte(0x00)
).tryParse(fs.readFileSync("src/parsimmon.js"))

Error:
-- PARSING FAILED --------------------------------------------------

  fd0 | 69 6f 6e 28  70 61 72 73
  fd8 | 65 64 29 20  7b 0a 20 20
  fe0 | 20 20 76 61  72 20 6e 61
  fe8 | 6d 65 64 50  61 72 73 65
  ff0 | 64 20 3d 20  6d 61 70 28
> ff8 | 66 75 6e 63  74 69 6f 6e
      |                       ^^
  1000 | 28 6e 61 6d  65 2c 20 69
  1008 | 29 20 7b 0a  20 20 20 20
  1010 | 20 20 72 65  74 75 72 6e
  1018 | 20 5b 6e 61  6d 65 2c 20

Expected:

0x00

Notice how the ff8 and 1000 are not aligned correctly, and the line made of | doesn't line up either. It seems fine as long as we don't cross a digit-count boundary (2 to 3 digits, 3 to 4 digits, etc.).

The bug doesn't happen with string parsers, though.

Would you mind adding a test to cover this use case and fixing the issue?

Also, this feature is really cool! It's hard to imagine Parsimmon without this now. 😄

wavebeem · 2018-06-23T18:53:50Z

Playing with this a bit I think the problem is you forgot to multiply by 8 in this section. Changing it fixed the display issue for me:

  if (isBuffer(input)) {
    lastLineNumberLabelLength = (lineRange.to > 0
      ? 8 * lineRange.to - 1
      : 8 * lineRange.to
    ).toString(16).length;
    if (lastLineNumberLabelLength < 2) {
      lastLineNumberLabelLength = 2;
    }
  }

It's a little tricky there because the "line numbers" actually need to be multiplied by 8 since there's 8 bytes per line and it's actually a byte offset
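To illustrate the point about byte offsets, here is a sketch that derives the gutter width from the offset of the last displayed line. It is a variant written for this explanation, not the merged code, and assumes `lineRange.to` is exclusive and that there are 8 bytes per line.

```javascript
var BYTES_PER_LINE = 8;

// Width of the widest hex offset label for byte lines in [from, to).
// The label is a byte offset, so the line index is multiplied by
// BYTES_PER_LINE before converting to hex.
function offsetLabelWidth(lineRange) {
  var lastLine = lineRange.to > 0 ? lineRange.to - 1 : 0;
  var width = (lastLine * BYTES_PER_LINE).toString(16).length;
  return width < 2 ? 2 : width;
}

// Lines 0x1fa..0x203 correspond to byte offsets fd0..1018.
var wide = offsetLabelWidth({ from: 0x1fa, to: 0x204 });
var narrow = offsetLabelWidth({ from: 0, to: 1 });
```

With the width taken from the largest offset, `ff8` would be padded to four columns and line up with `1000`.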

halfzebra · 2018-06-23T19:39:51Z

@wavebeem thanks for the feedback 👍

I have added the test and fixed the problem!

wavebeem · 2018-06-23T20:44:48Z

Fantastic, thanks! I think this is all good to go.

I'll merge this, update the changelog, release to npm, and tweet about it soon.

Would you like me to @ mention you in the tweet?

halfzebra · 2018-06-23T21:30:15Z

Thanks, that would be great!

wavebeem reviewed May 7, 2018

wavebeem reviewed Jun 17, 2018

halfzebra added 5 commits June 20, 2018 15:44

Add better error formatting for general use.

025a0bb

Adjust the tests to the new error formatting.

b23519d

Use the existing map funciton and remove redundant variables

2cc0e48

Fix a bug in the error formatter and add a test to reach 100% code coverage

414c1fb

Implement a smarter way to add padding to line numbers and the line with error marker.

ff31984

halfzebra added 7 commits June 20, 2018 15:44

Improve the code clarity, add a few tests for larger inputs.

2c51dc5

First draft of the buffer error formatting.

47ef2a3

Fix the range conversion issue by rounding the result.

767391d

Fix the formatting issues for byte values and byte error labels.

bbbeefb

Remove dead code, simplify the buffer error formatting. Get the coverage up to 100% by removing unreachable edge-cases.

74a0f6b

Fixed line label formatting inconsistencies for different sizes of buffers.

cd67a10

Convert buffer line numbers to hex, display 8 bytes per line and add the additional separator.

00bbfcc

halfzebra force-pushed the error-formatting branch from 3f773a9 to 00bbfcc Compare June 20, 2018 13:44

Fix lint errors.

28263c0

halfzebra force-pushed the error-formatting branch from 9db8b61 to 28263c0 Compare June 20, 2018 14:12

wavebeem approved these changes Jun 21, 2018

Added a test for error formatting on large buffers.

44b3967

Fix the byte offset number formatting for large buffers.

76690c7

wavebeem merged commit 7345dac into jneen:master Jun 23, 2018

halfzebra deleted the error-formatting branch June 23, 2018 22:31

halfzebra mentioned this pull request Jun 25, 2018

Handle ambiguous unicode character width in error output #263

Closed
