Error handling #18

tjvr · 2017-03-13T15:55:59Z

Currently, if the lexer finds an error, it returns a single ERRORTOKEN containing the whole of the rest of the buffer. This is suboptimal:

The name ERRORTOKEN is hard-coded (taken from Python). We probably want it user-customisable.
The token can span many lines, which makes the resulting error message hard to read.
It leaks the entire buffer (thanks to V8's implementation of slice() using SlicedStrings).
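
To make the first two points concrete, here is a minimal sketch of an error token with a caller-chosen type name that stops at the end of the current line. The helper `makeErrorToken`, its parameters, and the token shape are all hypothetical, not moo's actual implementation, and the sketch does nothing about the SlicedString retention mentioned in the third point.

```javascript
// Hypothetical helper: build an error token that covers only the rest of
// the current line, with a type name supplied by the caller.
function makeErrorToken(buffer, index, errorType = 'error') {
  // Stop at the next newline so the message stays on one line.
  const newline = buffer.indexOf('\n', index)
  const end = newline === -1 ? buffer.length : newline
  return {
    type: errorType,                 // user-customisable instead of hard-coded
    value: buffer.slice(index, end), // the rest of this line only
    offset: index,
  }
}

// The lexer is assumed to have failed at offset 6, the first `$`.
const errorToken = makeErrorToken('ok ok\n$$$ bad\nmore input', 6, 'myError')
```

Here `errorToken.value` is `'$$$ bad'` rather than everything up to the end of the buffer.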

Other considerations:

Is an error token preferable to throwing an exception? (Probably—exceptions should be exceptional, and syntax errors in a parser aren't.)
If/when we have a Stream wrapper, we can only return an ERRORTOKEN if we know we've seen the end of the stream. Otherwise, more data might still complete a valid token.

nathan · 2017-03-13T16:35:02Z

> If/when we have a Stream wrapper

In its current state, the API is too generic to allow a useful stream wrapper. Consider the following language:

moo.compile({
  even: /(ab)+(?!a)/,
  odd: /(ab)*a/,
})

Suppose the stream emits the chunks aba, ba, bab. The correct tokenization of the concatenated input abababab is a single even token, but the lexer cannot determine that until it reaches the end of the stream: the first chunk, aba, already matches odd on its own, so emitting any earlier will result in an incorrect token.
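
The effect can be reproduced without moo. The sketch below uses a hypothetical stand-in lexer (`lexAll`, built on sticky regexes, trying the rules in the order given) and compares lexing the whole input with lexing each chunk independently. It is an assumption that this ordering mirrors how the compiled lexer picks a rule.

```javascript
// Hypothetical stand-in for the compiled lexer: try each rule, in order,
// anchored at the current position (sticky flag `y`).
const rules = [
  ['even', /(ab)+(?!a)/y],
  ['odd', /(ab)*a/y],
]

function lexAll(input) {
  const tokens = []
  let pos = 0
  while (pos < input.length) {
    let matched = false
    for (const [type, re] of rules) {
      re.lastIndex = pos
      const m = re.exec(input)
      if (m && m[0].length > 0) {
        tokens.push({type, value: m[0]})
        pos += m[0].length
        matched = true
        break
      }
    }
    if (!matched) {
      // Nothing matches: report the rest of the input as an error token.
      tokens.push({type: 'error', value: input.slice(pos)})
      break
    }
  }
  return tokens
}

// Lexing the complete input versus lexing each chunk on its own.
const whole = lexAll('abababab')
const chunked = ['aba', 'ba', 'bab'].flatMap(lexAll)
```

With these rules, `whole` is a single even token, while `chunked` starts with an odd token for aba and then cannot lex ba or bab at all.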

tjvr · 2017-03-13T16:41:22Z

Wow, good point; I'd missed that.

@Hardmath123 It sounds like we can't seamlessly integrate moo into nearley; at least not with proper feed() / Stream support. :(

> In its current state

Am I to read you don't like the API? You helped design it! :-)

nathan · 2017-03-13T16:53:19Z

> Am I to read you don't like the API? You helped design it! :-)

No, I do like the API! I was simply pointing out that it's incompatible with a stream API that doesn't re-lex the stream when it gets new data.

If you leave the task of pushing data only at token boundaries up to the user, then it's easy to add a stream interface (that's what feed() is, essentially). Packages like stream-split are good for, e.g., pushing the data line-by-line.
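
As a rough illustration of the line-by-line idea, the sketch below buffers incoming chunks and only hands complete lines to a callback. `createLineFeeder` is a hypothetical helper, not the stream-split API, and it assumes that no token in the language spans a newline, so every line break is a safe token boundary.

```javascript
// Hypothetical helper: buffer arbitrary chunks and call `onLine` only with
// complete lines. Assumes no token spans a newline.
function createLineFeeder(onLine) {
  let pending = ''
  return {
    push(chunk) {
      pending += chunk
      let nl
      while ((nl = pending.indexOf('\n')) !== -1) {
        onLine(pending.slice(0, nl + 1)) // include the newline itself
        pending = pending.slice(nl + 1)
      }
    },
    end() {
      // Flush a final line that has no trailing newline.
      if (pending) onLine(pending)
      pending = ''
    },
  }
}

// Chunks arrive split mid-line; the callback still sees whole lines.
const lines = []
const feeder = createLineFeeder(line => lines.push(line))
feeder.push('let x =')
feeder.push(' 1\nlet y')
feeder.push(' = 2\n')
feeder.end()
```

In real use the callback would pass each line to the lexer (for example via feed()) instead of collecting it in an array.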

It's also possible, I guess, to re-lex the last token when we get new data, but you can still break this if you have two tokens which, when concatenated, form a different token. It also makes the API weird, because you sometimes need to return a token twice.
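
Here is one reading of that failure mode, as a sketch. The rules (`abc`, `a`, `b`, `c`) and the helpers `lex` and `feedWithRelex` are hypothetical. The re-lex strategy drops the last token and lexes its text again together with the new chunk, which is not enough when the token before it should also have been absorbed.

```javascript
// Hypothetical rules, tried in order: `abc` comes first so it wins over `a`.
const rule = /(abc)|(a)|(b)|(c)/y
const names = ['abc', 'a', 'b', 'c']

// Minimal lexer; anything unlexable is silently ignored in this sketch.
function lex(input) {
  const tokens = []
  rule.lastIndex = 0
  let m
  while (rule.lastIndex < input.length && (m = rule.exec(input))) {
    const group = m.slice(1).findIndex(g => g !== undefined)
    tokens.push({type: names[group], value: m[0]})
  }
  return tokens
}

// Re-lex strategy: on each new chunk, drop the last token and lex its text
// again together with the new data.
function feedWithRelex(chunks) {
  let tokens = []
  for (const chunk of chunks) {
    const last = tokens.pop()
    const text = (last ? last.value : '') + chunk
    tokens = tokens.concat(lex(text))
  }
  return tokens
}

const wholeTypes = lex('abc').map(t => t.type)
const relexedTypes = feedWithRelex(['ab', 'c']).map(t => t.type)
```

Lexing abc in one go gives a single abc token, but feeding ab and then c gives a, b, c: only b is re-lexed, and a has already been committed. The sketch also hides the second problem, since a consumer would have seen b before it was popped and re-emitted.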

nathan · 2017-03-17T16:26:35Z

Fixed by #25

tjvr added the enhancement label Mar 13, 2017

nathan mentioned this issue Mar 13, 2017: Streams API #20 (Open)

tjvr mentioned this issue Mar 14, 2017: Add error tokens #25 (Merged)

nathan closed this as completed Mar 17, 2017

tjvr mentioned this issue Mar 17, 2017: Add Lexer#stream() #36 (Open)
