Error handling #18

Closed
tjvr opened this issue Mar 13, 2017 · 4 comments

Comments

@tjvr
Collaborator

tjvr commented Mar 13, 2017

Currently, if the lexer finds an error, it returns a single ERRORTOKEN containing the whole of the rest of the buffer. This is suboptimal:

  • The name ERRORTOKEN is hard-coded (taken from Python). We probably want it user-customisable.
  • It returns many lines, making the resulting message hard to read.
  • It leaks the entire buffer (thanks to V8's implementation of slice() using SlicedStrings).

Other considerations:

  • Is an error token preferable to throwing an exception? (Probably—exceptions should be exceptional, and syntax errors in a parser aren't.)
  • If/when we have a Stream wrapper, we can only return an ERRORTOKEN if we know we've seen the end of the stream. Otherwise, it's possible seeing more data might complete a valid token.
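
To make the first two complaints concrete, here is a minimal sketch of an error token whose type name is chosen by the caller and whose value stops at the end of the offending line. `makeErrorToken` is a hypothetical helper written for illustration; it is not part of moo's API, and it does not address the sliced-string retention issue.

```javascript
// Hypothetical helper, not part of moo: builds an error token with a
// caller-chosen type name and a value limited to the current line.
function makeErrorToken(buffer, offset, type = "error") {
  const newline = buffer.indexOf("\n", offset);
  // Stop at the next newline, or at the end of the buffer if there is none.
  const end = newline === -1 ? buffer.length : newline;
  return { type, value: buffer.slice(offset, end), offset };
}

// The lexer is assumed to have failed at offset 6, the start of line two.
const token = makeErrorToken("x = 1\n$$ oops\ny = 2\n", 6, "ERRORTOKEN");
console.log(token.type, JSON.stringify(token.value));
```

Only the failing line ends up in the message here; whether a real lexer should stop at the line boundary or somewhere else is a design choice this sketch does not settle.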
@nathan
Collaborator

nathan commented Mar 13, 2017

> If/when we have a Stream wrapper

In its current state, the API is too generic to allow a useful stream wrapper. Consider the following language:

moo.compile({
  even: /(ab)+(?!a)/,
  odd: /(ab)*a/,
})

Suppose the stream emits aba, ba, bab. The correct tokenization of abababab is a single even token, but the lexer cannot determine that until it reaches the end of the stream; emitting any earlier will result in an incorrect token.
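
The difference can be reproduced without moo. The sketch below is a stand-in lexer that tries each rule in order with sticky regexes; `lexAll` and the `error` token type are assumptions made for illustration and do not reflect how moo is implemented. It lexes the whole input at once, then lexes each chunk independently as a naive stream wrapper would.

```javascript
// Stand-in for the compiled lexer above (an assumption, not moo itself):
// the same two rules, tried in order at the current offset.
const rules = [
  ["even", /(ab)+(?!a)/y],
  ["odd", /(ab)*a/y],
];

function lexAll(input) {
  const tokens = [];
  let offset = 0;
  while (offset < input.length) {
    let matched = false;
    for (const [type, re] of rules) {
      re.lastIndex = offset; // sticky: match must start exactly here
      const m = re.exec(input);
      if (m && m[0].length > 0) {
        tokens.push({ type, value: m[0] });
        offset += m[0].length;
        matched = true;
        break;
      }
    }
    if (!matched) {
      // No rule applies: emit the rest of the input as an error token.
      tokens.push({ type: "error", value: input.slice(offset) });
      break;
    }
  }
  return tokens;
}

const whole = lexAll("abababab");
const chunked = ["aba", "ba", "bab"].flatMap(lexAll);
console.log(whole.map((t) => t.type), chunked.map((t) => t.type));
```

Lexing the whole string yields one `even` token. Lexing chunk by chunk commits to an `odd` token for `aba`, and the remaining chunks no longer match any rule.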

@tjvr
Collaborator Author

tjvr commented Mar 13, 2017

Wow, good point; I'd missed that.

@Hardmath123 It sounds like we can't seamlessly integrate moo into nearley; at least not with proper feed() / Stream support. :(

> In its current state

Am I to read you don't like the API? You helped design it! :-)

@nathan
Collaborator

nathan commented Mar 13, 2017

> Am I to read you don't like the API? You helped design it! :-)

No, I do like the API! I was simply pointing out that it's incompatible with a stream API that doesn't re-lex the stream when it gets new data.

If you leave the task of pushing data only at token boundaries up to the user, then it's easy to add a stream interface (that's what feed() is, essentially). Packages like stream-split are good for, e.g., pushing the data line-by-line.
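
As a rough sketch of the line-by-line approach, assuming no token in the language spans a newline, the user-side buffering could look like the following. `createLineFeeder` is a hypothetical helper, not the stream-split package or any moo API; the `onLine` callback is where a call such as `feed()` would go.

```javascript
// Hypothetical line buffer: forwards only complete lines and holds back the
// trailing partial line until its newline (or the end of input) arrives.
function createLineFeeder(onLine) {
  let pending = "";
  return {
    push(chunk) {
      pending += chunk;
      let newline;
      while ((newline = pending.indexOf("\n")) !== -1) {
        onLine(pending.slice(0, newline + 1)); // include the newline itself
        pending = pending.slice(newline + 1);
      }
    },
    end() {
      // Flush whatever is left once the stream is known to be finished.
      if (pending.length > 0) onLine(pending);
      pending = "";
    },
  };
}

const lines = [];
const feeder = createLineFeeder((line) => lines.push(line));
feeder.push("let a = 1;\nlet b");
feeder.push(" = 2;\nlet c = 3;");
feeder.end();
console.log(lines);
```

The chunk boundary inside `let b = 2;` never reaches the callback, so a lexer fed from `onLine` only ever sees input that ends on a line boundary.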

It's also possible, I guess, to re-lex the last token when we get new data, but you can still break this if you have two tokens which, when concatenated, form a different token; and this makes the API weird because you need to sometimes return a token twice.

@nathan nathan mentioned this issue Mar 13, 2017
@tjvr tjvr mentioned this issue Mar 14, 2017
@nathan
Collaborator

nathan commented Mar 17, 2017

Fixed by #25
