Proposal: default token types #98

nathan · 2018-08-23T18:55:41Z

Upon further consideration, #88 seems like a good idea. Markdown(ish) syntaxes are the obvious motivating example: in arbitrary text, certain embedded sequences have special meanings, and incomplete special sequences should be passed through verbatim.

const moo = require('moo')

const lexer = moo.compile({
  para: {lineBreaks: true, match: /(?:\r?\n|\r){2,}/},
  issu: {match: /#\d+/, value: s => s.slice(1)},
  lstr: /\*\*(?=\S)|__(?=\S)/,
  rstr: /\*\*(?=\s|$)|__(?=\s|$)/,
  escp: {match: /\\./, value: s => s.slice(1)},
  text: moo.default,
})

lexer.reset(`
Upon **further consideration,** #88 seems like a good idea.

Markdown(ish) syntaxes are the obvious motivating example…
`.trim())

console.log([...lexer]) /*
[ { type: 'text', value: 'Upon ' },
  { type: 'lstr', value: '**' },
  { type: 'text', value: 'further consideration,' },
  { type: 'rstr', value: '**' },
  { type: 'text', value: ' ' },
  { type: 'issu', value: '88' },
  { type: 'text', value: ' seems like a good idea.' },
  { type: 'para', value: '\n\n' },
  { type: 'text', value: 'Markdown(ish) syntaxes are the obvious motivating example…' } ]
*/

@tjvr Feel free to bikeshed the name. (fill might be better?)

moranje · 2018-08-23T19:15:46Z

Sounds great to me; this will make parsing my text-based language a lot easier. Two things:

The name moo.default might clash when the Node module is read as an ES6 module, e.g. const moo = require('moo').default. I would go with defaultToken, but anything is okay really.
Is the order important when specifying a default token?

nathan · 2018-08-23T19:38:55Z

The name moo.default might clash when the Node module is read as an ES6 module

AFAIK the de facto rule is to unwrap .default conservatively, or not at all, when the exports don't contain __esModule: true. But again, there's probably a better name than default regardless.
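
For illustration, here is a minimal sketch of the interop check being described. It is modeled on the helper that transpilers such as Babel commonly emit; `fakeMoo` and `interopRequireDefault` are hypothetical names used only for this example, not part of moo.

```javascript
// Hypothetical stand-in for moo's CommonJS exports, carrying a `default`
// property as proposed in this PR. This is NOT the real module.
const fakeMoo = {
  compile: rules => ({rules}),
  default: {isDefaultMarker: true},
}

// Interop helper in the style transpilers emit: `.default` is only trusted
// when the module marks itself as a transpiled ES module.
function interopRequireDefault(obj) {
  return obj && obj.__esModule ? obj : {default: obj}
}

// `import moo from 'moo'` compiles to roughly this lookup:
const imported = interopRequireDefault(fakeMoo).default

// Without __esModule, the whole exports object comes back, not the marker.
console.log(imported === fakeMoo)
console.log(typeof imported.compile)
```

Under this assumption, a `default` property on the exports is only picked up by tools that skip the `__esModule` check.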

Is the order important when specifying a default token?

No; this matches the behavior of moo.error.

tjvr · 2018-08-26T15:08:34Z

This is a great idea! I'll need to think about the name. 😊

tjvr

This is great; thank you for writing it!

I'm concerned it makes things a bit more complicated, but I think that's okay; I'd just like to review it carefully.

Could we try renaming default to fallback? I'll then have a good look at this! 🙌

tjvr · 2018-08-26T14:58:39Z

moo.js

      value: null,
      getType: null,
+      shouldThrow: false,


Why do we have both shouldThrow and error?

error is like fallback but it consumes until the end of the buffer instead of just to the next valid token. shouldThrow throws a syntax error instead of returning the error token.
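
The three behaviors can be sketched with a tiny stand-alone lexer. This is a hypothetical illustration, not moo's implementation: `tokenize` and its `mode` argument are invented here, and the two hard-coded patterns stand in for real rules.

```javascript
// Hypothetical mini-lexer: `mode` selects what happens to input that no
// rule matches at the current position.
function tokenize(input, mode) {
  const re = /#\d+|\*\*/g // the "real" tokens; global, not sticky
  const tokens = []
  let pos = 0
  while (pos < input.length) {
    re.lastIndex = pos
    const m = re.exec(input)
    if (m && m.index === pos) {
      // A rule matches right here.
      tokens.push({type: 'tok', value: m[0]})
      pos = re.lastIndex
    } else if (mode === 'fallback') {
      // Consume only up to the next valid token (or the end of the buffer).
      const end = m ? m.index : input.length
      tokens.push({type: 'text', value: input.slice(pos, end)})
      pos = end
    } else if (mode === 'error') {
      // Consume everything up to the end of the buffer.
      tokens.push({type: 'error', value: input.slice(pos)})
      pos = input.length
    } else {
      // Comparable to shouldThrow: raise instead of returning a token.
      throw new Error('invalid syntax at offset ' + pos)
    }
  }
  return tokens
}

const input = 'see #88 now'
const withFallback = tokenize(input, 'fallback')
const withError = tokenize(input, 'error')
console.log(withFallback) // three tokens: 'see ', '#88', ' now'
console.log(withError)    // one error token holding the whole input
```

Calling `tokenize(input, 'throw')` raises on the first unmatched character instead of returning anything.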

tjvr · 2018-08-26T15:01:19Z

moo.js

-    var suffix = hasSticky ? '' : '|(?:)'
-    var flags = hasSticky ? 'ym' : 'gm'
+    var defaultRule = errorRule && errorRule.default
+    var suffix = hasSticky || defaultRule ? '' : '|(?:)'


I note this will conflict with your other PR 😞

tjvr · 2018-08-26T15:02:43Z

moo.js

@@ -271,6 +277,8 @@
      line: this.line,
      col: this.col,
      state: this.state,
+      queued: this.queued,


minor: I think I'd prefer queuedToken

moo.js

@@ -123,6 +125,7 @@
    return options
  }

+  var defaultErrorRule = ruleOptions('error', {lineBreaks: true, shouldThrow: true})


It doesn't appear in other error messages

tjvr · 2018-08-28T08:00:22Z

Just to confirm: will /foo|bar/g try to match foo at each index in the buffer and, only once that fails, attempt to match bar? (Which would be bad.)

Sent with GitHawk

nathan · 2018-08-28T11:05:42Z

@tjvr No. The exec algorithm explicitly works by attempting to match the RegExp at each string index (AdvanceStringIndex is just +1 for non-unicode RegExps), so it will find the earliest match of any complete path through the RegExp, including the current lastIndex if there is a match there.
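
A quick way to check this is to run a global RegExp over a buffer in which the second alternative occurs first. The buffer contents below are made up for the example.

```javascript
const re = /foo|bar/g
const buffer = '.. bar .. foo'

// Starting from index 0, exec tries every alternative at each index before
// advancing, so the earlier `bar` wins even though `foo` is listed first.
re.lastIndex = 0
const first = re.exec(buffer)
console.log(first[0], first.index) // bar 3

// A match that begins exactly at lastIndex is found as well.
re.lastIndex = 10
const atLastIndex = re.exec(buffer)
console.log(atLastIndex[0], atLastIndex.index) // foo 10
```

So the text skipped between lastIndex and the match index is exactly the stretch in which no alternative matches.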

tjvr · 2018-08-31T22:06:15Z

I love how simple this is. ❤️

moranje · 2018-09-01T10:25:21Z

Just tested this on my codebase; it's working as intended.

nathan mentioned this pull request Aug 23, 2018

Specifying a default token #88

Closed

nathan requested a review from tjvr August 23, 2018 18:58

tjvr reviewed Aug 26, 2018

nathan force-pushed the default branch from dec208c to fa1b09a August 27, 2018 15:59

nathan added 8 commits August 27, 2018 11:59

moo.default

2e9f945

Simplify default rule handling

fd3924a

Fix queuedThrow

6e207f5

Simplify shouldThrow flow

7db44df

Remove unnecessary colon in error message

b87dbad

It doesn't appear in other error messages

Test default tokens

fa1b09a

Rename moo.{default => fallback}

83437f6

Rename queued{ => Token}

33973d9

tjvr mentioned this pull request Aug 31, 2018

Revert "Fast case for single characters" #102

Merged

Add some comments

0bf3570

tjvr merged commit 0925463 into master Aug 31, 2018

tjvr deleted the default branch August 31, 2018 22:05

tjvr mentioned this pull request Nov 9, 2018

Document fallback tokens #112

Open

tjvr mentioned this pull request Feb 5, 2019

Can't get the offset property of the Error while calling feed kach/nearley#424

Closed
