Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: default token types #98

Merged
merged 9 commits into from
Aug 31, 2018
Merged

Proposal: default token types #98

merged 9 commits into from
Aug 31, 2018

Conversation

nathan
Copy link
Collaborator

@nathan nathan commented Aug 23, 2018

Upon further consideration, #88 seems like a good idea. Markdown(ish) syntaxes are the obvious motivating example: arbitrary text with certain embedded sequences have special meanings, and incomplete special sequences should be passed through verbatim.

const lexer = moo.compile({
  para: {lineBreaks: true, match: /(?:\r?\n|\r){2,}/},
  issu: {match: /#\d+/, value: s => s.slice(1)},
  lstr: /\*\*(?=\S)|__(?=\S)/,
  rstr: /\*\*(?=\s|$)|__(?=\s|$)/,
  escp: {match: /\\./, value: s => s.slice(1)},
  text: moo.default,
})

lexer.reset(`
Upon **further consideration,** #88 seems like a good idea.

Markdown(ish) syntaxes are the obvious motivating example…
`.trim())

console.log([...lexer]) /*
[ { type: 'text', value: 'Upon ' },
  { type: 'lstr', value: '**' },
  { type: 'text', value: 'further consideration,' },
  { type: 'rstr', value: '**' },
  { type: 'text', value: ' ' },
  { type: 'issu', value: '88' },
  { type: 'text', value: ' seems like a good idea.' },
  { type: 'para', value: '\n\n' },
  { type: 'text', value: 'Markdown(ish) syntaxes are the obvious motivating example…' } ]
*/

@tjvr Feel free to bikeshed the name. (fill might be better?)

@nathan nathan mentioned this pull request Aug 23, 2018
@nathan nathan requested a review from tjvr August 23, 2018 18:58
@moranje
Copy link
Contributor

moranje commented Aug 23, 2018

Sounds great to me, this will make the parsing of my text based language a lot easier. Two things:

  1. The name moo.default might clash with when the node module is being read as an ES6 module e.g. const moo = require('moo').default. I would go with defaultToken but anything is okay really.
  2. Is the order important when specifying a default token?

@nathan
Copy link
Collaborator Author

nathan commented Aug 23, 2018

The name moo.default might clash with when the node module is being read as an ES6 module

AFAIK the de facto rule is to do that conservatively or not at all when the exports don't contain __esModule: true. But again, there's probably a better name than default regardless.

Is the order important when specifying a default token?

No; this matches the behavior of moo.error.

@tjvr
Copy link
Collaborator

tjvr commented Aug 26, 2018

This is a great idea! I'll need to think about the name. 😊

Copy link
Collaborator

@tjvr tjvr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great; thank you for writing it!

I'm concerned it makes things a bit more complicated, but I think that's okay; I'd just like to review it carefully.

Could we try renaming default to fallback? I'll then have a good look at this! 🙌

moo.js Outdated
value: null,
getType: null,
shouldThrow: false,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we have both shouldThrow and error?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

error is like fallback but it consumes until the end of the buffer instead of just to the next valid token. shouldThrow throws a syntax error instead of returning the error token.

moo.js Outdated
var suffix = hasSticky ? '' : '|(?:)'
var flags = hasSticky ? 'ym' : 'gm'
var defaultRule = errorRule && errorRule.default
var suffix = hasSticky || defaultRule ? '' : '|(?:)'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I note this will conflict with your other PR 😞

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rebased.

moo.js Outdated
@@ -271,6 +277,8 @@
line: this.line,
col: this.col,
state: this.state,
queued: this.queued,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: I think I'd prefer queuedToken

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed.

@@ -123,6 +125,7 @@
return options
}

var defaultErrorRule = ruleOptions('error', {lineBreaks: true, shouldThrow: true})

This comment was marked as resolved.

This comment was marked as resolved.

@tjvr
Copy link
Collaborator

tjvr commented Aug 28, 2018

Just to confirm: will /foo|bar/g try and match foo at each index in the buffer, and only once that fails, attempt to match bar? (Which would be bad.)

Sent with GitHawk

@nathan
Copy link
Collaborator Author

nathan commented Aug 28, 2018

@tjvr No. The exec algorithm explicitly works by attempting to match the RegExp at each string index (AdvanceStringIndex is just +1 for non-unicode RegExps), so it will find the earliest match of any complete path through the RegExp, including the current lastIndex if there is a match there.

@tjvr tjvr merged commit 0925463 into master Aug 31, 2018
@tjvr tjvr deleted the default branch August 31, 2018 22:05
@tjvr
Copy link
Collaborator

tjvr commented Aug 31, 2018

I love how simple this is. ❤️

@moranje
Copy link
Contributor

moranje commented Sep 1, 2018

Just tested this on my codebase, it's working as intended.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants