Proposal: inheritance for stateful lexers #93
This is cool, thanks for investigating this! My first thought is: wow, that's a fairly sophisticated example tokenizer you have there (the JS-template-like one). I'm not sure I could have written it myself; I wonder if we've made a tool that only you can use correctly ;-) (As a minor point of style, I'd have kept the
Can you explain what you mean by this? Order is reasonably important in Moo lexers, so it would be good to understand this (and make sure we get the semantics right). Is it that in normal behaviour, the rules are inserted in place of the I suppose using a dictionary for this means we can't allow multiple I'm tempted to recommend writing this as a separate transform -- e.g.
Correct. So if
That's a good point. I can think of two solutions:
state: [
  {include: 'comment'},
  {name: 'id', match: /\w+/},
  {include: 'std'},
  // ...
],

state: {
  include_comment: 1,
  id: /\w+/,
  include_std: 1,
  // ...
},

(FWIW, The
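To make the ordering question concrete, here is a minimal sketch of how the array form could be expanded. `expandIncludes` is a hypothetical helper, not part of Moo and not this PR's implementation. It assumes two things that the thread has not settled: included rules are spliced in at the position of the `include` entry, and a state that has already been expanded is skipped, which also stops cycles.

```javascript
// Hypothetical helper (not Moo's API): flatten {include: 'name'} entries in order.
// Assumption: each state is expanded at most once per call, so repeats and cycles are skipped.
function expandIncludes(states, name, seen = new Set()) {
  if (seen.has(name)) return [];
  seen.add(name);
  const out = [];
  for (const rule of states[name]) {
    if (rule.include !== undefined) {
      // Splice the included state's rules in at this position.
      out.push(...expandIncludes(states, rule.include, seen));
    } else {
      out.push(rule);
    }
  }
  return out;
}

// Illustrative states; the rule names are made up for this example.
const states = {
  comment: [{name: 'lc', match: /\/\/.+/}],
  std: [{include: 'comment'}, {name: 'ws', match: /\s+/}],
  state: [
    {include: 'comment'},
    {name: 'id', match: /\w+/},
    {include: 'std'},
  ],
};

const names = expandIncludes(states, 'state').map(rule => rule.name);
console.log(names); // [ 'lc', 'id', 'ws' ]
```

Under these assumptions, `comment` contributes its rules only at its first appearance, so `id` still sits between the comment rules and the rest of `std`.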
I'd missed your suggestion of I think this is cool, and includes are important! However, is there a good reason not to use JavaScript's object spread syntax?
const ws = {
ws: { match: /\s+/, lineBreaks: true },
}
const comment = {
lc: /\/\/.+/,
bc: /\/\*[^]*?\*\//,
}
const std = {
...comment,
...ws,
id: /[A-Za-z]\w*/,
op: /[!=]==|\+[+=]?|-[-=]|<<=?|>>>?=?|&&?|\|\|?|[<>!=/*&|^%]=|[~!,/*^?:%]/,
tbeg: { match: /`(?:\\[^]|[^\\`])*?\${/, value: s => s.slice(1, -2), push: 'template' },
tsim: { match: /`(?:\\[^]|[^\\`])*?`/, value: s => s.slice(1, -1) },
str: { match: /'(?:\\[^]|[^\\'])*?'|"(?:\\[^]|[^\\"])*?"/, value: s => s.slice(1, -1) },
lbrace: { match: '{', push: 'brace'},
}
const main = {
...std,
}
const brace = {
...std,
rbrace: { match: '}', pop: 1 },
}
const template = {
...std,
tmid: { match: /}(?:\\[^]|[^\\`])*?\${/, value: s => s.slice(1, -2) },
tend: { match: /}(?:\\[^]|[^\\`])*?`/, value: s => s.slice(1, -1), pop: 1 },
}
const lexer = moo.states({
$all: { err: moo.error },
main, brace, template,
})
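Since rule order matters in Moo lexers, it may help to check what key order object spread actually produces. The sketch below uses plain JavaScript only (no Moo calls) with a trimmed-down version of the states above; `custom` and `nl` are illustrative names that do not appear in the thread.

```javascript
const ws = {ws: {match: /\s+/, lineBreaks: true}};
const comment = {lc: /\/\/.+/, bc: /\/\*[^]*?\*\//};

// Spread copies string keys in insertion order, then appends new ones.
const std = {...comment, ...ws, id: /[A-Za-z]\w*/};
const brace = {...std, rbrace: {match: '}', pop: 1}};
console.log(Object.keys(brace)); // [ 'lc', 'bc', 'ws', 'id', 'rbrace' ]

// Re-declaring an existing key replaces its value but keeps its original position.
const custom = {...std, ws: /[ \t]+/, nl: {match: /\n/, lineBreaks: true}};
console.log(Object.keys(custom)); // [ 'lc', 'bc', 'ws', 'id', 'nl' ]
```

So spread gives a predictable order, but a rule overridden in a derived state stays where the base state put it rather than moving to the position of the override.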
I'm not sure it's a good reason, but a reason is that this implements adding rather than replacing rules. Compare:
moo.states({
  base: {op: ['*', '/', '+', '-']},
  mod: {include: 'base', op: ['%']},
})
where
const base = {op: ['*', '/', '+', '-']}
const mod = {...base, op: ['%']}
moo.states({base, mod})
where
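The replacing behaviour of the spread version can be checked in plain JavaScript, without Moo. In this sketch `modMerged` is an illustrative name for what you would have to write by hand to get additive behaviour from spread alone.

```javascript
const base = {op: ['*', '/', '+', '-']};

// Spread followed by a duplicate key: the later value wins outright.
const mod = {...base, op: ['%']};
console.log(mod.op); // [ '%' ]

// To add to the inherited list instead, the merge has to be spelled out.
const modMerged = {...base, op: [...base.op, '%']};
console.log(modMerged.op); // [ '*', '/', '+', '-', '%' ]
```

The manual merge works, but it has to be repeated for every rule a derived state wants to extend.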
I'm sold. I'm still not sure if this is the right interface, as discussed previously. But I'd like to merge this and play around with it for a bit.
PR so you can play around with it. I'll add tests if this seems like the right idea.
This adds a new include rule, which allows you to include all the rules from another state.
include behaves as follows:
Here's an example of parsing JS-like templates:
Here's how it behaves with cycles: