Add type transform #85

Merged
merged 7 commits into from
Sep 19, 2018

Collaborator

@tjvr tjvr commented Mar 31, 2018

We recently added a value transform.

This PR:

  • Adds a type transform.
  • Exposes moo.keywords().
  • Removes Lexer#has().

The existing value transform takes the text and returns the value. By default, the text is used unchanged.

The new type transform takes the text and returns the type. By default, the type of the rule is used (e.g. identifier).
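As a rough illustration of the two transforms and their defaults, here is a standalone sketch. It does not use Moo itself: `applyRule` and the rule shape it accepts are hypothetical stand-ins for what a lexer might do with each match.

```javascript
// Hypothetical stand-in for how a lexer might apply a rule to matched text.
// This is not Moo's code; it only illustrates the defaults described above.
function applyRule(ruleName, rule, text) {
  // Type transform: may return a type name; otherwise fall back to the rule's name.
  const type = (typeof rule.type === 'function' && rule.type(text)) || ruleName
  // Value transform: defaults to the matched text, unchanged.
  const value = typeof rule.value === 'function' ? rule.value(text) : text
  return { type, value, text }
}

// No transforms: the type is the rule name and the value is the text.
const plain = applyRule('identifier', {}, 'foo')

// Both transforms: the type and value are each derived from the text.
const custom = applyRule('identifier', {
  type: text => (text === 'class' ? 'keyword' : undefined),
  value: text => text.toUpperCase(),
}, 'class')
```

In this sketch a type transform that returns nothing simply leaves the rule's own name in place, which is the fallback behaviour the description implies.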

Example: case-insensitive keywords

This is my preferred solution for #67 / #78.

For example, you can create a customised version of moo.keywords which matches case-insensitively:

const caseInsensitiveKeywords = map => {
  const transform = moo.keywords(map)
  return text => transform(text.toLowerCase())
}

let lexer = moo.compile({
  identifier: {
    match: /[a-zA-Z]+/,
    type: caseInsensitiveKeywords({
      keyword: ['class', 'def'],
    }),
  },
})
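To see why the wrapper above gives case-insensitive matching, it may help to sketch what a keyword transform does. The `keywordsSketch` helper below is a simplified stand-in written for this illustration, on the assumption that `moo.keywords(map)` returns a function mapping exact text to a type name and returning nothing for non-keywords. It is not Moo's implementation.

```javascript
// Simplified stand-in for moo.keywords(): builds a case-sensitive lookup
// from keyword text to type name. Hypothetical; not Moo's actual code.
const keywordsSketch = map => {
  const reverse = new Map()
  for (const [type, words] of Object.entries(map)) {
    for (const word of [].concat(words)) reverse.set(word, type)
  }
  return text => reverse.get(text)
}

// Same wrapper shape as in the example above: lower-case the lexed text first.
const caseInsensitive = map => {
  const transform = keywordsSketch(map)
  return text => transform(text.toLowerCase())
}

const typeOf = caseInsensitive({ keyword: ['class', 'def'] })
const results = ['class', 'CLASS', 'ClAsS', 'foo'].map(text => typeOf(text))
// 'class', 'CLASS' and 'ClAsS' all map to 'keyword'; 'foo' maps to undefined,
// so a lexer would fall back to the rule's own type for it.
```

The keyword list stays in lower case because the lookup itself is case-sensitive; only the lexed text is normalised before the lookup.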

Lexer#has()

This unfortunately makes it impossible to write a Lexer#has function, since we can't infer what token names might be returned by this custom function.

This will make Moo incompatible with the current version of Nearley: we introduced has() so that we could tell whether %foo refers to a custom token matcher such as foo = { test: x => Number.isInteger(x) }, or a lexer token. But custom token matchers will likely be removed [from Nearley] going forward, so has() will have no use.

@tjvr
Collaborator Author

tjvr commented Sep 18, 2018

I've rewritten this on top of the latest master.

Lexer#has() will now always return true, so most Nearley grammars should continue to work.

@houghtonap

A few notes about the example and case insensitivity.

  1. I believe that in the example, the code keyword: ['class', 'def'] should be keyword: ['CLASS', 'DEF'], since the purpose of caseInsensitiveKeywords is to lowercase the keywords given.
  2. The example does not demonstrate case insensitivity. As far as I can determine, it demonstrates that the keyword could be either upper case or lower case, which is only a subset of case insensitivity. A case-insensitive match would accept CLASS, class, ClAsS, cLaSs, etc.
  3. The above example and moo.keywords seem to be a roundabout way of achieving case insensitivity or other possibilities. Perusing moo.js, it seems that a simpler solution would be to allow an Array that mixes strings and regular expressions in keywordTransform. Currently, only strings are allowed in the keyword array; anything else throws an error. However, if regular expressions were allowed in addition to strings, you could do:
    let lexer = compile({
      identifier: {
        match: [ /[Cc][Ll][Aa][Ss][Ss]/, /[Dd][Ee][Ff]/, 'lambda' /* I really only want this one as lower case */ ],
        type: v => v.toLocaleUpperCase( ),
      },
    })
    When the Array contains only strings, proceed with the existing transform code that builds a switch statement, otherwise convert the strings in the array to regular expressions (quoting meta characters), then create a matchable regular expression in place of the switch statement being built. The returned function from keywordTransform would just match the token found against the built regular expression, e.g., token.match( rePossibilities ). I suspect that there will be a threshold between executing the switch statement vs. executing the regular expression match, which may be something else to consider in keywordTransform.

For (1) and (2) above, perhaps I misunderstood the example in this issue; feel free to enlighten me.
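The regular-expression idea in point 3 could be sketched roughly as follows. This is a standalone illustration, not a change to Moo: `buildKeywordMatcher` and `escapeRegExp` are hypothetical names, and the sketch ignores any flags on the regular expressions it is given.

```javascript
// Escape regex metacharacters so a plain string matches literally.
const escapeRegExp = s => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')

// Hypothetical helper: combine strings and RegExps into one anchored pattern.
const buildKeywordMatcher = items => {
  const sources = items.map(item =>
    item instanceof RegExp ? item.source : escapeRegExp(item))
  // Wrap each alternative in its own group so inner `|` cannot leak out.
  const combined = new RegExp('^(?:' + sources.map(s => '(?:' + s + ')').join('|') + ')$')
  return text => combined.test(text)
}

const isKeyword = buildKeywordMatcher([
  /[Cc][Ll][Aa][Ss][Ss]/,
  /[Dd][Ee][Ff]/,
  'lambda', // matched literally, lower case only
])
```

Anchoring the combined pattern with `^` and `$` makes it test the whole token, which is closer to what the string-only lookup does than an unanchored `token.match(...)` would be.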

@tjvr tjvr deleted the type-transform branch June 4, 2021 13:26
@tjvr
Collaborator Author

tjvr commented Jun 4, 2021

I think you misunderstood the example; (1) and (2) don't sound right to me.

caseInsensitiveKeywords uses the keywords ['class', 'def'] passed in to build a regular, case-sensitive keyword map using the built-in moo.keywords() function.

It then returns a closure which calls toLowerCase() on the value -- the text of the token that was lexed -- before passing it to the transform that moo.keywords() returned.

2 participants