Add type transform #85

tjvr · 2018-03-31T21:45:20Z

We recently added a value transform.

This PR:

- Adds a type transform.
- Exposes moo.keywords().
- Removes Lexer#has().

The existing value transform takes the text and returns the value. By default, the text is used unchanged.

The new type transform takes the text and returns the type. By default, the type of the rule is used (e.g. identifier).
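
To make the difference between the two transforms concrete, here is a minimal sketch of how a rule's `value` and `type` options could be applied to matched text. `makeToken` is a hypothetical helper written for illustration and is not part of Moo; the fallback to the rule's name when the type function returns nothing is an assumption about the intended default behaviour.

```javascript
// Hypothetical helper (not Moo's implementation): applies a rule's
// optional `type` and `value` transforms to a piece of matched text.
function makeToken(ruleName, rule, text) {
  return {
    // Assumption: a function-valued `type` maps text -> type, and the
    // rule's own name is used when it is absent or returns nothing.
    type: (typeof rule.type === 'function' && rule.type(text)) || ruleName,
    // A `value` transform maps text -> value; by default the text is unchanged.
    value: typeof rule.value === 'function' ? rule.value(text) : text,
    text,
  }
}

// Value transform: strip the surrounding quotes from a string literal.
const stringRule = { value: s => s.slice(1, -1) }
// Type transform: reclassify one particular identifier as a keyword.
const identifierRule = { type: s => (s === 'class' ? 'keyword' : undefined) }

const t1 = makeToken('string', stringRule, '"hi"')         // type 'string', value 'hi'
const t2 = makeToken('identifier', identifierRule, 'class') // type 'keyword'
const t3 = makeToken('identifier', identifierRule, 'foo')   // type 'identifier'
console.log(t1, t2, t3)
```

The value transform changes what the token carries; the type transform changes how the token is classified. Neither alters the original `text`.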

Example: case-insensitive keywords

This is my preferred solution for #67 / #78.

For example, you can create a customised version of moo.keywords which matches case-insensitively:

```
const caseInsensitiveKeywords = map => {
  // Build the ordinary keyword lookup once, up front.
  const transform = moo.keywords(map)
  // Lower-case the lexed text before looking it up.
  return text => transform(text.toLowerCase())
}

let lexer = moo.compile({
  identifier: {
    match: /[a-zA-Z]+/,
    type: caseInsensitiveKeywords({
      keyword: ['class', 'def'],
    }),
  },
})
```

Lexer#has()

Allowing type to be a function unfortunately makes it impossible to write a Lexer#has function, since we can't infer which token names a custom function might return.

This will make Moo incompatible with the current version of Nearley: we introduced has() so that we could tell whether %foo refers to a custom token matcher such as foo = { test: x => Number.isInteger(x) }, or a lexer token. But custom token matchers will likely be removed [from Nearley] going forward, so has() will have no use.

tjvr · 2018-09-18T12:14:13Z

I've rewritten this on top of the latest master.

Lexer#has() will now always return true, so most Nearley grammars should continue to work.

houghtonap · 2021-06-01T08:54:12Z

A few notes about the example and case insensitivity.

1. I believe the example's keyword: ['class', 'def'] should be keyword: ['CLASS', 'DEF'], since the purpose of caseInsensitiveKeywords is to lower-case the keywords given.
2. The example does not demonstrate case insensitivity. As far as I can determine, it demonstrates that the keyword could be either upper case or lower case, which is only a subset of case insensitivity. A case-insensitive match would match CLASS, class, ClAsS, cLaSs, etc.
3. The above example and moo.keywords seem to be a roundabout way of achieving case insensitivity or other possibilities. Perusing moo.js, it seems that a simpler solution would be to allow an Array that mixes strings and regular expressions in keywordTransform. Currently, only strings are allowed in the keyword array; anything else throws an error. However, if regular expressions were allowed in addition to strings, you could do:
```
let lexer = compile({
  identifier: {
    match: [ /[Cc][Ll][Aa][Ss][Ss]/, /[Dd][Ee][Ff]/, 'lambda' /* I really only want this one as lower case */ ],
    type: v => v.toLocaleUpperCase( ),
  },
})
```
When the Array contains only strings, proceed with the existing transform code that builds a switch statement. Otherwise, convert the strings in the array to regular expressions (quoting meta characters), then create a single matchable regular expression in place of the switch statement. The function returned from keywordTransform would just match the token found against the built regular expression, e.g., token.match( rePossibilities ). I suspect there will be a performance threshold between executing the switch statement and executing the regular expression match, which may be something else to consider in keywordTransform.

For (1) and (2) above, perhaps I misunderstood the example in this issue; feel free to enlighten me.
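
The regular-expression variant proposed in point 3 could be sketched roughly as follows. This is an illustration only: `escapeRegExp` and `keywordMatcher` are hypothetical names, nothing here exists in moo.js, and the sketch ignores any flags set on the supplied regular expressions.

```javascript
// Quote regex metacharacters so a plain string matches literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}

// Hypothetical stand-in for a keyword transform that accepts a mixture
// of strings and RegExps. Not part of Moo.
function keywordMatcher(type, patterns) {
  const sources = patterns.map(p =>
    // Note: only the pattern source is used, so flags such as /i are dropped.
    p instanceof RegExp ? p.source : escapeRegExp(p)
  )
  // Anchor both ends so the whole token must match one alternative.
  const rePossibilities = new RegExp('^(?:' + sources.join('|') + ')$')
  return text => (rePossibilities.test(text) ? type : undefined)
}

const kw = keywordMatcher('keyword', [
  /[Cc][Ll][Aa][Ss][Ss]/,
  /[Dd][Ee][Ff]/,
  'lambda', // lower case only
])

console.log(kw('ClAsS'))  // 'keyword'
console.log(kw('lambda')) // 'keyword'
console.log(kw('LAMBDA')) // undefined
console.log(kw('classy')) // undefined
```

Whether a combined regular expression would outperform a generated switch statement is left open here; that would need measuring.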

tjvr · 2021-06-04T13:29:20Z

I think you misunderstood the example; (1) and (2) don't sound right to me.

caseInsensitiveKeywords uses the keywords ['class', 'def'] passed in to build a regular, case-sensitive map using the built-in moo.keywords() function.

It then returns a closure which calls toLowerCase() on the value (the token text that was lexed) before passing it to the transform returned by moo.keywords(). So CLASS, class, and ClAsS are all looked up as class, and all match.
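
To trace this without running a lexer, here is a small self-contained sketch. The `keywords` function below is a simplified stand-in for moo.keywords(), assumed to map exact strings to a type and to return undefined otherwise; it is not Moo's actual implementation.

```javascript
// Simplified stand-in for moo.keywords() (assumption: exact-match lookup).
function keywords(map) {
  const lookup = new Map()
  for (const type of Object.keys(map)) {
    // Accept either a single keyword or an array of keywords per type.
    for (const word of [].concat(map[type])) lookup.set(word, type)
  }
  return text => lookup.get(text)
}

const caseInsensitiveKeywords = map => {
  const transform = keywords(map)
  // The lexed text is lower-cased, not the keyword list.
  return text => transform(text.toLowerCase())
}

const type = caseInsensitiveKeywords({ keyword: ['class', 'def'] })

const results = ['class', 'CLASS', 'ClAsS', 'Def', 'foo'].map(w => [w, type(w)])
console.log(results)
```

Every casing of class and def maps to keyword, while foo maps to undefined. This is also why the keyword list itself has to be written in lower case: ['CLASS', 'DEF'] would never match the lower-cased text.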

tjvr force-pushed the type-transform branch from 5456cf6 to 40dee4a Compare September 18, 2018 12:12

tjvr requested a review from nathan September 18, 2018 12:15

tjvr force-pushed the master branch 2 times, most recently from 82acf8d to 8c3e622 Compare September 18, 2018 12:19

tjvr force-pushed the type-transform branch from 40dee4a to d233dd6 Compare September 18, 2018 12:23

tjvr added 6 commits September 19, 2018 13:18

Allow type to be a function

ab3f331

Warn if both keywords and type are set

9aa0c81

Warn if type is a string

90c5980

Rename name -> type

7687bca

Remove keywords shorthand

1bfa713

Deprecate Lexer#has

24b23ca

tjvr force-pushed the type-transform branch from d233dd6 to 24b23ca Compare September 19, 2018 12:18

Test that type transforms work with arrays

9762632

tjvr merged commit 6973d55 into master Sep 19, 2018

tjvr mentioned this pull request Sep 19, 2018

Add keywordsCaseInsensitive option #78

Closed

autioch mentioned this pull request Jan 17, 2019

Case sensitivity handling and note about in the docs #117

Open

randyn mentioned this pull request Feb 17, 2021

Deprecate moo's lexer has function DefinitelyTyped/DefinitelyTyped#51278

Merged

2 tasks

tjvr mentioned this pull request May 29, 2021

Deprecation of method has and impact on Nearley parser #158

Open

tjvr deleted the type-transform branch June 4, 2021 13:26
