Proposal: Support multi-class highlighting inside a single mode/rule #2838

joshgoebel · 2020-11-08T23:38:22Z

Is your request related to a specific problem you're having?

Yes. Very often you have language constructs like [delim]content[delim], a string being a perfect example. Many times you may want to highlight the delimiters differently than the content. We provide no simple way to do that now without resorting to complex rule chains or ambiguous contents. For example, let's match a simple single-quoted string and highlight the whole thing as a string.

{ 
  className: "string",
  begin: /'.*'/,
}

Easy, but now let's try to color the delimiters separately:

{
  className: "string.delim",
  begin: /'/, end: /'/,
  contains: [
    { begin: /[^']*/, className: "string" }
  ]
}

This is the shortest variant, and it does get the job done, but we have string nested inside string.delim, which is strange. If we were going to do nesting at all (which I'm not sure we should), surely you'd want string.delim inside string. And of course it wouldn't help us at all if we wanted to classify the begin and end matchers with different classes.

Let's try again (and fail):

{
  begin: /(?=')/,  // looks like a string looking ahead
  contains: [ 
     { begin: /'/, className: "string.delim" },
     { begin : /[^']*(?=')/, className: "string" },
  ]
}

Many modes/grammars get away with using/abusing contains like this for a sequence, and it works because the rules are distinct enough. That doesn't work for us here, and I'm not sure how to make it work: after it finds the whole string it'll just keep matching, because we have no way to know we've reached the "end". It would work if the end delimiter were different:

  // 'string'E // 'E terminates strings
  contains: [ 
     { begin: /'/, className: "string.delim" },
     { begin : /.*(?!'E)/, className: "string" },
     { begin: /'E/, className: "string.delim", endsParent: true },
  ]

OK, so let's dig in and use all the powers we have available:

{
  begin: /(?='.*')/, // perhaps look ahead just to see if we have a full string
  contains: [
    {
      begin: /'/, className: "string.delim",
      starts: {
        className: "string",
        end: /\b\B/, // hack to leave the mode open until a rule matches
        contains: [
          {
            begin: /(?=')/,
            endsParent: true
          }
        ],
        starts: {
          contains: [
            { begin: /'/, className: "string.delim" }
          ]
        }
      }
    }
  ]
},
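The `end: /\b\B/` hack relies on that pattern being unable to match anywhere: `\b` requires a word boundary at a position, while `\B` requires that there is none at the same position. A quick check in plain JavaScript, with sample inputs that are arbitrary and chosen only for illustration:

```javascript
// \b asserts a word boundary and \B asserts the absence of one at the same
// position, so the combined pattern cannot match at any position.
const NEVER_MATCH = /\b\B/;

// Arbitrary sample inputs (assumptions for illustration only).
const samples = ["", "'hello world'", "a", " ", "'It''s'"];
const results = samples.map((s) => NEVER_MATCH.test(s));

console.log(results); // every entry is false
```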

OK, that works, but man... find a quote and highlight it, start a new mode "string", use a magic end to prevent the mode from closing, keep matching things until we see a ' (look ahead), then end the parent, which starts ANOTHER new mode to match the final delimiter. Ugh. We could also try a pure chain:

{
  begin: /(?='.*')/, // perhaps look ahead just to see if we have a full string
  contains: [
    {
      begin: /'/, className: "string.delim",
      starts: {
        className: "string",
        begin: /[^']*/,
        starts: {
          contains: [
            { begin: /'/, className: "string.delim" }
          ]
        }
      }
    }
  ]
},

Simple: one mode chains into the next with starts until it hits the end and everything rewinds. This of course requires us to manually match the middle of the expression, which is slightly annoying.

Any alternative solutions you considered...

Of course these structures are a pain to write by hand, but we could use syntactic sugar (i.e., build one of the above variants internally, without adding any features to the core of the parser). Say we added some chain sugar:

{
        className: "string",
        chain: [ 
          { match: /'/, className: "string.delim" },
          { match: /[^']*/, className: "string" },
          { match: /'/, className: "string.delim" }
        ]
}

Better. This of course compiles into something much more complex, and we'd still be left with having to specify the inside match of the string, which is kind of annoying. This is also bad because single "modes" that secretly compile into massively complex chains make it much harder to build complex high-level structures by composing those lower-level ones. The interactions get very complex because what you think is a "simple low-level rule" is actually a massively complex rule whose complexity the compiler has just hidden from you.
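To make the "compiles into something much more complex" point concrete, here is a minimal sketch of what such sugar might expand to. `compileChain` is a hypothetical helper, not part of highlight.js, and the output shape (each link starting a wrapper mode that contains the next link) is only one possible expansion modeled loosely on the hand-written chain; whether the parser would handle that exact shape correctly is not addressed here.

```javascript
// Hypothetical helper (NOT a highlight.js API): expand a `chain` array into
// nested `starts` modes. Only the data shape is illustrated.
function compileChain(chain) {
  // Build from the last link backwards so each link can point at the next.
  let next = null;
  for (let i = chain.length - 1; i >= 0; i--) {
    const { match, className } = chain[i];
    const link = { begin: match, className };
    if (next) link.starts = { contains: [next] };
    next = link;
  }
  return next; // null for an empty chain
}

const stringChain = compileChain([
  { match: /'/, className: "string.delim" },
  { match: /[^']*/, className: "string" },
  { match: /'/, className: "string.delim" }
]);

// Print regexes by source so the nesting depth is visible.
console.log(
  JSON.stringify(stringChain, (key, v) => (v instanceof RegExp ? v.source : v), 2)
);
```

Three flat entries already become three levels of nesting, which is the hidden complexity described above.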

This type of sugar works for many simpler things, but this "3 pair" (two delims, and an enclosure) is such a common pattern that I really think perhaps it should be added/supported by the parser at the lowest-level. We already have the concept of begin, end and everything in-between. We just don't provide an easy mechanism to assign separate CSS classes.

The solution you'd prefer / feature you'd like to see added...

So I'd like to propose two low-level variations:

One for modes with children, simply allows each "piece" to be individually classified. This is closest to what we already have and would be simplest to add I think:

{
  className: {
    begin: 'string.delim',
    middle: 'string',
    end: 'string.delim'
  },
  begin: '"',
  end: '"',
  contains: [ /* ... */ ]
}
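A rough sketch of the intended result, assuming plain string delimiters and no contains. `classifyDelimited` is a hypothetical stand-in for what the parser would do, not an existing highlight.js function:

```javascript
// Hypothetical illustration only: split one delimited run into three
// classified segments using the proposed { begin, middle, end } className.
function classifyDelimited(text, mode) {
  const start = text.indexOf(mode.begin);
  if (start === -1) return null; // no opening delimiter
  const contentStart = start + mode.begin.length;
  const stop = text.indexOf(mode.end, contentStart);
  if (stop === -1) return null; // no closing delimiter
  return [
    { className: mode.className.begin, text: mode.begin },
    { className: mode.className.middle, text: text.slice(contentStart, stop) },
    { className: mode.className.end, text: mode.end }
  ];
}

const doubleQuoted = {
  className: { begin: "string.delim", middle: "string", end: "string.delim" },
  begin: '"',
  end: '"'
};

const segments = classifyDelimited('say "hi" now', doubleQuoted);
console.log(segments);
```

The three segments stay siblings rather than being nested inside one another, which is the point of classifying each piece separately.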

And the same thing for simpler regex matches when a single regex will get the job done:

      {
        match: /(')(.*?)(')/,
        className: ['string.delim', 'string', 'string.delim'],
        // or possibly
        className: {
          0: 'string.delim',
          1: 'string',
          2: 'string.delim'
        },
      },

The latter format (keyed digits) would be immediately recognizable to anyone who's worked on TextMate grammars before. And of course this style is not limited to 3 match components: you could easily have a complex regex that broke something down into 5 or 7 components, highlighting each of the pieces differently.

And since these are TRUE singular modes they can be composed easily (used anywhere modes can already be used) without any special caveats or complex interactions with starts, endsParent, endsWithParent, etc...
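As a sketch of how the array form could be interpreted, the snippet below pairs each capture group with the class at the same position. `classifyMatch` is hypothetical, and the mapping of array index i to capture group i + 1 is an assumption made for this example (in a JavaScript match result, index 0 is the whole match); the proposal does not settle the numbering.

```javascript
// Hypothetical illustration only: pair each capture group of `match` with the
// class at the same position in the `className` array.
function classifyMatch(text, mode) {
  const m = mode.match.exec(text);
  if (!m) return null;
  // m[0] is the whole match; capture groups start at m[1].
  return mode.className.map((className, i) => ({
    className,
    text: m[i + 1]
  }));
}

const singleQuoted = {
  match: /(')(.*?)(')/,
  className: ["string.delim", "string", "string.delim"]
};

const pieces = classifyMatch("x = 'abc';", singleQuoted);
console.log(pieces);
```

The same function handles a regex with 5 or 7 groups unchanged, as long as className has one entry per group.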

Additional context...

None.


joshgoebel added the enhancement (An enhancement or new feature) and discuss/propose (Proposal for a new feature/direction) labels Nov 8, 2020

joshgoebel mentioned this issue Nov 8, 2020

enh(latex): Implement an easy to use chaining mechanism #2776

Closed

3 tasks

joshgoebel changed the title ~~Proposal: Simple and complex single mode multi-expression highlighting~~ Proposal: Improve multi-class single mode highlighting Nov 8, 2020

joshgoebel changed the title ~~Proposal: Improve multi-class single mode highlighting~~ Proposal: Support multi-class highlighting inside a single mode/rule Nov 8, 2020

joshgoebel mentioned this issue Feb 16, 2021

Migrate code highlightjs/highlightjs-turtle#2

Open

joshgoebel added the help welcome (Could use help from community) label Mar 25, 2021

This was referenced Mar 27, 2021

enh(parser) multi-class in a single mode #3081

Merged

Discuss: Should 3rd party language grammars be version tagged? #3096

Closed

joshgoebel closed this as completed in #3081 Apr 4, 2021
