Proposal: Support multi-class highlighting inside a single mode/rule #2838

Closed
joshgoebel opened this issue Nov 8, 2020 · 0 comments · Fixed by #3081
Labels
discuss/propose Proposal for a new feature/direction enhancement An enhancement or new feature help welcome Could use help from community

joshgoebel commented Nov 8, 2020

Is your request related to a specific problem you're having?

Yes. Languages very often have constructs shaped like [delim]content[delim], a string being the perfect example, and many times you want to highlight the delimiters differently from the content. Right now we provide no simple way to do that without resorting to complex rule chains or ambiguous contains rules. For example, let's match a simple single-quoted string and highlight the whole thing as a string:

{
  className: "string",
  begin: /'[^']*'/,
}

Easy, but now let's try to color the delimiters separately:

{
  className: "string.delim",
  begin: /'/, end: /'/,
  contains: [
    { begin: /[^']*/, className: "string" }
  ]
}

This is the shortest variant, and it does get the job done, but now we have string nested inside string.delim, which is strange. If we were going to nest at all (and I'm not sure we should), surely you'd want string.delim inside string. And of course it wouldn't help at all if we wanted to give the begin and end matchers different classes.

Let's try again (and fail):

{
  begin: /(?=')/,  // look ahead: a string starts here
  contains: [
    { begin: /'/, className: "string.delim" },
    { begin: /[^']*(?=')/, className: "string" },
  ]
}
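To make the failure concrete, here is a deliberately simplified model of that attempt: once the lookahead fires at the first quote, the two contains rules are tried as one alternation, step after step. This is not how highlight.js scans internally; scanWithRules is a hypothetical helper that only shows why nothing tells the rules where a string ends.

```javascript
// Simplified model: the two contains rules from the attempt above, tried as
// one sticky alternation at each position. Not highlight.js internals.
function scanWithRules(text) {
  const rules = /(')|([^']*(?='))/y; // group 1: delimiter, group 2: content
  const tokens = [];
  let pos = text.indexOf("'"); // the outer (?=') lookahead fires here
  while (pos !== -1 && pos < text.length) {
    rules.lastIndex = pos;
    const m = rules.exec(text);
    if (m === null || m[0].length === 0) break; // no rule applies any more
    tokens.push({ scope: m[1] !== undefined ? "string.delim" : "string", text: m[0] });
    pos += m[0].length;
  }
  return tokens;
}

const tokens = scanWithRules("'a' + 'b'");
console.log(tokens.map((t) => `${t.scope}:${t.text}`));
```

The " + " between the two strings comes back scoped as string: after the closing quote of 'a', the same rules simply keep going, because an opening and a closing quote look identical to them.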

Many modes/grammars get away with using (or abusing) contains like this to express a sequence, and it works because their rules are distinct enough. That doesn't work for us here, and I'm not sure how to make it work: after the rules find the whole string they'll just keep matching, and we have no way to know we've reached the "end". It would work if the end delimiter were different:

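  // hypothetical syntax: 'string'E, where 'E terminates the string
  contains: [
    // listed first so the terminator wins over the plain quote rule at the same position
    { begin: /'E/, className: "string.delim", endsParent: true },
    { begin: /'/, className: "string.delim" },
    // tempered token: consume characters only while 'E does not start here
    { begin: /(?:(?!'E).)+/, className: "string" },
  ]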
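The content rule in that variant is the subtle part, since it must stop right before the 'E terminator. A quick check of two regex shapes against the text after the opening quote (plain JavaScript, with the 'string'E syntax being the hypothetical one above) shows that a greedy .* followed by a negative lookahead does not stop there, while a "tempered" token that tests (?!'E) before every character does.

```javascript
// Text remaining after the opening quote of a hypothetical 'abc'E string.
const rest = "abc'E and more";

// Greedy .* runs to the end of the line first; (?!'E) then succeeds
// trivially there, so the match swallows the terminator and beyond.
const naive = rest.match(/.*(?!'E)/)[0];

// Tempered token: before consuming each character, check that 'E
// does not start at the current position.
const tempered = rest.match(/(?:(?!'E).)+/)[0];

console.log(naive);    // abc'E and more
console.log(tempered); // abc
```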

OK, so let's dig in and use all the power we have available:

{
  begin: /(?='.*')/,  // perhaps look ahead just to see if we have a full string
  contains: [
    {
      begin: /'/, className: "string.delim",
      starts: {
        className: "string",
        end: /\b\B/, // hack to leave the mode open until a rule matches
        contains: [
          {
            begin: /(?=')/,
            endsParent: true
          }
        ],
        starts: {
          contains: [
            { begin: /'/, className: "string.delim" }
          ]
        }
      }
    }
  ]
}

OK, that works, but man: find the quote and highlight it, start a new "string" mode, use a magic end to keep that mode from closing, keep matching until we see a ' (via lookahead), then end the parent, which starts ANOTHER new mode to match the final delimiter. Ugh. We could also try a pure chain:

{
  begin: /(?='.*')/,  // perhaps look ahead just to see if we have a full string
  contains: [
    {
      begin: /'/, className: "string.delim",
      starts: {
        className: "string",
        begin: /[^']*/,
        starts: {
          contains: [
            { begin: /'/, className: "string.delim" }
          ]
        }
      }
    }
  ]
}

Simple: one mode chains into the next, and into the next, via starts until it hits the end and everything unwinds. This of course requires us to match the middle of the expression manually, which is slightly annoying.
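Stripped of the mode machinery, a pure chain behaves like a sequence of anchored matches, each starting exactly where the previous one stopped. The sketch below models that with sticky (y) regexes; matchChain and its { re, scope } link shape are hypothetical names for illustration, not highlight.js code. It also makes the annoyance visible: the middle link has to be spelled out by hand.

```javascript
// Sketch of a pure chain at runtime: each link must match exactly where the
// previous one stopped. `matchChain` is a hypothetical helper.
function matchChain(links, text, start = 0) {
  const tokens = [];
  let pos = start;
  for (const { re, scope } of links) {
    // Sticky flag: only try to match at `pos`, never search further ahead.
    const flags = re.flags.includes("y") ? re.flags : re.flags + "y";
    const sticky = new RegExp(re.source, flags);
    sticky.lastIndex = pos;
    const m = sticky.exec(text);
    if (m === null) return null; // one link failed, so the whole chain fails
    tokens.push({ scope, text: m[0] });
    pos += m[0].length;
  }
  return { tokens, end: pos };
}

const stringChain = [
  { re: /'/, scope: "string.delim" },
  { re: /[^']*/, scope: "string" }, // the manually specified middle
  { re: /'/, scope: "string.delim" },
];

console.log(matchChain(stringChain, "x = 'hi';", 4));
```

Starting at index 4 of "x = 'hi';" this yields the three tokens ', hi and ' with end at 8; on an unterminated "x = 'hi" the last link fails and the whole chain returns null, which mirrors how the real chain would never reach its final mode.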

Any alternative solutions you considered...

Of course these structures are a pain to write by hand, but we could use syntactic sugar (i.e., build one of the above variants internally, without adding any features to the core of the parser). Say we added some chain sugar:

{
  className: "string",
  chain: [
    { match: /'/, className: "string.delim" },
    { match: /[^']*/, className: "string" },
    { match: /'/, className: "string.delim" }
  ]
}

Better. This of course compiles into something much more complex, and we'd still be left having to specify the inside match of the string, which is kind of annoying. It's also bad because single "modes" that secretly compile into massively complex chains make it much harder to build complex high-level structures by composing those lower-level ones. The interactions get very complicated, because what you think is a "simple low-level rule" is actually a massively complex rule whose complexity the compiler has hidden from you.
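A rough idea of what such sugar could expand into: the sketch below turns a flat chain array into nested starts links, similar in shape to the pure-chain variant. expandChain is a hypothetical function that only builds plain objects; the exact structure a real highlight.js compiler would need is an open question, and this is only meant to show how three flat links become three levels of nesting.

```javascript
// Hypothetical desugaring of a `chain` key into nested `starts` links.
// Produces plain objects only; this is not a highlight.js API.
function expandChain(mode) {
  const { chain, ...rest } = mode;
  if (!Array.isArray(chain) || chain.length === 0) return mode;

  // Build from the last link backwards so each link can `starts` the next.
  let next = null;
  for (let i = chain.length - 1; i >= 0; i--) {
    const { match, className } = chain[i];
    const link = { begin: match, className };
    if (next !== null) link.starts = next;
    next = link;
  }

  // The outer mode keeps its other keys and contains only the first link.
  return { ...rest, contains: [next] };
}

const expanded = expandChain({
  className: "string",
  chain: [
    { match: /'/, className: "string.delim" },
    { match: /[^']*/, className: "string" },
    { match: /'/, className: "string.delim" },
  ],
});

// Print regexes as strings so the nesting is readable.
console.log(JSON.stringify(expanded, (k, v) => (v instanceof RegExp ? String(v) : v), 2));
```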

This kind of sugar works for many simpler things, but this "3 pair" pattern (two delimiters and an enclosure) is so common that I really think it should be supported by the parser at the lowest level. We already have the concepts of begin, end, and everything in between; we just don't provide an easy mechanism to assign each of them a separate CSS class.

The solution you'd prefer / feature you'd like to see added...

So I'd like to propose two low-level variations:

The first, for modes with children, simply allows each "piece" to be classified individually. This is closest to what we already have and would, I think, be the simplest to add:

{
  className: {
    begin: 'string.delim',
    middle: 'string',
    end: 'string.delim'
  },
  begin: '"',
  end: '"',
  contains: [ /* ... */ ]
}
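For the simple case of a mode without children, the three-piece className maps directly onto what the parser already knows: the begin match, the end match, and the text between them. splitDelimited below is a hypothetical sketch of that mapping; it ignores contains and any regex flags, and returns plain { scope, text } tokens rather than HTML.

```javascript
// Sketch only: split a begin/end mode with no children into three tokens
// using a { begin, middle, end } className. `splitDelimited` is hypothetical.
function splitDelimited(text, begin, end, scopes, start = 0) {
  const b = new RegExp(begin.source, "y"); // begin must match exactly at `start`
  b.lastIndex = start;
  const bm = b.exec(text);
  if (bm === null) return null;

  const afterBegin = start + bm[0].length;
  const e = new RegExp(end.source, "g"); // search forward for the end
  e.lastIndex = afterBegin;
  const em = e.exec(text);
  if (em === null) return null; // unterminated

  return [
    { scope: scopes.begin, text: bm[0] },
    { scope: scopes.middle, text: text.slice(afterBegin, em.index) },
    { scope: scopes.end, text: em[0] },
  ].filter((t) => t.text.length > 0); // e.g. drop the empty middle of ""
}

const scopes = { begin: "string.delim", middle: "string", end: "string.delim" };
console.log(splitDelimited('say "hi" now', /"/, /"/, scopes, 4));
```

No extra modes, starts links or lookahead tricks are involved: the pieces fall out of a single begin/end pair.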

The second is the same thing for simpler regex matches, when a single regex will get the job done:

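{
  match: /(')(.*?)(')/,
  // one class per capture group, in order
  className: ['string.delim', 'string', 'string.delim'],
  // or possibly, keyed by capture group number (as in TextMate):
  // className: {
  //   1: 'string.delim',
  //   2: 'string',
  //   3: 'string.delim'
  // },
},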

The latter format (keyed digits) would be immediately recognizable to anyone who's worked on TextMate grammars before. And of course this style is not limited to three match components: you could easily have a complex regex that breaks something down into 5 or 7 components, highlighting each piece differently.
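A sketch of how a single-regex rule could be split per capture group, accepting either the array form or the keyed-digits form. It assumes 1-based group numbers (group 0 being the whole match, as in TextMate captures) and uses the ES2022 d flag so exec() reports exact offsets for every group. scopeCaptures is a hypothetical helper, and the scope names in the five-component example are illustrative only.

```javascript
// Sketch: split one regex match into a token per capture group.
// `scopes` is an array (entry 0 -> group 1) or an object keyed by group
// number. `scopeCaptures` is hypothetical, not a highlight.js API.
function scopeCaptures(re, text, scopes) {
  const byGroup = Array.isArray(scopes)
    ? Object.fromEntries(scopes.map((s, i) => [i + 1, s]))
    : scopes;
  // The `d` flag makes exec() report [start, end] offsets for every group.
  const flags = re.flags.includes("d") ? re.flags : re.flags + "d";
  const m = new RegExp(re.source, flags).exec(text);
  if (m === null) return null;

  const tokens = [];
  for (let g = 1; g < m.length; g++) {
    if (m.indices[g] === undefined) continue; // group did not participate
    const [start, end] = m.indices[g];
    tokens.push({ scope: byGroup[g] ?? null, text: m[g], start, end });
  }
  return tokens;
}

// Three components, array form (the string example from the proposal).
const str = scopeCaptures(/(')(.*?)(')/, "x = 'hi';",
  ["string.delim", "string", "string.delim"]);

// Five components, keyed form; group 2 (whitespace) is left unscoped.
const call = scopeCaptures(/(\w+)(\s*)(\()([^)]*)(\))/, "print (a, b)", {
  1: "title.function", 3: "punctuation", 4: "params", 5: "punctuation",
});

console.log(str.map((t) => `${t.scope}:${t.text}`));
console.log(call.map((t) => `${t.scope}:${t.text}`));
```

In this sketch any text inside the match but outside every group simply gets no token; a real implementation would have to decide how to treat it.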

And since these are TRUE single modes, they can be composed easily (used anywhere modes can already be used) without any special caveats or complex interactions with starts, endsParent, endsWithParent, etc.

Additional context...

None.

@joshgoebel joshgoebel added enhancement An enhancement or new feature discuss/propose Proposal for a new feature/direction labels Nov 8, 2020
@joshgoebel joshgoebel changed the title Proposal: Simple and complex single mode multi-expression highlighting Proposal: Improve multi-class single mode highlighting Nov 8, 2020
@joshgoebel joshgoebel changed the title Proposal: Improve multi-class single mode highlighting Proposal: Support multi-class highlighting inside a single mode/rule Nov 8, 2020
@joshgoebel joshgoebel added the help welcome Could use help from community label Mar 25, 2021