Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Highlightjs not respecting parent regex of a sub mode #2238

Closed
algomaster99 opened this issue Oct 30, 2019 · 36 comments
Closed

Highlightjs not respecting parent regex of a sub mode #2238

algomaster99 opened this issue Oct 30, 2019 · 36 comments

Comments

@algomaster99
Copy link

algomaster99 commented Oct 30, 2019

I need to write a lexer which highlights my command-line tool commands properly.

$ dvc add file.csv
$ dvc pipeline list

So the command starts with dvc and it may have one or two subcommands - add or pipeline list respectively.

Therefore, it should highlight dvc add and dvc pipeline list in first and second case respectively.

contains: [
  {
    begin: /^\s*\$\s(dvc|git) [a-z-]+/,
    returnBegin: true,
    contains: [
      {
        begin: /dvc [a-z-]+ ?/,
        lexemes: '[a-z-]+',
        keywords: {
          built_in:
             'dvc'
          },
          contains: [
            {
              begin: /\w+(?![\S])/,
              keywords: {
                built_in: 'list'
              }
            }
          ],
          className: 'strong'
        }
      ]
    }
 ]

It matches $ dvc pipeline list even though the parent regex i.e. /^\s*\$\s(dvc|git) [a-z-]+/ and should only match till $ dvc pipeline. How is it exactly functioning?

How does /dvc [a-z-]+ ?/ override it and continues matching the expression?

@joshgoebel
Copy link
Member

Please provide output HTML so it’s 100% clear what is happening. I find your description not that clear.

@joshgoebel
Copy link
Member

Looks like your second begin matches dvc pipeline then your third grabs list.

@joshgoebel
Copy link
Member

It’s all strong because your 3rd matches is a child of the 2nd.

@algomaster99
Copy link
Author

@yyyc514 but the parent regex - /^\s*\$\s(dvc|git) [a-z-]+/ is not being respected if dvc pipeline list matches. How is this working exactly?

@joshgoebel
Copy link
Member

joshgoebel commented Oct 30, 2019

I don’t think you understand how it works. The first regex matches then it switches to the contains and starts looking for children then matches list

@joshgoebel
Copy link
Member

joshgoebel commented Oct 30, 2019

The first regex only tells it to begin. It can keep matching if there are children to match.

@joshgoebel
Copy link
Member

joshgoebel commented Oct 30, 2019

The parents BEGIN regex has nothing to do with children, it just starts the mode. Well unless you use begin same as end. Lol. Then it COULD potentially

@algomaster99
Copy link
Author

@yyyc514 so how should I change my configuration to enable that? Basically, I would want to match dvc pipeline list but not pics in dvc add pics since 'pics' is not in built_in.
The current scenario makes both of them strong. Should I hardcode regex like this - (?:list|...)? (... Refers to list of all keywords I want to match)
Currently, both of the above commands become bold. And since, dvc, pipeline and list both are in built_in, they are colored blue too. pics doesn't get blue colored but it does get bold.

@joshgoebel
Copy link
Member

joshgoebel commented Oct 30, 2019

Well depends how picky you want to be... You could probably get a long ways with just a single mode:

$ (git|dvc) * - then just FULL list of keywords

And then matching a long list of keywords... it wouldn't be "smart" but it might work for 95% of things...

Or you could go all the way the other way and have a single mode per subcommand:

begin: "dvc add" (no keywords?)
begin: "dvc something ..." (custom keywords for the ...)
begin: "dvs pipeline ..." (custom keywords for the ...)
...

@algomaster99
Copy link
Author

@yyyc514 we were trying the second way only. But we don't know how to distinguish between $ dvc pipeline <keyword> and $ dvc add <not-a-keyword>.

@joshgoebel
Copy link
Member

Well, that might be impossible in some circumstances. But if you had a separate mode for each then you know how many keyword to expect, right?

Do for "dvc add" you'd be done... everything else should be arguments, no? What are the keywords for add?

You only need to worry about matching keywords, not non-keywords.

@algomaster99
Copy link
Author

@yyyc514 dvc add doesn't have any. But "dvc pipeline" has one sub command called 'list'. So basically it can at most two sub-commands after 'dvc'

@joshgoebel
Copy link
Member

Right... but again if you're doing "per subcommand" then it's be a tree... Or you might do better just hard coding them all:

(dvc add|dvc pipeline submit|dvc pipeline list|dvc ...)

Or any combination. There are only like 20 ways to do the same thing here. :)

@algomaster99
Copy link
Author

@yyyc514 Yes, I want to do it "per subcommand". I just don't know how can I differentiate between pics in $ dvc add pics and list in $ div pipeline list. (pics is just a name of the directory.)

@joshgoebel
Copy link
Member

joshgoebel commented Oct 30, 2019

Why would there be any problem? per subcommand means writing outa rule PER subcommand.

IE:

Rules:

  • dvc add DONE
  • dvd pipeline (subcommand keywords)
  • ...

They would be ENTIRELy separate matchers.

@joshgoebel
Copy link
Member

So if you had 20 commands you'd have 20 rules.

@algomaster99
Copy link
Author

@yyyc514 Okay you mean I make different sub-modes for commands with one sub-command and two sub-commands?

@joshgoebel
Copy link
Member

joshgoebel commented Oct 30, 2019

Or a different sub-mode for EVERY command... really at this point you'd probably want to write code to build your syntax, not write it by hand...

language_rules = []
language_rules.push(add_command("dvc add"))
language_rules.push(add_command("dvc pipeline", keywords: {...}))

return {
 contains: language_rules
}

@algomaster99
Copy link
Author

@yyyc514 I am sorry. But I am unfamiliar with this add_command function. I couldn't even find it here -https://highlightjs.readthedocs.io/en/latest/api.html

@joshgoebel
Copy link
Member

It's an exercise for the reader, you'd have to write it yourself... I'm just talking about building the language dynamically using code.

@shcheklein
Copy link

@yyyc514

The first regex only tells it to begin. It can keep matching if there are children to match.

could you point to the docs please?

I thought that it's working in a bit different way according to this:

https://highlightjs.readthedocs.io/en/latest/reference.html#end

If absent, end defaults to a regexp that matches anything, so the mode ends immediately.

what does immediately mean in this case?

@shcheklein
Copy link

shcheklein commented Oct 30, 2019

@yyyc514 I'm agreed in general on sub-commands/sub-modes approach in general! And thank you for the prompt help and response.

Just trying to understand better the behavior here. It looked like a bug to us. First, it does not stop (okay, may be I misinterpreted the "immediately" part).

Then why did @algomaster99 had to put ? into /dvc [a-z-]+ ?/ to start matching dvc pipeline list? and just /dvc [a-z-]+/ didn't work as far as I understand.

Also, should it be matching dvc pipeline list list (both lists as keywords)? It's not the case with this config as far as I remember.

@joshgoebel
Copy link
Member

joshgoebel commented Oct 30, 2019

what does immediately mean in this case?

It means immediately (and you’d see that for rules with no children) but if there are children/contains they will match. If it worked the way you are thinking you could never have any children or subrules.

This may be a docs issues. Clarity could probably be added.

@joshgoebel
Copy link
Member

Then why did @algomaster99 had to put ? into /dvc [a-z-]+ ?/ to start matching dvc pipeline list? and just /dvc [a-z-]+/ didn't work as far as I understand.

I have no evidence right now this statement is true. The last bit eats the space which is often desirable.

@shcheklein
Copy link

shcheklein commented Oct 30, 2019

@yyyc514

If it worked the way you are thinking you could never have any children or subrules.

I though you can do returnBegin: true and then apply children rules if needed. And/or, specify end explicitly, e.g. end: /\n|$/.

@joshgoebel
Copy link
Member

joshgoebel commented Oct 30, 2019

I think a big key is it’s always named “begin”. It BEGINS a mode, not ends it. When there are no more matches a mode ends.

And again if you are expecting it to immediately terminate AND also including children your not thinking consistently, because that makes no sense.

returnBegin is a huge hack and often (always?) can be avoided with look ahead. It honestly should probably be deprecated.

@joshgoebel
Copy link
Member

joshgoebel commented Oct 30, 2019

If you have a recommendation to update the docs a PR would be welcome. The behavior here (so far) is correct as intended.

@shcheklein
Copy link

I think a big key is it’s always named “begin”. It BEGINS a mode, not ends it. When there are no more matches a mode ends.

totally, I'm not arguing with that. And again thank you for your help! 🙏 I'm just trying to fit in my head the way you describe it and what we actually see in our case.

For example this logic https://github.com/iterative/dvc.org/blob/master/src/Documentation/Markdown/lang/dvc.js#L47:L75 should be matching dvc add two times (twice the same kid) in this one:

$ dvc add dvc add

is it correct? Or am I still missing something?

returnBegin is a huge hack and often (always?) can be avoided with look ahead. It honestly should probably be deprecated.

As far as remember we had to use it this piece above to actually being able match at least one kid and apply different classes - className: 'skipped' for example. Again, may be we got it wrong but it works for us at dvc.org/docs very well (the current version, not the one @algomaster99 shared in this ticket.) Would really appreciate your feedback if you see any problem in that piece.

@joshgoebel
Copy link
Member

joshgoebel commented Oct 30, 2019

should be matching dvc add two times (twice the same kid) in this one:

At a glance, yes. Although you could use endsParent if that was undesirable.

As far as remember we had to use it this piece above to actually being able match at least one kid and apply different classes

Right, to look-ahead. But real look ahead in regex actually works as of 9.16 so that’s a much better way.

@shcheklein
Copy link

shcheklein commented Oct 30, 2019

At a glance, yes. Although you could use endsParent if that was undesirable.

That's not what happening for me. Thus I'm confused. It's very easy to reproduce in our project.

But real look ahead in regex actually works as of 9.16 so that’s a much better way.

could you please elaborate, share a link?

@joshgoebel
Copy link
Member

#2135

Looks at some of the smaller commits in this PR for examples.

@joshgoebel
Copy link
Member

joshgoebel commented Oct 30, 2019

That's not what happening for me. Thus I'm confused. It's very easy to reproduce in our project.

Make a jsfiddle and I’ll look at it.

You don’t eat the space. So your rule DOES not match again because there is an extra space, which immediately ends the parent mode.

@joshgoebel
Copy link
Member

You could use end and excludeEnd to eat the space.

@shcheklein
Copy link

You don’t eat the space. So your rule DOES not match again because there is an extra space, which immediately ends the parent mode.

Yep, I see. That's right!

@shcheklein
Copy link

Looks at some of the smaller commits in this PR for examples.

thanks! I see that we can get rid of returnBegin this way indeed.

@joshgoebel
Copy link
Member

Added:
#2242

Closing this as asked and answered, though conversation can continue if you have any other questions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants