Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Monarch grammar: Regex starting with ^ should only match start of source line, but it does not. #1818

Closed
bolinfest opened this issue Feb 7, 2020 · 6 comments · Fixed by microsoft/vscode#90266
Assignees
Labels
bug Issue identified by VS Code Team member as probable bug
Milestone

Comments

@bolinfest
Copy link
Contributor

monaco-editor version: 0.19.3
Browser: Chrome 79
OS: macOS
Playground code that reproduces the issue:

According to the Monarch docs:

If it starts with a ^ character, the expression only matches at the start of a source line.

I have tried both specifying a regex literal /^!/ as well as a string "^!", but neither works.

You can reproduce the issue on the special Monarch playground at https://microsoft.github.io/monaco-editor/monarch.html. In the Language syntax definition, specify:

return {
  // The main tokenizer for our languages
  tokenizer: {
    root: [
      [/^\!/, {token: 'delimiter.curly', next: 'jsonInBang', nextEmbedded: 'json'}],
    ],
    jsonInBang: [
      [/^\!/, {token: 'delimiter.curly', next: '@pop', nextEmbedded: '@pop'}],
    ],
  },
};

And then in the Language editor, specify:

!
  {
    "foo": "bar",
    "baz": 42,
    "quux": [
      {
        "foo": "bar!",
      }
    ],
    "quux2": [
      {
        "foo": "bar",
      }
    ]
  }
!

If you look carefully, the JSON inside the ! starts out being highlighted correctly, but after the ! in the string literal "bar!", it stops working. It appears the /^\!/ has matched it even though it is not the start of a source code line, which is not what the docs claim should happen.

@bolinfest
Copy link
Contributor Author

Stepping through the code, I believe the issue is that MonarchTokenizer._myTokenize() has proper checks for ^:

https://github.com/Microsoft/vscode/blob/0aed1023c556dfcd9fc211855f14b5a183b8a3e0/src/vs/editor/standalone/common/monarch/monarchLexer.ts#L614-L625

In particular, it does this check:

if (pos === 0 || !rule.matchOnlyAtLineStart) {

By comparison, when in an embedded grammar, I believe MonarchTokenizer._findLeavingNestedModeOffset() is used to test the rules to determine whether to exit the embedded mode. As best I can tell, it not appear to check if matchOnlyAtLineStart is set on each rule.

@bolinfest
Copy link
Contributor Author

Ah, so the issue is a bit more subtle than I realized. In monarchCompile.ts, as part of Rule.setRegex(), the regular expression is rewritten:

https://github.com/Microsoft/vscode/blob/0aed1023c556dfcd9fc211855f14b5a183b8a3e0/src/vs/editor/standalone/common/monarch/monarchCompile.ts#L372

wrapping it in ^(?:<original_regex>), dropping the ^, if necessary (and setting matchOnlyAtLineStart to reflect that change).

As such, this check in _findLeavingNestedModeOffset() is expected to succeed:

https://github.com/Microsoft/vscode/blob/0aed1023c556dfcd9fc211855f14b5a183b8a3e0/src/vs/editor/standalone/common/monarch/monarchLexer.ts#L499

since those characters should have been added by setRegex().

I suspect the fix might be as simple as changing this code:

https://github.com/Microsoft/vscode/blob/0aed1023c556dfcd9fc211855f14b5a183b8a3e0/src/vs/editor/standalone/common/monarch/monarchLexer.ts#L503-L506

To be something like:

if (result === -1 || (result !== 0 && rule.matchOnlyAtLineStart)) {
    continue;
}

Though it would probably be slightly more efficient to rewrite the regex to include a leading ^ before passing it to line.search() so that it can fail faster, if appropriate.

Also, I think the RegExp is re-created each time _findLeavingNestedModeOffset() is run, so it might benefit from caching it on the rule or in a WeakMap or something?

@bolinfest
Copy link
Contributor Author

I'm trying to fix this myself and I have been following the instructions on CONTRIBUTING.md, but it is proving difficult. What I really want is a dev version of https://microsoft.github.io/monaco-editor/monarch.html that I can edit/refresh quickly.

I followed the instructions for npm run release and npm run build-website, but it takes 3 minutes to build on my laptop, so that is a slow dev cycle. Even once it is done, it does not seem to be loading my dev version, but what was downloaded into node_modules/, even if I add ?editor=src, so that's no good.

I poked around the various .html files in test/, but none of them seem to offer a similar level of interactivity as the playground.

It would be helpful if the docs provided more detail about how the maintainers get a good dev cycle going.

@alexdima
Copy link
Member

alexdima commented Feb 10, 2020

@bolinfest The trick is to use yarn run watch in the vscode repo. That will be a process that will continue executing until killed. Then, you can edit .ts files and within <1s the generated .js are written to disk and then you can reload the webpage.

It usually just means that you edit the .ts file with autosave on and then alt-tab to a browser and reload and then you see your change...

@alexdima alexdima added the bug Issue identified by VS Code Team member as probable bug label Feb 10, 2020
@alexdima alexdima self-assigned this Feb 10, 2020
@bolinfest
Copy link
Contributor Author

@alexdima I had that working for the establish test pages, but it did not seem to be working for the playground (which I cared about because I didn't see a good test page for grammars, so I wanted to use monarch.html). Should the fast edit/refresh cycle be working for the playground, too?

@alexdima
Copy link
Member

For testing changes like that, I typically use one of the generated files like at http://127.0.0.1:8080/monaco-editor/test/playground.generated/extending-language-services-custom-languages.html

and just edit the .html to contain the Monarch definition I want and the content I want and then just reload...

alexdima added a commit to microsoft/vscode that referenced this issue Feb 10, 2020
alexdima pushed a commit to microsoft/vscode that referenced this issue Feb 10, 2020
@alexdima alexdima added this to the January 2020 milestone Feb 10, 2020
@vscodebot vscodebot bot locked and limited conversation to collaborators Mar 26, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Issue identified by VS Code Team member as probable bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants