How to define begin and end patterns which span multiple lines? #41

kumarharsh · 2017-04-14T10:57:51Z

Hi, I'm trying to extend my plugin, graphql-for-vscode, to support Gherkin feature files, but I'm stuck defining the grammar. The begin regexp doesn't seem to match anything when I include a newline (\n):

Specifically:

With this syntax definition:

    {
      "fileTypes": ["feature"],
      "scopeName": "text.gherkin.feature.graphql",
      "injectionSelector": "L:text -comment",
      "patterns": [
        {
          "begin": "graphql request\\s+\"\"\"",
          "end": "\"\"\"",
          "patterns": [
            {
              "name": "featureGQL",
              "include": "source.graphql"
            }
          ]
        }
      ]
    }

I don't get any matches. However, modifying the source text to bring the opening """ onto the same line as graphql request does the trick.

I tried changing the begin regexp to "begin": "graphql request\\n\\s+\"\"\"", but it didn't help; in fact, it stopped highlighting anything within the quotes.

I've spent some time browsing other grammars in VS Code and TextMate, but I could only find \n used in match regexps, never in begin or end regexps.


kumarharsh · 2017-04-14T11:29:15Z

I also tried running npm run inspect on my syntax definition, and I see that the tokenizer takes one line at a time. So is it simply not possible to define multi-line begin/end rules? If so, is there an alternative?

- newlines don't work in begin/end patterns, so there doesn't seem to be a way to match the intent: "start highlighting only when a line with `graphql request` is followed by `"""` on the next line, and end when the next `"""` is found." Ref: microsoft/vscode-textmate#41

kumarharsh · 2017-11-14T11:43:54Z

@alexandrudima friendly ping

ghost · 2017-11-14T15:27:17Z

Regular expressions in TextMate grammars will not match across multiple lines. This is a fundamental limitation of the format. The only way to match content across multiple lines is to chain begin/end matches.

alexdima · 2017-11-14T17:16:50Z

@freebroccolo is correct. Each line (with \n appended at the end) will be evaluated, one at a time, in order, against a grammar...
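A small TypeScript sketch can make this concrete. It models the input the way the comment above describes (each line scanned on its own, with `\n` appended) and tries the two begin regexps from the question, `graphql request\s+"""` and `graphql request\n\s+"""`, against every line. This is an illustrative model, not vscode-textmate's code: JavaScript `RegExp` stands in for Oniguruma (equivalent for these simple patterns), and the helpers `tokenizerLines` and `matchesSomeLine` are hypothetical names.

```typescript
// Illustrative model: the tokenizer hands the grammar one line at a time,
// with "\n" appended, so no input string ever contains two source lines.
function tokenizerLines(text: string): string[] {
  return text.split("\n").map((line) => line + "\n");
}

// True if `pattern` matches within at least one individual line.
function matchesSomeLine(pattern: RegExp, text: string): boolean {
  return tokenizerLines(text).some((line) => pattern.test(line));
}

const feature = [
  "    Given I make a graphql request",
  '    """',
  "    mutation {",
  "    }",
  '    """',
].join("\n");

// The two begin regexps tried in the question.
const originalBegin = /graphql request\s+"""/;
const newlineBegin = /graphql request\n\s+"""/;

// Each matches the whole document, but never a single tokenizer line.
console.log(originalBegin.test(feature), matchesSomeLine(originalBegin, feature)); // true false
console.log(newlineBegin.test(feature), matchesSomeLine(newlineBegin, feature)); // true false
```

Both patterns need characters from two different lines, so under this model neither can ever fire; a begin that only looks at the first line (for example `graphql request\s*$`) does match.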

colinfang · 2017-11-26T03:21:42Z

Do you happen to know what the prefix L: means in the injectionSelector? I cannot find any docs on it.

alexdima · 2017-11-27T10:30:59Z

I think @aeschli would know.

kumarharsh · 2017-11-27T10:50:55Z

The L: part means left injection, i.e., the grammar rules are injected to the left of the existing rules for the scope being highlighted. When highlighting, the left-most rule has higher precedence than the rules to its right, so L: ensures that this grammar's highlighting overrides the default one.

Ref: textmate/markdown.tmbundle#15 (comment)

kumarharsh · 2018-01-16T07:30:19Z

@aeschli I have a follow-up question on defining grammars that match across multiple lines. I've created a grammar for Gherkin that should match and highlight GraphQL written like this:

    Given I make a graphql request
    """
    mutation {
      UserCreate(input: {...}) {
        clientMutationId
      }
    }
    """
    Then I expect the response
    """
    {
      "data": {...},
      "errors": [...],
    }
    """

The grammar should apply when the following conditions hold:

- Line X ends with the string graphql request.
- Line X+1 contains the """ docstring marker.
- Lines X+2 to X+Y contain the GraphQL query, which should be highlighted with the GraphQL grammar.
- Line X+Y+1 ends with the closing """ docstring marker.
- Line X+Y+2 is blank or contains other Gherkin code, which should be highlighted with the Gherkin grammar.

The syntax definition I tried was like this:

    {
      "injectionSelector": "L:text -comment",
      "patterns": [
        {
          "begin": "graphql request\\s*$",  // STAGE_1
          "patterns": [
            {
              "begin": "^\\s*(\"\"\")$",  // STAGE_2
              "beginCaptures": {
                "1": { "name": "string.quoted.double.graphql.begin" }
              },
              "end": "^\\s*(\"\"\")$",
              "endCaptures": {
                "1": { "name": "string.quoted.double.graphql.end" }
              },
              "patterns": [
                { "include": "source.graphql" }
              ]
            }
          ]
        }
      ],
      ...
    }

It highlights the GraphQL syntax correctly, but the highlighter doesn't revert to Gherkin after encountering the closing """ (string.quoted.double.graphql.end). I believe this is because STAGE_1 has no end pattern. But how would I define one? Both docstring markers are already consumed by the STAGE_2 patterns, so nothing is left for STAGE_1's end to match.

aeschli · 2018-01-16T08:53:31Z

@kumarharsh Sorry, I'm no expert either, but I also believe that has to do with the missing end rule.
The markdown grammar uses the begin/while loop for something similar, maybe that helps (just guessing):
https://github.com/Microsoft/vscode/blob/1eca6b9817f1f44486cc966d8fc448ee95728b8f/extensions/markdown/syntaxes/markdown.tmLanguage.base#L75

kumarharsh · 2018-01-17T08:05:18Z

I have seen that, but I still can't make sense of how to apply it. As far as I can tell, as soon as I define the while part, the first """ will match and the rule will end right there:

    Given I make a graphql request  // begin matches this
    """                             // while would end here(?)
    mutation {

I guess I'll just drop this attempt, as I feel the TextMate grammar format is handicapped by default.
Also, there doesn't seem to be any definitive guide on how to construct such grammars. Even the original TextMate manual doesn't describe while usage; it's just developers suffering a world of pain.

- remove support for syntax starting with "graphql request" followed by a new line with `"""`. Seems impossible (microsoft/vscode-textmate#41 (comment)).

ghost · 2018-01-17T21:49:33Z

@kumarharsh For situations like this you have to make use of Oniguruma lookarounds.

Here is a modification of your original example using a lookbehind:

    {
      "injectionSelector": "L:text -comment",
      "patterns": [
        {
          "begin": "graphql request\\s*$",  // STAGE_1
          "end": "(?<=\"\"\")", // end if the last token consumed was the closing """
          "patterns": [
            {
              "begin": "^\\s*(\"\"\")$",  // STAGE_2
              "end": "^\\s*(\"\"\")",
              "patterns": [
                { "include": "source.graphql" }
              ]
            }
          ]
        }
      ],
      ...
    }

Alternatively you can use a lookahead (although I find this is usually a worse choice):

    {
      "injectionSelector": "L:text -comment",
      "patterns": [
        {
          "begin": "graphql request\\s*$",  // STAGE_1
          "end": "(\"\"\")$",
          "patterns": [
            {
              "begin": "^\\s*(\"\"\")$",  // STAGE_2
              "end": "^\\s*(?=\"\"\")",
              "patterns": [
                { "include": "source.graphql" }
              ]
            }
          ]
        }
      ],
      ...
    }
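Why the lookbehind version pops back to the outer grammar can be traced with a toy model of the begin/end scope stack. The TypeScript below is a simplification under stated assumptions, not vscode-textmate's algorithm: JavaScript `RegExp` (with the `m` flag) stands in for Oniguruma, the open rule's `end` is tried first and wins ties, and `match`, `while`, captures and `include` are not modelled. `Rule` and `scopesPerLine` are hypothetical names. It runs a Gherkin sample through the STAGE_1/STAGE_2 grammar with and without STAGE_1's `(?<=""")` end.

```typescript
// Toy model of TextMate begin/end scoping, for illustration only.
// Assumptions (not vscode-textmate's real algorithm): JS RegExp stands in for
// Oniguruma, the open rule's `end` is tried first and wins ties, and only
// begin/end rules are modelled (no match, while, captures or include).
interface Rule {
  name: string;
  begin: string;
  end?: string; // a rule without `end` is never popped
  patterns?: Rule[];
}

// For each source line, returns the names of the rules still open after it.
function scopesPerLine(root: Rule[], text: string): string[][] {
  const stack: Rule[] = [];
  const result: string[][] = [];
  for (const raw of text.split("\n")) {
    const line = raw + "\n"; // each line is scanned on its own, "\n" appended
    let pos = 0;
    let lastZeroWidth = -1;
    while (pos < line.length) {
      const top: Rule | undefined = stack[stack.length - 1];
      const candidates: { source: string; push?: Rule }[] = [];
      if (top && top.end !== undefined) candidates.push({ source: top.end });
      for (const child of top ? (top.patterns ?? []) : root) {
        candidates.push({ source: child.begin, push: child });
      }
      // Earliest match at or after `pos` wins; earlier candidates win ties.
      let best: { index: number; length: number; push?: Rule } | null = null;
      for (const c of candidates) {
        const re = new RegExp(c.source, "gm"); // "m": $ matches before the "\n"
        re.lastIndex = pos;
        const m = re.exec(line);
        if (m !== null && (best === null || m.index < best.index)) {
          best = { index: m.index, length: m[0].length, push: c.push };
        }
      }
      if (best === null) break; // nothing else can match on this line
      if (best.length === 0) {
        if (best.index === lastZeroWidth) break; // crude guard against loops
        lastZeroWidth = best.index;
      }
      if (best.push) stack.push(best.push); // begin matched: enter the rule
      else stack.pop(); // end matched: leave the rule
      pos = best.index + best.length;
    }
    result.push(stack.map((r) => r.name));
  }
  return result;
}

// STAGE_1 / STAGE_2 from the thread, with and without STAGE_1's lookbehind end.
const stage2: Rule = { name: "STAGE_2", begin: '^\\s*(""")$', end: '^\\s*(""")' };
const withEnd: Rule[] = [
  { name: "STAGE_1", begin: "graphql request\\s*$", end: '(?<=""")', patterns: [stage2] },
];
const withoutEnd: Rule[] = [
  { name: "STAGE_1", begin: "graphql request\\s*$", patterns: [stage2] },
];

const sample = [
  "Given I make a graphql request",
  '    """',
  "    mutation {",
  '    """',
  "    Then I expect the response",
].join("\n");

console.log(JSON.stringify(scopesPerLine(withEnd, sample)));
// [["STAGE_1"],["STAGE_1","STAGE_2"],["STAGE_1","STAGE_2"],[],[]]
console.log(JSON.stringify(scopesPerLine(withoutEnd, sample)));
// [["STAGE_1"],["STAGE_1","STAGE_2"],["STAGE_1","STAGE_2"],["STAGE_1"],["STAGE_1"]]
```

In this model, on the closing `"""` line STAGE_2's end consumes the marker and pops, and STAGE_1's zero-width `(?<=""")` then matches at the very next position on the same line, so the stack is empty again. Without that end, STAGE_1 stays open for the rest of the file, which is the symptom described earlier in the thread.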

One pattern you will want to use to match multi-line syntactic constructs accurately is as follows. This is what I was referring to earlier about chaining "begin"/"end" matches. You might want to adapt your example to this style depending on how many stages there will be after the first quoted block.

    {
      "patterns": [
        {
          "begin": "A",
          "end": "B"
        },
        {
          "begin": "(?<=B)",
          "end": "C"
        },
        {
          "begin": "(?<=C)",
          "end": "D"
        },
        …
      ]
    }
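The chaining works because a zero-width begin such as `(?<=B)` is tried on the line where the previous rule's `end` just matched, at the position immediately after `B`, and a lookbehind can only inspect the current line. A small TypeScript check illustrates this; JavaScript `RegExp` with the sticky `y` flag stands in for Oniguruma, and `zeroWidthBeginAt` is a hypothetical helper, not part of any grammar engine.

```typescript
// Illustrative only: why "(?<=B)" can chain onto a rule whose end just matched
// "B". A lookbehind only inspects the current line, so it succeeds right after
// "B" on that line and cannot see a "B" left on a previous line.
function zeroWidthBeginAt(line: string, pos: number, source: string): boolean {
  const re = new RegExp(source, "y"); // sticky: must match exactly at `pos`
  re.lastIndex = pos;
  return re.test(line);
}

const chainBegin = "(?<=B)";
const lineWhereBEnds = "A ... B\n"; // the first rule's end "B" matches here
const afterB = lineWhereBEnds.indexOf("B") + 1;

console.log(zeroWidthBeginAt(lineWhereBEnds, afterB, chainBegin)); // true
console.log(zeroWidthBeginAt("C\n", 0, chainBegin)); // false
```

So the next rule in the chain opens on the same line the previous one closed on, and its own end (`C`) can then be on any later line. One caveat under this model: `(?<=B)` would also fire after any other `B` scanned at the same level, so the text each `end` matches should be distinctive.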

kumarharsh · 2018-01-18T06:32:43Z

Thanks a lot @freebroccolo. I was under the impression that lookbehinds were not supported by JS/TS. Didn't think of the second way though. The third example is great. I had misconstrued 'chaining' to mean 'nesting' 🤦‍♂️

ghost · 2018-01-18T07:01:05Z

Yeah, the regexp engine used for TextMate grammars is usually Oniguruma, so you have a lot more flexibility than you do with JS regexps.

JoshCheek · 2021-02-16T14:41:50Z

To clarify: for this pattern to work, it must be able to identify confidently, from the first line, that it applies?

E.g., there is no way to handle a Markdown setext heading, where the current line could be a paragraph, an h1, or an h2, and we can't know which until we see the next line.

I attempted it, expecting that if the end didn't match, the begin would not be applied. Instead, I think it just entered that rule and never left it: everything afterwards kept the heading scope.

    $ git diff --cached
    diff --git a/markdown.tmLanguage.base.yaml b/markdown.tmLanguage.base.yaml
    index 15df966..bf78da0 100644
    --- a/markdown.tmLanguage.base.yaml
    +++ b/markdown.tmLanguage.base.yaml
    @@ -3,12 +3,13 @@ keyEquivalent: ^~M
     name: Markdown
     patterns:
     - {include: '#frontMatter'}
    +- {include: '#heading-atx'}
    +- {include: '#heading-setext'}
     - {include: '#block'}
     repository:
       block:
         patterns:
         - {include: '#separator'}
    -    - {include: '#heading'}
         - {include: '#blockquote'}
         - {include: '#lists'}
         - {include: '#fenced_code_block'}
    @@ -22,6 +23,8 @@ repository:
           '2': {name: punctuation.definition.quote.begin.markdown}
         name: markup.quote.markdown
         patterns:
    +    - {include: '#heading-atx'}
    +    - {include: '#heading-setext'}
         - {include: '#block'}
         while: (^|\G)\s*(>) ?
     {{languageDefinitions}}
    @@ -38,7 +41,7 @@ repository:
         endCaptures:
           '3': {name: punctuation.definition.markdown}
         name: markup.fenced_code.block.markdown
    -  heading:
    +  heading-atx:
         match: (?:^|\G)[ ]{0,3}(#{1,6}\s+(.*?)(\s+#{1,6})?\s*)$
         captures:
           '1':
    @@ -83,9 +86,12 @@ repository:
         patterns:
         - {include: '#inline'}
       heading-setext:
    -    patterns:
    -    - {match: '^(={3,})(?=[ \t]*$\n?)', name: markup.heading.setext.1.markdown}
    -    - {match: '^(-{3,})(?=[ \t]*$\n?)', name: markup.heading.setext.2.markdown}
    +    name: 'heading.1.markdown'
    +    begin: (?:^|\G)(\w[^\n]*)$\n
    +    beginCaptures: {'1': {name: entity.name.section.markdown}}
    +    end: \G^(={3,})[ \t]*$\n?
    +    endCaptures: {'1': {name: markup.heading.setext.1.markdown}}
    +    patterns: [{match: '(?<=^={3,}[ \t]*$\n?)\G'}]
       html:
         patterns:
         - begin: (^|\G)\s*(<!--)
    @@ -154,7 +160,6 @@ repository:
         patterns:
         - {include: '#inline'}
         - {include: text.html.derivative}
    -    - {include: '#heading-setext'}
         while: (^|\G)(?!\s*$|#|[ ]{0,3}([-*_>][ ]{2,}){3,}[ \t]*$\n?|[ ]{0,3}[*+->]|[
           ]{0,3}[0-9]+\.)
       lists:
    @@ -182,7 +187,6 @@ repository:
         patterns:
         - {include: '#inline'}
         - {include: text.html.derivative}
    -    - {include: '#heading-setext'}
         while: (^|\G)((?=\s*[-=]{3,}\s*$)|[ ]{4,}(?=\S))
       raw_block: {begin: '(^|\G)([ ]{4}|\t)', name: markup.raw.block.markdown, while: '(^|\G)([
           ]{4}|\t)'}

I'm guessing it's impossible? 😞 Specifically, on "this is a paragraph", there is just no way to handle that.

    $ cat test/colorize-fixtures/h1.md
    nice 1
    ======

    * list 1
    * list 2

    this is a paragraph
    still just a paragraph

    # shitty 1

    nice 2
    ------

    ## shitty 2

    $ jq < test/colorize-results/h1_md.json 'map({c, t})[]' -c
    {"c":"nice 1","t":"text.html.markdown heading.1.markdown entity.name.section.markdown"}
    {"c":"======","t":"text.html.markdown heading.1.markdown markup.heading.setext.1.markdown"}
    {"c":"*","t":"text.html.markdown markup.list.unnumbered.markdown punctuation.definition.list.begin.markdown"}
    {"c":" ","t":"text.html.markdown markup.list.unnumbered.markdown"}
    {"c":"list 1","t":"text.html.markdown markup.list.unnumbered.markdown meta.paragraph.markdown"}
    {"c":"*","t":"text.html.markdown markup.list.unnumbered.markdown punctuation.definition.list.begin.markdown"}
    {"c":" ","t":"text.html.markdown markup.list.unnumbered.markdown"}
    {"c":"list 2","t":"text.html.markdown markup.list.unnumbered.markdown meta.paragraph.markdown"}
    {"c":"this is a paragraph","t":"text.html.markdown heading.1.markdown entity.name.section.markdown"}
    {"c":"still just a paragraph","t":"text.html.markdown heading.1.markdown"}
    {"c":"# shitty 1","t":"text.html.markdown heading.1.markdown"}
    {"c":"nice 2","t":"text.html.markdown heading.1.markdown"}
    {"c":"------","t":"text.html.markdown heading.1.markdown"}
    {"c":"## shitty 2","t":"text.html.markdown heading.1.markdown"}

jeff-hykin · 2021-02-16T16:54:50Z

To clarify: for this pattern to work, it must be able to identify confidently, from the first line, that it applies?

Yes, and that is actually a very succinct way of stating the overall fundamental limitation of the TextMate engine (not of VS Code's implementation). Tree-sitter (used by Atom) was created precisely to solve this limitation.
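The setext case from the previous comment shows the limitation directly: the begin regexp only ever sees one line, and the first line of a setext heading has the same form as a paragraph line. A TypeScript sketch, assuming JavaScript `RegExp` as a stand-in for Oniguruma; `setextBegin` is simplified from the diff above by dropping `(?:^|\G)`, and `twoLineHeading` is a hypothetical pattern for comparison.

```typescript
// Illustrative only: the begin rule must commit on the first line, but that
// line alone looks the same whether it is a heading or a paragraph.
const setextBegin = /^(\w[^\n]*)$\n/m; // simplified from the diff (no (?:^|\G))

const headingLines = ["nice 1", "======"];
const paragraphLines = ["this is a paragraph", "still just a paragraph"];

// What the tokenizer offers the begin regexp: one line with "\n" appended.
console.log(setextBegin.test(headingLines[0] + "\n")); // true
console.log(setextBegin.test(paragraphLines[0] + "\n")); // true: indistinguishable

// A pattern that could decide needs the underline from the next line,
// which only exists when both lines are in one string.
const twoLineHeading = /^\w[^\n]*\n={3,}[ \t]*$/m; // hypothetical
console.log(twoLineHeading.test(headingLines.join("\n"))); // true
console.log(twoLineHeading.test(headingLines[0] + "\n")); // false
```

Since the tokenizer never passes two lines in one string, only the single-line form is usable, and it commits on every prose line, which is consistent with the stuck `heading.1.markdown` scope in the results above.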

JoshCheek · 2021-02-18T00:29:09Z

Thanks for confirming 🙏

For any future readers, note that there is apparently an addition to TextMate-based highlighting in VS Code, called Semantic Highlighting. I haven't looked into it yet, but it is introduced like this:

Starting with release 1.43, VS Code also allows extensions to provide tokenization through a Semantic Token Provider. Semantic providers are typically implemented by language servers that have a deeper understanding of the source file and can resolve symbols in the context of the project. For example, a constant variable name can be rendered using constant highlighting throughout the project, not just at the place of its declaration.

Highlighting based on semantic tokens is considered an addition to the TextMate-based syntax highlighting. Semantic highlighting goes on top of the syntax highlighting. And as language servers can take a while to load and analyze a project, semantic token highlighting may appear after a short delay.
-- https://code.visualstudio.com/api/language-extensions/syntax-highlight-guide

TextMate grammars cannot match across multiple lines. Use the suggested workaround mentioned in this issue: microsoft/vscode-textmate#41

Syntax highlight-level grammar support for:
+ string/int/float/bool constants
+ local/global/system variables
+ function call foundation
+ trans tag
+ line comments
- multiline comments don't work; refer to microsoft/vscode-textmate#41 (comment)

See microsoft/vscode-textmate#41 (comment) for multi line matching

+ single-line multiline comments are properly highlighted
+ multiline comments are still broken, likely due to TextMate's limitation (see microsoft/vscode-textmate#41)

kumarharsh changed the title ~~Do begin and end not match newlines?~~ How to define begin and end patterns which span multiple lines? Apr 14, 2017

kumarharsh mentioned this issue Nov 14, 2017

Grammar Injection doesn't not work on multi-line content #57

Closed

kumarharsh closed this as completed Jan 17, 2018

matter123 mentioned this issue Jul 6, 2019

Chaining jeff-hykin/vscode-textmate#2

Open

danyill mentioned this issue Jan 22, 2020

Editor syntax highlighting breaks at two line titles asciidoctor/asciidoctor-vscode#248

Closed

alfonsogarciacaro mentioned this issue Aug 21, 2021

Enable next line coloring alfonsogarciacaro/vscode-template-fsharp-highlight#2

Closed

seanwu1105 added a commit to seanwu1105/vscode-qt-for-python that referenced this issue Dec 17, 2021

Fix #165.

0a865bf

TextMate grammars cannot match across multiple lines. Use the suggested workaround mentioned in this issue: microsoft/vscode-textmate#41

aust1nz mentioned this issue Sep 6, 2022

Update VS code SQL highlighting extension gajus/slonik#392

Merged

thebearingedge mentioned this issue Mar 17, 2023

Doesn't work for multiline generic types thebearingedge/vscode-sql-lit#13

Closed

gquerret mentioned this issue May 5, 2023

Include file with multi-line arguments causes parser and syntax highlighting issue vscode-abl/vscode-abl#77

Closed

aabounegm mentioned this issue Oct 23, 2023

Syntax highlighting when "uses" is on another line rzk-lang/vscode-rzk#62

Open

aceArt-GmbH pushed a commit to aceArt-GmbH/vscode-java that referenced this issue Nov 29, 2023

Add SQL block highlighting support

4a788e6

See microsoft/vscode-textmate#41 (comment) for multi line matching

aceArt-GmbH pushed a commit to aceArt-GmbH/vscode-java that referenced this issue Dec 5, 2023

Add SQL block highlighting support

b46893a

See microsoft/vscode-textmate#41 (comment) for multi line matching

aceArt-GmbH pushed a commit to aceArt-GmbH/vscode-java that referenced this issue Dec 11, 2023

Add SQL block highlighting support

fb53eca

See microsoft/vscode-textmate#41 (comment) for multi line matching

aceArt-GmbH mentioned this issue Dec 18, 2023

Add SQL block highlighting support redhat-developer/vscode-java#3397

Merged

This was referenced Jan 9, 2024

working with prisma & enums breaks formatting and/or types ts-safeql/safeql#197

Closed

Additional Syntax Highlighting of SQL in Prisma Client raw queries prisma/language-tools#1219

Open

alehechka mentioned this issue Feb 12, 2024

Resolve single line import expression syntax highlighting #508 templ-go/templ-vscode#29

Merged

kitten mentioned this issue Feb 21, 2024

fix: Fix vscode-graphql-syntax’s grammar to support string literals on separate lines graphql/graphiql#3518

Merged

Uzo2005 mentioned this issue Mar 16, 2024

Create a Syntax highlighter [VSCode] openpeeps/tim#4

Closed

alek5k mentioned this issue Apr 9, 2024

Multiline Remark does not get colored correctly alek5k/TPSyntaxHighlighter#4

Closed
