Improve Markdown code block tokens #17591

joshpeng · 2016-12-20T07:25:51Z

Code blocks without a language weren't tokenized. Code blocks didn't have their ending ``` punctuation tokenized. Both fixed.

Code blocks used to only have one token. Now each block has the following tokens available for syntax highlighters:

- Starting and ending ``` punctuation
- Code block's language setting
- Code snippet

msftclas · 2016-12-20T07:25:58Z

Hi @joshpeng, I'm your friendly neighborhood Microsoft Pull Request Bot (You can call me MSBOT). Thanks for your contribution!

In order for us to evaluate and accept your PR, we ask that you sign a contribution license agreement. It's all electronic and will take just minutes. I promise there's no faxing. https://cla.microsoft.com.

TTYL, MSBOT;

msftgits · 2016-12-20T16:39:04Z

Hi, I am closing and re-opening this PR to bump the CLA bot. Sorry for the inconvenience!

msftclas · 2016-12-20T16:39:10Z

Hi @joshpeng, I'm your friendly neighborhood Microsoft Pull Request Bot (You can call me MSBOT). Thanks for your contribution!
You've already signed the contribution license agreement. Thanks!
We will now validate the agreement and then real humans will evaluate your PR.

TTYL, MSBOT;

mjbvz

I like the overall idea, but this introduces a number of important regressions that need to be fixed before we can merge this in. Please take a look at the comments and let me know if you have any questions.

mjbvz · 2016-12-20T23:29:21Z

extensions/markdown/syntaxes/markdown.tmLanguage

-					<key>while</key>
-					<string>(^|\G)(?!\s*\2\3*\s*$)</string>
+					<key>end</key>
+					<string>(^|\G)\s*([`~]{3,})\n</string>


We have to use a while clause here. This prevents broken language grammars from leaking outside of the fenced block. Switching to while from end fixed a large number of syntax highlighting issues.

joshpeng · 2016-12-21 (edited)

Hmm. That is why the closing ``` wasn't captured, though. I'll continue thinking about how to achieve both our goals.

Edit: I think I might have a solution by nesting patterns. Testing on my end.
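To make the trade-off in this thread concrete, here is a toy Python simulation of the two rule styles. It is a simplified model of how a TextMate-style tokenizer treats `end` versus `while`, not the real engine: the function names, the sample lines, and the stand-in "embedded grammar" (an HTML comment that never closes) are all assumptions made for illustration.

```python
import re

# Simplified closing fence for a backtick block (illustrative only).
FENCE_CLOSE = re.compile(r"^\s*`{3,}\s*$")


def lines_inside_end_style(lines):
    """Toy model of an `end` rule: while an inner rule from the embedded
    grammar is open, the parent's end pattern is never consulted."""
    inside = []
    in_comment = False  # toy embedded-grammar state: an open HTML comment
    for line in lines[1:]:  # lines[0] is the opening fence
        if in_comment:
            inside.append(line)
            if "-->" in line:
                in_comment = False
            continue
        if FENCE_CLOSE.match(line):
            break
        inside.append(line)
        if "<!--" in line and "-->" not in line:
            in_comment = True
    return inside


def lines_inside_while_style(lines):
    """Toy model of a `while` rule: the condition is checked at the start of
    every line, before the embedded grammar sees it."""
    inside = []
    for line in lines[1:]:
        if FENCE_CLOSE.match(line):
            break  # block and any open inner rules are popped together
        inside.append(line)
    return inside


doc = ["```html", "<!-- unclosed comment", "```", "Paragraph after the block."]
print(lines_inside_end_style(doc))    # the comment swallows the fence and the paragraph
print(lines_inside_while_style(doc))  # only the line between the fences
```

In this model the `end` style lets the broken embedded grammar run past the closing fence, while the `while` style stops at the fence regardless of the embedded state, which is the leak the review comment describes.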

mjbvz · 2016-12-20T23:31:27Z

extensions/markdown/syntaxes/markdown.tmLanguage

@@ -566,11 +582,32 @@
 				<key>fenced_code_block_basic</key>
 				<dict>
 					<key>begin</key>
-						<string>(^|\G)\s*(([`~]){3,})\s*(html|htm|shtml|xhtml|inc|tmpl|tpl)(\s+.*)?$</string>
+					<string>(^|\G)\s*([`~]{3,})\s*(html|htm|shtml|xhtml|inc|tmpl|tpl)\n</string>


Keep the (\s+.*)?$ bit. We allow arbitrary text on the rest of the line after the language identifier to support passing other attributes (like line number specifiers) that some Markdown engines support.
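The effect of dropping that trailing group can be checked with a small Python sketch. This is an approximation, assuming Python's `re` behaves closely enough to the grammar's regex engine for these two patterns: `(^|\G)` is written as `(^)` because `re` has no `\G`, and the attribute text `{.numberLines}` is a made-up example, not something taken from the PR.

```python
import re

LANGS = r"(html|htm|shtml|xhtml|inc|tmpl|tpl)"

# Both patterns come from the diff above, with (^|\G) reduced to (^)
# so that the capture group numbers stay the same.
ORIGINAL_BEGIN = re.compile(r"(^)\s*(([`~]){3,})\s*" + LANGS + r"(\s+.*)?$")
PROPOSED_BEGIN = re.compile(r"(^)\s*([`~]{3,})\s*" + LANGS + r"\n")


def opens_fence(pattern, line):
    """Return True if `line` would start a fenced HTML block under `pattern`."""
    return pattern.match(line) is not None


plain = "```html\n"
with_attrs = "```html {.numberLines}\n"  # hypothetical attribute text

for name, pattern in (("original", ORIGINAL_BEGIN), ("proposed", PROPOSED_BEGIN)):
    print(name, opens_fence(pattern, plain), opens_fence(pattern, with_attrs))

# In the original pattern, group 5 holds everything after the language id.
print(repr(ORIGINAL_BEGIN.match(with_attrs).group(5)))
```

Under these assumptions the proposed pattern only recognizes a fence whose language identifier is followed directly by the newline, so a line carrying extra attributes would no longer open a block.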

mjbvz · 2016-12-20T23:32:17Z

extensions/markdown/syntaxes/markdown.tmLanguage

-					<key>while</key>
-					<string>(^|\G)(?!\s*\2\3*\s*$)</string>
+					<key>end</key>
+					<string>(^|\G)\s*([`~]{3,})\n</string>


This should also be reverted to how it was before so that we consume any number of spaces between the closing fence and the end of the line.

Also, the closing fence should match the fence type originally used. That's why we used the back references instead of [`~].
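A rough Python sketch of what the back references buy. In the grammar, `\2` and `\3` in the `while` pattern refer to captures from `begin`; Python's `re` cannot share groups across patterns, so this sketch builds the closing pattern from the opening match instead. The helper name `closing_pattern` is hypothetical, `(^|\G)` is again reduced to `(^)`, and the grammar's negative lookahead is flipped into a positive "is this a closer?" check for readability.

```python
import re

BEGIN = re.compile(
    r"(^)\s*(([`~]){3,})\s*(html|htm|shtml|xhtml|inc|tmpl|tpl)(\s+.*)?$"
)
# The end pattern proposed in the diff above: any fence character, then a newline.
PROPOSED_END = re.compile(r"^\s*([`~]{3,})\n")


def closing_pattern(opening_line):
    """Build the equivalent of \\s*\\2\\3*\\s*$ from the opening fence."""
    m = BEGIN.match(opening_line)
    if m is None:
        raise ValueError("not an opening fence")
    fence, fence_char = m.group(2), m.group(3)
    return re.compile(
        r"^\s*" + re.escape(fence) + "(?:" + re.escape(fence_char) + r")*\s*$"
    )


closer = closing_pattern("```html")
for candidate in ("```", "````", "```   ", "``", "~~~"):
    print(repr(candidate), closer.match(candidate) is not None)

# The proposed pattern accepts a tilde fence as the closer of a backtick
# block and rejects a closing fence followed by trailing spaces.
print(PROPOSED_END.match("~~~\n") is not None)
print(PROPOSED_END.match("```   \n") is not None)
```

With the back references, a closer has to repeat the opening fence (optionally longer, optionally followed by whitespace); with the bare `[`~]{3,}` class, any fence-like line ends the block.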

joshpeng · 2016-12-21T11:15:49Z

Thank you @mjbvz for shedding light on many scenarios I was unaware of. The latest commit addresses the issues from the previous code review as follows:

- Originally, using while instead of end prevented broken language grammars from leaking, but it also prevented capturing the closing fence punctuation. Now I am nesting the while-captures inside a primary pattern to capture the closing fence while still preventing grammar leaks.
- (\s+.*)?$ at the end of begin-captures has been added back and is tokenized as fenced_code.block.language.attributes. This covers any arbitrary text after the language identifier.
- End-captures have been reverted to once again consume all whitespace after the closing fence and to match the opening fence's punctuation.
- End-captures' handling of whitespace before the closing fence has been improved to cover some scenarios where the line should be detected as a raw block starter instead of a fenced block closer.
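The nesting idea in the first point can be sketched as a toy tokenizer in Python. This is only a model of the approach described in the comment, not the grammar that was merged: the token labels, the generic `(\w+)?` language group, and the function name are assumptions, and `(^|\G)` is reduced to `(^)`.

```python
import re

# Generic opening fence: fence, optional language, optional attributes.
BEGIN = re.compile(r"(^)\s*(([`~]){3,})\s*(\w+)?(\s+.*)?$")


def tokenize_block(lines):
    """Return (label, text) tokens for one fenced block starting at lines[0]."""
    m = BEGIN.match(lines[0])
    if m is None:
        return []
    fence, fence_char = m.group(2), m.group(3)
    closer = re.compile(
        r"^\s*(" + re.escape(fence) + "(?:" + re.escape(fence_char) + r")*)\s*$"
    )

    tokens = [("punctuation", fence)]
    if m.group(4):
        tokens.append(("language", m.group(4)))
    if m.group(5):
        tokens.append(("attributes", m.group(5).strip()))

    for line in lines[1:]:
        # Inner rule: the `while` condition holds as long as the line is
        # not a closing fence, so embedded content cannot leak past it.
        if closer.match(line) is None:
            tokens.append(("content", line))
            continue
        # The inner rule is popped; the outer rule's end pattern now sees
        # the same line and captures the closing punctuation.
        tokens.append(("punctuation", closer.match(line).group(1)))
        break
    return tokens


for token in tokenize_block(["```python linenos", "print('hi')", "```", "after"]):
    print(token)
```

The point of the two stages is that the inner check decides where the block stops, while the outer pattern is what turns the closing fence into its own token.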

joshpeng · 2016-12-27T16:27:52Z

@mjbvz Hope you had a great Christmas. Was wondering if you had a chance to see these changes? Thanks.

mjbvz · 2016-12-31T00:43:47Z

The change looks good.

Before we merge this in, can you please take a look at the failing tests in Travis? You will likely have to run the colorization tests again locally and check in the updated Markdown test file. See the "VS Code Tokenizer Tests" in the launch.json.

joshpeng · 2016-12-31T02:41:16Z

@mjbvz I've uploaded the updated tokenizer Markdown test. The test passes locally, but Travis is still failing. Is that expected?

mjbvz · 2017-01-03T18:05:18Z

@joshpeng Thank you for this change. I've gone ahead and merged it in. It should be available in the next Insiders build.

joshpeng · 2017-02-04T19:12:26Z

@mjbvz How do I get mentioned for my contributions in the release notes of 1.9.0? ;(

mjbvz · 2017-02-08T02:07:20Z

Sorry about that @joshpeng. Let me see if I can fix things.

@gregvanl I thought the contributor list for the release notes was automatically generated. Did the PR have to be marked with "January 2017" for that to happen? What's the best way to update the list post-release?

joshpeng · 2017-02-09T16:59:14Z

@mjbvz Didn't make it into 1.9.1 notes either. oh well :[

mjbvz · 2017-02-09T21:58:43Z

I've added you to the 1.9 release notes: microsoft/vscode-docs@d0e9826

Sorry for the omission when this was first published, and thanks again for the PR.

msftclas added the cla-required label Dec 20, 2016

bpasero assigned mjbvz Dec 20, 2016

joshpeng added 2 commits December 20, 2016 02:15

Variable whitespace for MD code block ``` token

4ceebae

Allow for variable amount of whitespacing before ``` code blocks

Reorder raw blocks

344c3c8

Raw blocks were preventing tokenizing as languaged blocks. Putting them on bottom resolves this.

msftgits closed this Dec 20, 2016

msftgits reopened this Dec 20, 2016

msftclas added the cla-signed label Dec 20, 2016

msftgits removed the cla-required label Dec 20, 2016

Fix MD block detection when following paragraph

2043f9d

Used to require a new line inbetween ``` code blocks and preceding paragraph text.

mjbvz requested changes Dec 20, 2016

Prevent broken language grammar leaks in MD fences

5bab6c8

Prevents leaks in MD code fences while also capturing the closing fence punctuations.

Update Markdown tokenizer test file

431f543

mjbvz merged commit b9a362a into microsoft:master Jan 3, 2017

mjbvz pushed a commit that referenced this pull request Jan 20, 2017

Markdown fixes (#18704)

22cc4a1

* Fix typos
* Add Go, Rust and Scala
* Adjust Go, Rust and Scala's logic as per #17591

mjbvz added this to the January 2017 milestone Feb 8, 2017

github-actions bot locked and limited conversation to collaborators Mar 27, 2020
