Improve Markdown code block tokens #17591
Code blocks without a language weren't tokenized, and the closing ``` punctuation of code blocks wasn't tokenized either. Code blocks also used to produce only a single token. Now each block has the following tokens available for syntax highlighters:
- Opening and closing ``` punctuation
- The code block's language identifier
- The code snippet itself
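To make the token breakdown concrete, here is a rough Python sketch (not the grammar itself) that splits a fenced block into opening punctuation, an optional language token, the code, and closing punctuation. The function name and the token labels are hypothetical, and the regexes are simplified stand-ins for the grammar's patterns (Python's re has no \G, so only ^ is used).

```python
import re

# Simplified opening fence: group 1 is the fence, group 2 the optional
# language identifier. Python's re has no \G, so only ^ is used here.
OPEN_FENCE = re.compile(r"^\s*([`~]{3,})\s*(\w*)")


def tokenize_fenced_block(lines):
    """Split a fenced code block into (kind, text) tokens.

    'punctuation', 'language' and 'code' are labels made up for this
    sketch; they are not the grammar's real TextMate scope names.
    """
    opening = OPEN_FENCE.match(lines[0])
    if opening is None:
        raise ValueError("first line is not an opening fence")
    fence, language = opening.group(1), opening.group(2)
    tokens = [("punctuation", fence)]
    if language:  # blocks without a language still get the other tokens
        tokens.append(("language", language))

    # Closing fence: same character, at least as long, nothing else on the line.
    closing = re.compile(
        r"^\s*" + re.escape(fence[0]) + "{" + str(len(fence)) + r",}\s*$"
    )
    code_lines = []
    for line in lines[1:]:
        if closing.match(line):
            tokens.append(("code", "\n".join(code_lines)))
            tokens.append(("punctuation", line.strip()))
            return tokens
        code_lines.append(line)
    # Unterminated block: everything after the opening fence is code.
    tokens.append(("code", "\n".join(code_lines)))
    return tokens


print(tokenize_fenced_block(["```js", "let x = 1;", "```"]))
# [('punctuation', '```'), ('language', 'js'), ('code', 'let x = 1;'), ('punctuation', '```')]
print(tokenize_fenced_block(["```", "plain text", "```"]))
# [('punctuation', '```'), ('code', 'plain text'), ('punctuation', '```')]
```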
Hi @joshpeng, I'm your friendly neighborhood Microsoft Pull Request Bot (You can call me MSBOT). Thanks for your contribution! TTYL, MSBOT;
Allow a variable amount of whitespace before ``` code blocks
Raw blocks were preventing code blocks from being tokenized as language-specific blocks. Moving the raw block patterns to the bottom of the pattern list resolves this.
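A small Python sketch of why the order matters, assuming two simplified, hypothetically named rules (language_block and raw_block) and the usual TextMate behavior that, among patterns matching at the same position, the one listed first wins:

```python
import re

# Two hypothetical rules standing in for the grammar's fenced-code patterns.
# Python's re has no \G, so (^|\G) is reduced to ^.
LANGUAGE_BLOCK = ("language_block", re.compile(r"^\s*([`~]{3,})\s*(js|javascript)(\s+.*)?$"))
RAW_BLOCK = ("raw_block", re.compile(r"^\s*([`~]{3,}).*$"))


def first_match(patterns, line):
    """Return the name of the rule that would win for this line.

    A TextMate tokenizer prefers the match that starts earliest; among
    matches starting at the same position, the rule listed first wins.
    Both rules here are anchored at the start of the line, so list order
    alone decides.
    """
    for name, regex in patterns:
        if regex.match(line):
            return name
    return None


# Raw rule listed first: it swallows the languaged block.
print(first_match([RAW_BLOCK, LANGUAGE_BLOCK], "```js"))  # raw_block
# Raw rule moved to the bottom: the languaged block is recognized.
print(first_match([LANGUAGE_BLOCK, RAW_BLOCK], "```js"))  # language_block
# Blocks without a language still fall through to the raw rule.
print(first_match([LANGUAGE_BLOCK, RAW_BLOCK], "```"))    # raw_block
```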
Hi, I am closing and re-opening this PR to bump the CLA bot. Sorry for the inconvenience!
Previously, an empty line was required between a ``` code block and the preceding paragraph text.
I like the overall idea, but this introduces a number of important regressions that need to be fixed before we can merge this in. Please take a look at the comments and let me know if you have any questions.
<key>while</key>
<string>(^|\G)(?!\s*\2\3*\s*$)</string>
<key>end</key>
<string>(^|\G)\s*([`~]{3,})\n</string>
We have to use a while clause here. This prevents broken language grammars from leaking outside of the fenced block. Switching to while from end fixed a large number of syntax highlighting issues.
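Roughly why while helps, as a Python simulation rather than real TextMate machinery: the embedded grammar is a made-up toy whose only rule is an unterminated /* comment, and label_lines is a hypothetical helper. With end, the open inner rule keeps the parent's end pattern from being tried; with while, the condition is re-checked at the start of every line before the embedded rules run.

```python
import re

# While condition from the grammar, reduced to ^ (Python's re has no \G) and
# specialized to a ``` fence: keep going unless the line is a closing fence.
WHILE = re.compile(r"^(?!\s*```" + r"`*\s*$)")


def label_lines(body_lines, use_while):
    """Label the lines after an opening ``` fence.

    The embedded "language" is a toy with one multi-line rule: '/*' opens a
    comment that only '*/' closes. While that inner rule is open, a
    begin/end parent never gets to try its end pattern, so an unclosed '/*'
    leaks past the closing fence. A begin/while parent checks its while
    condition at the start of every line first.
    """
    labels = []
    in_block = True
    in_comment = False  # state of the toy embedded grammar
    for line in body_lines:
        if not in_block:
            labels.append("markdown")
            continue
        if use_while:
            fence_closes = WHILE.match(line) is None
        else:
            fence_closes = not in_comment and line.strip() == "```"
        if fence_closes:
            in_block = False
            labels.append("fence")
            continue
        if "/*" in line:
            in_comment = True
        if "*/" in line:
            in_comment = False
        labels.append("code")
    return labels


body = ["/* unterminated comment", "```", "# Heading"]
print(label_lines(body, use_while=False))  # ['code', 'code', 'code']
print(label_lines(body, use_while=True))   # ['code', 'fence', 'markdown']
```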
Hmm. That is why the closing ``` wasn't captured though. I'll continue thinking about how to achieve both our goals.
Edit: I think I might have a solution by nesting patterns. Testing on my end.
@@ -566,11 +582,32 @@
<key>fenced_code_block_basic</key>
<dict>
<key>begin</key>
<string>(^|\G)\s*(([`~]){3,})\s*(html|htm|shtml|xhtml|inc|tmpl|tpl)(\s+.*)?$</string>
<string>(^|\G)\s*([`~]{3,})\s*(html|htm|shtml|xhtml|inc|tmpl|tpl)\n</string>
Keep the (\s+.*)?$ bit. We allow arbitrary text on the rest of the line after the language identifier to support passing other attributes (such as line-number specifiers) that some markdown engines support.
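A quick Python check of the difference, using the two begin patterns from this diff with (^|\G) reduced to ^ since Python's re has no \G. The {linenos=true} attribute is only an illustrative example of trailing text; exact attribute syntax varies by engine. The sample lines end in \n because the PR's version requires one.

```python
import re

# KEEP: the original begin pattern with the optional trailing text.
# PROPOSED: the PR's version, which requires a newline right after the id.
KEEP = re.compile(r"^\s*(([`~]){3,})\s*(html|htm|shtml|xhtml|inc|tmpl|tpl)(\s+.*)?$")
PROPOSED = re.compile(r"^\s*([`~]{3,})\s*(html|htm|shtml|xhtml|inc|tmpl|tpl)\n")

for line in ["```html\n", "```html {linenos=true}\n", "```htmlx\n"]:
    print(repr(line), KEEP.match(line) is not None, PROPOSED.match(line) is not None)
# '```html\n' True True
# '```html {linenos=true}\n' True False
# '```htmlx\n' False False
```

The last case shows that (\s+.*)? still requires whitespace after the identifier, so it does not turn "htmlx" into an html block.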
<key>while</key>
<string>(^|\G)(?!\s*\2\3*\s*$)</string>
<key>end</key>
<string>(^|\G)\s*([`~]{3,})\n</string>
This should also be reverted to how it was before, so that we consume any number of spaces between the end of the fence and the end of the line.
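To illustrate, a Python sketch comparing the original while condition with the end pattern proposed in the PR on a closing fence that has trailing spaces. It assumes a ``` fence (so groups 2 and 3 capture ``` and `) and drops \G, which Python's re does not support; the helper names are made up for the example.

```python
import re

fence, fence_char = "```", "`"  # what groups 2 and 3 capture for a ``` fence

# Original while condition (\G dropped). The trailing \s*$ tolerates
# spaces after the closing fence.
WHILE = re.compile(r"^(?!\s*" + re.escape(fence) + re.escape(fence_char) + r"*\s*$)")
# End pattern proposed in the PR: the newline must follow the fence directly.
PROPOSED_END = re.compile(r"^\s*([`~]{3,})\n")


def closes_with_while(line):
    # The block ends when the while condition stops matching.
    return WHILE.match(line) is None


def closes_with_proposed_end(line):
    return PROPOSED_END.match(line) is not None


for line in ["```\n", "```   \n"]:
    print(repr(line), closes_with_while(line), closes_with_proposed_end(line))
# '```\n' True True
# '```   \n' True False
```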
Also, the closing fence of a code block should match the fence type originally used. That's why we used back references instead of [`~].
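A Python sketch of the difference, assuming the begin pattern's (([`~]){3,}) capture. TextMate substitutes begin captures such as \2 and \3 into the while/end pattern; since Python's re cannot refer to a capture from a different match, the sketch builds that substituted pattern by hand. The function names are hypothetical.

```python
import re

# Begin-pattern capture: group 1 is the whole opening fence, group 2 its
# last character (the grammar's \2 and \3 are shifted by its leading group).
OPEN = re.compile(r"^\s*(([`~]){3,})")


def closes_by_class(line):
    """Character class [`~]: any fence-only line closes the block."""
    return re.match(r"^\s*[`~]{3,}\s*$", line) is not None


def closes_by_backref(opening_line, line):
    """Back reference: the closing fence must repeat the opening fence."""
    m = OPEN.match(opening_line)
    closing = r"^\s*" + re.escape(m.group(1)) + re.escape(m.group(2)) + r"*\s*$"
    return re.match(closing, line) is not None


# A ~~~ line inside a ``` block is content, not the end of the block.
print(closes_by_class("~~~"))             # True
print(closes_by_backref("```", "~~~"))    # False
print(closes_by_backref("```", "````"))   # True
print(closes_by_backref("~~~~", "~~~"))   # False
```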
Prevents leaks in Markdown code fences while also capturing the closing fence punctuation.
Thank you @mjbvz for shedding light on many scenarios I was unaware of. The latest commit addresses the issues from the previous code review in the following manner:
@mjbvz Hope you had a great Christmas. I was wondering if you had a chance to look at these changes? Thanks.
The change looks good. Before we merge this in, can you please take a look at the failing tests in Travis? You likely have to run the colorization tests again locally and check in the updated markdown test file. See the
@mjbvz I've uploaded the updated tokenizer Markdown test. The test passes locally, but Travis is still failing. Is that expected?
@joshpeng Thank you for this change. I've gone ahead and merged it in. It should be available in the next Insiders build.
* Fix typos
* Add Go, Rust and Scala
* Adjust Go, Rust and Scala's logic as per #17591
@mjbvz How do I get mentioned for my contributions in the release notes of 1.9.0? ;(
@mjbvz Didn't make it into the 1.9.1 notes either. Oh well :[
I've added you to the 1.9 release notes (microsoft/vscode-docs@d0e9826). Sorry for the omission when this was first published, and thanks again for the PR!