Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Markdown code block tokens #17591

Merged
merged 6 commits into from Jan 3, 2017
Merged

Improve Markdown code block tokens #17591

merged 6 commits into from Jan 3, 2017

Conversation

joshpeng
Copy link
Contributor

Code blocks without a language weren't tokenized. Code blocks didn't have their ending ``` punctuation tokenized. Both fixed.

Code blocks used to only have one token. Now each block has the following tokens available for syntax highlighters:

  • Starting and ending ``` punctuations
  • Code block's language setting
  • Code snippet

Code blocks without a language weren't tokenized.
Code blocks didn't have their ending ``` punctuation tokenized.
Code blocks used to only have one token. Now each block has the following tokens available for syntax highlighters:
- Starting and ending ``` punctuations
- Code block's language setting
- Code snippet
@msftclas
Copy link

Hi @joshpeng, I'm your friendly neighborhood Microsoft Pull Request Bot (You can call me MSBOT). Thanks for your contribution!

In order for us to evaluate and accept your PR, we ask that you sign a contribution license agreement. It's all electronic and will take just minutes. I promise there's no faxing. https://cla.microsoft.com.

TTYL, MSBOT;

Allow for variable amount of whitespacing before ``` code blocks
Raw blocks were preventing tokenizing as languaged blocks. Putting them on bottom resolves this.
@msftgits
Copy link

Hi, I am closing and re-opening this PR to bump the CLA bot. Sorry for the inconvenience!

@msftgits msftgits closed this Dec 20, 2016
@msftgits msftgits reopened this Dec 20, 2016
@msftclas
Copy link

Hi @joshpeng, I'm your friendly neighborhood Microsoft Pull Request Bot (You can call me MSBOT). Thanks for your contribution!
You've already signed the contribution license agreement. Thanks!
We will now validate the agreement and then real humans will evaluate your PR.

TTYL, MSBOT;

Used to require a new line inbetween ``` code blocks and preceding paragraph text.
Copy link
Contributor

@mjbvz mjbvz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the overall idea, but this introduces a number of important regressions that need to be fixed before we can merge this in. Please take a look at the comments and let me know if you have any questions.

<key>while</key>
<string>(^|\G)(?!\s*\2\3*\s*$)</string>
<key>end</key>
<string>(^|\G)\s*([`~]{3,})\n</string>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have to use a while clause here. This prevents broken language grammars from leaking outside of the fenced block. Switching to while from end fixed a large number of syntax highlighting issues.

Copy link
Contributor Author

@joshpeng joshpeng Dec 21, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. That is why the closing ``` wasn't captured though. I'll continue thinking about how to achieve both our goals.

Edit: I think I might have a solution by nesting patterns. Testing on my end.

@@ -566,11 +582,32 @@
<key>fenced_code_block_basic</key>
<dict>
<key>begin</key>
<string>(^|\G)\s*(([`~]){3,})\s*(html|htm|shtml|xhtml|inc|tmpl|tpl)(\s+.*)?$</string>
<string>(^|\G)\s*([`~]{3,})\s*(html|htm|shtml|xhtml|inc|tmpl|tpl)\n</string>
Copy link
Contributor

@mjbvz mjbvz Dec 20, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep the (\s+.*)?$bit. We allow arbitrary text on the rest of the line after the language identifier to support passing other attributes (like line numbers specifiers) that some markdown engines support.

<key>while</key>
<string>(^|\G)(?!\s*\2\3*\s*$)</string>
<key>end</key>
<string>(^|\G)\s*([`~]{3,})\n</string>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should also reverted to how it was before so that we consume any number of spaces after the fence end and the end of line.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the closing fenced code block should match the fence type originally used. That's why we used the back references instead of [`~]

Prevents leaks in MD code fences while also capturing the closing fence punctuations.
@joshpeng
Copy link
Contributor Author

joshpeng commented Dec 21, 2016

Thank you @mjbvz for shedding light on many scenarios I was unaware of. Latest commit addresses previous code review's issues in the following manner:

  1. Originally, using while instead of end prevented broken language grammar from leaking, but it also prevented capturing the closing fence punctuation. Now I am nesting the while-captures inside a primary pattern to achieve capturing of the closing fence while still preventing grammar leaks.

  2. (\s+.*)?$ at the end of begin-captures added back and tokenized as fenced_code.block.language.attributes. This is for any arbitrary text after the language identifier

  3. End-captures reverted to once again consume all whitespace after the closing fence as well as matching the opening fence's punctuation.

  4. End-captures closing fence's prefix whitespace handling improved to cover some scenarios where it should be detected as a raw block starter instead of fenced block closer.

@joshpeng
Copy link
Contributor Author

@mjbvz Hope you had a great Christmas. Was wondering if you had a chance to see these changes? Thanks.

@mjbvz
Copy link
Contributor

mjbvz commented Dec 31, 2016

The change looks good.

Before we merge this in, can you please take a look at the failing tests in travis. You likely have to run the colorization tests again locally and check in the updated markdown test file. See the "VS Code Tokenizer Tests" in the launch.json

@joshpeng
Copy link
Contributor Author

joshpeng commented Dec 31, 2016

@mjbvz I've uploaded the updated tokenizer Markdown test. The test passes locally, but Travis is still failing. Is that expected?

@mjbvz mjbvz merged commit b9a362a into microsoft:master Jan 3, 2017
@mjbvz
Copy link
Contributor

mjbvz commented Jan 3, 2017

@joshpeng Thank you for this change. I've gone ahead and merged it in. It should be available in the next insiders build

mjbvz pushed a commit that referenced this pull request Jan 20, 2017
* Fix typos

* Add Go, Rust and Scala

* Adjust Go, Rust and Scala's logic as per #17591
@joshpeng
Copy link
Contributor Author

joshpeng commented Feb 4, 2017

@mjbvz How do I get mentioned for my contributions in the release notes of 1.9.0? ;(

@mjbvz mjbvz added this to the January 2017 milestone Feb 8, 2017
@mjbvz
Copy link
Contributor

mjbvz commented Feb 8, 2017

Sorry about that @joshpeng. Let me see if I can fix things

@gregvanl I thought the contributor lists for the release notes was automatically generated. Did the PR have to be marked with "January 2017" for that to happen? What's the best way to update the list post-release?

@joshpeng
Copy link
Contributor Author

joshpeng commented Feb 9, 2017

@mjbvz Didn't make it into 1.9.1 notes either. oh well :[

@mjbvz
Copy link
Contributor

mjbvz commented Feb 9, 2017

I've added you to the 1.9 release notes: microsoft/vscode-docs@d0e9826

Sorry for the omission when this was first published and thanks again for the PR

@github-actions github-actions bot locked and limited conversation to collaborators Mar 27, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants