Markdown lexer edge case with links and square brackets (rare) #1444
Comments
Agreed this should parse. It works fine in CommonMark.
The problem is with how the regex uses non-greedy matching in rules like this: rouge/lib/rouge/lexers/markdown.rb Line 105 in 65207be
The problem I can imagine is that a more complex set of rules could leave you in a state where it's difficult to know on an initial pass whether you're in a link or not.
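To make the non-greedy behaviour concrete, here is a minimal sketch outside of Rouge. `EDOT` is an assumed stand-in for the lexer's `edot` helper (an escaped character or any character that is not a backslash or newline), and `first_link_text` is a hypothetical helper name, so treat this as an illustration rather than the lexer's actual code.

```ruby
# Assumed stand-in for the lexer's `edot` helper: an escaped character,
# or any single character that is not a backslash or a newline.
EDOT = /\\.|[^\\\n]/

# Same shape as the rule quoted in this thread: opener, lazy text, closing bracket.
LINK_TEXT = /(!?\[)(#{EDOT}*?)(\])/

# Hypothetical helper: returns what the rule would treat as the link text.
def first_link_text(source)
  match = LINK_TEXT.match(source)
  match && match[2]
end

sample = "[see `[section]` in the docs](https://example.com)"

# The lazy quantifier stops at the first `]`, which belongs to the
# backticked TOML section name rather than to the link itself.
puts first_link_text(sample) # => see `[section
```

The match ends inside the backticks, so everything after `[section` is lexed as if the link text had already closed.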
@pyrmont Would it be crazy to add the opening paren for the link, as a way to identify the full link-text section?
- rule %r/(!?\[)(#{edot}*?)(\])/ do
+ rule %r/(!?\[)(#{edot}*?)(\]\()/ do
It would need to be either a
(Sidenote, I just noticed you and jneen are both in Japan, which is funny because I'm in Kagoshima!)
@pyrmont I struggle with lookahead logic and never got a good grasp of it, but I could easily add a rule %r/(!?\[)(#{edot}*?)(\]\(|\]\[)/ do
You can think of a lookahead as a special kind of 'group' (using
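As a rough illustration of the difference between a consuming group and a lookahead, the sketch below uses Ruby's `StringScanner` from the standard library. The two patterns are simplified for the example and are not the lexer's rules, and `rest_after` is a hypothetical helper name.

```ruby
require "strscan"

# Consuming version: the closing `](` is part of the match.
CONSUMING = /\[[^\n]*?\]\(/

# Lookahead version: `(?=...)` only checks that `](` follows;
# it matches zero characters, so the scanner does not move past it.
LOOKAHEAD = /\[[^\n]*?(?=\]\()/

# Hypothetical helper: scan once and report what is left to lex.
def rest_after(pattern, source)
  scanner = StringScanner.new(source)
  scanner.scan(pattern)
  scanner.rest
end

puts rest_after(CONSUMING, "[text](url)") # => url)
puts rest_after(LOOKAHEAD, "[text](url)") # => ](url)
```

With the lookahead, the `](` is still in the input afterwards, so a later rule can match and tokenize the delimiter itself.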
When hiding brackets inside links with backticks, rendering was breaking. This makes the lexer search for `](` or `][` to find the end delimiting the link text.
@pyrmont I read through some regex docs and I'm starting to grasp it, but with the tool I was using your example wasn't working, though a slight modification worked. Does this make sense?
rule %r/(!?\[)(#{edot}*?)(?=\]\(|\]\[)/ do
Once I got that working, I started to wonder if I should use lookbehind to skip the first bracket as well. Does this make sense?
rule %r/(!?(?<=\[))(#{edot}*?)(?=\]\(|\]\[)/ do
I got it working now. It turns out the online tool was just buggy; after a refresh it started working again. I'll add the lookahead to the PR.
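For reference, the lookahead version of the pattern can be tried outside of Rouge with a sketch like the following. `EDOT` is again an assumed stand-in for the lexer's `edot` helper, `link_text` is a hypothetical helper name, and the sample strings are made up for the example.

```ruby
# Assumed stand-in for the lexer's `edot` helper: an escaped character,
# or any single character that is not a backslash or a newline.
EDOT = /\\.|[^\\\n]/

# Lookahead variant from this thread: the link text ends only where
# `](` (inline link) or `][` (reference link) follows.
LINK_TEXT_LOOKAHEAD = /(!?\[)(#{EDOT}*?)(?=\]\(|\]\[)/

# Hypothetical helper: returns the captured link text, or nil if no match.
def link_text(source)
  match = LINK_TEXT_LOOKAHEAD.match(source)
  match && match[2]
end

puts link_text("[see `[section]` in the docs](https://example.com)")
# => see `[section]` in the docs

puts link_text("[`[[section]]`][toml-ref]")
# => `[[section]]`
```

The brackets inside the backticks no longer end the match, because a lone `]` is not followed by `(` or `[`. Link text that itself contains `](` or `][` would still end the match early, so this narrows the edge case rather than removing it.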
When hiding brackets inside links (with backticks), coloring was breaking. This makes the lexer search for the open bracket or paren that starts the URL or reference link.
Name of the lexer
Markdown
This might be too rare to bother putting effort into, but it's something I run into from time to time, as some of our engineers like linking to TOML section examples, and the section names are usually `[section]` or `[[section]]`.
I'm raising the issue because:
Code sample
Does not work with GitHub's coloring either:
Does not work with Rouge:
Works fine with VS Code:
Additional context
Yeah, it's a rare edge case. Now that I saw one fix earlier this week (that was fast, btw!), I'll see if I can spot any issues with the regex (though I'm still learning).