Italics parsing broken when underscore is followed by some characters #96
It looks like this is not specific to smart quotes; it also happens with other characters, like a-z. If the text between underscores should always be parsed as emphasis, then this is really a bug.
I've just found a workaround until this gets fixed: use asterisks instead of underscores.
Looking at the spec, https://spec.commonmark.org/0.30/#emphasis-and-strong-emphasis, it seems the reason you get italics with an apostrophe is that the apostrophe counts as a Unicode punctuation character. The smart quote doesn't, so the italics shouldn't kick in. This probably means that the asterisk workaround shouldn't work either 🤷
@anderskaplan, thanks for taking the time to analyze this. Can you please elaborate on your thoughts a little? Maybe I have overlooked something, but it seems to me that Markdown like
Also, when I try out the example here, GitHub renders italics too. So does their parser follow the spec, or not? :)
Hmm, you're right. I looked further into the spec, and the smart quote does indeed count as a Unicode punctuation character: it's in the Final Punctuation (Pf) category. So it should work just like the straight quote. As for your comment above, @pbodnar, about this not being specific to smart quotes but also affecting other characters like a-z: I just want to point out that the spec handles regular letters and punctuation differently, and mistletoe is probably doing the right thing for the regular letters. In this example, there should only be emphasis if the closing underscore is followed by punctuation (not a regular letter), because an underscore sitting between two letters is part of a left-flanking delimiter run. The rules are slightly different for underscore-delimited and asterisk-delimited emphasis, which explains why the workaround can work. 😃 So in conclusion, I think the problem is that smart quotes (and all other Unicode punctuation characters) should be handled exactly like straight quotes, but they aren't.
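To make the flanking rules concrete, here is a minimal Python sketch of the CommonMark 0.30 checks for a single delimiter character, given the characters immediately before and after it. It is not mistletoe's code and not a full emphasis parser (there is no delimiter stack); the function names are hypothetical, and the empty string stands in for the start or end of a line, which the spec treats as whitespace.

```python
import string
import unicodedata


def is_whitespace(ch: str) -> bool:
    # Spec: Zs category, tab, LF, FF, CR; "" marks start/end of line.
    return ch == "" or ch in "\t\n\f\r" or unicodedata.category(ch) == "Zs"


def is_punctuation(ch: str) -> bool:
    # Spec 0.30: ASCII punctuation, or general category Pc/Pd/Pe/Pf/Pi/Po/Ps.
    if ch == "":
        return False
    return ch in string.punctuation or unicodedata.category(ch).startswith("P")


def left_flanking(before: str, after: str) -> bool:
    # Not followed by whitespace; if followed by punctuation, it must be
    # preceded by whitespace or punctuation.
    return not is_whitespace(after) and (
        not is_punctuation(after) or is_whitespace(before) or is_punctuation(before)
    )


def right_flanking(before: str, after: str) -> bool:
    # Mirror image of left_flanking.
    return not is_whitespace(before) and (
        not is_punctuation(before) or is_whitespace(after) or is_punctuation(after)
    )


def can_open(delim: str, before: str, after: str) -> bool:
    if delim == "*":
        return left_flanking(before, after)
    # "_" may not open intraword unless preceded by punctuation.
    return left_flanking(before, after) and (
        not right_flanking(before, after) or is_punctuation(before)
    )


def can_close(delim: str, before: str, after: str) -> bool:
    if delim == "*":
        return right_flanking(before, after)
    # "_" may not close intraword unless followed by punctuation.
    return right_flanking(before, after) and (
        not left_flanking(before, after) or is_punctuation(after)
    )


# Closing delimiter after "Book", followed by "s", a straight quote, or U+2019.
for after in ["s", "'", "\u2019"]:
    print(repr(after), can_close("_", "k", after), can_close("*", "k", after))
```

With these definitions the second underscore in `_Book_s` cannot close while the second asterisk in `*Book*s` can, and `_Book_'s` and `_Book_’s` both close, because U+2019 is treated exactly like the straight quote.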
@anderskaplan, it seems your conclusions are right as well, thank you. :) So now we need to find the right place in the code and probably widen the set of "punctuation characters" there, if I understood correctly...
* Fix for #96, Italics parsing broken when underscore is followed by some characters. The issue was that smart quotes, as well as any other non-ASCII punctuation characters, were not handled like ASCII punctuation in the parsing of emphasis/strong tokens. Solved by including all Unicode punctuation in the set of punctuation characters.
* Added test cases for emphasis without punctuation, expecting different behavior for underscore and asterisk delimiters.
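The idea behind the fix, widening the punctuation set from ASCII-only to all Unicode punctuation, can be sketched in Python as follows. This is an illustration under assumptions, not the actual patch: both set names are hypothetical, and the union with `string.punctuation` keeps ASCII characters such as `$` and `+`, which CommonMark counts as punctuation even though their Unicode categories are symbols (Sc, Sm).

```python
import string
import sys
import unicodedata

# Hypothetical "before" set: only the 32 ASCII punctuation characters.
ASCII_PUNCTUATION = frozenset(string.punctuation)

# Hypothetical "after" set: ASCII punctuation plus every code point whose
# general category is Pc, Pd, Pe, Pf, Pi, Po or Ps (all start with "P").
UNICODE_PUNCTUATION = ASCII_PUNCTUATION | frozenset(
    chr(cp)
    for cp in range(sys.maxunicode + 1)
    if unicodedata.category(chr(cp)).startswith("P")
)

# Compare membership for a straight quote, two smart quotes, and a letter.
for ch in ["'", "\u2019", "\u201c", "a"]:
    print(
        f"U+{ord(ch):04X} {unicodedata.name(ch)} ({unicodedata.category(ch)}):",
        "ascii" if ch in ASCII_PUNCTUATION else "-",
        "unicode" if ch in UNICODE_PUNCTUATION else "-",
    )
```

A parser could equally call `unicodedata.category` on each candidate character instead of precomputing the set; the precomputed set trades a one-time scan of every code point for constant-time lookups.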
Closing: resolved by the PR.
Suppose you have a possessive on an italicized word, like a book title.
It works as expected, with the book title wrapped in <em>. But if the apostrophe is a smart quote, it doesn't apply the same treatment.
I'd be happy to help fix this, but I can't find where the handling of the first rule comes from.
Thanks!