Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Italics parsing broken when underscore is followed by some characters #96

Closed
whatisjasongoldstein opened this issue Mar 29, 2020 · 7 comments
Labels
bug has-workaround A bug that has a workaround.

Comments

@whatisjasongoldstein
Copy link

Suppose you have a possessive on an italicized word, like a book.

text = text="_Book Title_'s author"
mistletoe.markdown(text)
>> "<p><em>Book Title</em>'s author</p>\n"

It works as expected, with the book title wrapped in <em>.

But if the apostrophe is a smart quote, it doesn't apply the same treatment.

text="_Book Title_’s author"
mistletoe.markdown(text)
>> '<p>_Book Title_’s author</p>\n'

I'd be happy to help fix this but can't find where the handling of the first rule is coming from.

Thanks!

@pbodnar pbodnar added the bug label Sep 17, 2021
@pbodnar
Copy link
Collaborator

pbodnar commented Sep 17, 2021

It looks like this is not specific just to smart quotes, but generally also to other characters, like a-z. If the text between underscores should be parsed as an emphasis anytime, then this is a really a bug.

@pbodnar
Copy link
Collaborator

pbodnar commented Sep 18, 2021

I've just found a workaround until this gets fixed: use * instead of _, i. e. *Book Title*’s author.

@pbodnar pbodnar added the has-workaround A bug that has a workaround. label Sep 18, 2021
@pbodnar pbodnar changed the title Smart apostrophes after italics are treated incorrectly Italics parsing broken when underscore is followed by some characters Sep 18, 2021
@anderskaplan
Copy link
Contributor

Looking at the spec, https://spec.commonmark.org/0.30/#emphasis-and-strong-emphasis:

It seems the reason you get the italics with an apostrophe is because the apostrophe counts as a Unicode punctuation character. The smart quote doesn't, so the italics shouldn't kick in.

This probably means that the asterisk workaround shouldn't work either 🤷

@pbodnar
Copy link
Collaborator

pbodnar commented Aug 9, 2022

@anderskaplan, thanks for taking time for an analysis - can you please elaborate on your thoughts a little bit?

Because, maybe I have overlooked something, but it seems to me, a markdown like _Book Title_’s author should result in italics for the 1st part, because according to the spec, what follows after the 2nd _ shouldn't make a big difference, to quote what I think are the relevant parts:

  1. https://spec.commonmark.org/0.30/#right-flanking-delimiter-run:

A right-flanking delimiter run is a delimiter run that is (1) not preceded by Unicode whitespace, and either (2a) not preceded by a Unicode punctuation character, or (2b) preceded by a Unicode punctuation character and followed by Unicode whitespace or a Unicode punctuation character.

  1. https://spec.commonmark.org/0.30/#can-close-emphasis:
  1. A single * character can close emphasis iff it is part of a right-flanking delimiter run.

  2. A single _ character can close emphasis iff it is part of a right-flanking delimiter run and either (a) not part of a left-flanking delimiter run or (b) part of a left-flanking delimiter run followed by a Unicode punctuation character.

Also, when trying out the example here, GitHub also renders italics. So does their parser follow the spec, or not? :)

@anderskaplan
Copy link
Contributor

Hmm, you're right. I looked further into the specs, and the smart quote does indeed count as a unicode punctuation character, it's in the Final Punctuation (Pf) category. So it should work just like the straight quote.

As for your comment above, @pbodnar, about this not being specific to just smart quotes but also to other characters like a-z. I just want to point out that regular letters and punctuation should be handled differently according to the spec, and that mistletoe is probably doing the right thing for the regular letters. In this example, there should only be emphasis if the closing underscore is followed by punctuation (not regular letters), as it is part of a left-flanking delimiter run.

The rules are slightly different for underscore-delimited emphasis and asterisk-delimited emphasis, so that explains why the workaround can work. 😃

So in conclusion, I think the problem is that smart quotes (and all other Unicode punctuation characters) should be handled exactly like straight quotes, but they aren't.

@pbodnar
Copy link
Collaborator

pbodnar commented Aug 13, 2022

@anderskaplan, it seems you're right with your conclusions as well, thank you. :)

So it looks like now we need to find the right place in the code and probably widen the set of "punctuation characters" there if I got it right...

anderskaplan added a commit to anderskaplan/mistletoe that referenced this issue Aug 20, 2022
…wed by some characters.

The issue was that smart quotes, as well as any other non-ascii punctuation characters, were not handled like ascii punctuation in the parsing of emphasis/strong tokens.
Solved by including all unicode punctuation in the set of punctuation characters.
pbodnar pushed a commit that referenced this issue Aug 21, 2022
* Fix for #96, Italics parsing broken when underscore is followed by some characters.

The issue was that smart quotes, as well as any other non-ascii punctuation characters, were not handled like ascii punctuation in the parsing of emphasis/strong tokens.
Solved by including all unicode punctuation in the set of punctuation characters.

* Added test cases for emphasis without punctuation. Expecting different behavior for underscore and asterisk delimiters.
@pbodnar
Copy link
Collaborator

pbodnar commented Jan 10, 2023

close: resolved by the PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug has-workaround A bug that has a workaround.
Projects
None yet
Development

No branches or pull requests

3 participants