[markdown rendering issue] Italic text attached directly to the following word is not rendered #2040

Open
Canine89 opened this issue Aug 3, 2020 · 4 comments

@Canine89

Canine89 commented Aug 3, 2020

Hi iliakan.
I'm participating in the javascript.ko project.
By the way, I found a markdown rendering issue.
It may only occur in Korean documents.
See the screenshot below.

[screenshot]

In Korean, words sometimes have to be written together without a space (e.g. a particle attached directly to the preceding word).
I hope this helps fix the issue.

@iliakan
Member

iliakan commented Aug 3, 2020

Right now, italic requires spaces on both sides of *. So in expressions like 5*2 the star is not treated as a markup character.

I can tweak this rule, but how? We don't want the star * to be misinterpreted in other situations.

Also, is this a real problem? Could the Korean text be rephrased?

@Violet-Bora-Lee, what do you think?

@Violet-Bora-Lee
Member

Until now, I've been inserting a space to avoid this problem.
Inserting that space is actually a spelling error in Korean, though.

Is it easy to implement? If not, I could add a note about it to the README file.

@iliakan
Member

iliakan commented Aug 4, 2020

Hi,

The question is deeper than one might think.

Some time ago we rewrote our parser to base it on https://github.com/markdown-it/markdown-it, which implements the CommonMark specification.

The italic/bold thing is handled by that parser, according to the spec.

I decided to see what the CommonMark spec says about that case.

At https://commonmark.org/help/tutorial/02-emphasis.html I entered *마크다운(script)*렌더링 (copy-pasted arbitrary Korean characters, since I have no Korean on my keyboard).

It stayed as *마크다운(script)*렌더링; CommonMark didn't convert it.

So if we want it to be converted, we need to tell the people who maintain the CommonMark specification about it. Hopefully they will update the spec, the parser will follow, and everyone will be happy.

I suggest going to https://talk.commonmark.org/ and starting a topic there, such as "BUG in Korean", describing the issue. Maybe there's a way out.

P.S. Note that the problem occurs only when "(script)" is in the phrase. Maybe the ) has something to do with it.

@spencer246

This is not a bug, but an unfortunate (and arguably bad) design choice made by CommonMark. (See the spec.)

tldr: use <em>스크립트(script)</em>라고 or, if you really want to use Markdown syntax, *스크립트(script)*&#8203;라고.


To parse nested emphasis such as *emphasized **(strong)** text* efficiently (i.e., without having to look ahead for matching delimiters), CommonMark classifies each run of * or ** as a potential opening and/or closing delimiter using heuristics that look only at the character immediately before and the character immediately after the run (the spec calls these the "left-flanking" and "right-flanking" rules).

Unfortunately, these heuristics are far from perfect.

Markdown | Rendering
--- | ---
**super-*wo*-man** | super-wo-man (bold, with "wo" also in italics)
*super-*woman | *super-*woman (left as is, no emphasis)

(Example drawn from commonmark/commonmark-spec#643)

The * in -*w can only act as an opening delimiter (it is left-flanking but not right-flanking), which is why *super-*woman is rendered "incorrectly." (This is nevertheless the intended behavior of CommonMark; it is part of the spec.)
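The classification described above can be sketched in a few lines of Python. This is a simplified illustration, not the markdown-it or CommonMark reference implementation: it assumes the left-/right-flanking definitions and the Unicode whitespace and punctuation classes as worded in CommonMark spec 0.30, handles only *, and ignores backslash escapes, code spans and the later delimiter-matching step. The helper names (is_whitespace, is_punctuation, star_runs) are hypothetical. For *, a run may open emphasis only if it is left-flanking and close only if it is right-flanking.

```python
import unicodedata
from typing import List, Optional, Tuple

# ASCII punctuation as listed in the CommonMark spec.
ASCII_PUNCT = set("!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~")
# Unicode punctuation categories (spec 0.30 wording; 0.31 also adds symbols).
PUNCT_CATEGORIES = {"Pc", "Pd", "Pe", "Pf", "Pi", "Po", "Ps"}


def is_whitespace(ch: Optional[str]) -> bool:
    # The start and end of the line (ch is None) count as whitespace.
    if ch is None:
        return True
    return ch in "\t\n\f\r" or unicodedata.category(ch) == "Zs"


def is_punctuation(ch: Optional[str]) -> bool:
    if ch is None:
        return False
    return ch in ASCII_PUNCT or unicodedata.category(ch) in PUNCT_CATEGORIES


def star_runs(text: str) -> List[Tuple[int, bool, bool]]:
    """Return (index, can_open, can_close) for every run of '*' in text."""
    runs = []
    i = 0
    while i < len(text):
        if text[i] != "*":
            i += 1
            continue
        j = i
        while j < len(text) and text[j] == "*":
            j += 1
        before = text[i - 1] if i > 0 else None
        after = text[j] if j < len(text) else None
        # Left-flanking: not followed by whitespace, and if followed by
        # punctuation, then preceded by whitespace or punctuation.
        left = not is_whitespace(after) and (
            not is_punctuation(after) or is_whitespace(before) or is_punctuation(before)
        )
        # Right-flanking: the mirror image of the rule above.
        right = not is_whitespace(before) and (
            not is_punctuation(before) or is_whitespace(after) or is_punctuation(after)
        )
        # For '*', opening needs left-flanking, closing needs right-flanking.
        runs.append((i, left, right))
        i = j
    return runs


for sample in ["*super*-woman", "*super-*woman", "*마크다운*렌더링", "*스크립트(script)*라고"]:
    print(sample, star_runs(sample))
```

Under these rules, the second * in *super-*woman and the * right after ) in *스크립트(script)*라고 can open but cannot close, so no emphasis is produced and the stars stay literal. Without the parenthesis, as in *마크다운*렌더링, the second * is preceded by a letter and may close, which is consistent with the observation that the problem only appears with "(script)".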

In practical English or other European-language text, however, a case like *super-*woman almost never occurs; one would naturally write *super*-woman instead. In CJK text, where words are often not separated by spaces, this unintended side effect of the heuristics is a real issue, and it has been reported to the CommonMark project several times since at least 2016: link 1 (Japanese), link 2 (Chinese).

More regrettably, the development of CommonMark seems to have stagnated, so the best bet right now is to use one of the following workarounds:

  1. Use a zero-width space (ZWSP, U+200B) character reference, &#8203;:
    *스크립트(script)*&#8203;라고
    If you wish, you may use the more descriptive equivalent &ZeroWidthSpace; or the hex form &#x200b;.
    Note that, although it is invisible to humans, the reference is decoded into a real U+200B character in the HTML output, which is not desirable for text searching, etc.

  2. Use a raw HTML tag: <em>스크립트(script)</em>라고
    (Markdown supports inline HTML tags.)
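Why workaround 1 helps can be checked with a single right-flanking test. The sketch below is hypothetical and rests on two assumptions: the CommonMark 0.30 definitions of Unicode whitespace and punctuation, and that the parser decides flanking on the raw source text before entity references such as &#8203; are decoded (which, as far as we can tell, is how commonmark.js and markdown-it scan delimiters). Under those assumptions, the character right after the closing * is the & of the entity reference, which is ASCII punctuation, so the * that follows ) becomes right-flanking. A pasted literal U+200B would not have the same effect, because its Unicode category is Cf, which is neither whitespace (Zs) nor punctuation. The function name can_close is illustrative only.

```python
import unicodedata
from typing import Optional

ASCII_PUNCT = set("!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~")


def is_whitespace(ch: Optional[str]) -> bool:
    # Line boundaries count as whitespace; otherwise Zs or tab/LF/FF/CR.
    return ch is None or ch in "\t\n\f\r" or unicodedata.category(ch) == "Zs"


def is_punctuation(ch: Optional[str]) -> bool:
    # ASCII punctuation or a Unicode P* category (CommonMark 0.30 wording).
    return ch is not None and (
        ch in ASCII_PUNCT
        or unicodedata.category(ch) in {"Pc", "Pd", "Pe", "Pf", "Pi", "Po", "Ps"}
    )


def can_close(text: str, i: int) -> bool:
    """True if the single '*' at text[i] is right-flanking, i.e. may close emphasis."""
    before = text[i - 1] if i > 0 else None
    after = text[i + 1] if i + 1 < len(text) else None
    return not is_whitespace(before) and (
        not is_punctuation(before) or is_whitespace(after) or is_punctuation(after)
    )


plain = "*스크립트(script)*라고"
entity = "*스크립트(script)*&#8203;라고"  # entity reference, not yet decoded
literal = "*스크립트(script)*\u200b라고"  # a pasted ZWSP character

for text in (plain, entity, literal):
    print(repr(text), can_close(text, text.rindex("*")))
```

The raw HTML option sidesteps the question entirely, since <em>...</em> involves no delimiter runs at all.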

Personally, I would use the second option.
