[For v3+ / breaking change] Should we change how to handle line breaking surrounded by han/kana when proseWrap is always #14936

Open
tats-u opened this issue Jun 11, 2023 · 27 comments · May be fixed by #15081
Labels
lang:markdown Issues affecting Markdown type:bug Issues identifying ugly output, or a defect in the program

Comments

@tats-u
Contributor

tats-u commented Jun 11, 2023

Prettier 2.8.8
Playground link

# Options (if any):
--print-width=40 --prose-wrap=always

Input:

English 日本語日本語日本語日本語日本語日本語日本語日本語日本語日本語日本語日本語日本語日本語日本語日本語日本語日本語 English

Output:

English 日本語日本語日本語日本語日本語日
本語日本語日本語日本語日本語日本語日本語
日本語日本語日本語日本語日本語日本語
English

Expected behavior:

English
日本語日本語日本語日本語日本語日本語日本語日本語日本語日本語日本語日本語日本語日本語日本語日本語日本語日本語
English

Almost all Markdown renderers treat a line break as a space, even when the characters surrounding it are han or kana.

English 日本語日本語日本語日本語日本語日
本語日本語日本語日本語日本語日本語日本語
日本語日本語日本語日本語日本語日本語
English

English 日本語日本語日本語日本語日本語日 本語日本語日本語日本語日本語日本語日本語 日本語日本語日本語日本語日本語日本語 English
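The rendered line above can be reproduced with a minimal sketch. It assumes a renderer that turns every soft line break into a single space, which is the behavior described in this report; `renderSoftBreaksAsSpaces` is a hypothetical helper, not part of any Markdown library.

```javascript
// Hypothetical helper: models a renderer that turns every soft line break
// (a newline inside a paragraph) into one space.
function renderSoftBreaksAsSpaces(paragraph) {
  return paragraph.replace(/[ \t]*\n[ \t]*/g, " ");
}

const original = "English " + "日本語".repeat(18) + " English";

// The Prettier 2.8.8 output quoted in this issue.
const wrapped = [
  "English 日本語日本語日本語日本語日本語日",
  "本語日本語日本語日本語日本語日本語日本語",
  "日本語日本語日本語日本語日本語日本語",
  "English",
].join("\n");

const rendered = renderSoftBreaksAsSpaces(wrapped);

// Two spaces that were never in the source now sit inside the Japanese run.
console.log(rendered === original); // false
// Apart from those spaces, the text is unchanged.
console.log(rendered.replace(/ /g, "") === original.replace(/ /g, "")); // true
```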

Which should Prettier consider more important: the appearance of formatted documents, or compatibility with Markdown renderers?

It is OK for me to keep the current behavior and ask authors of renderers to change their behavior.


Memo:

#3026 (the beginning of the nightmare)

#5040 (mitigation for Korean)

#11597 (mitigation)

#15081 (fix for this issue)

@tats-u tats-u changed the title [For v3 / breaking changes] when proseWrap is always [For v3 / breaking changes] Should we change how to handle line breaking surrounded by han/kana when proseWrap is always Jun 11, 2023
@tats-u tats-u changed the title [For v3 / breaking changes] Should we change how to handle line breaking surrounded by han/kana when proseWrap is always [For v3 / breaking change] Should we change how to handle line breaking surrounded by han/kana when proseWrap is always Jun 11, 2023
@tats-u tats-u changed the title [For v3 / breaking change] Should we change how to handle line breaking surrounded by han/kana when proseWrap is always [For v3+ / breaking change] Should we change how to handle line breaking surrounded by han/kana when proseWrap is always Jun 11, 2023
@kachkaev
Member

👋 @tats-u, does #11597 solve your issue? See the blog post draft in the diff. This change is planned for 3.0.

@kachkaev kachkaev added the status:awaiting response Issues that require answers to questions from maintainers before action can be taken label Jun 13, 2023
@tats-u
Contributor Author

tats-u commented Jun 13, 2023

@github-actions github-actions bot removed the status:awaiting response Issues that require answers to questions from maintainers before action can be taken label Jun 13, 2023
@kachkaev kachkaev added status:needs discussion Issues needing discussion and a decision to be made before action can be taken lang:markdown Issues affecting Markdown labels Jun 13, 2023
@tats-u
Contributor Author

tats-u commented Jun 14, 2023

A conforming parser may render a soft line break in HTML either as a line ending or as a space.

A renderer may also provide an option to render soft line breaks as hard line breaks.

https://spec.commonmark.org/0.30/#softbreak
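The rendering choices the spec allows can be sketched as follows. `renderSoftBreak`, `renderParagraph`, and the mode names are hypothetical and chosen only for illustration; real renderers expose this differently, if at all.

```javascript
// Hypothetical sketch of the soft-break renderings the spec permits.
// Mode names are illustrative, not taken from any real renderer.
function renderSoftBreak(mode) {
  switch (mode) {
    case "newline":
      return "\n"; // line ending; the browser then applies its own white-space handling
    case "space":
      return " ";
    case "hard":
      return "<br />\n"; // optional behavior: treat soft breaks as hard breaks
    default:
      throw new RangeError(`unknown mode: ${mode}`);
  }
}

// Joins the lines of one paragraph using the chosen soft-break rendering.
function renderParagraph(lines, mode) {
  return `<p>${lines.join(renderSoftBreak(mode))}</p>`;
}

console.log(renderParagraph(["日本", "語"], "space")); // <p>日本 語</p>
```

Neither the "newline" nor the "space" mode removes the break, so a break inserted in the middle of a Japanese word is never fully invisible to a conforming renderer.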

@tats-u
Contributor Author

tats-u commented Jun 14, 2023

I tried it in CodePen.

HTML:

あ
い
う
a
b
c
あ
。
a。
あ。
あ!
あ!

How it looks in Firefox:

あいう a b c あ。 a。 あ。あ!あ! 

Copied and pasted text / how it looks in Chrome:

あ い う a b c あ 。 a。 あ。 あ! あ! 

I also previewed this HTML in both browsers and got the same result.

<!DOCTYPE html>
<html lang="ja">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Document</title>
</head>
<body>
あ
い
う
a
b
c
あ
。
a。
あ。
あ!
あ! 
</body>
</html>

https://infra.spec.whatwg.org/#strip-and-collapse-ascii-whitespace

To strip and collapse ASCII whitespace in a string, replace any sequence of one or more consecutive code points that are ASCII whitespace in the string with a single U+0020 SPACE code point, and then remove any leading and trailing ASCII whitespace from that string.

↑ I couldn't find where it's used.

ASCII whitespace is U+0009 TAB, U+000A LF, U+000C FF, U+000D CR, or U+0020 SPACE.
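For reference, the quoted Infra definition can be transcribed almost literally. This is only a sketch of that definition; it does not claim that browsers apply it when laying out text, and the function name is just a camel-cased form of the spec's wording.

```javascript
// ASCII whitespace per the Infra Standard: TAB, LF, FF, CR, SPACE.
const ASCII_WHITESPACE_RUN = /[\t\n\f\r ]+/g;

// Collapse every run of ASCII whitespace to one U+0020, then strip the ends.
function stripAndCollapseAsciiWhitespace(input) {
  return input.replace(ASCII_WHITESPACE_RUN, " ").replace(/^ | $/g, "");
}

console.log(stripAndCollapseAsciiWhitespace(" あ\nい \t う\n")); // あ い う

// U+3000 IDEOGRAPHIC SPACE is not ASCII whitespace, so it survives.
console.log(stripAndCollapseAsciiWhitespace("\u3000あ\n") === "\u3000あ"); // true
```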

@tats-u
Contributor Author

tats-u commented Jul 4, 2023

I (or we) want to have a PR in time for the 3.0 release, or it'll be much more difficult to reach agreement on a fix.

@tats-u
Contributor Author

tats-u commented Jul 5, 2023

Omg v3 was suddenly released without any betas. Should we push this back to v4?
I think proseWrap = always is not practical in Chinese and Japanese because there are no soft line breaks stipulated in HTML & Markdown for these languages.
Line breaks are never converted to spaces there.

@fisker fisker added type:bug Issues identifying ugly output, or a defect in the program and removed status:needs discussion Issues needing discussion and a decision to be made before action can be taken labels Jul 7, 2023
@fisker
Member

fisker commented Jul 7, 2023

I agree it's a bug. We can only change the space to a new line.

@tats-u
Contributor Author

tats-u commented Jul 7, 2023

@fisker Can't we do the opposite like HTML?
I think compatibility with HTML should be observed rather than the number of characters per line.

@fisker
Member

fisker commented Jul 7, 2023

I think so.

@tats-u
Contributor Author

tats-u commented Jul 7, 2023

@fisker
I couldn't tell what your "so" refers to, but:

  • Newline and space will be able to be interchanged with each other unconditionally
  • Continuous Chinese and Japanese won't be broken (like Korean)

This is my plan. Can we go with it?
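The two points of this plan can be sketched as a wrapper that only breaks where whitespace already exists. This is an illustration under stated assumptions, not the implementation in Prettier or in any PR: `wrapAtWhitespace` is hypothetical, and `displayWidth` is a rough stand-in that counts common han, kana, and full-width characters as two columns.

```javascript
// Rough, hypothetical width helper: common han/kana/full-width characters
// count as 2 columns, everything else as 1.
function displayWidth(text) {
  let width = 0;
  for (const ch of text) {
    width += /[\u3000-\u30ff\u4e00-\u9fff\uff01-\uff60]/.test(ch) ? 2 : 1;
  }
  return width;
}

// Greedy wrap that treats spaces and newlines as interchangeable and never
// inserts a break inside a run that contains no whitespace.
function wrapAtWhitespace(text, printWidth) {
  const words = text.split(/[\t\n\f\r ]+/).filter(Boolean);
  const lines = [];
  let current = "";
  for (const word of words) {
    const candidate = current === "" ? word : `${current} ${word}`;
    if (current !== "" && displayWidth(candidate) > printWidth) {
      lines.push(current);
      current = word; // an over-long run simply overflows its own line
    } else {
      current = candidate;
    }
  }
  if (current !== "") lines.push(current);
  return lines.join("\n");
}

const input = "English " + "日本語".repeat(18) + " English";
console.log(wrapAtWhitespace(input, 40));
```

With the input from the top of this issue and a width of 40, this produces the three lines given under "Expected behavior": the Japanese run stays on one overlong line instead of being split.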

@fisker
Member

fisker commented Jul 7, 2023

I mean

Newline and space will be able to be interchanged with each other unconditionally.


Continuous Chinese and Japanese won't be broken (like Korean)

Makes sense.

@tats-u
Contributor Author

tats-u commented Jul 7, 2023

Thanks. Now that we have the specs, someone or I can try for a PR.

@fisker
Member

fisker commented Jul 7, 2023

Let's wait a bit longer for more maintainers to express their opinions before starting.

@tats-u
Contributor Author

tats-u commented Jul 7, 2023

Okey dokey.

@tats-u
Contributor Author

tats-u commented Jul 7, 2023

By the way, if this is a bug, can we force a patch for it into v3.0.1 or 3.1, without waiting for v4?

@fisker
Member

fisker commented Jul 7, 2023

I forgot how it works in v2. Do we insert new lines?

@tats-u
Contributor Author

tats-u commented Jul 9, 2023

@fisker Should we still remove trailing or leading full-width spaces (U+3000)?
I agree with removing trailing ones unconditionally, and leading ones only if we recommend that users apply the following CSS:

/* or some classes */
p {
    text-indent: -1em;
    margin-left: 1em;
}

@fisker
Member

fisker commented Jul 11, 2023

Should we still remove trailing or leading full-width spaces (U+3000)?

I think we need keep it.

@fisker
Member

fisker commented Jul 11, 2023

Newline and space will be able to be interchanged with each other unconditionally.

This should be HTML whitespace. They can change.

@tats-u
Contributor Author

tats-u commented Jul 16, 2023

@fisker

This should be HTML whitespace. They can change.

I'll continue with the current specs.

I think we need keep it.

You say we should remove this line, right?

![lastNode.value, node.value].some((value) => /\u3000/.test(value))

@thorn0
Member

thorn0 commented Jul 21, 2023

Which should Prettier consider more important: the appearance of formatted documents, or compatibility with Markdown renderers?

The CommonMark spec is Prettier's main reference point, so I agree this needs to be changed.

@tats-u
Contributor Author

tats-u commented Jul 22, 2023

@thorn0 Thanks for your approval.


![lastNode.value, node.value].some((value) => /\u3000/.test(value))

I found that this line is innocent.
The real culprits are String.prototype.trim{Start,End}. They seem to remove all kinds of spaces (category Zs in Unicode).
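The trimming behavior is easy to confirm in isolation. The sketch below also shows one possible alternative that trims only ASCII whitespace; `trimAsciiWhitespace` is a hypothetical helper and not necessarily the approach taken in any PR.

```javascript
const IDEOGRAPHIC_SPACE = "\u3000";
const text = `${IDEOGRAPHIC_SPACE}字下げ${IDEOGRAPHIC_SPACE}`;

// The built-in trims remove U+3000 because it is in Unicode category Zs.
console.log(text.trimStart().trimEnd() === "字下げ"); // true

// Hypothetical alternative: trim only ASCII whitespace, leave U+3000 alone.
function trimAsciiWhitespace(value) {
  return value.replace(/^[\t\n\f\r ]+|[\t\n\f\r ]+$/g, "");
}

console.log(trimAsciiWhitespace(` ${text}\n`) === text); // true
```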

@tats-u
Contributor Author

tats-u commented Jul 22, 2023

I started using the following hack several weeks ago:

((U+FEFF))<SomeComponent />I *can* use Markdown with leading JSX like `<SomeComponent />` in MDX v1!

((U+FEFF)) is the ZERO WIDTH NO-BREAK SPACE.

But Prettier removes the U+FEFF and the text after <SomeComponent /> will be rendered as is (not interpreted as Markdown).
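One plausible explanation, not confirmed in this thread, is the same built-in trimming: ECMAScript counts U+FEFF as white space even though it is not in category Zs. The sketch below only demonstrates that language-level fact; it does not show where Prettier actually drops the character.

```javascript
const ZWNBSP = "\uFEFF"; // ZERO WIDTH NO-BREAK SPACE
const line = `${ZWNBSP}<SomeComponent />I *can* use Markdown`;

// U+FEFF is part of ECMAScript's WhiteSpace set, so \s matches it...
console.log(/\s/.test(ZWNBSP)); // true

// ...and the built-in trim drops it, leaving the JSX tag at the line start.
console.log(line.trimStart().startsWith("<SomeComponent />")); // true
```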

@tats-u
Contributor Author

tats-u commented Jul 22, 2023

#15081 is now ready for battle tests.

@chalin

chalin commented Jun 7, 2024

Any planned progress here and/or workarounds, other than disabling Prettier over zh-* and ja pages? Thanks. /cc @windsonsea

@tats-u
Contributor Author

tats-u commented Jun 8, 2024

Unfortunately, work on this has recently been suspended.

5 participants