Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make Markdown's newline & space handling conform to web-platform-tests & Firefox's behavior #15081

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

tats-u
Copy link
Contributor

@tats-u tats-u commented Jul 11, 2023

Description

Fixes #14936

  • Don't wrap even Chinese & Japanese when proseWrap is always
  • Make newline and whitespace interchangeable even between (Chinese or Japanese) and another character when proseWrap is always or never
  • Don't trim spaces other than ASCII whitespace at the beginning or end of paragraphs

Checklist

  • I’ve added tests to confirm my change works.
  • (If changing the API or CLI) I’ve documented the changes I’ve made (in the docs/ directory).
  • (If the change is user-facing) I’ve added my changes to changelog_unreleased/*/XXXX.md file following changelog_unreleased/TEMPLATE.md.
  • I’ve read the contributing guidelines.

Try the playground for this PR

@tats-u tats-u force-pushed the markdown-newline-html branch 3 times, most recently from 92276e4 to 01d93c6 Compare July 13, 2023 13:57
@tats-u tats-u marked this pull request as ready for review July 16, 2023 14:55
@tats-u tats-u force-pushed the markdown-newline-html branch 2 times, most recently from 894c31f to 01eb21b Compare July 22, 2023 11:54
@tats-u tats-u changed the title Make Makdown's newline handling conform to HTML Make Makdown's newline & space handling conform to HTML Jul 22, 2023
@tats-u
Copy link
Contributor Author

tats-u commented Jul 22, 2023

Shoud I add changelog_unreleased/markdown/15081.md?

@fisker
Copy link
Member

fisker commented Jul 22, 2023

Can you send a separate PR for this first?

Don't trim spaces other than ASCII whitespace at the beginning or end of paragraphs

And we can use this module to do the html trim.

@tats-u
Copy link
Contributor Author

tats-u commented Jul 22, 2023

@fisker In CommonMark spec, the form feed \f (U+000C) is in the same category as the full-width space (and other spaces in Zs).

https://spec.commonmark.org/0.30/#unicode-whitespace-character

Either way, it is destined to be removed at the stage where the converted HTML is processed, so it may as well be removed at the stage of conversion in Prettier.

https://codepen.io/tats-u/pen/bGQjrvM

I'll reuse that code.

Trailing form feeds don't seem always to be removed.

@fisker
Copy link
Member

fisker commented Jul 22, 2023

Interesting, https://bugs.webkit.org/show_bug.cgi?id=13159 seems we already made mistake in HTML/Handlebars printer.

@fisker
Copy link
Member

fisker commented Jul 22, 2023

Anyway, I think we should reuse htmlWhitespaceUtils, but consider remove \f in future if it really matters.

@tats-u
Copy link
Contributor Author

tats-u commented Jul 23, 2023

@fisker Your change has just been applied.

@tats-u tats-u changed the title Make Makdown's newline & space handling conform to HTML Make Markdown's newline & space handling conform to HTML Jul 23, 2023
@tats-u tats-u force-pushed the markdown-newline-html branch 2 times, most recently from 90de55f to 304b1de Compare July 23, 2023 10:58
@tats-u
Copy link
Contributor Author

tats-u commented Jul 24, 2023

I found an interesting description in https://drafts.csswg.org/css-text-3/#line-break-transform

Then any remaining segment break (\n) is either transformed into a space (U+0020) or removed depending on the context before and after the break. The rules for this operation are UA-defined in this level.

NOTE: Historically, HTML and CSS have unconditionally converted segment breaks to spaces, which has prevented content authored in languages such as Chinese from being able to break lines within the source. Thus UA heuristics need to be conservative about where they discard segment breaks even as they strive to improve support for such languages.

What sucks is not the HTML or CSS specs but browsers' implementations.
I have to modify the change log and comments in the source code.

@tats-u tats-u changed the title Make Markdown's newline & space handling conform to HTML Make Markdown's newline & space handling conform to browsers Jul 27, 2023
if (isLink) {
return true;
}
function lineBreakCanBeConvertedToSpace() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we should remove this function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends on whether we'll have to add it again.
I can't predict when all browsers will change their behavior.

@tats-u tats-u marked this pull request as draft July 29, 2023 03:12
Copy link
Member

@fisker fisker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The result looks good, thank you!

@tats-u
Copy link
Contributor Author

tats-u commented Jul 29, 2023

@fisker I'll change the behavior.

I looked into the detailed behavior of Firefox.

Table: Is a newline between the followings converted to a space in Firefox?

\ Latin & Korean Chinese & Japanese & CJK punctuations
Latin & Korean Yes Yes
Chinese & Japanese & CJK punctuations Yes No

A newline between [Chinese & Japanese & CJK punctuations] should not be converted to a space.
Once it is removed, these

漢字
だよ

↓ OK

漢字だよ。

↓ STOP!

漢字
だよ。

This will improve the compatibility with the conventional behavior to some extend and reduce browser-defined behaviors.

A space around Chinese (& Japanese) is going not to be converted to a newline. It's not compatible with Firefox.

@fisker
Copy link
Member

fisker commented Jul 29, 2023

A newline between [Chinese & Japanese & CJK punctuations] should not be converted to a space.

Didn't get it.

@tats-u
Copy link
Contributor Author

tats-u commented Jul 29, 2023

[Chinese & Japanese & CJK punctuations]

These seem to be treated as a single group in Firefox.
A newline surrounded only by that group is simply removed (in a view).
Of course, a space is added once copied and pasted on somewhere else.

@tats-u
Copy link
Contributor Author

tats-u commented Jul 29, 2023

@fisker
Copy link
Member

fisker commented Jul 29, 2023

Should't we only need make sure the output equivalence in html?

Ah, maybe you are right, in markdown one single new line doesn't generate a space. But should not related to languages.

@tats-u
Copy link
Contributor Author

tats-u commented Jul 29, 2023

@fisker

But should not related to languages.

Does it mean it's better to always treat a newline in Markdown as a space even if surrounded only by han or kana?
Indeed Markdown viewers other than browsers use Blink, but the behavior doesn't look natural as a Chinese or Japanese paragraph for me.
For the first place few people break lines in Chinese or Japanese sentences in Markdown documents, though.

@tats-u
Copy link
Contributor Author

tats-u commented Aug 6, 2023

I found the latest version of CSS is 4 but the specification related to this issue has not been changed.

https://drafts.csswg.org/css-text-4/#white-space-trim

@fisker
Copy link
Member

fisker commented Aug 6, 2023

I'm glad you seem to understand it.

For the record, I don't.

I still think we only need make sure we doesn't change the output "HTML".

a b.

equals to

<p>a b.</p>

and

a
b.

equals to

<p>a
b.</p>

And you are saying they are not equal when we replace a or b with other characters.

This means we also handle it incorrectly in HTML printer. Playground link I think we should deal with the HTML printer first then.

@fisker
Copy link
Member

fisker commented Aug 6, 2023

@thorn0 You are more familiar with the markdown text wrapping. Do you have a comment about the problem here?

@tats-u tats-u closed this Aug 6, 2023
@tats-u tats-u reopened this Aug 6, 2023
@tats-u
Copy link
Contributor Author

tats-u commented Aug 6, 2023

@fisker

And you are saying they are not equal when we replace a or b with other characters.

I'm not saying that for the Markdown-to-HTML conversion. Your interruption for Markdown-to-HTML conversion is correct. All Newline is kept as is every when converted to HTML.
I've been discussing how the HTML is interpreted using CSS by browsers.

This means we also handle it incorrectly in HTML printer.

The current one breaks documents that explain how browsers handle a newline in HTML. I have to add prettier-ignore. I'll add concrete examples later.

The CommonMark specification misunderstands how newline in HTML should be interpreted by browsers. Its author mustn't have read the CSS specification.

A soft line break may be rendered in HTML either as a line ending or as a space. The result will be the same in browsers.

https://spec.commonmark.org/0.30/#soft-line-breaks

The last sentence isn't applicable to Chinese and Japanese documents.

@tats-u
Copy link
Contributor Author

tats-u commented Aug 7, 2023

<p>
  In the CSS specification, it depends on browsers how newlines in HTML are treated. When viewed in Firefox, no extra spaces appear in the following sentences.
</p>
<blockquote>
  <p>
    これは日本語
    の文です。这
    是一个中文句
    子。
  </p>
</blockquote>

@tats-u
Copy link
Contributor Author

tats-u commented Sep 9, 2023

@fisker

https://wpt.fyi/results/css/css-text/line-breaking?label=master&label=experimental&aligned
Test cases beginning with segment-break-transformation- are helpful.
Chrome treats the current behavior as a bug.

@tats-u
Copy link
Contributor Author

tats-u commented Sep 9, 2023

Shall we follow all its tests?

@tats-u tats-u force-pushed the markdown-newline-html branch 2 times, most recently from 6032cc7 to e9c58f8 Compare September 16, 2023 12:08
@tats-u
Copy link
Contributor Author

tats-u commented Sep 16, 2023

I wonder why KATAKANA MIDDLE DOT U+30FB (・) has been treated as non-CJK in Prettier.
It is treated as a kind of Japanese letter in Firefox.

<p>中点
・
中点</p>

This is rendered as "中点・中点" in Firefox but Prettier can format the following Markdown as "中点 ・ 中点".

中点
・
中点

@tats-u
Copy link
Contributor Author

tats-u commented Sep 16, 2023

Katakana-Hiragana Double Hyphen U+30A0 (゠) is also excluded from CJK in Prettier even though it's in the Kataka Unicode block.

チンギス゠カン (Genghis Khan)

@tats-u tats-u marked this pull request as ready for review September 16, 2023 13:27
@tats-u tats-u requested a review from fisker September 16, 2023 13:27
@tats-u tats-u force-pushed the markdown-newline-html branch 2 times, most recently from f21da19 to 78d7874 Compare September 23, 2023 13:15
@tats-u tats-u changed the title Make Markdown's newline & space handling conform to browsers Make Markdown's newline & space handling conform to web-platform-tests & Firefox Sep 24, 2023
@tats-u tats-u changed the title Make Markdown's newline & space handling conform to web-platform-tests & Firefox Make Markdown's newline & space handling conform to web-platform-tests & Firefox's behavior Sep 24, 2023
@tats-u
Copy link
Contributor Author

tats-u commented Sep 25, 2023

@fisker I finished to change. You can start a review or merge.

@tats-u
Copy link
Contributor Author

tats-u commented Oct 2, 2023

Is this planned to be included in v3.1?

@sosukesuzuki
Copy link
Member

I think this has too much impact to release as a minor update...

@tats-u
Copy link
Contributor Author

tats-u commented Oct 8, 2023

I'll have to split changes in this PR into about 3 PRs:

  • Don't trim spaces other than ASCII whitespace at the beginning or end of paragraphs
  • Change for this change but don't be a breaking change
  • This change (maybe breaking)

Of course, I want the spec of Prettier to the current changes ultimately. I hope you agree with this plan.

@tats-u
Copy link
Contributor Author

tats-u commented Dec 4, 2023

@fisker @sosukesuzuki Recently I have noticed the plan for the release of v4.
How do you think about this change as a content in v4?

@fisker
Copy link
Member

fisker commented Dec 5, 2023

Current v4 is published for the new CLI test.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants