Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(markdown): support CJK and emoji #3026

Merged
merged 11 commits into from
Oct 15, 2017

Conversation

ikatyang
Copy link
Member

@ikatyang ikatyang commented Oct 14, 2017

Fixes #3013

Rules:

  • CJK
    • put line(" " or "\n") between non-cjk and cjk-character
    • put softline("" or "\n") between cjk and cjk
    • put nothing between non-cjk and cjk-punctuation, i.e. they're considered not breakable
  • emoji
    • they're considered 2-char width

Question:

  • Is --split-cjk-text an suitable option name? removed the option and enabled it by default.

cc @azu as you might be interested in this.

src/util.js Outdated
"$1$2"
)
)
.split(/(\s+)/g)
Copy link
Contributor

@azu azu Oct 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/\s/.test(" "); // true
/\s match  (U+3000/Full-width ideographic space)
Because, \s is [ \f\n\r\t\v\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff] in JavaScript.

Current implementaion always replace   to .

Both are used for different usage in Japanese.
Actually, the result of rendering between " " and " " is different.

Example:

A   B(Three full-width space)
A B(Three space)

image

For more details, see Unicode spaces.

Conclusion and My Proposal:

  • Treat full-width space character as a space character is OK
  • Should not replace full-width space( ) to a space( ). Preserve these spaces.

FYI: Treat full-width space( ) as whitespace in JavaScript Source code is OK.
Because ECMAScript treat “Zs” category that include U+3000 as whitespace.

Related:

Many implementations use full-width ideographic space (cl-14) for the one em space appended after dividing punctuation marks (cl-04).
-- https://www.w3.org/TR/2012/NOTE-jlreq-20120403/#en-subheading2_1_6

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can just treat full-width whitespace as CJK punctuation.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| abc | def | ghi |
| ------ | ------ | ------ |
| 👍👍👍 | 👍👍👍 | 👍👍👍 |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

These don't line up in my diff. Is this intentional?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's based on the font, if you use monospace-font that supports CJK character then you'll see:

image

@azz
Copy link
Member

azz commented Oct 14, 2017

Can we just turn splitting on and remove the option? If you don't want your text to wrap you can use <!-- prettier-ignore --> or .prettierignore if the whole file has CJK. There's minimal benefit to running prettier on the file if you don't want it to split the text for you.

@ikatyang
Copy link
Member Author

I'm fine with it but most of the markdown converters do not support CJK splitting, e.g. GitHub.

src/util.js Outdated
// https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp
// `\s` but exclude full-width whitspace (`\u3000`)
.split(
/([ \f\n\r\t\v\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\ufeff]+)/g
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This regex can be written as /([^\S\u3000]+)/. Not sure if that's better. (It can even be written as /((?:(?!\u3000)\s)+)/, but that's definitely not better!)

(Also note that the g flag is not needed for .split().)

@ikatyang
Copy link
Member Author

@azu any further comments? otherwise I'd like to merge this tomorrow.

@azu
Copy link
Contributor

azu commented Oct 15, 2017

@ikatyang Look good to me.

@azz
Copy link
Member

azz commented Oct 15, 2017

I'm not completely up-to-speed with the nuances of CJK chars but if the tests are passing I'm happy!

@azz
Copy link
Member

azz commented Oct 15, 2017

Oh, I thought you said @azz 😆

@ikatyang
Copy link
Member Author

azz vs azu 😆

@ikatyang ikatyang merged commit c27cc7f into prettier:master Oct 15, 2017
@ikatyang ikatyang deleted the feat/markdown-CJK branch October 15, 2017 04:57
@lock lock bot added the locked-due-to-inactivity Please open a new issue and fill out the template instead of commenting. label Jan 19, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Jan 19, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
locked-due-to-inactivity Please open a new issue and fill out the template instead of commenting.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants