Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tame wild white spaces #4467

Merged
merged 11 commits into from
May 23, 2023
Merged

Tame wild white spaces #4467

merged 11 commits into from
May 23, 2023

Conversation

zurfyx
Copy link
Member

@zurfyx zurfyx commented May 7, 2023

W3-inspired white space handling on paste

This PR attempts to resolve for once and for all the commonly repeated issues around spacing on paste. Turns out there's a well-defined spec (https://www.w3.org/TR/css-text-3/#white-space-processing) for what seemed to be wild-west across various editor implementations.

case1.mov
Screen.Recording.2023-05-07.at.2.46.13.pm.mov

W3 summary

https://www.w3.org/TR/css-text-3/#white-space-processing

When white-space is pre or similar variants, we don't do any special processing, we assume the incoming text reads as the rendered copy. This is the case for Apple Notes, Chrome, Safari and GDoc.

Otherwise, we transform the text according to collapsible rules. This applies to MS Word, Quip and Notion. Firefox fails W3 rules but it also falls under this bucket.

At a high level, collapsibles mean removing redundant spaces/LF/CRLF/tab:

  • At the start/end of the line
  • Redundant consequent
  • Transformation of LF, CRLF and tab into spaces

The current implementation respects the historical space transform listed in W3 (instead of removal).

BR rule

This PR also touches BR. The spec (https://html.spec.whatwg.org/multipage/text-level-semantics.html#the-br-element) refers to a "single" element but I'm fairly confident this is after the collapsibles.

But this PR is not 100% compliant

There is quite a lot CSS to understand to make it 100% compliant (and some of these properties even go beyond what Lexical supports at this point) but the implemented should work in most cases.

For example, pre, pre-wrap and pre-line differences, understanding the tab-size to determine how it should render, display flex and table, or dropping Space Marks. I didn't prioritize any of the aforementioned because no other editor supports any these, they just handle them gracefully.

Performance

When these rules apply (bucket 2 above), Lexical obviously runs slower as it has to traverse backward and forward accordingly, slice and compute styles.

Pasting the Moby Dick test from Apple Notes results in a 10% regression, from 1.53 to 1.68s on a 2019 MBP.

Edit: dropped getComputedStyles as we don't expect clipboard users to make use of StyleSheets. And, fixed cache.

Pasting the Moby Dick test from Apple Notes results in a 0.7% regression, from 4.27s to 4.30s on a M1 with x6 slowdown.

Before After
Screenshot 2023-05-15 at 2 58 12 PM Screenshot 2023-05-15 at 2 58 08 PM

Bundle size

Arguably this increases the size of TextNode considerably, but eventually (as discussed with @acywatson) all HTML code will be decoupled from Core. Even without this PR, this would have a drastic bundle size decrease for our plain text users.

Implementation: why on TextNode vs a pre-cleanup?

I've discussed this one offline with @fantactuka but ultimately this one is very opinionated. The alternative to this PR would be a cleanup process before passing the DOM Nodes to the LexicalNode importDOM.

The pre-cleanup assumes that the incoming HTML is bad, but it's actually valid correct HTML, just not in the shape we like. We can add this one-off to handle spacing, but then there will be more, ultimately leading to allowing an array of cleanup steps into the plugin when there was no need to do this in the first place.

Instead, the Nodes themselves can take responsibility just like they are doing now. This might not be as optimal as the one-pass approach but the implementation is still fast considering we never look beyond inlines and that, in most cases, they won't be filled with fragmented white spaces, so average case will still be O(N).


Other relevant PRs:

Fixes #4466
Fixes #3677
Fixes #4370

@vercel
Copy link

vercel bot commented May 7, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
lexical ✅ Ready (Inspect) Visit Preview 💬 Add feedback May 23, 2023 7:39pm
lexical-playground ✅ Ready (Inspect) Visit Preview 💬 Add feedback May 23, 2023 7:39pm

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label May 7, 2023
@github-actions
Copy link

github-actions bot commented May 7, 2023

size-limit report 📦

Path Size Loading time (3g) Running time (snapdragon) Total time
packages/lexical/dist/Lexical.js 27.78 KB (+1.94% 🔺) 556 ms (+1.94% 🔺) 294 ms (-5.62% 🔽) 850 ms
packages/lexical-rich-text/dist/LexicalRichText.js 38.71 KB (+1.34% 🔺) 775 ms (+1.34% 🔺) 408 ms (-24.01% 🔽) 1.2 s
packages/lexical-plain-text/dist/LexicalPlainText.js 38.69 KB (+1.34% 🔺) 774 ms (+1.34% 🔺) 231 ms (-59.92% 🔽) 1.1 s

Copy link
Contributor

@fantactuka fantactuka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🫡

@zurfyx zurfyx merged commit d05ae76 into main May 23, 2023
@zurfyx zurfyx deleted the spaces branch May 23, 2023 20:31
zurfyx added a commit that referenced this pull request May 23, 2023
zurfyx added a commit that referenced this pull request May 23, 2023
This was referenced May 23, 2023
@birtles
Copy link
Contributor

birtles commented May 24, 2023

@zurfyx I hope you don't mind me asking here before filing a new issue. v0.11.0 is breaking some of our copy-paste roundtrip tests where we're using \xA0 ( ) as a non-collapsible whitespace character when copying.

As far as I can tell, the spec you pointed to here doesn't allow collapsing non-breaking space characters. Specifically it has:

Except where specified otherwise, white space processing in CSS affects only the document white space characters: spaces (U+0020), tabs (U+0009), and segment breaks.

I couldn't find anywhere that allows collapsing U+00A0.

Looking at the patch, it seems to be using the \s character class in a number of places, e.g.

  textContent = textContent
    .replace(/\r?\n|\t/gm, ' ')
    .replace('\r', '')
    .replace(/\s+/g, ' ');

However, \s expands to [\f\n\r\t\v\u0020\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff].

Is that intended, or shall I file a bug for this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
5 participants