Tame wild white spaces #4467

zurfyx · 2023-05-07T19:15:10Z

W3-inspired white space handling on paste

This PR attempts to resolve for once and for all the commonly repeated issues around spacing on paste. Turns out there's a well-defined spec (https://www.w3.org/TR/css-text-3/#white-space-processing) for what seemed to be wild-west across various editor implementations.

case1.mov

Screen.Recording.2023-05-07.at.2.46.13.pm.mov

W3 summary

https://www.w3.org/TR/css-text-3/#white-space-processing

When white-space is pre or similar variants, we don't do any special processing, we assume the incoming text reads as the rendered copy. This is the case for Apple Notes, Chrome, Safari and GDoc.

Otherwise, we transform the text according to collapsible rules. This applies to MS Word, Quip and Notion. Firefox fails W3 rules but it also falls under this bucket.

At a high level, collapsibles mean removing redundant spaces/LF/CRLF/tab:

At the start/end of the line
Redundant consequent
Transformation of LF, CRLF and tab into spaces

The current implementation respects the historical space transform listed in W3 (instead of removal).

BR rule

This PR also touches BR. The spec (https://html.spec.whatwg.org/multipage/text-level-semantics.html#the-br-element) refers to a "single" element but I'm fairly confident this is after the collapsibles.

But this PR is not 100% compliant

There is quite a lot CSS to understand to make it 100% compliant (and some of these properties even go beyond what Lexical supports at this point) but the implemented should work in most cases.

For example, pre, pre-wrap and pre-line differences, understanding the tab-size to determine how it should render, display flex and table, or dropping Space Marks. I didn't prioritize any of the aforementioned because no other editor supports any these, they just handle them gracefully.

Performance

When these rules apply (bucket 2 above), Lexical obviously runs slower as it has to traverse backward and forward accordingly, slice and compute styles.

~~Pasting the Moby Dick test from Apple Notes results in a 10% regression, from 1.53 to 1.68s on a 2019 MBP.~~

Edit: dropped getComputedStyles as we don't expect clipboard users to make use of StyleSheets. And, fixed cache.

Pasting the Moby Dick test from Apple Notes results in a 0.7% regression, from 4.27s to 4.30s on a M1 with x6 slowdown.

Before	After

Bundle size

Arguably this increases the size of TextNode considerably, but eventually (as discussed with @acywatson) all HTML code will be decoupled from Core. Even without this PR, this would have a drastic bundle size decrease for our plain text users.

Implementation: why on TextNode vs a pre-cleanup?

I've discussed this one offline with @fantactuka but ultimately this one is very opinionated. The alternative to this PR would be a cleanup process before passing the DOM Nodes to the LexicalNode importDOM.

The pre-cleanup assumes that the incoming HTML is bad, but it's actually valid correct HTML, just not in the shape we like. We can add this one-off to handle spacing, but then there will be more, ultimately leading to allowing an array of cleanup steps into the plugin when there was no need to do this in the first place.

Instead, the Nodes themselves can take responsibility just like they are doing now. This might not be as optimal as the one-pass approach but the implementation is still fast considering we never look beyond inlines and that, in most cases, they won't be filled with fragmented white spaces, so average case will still be O(N).

Other relevant PRs:

Fixes #4466
Fixes #3677
Fixes #4370

vercel · 2023-05-07T19:15:16Z

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name	Status	Preview	Comments	Updated (UTC)
lexical	✅ Ready (Inspect)	Visit Preview	💬 Add feedback	May 23, 2023 7:39pm
lexical-playground	✅ Ready (Inspect)	Visit Preview	💬 Add feedback	May 23, 2023 7:39pm

github-actions · 2023-05-07T19:17:04Z

size-limit report 📦

Path	Size	Loading time (3g)	Running time (snapdragon)	Total time
packages/lexical/dist/Lexical.js	27.78 KB (+1.94% 🔺)	556 ms (+1.94% 🔺)	294 ms (-5.62% 🔽)	850 ms
packages/lexical-rich-text/dist/LexicalRichText.js	38.71 KB (+1.34% 🔺)	775 ms (+1.34% 🔺)	408 ms (-24.01% 🔽)	1.2 s
packages/lexical-plain-text/dist/LexicalPlainText.js	38.69 KB (+1.34% 🔺)	774 ms (+1.34% 🔺)	231 ms (-59.92% 🔽)	1.1 s

fantactuka

🫡

birtles · 2023-05-24T06:27:41Z

@zurfyx I hope you don't mind me asking here before filing a new issue. v0.11.0 is breaking some of our copy-paste roundtrip tests where we're using \xA0 ( ) as a non-collapsible whitespace character when copying.

As far as I can tell, the spec you pointed to here doesn't allow collapsing non-breaking space characters. Specifically it has:

Except where specified otherwise, white space processing in CSS affects only the document white space characters: spaces (U+0020), tabs (U+0009), and segment breaks.

I couldn't find anywhere that allows collapsing U+00A0.

Looking at the patch, it seems to be using the \s character class in a number of places, e.g.

  textContent = textContent
    .replace(/\r?\n|\t/gm, ' ')
    .replace('\r', '')
    .replace(/\s+/g, ' ');

However, \s expands to [\f\n\r\t\v\u0020\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff].

Is that intended, or shall I file a bug for this?

facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label May 7, 2023

vercel bot deployed to Preview – lexical May 13, 2023 17:47 View deployment

vercel bot deployed to Preview – lexical May 13, 2023 17:50 View deployment

vercel bot deployed to Preview – lexical-playground May 13, 2023 17:52 View deployment

vercel bot deployed to Preview – lexical May 15, 2023 13:27 View deployment

vercel bot deployed to Preview – lexical-playground May 15, 2023 13:29 View deployment

vercel bot deployed to Preview – lexical May 15, 2023 13:30 View deployment

vercel bot deployed to Preview – lexical-playground May 15, 2023 13:31 View deployment

zurfyx marked this pull request as ready for review May 15, 2023 13:33

zurfyx requested review from fantactuka, acywatson, tylerjbainbridge and thegreatercurve as code owners May 15, 2023 13:33

zurfyx and others added 6 commits May 15, 2023 09:52

spaces

c346c5d

cache/cleanup

b82fdbf

.

fb3a975

.

c4351b5

.

399bfb0

fix e2e

c297404

zurfyx force-pushed the spaces branch from d07f14d to c297404 Compare May 15, 2023 14:14

vercel bot deployed to Preview – lexical May 15, 2023 14:15 View deployment

vercel bot deployed to Preview – lexical-playground May 15, 2023 14:16 View deployment

Drop getComputedStyle

bbbad23

vercel bot deployed to Preview – lexical May 15, 2023 18:31 View deployment

vercel bot deployed to Preview – lexical-playground May 15, 2023 18:32 View deployment

fix cache

9029d4f

vercel bot deployed to Preview – lexical May 15, 2023 19:00 View deployment

vercel bot deployed to Preview – lexical-playground May 15, 2023 19:02 View deployment

trying something differenet

9c2f63e

vercel bot deployed to Preview – lexical May 19, 2023 17:53 View deployment

vercel bot deployed to Preview – lexical-playground May 19, 2023 17:55 View deployment

try again

a65fee7

vercel bot deployed to Preview – lexical May 19, 2023 19:16 View deployment

vercel bot deployed to Preview – lexical-playground May 19, 2023 19:18 View deployment

acywatson approved these changes May 23, 2023

View reviewed changes

Merge branch 'main' into spaces

0793cdf

vercel bot deployed to Preview – lexical May 23, 2023 19:37 View deployment

vercel bot deployed to Preview – lexical-playground May 23, 2023 19:39 View deployment

fantactuka approved these changes May 23, 2023

View reviewed changes

zurfyx merged commit d05ae76 into main May 23, 2023

zurfyx deleted the spaces branch May 23, 2023 20:31

zurfyx added a commit that referenced this pull request May 23, 2023

Tame wild white spaces (#4467)

696366f

zurfyx added a commit that referenced this pull request May 23, 2023

Tame wild white spaces (#4467)

f514dec

This was referenced May 23, 2023

v0.11.0 #4538

Closed

v0.11.0 #4539

Merged

birtles mentioned this pull request May 27, 2023

Bug: Lexical collapses non-collapsible whitespace on paste [regression] #4564

Closed

birtles mentioned this pull request Jul 12, 2023

Stop collapsing non-collapsible whitespace on paste #4770

Merged

augustobmoura mentioned this pull request Aug 25, 2023

Bug: HTML serialization and deseralization are not symmetric #4919

Closed

zurfyx mentioned this pull request Sep 5, 2023

Retain TextNode whitespace on export #4969

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Tame wild white spaces #4467

Tame wild white spaces #4467

zurfyx commented May 7, 2023 •

edited

Loading

vercel bot commented May 7, 2023 •

edited

Loading

github-actions bot commented May 7, 2023 •

edited

Loading

fantactuka left a comment

birtles commented May 24, 2023 •

edited

Loading

Tame wild white spaces #4467

Tame wild white spaces #4467

Conversation

zurfyx commented May 7, 2023 • edited Loading

W3-inspired white space handling on paste

W3 summary

BR rule

But this PR is not 100% compliant

Performance

Bundle size

Implementation: why on TextNode vs a pre-cleanup?

vercel bot commented May 7, 2023 • edited Loading

github-actions bot commented May 7, 2023 • edited Loading

size-limit report 📦

fantactuka left a comment

Choose a reason for hiding this comment

birtles commented May 24, 2023 • edited Loading

zurfyx commented May 7, 2023 •

edited

Loading

vercel bot commented May 7, 2023 •

edited

Loading

github-actions bot commented May 7, 2023 •

edited

Loading

birtles commented May 24, 2023 •

edited

Loading