Tame wild white spaces #4467
@zurfyx I hope you don't mind me asking here before filing a new issue. v0.11.0 is breaking some of our copy-paste roundtrip tests where we're using As far as I can tell, the spec you pointed to here doesn't allow collapsing non-breaking space characters. Specifically it has:
I couldn't find anywhere that allows collapsing U+00A0. Looking at the patch, it seems to be using the following:
    textContent = textContent
      .replace(/\r?\n|\t/gm, ' ')
      .replace('\r', '')
      .replace(/\s+/g, ' ');
Is that intended, or shall I file a bug for this?
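For context on the question above: in JavaScript regular expressions the `\s` class matches U+00A0 (no-break space) as well as ordinary spaces, tabs and line breaks, so a `/\s+/g` replacement folds non-breaking spaces into a regular space. A minimal sketch of the difference follows. The names `collapseWithS` and `collapseAsciiOnly` are hypothetical and used only for illustration; neither is Lexical code.

```javascript
// Hypothetical helpers for illustration; not taken from the Lexical patch.

// Collapses every run matched by \s, which in JavaScript includes U+00A0.
function collapseWithS(text) {
  return text.replace(/\s+/g, ' ');
}

// Collapses only space, tab, LF and CR, leaving U+00A0 untouched.
function collapseAsciiOnly(text) {
  return text.replace(/[ \t\n\r]+/g, ' ');
}

const sample = 'a\u00A0\u00A0b  c';
const withS = collapseWithS(sample);         // the NBSP run becomes one regular space
const asciiOnly = collapseAsciiOnly(sample); // the NBSP run is preserved
```

Whether the narrower character class is the right fix for Lexical is a separate question; the sketch only shows why the two patterns behave differently on U+00A0.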
W3-inspired white space handling on paste
This PR attempts to resolve, once and for all, the commonly repeated issues around spacing on paste. It turns out there's a well-defined spec (https://www.w3.org/TR/css-text-3/#white-space-processing) for what seemed to be a wild west across various editor implementations.
W3 summary
https://www.w3.org/TR/css-text-3/#white-space-processing
When white-space is pre or similar variants, we don't do any special processing; we assume the incoming text reads as the rendered copy. This is the case for Apple Notes, Chrome, Safari and GDoc.
Otherwise, we transform the text according to collapsible rules. This applies to MS Word, Quip and Notion. Firefox fails W3 rules but it also falls under this bucket.
At a high level, collapsibles mean removing redundant spaces/LF/CRLF/tab.
The current implementation respects the historical space transform listed in W3 (instead of removal).
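The two buckets above can be sketched roughly as follows. This is a minimal illustration under assumptions, not the PR's code: `normalizePastedText` and its `whiteSpace` parameter are hypothetical, the check for "pre or similar variants" is assumed to be a simple prefix match, and only space, tab, LF and CR are treated as collapsible.

```javascript
// Hypothetical sketch; the names and the prefix check are assumptions.
function normalizePastedText(text, whiteSpace) {
  // Bucket 1: pre and its variants. Keep the text exactly as copied.
  if (whiteSpace.startsWith('pre')) {
    return text;
  }
  // Bucket 2: turn line breaks and tabs into spaces (a transform, not a
  // removal), then collapse each run of spaces into a single space.
  return text
    .replace(/\r\n|\r|\n|\t/g, ' ')
    .replace(/ +/g, ' ');
}

const preserved = normalizePastedText('a \n  b', 'pre-wrap');
const collapsed = normalizePastedText('a \r\n\t  b', 'normal');
```

Here `preserved` is returned unchanged, while `collapsed` ends up with a single space between `a` and `b`. The sketch deliberately ignores the differences between pre, pre-wrap and pre-line.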
BR rule
This PR also touches BR. The spec (https://html.spec.whatwg.org/multipage/text-level-semantics.html#the-br-element) refers to a "single" element but I'm fairly confident this is after the collapsibles.
But this PR is not 100% compliant
There is quite a lot of CSS to understand to make it 100% compliant (and some of these properties even go beyond what Lexical supports at this point) but the implementation should work in most cases.
For example: the differences between pre, pre-wrap and pre-line; understanding tab-size to determine how it should render; display flex and table; or dropping Space Marks. I didn't prioritize any of the aforementioned because no other editor supports any of these; they just handle them gracefully.
Performance
When these rules apply (bucket 2 above), Lexical obviously runs slower as it has to traverse backward and forward accordingly, slice and compute styles.
Pasting the Moby Dick test from Apple Notes results in a 10% regression, from 1.53s to 1.68s on a 2019 MBP.
Edit: dropped getComputedStyles as we don't expect clipboard users to make use of StyleSheets. Also fixed the cache.
Pasting the Moby Dick test from Apple Notes results in a 0.7% regression, from 4.27s to 4.30s on an M1 with 6x slowdown.
Bundle size
Arguably this increases the size of TextNode considerably, but eventually (as discussed with @acywatson) all HTML code will be decoupled from Core. Even without this PR, that decoupling would drastically reduce bundle size for our plain text users.
Implementation: why on TextNode vs a pre-cleanup?
I've discussed this one offline with @fantactuka but ultimately this one is very opinionated. The alternative to this PR would be a cleanup process before passing the DOM Nodes to the LexicalNode importDOM.
The pre-cleanup assumes that the incoming HTML is bad, but it's actually valid, correct HTML, just not in the shape we like. We can add this one-off to handle spacing, but then there will be more, ultimately leading to an array of cleanup steps being passed into the plugin when there was no need for this in the first place.
Instead, the Nodes themselves can take responsibility just like they are doing now. This might not be as optimal as the one-pass approach but the implementation is still fast considering we never look beyond inlines and that, in most cases, they won't be filled with fragmented white spaces, so average case will still be O(N).
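To picture the per-node approach, here is a rough sketch that works on plain strings instead of DOM nodes. Everything in it is an assumption made for illustration: `collapseInlineRun` is a hypothetical name, each array entry stands in for the text of one adjacent inline element, and the real implementation walks sibling nodes and does more than this.

```javascript
// Hypothetical sketch: collapse white space across adjacent inline fragments.
function collapseInlineRun(fragments) {
  const result = [];
  // Treat the start of the run like a preceding space so leading spaces drop.
  let previousEndsWithSpace = true;
  for (const fragment of fragments) {
    // Collapse runs of space, tab, CR and LF inside the fragment.
    let text = fragment.replace(/[ \t\r\n]+/g, ' ');
    // Look backward: a space right after another space is redundant.
    if (previousEndsWithSpace && text.startsWith(' ')) {
      text = text.slice(1);
    }
    if (text.length > 0) {
      previousEndsWithSpace = text.endsWith(' ');
    }
    result.push(text);
  }
  // Drop the trailing space of the last non-empty fragment.
  for (let i = result.length - 1; i >= 0; i--) {
    if (result[i].length > 0) {
      if (result[i].endsWith(' ')) {
        result[i] = result[i].slice(0, -1);
      }
      break;
    }
  }
  return result;
}

const run = collapseInlineRun(['  Hello ', ' world', '  \n', 'again  ']);
const joined = run.join('');
```

Each fragment is visited once in the forward loop, and the final loop only walks back past empty fragments, so the work stays linear in the total length of the text.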
Other relevant PRs:
Fixes #4466
Fixes #3677
Fixes #4370