Filter out invisible unicode characters from text segments#3344

Merged
JiuqingSong merged 6 commits into master from u/jisong/filterinvisibleunicode
May 28, 2026

Conversation

@JiuqingSong
Collaborator

Summary

  • Strip invisible Unicode characters in the range U+E0000–U+EFFFF (Plane 14, which contains the Tags block U+E0000–U+E007F) inside createText so they cannot survive paste or DOM-to-model conversion. These characters can be used to hide instructions or text inside HTML (see https://embracethered.com/blog/posts/2024/hiding-and-finding-text-with-unicode-tags/); without filtering, they pass into the model as normal text.
  • Meaningful invisible characters that fall outside that range (e.g. ZWSP U+200B, ZWJ U+200D, RLO U+202E, PDF U+202C) are preserved.
  • Unit tests in creatorsTest.ts cover mixed/boundary/only-invisible inputs and confirm meaningful invisible chars are untouched. An end-to-end test in endToEndTest.ts verifies a full DOM → Model → DOM/text round-trip strips only the tag range.
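
For illustration, the filtering described above could look roughly like the sketch below. This is not the code from this PR: the helper name `stripInvisibleUnicode` and the constant name are hypothetical, and the regular expression is just one way to match the range. It relies on the fact that every code point in U+E0000–U+EFFFF is encoded in UTF-16 as a high surrogate in U+DB40–U+DB7F followed by a low surrogate.

```typescript
// Hypothetical helper, not the implementation from this PR.
// Code points U+E0000–U+EFFFF are stored in a JavaScript string as a
// surrogate pair: a high surrogate in \uDB40-\uDB7F followed by any low
// surrogate in \uDC00-\uDFFF. Matching that pair avoids needing the
// regex `u` flag.
const INVISIBLE_RANGE = /[\uDB40-\uDB7F][\uDC00-\uDFFF]/g;

function stripInvisibleUnicode(text: string): string {
    return text.replace(INVISIBLE_RANGE, '');
}

// '\uDB40\uDC68' is U+E0068 and '\uDB40\uDC69' is U+E0069: tag characters
// that render as nothing but would otherwise be carried along with the text.
const withHiddenText = 'Hello\uDB40\uDC68\uDB40\uDC69 world';
const cleaned = stripInvisibleUnicode(withHiddenText);

// ZWSP (U+200B) and ZWJ (U+200D) are outside the range and are left alone.
const meaningful = stripInvisibleUnicode('a\u200Bb\u200Dc');

console.log(cleaned.length, meaningful.length);
```

Characters below U+E0000 or above U+EFFFF never match, because their high surrogate (if any) falls outside \uDB40-\uDB7F.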

Test plan

  • yarn test:fast --testPathPattern=creatorsTest
  • yarn test:fast --testPathPattern=endToEndTest

🤖 Generated with Claude Code

Comment thread packages/roosterjs-content-model-dom/lib/modelApi/creators/createText.ts Outdated
@github-actions

github-actions Bot commented May 25, 2026

PR Preview Action v1.8.1


🚀 View preview at
https://microsoft.github.io/roosterjs/pr-preview/pr-3344/

Built to branch gh-pages at 2026-05-28 21:09 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

JiuqingSong and others added 3 commits May 28, 2026 09:57
…mental feature

Move the invisible unicode character stripping logic from createText (always-on)
to addTextSegment, gated by the new 'FilterInvisibleUnicode' experimental feature.
This ensures the behavior only activates when explicitly enabled via EditorOptions.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
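
As a rough sketch of the gating this commit describes, the snippet below filters text only when the feature name is present in a list of enabled experimental features. Everything here is a simplified stand-in: `SketchContext`, `SketchTextSegment`, and `sketchAddTextSegment` are hypothetical names, the shape of the context is an assumption, and skipping empty text is an assumption as well. Only the feature name 'FilterInvisibleUnicode' comes from the commit message.

```typescript
// Simplified stand-ins for illustration only; not the roosterjs types.
interface SketchContext {
    // Assumed shape: a list of enabled experimental feature names.
    experimentalFeatures: ReadonlyArray<string>;
}

interface SketchTextSegment {
    segmentType: 'Text';
    text: string;
}

// Matches U+E0000–U+EFFFF as UTF-16 surrogate pairs.
const INVISIBLE_PAIR = /[\uDB40-\uDB7F][\uDC00-\uDFFF]/g;

function sketchAddTextSegment(
    segments: SketchTextSegment[],
    text: string,
    context: SketchContext
): void {
    const enabled = context.experimentalFeatures.indexOf('FilterInvisibleUnicode') >= 0;
    const filtered = enabled ? text.replace(INVISIBLE_PAIR, '') : text;

    // Assumption: text that is empty after filtering produces no segment.
    if (filtered.length > 0) {
        segments.push({ segmentType: 'Text', text: filtered });
    }
}

// With the feature off, the text is stored unchanged.
const segments: SketchTextSegment[] = [];
sketchAddTextSegment(segments, 'Hi\uDB40\uDC41!', { experimentalFeatures: [] });

// With the feature on, the hidden character (U+E0041) is dropped.
sketchAddTextSegment(segments, 'Hi\uDB40\uDC41!', {
    experimentalFeatures: ['FilterInvisibleUnicode'],
});

console.log(segments.map(s => s.text.length));
```

Keeping the check at the point where text segments are added means the default path is untouched for editors that do not opt in.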
@JiuqingSong JiuqingSong merged commit 2a95926 into master May 28, 2026
8 checks passed
@JiuqingSong JiuqingSong deleted the u/jisong/filterinvisibleunicode branch May 28, 2026 21:06
