fix(editor): preserve U+00A0 non-breaking space (#3037) by JohnMcLear · Pull Request #7585 · ether/etherpad

JohnMcLear · 2026-04-23T05:57:52Z

Summary

Fixes #3037. Non-breaking spaces (U+00A0) typed, pasted, or imported into a pad were silently normalized to ordinary spaces at every ingestion point, so they never reached the changeset. The bug broke French typography rules and prevented authors from gluing words against line-wrap.

The earlier attempt (#4177) only touched the display path (domline.processSpaces); by the time that code runs, the nbsp has already been destroyed upstream, so it couldn't help. This change fixes the four ingestion-side strip sites and teaches both processSpaces functions to handle nbsp correctly on the display/export side.

Changes

Ingestion fixes — removed .replace(/\xa0/g, ' ') from:

src/node/db/Pad.ts — server-side cleanText (hit by spliceText, default pad content, HTML import)
src/static/js/contentcollector.ts — textify for text pulled out of the DOM
src/static/js/ace2_inner.ts — the editor-side textify
src/static/js/ace2_inner.ts — removed \xa0 from the importText(dontProcess=true) guard regex

Display/export fixes — taught processSpaces to tokenize U+00A0 separately, emit it verbatim as  , and treat it as content (not whitespace) for the run-collapse bookkeeping so adjacent regular-space runs aren't miscounted:

src/static/js/domline.ts (live editor rendering)
src/node/utils/ExportHtml.ts (HTML export)

Why the changeset layer is safe

The changeset wire format parses only the op list (before $) with a regex; the text bank after $ is read by numeric offset/length via StringIterator. U+00A0 is a single BMP code unit, so makeSplice, opsFromText, and pack/unpack all handle it transparently. No storage/migration needed.

All supported DB backends accept U+00A0: MySQL defaults to utf8mb4, Postgres uses text, MSSQL uses NTEXT, others serialize as UTF-8 JSON. Existing pads already store Cyrillic, CJK, and emoji — nbsp is strictly simpler.

Tests

src/tests/backend/specs/Pad.ts — new cleanText cases + spliceText/setText round-trip tests for U+00A0.
src/tests/backend/specs/contentcollector.ts — updated 7 tests whose names said "preserved" but whose assertions encoded the old nbsp→space collapse; they now assert real preservation.
src/tests/backend/specs/api/importexport.ts — updated 7 end-to-end tests (both wantText and wantHTML) to reflect genuine nbsp round-tripping through rehype + contentcollector + ExportHtml.

Manual verification

Confirmed in Firefox: clipboard containing U+00A0 → paste into pad → getText returns c2 a0, getHTML returns 100 km.

Test plan

mocha tests/backend/specs/Pad.ts tests/backend/specs/contentcollector.ts — 128 passing
mocha tests/backend/specs/api/importexport.ts — 108 passing
Full backend suite — 792 passing, 0 failing
tsc --noEmit — clean in our code (pre-existing errors in plugin_packages/zod,ip-address only)
Manual nbsp paste round-trip in Firefox
Manual nbsp paste round-trip in Chrome (may be affected by chromium #346096 paste normalizer — follow-up if so)
Visual wrap-behavior check at narrow viewport

Out of scope (follow-ups)

A dedicated keybinding (e.g. Ctrl+Shift+Space) for reliable nbsp entry, since Shift+Space isn't consistent across browsers/OSes.
Paste-event handler to preserve nbsp via text/plain clipboard data if the Chromium paste bug is still present.

🤖 Generated with Claude Code

Non-breaking spaces were silently normalized to regular spaces at every ingestion point, so typed/pasted/imported nbsps never reached the changeset and users could not glue words against line-wrap in French or other languages that require nbsp typography. Removed the four strip sites that replaced U+00A0 with U+0020: - src/node/db/Pad.ts cleanText - src/static/js/contentcollector.ts textify - src/static/js/ace2_inner.ts textify - src/static/js/ace2_inner.ts importText raw-text guard Updated both processSpaces functions (domline and ExportHtml) to tokenize U+00A0 as a separate unit, emit it verbatim as  , and treat it as content (not whitespace) for the run-collapse bookkeeping so adjacent regular-space runs aren't miscounted. Added backend round-trip tests for spliceText and setText, and extended the cleanText case table. Updated the existing contentcollector and importexport specs whose expectations encoded the previous buggy behavior; they now assert genuine nbsp preservation. Verified manually in Firefox: clipboard U+00A0 → paste → pad → getText returns c2 a0; getHTML emits `100 km`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

JohnMcLear · 2026-04-23T05:58:03Z

/review

qodo-free-for-open-source-projects · 2026-04-23T05:58:08Z

Review Summary by Qodo

Preserve U+00A0 non-breaking space through ingestion and display pipeline

🐞 Bug fix

Walkthroughs

Description

• Remove U+00A0 (non-breaking space) normalization at four ingestion points
  - Pad.cleanText, contentcollector.textify, ace2_inner.textify, ace2_inner.importText
• Update processSpaces functions to tokenize and preserve U+00A0 as &nbsp;
  - Both domline.ts and ExportHtml.ts now treat nbsp as content, not whitespace
• Add backend round-trip tests for spliceText and setText with U+00A0
• Update 14 existing test cases to assert genuine nbsp preservation instead of collapse

Diagram

flowchart LR
  Input["User input/paste/import<br/>with U+00A0"]
  Ingestion["Ingestion layer<br/>cleanText, textify, importText"]
  Changeset["Changeset layer<br/>transparent handling"]
  Display["Display/Export<br/>processSpaces functions"]
  Output["Output<br/>getText/getHTML"]
  
  Input -->|Previously stripped| Ingestion
  Input -->|Now preserved| Ingestion
  Ingestion -->|U+00A0 safe| Changeset
  Changeset -->|Tokenize & emit| Display
  Display -->|&nbsp; in HTML| Output

File Changes

1. src/node/db/Pad.ts 🐞 Bug fix +1/-2

Remove U+00A0 normalization from cleanText

src/node/db/Pad.ts

2. src/node/types/PadType.ts ✨ Enhancement +1/-0

Add spliceText method signature to PadType

src/node/types/PadType.ts

3. src/node/utils/ExportHtml.ts 🐞 Bug fix +9/-3

Preserve U+00A0 in HTML export processSpaces

src/node/utils/ExportHtml.ts

View more (6)

4. src/static/js/ace2_inner.ts 🐞 Bug fix +3/-3

Remove U+00A0 stripping from textify and importText

src/static/js/ace2_inner.ts

5. src/static/js/contentcollector.ts 🐞 Bug fix +0/-1

Remove U+00A0 normalization from textify function

src/static/js/contentcollector.ts

6. src/static/js/domline.ts 🐞 Bug fix +8/-2

Preserve U+00A0 in editor rendering processSpaces

src/static/js/domline.ts

7. src/tests/backend/specs/Pad.ts 🧪 Tests +17/-0

Add U+00A0 round-trip tests for spliceText and setText

src/tests/backend/specs/Pad.ts

8. src/tests/backend/specs/api/importexport.ts 🧪 Tests +12/-12

Update 7 import/export tests to assert nbsp preservation

src/tests/backend/specs/api/importexport.ts

9. src/tests/backend/specs/contentcollector.ts 🧪 Tests +7/-7

Update 7 contentcollector tests to assert nbsp preservation

src/tests/backend/specs/contentcollector.ts

qodo-free-for-open-source-projects · 2026-04-23T05:58:09Z

Code Review by Qodo

🐞 Bugs (0) 📘 Rule violations (0) 📎 Requirement gaps (0)

1. ~~NBSP lost at boundaries~~ ☑ 🐞 Bug ≡ Correctness

Description

contentcollector.ts only preserves a NBSP run if it is between non-whitespace characters within the
same DOM text node, so a user-intended NBSP at the start/end of a text node is converted back to a
normal space. This breaks NBSP round-tripping when words are split across spans (e.g.,
formatting/attribute boundaries), despite the intent of #3037.

Code

src/static/js/contentcollector.ts[R90-97]

+          .replace(/[ \u00a0]+/g, (run, offset, src) => {
+            const before = offset > 0 ? src[offset - 1] : '';
+            const after = offset + run.length < src.length
+                ? src[offset + run.length] : '';
+            const pureNbsp = !run.includes(' ');
+            const interiorOfWord = /\S/.test(before) && /\S/.test(after);
+            return pureNbsp && interiorOfWord ? run : ' '.repeat(run.length);
+          })

Evidence

The new preservation logic decides whether a [space/NBSP]+ run is “interior” by looking only at
src[offset-1] and src[offset+run.length], which are within the current string passed to
textify(). But collectContent() calls textify() per individual DOM TEXT_NODE and appends the
result to the current line, so before/after do not see characters in adjacent text nodes.
Meanwhile, editor rendering builds content out of multiple <span> runs and applies processSpaces at
the whole-line HTML level, so NBSPs can naturally occur at the beginning/end of a span’s text node;
those NBSPs will be misclassified as non-interior and normalized to a regular space on readback.

src/static/js/contentcollector.ts[79-99]
src/static/js/contentcollector.ts[350-402]
src/static/js/domline.ts[71-80]
src/static/js/domline.ts[188-219]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The NBSP preservation heuristic in `contentcollector` relies on per-string neighbor characters (`before`/`after`) but `textify()` is applied per DOM text node. If a user-intended NBSP sits at a text-node boundary (e.g., `"100"` in one span and `"\u00a0km"` in the next), the heuristic sees `before === ''` or `after === ''` and incorrectly converts the NBSP to a normal space.
### Issue Context
- `collectContent()` processes each `TEXT_NODE` independently and calls `lines.appendText(textify(txt2), ...)`.
- The editor DOM is built from multiple `<span>` runs and `processSpaces()` runs on the full line HTML, so it is normal for the first character of a span’s text node to be a NBSP.
### Fix Focus Areas
- src/static/js/contentcollector.ts[79-99]
- src/static/js/contentcollector.ts[350-402]
### Suggested fix approach
Move the `[ \u00a0]+` run canonicalization out of `textify()` (per-text-node) and into a post-processing step that runs on the fully assembled line string (or provide `textify` with cross-node context such as the last appended character and the next character to be appended). Ensure the transformation remains length-preserving so selection offsets and attribution lengths stay consistent.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

2. ~~PadType adds spliceText~~ ☑ 📘 Rule violation ⚙ Maintainability

Description

The PR adds spliceText to the exported PadType, expanding the Pad API surface for TS/plugin
consumers, but this PR does not include a corresponding documentation update under doc/. This can
leave integrators unaware of the new method and its expected behavior.

Code

src/node/types/PadType.ts[18]

+  spliceText: (start: number, ndel: number, ins: string, authorId?: string)=>Promise<void>,

Evidence

PR Compliance ID 10 requires documentation updates in the same PR for public API changes. The diff
adds the new exported spliceText method to PadType, but there is no accompanying doc update in
doc/ within this PR.

src/node/types/PadType.ts[18-18]
Best Practice: Repository guidelines

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
A new exported Pad API method (`spliceText`) was added to `PadType` without updating the project documentation in `doc/`.
## Issue Context
`PadType` is part of the typed interface for Pad objects, which are exposed to server-side hook/plugin code. Per compliance, public API surface changes should be documented in the same PR.
## Fix Focus Areas
- src/node/types/PadType.ts[15-20]
- doc/api/hooks_server-side.md[267-325]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

qodo-free-for-open-source-projects · 2026-04-23T05:58:09Z

Code Review by Qodo

🐞 Bugs (0) 📘 Rule violations (0) 📎 Requirement gaps (0)

Great, no issues found!

Qodo reviewed your code and found no material issues that require review

…d-back processSpaces is a lossy one-way display transform: leading/trailing spaces and all-but-the-last of a run get rendered as   so HTML doesn't collapse them. When incorporateUserChanges reads text back from the DOM, those display-artifact nbsps were being stored in the changeset model instead of being normalized back to plain spaces. This broke handleReturnIndentation, whose /^ *(?:)/ regex only matches ASCII spaces: auto-indent after `foo:\n` produced 4 spaces instead of the expected prev-indent (2) + THE_TAB (4) = 6, because the previous line's model had nbsps where it used to have spaces. Fix: in contentcollector.textify, collapse any [ ]+ run back to plain spaces UNLESS the run is pure U+00A0 AND strictly interior to word chars. That preserves user-intended typographic nbsps like "100 km" while undoing the one-way display transform. Updated 7 contentcollector tests and 7 importexport tests whose assertions needed to reflect the new rule (boundary/mixed runs collapse; pure-interior nbsp runs preserve). Fixes the Playwright regression in indentation.spec.ts:117 that the previous commit introduced.

JohnMcLear · 2026-04-23T06:49:38Z

/review

qodo-free-for-open-source-projects · 2026-04-23T06:49:43Z

Persistent review updated to latest commit 52b4411

…er text node Addresses Qodo code review feedback on PR ether#7585. ## Bug fix — nbsp lost at DOM text-node boundary The previous approach ran the "collapse display-artifact nbsp" rule inside textify(), which is called per individual DOM TEXT_NODE. A user-intended nbsp sitting at a text-node boundary (e.g., <span>100</span><span> km </span>) was incorrectly seen as non-interior (before === '' for the second text node) and normalized back to a regular space. Fix: move the canonicalization out of textify() and run it on each fully assembled line string inside cc.finish(). The rule remains: [ ]+ run -> plain spaces UNLESS pure U+00A0 AND strictly interior to non-ws chars It is length-preserving, so attribute offsets and line lengths are unaffected. Added a regression test (contentcollector.spec.ts) for the cross-span case. ## Docs concern Reverted the type-only addition of spliceText to PadType. spliceText is an existing Pad runtime method; the backend test now uses a cast (`(pad as any).spliceText`) so the PR does not expand the declared public type surface, avoiding a separate documentation requirement.

JohnMcLear · 2026-04-23T07:26:37Z

/review

qodo-free-for-open-source-projects · 2026-04-23T07:26:44Z

Persistent review updated to latest commit ae356a5

qodo-free-for-open-source-projects Bot reviewed Apr 23, 2026

View reviewed changes

Comment thread src/static/js/contentcollector.ts Outdated

JohnMcLear merged commit 67e542d into ether:develop Apr 23, 2026
16 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(editor): preserve U+00A0 non-breaking space (#3037)#7585

fix(editor): preserve U+00A0 non-breaking space (#3037)#7585
JohnMcLear merged 3 commits intoether:developfrom
JohnMcLear:fix-nbsp-3037

JohnMcLear commented Apr 23, 2026 •

edited

Loading

Uh oh!

JohnMcLear commented Apr 23, 2026

Uh oh!

qodo-free-for-open-source-projects Bot commented Apr 23, 2026

Uh oh!

qodo-free-for-open-source-projects Bot commented Apr 23, 2026 •

edited

Loading

Uh oh!

qodo-free-for-open-source-projects Bot commented Apr 23, 2026 •

edited

Loading

Uh oh!

JohnMcLear commented Apr 23, 2026

Uh oh!

qodo-free-for-open-source-projects Bot commented Apr 23, 2026 •

edited

Loading

Uh oh!

Uh oh!

JohnMcLear commented Apr 23, 2026

Uh oh!

qodo-free-for-open-source-projects Bot commented Apr 23, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

JohnMcLear commented Apr 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Why the changeset layer is safe

Tests

Manual verification

Test plan

Out of scope (follow-ups)

Uh oh!

JohnMcLear commented Apr 23, 2026

Uh oh!

qodo-free-for-open-source-projects Bot commented Apr 23, 2026

Review Summary by Qodo

Walkthroughs

File Changes

Uh oh!

qodo-free-for-open-source-projects Bot commented Apr 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review by Qodo

Uh oh!

qodo-free-for-open-source-projects Bot commented Apr 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review by Qodo

Great, no issues found!

Uh oh!

JohnMcLear commented Apr 23, 2026

Uh oh!

qodo-free-for-open-source-projects Bot commented Apr 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

JohnMcLear commented Apr 23, 2026

Uh oh!

qodo-free-for-open-source-projects Bot commented Apr 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

JohnMcLear commented Apr 23, 2026 •

edited

Loading

qodo-free-for-open-source-projects Bot commented Apr 23, 2026 •

edited

Loading

qodo-free-for-open-source-projects Bot commented Apr 23, 2026 •

edited

Loading

qodo-free-for-open-source-projects Bot commented Apr 23, 2026 •

edited

Loading

qodo-free-for-open-source-projects Bot commented Apr 23, 2026 •

edited

Loading