Skip to content

fix(editor): preserve U+00A0 non-breaking space (#3037)#7585

Merged
JohnMcLear merged 3 commits intoether:developfrom
JohnMcLear:fix-nbsp-3037
Apr 23, 2026
Merged

fix(editor): preserve U+00A0 non-breaking space (#3037)#7585
JohnMcLear merged 3 commits intoether:developfrom
JohnMcLear:fix-nbsp-3037

Conversation

@JohnMcLear
Copy link
Copy Markdown
Member

@JohnMcLear JohnMcLear commented Apr 23, 2026

Summary

Fixes #3037. Non-breaking spaces (U+00A0) typed, pasted, or imported into a pad were silently normalized to ordinary spaces at every ingestion point, so they never reached the changeset. The bug broke French typography rules and prevented authors from gluing words against line-wrap.

The earlier attempt (#4177) only touched the display path (domline.processSpaces); by the time that code runs, the nbsp has already been destroyed upstream, so it couldn't help. This change fixes the four ingestion-side strip sites and teaches both processSpaces functions to handle nbsp correctly on the display/export side.

Changes

Ingestion fixes — removed .replace(/\xa0/g, ' ') from:

  • src/node/db/Pad.ts — server-side cleanText (hit by spliceText, default pad content, HTML import)
  • src/static/js/contentcollector.tstextify for text pulled out of the DOM
  • src/static/js/ace2_inner.ts — the editor-side textify
  • src/static/js/ace2_inner.ts — removed \xa0 from the importText(dontProcess=true) guard regex

Display/export fixes — taught processSpaces to tokenize U+00A0 separately, emit it verbatim as  , and treat it as content (not whitespace) for the run-collapse bookkeeping so adjacent regular-space runs aren't miscounted:

  • src/static/js/domline.ts (live editor rendering)
  • src/node/utils/ExportHtml.ts (HTML export)

Why the changeset layer is safe

The changeset wire format parses only the op list (before $) with a regex; the text bank after $ is read by numeric offset/length via StringIterator. U+00A0 is a single BMP code unit, so makeSplice, opsFromText, and pack/unpack all handle it transparently. No storage/migration needed.

All supported DB backends accept U+00A0: MySQL defaults to utf8mb4, Postgres uses text, MSSQL uses NTEXT, others serialize as UTF-8 JSON. Existing pads already store Cyrillic, CJK, and emoji — nbsp is strictly simpler.

Tests

  • src/tests/backend/specs/Pad.ts — new cleanText cases + spliceText/setText round-trip tests for U+00A0.
  • src/tests/backend/specs/contentcollector.ts — updated 7 tests whose names said "preserved" but whose assertions encoded the old nbsp→space collapse; they now assert real preservation.
  • src/tests/backend/specs/api/importexport.ts — updated 7 end-to-end tests (both wantText and wantHTML) to reflect genuine nbsp round-tripping through rehype + contentcollector + ExportHtml.

Manual verification

Confirmed in Firefox: clipboard containing U+00A0 → paste into pad → getText returns c2 a0, getHTML returns 100 km.

Test plan

  • mocha tests/backend/specs/Pad.ts tests/backend/specs/contentcollector.ts — 128 passing
  • mocha tests/backend/specs/api/importexport.ts — 108 passing
  • Full backend suite — 792 passing, 0 failing
  • tsc --noEmit — clean in our code (pre-existing errors in plugin_packages/zod,ip-address only)
  • Manual nbsp paste round-trip in Firefox
  • Manual nbsp paste round-trip in Chrome (may be affected by chromium #346096 paste normalizer — follow-up if so)
  • Visual wrap-behavior check at narrow viewport

Out of scope (follow-ups)

  • A dedicated keybinding (e.g. Ctrl+Shift+Space) for reliable nbsp entry, since Shift+Space isn't consistent across browsers/OSes.
  • Paste-event handler to preserve nbsp via text/plain clipboard data if the Chromium paste bug is still present.

🤖 Generated with Claude Code

Non-breaking spaces were silently normalized to regular spaces at every
ingestion point, so typed/pasted/imported nbsps never reached the
changeset and users could not glue words against line-wrap in French or
other languages that require nbsp typography.

Removed the four strip sites that replaced U+00A0 with U+0020:
  - src/node/db/Pad.ts cleanText
  - src/static/js/contentcollector.ts textify
  - src/static/js/ace2_inner.ts textify
  - src/static/js/ace2_inner.ts importText raw-text guard

Updated both processSpaces functions (domline and ExportHtml) to tokenize
U+00A0 as a separate unit, emit it verbatim as  , and treat it as
content (not whitespace) for the run-collapse bookkeeping so adjacent
regular-space runs aren't miscounted.

Added backend round-trip tests for spliceText and setText, and extended
the cleanText case table. Updated the existing contentcollector and
importexport specs whose expectations encoded the previous buggy
behavior; they now assert genuine nbsp preservation.

Verified manually in Firefox: clipboard U+00A0 → paste → pad → getText
returns c2 a0; getHTML emits `100 km`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@JohnMcLear
Copy link
Copy Markdown
Member Author

/review

@qodo-free-for-open-source-projects
Copy link
Copy Markdown

Review Summary by Qodo

Preserve U+00A0 non-breaking space through ingestion and display pipeline

🐞 Bug fix

Grey Divider

Walkthroughs

Description
• Remove U+00A0 (non-breaking space) normalization at four ingestion points
  - Pad.cleanText, contentcollector.textify, ace2_inner.textify, ace2_inner.importText
• Update processSpaces functions to tokenize and preserve U+00A0 as &nbsp;
  - Both domline.ts and ExportHtml.ts now treat nbsp as content, not whitespace
• Add backend round-trip tests for spliceText and setText with U+00A0
• Update 14 existing test cases to assert genuine nbsp preservation instead of collapse
Diagram
flowchart LR
  Input["User input/paste/import<br/>with U+00A0"]
  Ingestion["Ingestion layer<br/>cleanText, textify, importText"]
  Changeset["Changeset layer<br/>transparent handling"]
  Display["Display/Export<br/>processSpaces functions"]
  Output["Output<br/>getText/getHTML"]
  
  Input -->|Previously stripped| Ingestion
  Input -->|Now preserved| Ingestion
  Ingestion -->|U+00A0 safe| Changeset
  Changeset -->|Tokenize & emit| Display
  Display -->|&nbsp; in HTML| Output
Loading

Grey Divider

File Changes

1. src/node/db/Pad.ts 🐞 Bug fix +1/-2

Remove U+00A0 normalization from cleanText

src/node/db/Pad.ts


2. src/node/types/PadType.ts ✨ Enhancement +1/-0

Add spliceText method signature to PadType

src/node/types/PadType.ts


3. src/node/utils/ExportHtml.ts 🐞 Bug fix +9/-3

Preserve U+00A0 in HTML export processSpaces

src/node/utils/ExportHtml.ts


View more (6)
4. src/static/js/ace2_inner.ts 🐞 Bug fix +3/-3

Remove U+00A0 stripping from textify and importText

src/static/js/ace2_inner.ts


5. src/static/js/contentcollector.ts 🐞 Bug fix +0/-1

Remove U+00A0 normalization from textify function

src/static/js/contentcollector.ts


6. src/static/js/domline.ts 🐞 Bug fix +8/-2

Preserve U+00A0 in editor rendering processSpaces

src/static/js/domline.ts


7. src/tests/backend/specs/Pad.ts 🧪 Tests +17/-0

Add U+00A0 round-trip tests for spliceText and setText

src/tests/backend/specs/Pad.ts


8. src/tests/backend/specs/api/importexport.ts 🧪 Tests +12/-12

Update 7 import/export tests to assert nbsp preservation

src/tests/backend/specs/api/importexport.ts


9. src/tests/backend/specs/contentcollector.ts 🧪 Tests +7/-7

Update 7 contentcollector tests to assert nbsp preservation

src/tests/backend/specs/contentcollector.ts


Grey Divider

Qodo Logo

@qodo-free-for-open-source-projects
Copy link
Copy Markdown

qodo-free-for-open-source-projects Bot commented Apr 23, 2026

Code Review by Qodo

🐞 Bugs (0) 📘 Rule violations (0) 📎 Requirement gaps (0)

Grey Divider


Action required

1. NBSP lost at boundaries🐞 Bug ≡ Correctness
Description
contentcollector.ts only preserves a NBSP run if it is between non-whitespace characters within the
same DOM text node, so a user-intended NBSP at the start/end of a text node is converted back to a
normal space. This breaks NBSP round-tripping when words are split across spans (e.g.,
formatting/attribute boundaries), despite the intent of #3037.
Code

src/static/js/contentcollector.ts[R90-97]

+          .replace(/[ \u00a0]+/g, (run, offset, src) => {
+            const before = offset > 0 ? src[offset - 1] : '';
+            const after = offset + run.length < src.length
+                ? src[offset + run.length] : '';
+            const pureNbsp = !run.includes(' ');
+            const interiorOfWord = /\S/.test(before) && /\S/.test(after);
+            return pureNbsp && interiorOfWord ? run : ' '.repeat(run.length);
+          })
Evidence
The new preservation logic decides whether a [space/NBSP]+ run is “interior” by looking only at
src[offset-1] and src[offset+run.length], which are within the current string passed to
textify(). But collectContent() calls textify() per individual DOM TEXT_NODE and appends the
result to the current line, so before/after do not see characters in adjacent text nodes.
Meanwhile, editor rendering builds content out of multiple <span> runs and applies processSpaces at
the whole-line HTML level, so NBSPs can naturally occur at the beginning/end of a span’s text node;
those NBSPs will be misclassified as non-interior and normalized to a regular space on readback.

src/static/js/contentcollector.ts[79-99]
src/static/js/contentcollector.ts[350-402]
src/static/js/domline.ts[71-80]
src/static/js/domline.ts[188-219]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The NBSP preservation heuristic in `contentcollector` relies on per-string neighbor characters (`before`/`after`) but `textify()` is applied per DOM text node. If a user-intended NBSP sits at a text-node boundary (e.g., `"100"` in one span and `"\u00a0km"` in the next), the heuristic sees `before === ''` or `after === ''` and incorrectly converts the NBSP to a normal space.
### Issue Context
- `collectContent()` processes each `TEXT_NODE` independently and calls `lines.appendText(textify(txt2), ...)`.
- The editor DOM is built from multiple `<span>` runs and `processSpaces()` runs on the full line HTML, so it is normal for the first character of a span’s text node to be a NBSP.
### Fix Focus Areas
- src/static/js/contentcollector.ts[79-99]
- src/static/js/contentcollector.ts[350-402]
### Suggested fix approach
Move the `[ \u00a0]+` run canonicalization out of `textify()` (per-text-node) and into a post-processing step that runs on the fully assembled line string (or provide `textify` with cross-node context such as the last appended character and the next character to be appended). Ensure the transformation remains length-preserving so selection offsets and attribution lengths stay consistent.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



Remediation recommended

2. PadType adds spliceText📘 Rule violation ⚙ Maintainability
Description
The PR adds spliceText to the exported PadType, expanding the Pad API surface for TS/plugin
consumers, but this PR does not include a corresponding documentation update under doc/. This can
leave integrators unaware of the new method and its expected behavior.
Code

src/node/types/PadType.ts[18]

+  spliceText: (start: number, ndel: number, ins: string, authorId?: string)=>Promise<void>,
Evidence
PR Compliance ID 10 requires documentation updates in the same PR for public API changes. The diff
adds the new exported spliceText method to PadType, but there is no accompanying doc update in
doc/ within this PR.

src/node/types/PadType.ts[18-18]
Best Practice: Repository guidelines

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
A new exported Pad API method (`spliceText`) was added to `PadType` without updating the project documentation in `doc/`.
## Issue Context
`PadType` is part of the typed interface for Pad objects, which are exposed to server-side hook/plugin code. Per compliance, public API surface changes should be documented in the same PR.
## Fix Focus Areas
- src/node/types/PadType.ts[15-20]
- doc/api/hooks_server-side.md[267-325]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Grey Divider

Qodo Logo

@qodo-free-for-open-source-projects
Copy link
Copy Markdown

qodo-free-for-open-source-projects Bot commented Apr 23, 2026

Code Review by Qodo

🐞 Bugs (0) 📘 Rule violations (0) 📎 Requirement gaps (0)

Grey Divider

Great, no issues found!

Qodo reviewed your code and found no material issues that require review

Grey Divider

Qodo Logo

…d-back

processSpaces is a lossy one-way display transform: leading/trailing
spaces and all-but-the-last of a run get rendered as &nbsp; so HTML
doesn't collapse them. When incorporateUserChanges reads text back from
the DOM, those display-artifact nbsps were being stored in the changeset
model instead of being normalized back to plain spaces.

This broke handleReturnIndentation, whose /^ *(?:)/ regex only matches
ASCII spaces: auto-indent after `foo:\n` produced 4 spaces instead of
the expected prev-indent (2) + THE_TAB (4) = 6, because the previous
line's model had nbsps where it used to have spaces.

Fix: in contentcollector.textify, collapse any [  ]+ run back to
plain spaces UNLESS the run is pure U+00A0 AND strictly interior to
word chars. That preserves user-intended typographic nbsps like
"100 km" while undoing the one-way display transform.

Updated 7 contentcollector tests and 7 importexport tests whose
assertions needed to reflect the new rule (boundary/mixed runs collapse;
pure-interior nbsp runs preserve).

Fixes the Playwright regression in indentation.spec.ts:117 that the
previous commit introduced.
@JohnMcLear
Copy link
Copy Markdown
Member Author

/review

@qodo-free-for-open-source-projects
Copy link
Copy Markdown

qodo-free-for-open-source-projects Bot commented Apr 23, 2026

Persistent review updated to latest commit 52b4411

Comment thread src/static/js/contentcollector.ts Outdated
…er text node

Addresses Qodo code review feedback on PR ether#7585.

## Bug fix — nbsp lost at DOM text-node boundary

The previous approach ran the "collapse display-artifact nbsp" rule inside
textify(), which is called per individual DOM TEXT_NODE. A user-intended
nbsp sitting at a text-node boundary (e.g., <span>100</span><span>&nbsp;km
</span>) was incorrectly seen as non-interior (before === '' for the second
text node) and normalized back to a regular space.

Fix: move the canonicalization out of textify() and run it on each
fully assembled line string inside cc.finish(). The rule remains:

    [  ]+ run  ->  plain spaces
                   UNLESS pure U+00A0 AND strictly interior to non-ws chars

It is length-preserving, so attribute offsets and line lengths are
unaffected.

Added a regression test (contentcollector.spec.ts) for the cross-span
case.

## Docs concern

Reverted the type-only addition of spliceText to PadType. spliceText
is an existing Pad runtime method; the backend test now uses a cast
(`(pad as any).spliceText`) so the PR does not expand the declared
public type surface, avoiding a separate documentation requirement.
@JohnMcLear
Copy link
Copy Markdown
Member Author

/review

@qodo-free-for-open-source-projects
Copy link
Copy Markdown

qodo-free-for-open-source-projects Bot commented Apr 23, 2026

Persistent review updated to latest commit ae356a5

@JohnMcLear JohnMcLear merged commit 67e542d into ether:develop Apr 23, 2026
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Text input not allowing non-breaking space

1 participant