fix(super-editor): preserve generated line breaks in DOCX export #3630

Merged
caio-pizzol merged 4 commits into main from caio/sd-3278-docx-export-collapses-generated-line-breaks-in-word
Jun 5, 2026

Conversation

@caio-pizzol
Contributor

@caio-pizzol caio-pizzol commented Jun 4, 2026

Fixes SD-3278.

Generated multiline text could look correct in SuperDoc but export as raw newlines inside <w:t>, which Word and LibreOffice do not treat as manual line breaks.

This change:

  • converts generated \n / \r\n / \r text into soft line break nodes
  • exports any remaining raw newlines as <w:br/>
  • keeps structural lineBreak edits as soft breaks, not page breaks
  • preserves deleted tracked text as <w:delText> when runs are split around breaks
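
The first bullet can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's code: `InlineNode` and `textToInlineNodes` are hypothetical names, and the real conversion also has to deal with tabs and with parents that do not admit break nodes.

```typescript
// Hypothetical inline node shapes, for illustration only.
type InlineNode =
  | { type: "text"; text: string }
  | { type: "lineBreak" };

// Convert \r\n, \r, and \n into lineBreak nodes, keeping the text between them.
// \r\n is listed first in the pattern so it counts as a single separator.
function textToInlineNodes(input: string): InlineNode[] {
  const nodes: InlineNode[] = [];
  input.split(/\r\n|\r|\n/).forEach((line, index) => {
    if (index > 0) nodes.push({ type: "lineBreak" });
    // Empty segments produce no text node, so "A\n\nB" yields two adjacent breaks.
    if (line.length > 0) nodes.push({ type: "text", text: line });
  });
  return nodes;
}
```

Skipping empty segments avoids minting empty text nodes, which many editor schemas reject.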

This also updates the read-side text model for lineBreak nodes. The export fix now creates real lineBreak nodes instead of raw \n text, so search/query/rewrite paths must read those nodes back as \n. Without that, query.match('Alpha\nBeta') and idempotent rewrites over generated multiline text would regress. That is why this PR also touches the search index, the doc-api text resolver, and the rewrite diff/offset accounting.
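
To illustrate why the read side has to change too, here is a small sketch of a text model in which a break node contributes `\n` to the flattened text. `ReadNode`, `LEAF_TEXT`, `flattenText`, and `matchOffset` are hypothetical names used only for this example; the actual search index and resolvers are more involved.

```typescript
// Hypothetical node shapes for the read-side sketch.
type ReadNode = { type: "text"; text: string } | { type: "lineBreak" };

// Leaf nodes contribute a fixed string to the flattened text (assumed table).
const LEAF_TEXT: Record<string, string> = { lineBreak: "\n" };

// Flatten inline nodes into the string that search and rewrite operate on.
function flattenText(nodes: ReadNode[]): string {
  return nodes
    .map((node) => (node.type === "text" ? node.text : LEAF_TEXT[node.type] || ""))
    .join("");
}

// Return the offset of the pattern in the flattened text, or -1 if absent.
function matchOffset(nodes: ReadNode[], pattern: string): number {
  return flattenText(nodes).indexOf(pattern);
}
```

If the break contributed an empty string instead, a pattern such as `'Alpha\nBeta'` could never match content that was stored as text, break, text.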

Follow-ups are tracked separately for import-side normalization and tracked inserted breaks inside <w:ins>.

@linear-code

linear-code Bot commented Jun 4, 2026

SD-3278

@github-actions
Contributor

github-actions Bot commented Jun 4, 2026

The ecma-spec tools are unavailable to me this session (permission denied on every call), so I verified against my knowledge of ECMA-376 Part 1 rather than live schema lookups. Flagging that up front so you can re-run with the spec tools if you want the schema citations confirmed. That said, the elements/attributes touched here are well-established and the diff lines up cleanly with the spec.

Status: PASS

Here's what I checked and why it holds:

w:t → w:delText rename inside w:del (del-translator.js:104-121)
w:delText (ECMA-376 §17.3.3.7) is the correct deleted-text counterpart to w:t, and it's the element required for text content inside a <w:del>. Renaming every direct w:t (not just the first) is the right call: w:r's content model (EG_RunInnerContent) permits any number of run-content children in any order, so <w:delText>Alpha</w:delText><w:br/><w:delText>Beta</w:delText> is valid, and a leftover <w:t> inside <w:del> would indeed not be treated as deleted. Leaving w:br/w:tab/w:noBreakHyphen untouched is also correct — the <w:del> wrapper conveys the deletion; those structural atoms have no "deleted" variant. Attributes (e.g. xml:space) carry over fine since w:delText is also CT_Text.
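
A minimal sketch of the rename described above, assuming a simple JSON-like element tree. `XmlElement` and `markRunTextAsDeleted` are hypothetical and do not reflect the actual code in del-translator.js.

```typescript
// Minimal XML-like element shape, assumed for this sketch.
interface XmlElement {
  name: string;
  attributes?: Record<string, string>;
  elements?: XmlElement[];
  text?: string;
}

// Rename every direct w:t child of a run to w:delText.
// Other children (w:br, w:tab, ...) and all attributes are left untouched.
function markRunTextAsDeleted(run: XmlElement): XmlElement {
  return {
    ...run,
    elements: (run.elements || []).map((child) =>
      child.name === "w:t" ? { ...child, name: "w:delText" } : child
    ),
  };
}
```

The function returns a copy rather than mutating its input, which keeps the sketch easy to test; the real translator may well mutate in place.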

w:br as a soft line break (translate-text-node.js:62)
w:br with no w:type defaults to textWrapping (ST_BrType), i.e. a soft line break — exactly the intent. The test asserting w:type is absent (rather than emitting w:type="page", which is hardBreak) is spec-correct. w:br is valid as a direct child of w:r interleaved with w:t.

xml:space="preserve" on segment w:t (translate-text-node.js:75)
Valid — xml:space is the lone attribute on CT_Text, and gating it on edge-whitespace segments is appropriate. Word collapses leading/trailing whitespace without it, so this is correct preservation behavior.
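
The export-side split and the `xml:space` gating can be sketched together. This is an assumption-laden illustration that builds an XML string directly; `escapeXmlText` and `runChildrenXml` are hypothetical helpers, and the real exporter works on node objects rather than strings.

```typescript
// Escape the characters that are unsafe in XML text content.
function escapeXmlText(value: string): string {
  return value.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Emit one <w:t> per non-empty segment and a bare <w:br/> between segments.
// xml:space="preserve" is added only when a segment has edge whitespace.
function runChildrenXml(text: string): string {
  return text
    .split(/\r\n|\r|\n/)
    .map((segment, index) => {
      const br = index > 0 ? "<w:br/>" : "";
      if (segment.length === 0) return br;
      const preserve = /^\s|\s$/.test(segment) ? ' xml:space="preserve"' : "";
      return `${br}<w:t${preserve}>${escapeXmlText(segment)}</w:t>`;
    })
    .join("");
}
```

For example, `runChildrenXml("Alpha\nBeta")` yields `<w:t>Alpha</w:t><w:br/><w:t>Beta</w:t>`, with no `w:type` on the break.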

lineBreak (→ <w:br/>) vs hardBreak (→ <w:br w:type="page"/>) (node-materializer.ts:920)
Preferring lineBreak for a kind: 'lineBreak' item is right — using hardBreak would emit page breaks (w:type="page"), which is a different element semantic. Good catch on the original bug.
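
The distinction between the two node types can be shown in a few lines. This sketch assumes the mapping stated above; `breakToXml` and `pickLineBreakNodeType` are hypothetical names, and the fallback to `hardBreak` when no `lineBreak` type exists is an assumption about what "prefer" means here.

```typescript
type BreakNodeType = "lineBreak" | "hardBreak";

// lineBreak serializes as a bare w:br (soft break); hardBreak as a page break.
function breakToXml(nodeType: BreakNodeType): string {
  return nodeType === "hardBreak" ? '<w:br w:type="page"/>' : "<w:br/>";
}

// Prefer lineBreak for a soft-break item; fall back to hardBreak only if the
// schema has no lineBreak type (assumed fallback, for illustration).
function pickLineBreakNodeType(schemaNodeTypes: string[]): BreakNodeType | null {
  if (schemaNodeTypes.indexOf("lineBreak") !== -1) return "lineBreak";
  if (schemaNodeTypes.indexOf("hardBreak") !== -1) return "hardBreak";
  return null;
}
```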

One non-blocking note (not a spec violation): the split path replaces the original nodeAttrs with only the xml:space logic. Since CT_Text carries no other attributes, that's harmless — just calling it out so it doesn't surprise anyone expecting other attrs to survive.

If you'd like the schema citations hardened, re-run with the ecma-spec tools authorized and I'll confirm EG_RunInnerContent's child set and ST_BrType's default directly against the XSD.

@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.


@caio-pizzol caio-pizzol force-pushed the caio/sd-3278-docx-export-collapses-generated-line-breaks-in-word branch from 0a2be53 to 8a8d6f5 Compare June 4, 2026 01:13
@caio-pizzol caio-pizzol marked this pull request as ready for review June 4, 2026 01:20
@caio-pizzol caio-pizzol requested a review from a team as a code owner June 4, 2026 01:20

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8a8d6f5379

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@caio-pizzol caio-pizzol force-pushed the caio/sd-3278-docx-export-collapses-generated-line-breaks-in-word branch from 8a8d6f5 to fa210d4 Compare June 4, 2026 01:37
@caio-pizzol caio-pizzol marked this pull request as draft June 5, 2026 15:41
@caio-pizzol caio-pizzol force-pushed the caio/sd-3278-docx-export-collapses-generated-line-breaks-in-word branch from fa210d4 to 65bf8e3 Compare June 5, 2026 15:58
@caio-pizzol caio-pizzol marked this pull request as ready for review June 5, 2026 17:57
@caio-pizzol caio-pizzol force-pushed the caio/sd-3278-docx-export-collapses-generated-line-breaks-in-word branch from 65bf8e3 to 8f69e54 Compare June 5, 2026 18:00

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 65bf8e3f53

@caio-pizzol caio-pizzol force-pushed the caio/sd-3278-docx-export-collapses-generated-line-breaks-in-word branch from 8f69e54 to 7f9c047 Compare June 5, 2026 18:13
…3278)

Multi-line text in text-mode mutations stored newlines as a raw \n inside
one <w:t>, which Word collapses while SuperDoc renders a break. Convert
newlines to lineBreak nodes at creation, split any residual raw newline
into <w:t>/<w:br/> on export, and make the read model agree that a
lineBreak reads as \n so rewrite/search/query stay consistent. Serializes
as a Word-native <w:br/> (ECMA-376 17.3.3.1).

- buildTextWithTabs: normalize \n, \r\n, \r to lineBreak nodes, gated on
  parent admission (probed per edit position) for text*-only parents
- materializeLineBreak: prefer lineBreak over hardBreak (soft, not page)
- getTextNodeForExport: split residual raw newline into <w:t>/<w:br/>
- del-translator: rename every <w:t> in a split run to <w:delText>
- lineBreak.leafText = '\n' so textBetweenWithTabs / charOffsetToDocPos /
  text-offset-resolver read a break as \n; idempotent rewrite no longer
  duplicates it, a rewrite to single-line text removes it
- SearchIndex honors leafText, and a single hit spanning text+lineBreak+
  text coalesces to one contiguous range so query.match('Alpha\nBeta')
  works (block separators still split; D5 guard intact)
- list paragraph beforeinput removes the placeholder break when text is
  typed; visible text models skip tracked-deleted leaf nodes
@caio-pizzol caio-pizzol force-pushed the caio/sd-3278-docx-export-collapses-generated-line-breaks-in-word branch from 7f9c047 to a58352c Compare June 5, 2026 19:07
… (SD-3278)

Typing into a list item that holds only a placeholder break dropped the
caret before the first inserted character, so subsequent native
keystrokes prepended instead of appended ("abcdef" landed as "bcdefa").
Move the selection past the inserted text after the delete+insert.
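
The reported ordering can be reproduced with a toy caret model. This is only a simulation of the symptom, assuming the first character goes through the handled delete+insert path and the rest are native keystrokes that insert at the caret and advance it; `simulateTyping` is a hypothetical helper, not editor code.

```typescript
// Simulate typing `input` into a list item that held only a placeholder break.
function simulateTyping(input: string, moveCaretAfterFirstInsert: boolean): string {
  if (input.length === 0) return "";
  // First character: the handled path replaces the placeholder with text.
  let text = input.charAt(0);
  // Bug: the caret stays before the inserted character (position 0).
  // Fix: the caret is moved past the inserted text (position 1).
  let caret = moveCaretAfterFirstInsert ? 1 : 0;
  // Remaining characters: native keystrokes insert at the caret, then advance.
  for (let i = 1; i < input.length; i++) {
    text = text.slice(0, caret) + input.charAt(i) + text.slice(caret);
    caret += 1;
  }
  return text;
}
```

With the caret left at position 0, `simulateTyping("abcdef", false)` returns `"bcdefa"`, matching the behavior described in the commit message; with the selection moved, it returns `"abcdef"`.
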
…s (SD-3278)

Coalesce adjacent search segments only when they are both offset-contiguous
(same hit) and document-adjacent (segment.docFrom === current.to). This
merges text + lineBreak + text within one run into a single range without
bridging a skipped/tracked-deleted leaf or a run boundary, so the
downstream D5 contiguity guard still rejects genuinely separate edits.
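
The two-part coalescing condition can be sketched as below. The `SearchSegment` shape and `coalesceSegments` are hypothetical; the commit message names the document-side check as `segment.docFrom === current.to`, and the field names here are adapted for the example.

```typescript
// Hypothetical segment shape: offsets into the flattened text plus doc positions.
interface SearchSegment {
  offsetFrom: number;
  offsetTo: number;
  docFrom: number;
  docTo: number;
}

// Merge a segment into the previous one only when it continues the same hit
// in the flattened text AND starts exactly where the previous one ends in the doc.
function coalesceSegments(segments: SearchSegment[]): SearchSegment[] {
  const merged: SearchSegment[] = [];
  for (const segment of segments) {
    const current = merged[merged.length - 1];
    if (
      current &&
      segment.offsetFrom === current.offsetTo &&
      segment.docFrom === current.docTo
    ) {
      current.offsetTo = segment.offsetTo;
      current.docTo = segment.docTo;
    } else {
      // Copy so the caller's segments are never mutated.
      merged.push({ ...segment });
    }
  }
  return merged;
}
```

A gap in document positions (for example, a skipped tracked-deleted leaf) leaves the segments separate even when their text offsets are contiguous.
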
The span-rewrite path got the same parentAllowsLineBreak probe as the
rewrite/insert paths but had no newline test, though its comment claimed
coverage. Add two cases: a single '\n' in a normal parent mints one
lineBreak (no hardBreak, no raw newline text node), and the same into a
text*-only total-page-number falls back to literal text with no lineBreak.
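
The two test cases described above can be sketched against a simplified probe. `parentAllowsLineBreak` is named in the commit message, but its signature here (a list of allowed child type names) is an assumption, as are `MintedNode` and `mintInlineContent`.

```typescript
type MintedNode = { type: "text"; text: string } | { type: "lineBreak" };

// Assumed stand-in for the probe: does this parent admit lineBreak children?
function parentAllowsLineBreak(allowedChildTypes: string[]): boolean {
  return allowedChildTypes.indexOf("lineBreak") !== -1;
}

// Mint lineBreak nodes for newlines when the parent allows them;
// otherwise fall back to a single literal text node.
function mintInlineContent(text: string, allowedChildTypes: string[]): MintedNode[] {
  if (text.length === 0) return [];
  if (!parentAllowsLineBreak(allowedChildTypes)) {
    return [{ type: "text", text }];
  }
  const nodes: MintedNode[] = [];
  text.split(/\r\n|\r|\n/).forEach((line, index) => {
    if (index > 0) nodes.push({ type: "lineBreak" });
    if (line.length > 0) nodes.push({ type: "text", text: line });
  });
  return nodes;
}
```

In a normal parent, a single `'\n'` produces exactly one `lineBreak` node and no text node; in a text-only parent the same input stays as literal text.
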
@caio-pizzol caio-pizzol merged commit 7f1c984 into main Jun 5, 2026
77 of 79 checks passed
@caio-pizzol caio-pizzol deleted the caio/sd-3278-docx-export-collapses-generated-line-breaks-in-word branch June 5, 2026 23:20