
fix(super-converter): bibliography/index/TOA field-code import fidelity (SD-3066) #3565

Open
tupizz wants to merge 1 commit into tadeu/sd-3005-feature-bibliography from tadeu/sd-3066-field-code-import


@tupizz tupizz commented May 29, 2026

Note

📚 Stacked PRs: review bottom-up

| Order | PR | Scope |
| --- | --- | --- |
| 1️⃣ | #3538 | Single-paragraph BIBLIOGRAPHY/INDEX/TOA crash (SD-3005) |
| 2️⃣ | #3565 | super-editor: field-code import fidelity + DRY 👈 this PR |
| 3️⃣ | #3566 | pm-adapter: render fields inside documentPartObject |

Review order: #3538 → #3565 → #3566. Each PR's base auto-retargets to main as the one below it merges.


Summary

Real Word documents (a marked-citations TOA doc and a native Insert→Bibliography doc) surfaced several import defects in the block field-code family. This fixes them and unifies the shared shape behind one set of helpers (DRY/KISS).

Stacked on #3538. Base will retarget to main once #3538 merges. Rendering of these fields is in the follow-up PR (pm-adapter).

Linear: SD-3066 (parent of SD-3005)

Fixes

  • Multi-run instruction corruption: fragments were joined with an injected separator space, mangling instructions Word splits across runs (XE " Building Standard "). Join verbatim.
  • Table of Authorities dropped on import: no sd:tableOfAuthorities v2 importer existed, so the node was silently discarded. Registered tableOfAuthoritiesImporter alongside index/bibliography.
  • Crash on nested-SDT bibliography: a content control wrapping a block field imported as inline structuredContent; inside a block-only documentPartObject this threw "Invalid content for node type documentPartObject" and the editor failed to mount. Classify an SDT whose content is a block field as structuredContentBlock.
  • Bibliography token round-trip: bibliography neither captured nor replayed instructionTokens (unlike index/toa), so split BIBLIOGRAPHY instructions collapsed on export. Wired through preprocessor → extension attr → encode/decode.
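
As an illustration of the first fix, here is a minimal sketch of verbatim joining next to the space-injecting join it replaces. The function names and the fragment array are hypothetical and are not taken from the SuperDoc source; each fragment stands for the text of one w:instrText run, in document order.

```javascript
// Hypothetical helper, not the actual SuperDoc implementation.
// Joins the text of consecutive w:instrText runs without a separator,
// because Word already stores the literal spacing inside each run.
function joinInstructionFragments(fragments) {
  return fragments.join('');
}

// Approximation of the pre-fix behavior, kept only for comparison.
function joinWithInjectedSpace(fragments) {
  return fragments.join(' ');
}

// An XE instruction that Word split across three runs (illustrative data).
const fragments = ['XE "', ' Building Standard ', '"'];

const verbatim = joinInstructionFragments(fragments);
const spaced = joinWithInjectedSpace(fragments);

console.log(JSON.stringify(verbatim)); // "XE \" Building Standard \""
console.log(JSON.stringify(spaced));   // "XE \"  Building Standard  \""
```

The injected separator doubles the spaces inside the quoted entry text, which is the kind of corruption the verbatim join avoids.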

Refactors (DRY/KISS)

  • buildBlockFieldNode: shared by the bibliography/index/toa preprocessors.
  • wrapParagraphsAsComplexField: shared by their translator decoders (mirrors the existing inline-field helpers).
  • BLOCK_FIELD_XML_NAMES: one source of truth for the paragraph importer + SDT classifier.
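
A rough sketch of how a shared name set can drive the SDT classification described under Fixes. The set's membership, the classifySdt function, and the xml-js style node shape ({ name, elements }) are all assumptions made for illustration; the real BLOCK_FIELD_XML_NAMES and classifier may look different and likely handle more cases (for example paragraphs).

```javascript
// Assumed membership: the real BLOCK_FIELD_XML_NAMES may contain other names.
const BLOCK_FIELD_XML_NAMES = new Set([
  'sd:bibliography',
  'sd:index',
  'sd:tableOfAuthorities',
]);

// Hypothetical classifier: an SDT whose content holds a block field is
// treated as block-level, anything else falls back to the inline node type.
function classifySdt(sdtNode) {
  const content = (sdtNode.elements || []).find((el) => el.name === 'w:sdtContent');
  const children = (content && content.elements) || [];
  const wrapsBlockField = children.some((child) => BLOCK_FIELD_XML_NAMES.has(child.name));
  return wrapsBlockField ? 'structuredContentBlock' : 'structuredContent';
}

// A content control wrapping a preprocessed bibliography field.
const sdtAroundBibliography = {
  name: 'w:sdt',
  elements: [{ name: 'w:sdtContent', elements: [{ name: 'sd:bibliography', elements: [] }] }],
};

// A content control wrapping a plain run.
const sdtAroundRun = {
  name: 'w:sdt',
  elements: [{ name: 'w:sdtContent', elements: [{ name: 'w:r', elements: [] }] }],
};

console.log(classifySdt(sdtAroundBibliography));
console.log(classifySdt(sdtAroundRun));
```

Because the paragraph importer and the classifier would both read the same set, adding a new block field name in one place keeps the two in agreement.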

Test plan

  • RED→GREEN unit tests for each fix + the new shared helpers
  • pnpm --filter super-editor exec vitest run src/editors/v1/core/super-converter: 2990 pass
  • Browser-verified: the TOA doc and the bibliography doc now import without crashing

…ty (SD-3066)

Real Word documents surfaced several import defects in the block field-code
family (BIBLIOGRAPHY, INDEX, XE, TOA). This addresses them and unifies the
shared shape behind one set of helpers.

Fixes:
- Multi-run instruction aggregation joined fragments with an injected separator
  space, corrupting instructions Word splits across runs (e.g.
  `XE " Building Standard "`). Join verbatim; literal spacing is preserved.
- Table of Authorities was dropped on import: the v2 importer had no
  `sd:tableOfAuthorities` handler, so the node was silently discarded. Register
  tableOfAuthoritiesImporter alongside index/bibliography.
- A content control wrapping a block field imported as an inline
  `structuredContent` node; inside a block-only documentPartObject this threw
  "Invalid content for node type documentPartObject" and the editor failed to
  mount. Classify an SDT whose content is a block field as
  structuredContentBlock.
- Bibliography neither captured nor replayed instructionTokens (unlike
  index/toa), so split BIBLIOGRAPHY instructions did not round-trip. Add the
  attribute and wire it through preprocessor, encode and decode.

Refactors (DRY/KISS):
- Extract buildBlockFieldNode (shared by the bibliography/index/toa
  preprocessors) and wrapParagraphsAsComplexField (shared by their translator
  decoders), mirroring the existing inline-field helpers.
- Centralize BLOCK_FIELD_XML_NAMES so the paragraph importer and SDT classifier
  agree on which sd:* nodes are block content.

Adds RED→GREEN unit tests for each fix and the new shared helpers. Full
super-converter suite passes (2990).

Linear: SD-3066 (parent of SD-3005)

linear-code Bot commented May 29, 2026

SD-3066

@github-actions

The ecma-spec MCP tools require permission grants that are being declined in this session, so I couldn't run the live spec lookups. I reviewed the OOXML surface against ECMA-376 from knowledge instead. Importantly, this PR's spec-relevant footprint is small, since most of the churn is in SuperDoc's internal sd:* representation (not OOXML).

Status: PASS

Here's what I checked and why it's clean:

Real OOXML elements/attributes touched, all valid and used correctly:

  • w:fldChar with w:fldCharType set to begin / separate / end (in build-block-field-paragraphs.js and the translators). These are exactly the three ST_FldCharType values, and the begin → instruction → separate → result → end ordering produced by wrapParagraphsAsComplexField is the correct complex-field structure (spec).
  • w:fldSimple with its w:instr attribute (tableOfAuthoritiesImporter.test.js / new fldSimple INDEX test). w:instr is the required attribute on CT_SimpleField and it's present (spec).
  • w:instrText carrying field code text, including xml:space="preserve" on split instruction runs β€” correct, since the SD-3066 fix relies on preserving literal whitespace that Word stores inside each run (spec).
  • w:r, w:p, w:pPr, w:t, w:tab β€” all used in their normal roles; begin/separate are spliced after w:pPr, which respects paragraph content ordering.
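
The begin → instruction → separate → result → end ordering from the list above can be sketched as follows. This is a hypothetical stand-in for wrapParagraphsAsComplexField, not the code in this PR; the function names and the xml-js style node shape ({ name, attributes, elements }) are assumptions.

```javascript
// Build a run holding a single w:fldChar of the given type.
function fldCharRun(type) {
  return { name: 'w:r', elements: [{ name: 'w:fldChar', attributes: { 'w:fldCharType': type } }] };
}

// Build a run holding the field instruction, preserving literal whitespace.
function instrTextRun(instruction) {
  return {
    name: 'w:r',
    elements: [{
      name: 'w:instrText',
      attributes: { 'xml:space': 'preserve' },
      elements: [{ type: 'text', text: instruction }],
    }],
  };
}

// Hypothetical sketch: wrap result paragraphs in one complex field.
// begin / instruction / separate go into the first paragraph, after w:pPr if
// present; the end marker is appended to the last paragraph.
function wrapAsComplexField(paragraphs, instruction) {
  if (paragraphs.length === 0) return [];
  // Shallow-copy each paragraph so the input is not mutated.
  const result = paragraphs.map((p) => ({ ...p, elements: [...(p.elements || [])] }));
  const first = result[0];
  const insertAt = first.elements[0]?.name === 'w:pPr' ? 1 : 0;
  first.elements.splice(insertAt, 0, fldCharRun('begin'), instrTextRun(instruction), fldCharRun('separate'));
  result[result.length - 1].elements.push(fldCharRun('end'));
  return result;
}

// Summarize the element order across paragraphs for inspection.
function fieldOrder(paragraphs) {
  return paragraphs.flatMap((p) => p.elements).map((el) => {
    const child = el.elements && el.elements[0];
    if (child && child.name === 'w:fldChar') return child.attributes['w:fldCharType'];
    return child && child.name === 'w:instrText' ? 'instr' : el.name;
  });
}

const textRun = (text) => ({ name: 'w:r', elements: [{ name: 'w:t', elements: [{ type: 'text', text }] }] });
const paragraphs = [
  { name: 'w:p', elements: [{ name: 'w:pPr', elements: [] }, textRun('First entry')] },
  { name: 'w:p', elements: [textRun('Second entry')] },
];

const wrapped = wrapAsComplexField(paragraphs, 'BIBLIOGRAPHY');
console.log(fieldOrder(wrapped).join(' '));
// w:pPr begin instr separate w:r w:r end
```

With a single paragraph the first and last paragraph are the same node, so all four field parts land in it in the same order.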

Field codes / switches referenced in tests are all real Word fields and valid switches: BIBLIOGRAPHY \l, INDEX \c, TOA \h \c \p, XE, HYPERLINK (spec).

The sd:* elements (sd:bibliography, sd:index, sd:tableOfAuthorities, sd:tableOfContents, sd:indexEntry) and the instruction / instructionTokens attributes are SuperDoc's intermediate representation, not OOXML, so there's nothing for the spec to constrain there. They're consumed by the preprocessors/translators and never emitted to the .docx.

No non-existent OOXML elements/attributes, no missing required attributes (notably w:fldSimple/@w:instr and w:fldChar/@w:fldCharType are always set), and no incorrect defaults.

One caveat for transparency: I'd recommend re-running this with the ecma-spec tools enabled if you want the schema-graph confirmation on record. My pass is based on ECMA-376 knowledge, not a live lookup this session. But the changes are a refactor + round-trip-fidelity fix, and the OOXML they generate is structurally sound.

@tupizz tupizz marked this pull request as ready for review May 29, 2026 14:53
@tupizz tupizz requested a review from a team as a code owner May 29, 2026 14:53
@tupizz tupizz self-assigned this May 29, 2026

@cubic-dev-ai cubic-dev-ai Bot left a comment


No issues found across 24 files
