fix(math): split multi-char m:r text per character to match Word (SD-2632) by caio-pizzol · Pull Request #2875 · superdoc-dev/superdoc

caio-pizzol · 2026-04-20T20:56:56Z

Word's OMML→MathML converter classifies each character in an m:r run individually — consecutive digits group into one <mn>, each operator character becomes its own <mo>, everything else becomes its own <mi>. SuperDoc was emitting the full run text as a single element, so →∞ came out as <mi>→∞</mi> and lost operator spacing and semantics.

tokenizeMathText does per-char classification with digit grouping (one optional decimal point). convertMathRun returns a DocumentFragment when a run splits, so siblings flow directly into the parent <mrow> with no extra wrapper.
m:fName is the documented exception — multi-letter function names like sin/lim stay whole. convertFunction routes m:r children through a new convertMathRunWhole helper, and collapseFunctionNameBases re-merges the base slot of any structural element Word nested inside m:fName (e.g. m:limLow → <munder>).
MathObjectConverter's return widens from Element|null to Node|null so the fragment return works without casting. All 19 other converters already return Element, which is assignable.
Drops U+221E (∞) from OPERATOR_CHARS — it's a constant per Word, not an operator.

Verified against Word's own OMML2MML.XSL across 15+ input shapes — byte-identical output for bare math, and matching whole-name behavior inside m:fName.

Rejected: narrowing the split to "only when the run contains an operator" (Option B in investigation). It would have fixed the reported →∞ case without cascading, but left SuperDoc rendering <mi>sin</mi> vs Word <mi>s</mi><mi>i</mi><mi>n</mi> for bare function names — and downstream rendering issues (operator spacing inside the run) would keep surfacing. Chose full parity.
Review: the collapseFunctionNameBases pass is the subtlest part — it walks the post-split tree and re-merges <mi> siblings in structural bases. Confirm the BASE_BEARING_ELEMENTS set covers every shape Word nests under m:fName that we've actually seen. Also worth checking that a same-variant requirement is correct (currently only merges when all siblings share the same mathvariant).
Verified: 172/172 unit tests, 79/79 behavior tests, type-check clean. Pre-existing 3 m:groupChr tests updated — their munder.querySelector('mo') descendant-search now picks up split operators inside the base expression, so they were re-scoped to :scope > mo (the group character, which is what they were actually testing).

…2632) Word's OMML2MML.XSL classifies each character in an m:r run individually — digits group into a single <mn> (with one optional decimal point between digits), operator characters each become their own <mo>, and everything else becomes its own <mi>. SuperDoc was emitting the entire run text as one element, so runs like "→∞" or "x+1" rendered as a single <mi>, losing operator spacing and semantics. tokenizeMathText implements the per-character classification. convertMathRun returns a single Element for one-atom runs and a DocumentFragment when multiple atoms are emitted, so siblings flow directly into the parent's <mrow> without an extra wrapper. m:fName is the documented exception — Word keeps multi-letter function names like "sin" or "lim" as one <mi> inside the function-name slot. convertFunction routes m:r children through convertMathRunWhole (no splitting), and a new collapseFunctionNameBases pass re-merges the base slot of any structural MathML element (munder/mover/msub/…) that Word nested inside m:fName — without this, "lim" inside m:limLow would incorrectly split to three <mi>. Also drops U+221E (∞) from OPERATOR_CHARS — it's a mathematical constant per Word's XSL, not an operator. MathObjectConverter's return type widens from Element|null to Node|null so convertMathRun can return a DocumentFragment. All other converters already return Element, which is assignable to Node — no other changes. Verified against real Word-native fixtures: `→∞` in the limit-tests fixture case 1 now renders as <mi>n</mi><mo>→</mo><mi>∞</mi> (matches Word OMML2MML.XSL byte-for-byte), and nested limits keep their function names intact. Ref ECMA-376 §22.1.2.116, Annex L.6.1.13, §22.1.2.58.

linear · 2026-04-20T20:56:59Z

SD-2632 Math: multi-char operator strings render as `<mi>` instead of splitting into `<mo>` / `<mi>`

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9ceec55257

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

codecov-commenter · 2026-04-20T21:03:23Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

…pts (SD-2632) Follow-up to /review feedback backed by fresh Word OMML2MML.XSL evidence: - `convertMathRunAsFunctionName` (renamed from `convertMathRunWhole` since "whole" no longer fits) now groups consecutive non-digit / non-operator characters into one <mi> while still splitting digits and operators. Word's XSL for `<m:fName><m:r>log_2</m:r></m:fName>` produces `<mi>log</mi><mo>_</mo><mn>2</mn>` — not `<mi>l</mi><mi>o</mi><mi>g</mi>…`. - `BASE_BEARING_ELEMENTS` gains `mmultiscripts` — Word emits it when an `m:sPre` sits inside `m:fName`; our base-collapse pass needs to know to merge the first-child <mi> run. - CONTRIBUTING.md now documents the widened `Node | null` return type. Tests added: - Direct `tokenizeMathText` edge cases: `.5` / `5.` / `1.2.3` / `2x+1` / consecutive operators / empty / standalone ∞. - m:fName mixed-content: `log_2` stays `<mi>log</mi><mo>_</mo><mn>2</mn>`. - Base collapse inside nested `m:sSub` under `m:fName`. - Base collapse inside nested `m:sPre` (mmultiscripts) under `m:fName`. - Behavior test tightened to pin the full 3-atom sequence for `n→∞`. Disputed during review and deferred with evidence: - Opus claim that standalone `m:limLow` with "lim" base regresses to italic: Word XSL itself splits "lim" per-char in that shape (with or without m:sty=p), so our output matches Word. - Codex claim that Arabic-Indic digits should be `<mn>`: Word XSL also classifies them as `<mi>`, so our behavior matches. - Non-BMP surrogate-pair support: edge case in extreme mathematical alphanumerics; Word XSL itself errored on U+1D465. Separate ticket worth.

Addresses Codex bot review on PR #2875. Astral-plane mathematical alphanumerics (e.g. U+1D465 mathematical italic x, U+1D7D9 mathematical double-struck 1) are UTF-16 surrogate pairs. Walking text by code unit split them into two half-pair <mi> atoms with invalid content. `codePointUnitLength` returns 2 when the current position starts a surrogate pair so tokenizeMathText and tokenizeFunctionNameText step across the full code point.

…s-render-as-mi-instead-of

superdoc-bot · 2026-04-21T10:17:11Z

🎉 This PR is included in vscode-ext v2.3.0-next.31

superdoc-bot · 2026-04-21T10:17:19Z

🎉 This PR is included in @superdoc-dev/react v1.2.0-next.28

The release is available on GitHub release

superdoc-bot · 2026-04-21T10:19:05Z

🎉 This PR is included in superdoc-cli v0.7.0-next.32

The release is available on GitHub release

superdoc-bot · 2026-04-21T10:19:37Z

🎉 This PR is included in superdoc v1.26.0-next.31

The release is available on GitHub release

superdoc-bot · 2026-04-21T10:20:59Z

🎉 This PR is included in superdoc-sdk v1.6.0-next.30

superdoc-bot Bot added review: thorough review: careful and removed review: thorough labels Apr 20, 2026

chatgpt-codex-connector Bot reviewed Apr 20, 2026

View reviewed changes

Comment thread packages/layout-engine/painters/dom/src/features/math/converters/math-run.ts Outdated

caio-pizzol and others added 3 commits April 20, 2026 18:18

Merge branch 'main' into caio/sd-2632-math-multi-char-operator-string…

54462b1

…s-render-as-mi-instead-of

caio-pizzol enabled auto-merge April 20, 2026 21:54

caio-pizzol added this pull request to the merge queue Apr 20, 2026

github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Apr 20, 2026