v0.13.2 — HTML5 normalizer (seven rules) and substitution architecture
Added
- HTML5 normalizer for `htmlToOdt()`: `htmlToOdt()`, `markdownToOdt()`, and the underlying `parseHtml()` now run input through a Tier 1 normalizer before parsing. The normalizer applies seven spec-grounded text transformations: empties `<script>` and `<style>` content; lowercases the doctype declaration; quotes unquoted boolean attributes (e.g. `<input checked>` → `<input checked="">`); quotes unquoted attribute values (e.g. `<a href=foo>` → `<a href="foo">`); self-closes 14 HTML5 void elements; decodes ~2,120 HTML5 named entities to Unicode; and escapes lone `&` in attribute values (e.g. `href="?a=1&b=2"` → `href="?a=1&amp;b=2"`). Valid HTML5 input that previously produced empty output without any error now converts correctly. Default behaviour for editor-generated polyglot HTML is unchanged: the normalizer is idempotent on already-polyglot input.
- Substitution architecture: `htmlToOdt()` and `markdownToOdt()` accept new `normalizer` and `parser` options. Pass `false` to skip normalization (when input is known polyglot/XHTML), or pass a custom function to substitute either stage. `tiptapToOdt()` does not expose these hooks because TipTap input is a JSON tree, not an HTML string. See `ADAPTERS.md` at the repo root for the architecture, naming conventions, and a worked parse5 adapter example.
- `odfKitNormalizer`: the default normalizer, exported from the root and from the new `odf-kit/html-normalizer` sub-export. The seven individual rules are also exported: `selfCloseVoidElements`, `decodeNamedEntities`, `emptyRawTextElements`, `lowercaseDoctype`, `quoteUnquotedBooleanAttributes`, `quoteUnquotedAttributeValues`, `escapeAttributeValueAmpersands`.
- `odfKitParser`: the default parser (a `Parser`-conforming wrapper around the existing `parseXml`), exported from the root.
- Public types: `ParsedHtmlTree`, `Parser`, `Normalizer` exported from the root for adapter authors. `NormalizerOption` and `ParserOption` are available via the `odf-kit/types` path for symmetric architectural use.
- `OdtBaseOptions`: base interface with shared fields (page format, orientation, margins, metadata, image resolution). `HtmlToOdtOptions` extends it and adds `normalizer` and `parser`. `TiptapToOdtOptions` extends `OdtBaseOptions` directly. No user-facing API change for code using the existing options.
- `odf-kit/html-normalizer` sub-export added.
- `ADAPTERS.md` at the repo root: documents the substitution architecture, covering philosophy, six naming conventions, the two-direction adapter principle, skip semantics, contract specifications, versioning promise, sibling-package design, and a complete worked parse5 adapter example.
- 183 new tests (1307 total, 28 test suites).
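
To make the skip/substitute semantics and the idempotence claim concrete, here is a minimal, self-contained sketch. It does not import odf-kit and is not the library's implementation: the `Normalizer` shape, `applyNormalizer`, and the two toy rules (`selfCloseVoidSubset`, `escapeLoneAmpersands`) are assumptions made for illustration, and the void-element list is deliberately a small subset.

```typescript
// Illustrative only: names and shapes here are assumptions, not odf-kit's API.
type Normalizer = (html: string) => string;

// Toy rule 1: self-close a small subset of void elements (not the full list).
const VOID_SUBSET = ["br", "hr", "img", "input"];
const voidTagRe = new RegExp(`<(${VOID_SUBSET.join("|")})\\b([^>]*?)\\s*/?>`, "gi");

function selfCloseVoidSubset(html: string): string {
  return html.replace(
    voidTagRe,
    (_m: string, tag: string, attrs: string) => `<${tag}${attrs} />`,
  );
}

// Toy rule 2: escape "&" inside double-quoted attribute values unless it
// already starts a named or numeric character reference.
const loneAmpRe = /&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)/g;

function escapeLoneAmpersands(html: string): string {
  return html.replace(
    /="([^"]*)"/g,
    (_m: string, value: string) => `="${value.replace(loneAmpRe, "&amp;")}"`,
  );
}

// Hypothetical default: run the toy rules in sequence.
const defaultNormalizer: Normalizer = (html) =>
  escapeLoneAmpersands(selfCloseVoidSubset(html));

// Resolve the option: false skips, a function substitutes, undefined uses the default.
function applyNormalizer(html: string, option?: Normalizer | false): string {
  if (option === false) return html;
  return (option ?? defaultNormalizer)(html);
}

const input = '<a href="?a=1&b=2">link</a><br>';
const once = applyNormalizer(input);          // default rules applied
const twice = applyNormalizer(once);          // second pass changes nothing
const skipped = applyNormalizer(input, false); // input passed through untouched
const custom = applyNormalizer(input, (html) => html.trim());

console.log(once);
console.log(once === twice, skipped === input, custom === input);
```

The regex-based rules are only robust enough for a demonstration (for example, they ignore single-quoted attributes), but they show why a well-behaved normalizer can run on already-normalized input without changing it.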
Changed
- `parseXml` now fails loudly on malformed input, with five tightenings: detects unclosed elements at end-of-input; rejects malformed attribute syntax that the normalizer didn't cover; rejects unescaped `&` in attribute values not followed by a valid XML entity or numeric reference; rejects `]]>` in text content outside CDATA sections; rejects mismatched closing tags. Previous behaviour was silent wrong output. Code that worked correctly in v0.13.1 continues to work in v0.13.2; the new errors surface latent bugs in inputs that were producing incorrect output. Treat them as bugs in your input that v0.13.2 makes visible.
- `package.json`: the `lexical` entry in `typesVersions` had a missing leading `./`; corrected.
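
As a rough illustration of two of these tightenings (unclosed elements at end-of-input and mismatched closing tags), the following stack-based check shows the kind of input that now raises an error. It is a hypothetical sketch, not `parseXml` itself: the function name `checkBalanced` and the error messages are invented here, and the real parser's messages and error types may differ.

```typescript
// Toy tag-balance check; NOT odf-kit's parseXml. Names and messages are made up.
function checkBalanced(xml: string): void {
  const stack: string[] = [];
  // Captures: optional leading "/", tag name, optional trailing "/" (self-closing).
  const tagRe = /<(\/?)([A-Za-z][\w:.-]*)[^>]*?(\/?)>/g;
  let m: RegExpExecArray | null;

  while ((m = tagRe.exec(xml)) !== null) {
    const closing = m[1];
    const name = m[2];
    const selfClosing = m[3];

    if (selfClosing) continue; // <br /> opens and closes in one tag

    if (closing) {
      const open = stack.pop();
      if (open !== name) {
        const expected = open ? `</${open}>` : "no closing tag";
        throw new Error(`Mismatched closing tag </${name}>; expected ${expected}`);
      }
    } else {
      stack.push(name);
    }
  }

  if (stack.length > 0) {
    throw new Error(`Unclosed element <${stack[stack.length - 1]}> at end of input`);
  }
}

// Well-formed input passes silently.
checkBalanced("<p><b>bold</b><br /></p>");

// Malformed input throws instead of yielding a silently wrong tree.
for (const bad of ["<p><b>x</p></b>", "<p>never closed"]) {
  try {
    checkBalanced(bad);
  } catch (err) {
    console.log((err as Error).message);
  }
}
```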
Migration
No code changes required for typical users. Default behaviour is preserved for editor-generated and polyglot input, and existing options work unchanged. Three situations may warrant attention:
- If your `htmlToOdt()` calls were silently producing empty or wrong output on hand-written HTML5, v0.13.2 fixes that automatically: the seven-rule normalizer handles void elements, named entities, boolean attributes, unquoted attribute values, ampersands in URLs, and other HTML5-vs-XHTML differences before parsing.
- If your input was triggering one of the malformed-input cases above, you'll now see an explicit error. This is the intended behaviour; the previous silent corruption is the bug being fixed.
- If you were calling `tiptapToOdt()` with `normalizer` or `parser` in the options object, those properties were always ignored at runtime and are now compile-time errors. Remove them.
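
The last case follows from the options hierarchy. The sketch below mirrors that hierarchy with local stand-in interfaces to show why the compiler now rejects the stray properties. The interface names come from this changelog, but the field names (`pageFormat`, `orientation`) and the hook signatures are assumptions made for illustration, not odf-kit's actual declarations.

```typescript
// Local stand-ins for illustration; field names and signatures are assumed.
interface OdtBaseOptions {
  pageFormat?: string;
  orientation?: "portrait" | "landscape";
}

interface HtmlToOdtOptions extends OdtBaseOptions {
  normalizer?: ((html: string) => string) | false;
  parser?: ((html: string) => unknown) | false;
}

// Extends the base directly, so it has no normalizer/parser properties.
interface TiptapToOdtOptions extends OdtBaseOptions {}

// Fine: HTML-based entry points accept the hooks.
const htmlOpts: HtmlToOdtOptions = { pageFormat: "A4", normalizer: false };

// Rejected by the excess property check on object literals.
// @ts-expect-error `normalizer` does not exist on TiptapToOdtOptions
const staleTiptapOpts: TiptapToOdtOptions = { pageFormat: "A4", normalizer: false };

// The fix is simply to drop the property.
const tiptapOpts: TiptapToOdtOptions = { pageFormat: "A4" };

console.log(htmlOpts.normalizer, staleTiptapOpts.pageFormat, tiptapOpts.pageFormat);
```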