fix(walker): surface parser warnings for malformed storage XML (#144) #162

Merged
pchuri merged 2 commits into main from fix/walker-parser-warnings-144 on May 3, 2026

@pchuri commented May 1, 2026

Closes #144

Summary

  • htmlparser2 in `xmlMode` silently auto-closes unbalanced tags. The walker had no hook to expose this, so a page like `<p>not closed <strong>bold` round-tripped without any signal that the parser had repaired it.
  • Switched `walk()` from `parseDocument` to `Parser` + `DomHandler` so we can intercept `onclosetag(name, isImplied)` and accumulate each implicit close on `walker.warnings`.
  • Self-closing XML tags (`<br/>`, `<ri:attachment/>`) also arrive with `isImplied=true`. We distinguish them by tracking each open event's `(startIndex, endIndex)`: a self-closing tag's open and close share the same range, while a genuinely auto-closed tag's close lands at a later position.
  • `MacroConverter.storageToMarkdown` gains an optional `onWarnings(warnings)` callback. The existing string return shape is unchanged; the callback only fires when warnings are non-empty.
  • `CONFLUENCE_CLI_VERBOSE=1` also writes a one-line warning to stderr per implicit close.
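The range comparison described in the summary can be sketched in isolation. This is a minimal illustration, not the code in `lib/storage-walker.js`: the `events` array is a hypothetical, simplified stand-in for what the parser callbacks and index positions would report, the function name `collectImplicitCloses` and the warning object shape are assumptions, and the index values in the sample are made up for readability rather than taken from real parser output.

```javascript
// Sketch only: `events` is a hypothetical stand-in for the open/close
// callbacks and (startIndex, endIndex) ranges reported during parsing.
function collectImplicitCloses(events) {
  const openStack = [];
  const warnings = [];
  for (const ev of events) {
    if (ev.type === 'open') {
      // Remember where each open tag was seen.
      openStack.push({ name: ev.name, startIndex: ev.startIndex, endIndex: ev.endIndex });
      continue;
    }
    // Close event with nothing open (orphan close): ignore it.
    if (openStack.length === 0) continue;
    const open = openStack.pop();
    // A self-closing tag reports the same range for its open and its close.
    const selfClosing =
      open.startIndex === ev.startIndex && open.endIndex === ev.endIndex;
    if (ev.isImplied && !selfClosing) {
      warnings.push({ tag: ev.name, position: ev.startIndex });
      // Assumed verbose check; the message format here is illustrative.
      if (process.env.CONFLUENCE_CLI_VERBOSE === '1') {
        process.stderr.write(`warning: <${ev.name}> was implicitly closed\n`);
      }
    }
  }
  return warnings;
}

// Illustrative events for `<p>a<br/><strong>b` (indices are made up).
const sampleEvents = [
  { type: 'open', name: 'p', startIndex: 0, endIndex: 2 },
  { type: 'open', name: 'br', startIndex: 4, endIndex: 8 },
  { type: 'close', name: 'br', isImplied: true, startIndex: 4, endIndex: 8 },
  { type: 'open', name: 'strong', startIndex: 9, endIndex: 16 },
  { type: 'close', name: 'strong', isImplied: true, startIndex: 18, endIndex: 18 },
  { type: 'close', name: 'p', isImplied: true, startIndex: 18, endIndex: 18 },
];

console.log(collectImplicitCloses(sampleEvents));
```

In this sample the `<br/>` close is skipped because its range matches its open, while the `<strong>` and `<p>` closes are recorded as warnings.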

What's not in this PR

  • The "page X processed with parser warnings" line in the export CLI summary — that's a separate change touching bin/confluence.js / exportPage flow control, and it deserves its own review. This PR just makes the behavior observable; surfacing it in the CLI is a follow-up.

Test plan

  • npx jest — 597 tests pass (586 existing + 11 new in tests/storage-walker-warnings.test.js)
  • npx eslint lib/storage-walker.js lib/macro-converter.js tests/storage-walker-warnings.test.js clean
  • Existing storage-walker-parity corpus unchanged (no output drift)
  • New cases cover: well-formed input, self-closing XML tags (no false positives), unclosed inline, crossed nesting, unbalanced macro body, warnings reset between walks, verbose env-var path, MacroConverter callback integration
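Two of the behaviors listed above, the callback firing only for non-empty warnings and warnings resetting between walks, can be illustrated with a toy model. Everything here is hypothetical: `FakeWalker` is not the real walker, its "missing `</p>`" rule is a placeholder for real parsing, and passing `onWarnings` through an options object is an assumption, since the PR does not show the actual signature of `MacroConverter.storageToMarkdown`.

```javascript
// Hypothetical stand-in for the walker; only warnings bookkeeping is modeled.
class FakeWalker {
  constructor() {
    this.warnings = [];
  }

  walk(input) {
    // Reset so warnings from a previous walk never leak into this one.
    this.warnings = [];
    // Toy rule for illustration only: a <p> with no </p> counts as repaired.
    if (input.includes('<p>') && !input.includes('</p>')) {
      this.warnings.push({ tag: 'p', message: 'implicitly closed <p>' });
    }
    // Strip tags to produce a plain string result.
    return input.replace(/<[^>]+>/g, '');
  }
}

// Assumed calling convention: the callback arrives via an options object.
function storageToMarkdown(walker, input, options = {}) {
  const output = walker.walk(input);
  if (walker.warnings.length > 0 && typeof options.onWarnings === 'function') {
    options.onWarnings(walker.warnings);
  }
  return output; // the return value stays a plain string either way
}

const walker = new FakeWalker();
const seen = [];
const onWarnings = (warnings) => seen.push(warnings.length);

const first = storageToMarkdown(walker, '<p>hi', { onWarnings });
const second = storageToMarkdown(walker, '<p>ok</p>', { onWarnings });
console.log(first, second, seen);
```

The second call is well-formed, so the callback is not invoked again and `walker.warnings` is empty afterwards.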

…rage XML

Closes #144

htmlparser2 in xmlMode silently auto-closes unbalanced tags, so a
malformed page like `<p>not closed <strong>bold` round-trips without
any signal that content was repaired. Switch walk() from parseDocument
to Parser+DomHandler so we can intercept onclosetag(name, isImplied)
and accumulate every implicit close on walker.warnings.

Self-closing XML tags (`<br/>`, `<ri:attachment/>`) also arrive with
isImplied=true, so we distinguish them by tracking each open event's
index range: a self-closing tag's open and close events share the
same (startIndex, endIndex), while a genuinely auto-closed tag's
close lands at a later position.

MacroConverter.storageToMarkdown gains an optional onWarnings(warnings)
callback so callers can surface diagnostics without changing the
existing string return shape. CONFLUENCE_CLI_VERBOSE=1 also writes a
one-line warning to stderr per implicit close.

Tests cover well-formed input (no warnings), self-closing tags
(no false positives), unclosed inline tags, crossed nesting,
unbalanced macro bodies, the verbose env-var path, and the
MacroConverter callback integration.
@pchuri self-assigned this May 3, 2026
- Drop redundant withStartIndices/withEndIndices: false on DomHandler
  — both already default to undefined.
- Forward onopentag/onclosetag with (...args) so future htmlparser2
  argument additions reach the underlying handler unchanged.
- Add regression coverage for two parser edge cases that the new
  openStack logic must tolerate: empty input and an orphan </p> close
  tag with no preceding open. Both produce zero warnings and must not
  pop undefined off the stack.
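
The `(...args)` forwarding mentioned in this commit can be sketched as a thin wrapper. This is an assumption-laden illustration rather than the merged code: `wrapHandler` and `onImplicitClose` are hypothetical names, `inner` is any object exposing `onopentag` and `onclosetag`, and a real wrapper would also need to forward the remaining handler callbacks (text, comments, and so on).

```javascript
// Sketch: intercept close events while passing every argument through.
function wrapHandler(inner, onImplicitClose) {
  return {
    onopentag(...args) {
      // Forward whatever the parser supplies, including future arguments.
      inner.onopentag(...args);
    },
    onclosetag(...args) {
      const [name, isImplied] = args;
      if (isImplied) onImplicitClose(name);
      inner.onclosetag(...args);
    },
  };
}

// Recording handler used only to demonstrate the pass-through.
const calls = [];
const recorder = {
  onopentag(...args) { calls.push(['open', ...args]); },
  onclosetag(...args) { calls.push(['close', ...args]); },
};

const implicit = [];
const wrapped = wrapHandler(recorder, (name) => implicit.push(name));

wrapped.onopentag('p', {});
wrapped.onclosetag('p', true, 'some-future-argument');
console.log(calls, implicit);
```

Because the wrapper never names the parameters it forwards, an extra argument such as the placeholder third value above reaches the inner handler unchanged.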
@pchuri merged commit 995cdaa into main May 3, 2026
6 checks passed
@pchuri deleted the fix/walker-parser-warnings-144 branch May 3, 2026 02:11
github-actions Bot pushed a commit that referenced this pull request May 3, 2026
## [2.1.12](v2.1.11...v2.1.12) (2026-05-03)

### Bug Fixes

* **walker:** surface parser warnings for malformed storage XML ([#144](#144)) ([#162](#162)) ([995cdaa](995cdaa))
github-actions Bot commented May 3, 2026

🎉 This PR is included in version 2.1.12 🎉

The release is available on:

Your semantic-release bot 📦🚀

Development

Successfully merging this pull request may close these issues.

Walker silently truncates malformed storage XML with no diagnostic
