fix(walker): surface parser warnings for malformed storage XML (#144)#162
Merged
fix(walker): surface parser warnings for malformed storage XML (#144)#162
Conversation
…rage XML Closes #144 htmlparser2 in xmlMode silently auto-closes unbalanced tags, so a malformed page like `<p>not closed <strong>bold` round-trips without any signal that content was repaired. Switch walk() from parseDocument to Parser+DomHandler so we can intercept onclosetag(name, isImplied) and accumulate every implicit close on walker.warnings. Self-closing XML tags (`<br/>`, `<ri:attachment/>`) also arrive with isImplied=true, so we distinguish them by tracking each open event's index range: a self-closing tag's open and close events share the same (startIndex, endIndex), while a genuinely auto-closed tag's close lands at a later position. MacroConverter.storageToMarkdown gains an optional onWarnings(warnings) callback so callers can surface diagnostics without changing the existing string return shape. CONFLUENCE_CLI_VERBOSE=1 also writes a one-line warning to stderr per implicit close. Tests cover well-formed input (no warnings), self-closing tags (no false positives), unclosed inline tags, crossed nesting, unbalanced macro bodies, the verbose env-var path, and the MacroConverter callback integration.
- Drop redundant withStartIndices/withEndIndices: false on DomHandler — both already default to undefined. - Forward onopentag/onclosetag with (...args) so future htmlparser2 argument additions reach the underlying handler unchanged. - Add regression coverage for two parser edge cases that the new openStack logic must tolerate: empty input and an orphan </p> close tag with no preceding open. Both produce zero warnings and must not pop undefined off the stack.
|
🎉 This PR is included in version 2.1.12 🎉 The release is available on: Your semantic-release bot 📦🚀 |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #144
Summary
htmlparser2inxmlModesilently auto-closes unbalanced tags. The walker had no hook to expose this, so a page like<p>not closed <strong>boldround-tripped without any signal that the parser had repaired it.walk()fromparseDocumenttoParser+DomHandlerso we can interceptonclosetag(name, isImplied)and accumulate each implicit close onwalker.warnings.<br/>,<ri:attachment/>) also arrive withisImplied=true. We distinguish them by tracking each open event's(startIndex, endIndex)— a self-closing tag's open and close share the same range, while a genuinely auto-closed tag's close lands at a later position.MacroConverter.storageToMarkdowngains an optionalonWarnings(warnings)callback. Existing string return shape is unchanged; the callback only fires when warnings are non-empty.CONFLUENCE_CLI_VERBOSE=1also writes a one-line warning to stderr per implicit close.What's not in this PR
bin/confluence.js/exportPageflow control, and it deserves its own review. This PR just makes the behavior observable; surfacing it in the CLI is a follow-up.Test plan
npx jest— 597 tests pass (586 existing + 11 new intests/storage-walker-warnings.test.js)npx eslint lib/storage-walker.js lib/macro-converter.js tests/storage-walker-warnings.test.jscleanstorage-walker-paritycorpus unchanged (no output drift)