fix(walker): preserve <u>/<sub>/<sup>/<mark> on storage ↔ markdown round-trip by pchuri · Pull Request #167 · pchuri/confluence-cli

pchuri · 2026-05-03T09:18:55Z

Summary

Picks option 2 from #155 (selective tag whitelist) for <u>, <sub>, <sup>, <mark>. These four inline tags now survive storage ↔ markdown ↔ storage end-to-end. Closes #155.

What changed

lib/storage-walker.js: adds the four tags to the existing <details>/<summary> raw-HTML pass-through case, so storage→markdown emits e.g. H<sub>2</sub>O instead of dropping the wrapper and leaving H2O.
lib/macro-converter.js: wraps markdownToStorage / markdownToNativeStorage with a stash/restore that hides whitelisted tags from MarkdownIt (which is constructed with html: false) and re-injects them after rendering. Code fences and inline code are stashed first, so a literal whitelisted tag (e.g. <u>) inside a backtick block stays escaped as text rather than being passed through.
tests/macro-converter.test.js: 11 new tests covering walker output for each tag, full markdown→storage→markdown round-trip, attribute preservation, the <script> escape (proves the whitelist boundary holds), and the inline-code / fenced-code carve-outs.
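The stash/restore idea can be pictured with a minimal sketch. Everything named here (`WHITELIST`, `TAG_RE`, `renderEscaped`, `renderWithPassthrough`, the placeholder character) is hypothetical and not taken from the project; a plain HTML escaper stands in for MarkdownIt with html: false, and the code-region stash that the real change performs first is left out.

```javascript
// Hypothetical sketch of stash -> render -> restore; not the project's code.
const WHITELIST = ['u', 'sub', 'sup', 'mark'];

// Simplified matcher for opening/closing whitelisted tags (illustrative only).
const TAG_RE = new RegExp(`</?(?:${WHITELIST.join('|')})(?=[\\s/>])[^>]*>`, 'gi');

// Stand-in for a renderer that escapes raw HTML, as html: false would.
function renderEscaped(text) {
  const escaped = text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
  return `<p>${escaped}</p>\n`;
}

function renderWithPassthrough(markdown) {
  const stash = [];
  // 1. Hide each whitelisted tag behind an opaque placeholder.
  const hidden = markdown.replace(TAG_RE, (tag) => {
    stash.push(tag);
    return `\uE000STASH${stash.length - 1}\uE000`;
  });
  // 2. Render; anything that still looks like HTML gets escaped.
  const html = renderEscaped(hidden);
  // 3. Re-inject the original tags in place of the placeholders.
  return html.replace(/\uE000STASH(\d+)\uE000/g, (_, i) => stash[Number(i)]);
}

console.log(renderWithPassthrough('H<sub>2</sub>O'));
console.log(renderWithPassthrough('<script>alert(1)</script>'));
```

The whitelist stays the only way through: tags outside it never reach the stash, so they are escaped by the renderer like any other text.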

Why this approach over the alternatives in the issue

Option 1 (html: true on MarkdownIt): too broad. It would let arbitrary user HTML reach Confluence storage XHTML and rely on Confluence's sanitizer as the safety boundary. The four tags above are the only documented information loss; opening the whole HTML grammar to address it is disproportionate.
Option 3 (status quo): leaves real information loss for chemistry/math docs (<sub>/<sup>) and any <u>/<mark> use, including content created via the Confluence editor.
Option 2 (this PR): fixes the four tags the issue identifies, keeps the safety boundary surgical, and re-uses the STASH_DELIM pattern already in setupConfluenceMarkdownExtensions. No new dependencies.

<details>/<summary> were mentioned parenthetically in the issue but deferred: they're block-level and need different placeholder shaping to avoid <p>-wrap interactions. Out of scope here.

Verification

npx jest          # 612 passing (was 601; +11 new)
npx eslint lib/macro-converter.js lib/storage-walker.js tests/macro-converter.test.js   # clean

Reproduction from the issue, after the fix:

const c = new MacroConverter({ isCloud: true });
c.storageToMarkdown('<p>H<sub>2</sub>O</p>');                  // 'H<sub>2</sub>O'
c.markdownToStorage(c.storageToMarkdown('<p>H<sub>2</sub>O</p>'));  // '<p>H<sub>2</sub>O</p>\n'
c.markdownToStorage('<script>alert(1)</script>');              // '<p>&lt;script&gt;alert(1)&lt;/script&gt;</p>\n'
c.markdownToStorage('`<u>x</u>`');                             // '<p><code>&lt;u&gt;x&lt;/u&gt;</code></p>\n'

Test plan

New unit tests for each of the four tags (storage→markdown direction)
Round-trip tests for two of the four whitelisted tags
Whitelist boundary test: <script> is still escaped
Code carve-out tests: a whitelisted tag inside inline code and inside fenced code is preserved as literal text
markdownToNativeStorage shares the same passthrough policy
Full npx jest suite (612 passing, no regressions)

…und-trip

Markdown has no native syntax for these inline tags, so the walker dropped them on storage→markdown and MarkdownIt (html: false) escaped them back to literal <...> on markdown→storage. Both directions now round-trip:

- storage-walker: emit raw HTML for the four tags, mirroring <details>/<summary>.
- macro-converter: stash whitelisted tags around MarkdownIt's render so they survive the html: false escape, while code fences and inline code still treat them as literal text. Other HTML (e.g. <script>) remains escaped.

Closes #155
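A rough sketch of the walker side, assuming a hypothetical node shape (`{ type: 'text', data }` or `{ type: 'tag', name, attribs, children }`) and a hypothetical `walk` function; the real lib/storage-walker.js handles many more cases. It only illustrates emitting raw HTML for the four tags while other wrappers are unwrapped to their text.

```javascript
// Hypothetical mini-walker; node shape and function name are assumptions.
const PASSTHROUGH = new Set(['u', 'sub', 'sup', 'mark']);

function walk(node) {
  if (node.type === 'text') return node.data;
  const inner = (node.children || []).map(walk).join('');
  if (PASSTHROUGH.has(node.name)) {
    // Emit the bare tag; attributes are not carried over in this direction.
    return `<${node.name}>${inner}</${node.name}>`;
  }
  // Any other wrapper is dropped and only its content survives.
  return inner;
}

const water = {
  type: 'tag', name: 'p', attribs: {}, children: [
    { type: 'text', data: 'H' },
    { type: 'tag', name: 'sub', attribs: {}, children: [{ type: 'text', data: '2' }] },
    { type: 'text', data: 'O' },
  ],
};
console.log(walk(water)); // H<sub>2</sub>O
```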

Self-review follow-up for #167:

- Remove the `String(markdown)` wrap in `_renderMarkdownToHtml`. The original pre-PR code passed `markdown` straight to `markdown-it.render`, which throws on non-string input. Coercing turns `undefined` / `null` / `123` into `'undefined'` etc., a silent corruption that is worse than the prior fail-fast behavior.
- Add round-trip parity tests for the two whitelisted tags that the original commit did not exercise.
- Add an explicit test that the walker strips attributes on whitelisted tags (this mirrors the existing `<details>` / `<summary>` precedent and is asymmetric with the markdown→storage path, which preserves them).
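To make the coercion concern concrete, here is a small sketch. `strictRender` is a hypothetical stand-in for a renderer that rejects non-string input, and `coercingRender` mimics the removed `String(...)` wrap; neither is the project's code.

```javascript
// Hypothetical stand-in for a renderer that fails fast on non-string input.
function strictRender(src) {
  if (typeof src !== 'string') {
    throw new Error('Input data should be a String');
  }
  return `<p>${src}</p>\n`;
}

// Coercing first hides the caller's bug: the output contains the word "undefined".
function coercingRender(src) {
  return strictRender(String(src));
}

// Helper for the demo: reports whether calling fn(input) throws.
function failsFast(fn, input) {
  try {
    fn(input);
    return false;
  } catch (err) {
    return true;
  }
}

console.log(failsFast(strictRender, undefined));   // true
console.log(failsFast(coercingRender, undefined)); // false
console.log(JSON.stringify(coercingRender(undefined)));
```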

Self-review #2 caught a data-loss regression introduced by the original PR: a 4-space-indented whitelisted tag wrapping `x` ended up emitted as `<![CDATA[]]>`, with the entire code body silently dropped.

Root cause: `CODE_BLOCK_RE` only matched fenced (```/~~~) and inline backtick code, so the passthrough stash replaced the opening and closing tags inside indented code with placeholders. MarkdownIt then rendered them as code-body text, but the post-render restore re-injected the raw tags into the `<pre><code>` body. htmlparser2 re-parsed that as a real tag, and convertCodeBlock, which only collects direct text children, discarded everything inside the tag wrapper.

Fix: detect code regions via MarkdownIt's tokenizer (`code_block` and `fence` tokens, plus a regex for `code_inline` since MarkdownIt does not expose source positions for that one). The tokenizer correctly distinguishes indented code from list-item continuations that happen to align to four spaces, which a regex-only approach can't do.

- New `_findCodeRanges` returns merged char ranges of all code regions.
- `_renderMarkdownToHtml` slices around those ranges and only stashes HTML in non-code prose.
- Drops the now-unused `CODE_BLOCK_RE` constant; inline-code detection moved into `INLINE_CODE_RE`.
- Adds two regression tests: indented code preservation, and list-item continuation NOT being treated as code.
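The "merge ranges, then only touch the prose between them" step might look roughly like the sketch below. `mergeRanges` and `transformOutside` are hypothetical names, the ranges are assumed to be already computed as `[start, end)` character offsets, and the tokenizer-based detection itself is not shown.

```javascript
// Hypothetical helpers; not the project's _findCodeRanges / _renderMarkdownToHtml.

// Merge overlapping or touching [start, end) ranges into a sorted, disjoint list.
function mergeRanges(ranges) {
  const sorted = [...ranges].sort((a, b) => a[0] - b[0]);
  const merged = [];
  for (const [start, end] of sorted) {
    const last = merged[merged.length - 1];
    if (last && start <= last[1]) {
      last[1] = Math.max(last[1], end);
    } else {
      merged.push([start, end]);
    }
  }
  return merged;
}

// Apply `transform` to text outside the ranges; copy text inside them verbatim.
function transformOutside(text, codeRanges, transform) {
  let out = '';
  let cursor = 0;
  for (const [start, end] of mergeRanges(codeRanges)) {
    out += transform(text.slice(cursor, start)); // prose before the code region
    out += text.slice(start, end);               // code region, left untouched
    cursor = end;
  }
  return out + transform(text.slice(cursor));
}

const doc = 'a <u>b</u>\n\n    <u>x</u>\n';
const codeStart = doc.indexOf('    <u>');
// '#' stands in for a stash placeholder in this demo.
const result = transformOutside(doc, [[codeStart, doc.length]], (s) => s.replace(/<\/?u>/g, '#'));
console.log(JSON.stringify(result));
```

Because the code region is copied verbatim, nothing inside it is ever replaced by a placeholder, so there is nothing for a later restore step to re-inject into the code body.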

Two regressions reported in external review of #167:

1. A whitelisted tag whose quoted attribute value contained `>` collapsed to an empty result. The body part of `PASSTHROUGH_TAG_RE` was `[^>]*`, so the first `>` inside a quoted attribute value terminated the match prematurely. The resulting half-tag was stashed, the rest of the input went to markdown-it as plain text, and the final HTML had nothing left to render.
2. Markdown autolinks like `<u@example.com>` and `<sub:foo>` were stashed as if they were `<u>` / `<sub>` tags. The `\b` word boundary after the tag name happily matched `u` followed by `@` or `sub` followed by `:`, so linkify never got a chance to convert them and the output ended up with bogus `<u@example.com></u@example.com>` pairs in place of the expected mailto link.

Fix the regex on two axes:

- Replace `\b` with `(?=[\s/>])`: only HTML tag delimiters (whitespace, `/`, `>`) end the tag name, so autolink shapes are left for markdown-it. This also tightens the harmless-but-loose match of custom-element-shaped names like `<u-foo>`.
- Replace `[^>]*` with `(?:"[^"]*"|'[^']*'|[^>])*` so a `>` inside a quoted attribute value doesn't close the tag.

Tests cover: quoted `>` round-trip, single-quoted attribute (normalized on output), email and URI autolinks, and hyphenated custom-element names being escaped rather than passed through.
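The two regex changes can be compared side by side in a small sketch. `LOOSE_RE` and `STRICT_RE` are reconstructions from the description above, not the project's actual `PASSTHROUGH_TAG_RE` (flags, grouping and the way the whitelist is built are assumptions), and the sample inputs are made up for illustration.

```javascript
// Illustrative reconstruction; the real PASSTHROUGH_TAG_RE may differ in details.
const NAMES = 'u|sub|sup|mark';

// Before: \b ends the tag name, and [^>]* ends the tag at the first '>'.
const LOOSE_RE = new RegExp(`</?(?:${NAMES})\\b[^>]*>`, 'gi');

// After: only whitespace, '/' or '>' may follow the name, and quoted attribute
// values are consumed whole so a '>' inside them does not close the tag.
const STRICT_RE = new RegExp(
  `</?(?:${NAMES})(?=[\\s/>])(?:"[^"]*"|'[^']*'|[^>])*>`,
  'gi'
);

// Return every match of `re` in `text` (empty array when there is none).
function matches(re, text) {
  return text.match(re) || [];
}

const quoted = '<u title="a>0">x</u>'; // hypothetical input with '>' inside quotes
console.log(matches(LOOSE_RE, quoted));   // first match stops at the quoted '>'
console.log(matches(STRICT_RE, quoted));  // first match spans the whole opening tag
console.log(matches(LOOSE_RE, '<u@example.com>'));
console.log(matches(STRICT_RE, '<u@example.com>'));
```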

## [2.3.1](v2.3.0...v2.3.1) (2026-05-04)

### Bug Fixes

* **walker:** preserve <u>/<sub>/<sup>/<mark> on storage ↔ markdown round-trip ([#167](#167)) ([a6857db](a6857db)), closes [#155](#155) [#2](#2)

github-actions · 2026-05-04T00:31:50Z

🎉 This PR is included in version 2.3.1 🎉


pchuri added 4 commits May 3, 2026 18:18

pchuri merged commit a6857db into main May 4, 2026
6 checks passed

github-actions Bot added the released label May 4, 2026

pchuri mentioned this pull request May 6, 2026

Add markdown-it plugin for Confluence storage format conversion #172

Closed

15 tasks


fix(walker): preserve <u>/<sub>/<sup>/<mark> on storage ↔ markdown round-trip #167
pchuri merged 4 commits into main from fix/preserve-u-sub-sup-mark-roundtrip
