html-to-markdown.js: same fenced-code indent corruption as walker (#139) #146

@pchuri

Background

#139 fixes StorageWalker.cleanup() so it no longer strips leading whitespace and collapses inline multi-space inside fenced code blocks. The same regex pipeline still exists on the legacy path:

  • lib/html-to-markdown.js:146: `markdown.replace(/^[ \t]+(?!([\>]|[*+-] |\d+[.)] ))/gm, '')`
  • and the surrounding cleanup chain (trailing-whitespace strip, blank-line collapse, multi-space collapse)

This path is still reachable: confluence-client.js:1203 exposes htmlToMarkdown(html) as a public method on the client.

Reproduction

const { htmlToMarkdown } = require('./lib/html-to-markdown');
const html = '<pre><code class="language-python">def foo():\n    return 1</code></pre>';
console.log(htmlToMarkdown(html));

Expected: indented body preserved.
Actual: leading 4 spaces stripped (same shape of corruption as #139).
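
The stripping can also be seen in isolation by applying the regex quoted in Background directly to an already-converted Markdown string. This is a minimal sketch: `sample` is a made-up input standing in for whatever the converter emits before cleanup, and the rest of the cleanup chain is left out.

```javascript
// Regex as quoted from lib/html-to-markdown.js:146 in the Background section.
const stripIndent = /^[ \t]+(?!([\>]|[*+-] |\d+[.)] ))/gm;

// Hypothetical intermediate Markdown (assumed shape, not captured from the library).
const sample = '```python\ndef foo():\n    return 1\n```';

// The regex has no notion of fences, so the indented body line is stripped too.
const cleaned = sample.replace(stripIndent, '');
console.log(cleaned);
```

The `return 1` line comes out flush left, which is the corruption described above.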

Why now

#139 already proved out the fix shape (split on triple-backtick fences, apply cleanup only to non-fenced segments). Porting the same approach over to html-to-markdown.js is a small, contained follow-up so the legacy path doesn't silently corrupt code that round-trips through confluenceClient.htmlToMarkdown().
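
A rough sketch of that fix shape, for discussion only. The helper names `cleanupOutsideFences` and `stripLeadingIndent` are hypothetical, and this is not the code from #139; it only illustrates splitting on triple-backtick fences and running cleanup on the segments between them.

```javascript
// Hypothetical helper: run `cleanup` only on text outside triple-backtick fences.
function cleanupOutsideFences(markdown, cleanup) {
  // split() with a capturing group keeps the fenced blocks in the result:
  // odd indexes are fenced blocks, even indexes are the text between them.
  const parts = markdown.split(/(```[\s\S]*?```)/);
  return parts
    .map((part, i) => (i % 2 === 1 ? part : cleanup(part)))
    .join('');
}

// Stand-in for the legacy cleanup chain; only the indent strip is shown here.
function stripLeadingIndent(text) {
  return text.replace(/^[ \t]+(?!([\>]|[*+-] |\d+[.)] ))/gm, '');
}

const input = '  Intro text\n\n```python\ndef foo():\n    return 1\n```\n  Outro';
const output = cleanupOutsideFences(input, stripLeadingIndent);
console.log(output);
```

With this input, the prose lines lose their leading spaces while the fenced body keeps its four-space indent. The sketch treats an unterminated fence as ordinary text and ignores tilde fences; a real port would need to decide how to handle both.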
