fix: preserve line breaks when converting HTML to markdown by hodlen · Pull Request #79 · pchuri/confluence-cli

hodlen · 2026-03-16T16:26:01Z

Pull Request Template

Description

Fixes content being dropped and block elements running together when converting Confluence storage format to markdown (read --format markdown).

Root causes:

The <p> regex was missing the s (dotAll) flag, so paragraph content with embedded newlines was silently dropped
Block elements (paragraphs, code blocks, lists, tables) used no surrounding newlines, causing adjacent blocks to concatenate without separation

Fix: each block element now emits \n…\n, so adjacent blocks naturally produce a blank line between them.
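
To illustrate both root causes, here is a minimal sketch. The two regexes and the `convertParagraphs` helper are hypothetical stand-ins, since the actual paragraph rule in lib/confluence-client.js is not shown in this excerpt.

```javascript
// Hypothetical stand-ins for the paragraph rule; the real regex in
// lib/confluence-client.js may differ.
const withoutDotAll = /<p>(.*?)<\/p>/g; // `.` stops at newlines
const withDotAll = /<p>(.*?)<\/p>/gs;   // `s` lets `.` match newlines too

function convertParagraphs(html, pattern) {
  // Each paragraph emits \n…\n so neighbouring blocks end up
  // separated by a blank line once concatenated.
  return html.replace(pattern, (_, text) => `\n${text}\n`);
}

const input = '<p>first line\nsecond line</p><p>next paragraph</p>';

// Without `s`, the multi-line paragraph is never matched by this rule.
const before = convertParagraphs(input, withoutDotAll);
// With `s`, both paragraphs convert and a blank line separates them.
const after = convertParagraphs(input, withDotAll).trim();

console.log(JSON.stringify(before));
console.log(JSON.stringify(after));
```

In this sketch the unmatched paragraph keeps its raw tags; what a later pass does with it depends on the rest of the pipeline.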

Type of Change

Bug fix (non-breaking change which fixes an issue)

Testing

Tests pass locally with my changes
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

Checklist

My code follows the style guidelines of this project
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
Any dependent changes have been merged and published in downstream modules

Copilot

Pull request overview

This PR fixes Markdown output formatting issues when converting Confluence storage-format HTML to Markdown, specifically preserving multi-line paragraph content and ensuring block-level elements don’t concatenate without blank lines.

Changes:

Wrap Confluence code-macro conversions in surrounding newlines so adjacent blocks naturally separate.
Update <p> conversion to use the dotAll regex flag to preserve paragraph content containing embedded newlines, and emit surrounding newlines.
Add unit tests covering block separation (code/mermaid/lists/tables) and multi-line paragraph preservation for both storageToMarkdown() and htmlToMarkdown().

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File	Description
tests/confluence-client.test.js	Adds regression tests for multi-line paragraphs and blank-line separation between block elements.
lib/confluence-client.js	Adjusts code-macro and paragraph conversions to preserve line breaks and introduce blank-line separation.

hodlen · 2026-03-18T13:40:24Z

lib/confluence-client.js

    // Convert Confluence code macros to markdown
    markdown = markdown.replace(/<ac:structured-macro ac:name="code"[^>]*>[\s\S]*?<ac:parameter ac:name="language">([^<]*)<\/ac:parameter>[\s\S]*?<ac:plain-text-body><!\[CDATA\[([\s\S]*?)\]\]><\/ac:plain-text-body>[\s\S]*?<\/ac:structured-macro>/g, (_, lang, code) => {
-      return `\`\`\`${lang}\n${code}\n\`\`\``;
+      return `\n\`\`\`${lang}\n${code}\n\`\`\`\n`;
    });

    // Convert code macros without language parameter
    markdown = markdown.replace(/<ac:structured-macro ac:name="code"[^>]*>[\s\S]*?<ac:plain-text-body><!\[CDATA\[([\s\S]*?)\]\]><\/ac:plain-text-body>[\s\S]*?<\/ac:structured-macro>/g, (_, code) => {
-      return `\`\`\`\n${code}\n\`\`\``;
+      return `\n\`\`\`\n${code}\n\`\`\`\n`;
    });


I think this is out of the change's scope.

pchuri

Thanks for the PR! The dotAll flag fix on the <p> regex is a great catch: silently dropping multi-line paragraph content was a subtle but impactful bug. The test coverage is thorough too, with both per-element and complex integration cases.

A few observations:

1. Inconsistent block separation for lists and tables

Code blocks and <p> now emit \n…\n, but <ul>, <ol>, and <table> still use only a leading \n (e.g. '\n' + listItems). This works today because the preceding <p> contributes its trailing \n, but if two block elements appear back-to-back without a <p> in between (e.g. a list immediately followed by a table), there won't be a blank line separating them. Applying the same \n…\n pattern to all block elements would make the output more robust and the code more consistent.

2. Code block content can be mutated by `htmlToMarkdown()`

(Also flagged by Copilot) storageToMarkdown() converts code macros into fenced Markdown blocks before htmlToMarkdown() runs its catch-all HTML tag stripping (/<(?!\/?(details|summary)\b)[^>]+>/g). This means any <div>, , etc. inside code examples will be silently removed. Not necessarily in scope for this PR, but worth a follow-up — e.g. replacing fenced blocks with placeholder tokens before the HTML strip pass and restoring them afterward.
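
A rough sketch of the placeholder idea suggested above, for reference in a follow-up. Only the tag-stripping regex comes from the comment; `stripTagsOutsideFences` and the token format are hypothetical and not part of confluence-cli.

```javascript
// Catch-all tag strip quoted in the review comment.
const TAG_STRIP = /<(?!\/?(details|summary)\b)[^>]+>/g;

// Hypothetical helper, not part of confluence-cli.
function stripTagsOutsideFences(markdown) {
  const saved = [];
  // 1. Swap each fenced block for a token the tag regex cannot match.
  const masked = markdown.replace(/```[\s\S]*?```/g, (block) => {
    saved.push(block);
    return `\u0000FENCE${saved.length - 1}\u0000`;
  });
  // 2. Strip leftover HTML tags from the non-code text only.
  const stripped = masked.replace(TAG_STRIP, '');
  // 3. Put the original fenced blocks back untouched.
  return stripped.replace(/\u0000FENCE(\d+)\u0000/g, (_, i) => saved[Number(i)]);
}

const sample = 'Intro <b>text</b>\n```html\n<div>kept</div>\n```\n';
console.log(stripTagsOutsideFences(sample));
```

The sketch assumes code content never contains a literal triple-backtick sequence; a real implementation would need to handle that case.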

3. Minor: leading `\n` on first `<p>`

Adding a leading \n to every <p> means the very first element produces an extra newline at the start of the output. The final markdown.trim() handles this, so there's no user-visible issue; it is just something to be aware of in the intermediate state.

Overall this is a solid fix. Once the list/table block separation consistency (item 1) is addressed (or confirmed acceptable), this looks good to merge.

hodlen · 2026-03-18T15:04:56Z

Thanks for the thorough review!

Regarding observation 1: the implicit trailing \n from each list item means '\n' + listItems already produces '\n…\n', the same shape as paragraphs and tables. Adjacent blocks (e.g. a list followed by a table) naturally combine to \n\n in between without a <p> intermediary. We can add a test to verify this behavior and avoid regression if necessary.
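
A small sketch of that shape argument. The `list` and `table` helpers are hypothetical and only mirror the string shapes discussed here; they are not the library's converters.

```javascript
// Hypothetical helpers that mirror the shapes under discussion.
const list = (items) => {
  // Each item carries its own trailing newline...
  const listItems = items.map((item) => `- ${item}\n`).join('');
  // ...so a leading '\n' alone already yields the \n…\n shape.
  return '\n' + listItems;
};

const table = (rows) =>
  '\n' + rows.map((cells) => `| ${cells.join(' | ')} |\n`).join('');

// A list immediately followed by a table, with no <p> in between.
const output = (list(['a', 'b']) + table([['x', 'y']])).trim();

console.log(JSON.stringify(output));
```

The trailing \n of the last list item and the leading \n of the table meet as \n\n, which is the blank line a regression test could assert on.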

Observations 2 and 3 are both valid, but they're beyond the scope of this fix and would warrant a more substantial restructuring of the conversion pipeline. Happy to track them as separate issues if that's useful.

pchuri

Good point on the list items — the implicit trailing newline does give us the same shape, so no change needed there. And agreed on 2 & 3 being separate concerns. LGTM!

## [1.27.4](v1.27.3...v1.27.4) (2026-03-19)

### Bug Fixes

* preserve line breaks when converting HTML to markdown ([#79](#79)) ([c39f388](c39f388))

github-actions · 2026-03-19T00:52:17Z

🎉 This PR is included in version 1.27.4 🎉

The release is available on:

Your semantic-release bot 📦🚀

fix: preserve line breaks when converting HTML to markdown

f1db140

pchuri requested a review from Copilot March 17, 2026 01:50

Copilot started reviewing on behalf of pchuri March 17, 2026 01:50

Copilot AI reviewed Mar 17, 2026

pchuri reviewed Mar 18, 2026

pchuri approved these changes Mar 19, 2026

pchuri merged commit c39f388 into pchuri:main Mar 19, 2026
10 checks passed

github-actions bot pushed a commit that referenced this pull request Mar 19, 2026

chore(release): 1.27.4 [skip ci]

9bd6896

github-actions bot added the released label Mar 19, 2026

pchuri merged 1 commit into pchuri:main from hodlen:fix/md-line-break
