Describe the bug
When converting a page to markdown with confluence read <pageId> -f markdown, internal page links with custom display text (via <ac:link-body>) are silently removed instead of being preserved as text. As a result, table cells (and other elements) that contain only such links appear empty in the output.
The storageToMarkdown method handles two <ac:link> patterns:
- External URL links with <ac:plain-text-link-body> → converted to [text](url)
- Internal page links without a link body → converted to [Page Title]
But internal page links with <ac:link-body> (the format Confluence uses when the author gives a link custom display text) aren't matched by either pattern, so they fall through to the catch-all removal on line 1428:
markdown = markdown.replace(/<ac:link>[\s\S]*?<\/ac:link>/g, '');
Additionally, that catch-all only matches <ac:link> with no attributes. Links that carry attributes like ac:anchor, ac:local-id, or ac:card-appearance (e.g. <ac:link ac:anchor="...">) are not caught by this regex either, so they survive as raw HTML in the output.
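Both failure modes can be seen in isolation with a minimal sketch that applies only the catch-all regex quoted above to two hand-written snippets. The snippets and the anchor value "intro" are made up for illustration, and the earlier conversion steps in storageToMarkdown are left out.

```javascript
// The catch-all quoted above, copied from the report.
const catchAll = /<ac:link>[\s\S]*?<\/ac:link>/g;

// Internal link with custom display text and no attributes on <ac:link>.
const withBody =
  '<td><ac:link><ri:page ri:content-title="Some Long Page Title" />' +
  '<ac:link-body>Short Name</ac:link-body></ac:link></td>';

// The same link with an attribute on the opening tag
// ("intro" is a hypothetical anchor name).
const withAnchor =
  '<ac:link ac:anchor="intro"><ri:page ri:content-title="Some Long Page Title" />' +
  '<ac:link-body>Short Name</ac:link-body></ac:link>';

const strippedBody = withBody.replace(catchAll, '');
const strippedAnchor = withAnchor.replace(catchAll, '');

console.log(strippedBody);                  // <td></td>
console.log(strippedAnchor === withAnchor); // true
```

The first link disappears together with its display text, leaving an empty cell. The second is not matched at all, because the pattern requires the literal opening tag <ac:link> with nothing between the name and the closing bracket.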
To Reproduce
Take any Confluence page with a table whose cells contain internal links with custom display text. For example, storage format like:
<ac:link>
  <ri:page ri:content-title="Some Long Page Title" ri:version-at-save="28" />
  <ac:link-body>Short Name</ac:link-body>
</ac:link>
Then run:
confluence read <pageId> -f markdown
The cell containing that link will be empty in the markdown table.
Expected behavior
The link's display text should be preserved. The above example should produce Short Name in the markdown output.
Environment (please complete the following information):
- confluence-cli version: 1.30.0
- Node.js version: v22.20.0
- OS: macOS
Suggested fix
Add a regex before the catch-all that extracts the display text from <ac:link-body>, and widen the catch-all to match <ac:link> tags with attributes:
// Convert internal page links with custom link body text
markdown = markdown.replace(
  /<ac:link[^>]*>[\s\S]*?<ac:link-body>([\s\S]*?)<\/ac:link-body>[\s\S]*?<\/ac:link>/g,
  '$1'
);
// Remove any remaining ac:link tags that weren't matched
markdown = markdown.replace(/<ac:link[^>]*>[\s\S]*?<\/ac:link>/g, '');
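The two replacements can be tried outside the CLI with a small standalone sketch. The function name convertLinkBodies is hypothetical and not part of confluence-cli. The patterns below are also a slightly stricter variant of the ones proposed above, not the same regexes: `(?:\s[^>]*)?` keeps `<ac:link-body>` from being read as an opening `<ac:link>` tag, and a negative lookahead stops a match from running past the closing tag of the link it started in. That second guard only matters if a link without a body is still present when this step runs; in that case the lazy `[\s\S]*?` would otherwise reach into the next link and drop the text in between.

```javascript
// Hypothetical helper: applies only the link-body extraction and the
// widened catch-all, not the rest of storageToMarkdown.
function convertLinkBodies(storage) {
  // Keep the display text of links that contain an <ac:link-body>.
  let out = storage.replace(
    /<ac:link(?:\s[^>]*)?>(?:(?!<\/ac:link>)[\s\S])*?<ac:link-body>([\s\S]*?)<\/ac:link-body>[\s\S]*?<\/ac:link>/g,
    '$1'
  );
  // Remove any remaining ac:link elements, with or without attributes.
  out = out.replace(/<ac:link(?:\s[^>]*)?>[\s\S]*?<\/ac:link>/g, '');
  return out;
}

// Example from the report, inside a table cell.
const cell =
  '<td><ac:link>\n' +
  '<ri:page ri:content-title="Some Long Page Title" ri:version-at-save="28" />\n' +
  '<ac:link-body>Short Name</ac:link-body>\n' +
  '</ac:link></td>';
console.log(convertLinkBodies(cell)); // <td>Short Name</td>

// A body-less link with an attribute, followed by a link with a body.
const mixed =
  '<ac:link ac:anchor="a"><ri:page ri:content-title="A" /></ac:link> and ' +
  '<ac:link><ri:page ri:content-title="B" /><ac:link-body>Short Name</ac:link-body></ac:link>';
console.log(JSON.stringify(convertLinkBodies(mixed))); // " and Short Name"
```

This sketch returns the link body as-is, so any markup inside <ac:link-body> is left for later conversion steps to handle.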