
fix: escape pipe characters in CSV table cells #2066

Open
zekiyemeral wants to merge 2 commits into microsoft:main from zekiyemeral:fix/csv-pipe-escaping


@zekiyemeral

Summary

This pull request fixes a Markdown table formatting issue in CsvConverter.

When a CSV cell contains a literal pipe character (|), Markdown renderers treat that pipe as a column separator, so the generated table row is split at that point and the values after it land in the wrong columns. This change escapes pipe characters inside CSV cell values before joining them into Markdown table rows.
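For illustration, a minimal sketch of the failure, assuming rows are built by a plain `" | "` join and using a deliberately rough column counter. Both `naive_row` and `count_columns` are hypothetical helpers, not markitdown code:

```python
import re


def naive_row(cells: list[str]) -> str:
    """Join cells into a Markdown table row without any escaping (hypothetical)."""
    return "| " + " | ".join(cells) + " |"


def count_columns(row: str) -> int:
    """Roughly count cells: split on pipes not preceded by a backslash.

    A loose approximation of how GFM treats `|` as a cell delimiter.
    """
    inner = row.strip()[1:-1]  # drop the outer leading and trailing pipes
    return len(re.split(r"(?<!\\)\|", inner))


header = ["command", "description"]
row = ["grep foo | wc -l", "count matches"]
print(count_columns(naive_row(header)))  # 2
print(count_columns(naive_row(row)))     # 3: the literal pipe started a new cell
```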

Changes

  • Added a helper function to escape literal | characters in Markdown table cells.
  • Applied escaping to CSV header cells.
  • Applied escaping to CSV data cells.
  • Improved CSV-to-Markdown table output for values containing pipe characters.
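A minimal sketch of the change described above. It assumes the converter reads rows with Python's `csv` module and builds GFM-style rows by joining cells with `" | "`. The names `escape_markdown_table_cell` and `csv_to_markdown_table` are hypothetical and are not taken from the diff:

```python
import csv
import io


def escape_markdown_table_cell(value: str) -> str:
    """Escape literal pipes so Markdown renderers keep them inside the cell.

    Hypothetical helper name; the PR's actual helper may differ.
    """
    return value.replace("|", "\\|")


def csv_to_markdown_table(csv_text: str) -> str:
    """Convert CSV text to a Markdown table, escaping header and data cells."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ""
    header = [escape_markdown_table_cell(c) for c in rows[0]]
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join(["---"] * len(header)) + " |",
    ]
    for row in rows[1:]:
        # Pad short rows and truncate long ones to the header width.
        row = (row + [""] * len(header))[: len(header)]
        cells = [escape_markdown_table_cell(c) for c in row]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


print(csv_to_markdown_table('pattern,meaning\n"a|b",a or b\n'))
```

For this input the data row comes out as `| a\|b | a or b |`, so a renderer keeps `a|b` in a single cell instead of opening a third column.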

Why This Matters

CSV files may contain pipe characters in formulas, command examples, regular expressions, SQL snippets, and other real-world data. Without escaping, the generated Markdown output can become structurally incorrect.

Closes #2019

Copilot AI review requested due to automatic review settings June 3, 2026 15:05

Copilot AI left a comment


Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Makes CSV-to-Markdown table conversion safe for cell values that contain pipe characters by escaping them.

Changes:

  • Introduced a helper to escape | characters in table cells.
  • Applied escaping to header and data rows during Markdown table generation.


Comment thread packages/markitdown/src/markitdown/converters/_csv_converter.py
Comment on lines 92 to +93
return DocumentConverterResult(markdown=result)

No newline at end of file
@zekiyemeral
Author

@microsoft-github-policy-service agree

@trippinganymess

The solution looks good for pipes, but I think we should also consider rogue newlines and carriage returns, which can easily be introduced into the files and break the table structure.

Check my PR for the same issue, #2061, and let me know what you think.
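One way to cover the newline case as well, sketched under the assumption that collapsing line breaks to a single space is acceptable (an HTML `<br>` is another common choice where inline HTML is allowed). The helper name is hypothetical and this is not the code from #2061:

```python
import re


def escape_markdown_table_cell(value: str) -> str:
    """Make a CSV value safe to place inside one Markdown table cell.

    Hypothetical helper: collapses CRLF, CR, and LF line breaks to a single
    space (a raw newline would end the table row), then escapes literal pipes.
    """
    value = re.sub(r"\r\n|\r|\n", " ", value)
    return value.replace("|", "\\|")


print(escape_markdown_table_cell("line one\r\nline two | more"))
# line one line two \| more
```

Quoted CSV fields can legitimately contain line breaks, and Python's `csv.reader` returns them inside the cell value, so the normalization has to happen per cell before the row is joined.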



Development

Successfully merging this pull request may close these issues.

[BUG]: CsvConverter produces broken Markdown tables when cell values contain pipe characters (|)

3 participants