Description
The CsvConverter builds Markdown table rows by joining raw cell values with | but applies no escaping. If any cell contains a literal | character, it gets treated as a column separator — silently corrupting the table structure.
Steps to Reproduce
Save this as test.csv:
name,formula
OR gate,A | B
AND gate,A & B
Then run:
from markitdown import MarkItDown
md = MarkItDown()
result = md.convert("test.csv")
print(result.text_content)
Actual Output (Broken)
| name | formula |
| --- | --- |
| OR gate | A | B |
| AND gate | A & B |
The row | OR gate | A | B | now has 3 cells instead of 2. Markdown renderers misparse it: the formula is split at the pipe, and renderers that follow the GFM table rules drop the excess cell, so only A is shown.
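The cell-count mismatch can be checked mechanically. The sketch below uses a hypothetical helper, count_cells, that is not part of markitdown. It is a simplification rather than a full GFM parser: it only treats a backslash-preceded pipe as escaped and ignores code spans.

```python
import re


def count_cells(row: str) -> int:
    """Count cells in a Markdown table row, treating \\| as an escaped pipe.

    Hypothetical helper for illustration only; not a full GFM table parser.
    """
    # Drop surrounding whitespace and the leading/trailing border pipes.
    inner = row.strip().strip("|")
    # Split only on pipes that are NOT preceded by a backslash.
    return len(re.split(r"(?<!\\)\|", inner))


broken = "| OR gate | A | B |"
escaped = r"| OR gate | A \| B |"

print(count_cells(broken))   # one more cell than the 2-column header
print(count_cells(escaped))  # matches the 2-column header
```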
Expected Output
| name | formula |
| --- | --- |
| OR gate | A \| B |
| AND gate | A & B |
Root Cause
In _csv_converter.py, rows are built with no sanitisation:
markdown_table.append("| " + " | ".join(row) + " |")
The fix is to escape | → \| inside every cell value before joining:
def _escape_cell(value: str) -> str:
    value = value.replace("|", r"\|")
    return value
markdown_table.append("| " + " | ".join(_escape_cell(c) for c in row) + " |")
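To try the proposed escaping in isolation, here is a minimal self-contained sketch. The function csv_to_markdown is a hypothetical stand-in for the converter's table-building loop, not the actual CsvConverter code; only _escape_cell and the row-joining expression come from this report.

```python
import csv
import io


def _escape_cell(value: str) -> str:
    # Escape literal pipes so they are not read as column separators.
    return value.replace("|", r"\|")


def csv_to_markdown(text: str) -> str:
    """Hypothetical stand-in for the converter's table-building loop."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ""
    header, *body = rows
    lines = [
        "| " + " | ".join(_escape_cell(c) for c in header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    for row in body:
        lines.append("| " + " | ".join(_escape_cell(c) for c in row) + " |")
    return "\n".join(lines)


sample = "name,formula\nOR gate,A | B\nAND gate,A & B\n"
print(csv_to_markdown(sample))
```

The header row is escaped as well, since a column name can contain a pipe just as a data cell can.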
Why This Matters
markitdown's primary purpose is feeding clean structured text to LLMs. A silently corrupted table is worse than an error — the LLM receives structurally wrong data with no warning.
Pipes appear naturally in real-world CSV data: command-line examples, math formulas, regex patterns, SQL queries, and so on.
Environment
- markitdown latest main branch
- Python 3.10+
- Reproducible with any CSV containing | in a data cell