[BUG]: CsvConverter produces broken Markdown tables when cell values contain pipe characters (|) #2019

@Prasad-Pitke

Description

The CsvConverter builds Markdown table rows by joining raw cell values with | but applies no escaping. If any cell contains a literal | character, it gets treated as a column separator — silently corrupting the table structure.

Steps to Reproduce

Save this as test.csv:

name,formula
OR gate,A | B
AND gate,A & B

Then run:

from markitdown import MarkItDown

md = MarkItDown()
result = md.convert("test.csv")
print(result.text_content)

Actual Output (Broken)

| name | formula |
| --- | --- |
| OR gate | A | B |
| AND gate | A & B |

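The row | OR gate | A | B | now has three cells instead of two, because the pipe inside A | B is read as a column separator. Markdown renderers misparse the row: the cell is split at the pipe, and renderers that follow the GFM table rules ignore the excess cell, so B silently disappears from the rendered table.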

Expected Output

| name | formula |
| --- | --- |
| OR gate | A \| B |
| AND gate | A & B |

Root Cause

In _csv_converter.py, rows are built with no sanitisation:

markdown_table.append("| " + " | ".join(row) + " |")
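The behavior can be reproduced without markitdown installed. The sketch below is not the library's code: it only assumes that rows are parsed with Python's csv module and joined exactly as in the line above. The name naive_row is hypothetical and exists only for this illustration.

```python
import csv
import io

# Same data as test.csv from the reproduction steps.
CSV_TEXT = "name,formula\nOR gate,A | B\nAND gate,A & B\n"


def naive_row(row):
    # Mirrors the join quoted above: cell values are not escaped.
    return "| " + " | ".join(row) + " |"


rows = list(csv.reader(io.StringIO(CSV_TEXT)))
lines = [naive_row(r) for r in rows]

for line in lines:
    print(line)

# The data pipe is indistinguishable from the separators added by the join.
print(lines[1].count("|"), "pipes in the OR gate row,",
      lines[2].count("|"), "in the AND gate row")
```

The CSV parser itself is not at fault: it correctly returns "A | B" as a single field, and the ambiguity is introduced only at the join.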

The fix is to escape each | as \| inside every cell value before joining:

def _escape_cell(value: str) -> str:
    value = value.replace("|", r"\|")
    return value

markdown_table.append("| " + " | ".join(_escape_cell(c) for c in row) + " |")
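One way to sanity-check the proposed escaping is to split the generated row back into cells while honoring \| and confirm that the original values come back. The sketch below is a standalone illustration, not markitdown's implementation; split_cells is a hypothetical checker that only approximates how a GFM renderer treats escaped pipes (it does not handle a cell that itself ends in a backslash).

```python
import re


def escape_cell(value: str) -> str:
    # Escape literal pipes so they are not read as column separators.
    return value.replace("|", r"\|")


def split_cells(line: str) -> list[str]:
    # Hypothetical checker: split only on pipes NOT preceded by a backslash,
    # drop the empty strings outside the leading and trailing pipes,
    # then undo the escaping to recover the original cell text.
    parts = re.split(r"(?<!\\)\|", line)
    return [p.strip().replace(r"\|", "|") for p in parts[1:-1]]


row = ["OR gate", "A | B"]
line = "| " + " | ".join(escape_cell(c) for c in row) + " |"
cells = split_cells(line)

print(line)
print(cells)
```

If the round trip returns the same two values that went in, the row keeps the same column count as the header, which is the property the unescaped join loses.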

Why This Matters

markitdown's primary purpose is feeding clean structured text to LLMs. A silently corrupted table is worse than an error — the LLM receives structurally wrong data with no warning.

Pipes appear naturally in real-world CSV data: command-line examples, math formulas, regex patterns, SQL queries, and so on.

Environment

  • markitdown latest main branch
  • Python 3.10+
  • Reproducible with any CSV containing | in a data cell
