Feature Request: Add JSON Output Option #2029

@nayar-900

Summary

MarkItDown currently converts supported documents to Markdown only. An optional JSON output format that preserves document structure (headings, paragraphs, tables, images, and metadata) would be useful.

Motivation

Many developers use MarkItDown as part of automated processing pipelines and LLM workflows. Structured JSON output would allow easier integration with downstream applications without requiring additional parsing of Markdown.

Proposed Solution

Add a command-line option such as:

markitdown document.pdf --output-format json

Example output:

{
  "title": "Sample Document",
  "sections": [
    {
      "heading": "Introduction",
      "content": "..."
    }
  ]
}
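
As a rough illustration of the shape above, here is a minimal sketch that builds the same structure from Markdown text. It is not MarkItDown's implementation: the helper name `markdown_to_json` and the rule of splitting on ATX headings (`#`, `##`, ...) are assumptions made for this example, and it only covers the `title` and `sections` fields.

```python
import json
import re

# Hypothetical helper, not part of MarkItDown. It splits Markdown text on
# ATX headings ("# ...", "## ...") and returns the structure proposed above.
HEADING_RE = re.compile(r"^(#{1,6})\s+(.*?)\s*$")


def markdown_to_json(markdown_text):
    title = None
    sections = []
    current = None  # section that is currently collecting content lines

    for line in markdown_text.splitlines():
        match = HEADING_RE.match(line)
        if match:
            level, text = len(match.group(1)), match.group(2)
            if level == 1 and title is None:
                title = text  # the first top-level heading becomes the title
                continue
            current = {"heading": text, "content": []}
            sections.append(current)
        elif current is not None:
            current["content"].append(line)
        # Lines that appear before the first section heading are dropped
        # in this sketch.

    for section in sections:
        section["content"] = "\n".join(section["content"]).strip()
    return {"title": title, "sections": sections}


if __name__ == "__main__":
    sample = "# Sample Document\n\n## Introduction\n\nSome text.\n"
    print(json.dumps(markdown_to_json(sample), indent=2))
```

A sketch like this treats any line starting with `#` as a heading, including lines inside fenced code blocks, and it loses tables, images, and metadata. That is part of why emitting JSON directly from the converter would be more reliable than re-parsing the Markdown output.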

Benefits

Easier integration with AI pipelines
Structured document representation
Better support for data extraction workflows
Reduced need for custom Markdown parsing
