New Feature PR: Streaming live parsing via gRPC for agents #2110

krickert · 2026-06-11T10:07:57Z


TL;DR

Sick of waiting for PDF pages to come back as Markdown?

Well, that's fixed in PR #2109.

Want to set up mTLS? You got it....

Want to use this in any of 12 popular languages ... we got you...

Here's the PR to bring in the gRPC server to markitdown:

#2109

Give it a try! There are samples in the PR. Now you can start computing embeddings before the full document is done, which saves time and money for RAG.

gRPC server opens up "free" mTLS and HTTP/2 streaming
PPTX and PDF page-by-page parsing
Strongly typed responses (which means you can ditch your markdown parser)
12 client languages for markitdown

I've finished a gRPC interface for MarkItDown and opened a PR. The full implementation is working, tested, and you can find some examples in the PR.

The problem and details

MarkItDown today is batch-oriented. A caller submits a document, waits for the entire conversion to finish, and receives the complete result. For large PDFs and long presentations this means several seconds of dead time before downstream systems can do anything with the output. Services that index, summarize, or render progressively are blocked on the slowest page of the document.

What this adds

A gRPC service with three RPCs on top of the existing conversion engine:

Convert returns the full Markdown in one response, equivalent to the current API but with generated clients for up to 12 languages.
ConvertStream returns the Markdown as an ordered stream of chunks.
ConvertDocumentStream returns typed structural elements (headings, tables with parsed cells, lists with parsed items, code blocks, images, and so on) so consumers can process document structure without re-parsing Markdown.
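
To illustrate why typed elements remove the need to re-parse Markdown, here is a minimal sketch. The `Heading` and `Table` classes and the two helper functions are hypothetical stand-ins invented for this example; they are not the message types defined in the PR.

```python
from dataclasses import dataclass
from typing import Iterable, List, Union


@dataclass
class Heading:  # hypothetical stand-in for a typed heading element
    level: int
    text: str


@dataclass
class Table:  # hypothetical stand-in for a table with parsed cells
    rows: List[List[str]]


Element = Union[Heading, Table]


def outline(elements: Iterable[Element]) -> List[str]:
    """Build an indented outline from headings only, with no Markdown parsing."""
    return ["  " * (e.level - 1) + e.text for e in elements if isinstance(e, Heading)]


def cell_count(elements: Iterable[Element]) -> int:
    """Count table cells directly from the already-parsed rows."""
    return sum(len(row) for e in elements if isinstance(e, Table) for row in e.rows)


sample = [
    Heading(1, "Report"),
    Table([["a", "b"], ["c", "d"]]),
    Heading(2, "Details"),
]
```

A consumer of the real stream would do the same kind of type dispatch on the generated message classes instead of running a Markdown parser over the text.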

PDFs are processed page by page and PPTX files slide by slide. On a 120-page PDF, time-to-first-chunk drops from about 2.8 seconds to about 0.08 seconds, with byte-identical output. This can greatly reduce end-to-end latency in pipelines that handle large PDFs and PPTX files, because downstream steps can start on the first page instead of waiting for the last.
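
The difference in time-to-first-chunk comes from when the caller first sees output, not from faster conversion. A rough sketch with plain Python generators shows the idea; `convert_page`, `convert_batch`, and `convert_stream` are hypothetical names for this example and do not correspond to functions in the PR.

```python
from typing import Iterator, List


def convert_page(page: str, log: List[str]) -> str:
    """Hypothetical per-page converter; records each call in `log`."""
    log.append(page)
    return f"## {page}\n"


def convert_batch(pages: List[str], log: List[str]) -> str:
    # Batch: every page is converted before the caller sees anything.
    return "".join(convert_page(p, log) for p in pages)


def convert_stream(pages: List[str], log: List[str]) -> Iterator[str]:
    # Streaming: each page is yielded as soon as it is converted.
    for p in pages:
        yield convert_page(p, log)


pages = [f"page {i}" for i in range(1, 121)]

batch_log: List[str] = []
batch_output = convert_batch(pages, batch_log)
pages_before_first_batch_output = len(batch_log)

stream_log: List[str] = []
stream = convert_stream(pages, stream_log)
first_chunk = next(stream)
pages_before_first_chunk = len(stream_log)
stream_output = first_chunk + "".join(stream)
```

In the batch path all 120 pages are converted before any output exists, while the streaming path has converted only one page when the first chunk arrives, and the concatenated chunks equal the batch result.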

sequenceDiagram
    participant Client
    participant Server as gRPC Server
    participant Conv as PDF/PPTX Converter

    Client->>Server: ConvertDocumentStream (incremental=true)
    Server-->>Client: started
    loop each page or slide
        Server->>Conv: convert one page/slide
        Conv-->>Server: markdown fragment
        Server-->>Client: typed elements for that page
    end
    Server-->>Client: completed
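
The started / per-page / completed envelope in the diagram can be sketched on the client side as follows. This is an assumption-laden illustration: the dictionary event shape, `document_stream`, and `consume` are hypothetical, and the real stream carries typed protobuf messages rather than dictionaries.

```python
from typing import Dict, Iterator, List


def document_stream(fragments: List[str]) -> Iterator[Dict]:
    """Hypothetical server side: one event per page, wrapped in started/completed."""
    yield {"type": "started"}
    for number, fragment in enumerate(fragments, start=1):
        yield {"type": "page", "number": number, "markdown": fragment}
    yield {"type": "completed", "pages": len(fragments)}


def consume(events: Iterator[Dict]) -> List[str]:
    """Hypothetical client side: handle pages as they arrive, verify the envelope."""
    handled: List[str] = []
    first = next(events)
    if first["type"] != "started":
        raise ValueError("stream must begin with 'started'")
    for event in events:
        if event["type"] == "page":
            handled.append(event["markdown"])  # e.g. hand off to an embedder here
        elif event["type"] == "completed":
            if event["pages"] != len(handled):
                raise ValueError("page count mismatch")
            return handled
    raise ValueError("stream ended without 'completed'")


result = consume(document_stream(["# Slide 1\n", "# Slide 2\n"]))
```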

Design constraints followed

I tried to be minimally invasive at the code level: to stream, I took the PPTX and PDF converters, exposed their per-page and per-slide loops, and kept the same API structure. Both converters are now stream-capable.

The DocumentConverter contract and the plugin API are unchanged. The streaming logic lives in a separate experimental package (markitdown.streaming) that reuses the existing converters' extraction code behind its own controller. The one core edit is a behavior-preserving refactor in the PPTX converter that extracts per-slide conversion into its own method, which has been verified as byte-identical against the existing test vectors.

Everything experimental is opt-in and clearly marked. Defaults are byte-stable with current behavior. gRPC itself is an optional extra (pip install markitdown[grpc]), so the core package gains no new dependencies.

Operationally the server ships with the standard pieces: health checking and reflection (works with Kubernetes probes and grpcurl out of the box), proper gRPC status code mapping instead of leaked tracebacks, a non-localhost bind warning matching the MCP server, and a 100 MiB default message limit so real documents work without tuning.
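
One practical note on the message limit: gRPC clients built with default settings typically cap received messages at 4 MiB, so a client that calls the unary Convert on a large document may need to raise its own limit too. The sketch below builds the standard gRPC channel arguments for that. The helper name `message_size_options` is hypothetical, and how the PR's server wires its own limit is not shown here.

```python
DEFAULT_MAX_MESSAGE_BYTES = 100 * 1024 * 1024  # 100 MiB, the default quoted above


def message_size_options(max_bytes: int = DEFAULT_MAX_MESSAGE_BYTES):
    """Return gRPC channel arguments that raise both message size limits."""
    return [
        ("grpc.max_send_message_length", max_bytes),
        ("grpc.max_receive_message_length", max_bytes),
    ]


options = message_size_options()
# With grpcio installed, the same list can be passed on either side, e.g.:
#   grpc.server(executor, options=options)
#   grpc.insecure_channel(target, options=options)
```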

Test coverage is 376 passing via hatch test, including byte-exact parity assertions between incremental and whole-document output for PPTX and table-bearing PDFs.

Why start with PDF and PPTX

Both formats have natural structural boundaries (pages, slides) and existing converter logic that could be reused without rewriting anything. DOCX, HTML, and XLSX don't have an equally clean incremental boundary in the current implementation, so they fall back to whole-document conversion transparently through the same API (I can help fix this). If the approach is well received, the streaming layer can grow format by format without forcing changes to the converter architecture.
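
The transparent fallback described above could look roughly like this. Everything in the sketch is hypothetical (the `INCREMENTAL` registry, the form-feed page boundary, and the function names); it only illustrates how one streaming API can serve both incremental and whole-document formats.

```python
from typing import Callable, Dict, Iterator


def _split_units(data: str) -> Iterator[str]:
    # Form feed is used here as a stand-in for a page or slide boundary.
    for unit in data.split("\f"):
        yield unit + "\n"


# Hypothetical registry: formats that expose a per-unit (page or slide) iterator.
INCREMENTAL: Dict[str, Callable[[str], Iterator[str]]] = {
    ".pdf": _split_units,
    ".pptx": _split_units,
}


def convert_whole(data: str) -> str:
    """Hypothetical whole-document conversion used by every other format."""
    return data + "\n"


def stream_convert(extension: str, data: str) -> Iterator[str]:
    """Yield chunks incrementally when supported, else one whole-document chunk."""
    handler = INCREMENTAL.get(extension.lower())
    if handler is not None:
        yield from handler(data)
    else:
        yield convert_whole(data)


pdf_chunks = list(stream_convert(".pdf", "page one\fpage two"))
docx_chunks = list(stream_convert(".docx", "whole body"))
```

Under this shape, adding incremental support for another format means registering one more handler, and callers do not change.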

Try it!

This can greatly reduce pipeline processing time, since output is yielded page by page without the overhead of a heavy processor. That means $$ savings on your Azure bill.
Any concerns with the experimental markitdown.streaming package living inside the core package?
Is there interest in extending incremental conversion to other formats if this lands?
