
hmsoft0815/mlcmarkitdown


MLC MarkItDown MCP Server

A robust, high-performance MCP server for converting various document formats to Markdown, featuring smart artifact integration and real-time progress reporting.

MLC MarkItDown Server

A Go-based server wrapper for Microsoft's markitdown library.

Setup

For detailed instructions on how to set up the Python environment (with or without Docker), please refer to the Setup Guide (SETUP.md).

Quick Start with Docker

The easiest way to run the server is using Docker, as it includes all system dependencies and the correct Python environment:

# Build the image
docker build -t mlc-markitdown .

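# Run the container, publishing port 9591.
# ARTIFACT_GRPC_ADDR points to the optional mlcartifact server (gRPC).
# On Linux, host.docker.internal is not defined by default; add
#   --add-host=host.docker.internal:host-gateway
docker run -d \
  -p 9591:9591 \
  -e ARTIFACT_GRPC_ADDR=host.docker.internal:9590 \
  mlc-markitdown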

Architecture

The following diagram illustrates how the MLC MarkItDown server integrates with the Python environment and the optional Artifact Server.

graph TD
    Client["MCP Client / Proxy"] -- "1. Convert Request" --> GOS["mlc-markitdown (Go Server)"]
    GOS -- "2. Execute" --> SHIM["Python Shim"]
    SHIM -- "3. Convert" --> MID["MarkItDown Library"]
    MID -- "4. Markdown Content" --> SHIM
    SHIM -- "5. Return Result" --> GOS
    
    subgraph Integration ["Secondary Integration"]
        GOS -- "6. Create Artifact (gRPC)" --> ART["mlcartifact Server"]
        ART -- "7. Artifact Metadata" --> GOS
    end
    
    GOS -- "8. Final Response + URI" --> Client

Features

  • Document Conversion: Uses Microsoft's markitdown library to convert PDF, Word, Excel, PowerPoint, HTML, CSV, Images, and Audio.
  • Smart Storage: Automatically detects large outputs (default > 10,000 characters) and saves them as artifacts instead of flooding the LLM context.
  • Structured Results: Returns both a human-readable summary and a structured JSON metadata block for every created artifact.
  • Artifact Chaining: Converts documents that are already stored in the artifact server, referenced by their artifactId.
  • Progress Tracking: Emits real-time progress notifications during long conversion processes.
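For the progress tracking feature, the MCP specification defines a notifications/progress message whose params carry the progressToken the client supplied in the request's _meta, a progress value, and optional total and message fields. The Go sketch below builds such a notification by hand; the step labels are hypothetical and do not describe the server's real conversion stages, and a real server would normally send this through its MCP SDK.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// progressParams mirrors the params of an MCP "notifications/progress" message.
type progressParams struct {
	ProgressToken any     `json:"progressToken"`
	Progress      float64 `json:"progress"`
	Total         float64 `json:"total,omitempty"`
	Message       string  `json:"message,omitempty"`
}

// notification is a JSON-RPC notification (no "id" field).
type notification struct {
	JSONRPC string         `json:"jsonrpc"`
	Method  string         `json:"method"`
	Params  progressParams `json:"params"`
}

// progressNotification serializes one progress update for the given token.
func progressNotification(token any, step, total int, msg string) ([]byte, error) {
	return json.Marshal(notification{
		JSONRPC: "2.0",
		Method:  "notifications/progress",
		Params: progressParams{
			ProgressToken: token,
			Progress:      float64(step),
			Total:         float64(total),
			Message:       msg,
		},
	})
}

func main() {
	// Hypothetical stages of a long conversion.
	steps := []string{"fetching source", "running markitdown", "storing artifact"}
	for i, s := range steps {
		b, err := progressNotification("req-42", i+1, len(steps), s)
		if err != nil {
			panic(err)
		}
		fmt.Println(string(b))
	}
}
```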

Artifact Integration Logic

This server acts as a producer for the mlcartifact service.

sequenceDiagram
    participant LLM
    participant MarkItDownGo
    participant PythonShim
    participant ArtifactServer

    LLM->>MarkItDownGo: markitdown__convert__mlc(uri="big.pdf")
    MarkItDownGo->>PythonShim: execute markitdown
    PythonShim-->>MarkItDownGo: markdown_content (e.g. 200KB)
    Note over MarkItDownGo: 200KB > Threshold (10KB)
    MarkItDownGo->>ArtifactServer: Write(big.md, content)
    ArtifactServer-->>MarkItDownGo: artifact_id: 12345
    MarkItDownGo-->>LLM: [Preview + Link] AND [JSON Metadata]

Tools

markitdown__convert__mlc

Converts a file path or URL to Markdown.

  • Arguments:
    • uri (string, required): Path to local file or remote URL.
    • force_artifact (bool, optional): If true, always saves to artifact storage regardless of size.

markitdown__convert_artifact__mlc

Converts a document that is already stored in the artifact store.

  • Arguments:
    • artifactId (string, required): ID of the source artifact.
    • output_filename (string, optional): Desired name for the resulting Markdown artifact.
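The README does not say how the output name is chosen when output_filename is omitted. The sketch below assumes a hypothetical fallback, reusing the source artifact's base name with a .md extension, purely to show how the two documented arguments (note the camelCase artifactId next to snake_case output_filename) might be handled.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"strings"
)

// convertArtifactArgs mirrors the documented arguments of markitdown__convert_artifact__mlc.
type convertArtifactArgs struct {
	ArtifactID     string `json:"artifactId"`
	OutputFilename string `json:"output_filename,omitempty"`
}

// resolveOutputName picks the name for the resulting Markdown artifact.
// The fallback rule (source base name + ".md") is hypothetical.
func resolveOutputName(a convertArtifactArgs, sourceName string) (string, error) {
	if a.ArtifactID == "" {
		return "", errors.New("artifactId is required")
	}
	if a.OutputFilename != "" {
		return a.OutputFilename, nil
	}
	base := filepath.Base(sourceName)
	return strings.TrimSuffix(base, filepath.Ext(base)) + ".md", nil
}

func main() {
	name, err := resolveOutputName(convertArtifactArgs{ArtifactID: "12345"}, "reports/q3.xlsx")
	if err != nil {
		panic(err)
	}
	fmt.Println(name) // q3.md
}
```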

markitdown__quick_inspect__mlc

Quickly retrieves metadata about a document without performing a full conversion.

  • Arguments:
    • uri (string, required): Path to file.

Response Strategy

1. Human-Readable Notice

When a file is saved as an artifact, the LLM MUST NOT provide a download link. Instead, it should include a clear notice:

"The complete file is available in the artifact server under id = {id}"

2. Structured Metadata

The tool result will contain an additional TextContent item with a JSON object:

{
  "artifact": {
    "id": "12345",
    "filename": "document.md",
    "mime_type": "text/markdown",
    "size_bytes": 10240,
    "source": "mlc-markitdown",
    "expires_at": "2026-03-12T05:25:28Z"
  }
}

Configuration

The server requires access to a Python environment with the markitdown package installed:

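pip install markitdown

Since markitdown 0.1.0, the converters for individual formats (PDF, Word, PowerPoint, Excel, audio, and others) are optional extras. To cover every format listed under Features, install pip install 'markitdown[all]' instead.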
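A quick way to confirm that a given interpreter meets this requirement is to ask it to import the package. The Go sketch below does that with a timeout; checkMarkItDown is a hypothetical helper, and the server is not known to perform such a startup check itself.

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
	"time"
)

// checkMarkItDown verifies that the given Python interpreter can import markitdown.
func checkMarkItDown(python string) error {
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()
	out, err := exec.CommandContext(ctx, python, "-c", "import markitdown").CombinedOutput()
	if err != nil {
		// err covers a missing interpreter, a failed import, or a timeout.
		return fmt.Errorf("%s cannot import markitdown: %w (%s)", python, err, out)
	}
	return nil
}

func main() {
	if err := checkMarkItDown("python3"); err != nil {
		fmt.Println("setup incomplete:", err)
	} else {
		fmt.Println("markitdown is available")
	}
}
```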

Transport Support

Supports stdio, sse, and streamable HTTP transport modes.
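How the transport is selected is not documented in this README. The sketch below assumes a hypothetical -transport flag and value spellings (stdio, sse, http) only to show how a Go server might validate the three modes before starting; check SETUP.md or the source for the real option names.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// Hypothetical transport names; the real flag values may differ.
const (
	transportStdio = "stdio"
	transportSSE   = "sse"
	transportHTTP  = "http" // streamable HTTP
)

// parseTransport normalizes and validates a transport name.
func parseTransport(s string) (string, error) {
	switch t := strings.ToLower(strings.TrimSpace(s)); t {
	case transportStdio, transportSSE, transportHTTP:
		return t, nil
	default:
		return "", fmt.Errorf("unsupported transport %q (want stdio, sse or http)", s)
	}
}

func main() {
	mode := flag.String("transport", transportStdio, "stdio, sse or http (streamable HTTP)")
	flag.Parse()
	t, err := parseTransport(*mode)
	if err != nil {
		fmt.Println(err)
	} else {
		fmt.Println("starting with transport:", t)
	}
}
```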

License

This project is licensed under the MIT License - see the LICENSE file for details.

Copyright (c) 2026 Michael Lechner

About

MCP server to convert various documents to Markdown, with integration for the artifact MCP server.
