feketerj/MarkItDown

MarkItDown — Universal Markdown Converter

A premium web application that converts any file to clean Markdown. Powered by Microsoft MarkItDown.


What It Does

Drop a supported file into the web UI and get clean, structured Markdown back in seconds. The conversion engine is Microsoft's MarkItDown library, a widely used file-to-Markdown converter on GitHub (119k+ stars).

Supported Formats

| Category  | Formats                                                                         |
|-----------|---------------------------------------------------------------------------------|
| Documents | PDF, Word (.docx), PowerPoint (.pptx), Excel (.xlsx/.xls), Outlook (.msg), EPUB |
| Web       | HTML, HTM                                                                       |
| Data      | CSV, JSON, XML                                                                  |
| Images    | JPEG, PNG, GIF, BMP, TIFF, WebP (EXIF metadata + OCR)                           |
| Archives  | ZIP (recursive content extraction)                                              |
| Audio     | WAV, MP3 (requires speech transcription API key)                                |
| Text      | TXT, Markdown, reStructuredText                                                 |

How It Works

Architecture

Browser (HTML/CSS/JS)
    │
    ├── Drag-and-drop file upload
    │   ↓
    ├── POST /api/convert (multipart form)
    │   ↓
FastAPI Server (server.py)
    │
    ├── Receives file → writes to temp file
    ├── Passes temp file path to MarkItDown engine
    ├── MarkItDown detects format (extension hint + content sniffing)
    ├── Delegates to format-specific converter:
    │   ├── PDF  → pdfminer.six / pdfplumber
    │   ├── DOCX → mammoth
    │   ├── PPTX → python-pptx
    │   ├── XLSX → openpyxl / pandas
    │   ├── HTML → markdownify (html→md)
    │   ├── Images → EXIF extraction + optional OCR
    │   ├── Audio → SpeechRecognition (optional)
    │   └── Text formats → direct read
    ├── Returns structured Markdown text
    ├── Cleans up temp file
    │   ↓
    └── JSON response → browser
        │
        ├── Raw Markdown displayed (left pane)
        ├── markdown-it renders HTML preview (right pane)
        ├── Copy to clipboard / Download as .md
        └── Saved to localStorage history
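
The server-side portion of the diagram (write to a temp file, convert, clean up) can be sketched roughly as follows. This is a minimal illustration and not the code in server.py: the function name `convert_upload` is hypothetical, and the converter is passed in as a plain callable so the sketch runs without MarkItDown installed.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable


def convert_upload(filename: str, data: bytes, convert: Callable[[str], str]) -> str:
    """Write uploaded bytes to a temp file, convert it, and always clean up."""
    # Keep the original extension so the converter has a format hint.
    suffix = Path(filename).suffix.lower()
    fd, tmp_path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        return convert(tmp_path)
    finally:
        # Runs on success and on failure, so the upload is never left on disk.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
```

With the real library, the `convert` argument would presumably be something like `lambda p: MarkItDown().convert(p).text_content`; that wiring is an assumption about server.py, not a quote from it.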

Data Flow

  1. Upload — User drags a file or clicks to browse. The file is sent as a multipart/form-data POST to /api/convert.

  2. Temp File — FastAPI receives the upload and writes it to a temporary file, preserving the original extension so that MarkItDown has a filename hint for format detection.

  3. Conversion — MarkItDown.convert(path) is called. Internally, MarkItDown uses magika for content-type detection and routes to the appropriate format handler. Each handler extracts text and structure, normalizing it to Markdown with headings, tables, lists, and links preserved.

  4. Response — The server returns a JSON payload with the Markdown text, filename, file size, character count, and conversion time. The temp file is deleted in a finally block.

  5. Preview — The browser receives the Markdown and renders it two ways simultaneously:

    • Raw pane — textContent assignment (safe, no XSS)
    • Preview pane — markdown-it parses and renders to HTML
  6. Export — User can copy to clipboard (Clipboard API with execCommand fallback) or download as .md (Blob URL).

  7. History — Each conversion is saved to localStorage (last 20 entries) with full Markdown content for instant recall.
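
To make step 4 concrete, here is one way the JSON payload could be assembled around a timed conversion call. The field names follow the response documented in the API Reference; the function name `build_response`, the in-memory `convert` callable, and the two-decimal rounding are assumptions made for illustration.

```python
import time
from pathlib import Path
from typing import Callable


def build_response(filename: str, data: bytes, convert: Callable[[bytes], str]) -> dict:
    """Run a conversion and wrap the result in the documented response shape."""
    started = time.perf_counter()
    markdown = convert(data)
    elapsed = time.perf_counter() - started
    return {
        "success": True,
        "filename": filename,
        "extension": Path(filename).suffix.lower(),
        "file_size": len(data),             # size of the upload in bytes
        "markdown": markdown,
        "markdown_length": len(markdown),   # character count of the output
        "conversion_time": round(elapsed, 2),  # seconds; rounding is an assumption
    }
```

Timing only the conversion call (not the upload or temp-file write) keeps `conversion_time` comparable across files of different sizes, though the actual server may measure a wider span.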

Key Technical Details

  • No database — All state is client-side (localStorage). Server is stateless.
  • Temp file cleanup — Guaranteed via try/finally block. Files are never persisted.
  • 50MB limit — Enforced both client-side (pre-upload check) and server-side (post-read check).
  • Local-only by default — Launchers bind to 127.0.0.1 and use a fixed app port.
  • Hot reload opt-in — Set APP_RELOAD=1 before running python server.py.
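
The server-side half of the size limit might look something like the sketch below. It assumes "50MB" means 50 × 1024 × 1024 bytes, which the README does not specify, and the names `MAX_UPLOAD_BYTES`, `UploadTooLarge`, and `check_upload_size` are hypothetical.

```python
MAX_UPLOAD_BYTES = 50 * 1024 * 1024  # assumption: "50MB" is read as 50 MiB


class UploadTooLarge(ValueError):
    """Raised when an upload exceeds the configured limit."""


def check_upload_size(data: bytes, limit: int = MAX_UPLOAD_BYTES) -> int:
    """Return the upload size in bytes, or raise if it is over the limit."""
    size = len(data)
    if size > limit:
        limit_mb = limit // (1024 * 1024)
        raise UploadTooLarge(f"File too large. Maximum size is {limit_mb}MB")
    return size
```

In a FastAPI handler, the exception would typically be translated into an `HTTPException` with status code 413, matching the error response shown in the API Reference. Because this check runs after the body has been read, it protects the converter rather than the network; the client-side check is what avoids the wasted upload.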

Quick Start

Prerequisites

  • Python 3.10+ (tested on 3.12)
  • pip

Install

git clone https://github.com/YOUR_USER/MarkItDown.git
cd MarkItDown
pip install -r requirements.txt

Run

start.bat

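Open http://127.0.0.1:8000 in your browser. Use stop.bat to close the background server. The .bat launchers are Windows scripts; on macOS or Linux, start the server with python server.py instead.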
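
For readers curious how an entry point could apply the local-only binding and the APP_RELOAD opt-in, here is a hypothetical sketch. It is not taken from server.py: the helper name `server_options` and the `"server:app"` import string are assumptions, and port 8000 is inferred from the URL above.

```python
import os
from typing import Mapping


def server_options(env: Mapping[str, str] = os.environ) -> dict:
    """Build launch options: loopback only, fixed port, reload only when opted in."""
    return {
        "host": "127.0.0.1",                     # local-only by default
        "port": 8000,                            # assumed from the URL in this README
        "reload": env.get("APP_RELOAD") == "1",  # hot reload is opt-in
    }


def main() -> None:
    # Imported lazily so server_options() can be used without uvicorn installed.
    import uvicorn

    # "server:app" is an assumed module:attribute path for the FastAPI instance.
    uvicorn.run("server:app", **server_options())
```

Uvicorn's reload mode needs the application as an import string rather than an object, which is why the sketch passes `"server:app"` instead of an `app` variable.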

Usage

  1. Drag any supported file onto the drop zone (or click to browse)
  2. View results in split pane — raw Markdown (left) and rendered preview (right)
  3. Copy to clipboard or Download as .md
  4. History — access past conversions from the sidebar

Project Structure

MarkItDown/
├── server.py              # FastAPI backend — file upload, MarkItDown conversion, static serving
├── requirements.txt       # Python dependencies
└── static/
    ├── index.html         # Single-page application — drag/drop, results, history sidebar
    ├── css/
    │   └── style.css      # Design system — dark mode, glassmorphism, responsive
    └── js/
        └── app.js         # Client logic — upload, preview, clipboard, download, history

API Reference

GET /api/formats

Returns the list of supported input formats.

Response:

{
  "formats": [
    { "ext": ".pdf", "label": "PDF", "icon": "📄", "category": "Documents" },
    ...
  ]
}
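
A client could turn this response into a per-category lookup along these lines. The sketch assumes only the `formats`, `ext`, and `category` keys shown above; the function name `group_by_category` is hypothetical.

```python
import json
from collections import defaultdict


def group_by_category(payload: str) -> dict[str, list[str]]:
    """Group the extensions in a /api/formats response body by category."""
    formats = json.loads(payload)["formats"]
    grouped: dict[str, list[str]] = defaultdict(list)
    for entry in formats:
        grouped[entry["category"]].append(entry["ext"])
    return dict(grouped)  # plain dict, preserving first-seen category order
```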

POST /api/convert

Converts an uploaded file to Markdown.

Request: multipart/form-data with a file field.

Response:

{
  "success": true,
  "filename": "report.pdf",
  "extension": ".pdf",
  "file_size": 245760,
  "markdown": "# Report Title\n\nContent here...",
  "markdown_length": 4521,
  "conversion_time": 1.23
}

Error Response (413):

{ "detail": "File too large. Maximum size is 50MB" }
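
For scripted use, the endpoint can be called from Python with only the standard library. The sketch below builds the multipart body by hand and surfaces the `detail` field on errors such as the 413 above. The function names are hypothetical, the base URL is taken from the Run section, and the code assumes error bodies are JSON as in the example.

```python
import json
import os
import urllib.error
import urllib.request
import uuid


def encode_multipart(field: str, filename: str, data: bytes) -> tuple[bytes, str]:
    """Return (body, content_type) for a single-file multipart/form-data upload."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def convert_file(path: str, base_url: str = "http://127.0.0.1:8000") -> str:
    """Upload a file to /api/convert and return the Markdown text."""
    with open(path, "rb") as fh:
        body, content_type = encode_multipart("file", os.path.basename(path), fh.read())
    request = urllib.request.Request(
        base_url + "/api/convert",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request) as response:
            return json.load(response)["markdown"]
    except urllib.error.HTTPError as err:
        # Assumes the error body is JSON with a "detail" field, as documented above.
        detail = json.load(err).get("detail", err.reason)
        raise RuntimeError(f"conversion failed ({err.code}): {detail}") from err
```

With the server running locally, `print(convert_file("report.pdf"))` would print the converted Markdown.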

Tech Stack

| Layer    | Technology             | Purpose                    |
|----------|------------------------|----------------------------|
| Backend  | FastAPI                | Async web framework        |
| Server   | Uvicorn                | ASGI server                |
| Engine   | MarkItDown             | File → Markdown conversion |
| Frontend | Vanilla HTML/CSS/JS    | Zero-dependency UI         |
| Preview  | markdown-it            | Markdown → HTML rendering  |
| Fonts    | Inter + JetBrains Mono | Typography                 |

License

MIT
